AI Government United States

US Launches Task Force To Open Government Data For AI Research (wsj.com) 14

An anonymous reader quotes a report from The Wall Street Journal: The Biden administration launched an initiative Thursday aiming to make more government data available to artificial intelligence researchers, part of a broader push to keep the U.S. on the cutting edge of the crucial new technology. The National Artificial Intelligence Research Resource Task Force, a group of 12 members from academia, government, and industry led by officials from the White House Office of Science and Technology Policy and the National Science Foundation, will draft a strategy for potentially giving researchers access to stores of data about Americans, from demographics to health and driving habits. They would also look to make available computing power to analyze the data, with the goal of allowing access to researchers across the country. The task force, which Congress mandated in the National Artificial Intelligence Initiative Act of 2020, is part of an effort across the government to ensure the U.S. remains at the vanguard of technological advancements.

Many researchers, particularly in academia, simply don't have access to these computational resources and data, and this is hampering innovation. One example: The Transportation Department has access to a set of data gathered from vehicle sensors about how people drive, said Erwin Gianchandani, senior adviser at the National Science Foundation and co-chairman of the new AI task force. "Because you have very sensitive data about individuals, there are challenges in being able to make that data available to the broader research community," he said. On the other hand, if researchers could get access, they could develop innovations designed to make driving safer. Census data, medical records, and other data sets could also potentially be made available for research by both private companies and academic institutions, officials said. They said the task force will evaluate how to make such data available while protecting Americans' privacy and addressing other ethical concerns.


  • Shouldn't they have to prove they've developed an Artificial Intelligence? I would be a little less excited about AI doing the job than a really good algorithm. I can't trust AI, but if an open algorithm is used (and applied 'as is'), I'd be more comfortable letting it process government data. At least I can depend on people being corrupt. Who knows about AI...
    • an Artificial Intelligence

      "An?" This is not Neuromancer; "AI" is a commodity - like stupidity.

    • by Okian Warrior ( 537106 ) on Saturday June 12, 2021 @11:56AM (#61480040) Homepage Journal

      Shouldn't they have to prove they've developed an Artificial Intelligence? I would be a little less excited about AI doing the job than a really good algorithm. I can't trust AI, but if an open algorithm is used (and applied 'as is'), I'd be more comfortable letting it process government data. At least I can depend on people being corrupt. Who knows about AI...

      I do a lot of AI research, and finding good training data is really hard.

      Here's a suggestion for anyone who wants to enter the field: pick a problem that is difficult for computers but that humans find easy, then try to find a dataset to train on, then think through the steps needed to solve the problem.

      Text can sometimes be difficult. I've been on Kaggle and seen text challenges where the data contains snips of HTML tags (not the entire tag, just snips here and there). Project Gutenberg has typos in text and encoding, and there's no easy way to distinguish narrative text from other forms (dictionary, poetry, or inventory lists) which are not narrative. Online text from reviews or posts has a ton of abbreviations, typos, and leet-speak, and has a tone and tenor that isn't representative of normal speech.

      The simplest image sources are probably the ZIP code digit-recognition images, which are scanned and already hand-labelled so we know what the correct answer is... which is fine, except that for actual V1-style recognition you need greyscale and not binary (1=white, 0=black) images. You can get greyscale versions, but these are greyscale interpolations of the binary original scans! The data is shot through with quantization noise.

      An acquaintance was kind enough to send me a set of high-res topo images (example [duckduckgo.com]) from Mars. The images are X-Y-Z, black-and-white with Z being the ground-level altitude. Craters are obvious in profile: the base of a crater is lower than the surrounding land (and mostly flat). (Mostly - some nuance applies.)

      Craters are circles, and a human has no problem identifying the location and size. Craters can overlap, and a human has no problem telling which crater came first, and whether it's old or young depending on the weathering of the edge.

      ...but just try to come up with an algorithm that detects circles and is invariant to position, size, completeness of edge (overlap), and thickness of edge.

      ...and isn't a heuristic specific to circles, and would apply to any other trained feature like our visual system does.
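
      For contrast, here is a minimal sketch of the kind of circle-specific heuristic the comment sets aside: a plain Hough-style vote for circle centres over a binary rim map. It is not the commenter's method. It assumes NumPy, and every name in it (hough_circles, best_circle, synthetic_rim) is hypothetical, as is the idea that crater rims have already been reduced to a boolean edge image.

```python
import numpy as np

def hough_circles(edges, radii, n_angles=90):
    """Accumulate centre votes; returns an array of shape (len(radii), H, W)."""
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    for i, r in enumerate(radii):
        # Each edge pixel votes for every centre lying at distance r from it.
        cy = np.rint(ys[:, None] - r * np.sin(thetas)[None, :]).astype(int).ravel()
        cx = np.rint(xs[:, None] - r * np.cos(thetas)[None, :]).astype(int).ravel()
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc[i], (cy[ok], cx[ok]), 1)
    return acc

def best_circle(edges, radii):
    """Return (row, col, radius) of the strongest candidate circle."""
    acc = hough_circles(edges, radii)
    # Divide by radius so large circles (more rim pixels) do not win by size alone.
    scores = acc / np.asarray(radii, dtype=float)[:, None, None]
    i, y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(x), int(radii[i])

def synthetic_rim(shape, cy, cx, r):
    """Hypothetical test helper: draw a thin, complete circular rim."""
    img = np.zeros(shape, dtype=bool)
    t = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
    ys = np.rint(cy + r * np.sin(t)).astype(int)
    xs = np.rint(cx + r * np.cos(t)).astype(int)
    img[ys, xs] = True
    return img

if __name__ == "__main__":
    rim = synthetic_rim((64, 64), cy=30, cx=40, r=10)
    print(best_circle(rim, list(range(6, 16))))
```

      The vote is invariant to position and searches over size, but a partial or overlapped rim simply casts fewer votes, and nothing here carries over to a feature that isn't a circle, which is the limitation the comment is pointing at.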

      I feel lucky to have the Mars image data to play with - having a conceptually simple problem with really good data helps eliminate a lot of proposed algorithms for how AI really works.

      But finding good data is surprisingly hard.

      (I'm well aware of the myriad AI data corpora online. Many have defects in some form or another, as listed.)

      (Apropos of nothing: I scraped Slashdot for all comments scored 3+, thinking they would have typos and usage representative of quality typing. They mostly do, but the narrative thread doesn't match well. Most comments are responses to other comments and have missing conceptual bits that need the previous comment for context, and the tree structure allows multiple responses missing the context, and so on.)

    • Handing out confidential government data to anyone who wants it sounds like it would really help advertising companies make money. In the absence of evidence to the contrary you should assume that any time "Rich People" benefit from government action, it's deliberate.
      • by jythie ( 914043 )
        I see it as a 'devil in the details' thing. Over the years I've gotten to work with a bunch of government data sets and they can be really useful for AI work, but there are only certain sets that there is currently a procedure for obtaining. Some are pre-anonymized, which means someone on the government side has to do some work; others require meeting certain requirements (like HIPAA), which also requires someone on the government side to do some work... thus there is a lot of data that could be useful but
  • It's not your data, it's the data of American citizens.
  • Doesn't pass the sniff test for me, anyway.
  • Facefirst, a company that wanted to implement vaccination status lists over its ubiquitous network of facial recognition reporting, would love to add this data to their "massive centrally managed database" of Americans' biometric data. Perfect for bolstering their "watchlisting as a service" offering.
  • to provide a conduit to leak government info to marketers so it can be sold legally.
    • Sounds like a project designed to provide a conduit to leak government info to marketers so it can be sold legally.

      Also sounds like a project designed to disseminate database information about the population and develop tools for analyzing it - creating a surveillance state.

      Something like the rules change Obama made on his way out, expanding access to raw surveillance from "NSA only and anonymizes it before feeding it to other agencies" to "NSA gives raw feeds to 16 other agencies". (This was alleged to be

      • Yeah, that too. Some suspect evil intent, when the sure bet is incompetence. You often end up in the same place anyway.
