AI Trains On Kids' Photos Even When Parents Use Strict Privacy Settings

An anonymous reader quotes a report from Ars Technica: Human Rights Watch (HRW) continues to reveal how photos of real children casually posted online years ago are being used to train AI models powering image generators -- even when platforms prohibit scraping and families use strict privacy settings. Last month, HRW researcher Hye Jung Han found 170 photos of Brazilian kids that were linked in LAION-5B, a popular AI dataset built from Common Crawl snapshots of the public web. Now, she has released a second report, flagging 190 photos of children from all of Australia's states and territories, including indigenous children who may be particularly vulnerable to harms. These photos are linked in the dataset "without the knowledge or consent of the children or their families." They span the entirety of childhood, making it possible for AI image generators to generate realistic deepfakes of real Australian children, Han's report said. Perhaps even more concerning, the URLs in the dataset sometimes reveal identifying information about children, including their names and locations where photos were shot, making it easy to track down children whose images might not otherwise be discoverable online. That puts children in danger of privacy and safety risks, Han said, and some parents thinking they've protected their kids' privacy online may not realize that these risks exist.
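For context on the mechanism: LAION-5B ships as metadata -- rows pairing an image URL with its alt-text caption -- rather than the images themselves, which is why the URLs alone can expose names and locations. Below is a minimal sketch of the kind of scan involved, assuming a local Parquet shard of the released metadata with its URL and TEXT columns; the filename and search term are placeholders.

```python
# Sketch: scan LAION-style metadata (image URL + caption) for a string.
# Assumes a local Parquet shard with the URL/TEXT columns the released
# metadata used; the filename and the search needle are placeholders,
# not real artifacts.
import pandas as pd

shard = pd.read_parquet("laion_metadata_shard.parquet", columns=["URL", "TEXT"])

needle = "example-preschool"  # hypothetical name or domain to look for
hits = shard[
    shard["URL"].str.contains(needle, case=False, na=False)
    | shard["TEXT"].str.contains(needle, case=False, na=False)
]
print(f"{len(hits)} of {len(shard)} rows match")
for url in hits["URL"].head(5):
    print(url)  # the URL alone can reveal who or where the photo shows
```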

From a single link to one photo that showed "two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural," Han could trace "both children's full names and ages, and the name of the preschool they attend in Perth, in Western Australia." And perhaps most disturbingly, "information about these children does not appear to exist anywhere else on the Internet" -- suggesting that families were particularly cautious in shielding these boys' identities online. Stricter privacy settings were used in another image that Han found linked in the dataset. The photo showed "a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating" during the week after their final exams, Han reported. Whoever posted that YouTube video adjusted privacy settings so that it would be "unlisted" and would not appear in searches. Only someone with a link to the video was supposed to have access, but that didn't stop Common Crawl from archiving the image, nor did YouTube policies prohibiting AI scraping or harvesting of identifying information.
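To make the "unlisted" failure concrete: unlisted only keeps a video out of YouTube search and channel listings; its thumbnail still lives at a predictable, publicly fetchable URL, so if the link was ever posted on any page a crawler visited, an archive can capture it. Here is a hedged sketch that checks Common Crawl's public CDX index for captures of a video's thumbnail; the video ID is a placeholder, and the index name is one snapshot (current names are listed at index.commoncrawl.org).

```python
# Sketch: ask Common Crawl's CDX index whether a YouTube thumbnail was
# ever captured. Thumbnails live at predictable i.ytimg.com URLs even
# for unlisted videos.
import requests

video_id = "dQw4w9WgXcQ"  # hypothetical video ID
resp = requests.get(
    "https://index.commoncrawl.org/CC-MAIN-2024-26-index",  # one snapshot; adjust
    params={"url": f"i.ytimg.com/vi/{video_id}/*", "output": "json"},
    timeout=30,
)
if resp.ok:
    # Each line is a JSON record for one capture of a matching URL.
    for line in resp.text.splitlines()[:5]:
        print(line)
else:
    # The CDX API returns 404 when this snapshot has no matching captures.
    print("no captures in this snapshot:", resp.status_code)
```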

Reached for comment, YouTube's spokesperson, Jack Malon, told Ars that YouTube has "been clear that the unauthorized scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse." But Han worries that even if YouTube did join efforts to remove images of children from the dataset, the damage has been done, since AI tools have already trained on them. That's why -- even more than parents need tech companies to up their game blocking AI training -- kids need regulators to intervene and stop training before it happens, Han's report said. Han's report comes a month before Australia is expected to release a reformed draft of the country's Privacy Act. Those reforms include a draft of Australia's first child data protection law, known as the Children's Online Privacy Code, but Han told Ars that even people involved in long-running discussions about reforms aren't "actually sure how much the government is going to announce in August." "Children in Australia are waiting with bated breath to see if the government will adopt protections for them," Han said, emphasizing in her report that "children should not have to live in fear that their photos might be stolen and weaponized against them."

  • by aldousd666 ( 640240 ) on Tuesday July 02, 2024 @03:36PM (#64595839) Journal
    So your kid is in the training data. It does NOT make it more likely that some random will spit out images of your kid. It certainly doesn't make it more likely for any person in particular to show up in a deepfake. It's just much ado about nothing. If your kid's picture was in the newspaper, they can get into the training set that way, so this is moot.
    • by Anonymous Coward
      Until it's YOUR kid, then you'll want to take an axe to every AI you can get your hands on.
    • Re: (Score:2, Insightful)

      "No big deal that people's kids end up in training data and can be found in the real world" --person who doesn't have kids

      What next? No big deal that their kids' images end up in AI generated kiddie porn because their actual kids were not directly raped?

      But scummy AI companies need everyone's data for free and that's really important for $reasons so everyone just stop worrying about it. AI is important, bro.

      • Re: (Score:3, Interesting)

        by aldousd666 ( 640240 )
        I have 2 kids. Thanks. They're fine. And if they're in the training data, it won't make a lick of difference in their lives. I also happen to know a little about how diffusion works, and dude (or dudette), just get over it. This is a non-issue.
        • You have no idea what difference it will make in their lives.

          Btw, are they under 18?

          • > You have no idea what difference it will make in their lives.

            It will make absolutely zero difference because an image occurring a single time in a five billion image training dataset doesn't do shit. They're in a lot more danger from a human photoshopping the image directly.

    • Social media groups, including YouTube channels, should be private by default. And children shouldn't be able to change their privacy settings until they're 16 unless approved by parents or guardians, with warnings required to read beforehand. This is why it's best for everybody to avoid the major US social media platforms, cuz their abnormal sub-human big tech executives won't stop nurturing the pervert tendencies they picked up growing up as incels by violating people's privacy. Even after amputating their limbs and genitals, they stil
    • by AmiMoJo ( 196126 )

      You would think that the AI companies would be more concerned about this. Their products already produce some very creepy stuff.

    • "So your kid is in the training data. It does NOT make it more likely that some random will spit out images of your kid."

      No, it just means your kid's data is in a permanent repository somewhere, one security breach away from being stolen.

  • Big fines with payouts will put a stop to this!

  • If you think they violated a law by training their AI on your kid's school picture or whatever, then go ahead and sue them. If you win, great, we'll adapt to that as a shared reality; if you lose, then you'll need to be a lot more choosy about when you allow people to take pictures of your kid. Either way, let's get some decisions made, move on with what we can do, and get finished with legally defining it!
    • Hear, hear. People seem to think that these bots are like databases where you can just ask them to spit out an image of someone by name. That will work on famous people sometimes, if there is enough duplicate or similar data in the set to do it, but if there's just a copy of your kid's photos, even a hundred of them a year, you'll never see your kid in the output, even if you ask for them by name.
      • Definitely they are able to mimic the style of an artist, musician, or writer. I've asked ChatGPT's DALL-E for images that look like the work of the Amiga artist I like named "M.A.D.E". It does a halfway decent job. That's not just a specific person, it's their art style that they likely trade on professionally. These bots can imitate or sometimes absolutely nail a given style or genre in art. It's clear that it's because it was trained on that stuff. Now the question that has to be answered is: was that fair game? Di
  • If AI can only train on people who specifically opted in, then it's going to skew very heavily toward the "beautiful ones" because they are the ones who have already signed away their name, image, and likeness rights to work in the entertainment industry. Then a different subset of snowflakes will cry foul for not being represented.

  • This whole story is fake-concern bullshit.

  • So you're telling me they found 190 pictures of children in a dataset of 5B pictures? Shocking!
  • ...put a photo of my kids online. If they want to post their baby pictures when they turn 18, great. I can't imagine being an adult and having my entire childhood up there for any old person to see, completely without my permission.
    • There's one picture of me on the Internet, so far as I am aware. But it is hopefully obscure enough it won't be copied and will eventually no longer exist, never having been indexed or scanned by a facial recognition engine.

      Of course, that's only as good as the security on my government's ID databases, and I suspect they have zero difficulty sharing those between different parts of government anyway.

      Regardless, the less public information there is about you out there, the less potential there is for it to be inconven

  • Brings to mind being "harvested" without your knowledge to feed and grow some monolithic blobby abomination that may or may not begin to show some real semblance of sentience in the future. And this secret harvesting is done from pictures taken in happy, wholesome settings. Now imagine those behind this being emboldened to collect DNA samples from various drink cups and hot dog wrappers from the trashcans at a fun fair and start creating franken-abominations with whatever cloning/genetic modification t
  • Put a picture on the public internet, it gets looked at. News at 11:00. Oh, the parents were "careful"? What that means is not explained. Maybe it means that they didn't put the pic on a website, but just on Facebook, which is known for respecting its users' privacy. /s

    I don't see any problem with AI training on public data. Public is defined as "if anyone can see it on the Internet". If you don't want stuff seen by the public (or by an AI), then you have to put it behind a login, on a provider that actua

  • Soon everyone will have their own LLM bot and everyone will be training it on images, sound, video, etc.
    I don't see a way this won't happen, just like the cassette recorder and the camcorder had everyone documenting life with lots of random strangers included.
