AI Trains On Kids' Photos Even When Parents Use Strict Privacy Settings
An anonymous reader quotes a report from Ars Technica: Human Rights Watch (HRW) continues to reveal how photos of real children casually posted online years ago are being used to train AI models powering image generators -- even when platforms prohibit scraping and families use strict privacy settings. Last month, HRW researcher Hye Jung Han found 170 photos of Brazilian kids linked in LAION-5B, a popular AI dataset built from Common Crawl snapshots of the public web. Now, she has released a second report, flagging 190 photos of children from all of Australia's states and territories, including Indigenous children who may be particularly vulnerable to harms. These photos are linked in the dataset "without the knowledge or consent of the children or their families." They span the entirety of childhood, making it possible for AI image generators to produce realistic deepfakes of real Australian children, Han's report said. Perhaps even more concerning, the URLs in the dataset sometimes reveal identifying information about children, including their names and the locations where photos were shot, making it easy to track down children whose images might not otherwise be discoverable online. That puts children in danger of privacy and safety risks, Han said, and some parents who think they've protected their kids' privacy online may not realize that these risks exist.
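LAION-5B is an index of links and alt-text captions rather than an archive of images, which is why identifying details can sit in the URLs themselves. A minimal sketch of scanning one metadata shard for such URLs -- the filename, column names, and filter pattern below are illustrative assumptions, not HRW's actual method:

# Hypothetical scan of a LAION-5B metadata shard for URLs that leak
# identifying details. The dataset distributes image URLs plus captions;
# the photos themselves live on the originating sites.
import pandas as pd

df = pd.read_parquet("laion5b-metadata-shard.parquet", columns=["URL", "TEXT"])

# Crude pattern for links that embed a school or preschool name.
pattern = r"preschool|kindergarten|primary-school"
leaky = df[df["URL"].str.contains(pattern, case=False, na=False)]
print(f"{len(leaky)} of {len(df):,} links mention a school in the URL itself")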
From a single link to one photo that showed "two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural," Han could trace "both children's full names and ages, and the name of the preschool they attend in Perth, in Western Australia." And perhaps most disturbingly, "information about these children does not appear to exist anywhere else on the Internet" -- suggesting that families had been particularly cautious in shielding these boys' identities online. Another image that Han found linked in the dataset had been posted with even stricter privacy settings. The photo showed "a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating" during the week after their final exams, Han reported. Whoever posted that YouTube video adjusted its privacy settings so that it would be "unlisted" and would not appear in searches. Only someone with a link to the video was supposed to have access, but that didn't stop Common Crawl from archiving the image, nor did YouTube policies prohibiting AI scraping or harvesting of identifying information.
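The mechanics here matter: Common Crawl publishes a public index that anyone can query per URL, so a page's absence from search results says nothing about its absence from the archive. A minimal sketch against that index API -- the snapshot ID and URL below are placeholders, and this illustrates the mechanism, not Han's workflow:

# Query the public Common Crawl index for captures of a given URL.
# "Unlisted" only keeps a page out of search results; if a crawler ever
# reached it through a link, a capture may still exist.
import json
import requests

INDEX = "https://index.commoncrawl.org/CC-MAIN-2024-33-index"  # example snapshot

def captures(url):
    """Return capture records for a URL, or [] if this snapshot has none."""
    resp = requests.get(INDEX, params={"url": url, "output": "json"}, timeout=30)
    if resp.status_code == 404:  # the index answers 404 when nothing matches
        return []
    resp.raise_for_status()
    return [json.loads(line) for line in resp.text.splitlines()]

print(captures("example.com/some-unlisted-page"))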
Reached for comment, YouTube's spokesperson, Jack Malon, told Ars that YouTube has "been clear that the unauthorized scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse." But Han worries that even if YouTube did join efforts to remove images of children from the dataset, the damage has been done, since AI tools have already trained on them. That's why -- even more than parents need tech companies to up their game at blocking AI training -- kids need regulators to intervene and stop training before it happens, Han's report said. Han's report comes a month before Australia is expected to release a reformed draft of the country's Privacy Act. Those reforms include a draft of Australia's first child data protection law, known as the Children's Online Privacy Code, but Han told Ars that even people involved in long-running discussions about the reforms aren't "actually sure how much the government is going to announce in August." "Children in Australia are waiting with bated breath to see if the government will adopt protections for them," Han said, emphasizing in her report that "children should not have to live in fear that their photos might be stolen and weaponized against them."
People overestimate the influence of their pics (Score:4, Interesting)
Re: People overestimate the influence of their pics (Score:1)
You probably shouldn't be putting pictures of your kid on the internet, period. If it's on the internet, there's a good chance rsilvergun has fapped to it and catalogued it already.
Re: People overestimate the influence of their pics (Score:1)
It is way too dangerous to put photographs of your little children online
The safest way to have photos of your children is to keep them private, within your own collection, on your own hard drives
Do not back those photos up to Google Photos, since then they will be in the pool of shareable data
Re: (Score:2, Insightful)
"No big deal that people's kids end up in training data and can be found in the real world" --person who doesn't have kids
What next? No big deal that their kids' images end up in AI-generated kiddie porn because their actual kids were not directly raped?
But scummy AI companies need everyone's data for free and that's really important for $reasons so everyone just stop worrying about it. AI is important, bro.
Re: (Score:1)
You have no idea what difference it will make in their lives.
Btw, are they under 18?
Re: (Score:2)
> You have no idea what difference it will make in their lives.
It will make absolutely zero difference because an image occurring a single time in a five-billion-image training dataset doesn't do shit. They're in a lot more danger from a human photoshopping the image directly.
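For scale, a back-of-envelope on that claim -- every figure below is illustrative, none comes from the article:

# Rough arithmetic: how much of a training run does one photo occupy?
pairs = 5_850_000_000   # approximate number of image-text pairs in LAION-5B
epochs = 1              # many large runs see each example roughly once
batch_size = 4096
steps = pairs * epochs // batch_size
print(f"~{steps:,} optimizer steps; the photo appears in roughly {epochs} of them")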
Re: (Score:2)
You would think that the AI companies would be more concerned about this. Their products already produce some very creepy stuff.
Re: (Score:3)
"So your kid is in the training data. It does NOT make it more likely that some random will spit out images of your kid."
No, it just means your kid's data is in a permanent repository somewhere, one security breach away from being stolen.
big fines with payouts will put a stop to this! (Score:2)
big fines with payouts will put a stop to this!
This is a legal issue that needs working out asap (Score:2)
If AI can only look at people that opt in (Score:3)
If AI can only train on people who specifically opted in, then it's going to skew very heavily toward the "beautiful ones" because they are the ones who have already signed away their name, image, and likeness rights to work in the entertainment industry. Then a different subset of snowflakes will cry foul for not being represented.
Won't SOMEBODY think of the CHILDREN!!! (Score:2, Troll)
This whole story is fake-concern bullshit.
So you're telling me (Score:2)
Re: (Score:3, Funny)
No, they found your kid pics your mom posted.
Never.. (Score:1)
Re: (Score:2)
There's one picture of me on the Internet, so far as I am aware. But it is hopefully obscure enough it won't be copied and will eventually no longer exist, never having been indexed or scanned by a facial recognition engine.
Of course, that's only as good as the security on my government's ID databases, and I suspect they have zero difficulty sharing those between different parts of government anyway.
Regardless, the less public information there is about you out there, the less potential there is for it to become inconvenient.
Just all kinds of creepy (Score:2)
Put a picture on the public internet (Score:2)
Put a picture on the public internet, it gets looked at. News at 11:00. Oh, the parents were "careful"? What that means is not explained. Maybe it means that they didn't put the pic on a website, but just on Facebook, which is known for respecting its users' privacy. /s
I don't see any problem with AI training on public data. Public is defined as "if anyone can see it on the Internet". If you don't want stuff seen by the public (or by an AI), then you have to put it behind a login, on a provider that actually restricts access.
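Short of a login wall, the main opt-out signal today is robots.txt: Common Crawl's crawler identifies itself as CCBot and respects it. A minimal check, with a placeholder domain and path -- noting that robots.txt is advisory and does nothing about copies already archived:

# Does a site's robots.txt tell Common Crawl's crawler ("CCBot") to stay away?
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()
print(rp.can_fetch("CCBot", "https://example.com/family-photos/"))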
Get used to it (Score:2)
Soon everyone will have their own LLM bot and everyone will be training it on images, sound, video, etc.
I don't see a way this won't happen, just as the cassette recorder and the camcorder had everyone documenting life, with lots of random strangers included.