


Developer Builds Tool That Scrapes YouTube Comments, Uses AI To Predict Where Users Live (404media.co)
An anonymous reader quotes a report from 404 Media: If you've left a comment on a YouTube video, a new website claims it might be able to find every comment you've ever left on any video you've ever watched. Then an AI can build a profile of the commenter and guess where you live, what languages you speak, and what your politics might be. The service is called YouTube-Tools and is just the latest in a suite of web-based tools that started life as a site to investigate League of Legends usernames. Now it uses a modified large language model created by the company Mistral to generate a background report on YouTube commenters based on their conversations. Its developer claims it's meant to be used by the cops, but anyone can sign up. It costs about $20 a month to use and all you need to get started is a credit card and an email address.
The tool presents a significant privacy risk, and shows that people may not be as anonymous in the YouTube comments sections as they may think. The site's report is ready in seconds and provides enough data for an AI to flag identifying details about a commenter. The tool could be a boon for harassers attempting to build profiles of their targets, and 404 Media has seen evidence that harassment-focused communities have used the developers' other tools. YouTube-Tools also appears to be a violation of YouTube's privacy policies, and raises questions about what YouTube is doing to stop the scraping and repurposing of people's data like this. "Public search engines may scrape data only in accordance with YouTube's robots.txt file or with YouTube's prior written permission," it says.
Re: (Score:2)
"Each message should stand on its merits. Slashdot used to have mostly AC comments and things were good then."
No one cares about merit anymore. /. has long been dominated by tribalism. Have you read a SuperKendall post?
And are you suggesting that things are not good now because of lack of AC comments? LOL
"There's no reason we need to have usernames associated with each message."
Yes there is.
And people aren't going to be doxed based on information in their posts, they're going to be doxed with software that
Re: (Score:2)
When AC was available, I wouldn't bother logging in because I didn't want the session cookie on every computer I use.
Re: (Score:1)
"There's no reason we need to have usernames associated with each message."
Yes there is.
Agree. If there are no consequences for what one posts, the board devolves into a haven for trolls. We have pseudonyms (although some use real names), to which some reputation becomes attached (for better or worse). Some may value that and regulate their behavior accordingly. That's how most social groups operate, either online or in meat space.
Re: (Score:2, Interesting)
Re: good (Score:1)
Re: good (Score:4, Insightful)
YouTube garbage v Slashdot garbage = GIGO (Score:2)
The AC is obviously mentally broken. What's your excuse for propagating its trash? Just because AC can't think, that makes you want to feed it more vacuum?
On the story, I have to confess that I have seen the YouTube comments. Calling the website YouTube-Tools is a sad joke. At least YouTube-Fools (low-hanging fruit) would be closer to "truth in advertising". GIGO and the YouTube comments are stinking garbage on their best days.
I sort of understand why the increasingly EVIL google keeps YouTube going. There
Re:good (Score:5, Interesting)
Half true.
Here's the thing. What YouTube should have been doing from the start is not publishing the poster names unless they are moderators/admins on the channel, and giving everyone else (including YouTube chat) channel-specific usernames like Google does with Google Docs. The user can then opt in to personalizing the display of their name and icon on that channel. The channel owner always sees the account name so serial harassment can be blocked or hidden. For the sake of streaming, this has to be enabled per-channel, so most people won't bother.
As for "why would you ever post personal information in a YouTube comment", some people want to support their local creators, and some people like to relate to things.
I think it's dangerous to scrape data this way, and Google's APIs should be throttled to only being able to do API calls on your account and content you control. E.g., if I wanted to scrape all the comments posted by my viewers, I can, but I can't scrape the content posted outside my channel, and likewise YouTube doesn't return that data past a certain point anyway. (I used Google's APIs during some of the gamergate controversy to see what comments were posted on a video before comments were disabled on it, and indeed, it does have a limit even then, but it shouldn't be returning anything for videos that have comments turned off.)
Re:good (Score:5, Informative)
The SSO was never motivated by some altruistic radical simplification of web users' login experiences. It was the next evolution of the portal craze, where companies competed to be the entrance door to the web so they could monetize it. When you provide identity services to the world, you get to spy on some of what people do in private places that you wouldn't have known about otherwise.
Burying the lead (Score:1)
Burying the what? (Score:3)
Lede. Burying the lede.
And an LLM isn't learning from the YT comments, at least unless the geniuses at OpenAI are involved. If anyone will make another tragic AI error, it's Sam Altman.
And that's the story for you?
Re: (Score:2)
Lede. Burying the lede.
Pull! Bang! Whooosh! Finding the lede while totally missing the joke.
This is interesting ... the LLM may not be learning from YouTube comments, but it seems like the Slashdot readers have learned, from their training material, that "any thinking thing exposed to the equivalent of a billion years of reading YT comments tautologically means its brain is gone. Goo." is a comment to be taken seriously and criticized on that basis.
Or maybe it was intended seriously, and I'm wrong, in which case I'll a
Re: (Score:2)
The much bigger story is this is proof LLM based systems don’t have thinking capability nor anywhere near sentience because any thinking thing exposed to the equivalent of a billion years of reading YT comments tautologically means its brain is gone. Goo.
Yeah, but you gotta admit it’s one hell of a trainer to put you on the path to becoming One with your inner sarcastic bastard.
AIs tongue will be so sharp it’ll name its own codebase “razor cunt”. Humans fucking with it won’t stand a chance.
Re: Burying the lead (Score:2)
Is Robonia a Real Place (Score:2)
Is Robonia a real place, or is the AI doing AI things?
Also, I wonder how it does location much beyond continent. What happens when a Swede makes a post from Germany about American politics?
The real outrage in this is that MY tax dollars will probably be pissed away on this horseshit.
Re: (Score:2)
Agreed. A real threat needs more. No AI is going to place me within hundreds of miles of my location without data other than my post history. Altman-esque levels of hype.
You don't need an AI for that (Score:4, Funny)
For example, using a computer model, it's child's play to figure out I am actually an agent of SMERSH.
Re: (Score:2)
SMERSH
Sexy Men Exposing Rear Seeking Humiliation?
Re: (Score:2)
Now do Slashdot (Score:2)
Is there a version that works on Slashdot comments?
We could dox everyone here.
Re: Now do Slashdot (Score:2)
Re: (Score:2)
Hey you can't scrape our stuff! (Score:3)
Says company that scrapes the entire internet
Do NOT make one for slashdot !! (Score:4, Funny)
The last thing I need is for people to know where my parents' basement is.
Re: (Score:3)
South Park did it first. (Score:1)
Didn't South Park have an episode in which a character was complaining that his own statements were being used against him?
robots.txt, LOL! (Score:1)
Re: robots.txt, LOL! (Score:2)
Sure, back when what it "protected" you against were search engine crawlers, its use was somewhat limited. You either benefited from search engines indexing your stuff or, if it was stuff you didn't want indexed, it probably shouldn't have been publicly accessible in the first place.
But that changed with LLMs, where being crawled is generally 100% to the disadvantage of the creator and publisher. LLM crawlers can and apparently do ignor
"predict" (Score:2)
If it already exists, it's not a prediction, it's a guess.