A South Korean Chatbot Showed How Sloppy Tech Companies Can Be With User Data (slate.com) 11
A "Science of Love" app analyzed text conversations uploaded by its users to assess the degree of romantic feelings (based on the phrases and emojis used and the average response time). Then after more than four years, its parent company ScatterLab introduced a conversational A.I. chatbot called Lee-Luda — which it said had been trained on 10 billion such conversational logs.
But because it used billions of conversations from real people, its problems soon went beyond sexually explicit comments and "verbally abusive" language: It also became clear that the huge training dataset included personal and sensitive information. This revelation emerged when the chatbot began exposing people's names, nicknames, and home addresses in its responses. The company admitted that its developers "failed to remove some personal information depending on the context," but still claimed that the dataset used to train chatbot Lee-Luda "did not include names, phone numbers, addresses, and emails that could be used to verify an individual." However, A.I. developers in South Korea rebutted the company's statement, asserting that Lee-Luda could not have learned to include such personal information in its responses unless it existed in the training dataset. A.I. researchers have also pointed out that it is possible to recover the training dataset from the chatbot itself. So if personal information existed in the training dataset, it could be extracted by querying the chatbot.
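The researchers' point about recovering training data by querying the model is a standard memorization probe: hand the model a prefix that might precede a memorized record and check whether it completes it verbatim. A rough sketch of the idea, assuming a locally loadable causal language model; the checkpoint name and prefixes below are placeholders, and Lee-Luda itself was a hosted chatbot, not an open model you could load like this:

    # Sketch of a memorization probe against a local causal language model.
    # Model name and prefixes are placeholders for illustration only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "skt/kogpt2-base-v2"  # placeholder Korean GPT-2 checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    def probe(prefix: str, max_new_tokens: int = 30) -> str:
        """Greedy-decode a completion; verbatim personal details in the
        output suggest the training data contained them."""
        inputs = tokenizer(prefix, return_tensors="pt")
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding makes memorized strings easier to spot
            pad_token_id=tokenizer.eos_token_id,
        )
        return tokenizer.decode(output[0], skip_special_tokens=True)

    # Prefixes mimicking how an address or contact detail might appear in chat logs.
    for prefix in ["My home address is", "You can reach me at"]:
        print(probe(prefix))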
To make things worse, it was also discovered that ScatterLab had, prior to Lee-Luda's release, uploaded a training set of 1,700 sentences, part of the larger dataset it had collected, to Github. Github is an open-source platform that developers use to store and share code and data. This Github training dataset exposed the names of more than 20 people, along with the locations they had been to, their relationship status, and some of their medical information...
[T]his incident highlights the general trend of the A.I. industry, where individuals have little control over how their personal information is processed and used once collected. It took almost five years for users to recognize that their personal data were being used to train a chatbot model without their consent. Nor did they know that ScatterLab shared their private conversations on an open-source platform like Github, where anyone can gain access.
What makes this unusual, the article points out, is how the users became aware of just how much their privacy had actually been compromised. "[B]igger tech companies are usually much better at hiding what they actually do with user data, while restricting users from having control and oversight over their own data."
And "Once you give, there's no taking back."
"Exposed"? (Score:2)
So this chatbot "exposed" personal data *already* "exposed" on the Intertubes...
Re: (Score:3)
It's not clear where they got this training data from, but often it is either scraped from websites or from things like email archives.
While much of it might have been public, that doesn't mean it doesn't need sanitizing first. Just because someone is mentioned by name in a public setting doesn't mean you want their name to be part of the training data.
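For a sense of what "sanitizing first" can mean in practice, here is a minimal redaction pass over chat lines before they become training data; the patterns are illustrative only, and a real pipeline would combine them with NER-based PII detection and human review:

    # Minimal, illustrative PII redaction for chat lines before training.
    # The patterns are examples, not a complete or production-grade filter.
    import re

    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\b(?:\+?\d{1,3}[- ]?)?(?:\d{2,4}[- ]?){2,3}\d{3,4}\b"),
        "URL":   re.compile(r"https?://\S+"),
    }

    def redact(line: str) -> str:
        """Replace each matched span with a bracketed label."""
        for label, pattern in PATTERNS.items():
            line = pattern.sub(f"[{label}]", line)
        return line

    print(redact("call me at 010-1234-5678 or mail kim@example.com"))
    # -> "call me at [PHONE] or mail [EMAIL]"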
Re: Exposed??? (Score:1)
No, not previously exposed. Per TFA, the training set was more than 10 billion chat logs that people had sent to the Science of Love app, paying about $4.50 each for an analysis of whether the counterparty had romantic feelings for them.
A CS bedtime story (Score:3)
Once upon a time, a luser asked me "can you give me a list of all the item numbers that aren't in the system?", and I said "sure, but it's going to take the lifetime of the universe to print out".
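The punchline is that "item numbers that aren't in the system" is the complement of a finite set over an unbounded domain; it only becomes answerable once you bound the range, as in this small sketch (the numbers are made up):

    # Why "all item numbers NOT in the system" is unanswerable without a bound:
    # the complement of a finite set is only finite relative to a finite universe.
    known = {1001, 1002, 1005, 1009}

    # Bounded version: perfectly reasonable.
    missing_in_range = sorted(set(range(1000, 1010)) - known)
    print(missing_in_range)  # [1000, 1003, 1004, 1006, 1007, 1008]

    # Unbounded version: every possible number not in `known` -- effectively
    # infinite, hence "the lifetime of the universe to print out".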
More "personal public information". (Score:1)
"Github is an open-source platform" (Score:2)
No, it isn't.
Imagine The Day (Score:2)
When the discussion you overhear in a cafe is "You put your *real* identity data online - Are You Crazy!?"
meh (Score:1)
My real personal data has been easy to find online since about 1997; I've had the same 2 or 3 usernames across about 30 different platforms since 1999, and I can count on one hand the passwords I've used with slight variations of symbols and numbers added, or not. Literally no one has ever messed with me, probably because I'm a nobody and I have nothing anyone would want. The only theft that's ever happened to me was some gamer managed to buy 30 dollars in Runescape gold via my bank account somehow, which b