
Researchers Find 'Anonymized' Data Is Even Less Anonymous Than We Thought (vice.com)

Corporations love to pretend that 'anonymization' of the data they collect protects consumers. Studies keep showing that's not really true. From a report: Last fall, AdBlock Plus creator Wladimir Palant revealed that Avast was using its popular antivirus software to collect and sell user data. While the effort was eventually shuttered, Avast CEO Ondrej Vlcek first downplayed the scandal, assuring the public the collected data had been "anonymized" -- or stripped of any obvious identifiers like names or phone numbers. "We absolutely do not allow any advertisers or any third party...to get any access through Avast or any data that would allow the third party to target that specific individual," Vlcek said. But analysis from students at Harvard University shows that anonymization isn't the magic bullet companies like to pretend it is.

Dasha Metropolitansky and Kian Attari, two students at the Harvard John A. Paulson School of Engineering and Applied Sciences, recently built a tool that combs through vast troves of consumer datasets exposed from breaches for a class paper they've yet to publish. "The program takes in a list of personally identifiable information, such as a list of emails or usernames, and searches across the leaks for all the credential data it can find for each person," Attari said in a press release. They told Motherboard their tool analyzed thousands of datasets from data scandals ranging from the 2015 hack of Experian, to the hacks and breaches that have plagued services from MyHeritage to porn websites. Despite many of these datasets containing "anonymized" data, the students say that identifying actual users wasn't all that difficult. "An individual leak is like a puzzle piece," Harvard researcher Dasha Metropolitansky told Motherboard. "On its own, it isn't particularly powerful, but when multiple leaks are brought together, they form a surprisingly clear picture of our identities. People may move on from these leaks, but hackers have long memories."
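
The lookup Attari describes (take a list of identifiers, search every leak, collect what turns up for each person) can be illustrated with a minimal sketch. This is not the students' tool, which is unpublished; the function name `build_profiles`, the record layout, and the sample data are all hypothetical, and real leaks would need far messier parsing.

```python
from collections import defaultdict


def build_profiles(identifiers, leaks):
    """Merge every leaked record that matches one of the given emails.

    identifiers: iterable of email addresses to look up (hypothetical input).
    leaks: dict mapping a leak name to a list of record dicts, each with
           an "email" key plus whatever other fields that leak exposed.
    """
    wanted = {email.lower() for email in identifiers}
    profiles = defaultdict(dict)
    for records in leaks.values():
        for record in records:
            key = record.get("email", "").lower()
            if key not in wanted:
                continue
            # Each leak contributes its own fields to the same profile.
            for field, value in record.items():
                if field != "email":
                    profiles[key].setdefault(field, value)
    return dict(profiles)


# Made-up sample data: two leaks that each expose a different field.
leaks = {
    "leak_a": [{"email": "pat@example.com", "username": "pat42"}],
    "leak_b": [
        {"email": "PAT@example.com", "city": "Springfield"},
        {"email": "other@example.com", "city": "Shelbyville"},
    ],
}
profiles = build_profiles(["pat@example.com"], leaks)
print(profiles)
```

Neither sample leak says much on its own, but the merged profile ties a username to a city, which is the "puzzle piece" effect Metropolitansky describes.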

  • Old news (Score:5, Informative)

    by Shotgun ( 30919 ) on Monday February 03, 2020 @03:26PM (#59686426)

    I worked for a company called Maxpoint. Their whole spiel was combing through data sources to draw a picture of a user in order to target advertising. From that clear-eyed perspective, I am not being a doomsayer when I tell you that privacy is dead. It is just a simple "water is wet" fact. The only thing that is ever going to get even a modicum of privacy back is a law that prohibits companies from sharing data, even if you agreed to it in those multi-page legalese agreements. Even then, Maxpoint's thing was digital "zip codes". There is an incredible amount of data that can reliably be inferred about you from where you live.

    It looks like the company has changed its name [valassis.com]

    • We could give personal data the same protection we give intellectual "property". If you go to an artist's show, you can't make a recording or copy and sell it. But the artist can scan the room, use facial identification software to identify who was there and who they were sitting/standing next to, and sell it to anyone they want. Suppose your identity was your personal property and sharing it required a release just like any other intellectual "property".
      • by youngone ( 975102 ) on Monday February 03, 2020 @04:43PM (#59686728)
        Your idea is not a bad one, except for the fact that massively wealthy corporations can continue to do what they want with your intellectual property, and if you sue them they will bankrupt you with endless court proceedings.
        There is also the point that intellectual property laws are not for ordinary people like you and me, or even the artists who create the content, but they are actually for the corporations.
        Corporations write the laws, then they pay your politicians to pass them, then they use their economic power to pressure countries all over the world to adopt them, in the interests of "harmony" between jurisdictions.
        That is how we have wound up with copyright terms that exceed the life of the author.
      • We could give personal data the same protection we give intellectual "property". If you go to an artist's show, you can't make a recording or copy and sell it. But the artist can scan the room, use facial identification software to identify who was there and who they were sitting/standing next to, and sell it to anyone they want. Suppose your identity was your personal property and sharing it required a release just like any other intellectual "property".

        If it's in a EULA, then it won't ever truly be effective (for end-users).

      • by hey! ( 33014 )

        You know this idea goes way back; it's been examined by ethicists and privacy law experts for years, and the consensus seems to be that treating personal information as property is not really adequate to protect privacy.

        The problem is that property is alienable (i.e., you can *sell* it, or rent it, or license it), but your *interest* in that information is inescapable. You can't really foresee what a vendor will do to you with your private data; in fact if we rely on *property rights* it's technically none

    • by Cederic ( 9623 )

      It's already illegal in the UK to de-anonymise data.

      Doesn't mean that it doesn't happen, but the law is clear.

      • by Shotgun ( 30919 )

        Sadly, that would be as effective as the US law that banned the export of PGP. . . and for the same reason. :-(

    • by epine ( 68316 )

      There is an incredible amount of data that can reliably be inferred about you from where you live.

      There are any number of leaks around the world that identify me as an articulate, cerebral tightwad with a Wikipedia:Amazon ratio of 50:1.

      Maybe this personal factoid qualifies as "incredible" and maybe it doesn't.

  • how anonymous your BookFace or Google data is...

  • Didn't I read an article that said something like DOB and zip code alone could almost always identify you perfectly?

    • Didn't I read an article that said something like DOB and zip code alone could almost always identify you perfectly?

      Depends on the population of your zip code I suppose. If you're in zip 79936 - with a population of ~114,000 - you are probably pretty anonymous. If you're in zip 99790 - with a population of 1 - I think the DOB is probably unnecessary ...

      • Even with 114,000 people, you'd only have an expected collision rate of approximately 114,000 people / 365 days per year / 80 years of life expectancy ≈ 4 people per zip code and date of birth. Some combinations will have more and some fewer, but that's not far to go to winnow down to a unique individual.
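
The back-of-the-envelope estimate in this thread can be reproduced in a few lines. This is a rough sketch that assumes births are spread uniformly over an 80-year span of 365-day years, as the comment does; the function name `expected_matches` is made up for illustration.

```python
def expected_matches(population, life_expectancy_years=80, days_per_year=365):
    """Expected number of residents sharing one exact date of birth,
    assuming birth dates are uniform over the whole lifespan."""
    return population / (days_per_year * life_expectancy_years)


# Zip 79936, population ~114,000 (figure taken from the comment above).
large_zip = expected_matches(114_000)
print(round(large_zip, 1))  # 3.9

# Zip 99790, population 1: far fewer than one person per birth date.
tiny_zip = expected_matches(1)
print(tiny_zip < 1)  # True
```

Real birth dates are not uniform, so actual group sizes vary, but the order of magnitude is what matters: one more attribute is usually enough to single someone out.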
  • Who are these mysterious "we" people who thought that corporations collected your data "anonymously"? Do these same people also believe in Santa Claus and the Easter Bunny?
    • It comes up when assuring investors that these companies are not violating HIPAA, FERPA, or EU privacy laws. It also comes up when they are asked, with or without a subpoena, to provide personal tracking data for law enforcement. They avoid the responsibility of protecting the data, or tracking it robustly, by anonymizing it. Unfortunately, some of them anonymize it _after_ they've collected it, not during collection: I had a fascinating chat with a company several years ago who only realized when we spoke that that ac

  • Don't give or sell my "anonymized" health data anywhere
    Don't give or sell my "anonymized" car data anywhere
    Don't give or sell my "anonymized" cell phone data anywhere
    etc...

    Companies don't give a solitary shit about user privacy.

  • Who is "we"? (Score:5, Informative)

    by gweihir ( 88907 ) on Monday February 03, 2020 @04:44PM (#59686734)

    Because anybody that actually looked into this has known for a long time. Anonymization breaks down when you have multiple anonymized data sets with the same people in them or at least significant overlap. This is neither new, nor is it surprising in any way.
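
A minimal sketch of the breakdown described here: one data set has names stripped but keeps zip code and date of birth, a second data set with names shares those same fields, and joining on them restores the names. All records, field names, and the `link` function are invented for illustration and are not taken from any real data set or tool.

```python
from collections import defaultdict

# Fields that are not names but still narrow down who a record belongs to.
QUASI_IDENTIFIERS = ("zip", "dob")


def link(anonymized, identified, keys=QUASI_IDENTIFIERS):
    """Attach a name to each anonymized row whose quasi-identifiers
    match exactly one row in the identified data set."""
    index = defaultdict(list)
    for row in identified:
        index[tuple(row[k] for k in keys)].append(row)

    linked = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # unambiguous match: re-identified
            linked.append({**row, "name": candidates[0]["name"]})
    return linked


# Made-up "anonymized" records: no names, but zip and date of birth remain.
health = [
    {"zip": "79936", "dob": "1980-05-17", "diagnosis": "asthma"},
    {"zip": "79936", "dob": "1975-01-02", "diagnosis": "diabetes"},
]
# Made-up second data set that carries names alongside the same fields.
public = [
    {"name": "A. Example", "zip": "79936", "dob": "1980-05-17"},
    {"name": "B. Sample", "zip": "79936", "dob": "1975-01-02"},
    {"name": "C. Sample", "zip": "79936", "dob": "1975-01-02"},
]
reidentified = link(health, public)
print(reidentified)
```

Only the first health record is re-identified in this sample because the second matches two people. A third overlapping data set with one more shared field would typically break that tie.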

  • "Anonymized" search data, sufficient to track down specific individuals.

    This has never been 'anonymous'.
  • It's trivially easy to correlate data across multiple sources. Some universities do a terrific job of selling undergrad work as if it were a novel thesis. The sales job appears to be the better story here.
    • The way to combat the loss of privacy, incidentally, is to pollute the data with misinformation. Use misspellings, different email addresses, etc. This is very effective and the only tool individuals have.
  • "Less than 'we' thought" for a value of "we" that entirely excludes anyone with a working brain and half a clue.

    "Anonymized" has always meant "we don't think the bad guys can get there in just one step"—a motivated adversary with the perseverance to amass two or more leaks being entirely out of scope.
