Researchers Find 'Anonymized' Data Is Even Less Anonymous Than We Thought (vice.com) 23
Corporations love to pretend that 'anonymization' of the data they collect protects consumers. Studies keep showing that's not really true. From a report: Last fall, AdBlock Plus creator Wladimir Palant revealed that Avast was using its popular antivirus software to collect and sell user data. While the effort was eventually shuttered, Avast CEO Ondrej Vlcek first downplayed the scandal, assuring the public the collected data had been "anonymized" -- or stripped of any obvious identifiers like names or phone numbers. "We absolutely do not allow any advertisers or any third party...to get any access through Avast or any data that would allow the third party to target that specific individual," Vlcek said. But analysis from students at Harvard University shows that anonymization isn't the magic bullet companies like to pretend it is.
Dasha Metropolitansky and Kian Attari, two students at the Harvard John A. Paulson School of Engineering and Applied Sciences, recently built a tool that combs through vast troves of consumer datasets exposed from breaches for a class paper they've yet to publish. "The program takes in a list of personally identifiable information, such as a list of emails or usernames, and searches across the leaks for all the credential data it can find for each person," Attari said in a press release. They told Motherboard their tool analyzed thousands of datasets from data scandals ranging from the 2015 hack of Experian, to the hacks and breaches that have plagued services from MyHeritage to porn websites. Despite many of these datasets containing "anonymized" data, the students say that identifying actual users wasn't all that difficult. "An individual leak is like a puzzle piece," Harvard researcher Dasha Metropolitansky told Motherboard. "On its own, it isn't particularly powerful, but when multiple leaks are brought together, they form a surprisingly clear picture of our identities. People may move on from these leaks, but hackers have long memories."
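The students' paper is unpublished, so the following is only a minimal sketch of the kind of lookup Attari describes (take a list of identifiers, search every leak, collect whatever turns up for each person). It is not their tool. The function name `build_profiles`, the record layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def build_profiles(identifiers, leaks):
    """Collect every field found for each email across all leaks.

    `leaks` is assumed to map a leak name to a list of dict records,
    each carrying an "email" key. Real breach dumps are far messier.
    """
    wanted = {i.lower() for i in identifiers}
    profiles = defaultdict(dict)
    for records in leaks.values():
        for record in records:
            key = record.get("email", "").lower()
            if key not in wanted:
                continue
            for field, value in record.items():
                if field != "email":
                    # Keep a set per field: leaks may disagree with each other.
                    profiles[key].setdefault(field, set()).add(value)
    return dict(profiles)


# Made-up toy data; neither "leak" is real.
leaks = {
    "forum_dump": [
        {"email": "pat@example.com", "username": "patk", "password_hash": "abc123"},
    ],
    "shop_dump": [
        {"email": "Pat@example.com", "zip": "02138", "phone": "555-0100"},
    ],
}

profiles = build_profiles(["pat@example.com"], leaks)
for field, values in sorted(profiles["pat@example.com"].items()):
    print(field, sorted(values))
```

Neither toy leak says much on its own, but the merged profile holds a username, a password hash, a zip code, and a phone number for one address, which is the "puzzle piece" effect Metropolitansky describes.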
Old news (Score:5, Informative)
I worked for a company called Maxpoint. Their whole spiel was combing through data sources to draw a picture of a user in order to target advertising. From that clear-eyed perspective, I am not being a doomsayer when I tell you that privacy is dead. It is just a simple "water is wet" fact. The only thing that is ever going to get even a modicum of privacy back is a law that prohibits companies from sharing data, even if you agreed to it in those multi-page legalese agreements. Even then, Maxpoint's thing was digital "zip codes". There is an incredible amount of data that can reliably be inferred about you from where you live.
It looks like the company has changed its name [valassis.com]
Make Your Identity Your Personal Property (Score:3, Insightful)
Re:Make Your Identity Your Personal Property (Score:5, Insightful)
There is also the point that intellectual property laws are not for ordinary people like you and me, or even the artists who create the content, but they are actually for the corporations.
Corporations write the laws, then they pay your politicians to pass them, then they use their economic power to pressure countries all over the world to adopt them, in the interests of "harmony" between jurisdictions.
That is how we have wound up with copyright terms that exceed the life of the author.
Re: (Score:2)
We could give personal data the same protection we give intellectual "property". You go to an artist's show and you can't make a recording or copy and sell it. But the artist can scan the room, use facial identification software to identify who was there and who they were sitting/standing next to, and sell it to anyone they want. Suppose your identity was your personal property and sharing it required a release just like any other intellectual "property".
If it's in a EULA, then it won't ever truly be effective (for end-users).
Re: (Score:2)
You know this idea goes way back; it's been examined by ethicists and privacy law experts for years, and the consensus seems to be that treating personal information as property is not really adequate to protect privacy.
The problem is that property is alienable (i.e., you can *sell* it, or rent it, or license it), but your *interest* in that information is inescapable. You can't really foresee what a vendor will do to you with your private data; in fact if we rely on *property rights* it's technically none
Re: (Score:2)
It's already illegal in the UK to de-anonymise data.
Doesn't mean that it doesn't happen, but the law is clear.
Re: (Score:2)
Sadly, that would be as effective as the US law that banned the export of PGP. . . and for the same reason. :-(
Re: (Score:2)
There are any number of leaks around the world that identify me as an articulate, cerebral tightwad with a Wikipedia:Amazon ratio of 50:1.
Maybe this personal factoid qualifies as "incredible" and maybe it doesn't.
Just imagine (Score:2)
how anonymous your BookFace or Google data is...
Feel like I've read this before (Score:2)
Didn't I read an article that said something like DOB and zip code alone could almost always identify you perfectly?
Re: (Score:2)
Didn't I read an article that said something like DOB and zip code alone could almost always identify you perfectly?
Depends on the population of your zip code I suppose. If you're in zip 79936 - with a population of ~114,000 - you are probably pretty anonymous. If you're in zip 99790 - with a population of 1 - I think the DOB is probably unnecessary ...
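A rough back-of-the-envelope check on those two zip codes, assuming birthdays are spread uniformly over about 80 birth years (a simplification; real age distributions are lumpier). The population figures are the ones quoted above, and the function names are made up for this sketch.

```python
def people_per_dob(population, lifespan_years=80):
    """Average number of residents sharing one exact date of birth."""
    distinct_dobs = lifespan_years * 365.25
    return population / distinct_dobs


def chance_unique(population, lifespan_years=80):
    """Probability that nobody else in the zip code shares your DOB,
    under the uniform-birthday assumption."""
    distinct_dobs = lifespan_years * 365.25
    return (1 - 1 / distinct_dobs) ** (population - 1)


for zip_code, population in [("79936", 114_000), ("99790", 1)]:
    print(zip_code,
          round(people_per_dob(population), 2),
          round(chance_unique(population), 3))
```

Under those assumptions, even the ~114,000-person zip code averages only about four people per exact birth date, so DOB plus zip narrows things to a handful of candidates there, and any further attribute is likely to finish the job.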
Re: (Score:2)
Who is "we"? (Score:2)
Re: (Score:2)
It comes up when assuring investors that these companies are not violating HIPAA, FERPA, or EU privacy laws. It also comes up when they are asked, with or without a subpoena, to provide personal tracking data for law enforcement. They avoid the responsibility of protecting the data, or tracking it robustly, by anonymizing it. Unfortunately, some of them anonymize it _after_ they've collected it, not during it: I had a fascinating chat with a company several years ago who only realized when we spoke that that ac
I have said this so many times (Score:1)
Don't give or sell my "anonymized" health data anywhere
Don't give or sell my "anonymized" car data anywhere
Don't give or sell my "anonymized" cell phone data anywhere
etc...
Companies don't give a solitary shit about user privacy.
Who is "we"? (Score:5, Informative)
Because anybody that actually looked into this has known for a long time. Anonymization breaks down when you have multiple anonymized data sets with the same people in them or at least significant overlap. This is neither new, nor is it surprising in any way.
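To make the overlap point concrete, here is a minimal sketch of a linkage attack: two datasets that each look harmless are joined on the fields they share (zip, date of birth, sex). Both datasets, all names, and the `link` helper are invented for illustration.

```python
def link(dataset_a, dataset_b, keys):
    """Join two datasets on shared quasi-identifier fields.

    Only unambiguous matches are kept: a row in dataset_a is linked
    when exactly one row in dataset_b has the same key values.
    """
    index = {}
    for row in dataset_b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in dataset_a:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append({**row, **candidates[0]})
    return matches


# "Anonymized" records: no names, but quasi-identifiers remain.
health = [
    {"zip": "02138", "dob": "1970-01-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "dob": "1985-06-30", "sex": "M", "diagnosis": "flu"},
]
# A second dataset that carries names alongside the same fields.
voters = [
    {"zip": "02138", "dob": "1970-01-02", "sex": "F", "name": "Alex Example"},
    {"zip": "02139", "dob": "1985-06-30", "sex": "M", "name": "Sam Sample"},
    {"zip": "02139", "dob": "1985-06-30", "sex": "M", "name": "Lee Sample"},
]

linked = link(health, voters, ["zip", "dob", "sex"])
for row in linked:
    print(row["name"], "->", row["diagnosis"])
```

The first health record is re-identified because only one named person matches it. The second survives only because two people share the same zip, DOB, and sex, and one more overlapping dataset could break that tie.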
AOL leak from 15 years ago (Score:2)
This has never been 'anonymous'.
Fresh from the NSS library of clueflueness (Score:1)
Re: (Score:1)
less than "we" thought (Score:2)
"Less than 'we' thought" for a value of "we" that entirely excludes anyone with a working brain and half a clue.
"Anonymized" has always meant "we don't think the bad guys can get there in just one step"—a motivated adversary with the perseverance to amass two or more leaks being entirely out of scope.