EU Recommends Slashing Search Data Retention
Posted by Zonk on Tuesday April 08, @03:18AM
from the tracks-in-the-snow dept.
Wayland writes "The European Union's Article 29 Working Group has completed its report (PDF) on data protection and search engines. The group recommends that search engines only be allowed to hold onto search data for six months. 'To hang onto data for longer, search engine operators will need to show that such data is "strictly necessary" to offer the service. Google and others have long said that they need to retain data in order to refine search results, prevent click fraud, and launch new services like spell check (which, in Google's case, was built from user search data). In addition, the data that is kept will need to be guarded more closely. The working group concluded that IP addresses could be used to identify individuals; if not by the search engine itself, then by law enforcement or after a subpoena.'"

Tracking and identifying a piece of data. (Score:4, Insightful)
Within six months you can intermingle the data items so thoroughly that there's no way to prove you're still storing the data, and you'd still have whatever you need from it.
How does the law trace the lineage of a data item? Data has no memory and leaves no trace.
Data Protection Act (Score:5, Informative)
Briefly, so long as data is personally identifiable you must show that you are not retaining it longer than necessary. If I summarise or analyse data and remove information which makes it personally identifiable - names, addresses, telephone numbers, email accounts - then it is not covered.
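A minimal sketch of "summarise and remove the identifying information" in Python, assuming records are plain dictionaries. The field names in `PII_FIELDS` are hypothetical; the Act describes kinds of data, not a schema, and as a later comment notes, IP addresses and visited URLs can also count as personal, so a real list would likely be longer.

```python
# Hypothetical field names: the Act lists kinds of data, not a schema.
PII_FIELDS = {"name", "address", "phone", "email"}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without its directly identifying fields."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


def summarise(records: list, field: str) -> dict:
    """Count the values of one non-identifying field across all records."""
    counts = {}
    for record in records:
        # Identifying fields are dropped first, so asking to summarise
        # "email" or "name" simply yields an empty result.
        value = strip_pii(record).get(field)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


customers = [
    {"name": "A. Smith", "email": "a@example.com", "country": "UK"},
    {"name": "B. Jones", "phone": "555-0100", "country": "UK"},
    {"name": "C. Weber", "address": "1 Hauptstr.", "country": "DE"},
]
print(summarise(customers, "country"))  # {'UK': 2, 'DE': 1}
```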
IMHO the US stands in need of a Data Protection Act, as an amendment to the Constitution. The present Administration seems to be looking for ways of keeping track of its citizens which avoid the Constitution. Technically, in Europe it is probably illegal to send personal data via Gmail, because it is exporting it to a country that does not meet European standards for personal data protection.
Re:Tracking and identifying a piece of data. (Score:4, Informative)
'Personal' information is any information that can be linked to a person. This can be an (IP) address, phone number, birth date or other data that is generally seen as personal, but also information like the URLs visited by a person or the e-mails sent to a person. The six-month clock starts as soon as a system no longer absolutely needs the data for its day-to-day operation.
As an example, HTTP logs showing which IP address visited what URL can be retained for at most 6 months. If you send out snail mail to a bunch of subscribers, you are obligated to delete a subscriber's address at most 6 months after he unsubscribes (or dies). If you still need the personal data (e.g. you need people's addresses to send them invoices as long as they still have a contract with your company), then you are of course allowed to store that data. It also means that any statistics you need to compile from customer-related data have to be compiled before that data is deleted, and the statistics cannot contain any information that would allow them to be tied to a person.
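The HTTP-log rule above can be sketched as a periodic purge in Python. This is only an illustration under stated assumptions: each log entry is a hypothetical `(timestamp, ip, url)` tuple whose timestamp marks when the data stopped being needed, and "6 months" is approximated as 183 days, whereas a real policy might count calendar months.

```python
from datetime import datetime, timedelta

# Assumption: "6 months" approximated as 183 days.
RETENTION = timedelta(days=183)


def purge_expired(entries, now):
    """Keep only log entries whose retention period has not yet run out.

    Each entry is a hypothetical (timestamp, ip, url) tuple; the timestamp is
    treated as the moment the data stopped being needed for operations.
    """
    cutoff = now - RETENTION
    return [entry for entry in entries if entry[0] >= cutoff]


now = datetime(2008, 4, 8)
log = [
    (datetime(2007, 9, 1), "192.0.2.10", "/old-page"),   # older than the cutoff
    (datetime(2008, 3, 1), "192.0.2.11", "/new-page"),   # still within 6 months
]
print(purge_expired(log, now))  # only the 2008-03-01 entry survives
```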
Another part of the data protection law mandates that a person has to be informed of every storage of his personal data, and has the right to inspect that data and correct it if there are errors in it.
All in all, the law ensures that Europeans can be pretty certain that their (online) privacy isn't invaded (as long as they surf only European websites).
Re: (Score:3, Informative)
(1) It would be making a law retroactive (with respect to historical documents)
(2) It is implicit in Usenet that this information is being published and is made public (Ignorance is no excuse, one could say). Usenet i
Re: (Score:3, Informative)
This isn't SE-exclusive (Score:3, Insightful)
Re:This isn't SE-exclusive (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2, Informative)
"strictly necessary" (Score:4, Insightful)
If that is the law to follow, they will make it "strictly necessary" by adding features using that data, I guess. Just making it a bit harder is a lot of lawmaking for little effect.
Re: (Score:2)
RTFA, lemming (Score:5, Insightful)
RTFA, lemming. The summary _again_ is inflammatory crap, yes, what else is new? But that's not what TFA says.
They're _not_ required to delete data completely; they're required to delete data that can identify you personally, like your IP address, the linkage between those searches, etc.
They do _not_ need that to refine their searches. If I search for, say, "Oracle auto-tuning", that's that. I expect the same result regardless of what my IP is, regardless of whether I searched for "WebSphere XA configuration" before, or "Fluffy tail buttplugs" or whatever. You can tune the search with just the search string. You don't need to track me for that.
_That_ is the friction between the EU and Google: Google wants to keep identifiable information like the pair of IP address and timestamp. Google has been playing bullshit handwaving games along the lines of "but we really need the IPs", then "but some people change IPs, so it won't identify them forever", then "wait, would it be OK if we changed a bit or two of the IP?", along with a good helping of "but we'll keep it for 18 months before changing those bits anyway!"
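"Changing a bit or two of the IP" usually means zeroing the low-order bits. A minimal Python sketch, assuming the last octet is zeroed (a /24 prefix); the post does not say which bits Google proposed to change. Note that the 24 remaining bits still narrow a user down to one of 2^8 = 256 addresses, which is why critics call this weak.

```python
import ipaddress


def truncate_ipv4(ip: str, prefix_len: int = 24) -> str:
    """Zero the host bits of an IPv4 address, keeping the first prefix_len bits.

    IPv4Network raises an error for IPv6 input, so this only handles IPv4.
    """
    network = ipaddress.IPv4Network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)


print(truncate_ipv4("203.0.113.57"))      # 203.0.113.0
print(truncate_ipv4("203.0.113.57", 16))  # 203.0.0.0
```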
And seeing Google protest at every step when they're told to stop tracking people, with exactly such bullshit fallacies as "we really need that IP to refine the search algorithm", is kinda funny. I guess "do no evil" was for when they were small and cuddly. Now that they're the 800-pound gorilla of the online advertising market, heh, turns out that they get as big a boner as any other PHBs out of trying to rape people's private data for a quick buck.
But, hey, I'm willing to be educated. _You_ tell me how deleting the IP information is gonna make search engines tank. Exactly which search algorithm relies on knowing my IP? No, seriously.
They can keep their statistic history for as long as they want to, but they can't keep your personal data. It's that simple, so let's stop handwaving strawman scenarios. They can (and should) keep information like "Shares of Moraelin Buttplugs Corp peaked at 1.50 Euro a share last year." But they have no reason to retain info like "Freddy Krueger lives on 22 Elm Street, and bought 2 shares of Dr Kevorkian's Suicide Clinic last year," just because he bought those 2 shares last year.
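Keeping statistics while discarding personal data can be as simple as aggregating before deleting. A minimal Python sketch, assuming a hypothetical search log of `(ip, timestamp, query)` rows: the output keeps per-query counts, which is enough to tune results for "Oracle auto-tuning", but no longer says who searched for what.

```python
from collections import Counter


def query_statistics(search_log):
    """Aggregate (ip, timestamp, query) rows into per-query counts.

    The IP and timestamp are dropped, so the result no longer links any
    search to the person who made it.
    """
    return Counter(query.strip().lower() for _ip, _ts, query in search_log)


log = [
    ("192.0.2.1", "2008-04-08T03:18", "Oracle auto-tuning"),
    ("198.51.100.7", "2008-04-08T03:20", "oracle auto-tuning"),
    ("192.0.2.1", "2008-04-08T03:25", "WebSphere XA configuration"),
]
print(query_statistics(log))
# Counter({'oracle auto-tuning': 2, 'websphere xa configuration': 1})
```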
A financial advisor's or stock broker's job is to trade on the stock market. It's _not_ to collect your personal data and sell it to the highest bidder. It's not their job to data-mine your private information. It's that simple: stick to selling those shares.
Mind you, even for data mining, there's a fine line between information and trivia. Stuff like "which team won the most games last year" is information. You can make an informed prediction for this year based on it. Stuff like "which team won the most games on a Wednesday, in rain, under artificial light" is trivia.
Similarly, "people from Germany buy more economic games than those in the USA" is information. Stuff like "people living in odd-numbered houses, on streets whose name ends in an 'e', and born on a rainy Thursday, buy more economic games" is useless trivia.
"50% of the gamers are between 25 and 50 years old" is information. You can decide on a target demographic based on that. "People born on a Tuesday the 14th are the largest group of gamers, at a whole 0.01% of the total" is trivia. Even if you figured out how to make games especially suited to people born on a Tuesday the 14th, it's too thin a slice to bother with individually. Etc.
Going too deep into detail slices your data too thin and produces meaningless trivia.
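The "too thin a slice" effect is easy to see by grouping the same records on more and more attributes. A minimal Python sketch with six hypothetical players: one attribute gives two groups of three, while adding a weekday leaves every group with a single member, i.e. trivia rather than information.

```python
from collections import Counter


def slice_sizes(records, keys):
    """Count how many records fall into each slice defined by the given keys."""
    return Counter(tuple(record[key] for key in keys) for record in records)


# Hypothetical data, purely for illustration.
players = [
    {"country": "DE", "weekday": "Tue"},
    {"country": "DE", "weekday": "Thu"},
    {"country": "DE", "weekday": "Sat"},
    {"country": "US", "weekday": "Tue"},
    {"country": "US", "weekday": "Wed"},
    {"country": "US", "weekday": "Fri"},
]
print(slice_sizes(players, ["country"]))             # two slices of 3
print(slice_sizes(players, ["country", "weekday"]))  # six slices of 1
```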
There simply is _no_ sane justification for the kinds of personal information that the USA's PHBs especially try to collect, other than spamming you personally.
Re:RTFA, lemming (Score:4, Interesting)
While your post was very informative, as best I can tell it boils down to this: Google has no reason to keep individual IP data because such data is useless for anything other than marketing and selling to other people.
So, with that in mind, and without taking a stance on whether it's still too personal even with a good reason, let's look at some data mining techniques.
Say, for example, you have a region of the Midwestern United States, the exact middle of the Bible Belt. For those unfamiliar with the term, that means a place where Christianity is strong and preached loudly, often, and to anyone within earshot, and there is relatively little room to be non-Christian. Say a lot of this group's searches are for religious material. You could use them as a sort of "expert" group, giving a little more weight to their likes and dislikes as a whole to adjust PageRank in their area of study, religion. This allows pages that may be far down the list, but accurate and factual, to be pulled up a bit so the rest of the world might find them, and if they truly are good, they'll stay up there afterwards.
If not, then the page will drop back down in rankings again and it will have earned the low rank it has.
You could not do this without some form of IP/region tracking, and its usefulness increases with the precision with which you track IPs. If you track single people, you can get more meaningful data: for example, you can narrow your "expert" group to Pastor Brian, Sister Marian, and Sister Margarette, and leave out their neighbors Druid Matterson and Buddhist Huy Ngyen.
This decreases false positives from your expert group and also lets you further refine the areas in which each person has good judgement.
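The expert-group idea can be sketched as a weighted vote, though the post gives no formula and this is not how PageRank itself works. In this hypothetical Python version each vote is a `(user_id, +1 or -1)` pair, votes from the expert set count double (the 2.0 factor is arbitrary), and pages are re-sorted by the weighted total. Note that it only works if individual users can be told apart, which is exactly the tracking being debated.

```python
def weighted_score(votes, experts, expert_weight=2.0):
    """Sum +1/-1 votes for a page, counting votes from experts more heavily.

    votes: iterable of (user_id, value) pairs; experts: set of user ids.
    """
    return sum(value * (expert_weight if user in experts else 1.0)
               for user, value in votes)


def rerank(page_votes, experts, expert_weight=2.0):
    """Order page names by their expert-weighted score, highest first."""
    return sorted(page_votes,
                  key=lambda page: weighted_score(page_votes[page], experts, expert_weight),
                  reverse=True)


# Hypothetical users, named after the example above.
experts = {"brian", "marian", "margarette"}
page_votes = {
    "A": [("brian", 1), ("marian", 1), ("huy", -1)],  # 2 + 2 - 1 = 3.0
    "B": [("matterson", 1), ("huy", 1)],              # 1 + 1 = 2.0
}
print(rerank(page_votes, experts))  # ['A', 'B']
```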
That hopefully explains the IP section a bit.
As for timestamps, I only have two theories about them, and both are equally likely.
The first is that the timestamps are used, in combination with the search terms, to help optimize load balancing. Since I'm sure they cycle systems on and off the serving grid as those systems rebuild databases or undergo maintenance, you could use such data to tell, for example, when you could best take the Yak-Yodeling server offline to redo its database and crawl pages, having people get search results from a slightly out-of-date backup and minimizing the impact.
The other possibility is that some results are time-sensitive. Even without linking IP data to geographic data, if you notice that an IP range searches for "restaurants" + dinner at a certain time of day, and you then get a search for restaurants, you might give preference to dinner options at that time of day, because you could assume they are looking to go eat.
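The load-balancing idea needs only an hour-of-day histogram, and it uses no IP addresses at all. A minimal Python sketch, assuming a hypothetical list of search timestamps: it counts searches per hour and picks the quietest hour as a candidate maintenance window.

```python
from collections import Counter
from datetime import datetime


def quietest_hour(timestamps):
    """Return the hour of day (0-23) with the fewest searches in the log."""
    per_hour = Counter(ts.hour for ts in timestamps)
    # Hours with no searches at all count as zero; ties go to the earliest hour.
    return min(range(24), key=lambda hour: per_hour[hour])


# Hypothetical log: three searches every hour except 04:00, which sees one.
log = [datetime(2008, 4, 8, hour)
       for hour in range(24)
       for _ in range(1 if hour == 4 else 3)]
print(quietest_hour(log))  # 4
```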
Anyway, I hope that clears up a bit how such specific data is usable and important. Could it be usable in other forms that didn't identify the IP? Probably, but it would serve no practical purpose, because as long as they have some system for converting an IP to a unique identifier to tie a group of searches together, they will always have a way to reverse or brute-force the originating IP, given the time and interest on the part of whoever wants it.
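The brute-force point is easy to demonstrate. The post doesn't say what conversion a search engine would use, so this Python sketch assumes a hypothetical unsalted SHA-256 of the IP string. Because the whole IPv4 space is only 2^32 (about 4.3 billion) addresses, anyone can hash candidates until one matches; here the search is limited to a /24 for speed. A secret key (e.g. HMAC) would stop outsiders, but whoever holds the key can still run the same loop.

```python
import hashlib
import ipaddress


def pseudonymise(ip: str) -> str:
    """Hypothetical scheme: replace an IP with its unsalted SHA-256 hex digest."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_pseudonym(digest: str, network: str):
    """Recover the original IP by hashing every address in a candidate network."""
    for address in ipaddress.ip_network(network):
        if pseudonymise(str(address)) == digest:
            return str(address)
    return None  # the IP was not in the searched range


token = pseudonymise("198.51.100.42")
print(reverse_pseudonym(token, "198.51.100.0/24"))  # 198.51.100.42
```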
Re: (Score:3, Insightful)
We're SOL (Score:2)
And the way it's looking, lawmakers are dragging their feet on this type of thing just so the government has this massive grey
Re: (Score:3, Funny)
Privacy-conscious search engines? (Score:3, Interesting)
Re: (Score:2, Funny)
Cryptonomicon, by Neal Stephenson.
Re: (Score:2)
Re:Privacy-conscious search engines? (Score:4, Informative)
For example, Facebook was immune from investigation into what they were doing with personal data. They established a London office (to sell adverts to EU people) and then they were investigated.
(Of course, Google could still keep the data of everyone else. It depends if it's easy for them to do this -- it probably is.)
How come EU is always more consumer-protectionist (Score:5, Insightful)
The EU seems to protect its citizens and consumers from rapacious, hungry corporations more than the US, that beacon of freedom, does.
Whether it is kicking Microsoft's ass all the way back to the US, or
Forcing Apple to unblock its iTunes service in France, or
Cheaper medicine and medicare that keeps the private insurers at bay, or
Privacy laws and zealous courts (in Germany) that force the government to disband its secret spyware projects, or
Libel laws that force newspapers to pay huge penalties to citizens for reckless lie mongering about their private lives, or
Airplane laws that force airlines to pay financial compensation to passengers for ditching them, or
Laws that jail CEOs and even the board for criminal conviction of corporations,...
While the US zealously preserves corporate rights and treats corporations as above human beings, allowing and authorizing torture, etc.
How come the so-called stiff-upper-lip society values human freedoms so much, when the so-called Beacon of Democracy incarcerates its own citizens without trial?
And that, too, when many EU nations don't even have constitutions that embody something like our First Amendment.
Re: (Score:3, Insightful)
Re:How come EU is always more consumer-protectioni (Score:4, Interesting)
I always laugh when I hear Americans talk about 'liberals' as being left-wing, given that that particular ideology is generally regarded as being at the centre of the political spectrum in Europe.
One of the things I notice on Slashdot is that there's a backlash whenever a government tries to legislate, especially when it's the EU trying to improve consumer protection, the general idea being that they should keep their collective noses out of other people's business.
I find it odd that Americans (as Slashdotters predominantly are), whose society prides itself on being democratic, would rather take power away from their democratic institutions and hand it to undemocratic corporations. The free market theoretically exists to control the amount of power that a corporation can accumulate, but I've found that Slashdotters oppose state intervention even in instances where the free market does not operate properly (i.e. monopoly situations).
It could be that this is because the US electoral system doesn't perform as it should. The usual example I use is the US Electoral College, which is provided for by the Constitution and skews the presidential election by applying a first-past-the-post system entirely out of context. Where the electoral system is flawed, why should you trust a government any more than a corporation?
The GP mentions EU countries' constitutions. I live in the UK, where there is no written constitution and ultimate power is vested in Parliament, which makes it much easier to dispose of anachronisms in our voting systems.
Of course I might be on the wrong track entirely. It occurs to me that the most common sense I ever hear from politicians comes from two places: the UK House of Lords and the EU Commission - both unelected bodies. It's possible that politicians are more able to act in the public good when they don't have to worry about the next election.
Re:How come EU is always more consumer-protectioni (Score:5, Interesting)
In most European countries (and in effect the EU itself) there is a plethora of political parties that could plausibly come to power. With so many competing parties there is a good chance at least one of your competitors will point out your shady behavior, so it is easier to try to outdo them in positive ways rather than malicious ones.
In contrast, in the US the entire electoral system more or less favors a two-party system where the winner takes all. In such a system you gain a lot by attacking a single enemy. If you're a Democrat, all you need to do is break things for the Republicans, and vice versa. Such tactics don't work if you have 5-6 potential candidates, because if you try to fuck over 4 of your opponents you run the risk that they will conspire against you. The American system is very easily corrupted, since once you have influence with the two main parties there is little to stop you, while gaining control of a 6-7 party parliament without anybody crying foul is trickier.
Simply put, in the EU political parties compete for power; in the US there is more of a cartel or monopoly. You can also see these trends if you look at individual EU countries. Britain has more of a one-party system, and consequently its politics are a lot more "American" than those of many other European countries.
It is also quite possible that the EU is merely better because it is relatively new, and that with time it will become corrupted as third parties learn to manipulate it. Time will tell...
Re:The state vs freedom of information (Score:4, Insightful)
It is not the state controlling access - it is the state, acting on my behalf, to ensure that large organizations (including the state itself) are not entitled to use my personal information against me. If you are not covered by such protection then anyone can use your information to do you untold damage and there is nothing you can do about it.