Privacy The Internet

EU Recommends Slashing Search Data Retention

Wayland writes "The European Union's Article 29 Working Group has completed its PDF report on data protection and search engines. The group recommends that search engines only be allowed to hold onto search data for six months. 'To hang onto data for longer, search engine operators will need to show that such data is "strictly necessary" to offer the service. Google and others have long said that they need to retain data in order to refine search results, prevent click fraud, and launch new services like spell check (which, in Google's case, was built from user search data). In addition, the data that is kept will need to be guarded more closely. The working group concluded that IP addresses could be used to identify individuals; if not by the search engine itself, then by law enforcement or after a subpoena.'"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Tuesday April 08, 2008 @03:40AM (#22997626)
    Not the referrer field. The referrer identifies which page linked to the visited page, not who visited. I use RefControl [mozilla.org] just because I am paranoid, but I see your point.
  • by iamacat ( 583406 ) on Tuesday April 08, 2008 @03:48AM (#22997656)
    What we need is an alternative search engine located in a country with very strict privacy laws and permissive copyright laws, outside the reach of most US subpoenas (except ones that meet that country's standards). If it becomes popular with security-paranoid geeks, it has a shot at 0.01% of Google's money, which should be enough to sustain a medium-sized company. Any recommendations?
  • by Anonymous Coward on Tuesday April 08, 2008 @04:11AM (#22997756)
    It's already used. Last year's data is extremely helpful in predicting this year's searches, and debugging any changes you've made before a season hits. If this law passes, expect the quality of search engines in the EU to tank, as there will be no way to compare year against year or predict how things will be affected if a search system changes (and they always change).

    It's one thing to require anonymization of data (which I find reasonable). However it's quite another to say you've got to delete the data entirely. How well would financial markets work with only 6 months of history? Search engines won't fare much better.

    Another thing to think about: this will contribute to rare languages dying out on the internet. Consider a language spoken by 0.1% of the population (many such languages exist). With a limited time history, the amount of data usable for training and evaluation will be limited. Compared to the "big" languages, it may not be enough data to make a high quality system. Languages with a 10% or higher share will have 100x the data (and will be the ones that politicians care about enough to make sure they work). Of course, politicians are not statisticians, so they likely won't understand data sparsity issues like this. The smaller languages are already at a great disadvantage on the internet compared to more popular languages, and this will only make the issue worse.
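
The sparsity point above comes down to simple arithmetic. Here is a rough sketch of it, assuming a made-up daily query volume and assuming queries split in proportion to each language's share of users; `TOTAL_PER_DAY`, `queries_retained`, and the 182-day window are illustrative names and numbers, not figures from any search engine.

```python
def queries_retained(total_per_day: int, share_per_thousand: int, retention_days: int) -> int:
    """Queries in one language still on file under a retention cap.

    share_per_thousand is the language's share of all queries, in parts
    per thousand (1 means 0.1%, 100 means 10%).
    """
    return total_per_day * share_per_thousand * retention_days // 1000


TOTAL_PER_DAY = 1_000_000  # hypothetical daily query volume, not a real figure
SIX_MONTHS = 182           # retention window in days (roughly six months)

small = queries_retained(TOTAL_PER_DAY, 1, SIX_MONTHS)    # language with a 0.1% share
large = queries_retained(TOTAL_PER_DAY, 100, SIX_MONTHS)  # language with a 10% share
ratio = large // small

print(f"0.1% language: {small:,} queries on file")
print(f"10% language:  {large:,} queries on file")
print(f"ratio: {ratio}x")
```

The ratio stays at 100x whatever the window is; what the cap changes is the absolute count for the small language, which is the number that decides whether there is enough data to train and evaluate on.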
  • by Sander_ ( 55929 ) on Tuesday April 08, 2008 @04:44AM (#22997886)
    It is all great and whatnot, as has been mentioned above. Am I the only one who sees this as the state wanting to control who can access information? There have been several attempts to legislate that providers should retain data for the benefit of a state, and now they are also trying to legislate how much information should be freely accessible?

    -A
  • by Hal_Porter ( 817932 ) on Tuesday April 08, 2008 @05:45AM (#22998124)
    The grass is always greener on the other side I suppose. And libel/privacy laws don't seem such a good idea when you see the downside of them. Neither does criminalizing people like David Irving.

    Any Americans fancy a citizenship-swap?
  • by BlueParrot ( 965239 ) on Tuesday April 08, 2008 @06:03AM (#22998220)
    I think the main difference is the system by which people come to power.

    In most European countries (and in effect the EU itself) there is a plethora of political parties that are likely to come into power. With so many competing parties there is a large chance at least one of your competitors will point out your shady behavior, and it is thus easier to try to outdo them in positive ways rather than malicious ones.

    In contrast, in the US the entire electoral system more or less favors a two-party system, where the winner takes it all. In such a system you gain a lot by attacking a single enemy. If you're a Democrat all you need to do is to break things for the Republicans, and vice versa. Such tactics don't work if you have 5-6 potential candidates because if you try to fuck over 4 of your opponents you run the risk that they will conspire against you. The American system is very easily corrupted since once you have influence with the two main parties there is little to stop you, while gaining control of a 6-7 party parliament without anybody crying foul is more tricky.

    Simply put, in the EU political parties compete for power, in the US there is more of a cartel or monopoly. You can also notice these trends if you look at individual EU countries. Britain has more of a one party system, and consequently their politics are a lot more "American" than many other European ones.

    It is also rather possible that the EU is merely better because it is relatively new at the moment, and that with time it will become corrupted as third parties learn to manipulate it. Time will tell...
  • by teh kurisu ( 701097 ) on Tuesday April 08, 2008 @06:25AM (#22998314) Homepage

    I always laugh when I hear Americans talk about 'liberals' as being left-wing, given that that particular ideology is generally regarded as being at the centre of the political spectrum in Europe.

    One of the things I notice on Slashdot is that there's a backlash whenever a government tries to legislate, especially when it's the EU trying to improve consumer protection - the general idea being that they should keep their collective noses out of other people's business.

    I find it odd that Americans (as Slashdotters predominantly are), whose society prides itself on being democratic, would rather take power away from their democratic institutions and hand it to undemocratic corporations. The free market theoretically exists to control the amount of power that a corporation can accumulate, but I've found that Slashdotters oppose state intervention even in instances where the free market does not operate properly (i.e. monopoly situations).

    It could be that this is because the US electoral system doesn't perform as it should. The usual example I use is the US Electoral College, where the presidential election is skewed by the first-past-the-post system used entirely out of context, and is provided for by the constitution. In cases where the electoral system is flawed, why should you trust a government any more than a corporation?

    The GP mentions the issue of EU countries' constitutions - I live in the UK where there is no codified constitution, and ultimate power is vested in Parliament, which makes it much easier to dispose of anachronisms in our voting systems.

    Of course I might be on the wrong track entirely. It occurs to me that the most common sense I ever hear from politicians comes from two places: the UK House of Lords and the EU Commission - both unelected bodies. It's possible that politicians are more able to act in the public good when they don't have to worry about the next election.

  • by JeremyDuffy ( 1024241 ) on Tuesday April 08, 2008 @07:02AM (#22998502) Homepage
    Here's a better question. Why do they need the key for even 60 seconds let alone 6 months? They can serve up your results, store only search statistic information for the betterment of their services and not keep ANY personally identifying information at all! Seriously, does anyone know what possible reason they could have to store the information other than to profile you and sell you crap?
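
The "store only search statistic information" idea above could look something like the following. This is a minimal sketch under stated assumptions, not a description of how any search engine logs queries: `record_query` and the sample queries are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from collections import Counter


def record_query(stats: Counter, query: str, client_ip: str) -> None:
    """Count the normalized query text.

    client_ip is accepted only to show that the request carries it;
    it is deliberately never written to the stats.
    """
    normalized = " ".join(query.lower().split())
    stats[normalized] += 1


stats: Counter = Counter()
record_query(stats, "Yak  Yodeling", "192.0.2.10")
record_query(stats, "yak yodeling", "198.51.100.7")
record_query(stats, "restaurants", "192.0.2.10")

print(stats.most_common())
```

Aggregates like these are enough for popularity counts and spelling statistics, but they cannot support anything that needs to group searches by user. Note also that the query text itself can still be identifying, so dropping the IP does not make the log anonymous on its own.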
  • Re:RTFA, lemming (Score:4, Interesting)

    by Umuri ( 897961 ) on Tuesday April 08, 2008 @08:41AM (#22998984)
    I'll take a stab at your little conundrum here.
    While your post was very informative, as best I can tell it boils down to this: Google has no reason to keep individual IP data, because such data is useless for anything other than marketing and selling to other people.

    So, with that in mind, and not taking a stance on whether it's still too personal even with a good reason, let's look at some data mining techniques.

    Say, for example, you have a region of the midwest United States, the exact middle of the Bible Belt. For those unfamiliar with the term, that means a place where Christianity is prevalent, and preached loudly, often, and to anyone within earshot, and the ability to be non-Christian is relatively low. But say you have this group of people, and a lot of their searches are of religious material. You could use them as a sort of "expert" group, giving a little more weight to their likes and dislikes as a whole to adjust pagerank for their area of study, religion. This allows pages that may be far down the list, but accurate and factual, to be pulled up a bit so the rest of the world might find them, and if they truly are good, then they'll stay up there afterwards.

    If not, then the page will drop back down in rankings again and it will have earned the low rank it has.

    You could not do this without some form of IP/region tracking, and its usefulness increases with the accuracy with which you track IPs. If you track single people, you can get more meaningful data; for example, you can narrow your "expert" group to pastor brian, sister marian, and sister margarette, and leave out their neighbors druid matterson and buddhist huy ngyen.

    This decreases false positives from your expert group and also allows you to better refine where each person might have a good sense of judgement.

    That hopefully explains the IP section a bit.

    As for timestamps, I only have two theories about them, and both are equally likely.

    The first is that the timestamps are used, in combination with the search terms, to help them optimize their load balancing. Since I'm sure they cycle systems on and off the grid as the systems rebuild databases or do maintenance, you could use such data to tell, for example, when you could most likely take the Yak-Yodeling server offline to redo its database and crawl pages, have people get search results from a slightly out of date backup, and minimize the impact from it.

    The other option is that there are some results that are time sensitive. Even without linking IP data to geographic data, if you notice that an IP range searches for "restaurants" + dinner at a certain time of day, and you get a search for restaurants, you might give preference to dinner selections at that time of day, because you could assume they are looking to go eat.

    Anyway, I hope that clears up a bit how such specific data is usable and important. Could it be usable in other forms that didn't identify the IP? Probably. But it would serve no practical purpose, because as long as they have some system for converting an IP to a unique identifier to identify a group of searches, there will always be a way to reverse or brute-force the originating IP, given time and interest on the part of whoever wants it.
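
The brute-force claim in the last paragraph is easy to demonstrate if the "unique identifier" is an unsalted hash of the address. The sketch below assumes exactly that scheme (plain SHA-256 of the dotted-quad string); `pseudonymize` and `recover_ip` are hypothetical names, and the addresses come from ranges reserved for documentation.

```python
import hashlib
import ipaddress
from typing import Optional


def pseudonymize(ip: str) -> str:
    """Assumed scheme: replace an IP with the hex SHA-256 of its text form."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(token: str, network: str) -> Optional[str]:
    """Hash every host address in `network` until one matches `token`."""
    for host in ipaddress.ip_network(network).hosts():
        if pseudonymize(str(host)) == token:
            return str(host)
    return None


token = pseudonymize("192.0.2.77")             # what would sit in the log
recovered = recover_ip(token, "192.0.2.0/24")  # attacker guesses the ISP's range

print(recovered)
```

A /24 takes a fraction of a second; the whole IPv4 space is only about 4.3 billion addresses, so an exhaustive search is within reach of ordinary hardware. A keyed or salted hash would stop outsiders from doing this, but whoever holds the key could presumably still run the same loop, which is the poster's point.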

