News for nerds, stuff that matters

Google to Anonymize Users' Search Data

Journal written by leamanc (961376) and posted by samzenpus on Thu Mar 15, 2007 06:31 AM
from the poof-you're-gone dept.
Google's official blog states they are on an effort to anonymize their search data after 18-24 months. After previously fighting turning over search data to the feds, it looks like they are striking another blow to the "think of the children" crowd. Any bets on whether MSN or Yahoo! will follow suit?
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • The real WTF is.. (Score:2, Interesting)

    by b100dian (771163) on Thursday March 15 2007, @06:34AM (#18360067)
    (http://b100dian.lx.ro/)
    ..the "off the record" button, in the first place!
  • Uhm (Score:3, Interesting)

    by giorgiofr (887762) on Thursday March 15 2007, @06:34AM (#18360069)
    All they have to do is erase the logs every day or just not keep them. It doesn't "take an effort". Anonymous proxies have been doing this for years.
    • Re:Uhm (Score:5, Insightful)

      by Rakishi (759894) on Thursday March 15 2007, @06:50AM (#18360147)
      And anonymous proxies, unlike Google, do not need to make money or provide much of a service. Logs are very useful for such things.
      [ Parent ]
    • All they have to do is erase the logs every day or just not keep them. It doesn't "take an effort". Anonymous proxies have been doing this for years.

      I know where you're coming from, but that would kinda fuck with their targeted advertising business model, dontcha think?
      [ Parent ]
      • Re:Uhm by jacquesm (Score:3) Thursday March 15 2007, @06:53AM
        • Re:Uhm (Score:4, Insightful)

          by daeg (828071) on Thursday March 15 2007, @07:43AM (#18360481)
          I'm between the two extremes of agreeing with you and agreeing that data needs to be retained. As any of us who have taken a statistics class (or four) can tell you, you don't need access to the whole sample to provide accurate data. So, say, for instance, the Google engineers were working on a specific niche of the web, say, dog lovers. If I were designing something to better suit dog lovers, my first step would be pulling a report on the common search patterns of people that search for dog-related topics.

          Historical data that identifies a unique user is extremely useful. I do the same thing with our Intranet search and report tools. If I want to improve something, oftentimes the logs will give a very telling tale. (This accounting department employee searched for "expense", then "expense excel", then "expense spreadsheet", then "expense log", finally getting his document. I can then add the keywords 'excel' 'spreadsheet' to the actual document entry.) That said, you don't actually need to know who the unique user is, for all intents and research purposes, User5486734067 is just as useful as an IP+Cookie.
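A minimal sketch of the "User5486734067 is just as useful as an IP+Cookie" idea, assuming log entries are simple (ip, cookie, query) tuples. The `Pseudonymizer` class, the label format, and the sample log are all hypothetical and only illustrate the point; nothing here is how any real search engine does it.

```python
import secrets


class Pseudonymizer:
    """Hands out a stable random label for each (ip, cookie) pair."""

    def __init__(self):
        # Hypothetical in-memory table; it is itself identifying and
        # would have to be discarded for the labels to stay anonymous.
        self._labels = {}

    def label(self, ip, cookie):
        key = (ip, cookie)
        if key not in self._labels:
            # Random label, not derived from the IP or the cookie.
            self._labels[key] = "User" + secrets.token_hex(8)
        return self._labels[key]


def pseudonymize(log, pseudonymizer):
    """Replace (ip, cookie) with a label, keeping the query and its order."""
    return [(pseudonymizer.label(ip, cookie), query) for ip, cookie, query in log]


raw_log = [
    ("10.0.0.5", "cookieA", "expense"),
    ("10.0.0.5", "cookieA", "expense excel"),
    ("10.0.0.9", "cookieB", "holiday calendar"),
    ("10.0.0.5", "cookieA", "expense spreadsheet"),
]

pseudonymizer = Pseudonymizer()
clean_log = pseudonymize(raw_log, pseudonymizer)
for user, query in clean_log:
    print(user, query)
```

The search sequence of one user stays linked, which is what the research use needs. As other comments in this thread point out, the queries themselves can still give a person away.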
          [ Parent ]
          • Re:Uhm by jacquesm (Score:2) Thursday March 15 2007, @08:16AM
            • Re:Uhm by daeg (Score:2) Thursday March 15 2007, @09:13AM
              • Re:Uhm by jacquesm (Score:2) Thursday March 15 2007, @09:26AM
              • Re:Uhm by Rakishi (Score:2) Thursday March 15 2007, @10:10AM
          • Anonymous is just as scary by tim90402 (Score:1) Thursday March 15 2007, @10:33AM
          • 1 reply beneath your current threshold.
        • Re:Uhm by Rakishi (Score:2) Thursday March 15 2007, @10:05AM
    • 2 replies beneath your current threshold.
  • Mine already is (Score:3, Informative)

    by solevita (967690) on Thursday March 15 2007, @06:36AM (#18360085)
    Although I did have to install the AnonymizeGoogle Firefox plugin to get it.
    • Re:Mine already is (Score:5, Informative)

      by solevita (967690) on Thursday March 15 2007, @07:24AM (#18360345)
      Ignore that post above - I'm a moron. I meant to say the CustomizeGoogle Firefox plugin. Get it here [customizegoogle.com].

      I guess that's what happens when you Slashdot before caffeine. I'm sorry.
      [ Parent ]
      • Re:Mine already is by number11 (Score:2) Thursday March 15 2007, @10:10AM
      • Re:Mine already is (Score:4, Informative)

        by solevita (967690) on Thursday March 15 2007, @08:40AM (#18361031)
        Your IP usually isn't the problem, especially in my case where my ISP sends it all through their regional proxy anyway. What CustomizeGoogle does is randomize your Google UID. Take another look at the recent AOL breach - people weren't suffering privacy loss due to their IP address, but rather because AOL gave each and every user a number that could be tracked through the system. Thanks to CustomizeGoogle, that won't happen to me and my searches.
        [ Parent ]
      • 1 reply beneath your current threshold.
  • How about (Score:2)

    by squoozer (730327) on Thursday March 15 2007, @06:36AM (#18360087)
    (http://www.crazysquirrel.com/index.jspx)

    anonymizing it straight away! That would be an even quicker solution to the problem.

  • 0 months? (Score:2)

    by pr0nbot (313417) on Thursday March 15 2007, @06:41AM (#18360093)
    Why not anonymise the data after zero months? Are they required by law not to?
    • Re:0 months? by Barny (Score:2) Thursday March 15 2007, @06:49AM
    • Re:0 months? (Score:5, Insightful)

      by cdrudge (68377) <cdrudgespam@v e r i z o n . net> on Thursday March 15 2007, @06:49AM (#18360137)
      (http://slashdot.org/)
      My guess is they don't do it immediately because there is internal business value in mining the data: user patterns, length of stay, etc. After 18 or 24 months, the internal value has dropped significantly, as things change quickly. I would have thought that the value would have dropped even quicker than that, say after 6 months or maybe a year.
      [ Parent ]
      • Re:0 months? by steelfood (Score:2) Thursday March 15 2007, @09:51AM
    • Re:0 months? by Rakishi (Score:2) Thursday March 15 2007, @06:52AM
      • 1 reply beneath your current threshold.
    • by xxxJonBoyxxx (565205) on Thursday March 15 2007, @07:30AM (#18360381)

      Why not anonymise the data after zero months?
      Because Google's primarily a media company, like NBC, only with much finer detail about what you want to see. Like any media company, Google finds demographic data incredibly valuable because it allows them to "connect" you with the "correct" advertisers. There's no way in hell Google would let people be completely anonymous; it goes against their business plan. (I'd also bet three years from now we'll find through some court case that backup tapes somewhere really extend "anonymous after 18 months" to 4-5 years.)
      [ Parent ]
    • Re:0 months? by ACMENEWSLLC (Score:1) Thursday March 15 2007, @10:36AM
    • 1 reply beneath your current threshold.
  • by Anonymous Coward on Thursday March 15 2007, @06:49AM (#18360143)
    Google should not be collecting any of that huge pile of information AT ALL, not just anonymising it after 18 months. As the AOL case showed, search queries can be used to identify individuals even after AOL anonymized them, so it's not IP addresses they are recording, it's PEOPLE.

    There is no need to collect the IP addresses of searchers that haven't opted in to Google's personalized search. There is no law that requires it.

    There is no need to store the IP addresses of individual visitors to websites when Google analytics is used on a web page.

    There is no need to store IP addresses of pages delivered to adsense viewers. Clicks maybe for a short time to prevent click fraud, but viewers, no.

    None of this information should be recorded, and further the EU privacy directive should be enforced to ensure that none of that information is recorded. The law says we have privacy, Google should be forced to comply with that law.

    • Re:Shouldn't be collecting that info anyway by GweeDo (Score:3) Thursday March 15 2007, @07:26AM
      • No Consent (Score:4, Interesting)

        by Anonymous Coward on Thursday March 15 2007, @08:02AM (#18360663)
        Exactly, it's to Google's MONETARY benefit that they record this information. The EU Privacy law says THEY CANNOT RECORD MORE PERSONAL INFORMATION THAN IS NEEDED FOR A TRANSACTION. Now that it's clear that search data is personally identifiable, the EU Privacy law should be used to FORCE GOOGLE TO QUIT IT.

        "The moment you sent your request out over the internet in plain text to a third party (that is a corporation out to make money you know) you lost that."

        Not so, the law says we have to consent and we didn't consent!

        And what about when that party isn't Google? Google Analytics is not on Google's site, it's embedded on third-party sites, and Google's AdSense is on other people's sites too. I didn't consent to handing my data to Google when I surfed to a third party's site; Google took that data and recorded it in violation of EU privacy laws.

        This has been the subject of a lawsuit before, which resulted in DoubleClick backing down over exactly this issue.

        http://archives.cnn.com/2000/TECH/computing/01/28/double.click.lawsuit.idg/ [cnn.com]

        "A California woman has filed suit against DoubleClick, accusing the U.S.-based online advertising company of unlawfully obtaining and selling consumers' personal information, according to a statement issued by her attorney's office."

        "Hariett M. Judnick filed the suit in Marin County Superior Court in California, on behalf of the "general public of the state of California," the statement said.
        The suit alleges that DoubleClick employs Internet cookies to identify users and track their movements on the Internet. The company tracks and records the sites an individual visits, as well as the information transmitted on the sites, such as names, ages, addresses, shopping patterns and financial information."

        [ Parent ]
      • Re:Shouldn't be collecting that info anyway by rtb61 (Score:2) Thursday March 15 2007, @08:09AM
    • Re:Shouldn't be collecting that info anyway by kalirion (Score:2) Thursday March 15 2007, @08:36AM
    • Re:Shouldn't be collecting that info anyway by mysticgoat (Score:2) Thursday March 15 2007, @09:24AM
      • 1 reply beneath your current threshold.
  • According to TFA (Score:5, Insightful)

    by ReallyEvilCanine (991886) on Thursday March 15 2007, @06:50AM (#18360149)
    (http://stuckinthecube.blogspot.com/)
    Google plan to make it "more anonymous". Like pregnancy, data either ARE anonymous or they ain't. You can't qualify an absolute, and "anonymous" is an absolute condition indicating lack of information.
  • It's there servers (Score:2, Troll)

    by tomstdenis (446163) <tomstdenisNO@SPAMgmail.com> on Thursday March 15 2007, @06:53AM (#18360167)
    (http://libtom.org/)
    Stop googling for "jihad death to american president" if you're worried about getting caught.

    I should point out that your Google query goes over plaintext HTTP, so anyone in between can eavesdrop on your queries.

    Tom
    • Re:It's there servers (Score:5, Insightful)

      by solevita (967690) on Thursday March 15 2007, @07:08AM (#18360243)

      Stop googling for "jihad death to american president" if you're worried about getting caught.
      You're correct. The only people that demand privacy are those up to no good. How about I come over to your house later, sit in your bed for a bit, go through your drawers and your phone records, take some pictures of you and your friends, ask the neighbours some pressing questions?

      If you've got nothing to hide, you should have no problem with this.
      [ Parent ]
      • Re:It's there servers by tomstdenis (Score:2) Thursday March 15 2007, @07:30AM
        • Re:It's there servers (Score:5, Interesting)

          by Dunbal (464142) on Thursday March 15 2007, @07:49AM (#18360519)
          Ah, the out of context argument. My house is private by the definition that I have locks on the doors and blinds on the windows.

                Funny - my computer is in my house, behind locks and blinds too. Hey, Google's computers also are behind lock and key, and they even have security guards and alarm systems. I don't ever remember giving Google permission to disclose any information shared between them and me - oh, and heaven forbid I go around giving away the information Google found for me - I'd get sued!

                Why would the whole world automatically be party to the information Google and I shared one evening? My computer sent that information to a specific internet address, and the answer came back specifically to my computer.

                Not so out of context...
          [ Parent ]
          • Re:It's there servers (Score:4, Insightful)

            by tomstdenis (446163) <tomstdenisNO@SPAMgmail.com> on Thursday March 15 2007, @08:08AM (#18360705)
            (http://libtom.org/)
            This is why it pays to have a modicum of computer knowledge.

            Assuming you're not trolling...

            When you send a query to Google, it goes over the "internet" in the clear. That is, not encrypted. Anyone who can see it can read it. Well, who can read it? Turns out a lot of people. Between me and Google are probably 10 different boxes, 5 of which are just my ISP's routers. The other five are boxes on other networks, not even related to Google.

            There is no inherent requirement for privacy like there is with telephones (maybe there ought to be one). But that said, you're giving your data to Google, willingly no less. That gives them every right to record it. You gave them permission by using their service. I guess you never read their TOS [google.ca], which is your fault, not theirs. Think about the analogy in the real world. This is like you handing your driver's license to every stranger you meet, then getting upset when some of them write it down.

            If you don't want your assets [IP, location, name, platform, etc] leaked to Google you should use an anonymous proxy.

            Tom
            [ Parent ]
          • Re:It's there servers by everphilski (Score:2) Thursday March 15 2007, @08:51AM
          • 1 reply beneath your current threshold.
        • Re:It's there servers by QCompson (Score:2) Thursday March 15 2007, @08:54AM
        • Re:It's there servers by tomstdenis (Score:2) Thursday March 15 2007, @08:43AM
          • 1 reply beneath your current threshold.
        • 1 reply beneath your current threshold.
      • Re:It's there servers by Dunbal (Score:3) Thursday March 15 2007, @07:44AM
      • Re:It's there servers by sgholt (Score:1) Thursday March 15 2007, @12:18PM
      • Re:It's there servers by grolschie (Score:2) Thursday March 15 2007, @07:14PM
      • 1 reply beneath your current threshold.
    • Re:It's there servers by garcia (Score:3) Thursday March 15 2007, @07:32AM
    • 2 replies beneath your current threshold.
  • IAO (Score:1)

    by lundqvist (1070102) on Thursday March 15 2007, @06:57AM (#18360181)
    I bet that means the IAO has their project running properly now so they no longer need to use Google Logs ...
  • We still think of the children! (Score:1, Interesting)

    by Anonymous Coward on Thursday March 15 2007, @07:22AM (#18360325)

    After previously fighting turning over search data to the feds, it looks like they are striking another blow to the "think of the children" crowd.
    Anybody who remembers this incident probably also remembers the article 'Google in bed with the CIA' too:

    "Google was a little hypocritical when they were refusing to honor a Department of Justice request for information because they were heavily in bed with the Central Intelligence Agency, the office of research and development," said Steele.
    http://www.prisonplanet.com/articles/october2006/271006googlecia.htm [prisonplanet.com]

    Makes me wonder how fast does the CIA anonymize their material? Ha!
  • From the poof-your-gone dept. (Score:1, Offtopic)

    by 1u3hr (530656) on Thursday March 15 2007, @07:38AM (#18360445)
    "you're gone" [you are]
    • 1 reply beneath your current threshold.
  • by j_heisenberg (464756) on Thursday March 15 2007, @07:40AM (#18360455)
    since that data could be abused in any number of ways, including credit scoring, insurance scoring or leaks of "interesting details" to the press. Probably those would hurt Google's reputation more than any additional income it could generate, but it's still the better policy.
  • If you're worried about privacy, I recommend Firefox [getfirefox.com] and the Customize Google extension [mozilla.org]. I'm also a fan of Googlepedia [mozilla.org].
  • 18-24 months? (Score:2, Insightful)

    by JackMeyhoff (1070484) on Thursday March 15 2007, @07:53AM (#18360543)
    Which is it? 18, 19, 20, 21, 22, 23 or 24?
  • Yeah Right (Score:2)

    by Psx29 (538840) on Thursday March 15 2007, @07:58AM (#18360595)
    This means nothing. If you click the link, it says: "By anonymizing our server logs after 18-24 months..." That's still far too long, and is most likely motivated more by logistical concerns in retaining so much data than by any act of benevolence. However, it definitely makes good PR to paint this as 'Taking steps to improve privacy'...
    • Re:Yeah Right by Alascom (Score:2) Thursday March 15 2007, @10:30AM
  • by guanxi (216397) on Thursday March 15 2007, @07:59AM (#18360605)
    To quote them:
    "It is difficult to guarantee complete anonymization, but we believe these changes will make it very unlikely users could be identified."

    "Changing the bits of an IP address makes it less likely that the IP address can be associated with a specific computer or user. Cookie anonymization makes it less likely that a cookie can be used to identify a user."

    "[I]t's possible that data retention laws will obligate us to retain logs for longer periods."

    "How many subpoenas for server log data does Google receive each year?
    As a matter of policy, we don't provide specifics on law enforcement requests to Google."


    I don't think it will mean much unless they publish their anonymization technique. Even Google seems to have doubts about it, and considering the resources of some attackers (e.g., national governments), if the anonymization can be broken it will be.

    But Google's anonymization does not have to be perfect: Google isn't the only place your google.com activity is recorded: There's your personal computer, possibly your ISP, other sites (referrer links show Google search terms), etc. As long as Google makes their anonymity difficult enough to break that it's significantly easier to go elsewhere for the information, they've done their job. If you need to be anonymous, I hope you are taking other steps.

    I, for one, welcome the merciful intentions of our benign new overlords.

  • Um... (Score:1)

    by superbus1929 (1069292) on Thursday March 15 2007, @08:20AM (#18360841)
    (http://www.superbusnet.com/)
    Didn't AOL get into a lot of trouble for this?

    Personally... we knew this was going to happen. Anyone that's surprised is a fool.
    • Not exactly by nova_ostrich (Score:1) Thursday March 15 2007, @10:07AM
  • List of nifty little phrases that have bitten their speakers in the ass:

    • They will never bomb Berlin
    • Read my lips, no new taxes
    • I did not have sex with that woman
    • Mission accomplished
    • Don't be evil

    Now Google brings us:

    Let's just be less evil, now that we've been caught.

  • well (Score:1)

    by DuroSoft (1009945) on Thursday March 15 2007, @08:46AM (#18361093)
    (http://www.crush0meter.com/)
    The 'think of the children crowd' should be very pleased by this - children search for sketchy things all the time... and then their parents get blamed for it.

    'Twould be better if it all stayed anonymous, in my opinion
    • 1 reply beneath your current threshold.
  • Hash the IP addresses? (Score:2, Insightful)

    by sherriw (794536) on Thursday March 15 2007, @09:42AM (#18361797)
    Personally I think it's all a load of BS. If they really cared about our privacy, and if all they really needed my IP addy for is to aggregate my searches to 'better serve me', then all they have to do is one-way hash my IP addy. Then they can still tie all my searches together, and my gmail and such, but they wouldn't be able to back track it. And the govn't could demand all they want... you want the IP of the user who searched this? Here it is Mr. Bush... go nuts: x867:%dsgfk435j>67&*g[fg

    So forgive me if I don't get all thankful for Google's big gesture. Heh.
    • Re:Hash the IP addresses? (Score:5, Insightful)

      by santiago (42242) on Thursday March 15 2007, @10:39AM (#18362785)
      (http://mapache.org/)
      There are 2^32 IP addresses under IPv4. If Google is doing the hashing, then they know the hash function. How long do you think it would take them to brute-force break the hash by hashing every possible IP address and creating a map from the hashed values back to the originals? Express your answer in microseconds.

      (If your solution is to increase the space of inputs by adding a variable salt value, please explain how this allows them to use the resulting hashes for aggregation.)
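The brute-force argument above can be sketched in a few lines. This is only an illustration under stated assumptions: SHA-256 stands in for whatever unsalted hash might be used, `hash_ip` and `build_reverse_map` are hypothetical helper names, and the table covers a single documentation /24 rather than the full address space so that it runs instantly.

```python
import hashlib
import ipaddress


def hash_ip(ip: str) -> str:
    """Unsalted one-way hash of a dotted-quad IP (SHA-256 chosen as an assumption)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def build_reverse_map(network: str) -> dict:
    """Hash every address in the network and map each digest back to its IP."""
    return {hash_ip(str(addr)): str(addr) for addr in ipaddress.ip_network(network)}


# What a "hashed" log entry would contain for one visitor.
logged_digest = hash_ip("192.0.2.77")

# Anyone who knows the hash function can precompute the table...
reverse_map = build_reverse_map("192.0.2.0/24")

# ...and look the original address straight back up.
recovered = reverse_map[logged_digest]
print(recovered)
```

Covering all of IPv4 is the same loop over `0.0.0.0/0`; with only 2^32 inputs, it is a question of compute time and storage, not of breaking the hash.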
      [ Parent ]
  • 127.0.0.1 (Score:4, Funny)

    by supun (613105) on Thursday March 15 2007, @09:49AM (#18361871)
    Just hard code the function that grabs "HTTP_REMOTE_ADDR" to return "127.0.0.1." That way the feds will think all the kiddie p0rn searches came from the computer they are using.
  • Does that mean ... (Score:1)

    by digitig (1056110) on Thursday March 15 2007, @10:25AM (#18362453)
    ...I can stop adding "-lolita" when searching for "Nabokov"?
  • by Nom du Keyboard (633989) on Thursday March 15 2007, @11:27AM (#18363663)
    Why is Google getting any favorable press at all for this? They never should have been doing it in the first place.
  • 18 months? (Score:2)

    by J'raxis (248192) on Thursday March 15 2007, @11:53AM (#18364161)
    (http://www.jraxis.com/)

    There is absolutely no reason for them to retain logs linking searches to IP addresses for even 18 seconds, let alone 18 months -- this isn't "improving Google" for any of their users, no matter how much they claim it is.

    Keeping search history for logged-in users is one thing; I can see how some users could find that useful, just like browser history autocomplete. Perhaps they want to keep logs of non-logged-in users around for something like geographical targeting, but there's no reason they can't process out the IP information immediately, or on a quick rolling schedule such as every 24 hours. Or, just keep the /24 or /16 form of the IP address; that effectively anonymizes the data but still provides enough information for geo-targeting or other forms of aggregation. If they want to track the flow of requests (a user searched this, then that, clicked here, then...), they can use their cookie for that, or do something like generate a hash of each IP's hostname* and track requests by the hash.
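The "/24 or /16 form of the IP address" suggestion can be sketched with Python's standard `ipaddress` module. The function name `truncate_ip` and the sample addresses are assumptions made for illustration.

```python
import ipaddress


def truncate_ip(ip: str, prefix_len: int = 24) -> str:
    """Zero out the host bits, keeping only the first prefix_len bits of the address."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)


print(truncate_ip("203.0.113.77"))      # keeps the /24
print(truncate_ip("203.0.113.77", 16))  # keeps only the /16
```

With a /24, up to 256 addresses collapse into the same logged value, which is coarse enough to blur a single machine but usually still fine for geographic aggregation.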

    18-24 months, however, is about the right length of time that this data could be useful for the government for purposes of intelligence gathering or criminal prosecution.

    * Hashing the IP itself is useless as there aren't enough IPs (4,294,967,296 in theory, far fewer in practice due to all the reserved /8s) to make reversing the hash back to the IP difficult. However, the domain of valid hostnames is incredibly large (any alphanumeric string up to 256 characters), such that one can be reasonably confident the hostname cannot be computed from the hash.

  • I don't understand (Score:2)

    by JustNiz (692889) on Thursday March 15 2007, @12:28PM (#18364755)
    Why don't they just not save search data in the first place?
  • by Jane Q. Public (1010737) on Thursday March 15 2007, @01:23PM (#18365517)
    I am so sick of the myriad bad laws and regulations passed because they were supposedly "good for the children".

    Bollocks.

    People have been creating a world with a lid that is so "screwed down" by "authority" that if the trend continues, children will be growing up in a living hell, in which they are not allowed to think for themselves even after becoming adults.

    Is this good for them? Is it good for *anybody*??

    I think not.
    • 1 reply beneath your current threshold.
  • by talledega500 (994228) on Thursday March 15 2007, @03:15PM (#18367065)
    for search: http://www.blackboxsearch.com/ [blackboxsearch.com] for full proxy. http://www.mysecureisp.com/ [mysecureisp.com]
  • by pcause (209643) on Thursday March 15 2007, @03:52PM (#18367613)
    Google is gathering a huge trove of information about us and this shows it is not anonymous. Search is only part of what they have. The more Google services you use, the more you let them build a very detailed profile of you. And the more you do that, the less privacy you have.

    They know what you search for, who you IM and email and about what, where you have appointments and what you bought. You essentially have no privacy.

    If you value your privacy, do not use any single provider; spread your searches, IM, email and purchases across multiple service providers. The government can use its powers to get your data and correlate it, but no commercial entity should have the equivalent power. Commercial interests of Google or any other provider run counter to protecting your privacy.
  • Re:right.... (Score:5, Insightful)

    by ag0ny (59629) <{moc.yn0ga} {ta} {yn0ga}> on Thursday March 15 2007, @06:46AM (#18360117)
    (http://www.baruchito.com/)
    Why would Google have to comply with EU regulations? :?

    Maybe because they do business in Europe?
    [ Parent ]
    • Re:right.... by mikkelm (Score:1) Thursday March 15 2007, @09:13AM
    • 1 reply beneath your current threshold.
  • Re:right.... (Score:5, Informative)

    by skrolle2 (844387) on Thursday March 15 2007, @07:28AM (#18360369)
    http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006L0024:EN:NOT [europa.eu]

    The data retention directive only applies to ISPs, and only deals with who you "communicate" with. It does not explicitly say that a record of which websites you visit should be retained, and it explicitly says that the content of the communication must not be retained.

    However, as for all EU directives, it only contains the baseline of regulation. Directives are never law themselves, but have to be implemented in each respective member state by each respective legislative body. These, in turn, are free to implement whatever they want ABOVE the baseline, so some member states may have longer retention periods for this data, some member states may require ISPs to retain additional data.

    The deadline for this directive is September this year, but if you read it, a few member states have reserved the option to postpone parts of the directive, typically the internet-related traffic. This basically means that they recognize the difficulties in implementing it, and want more time to think about how to do it, or possibly obstruct it.

    What all of this boils down to is that maybe, sometime in the future, if you have a European ISP, they may be required to store all the URLs that you access. Google search data is transmitted as querystring parameters that are part of the URL, which means that your search data may be stored by your ISP, in a non-anonymized way. There's nothing in this possible future that Google has to comply with, as long as they are not a European ISP.
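To make the querystring point concrete, here is a small sketch of how anyone holding a logged URL could read the search terms back out of it. The sample URL, the helper name `search_terms`, and the assumption that the terms travel in a parameter named `q` are illustrative, not taken from the directive or from any ISP's actual logging.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms(url: str, param: str = "q") -> list:
    """Return the decoded values of one querystring parameter from a URL."""
    query = urlsplit(url).query
    return parse_qs(query).get(param, [])


# A hypothetical URL as it might appear in a proxy or ISP access log.
logged_url = "http://www.google.com/search?q=expense+spreadsheet&hl=en"
print(search_terms(logged_url))
```

No cooperation from the search engine is needed for this; the URL alone carries the query.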
    [ Parent ]
    • Re:right.... by skrolle2 (Score:2) Thursday March 15 2007, @09:38AM
    • 1 reply beneath your current threshold.
  • Re:I for one... (Score:2)

    by Dunbal (464142) on Thursday March 15 2007, @07:53AM (#18360549)
    "Goldfish porn" and "Kinky sofa covers"

          Funny you mention that, I was searching just the other day for "sofa porn" and "kinky Goldfish covers"...
    [ Parent ]
  • 7 replies beneath your current threshold.