
Research To "Reveal the Unseen World of Cookies"

An anonymous reader writes "The Guardian newspaper has teamed up with Mozilla to research the monitoring of online behavior through cookies and other web trackers. After installing the Collusion add-on for Firefox, you can generate a visual representation of all the cookies you have picked up and how they link to the sites you have visited. This shows quite an interesting picture. The Guardian staff then want the data from Collusion to be uploaded to their site, after which they say 'we can build up a picture of this unseen world. When we've found the biggest players, we'll start tracking them back — finding out what data are they monitoring, and why.'"
  • Great Idea (Score:3, Interesting)

    by thesaintar ( 865954 ) on Monday April 16, 2012 @10:25AM (#39700435)
    I hope that, implemented the right way (with publicly accessible statistical and analysis methods), it will shed some light on how we're being tracked. Is there an equivalent of Collusion for Chrome?
    • by WrongSizeGlass ( 838941 ) on Monday April 16, 2012 @10:27AM (#39700445)

      Is there an equivalent of Collusion for Chrome?

      I believe it's called Google Ads ;-)

    • When we've found the biggest players, we'll start tracking them back — finding out what data are they monitoring, and why.

      I can answer this entire thing in 2 seconds. Porn, so they can sell it to you. In that order.

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        Who goes on the internet to BUY porn?!

        • Who goes on the internet to BUY porn?!

          Well a fair amount of people obviously do or you wouldn't get so much advertising, would you? Advertisers don't do it for the fun of it.

      • When we've found the biggest players, we'll start tracking them back — finding out what data are they monitoring, and why.

        And then we'll sell the info back to them!

    • Is there an equivalent of Collusion for Chrome?

      Yes

    • Seems to be broken or for some mysterious reason incompatible with my AdBlock Plus, NoScript and Ghostery addons. Hm...
  • by GameboyRMH ( 1153867 ) <gameboyrmh.gmail@com> on Monday April 16, 2012 @10:30AM (#39700479)

    On Firefox, disable HTML5/DOM storage, install CookieMonster 1.5 and BetterPrivacy.
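    A minimal sketch of the DOM-storage part of this advice as a `user.js` file in the Firefox profile directory. It assumes a Firefox version that honors these two about:config preferences. The extensions still have to be installed separately. The third-party cookie line is an optional extra in the spirit of the Chrome setting in the reply, not part of the original suggestion.

```javascript
// user.js: read by Firefox at startup from the profile directory.
// Disable HTML5/DOM storage (localStorage/sessionStorage); some sites may break.
user_pref("dom.storage.enabled", false);
// Optional: 1 = accept first-party cookies only, i.e. block third-party cookies.
user_pref("network.cookie.cookieBehavior", 1);
```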

    • by zarlino ( 985890 )

      On Google Chrome, the first thing to do is disallow third-party cookies:
      Settings -> Under the Hood -> Content Settings -> Block third-party cookies and site data

  • Pot kettle spy. (Score:5, Insightful)

    by FatLittleMonkey ( 1341387 ) on Monday April 16, 2012 @10:36AM (#39700529)

    we'll start tracking them back — finding out what data are they monitoring, and why.

    Well, here's my contribution:

    The Guardian page in the link has six trackers:
    24/7 Real Media
    Audience Science
    ForeSee
    Maxymiser
    Optimizely
    Quantcast

    I don't know what any of them do, and I blocked them all. Fuck 'em.

    • I actually see 9:

      24/7 Real Media
      Audience Science
      ForeSee
      Google Adsense
      Maxymiser
      Omniture
      Optimizely
      Quantcast
      Twitter Button
      • I was gonna ask if the Guardian was included in its own stats?
      • by FatLittleMonkey ( 1341387 ) on Monday April 16, 2012 @11:22AM (#39700979)

        Story of my life. I brag about having 6, and the other guy has 9.

        • by Ihmhi ( 1206036 )

          Go out and buy a couple of cases of boosters and then you won't have to deal with the guy who brags about how many decks he has in his backpack.

        • Guy below has 12, so Mr "I've got 9" ain't all that either. Hope that cheers you up!
      • by bfree ( 113420 )

        You missed some more!

        googleapis
        simplifydigital
        guim
        llnwd
        ophan
        ytimg
        youtube
        quantserve
        wunderloop
        revsci
        cogmatch
        imrworldwide

        I'll leave it as an exercise for the reader to de-dupe the above list (e.g. quantserve vs. quantcast and ytimg vs. youtube) and decide for themselves which ones are innocuous.

        I didn't even bother to let any of them run any javascript to discover what else they might try to sneak in. I'm also willing to bet I missed something.

        You have to love the "obfuscation" and attempts to get p

        • by pjt33 ( 739471 )

          That's not an attempt to get past blocking. It's a necessity to get the HTML parser.

          • by pjt33 ( 739471 )

            To get past the HTML parser.

          • Nope.

            document.write("<script type='text/javascript'>");

            works just fine. You're thinking of the closing tag:

            document.write("</scr"+"ipt>");

            is a necessity to get past the HTML parser.
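            The underlying rule is that an HTML parser ends an inline script at the first `</script` it meets, even inside a JavaScript string. An opening `<script` tag inside a string does no harm. The Node-runnable sketch below checks four spellings against that rule. `endsScriptEarly` is a hypothetical helper, not a browser API.

```javascript
// True if raw inline-script source contains "</script" in any letter case,
// which an HTML parser would take as the end of the enclosing <script> element.
function endsScriptEarly(scriptSource) {
  return /<\/script/i.test(scriptSource);
}

// Source text of four inline scripts, as an HTML parser would see it.
const opening = "document.write(\"<script type='text/javascript'>\");";
const naive = 'document.write("</script>");';
const split = 'document.write("</scr" + "ipt>");';
const escaped = 'document.write("<\\/script>");';

for (const [label, source] of Object.entries({ opening, naive, split, escaped })) {
  console.log(label, endsScriptEarly(source));
}
```

            Only `naive` reports `true`. Writing `<\/script>` is a common alternative to concatenation, because `\/` evaluates to a plain `/` inside a JavaScript string.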

      • You will all see different cookies because they are coming from various machines on the net. Upstream intermediates are inserting them on the fly.
    • I fully agree with blocking them. I use Ghostery; check out something like Time.com (Technologizer specifically), where there are about twenty separate trackers. Unfortunately, disabling some of these will actually break functionality. In this case, there is an Apple II anniversary slide show on Technologizer's site, and I could not advance through the slides until I enabled the tracking.
    • Re:Pot kettle spy. (Score:5, Interesting)

      by Anonymous Coward on Monday April 16, 2012 @11:53AM (#39701261)

      Hi,

      I'm the Guardian journalist working on this.

      Unsurprisingly, if you install Collusion after reading an article on The Guardian, you tend to log cookies that our website sets. So we're noticing quite a few of the trackers we use on guardian.co.uk turn up in the project. :)

      We're ok with that - better to be open that our website uses cookies for registration, analytics and advertising (just like most others!), than pretend or hide away the fact. Actually, we did another article on the same day showing how we use them: http://www.guardian.co.uk/technology/2012/apr/13/new-law-cookies-affect-internet-browsing.

      The ones in that list above are a mix of third-party advertising cookies, analytics and A/B testing (so I'm learning!).

      When it comes to the data we're going to try and get from the Collusion info - we can't really infer much about what behaviours have been tracked from the exported data. However, it gives us a nice long JSON string that associates certain cookies as being set when visiting certain sites. At the moment we're using that to find out how many instances of each type of tracker we're seeing across multiple sites.
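      The counting step described here can be sketched in a few lines. The sketch assumes a hypothetical, simplified export that maps each visited site to the tracker hosts seen there. Collusion's real JSON layout may well differ. `rankTrackers` and the `.example` hosts are illustrative only.

```javascript
// Assumed (hypothetical) export shape: { "visited site": ["tracker host", ...], ... }
function rankTrackers(exportJson, topN) {
  const bySite = JSON.parse(exportJson);
  const counts = new Map(); // tracker host -> number of visited sites it appeared on
  for (const trackers of Object.values(bySite)) {
    // Count each tracker at most once per visited site.
    for (const tracker of new Set(trackers)) {
      counts.set(tracker, (counts.get(tracker) || 0) + 1);
    }
  }
  // Most widespread first; ties broken alphabetically for a stable order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
}

const sample = JSON.stringify({
  "news.example": ["ads.example", "stats.example"],
  "shop.example": ["ads.example"],
  "blog.example": ["ads.example", "stats.example", "ads.example"],
});
console.log(rankTrackers(sample, 2)); // ads.example on 3 sites, stats.example on 2
```

      Counting each tracker once per site, rather than once per cookie, keeps one chatty page from inflating a tracker's apparent reach.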

      We're then going to take the most prolific ones and find out more about what they do, who owns them, how they work, etc. However, we're going to be using old-fashioned journalism to do that - research and phone calls.

      However, I was thinking of putting up open documents like this: https://docs.google.com/document/d/1lCp8H9i-MJwyORj_MOZflH6BCt9j6HIbQkyS2536knM/edit
      so you could see where I'd got to and put me right if I was going off track (as it were). Good idea? Bad idea?

      Joanna.

      • by Anonymous Coward

        Urgh. Sorry. Forgot to log in. o_O

      • You know already who the "Big Players" are - Google, Facebook, Microsoft, your choice of a couple more related ones.

        Then it descends into all these little companies. I would expect that some of them are subsidiaries of the big guys etc.

        The ideal goal of each of these "thingies" (cookies, Flash objects, etc.) is to narrow each visitor down to a unique user if possible.

        So just copy the Ghostery block list, maybe the AdBlock block list, your choice of a couple more tools.

        If you want a "market share per ad

        • So just copy the Ghostery block list, maybe the AdBlock block list, your choice of a couple more tools.

          The Guardian does seem to be re-inventing the wheel a bit. Ghostery (Evidon/Better-Advertising/Direct-Advertising-Assoc) already has not just a public list of tracking companies, but a page of info about each one.

          Collusion, by contrast, seems more about displaying the connections ("collisions") between known trackers that you personally encounter, not collecting new info for a data dump.

          I like the Guardian, and I appreciate the journo sticking her head in the lion's den, but it seems to me she and they would achieve mo

      • by bfree ( 113420 )

        However, I was thinking of putting up open documents like this: docs.google.com/blah so you could see where I'd got to and put me right if I was going off track (as it were). Good idea? Bad idea?

        Putting this stuff on Google is like asking the NSA to host WikiLeaks... bad idea.

        • What would be the best place to put it bfree? I'm very happy to take suggestions for alternative ways of opening up my note-taking.
          • by bfree ( 113420 )

            You seem to have access to a website you could already publish it on, no?

            Failing that, for whatever reason, you could put it in a wiki on branchable [branchable.com]? No, I'm not affiliated with them in any way, but they were the first "good" answer that jumped to mind.

            More obscure but perhaps extra appropriate for the topic at hand, you could publish it on a "hidden service" on tor?

        • Eh? If Ms Geary puts it anywhere public online, google can see it anyway. (As can the actual NSA.) So unless you're saying that Google will censor her work, your comment makes no sense.

    • How did you block them?

      I was thinking about null-routing them to 127.0.0.1 in the /etc/hosts file, but perhaps there is a better way?
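      For the hosts-file route, a small script can generate the entries from whatever list of tracker hostnames you collect. The sketch below uses placeholder hostnames. Note that /etc/hosts matches exact names only, with no wildcards, so every subdomain a tracker uses must be listed. That is one reason the browser extensions suggested in the replies tend to be more practical.

```javascript
// Turn tracker hostnames into /etc/hosts lines that point them at a local sink address.
// Hosts files match exact names only (no wildcards), so list each subdomain explicitly.
function hostsEntries(hostnames, sinkAddress = "127.0.0.1") {
  const unique = [...new Set(hostnames.map((h) => h.trim().toLowerCase()))]
    .filter((h) => h.length > 0)
    .sort();
  return unique.map((h) => `${sinkAddress}\t${h}`);
}

// Placeholder names; substitute the hosts your blocker actually reports.
const lines = hostsEntries(["ads.tracker.example", "Pixel.Tracker.example", "ads.tracker.example"]);
console.log(lines.join("\n"));
```

      Appending the output to /etc/hosts needs root privileges. Some blocklists use `0.0.0.0` instead of `127.0.0.1` as the sink address.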

      • Firefox + Ghostery

        +ABP +NoScript +WOT +no-third-party-cookies...

        I didn't think I was especially paranoid (I have a google account, don't use on-disk or in-mail encryption, etc) until I realised that this isn't how most people think.

  • Cookieculler (Score:5, Informative)

    by MLCT ( 1148749 ) on Monday April 16, 2012 @10:37AM (#39700549)
    Bit of a shoutout for the Firefox extension cookieculler.

    I have never found anything that matches cookieculler for features: it doesn't just delete cookies, it operates with a whitelist-based system (the way everything on the web should work). Cookieculler deletes all cookies each time you close the browser, except the ones you have "protected" on the whitelist, which keep login information etc. as you choose.

    Along with noscript, cookieculler is the main reason I stay on firefox.
    • I've found "Ghostery" to be pretty damn good. It blocks them rather than allowing and then deleting them.
    • by emilv ( 847905 )

      How is cookieculler different from setting a default policy in Firefox and then using the built-in whitelist in Firefox to give permissions for certain sites?

      • Re:Cookieculler (Score:5, Informative)

        by MLCT ( 1148749 ) on Monday April 16, 2012 @11:01AM (#39700753)
        Granted, Firefox can offer something close, but not quite. Cookieculler offers finer control, because you can whitelist the *cookies* rather than the domain. So I can (and do) choose to protect my /. cookie, but not anything else that /. places in my browser (hypothetical example, as /. doesn't place any other cookies).
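        A toy cookie jar shows the difference between per-cookie and per-domain whitelisting. This is a sketch, not Cookieculler's actual code. The record shape, the `domain|name` key format and the cookie names are all hypothetical.

```javascript
// Hypothetical cookie records: { domain, name }.
// Per-cookie whitelist: a cookie survives only if its exact "domain|name" key is protected.
function cullPerCookie(jar, protectedKeys) {
  return jar.filter((c) => protectedKeys.has(`${c.domain}|${c.name}`));
}

// Per-domain whitelist: every cookie from an allowed domain survives.
function cullPerDomain(jar, allowedDomains) {
  return jar.filter((c) => allowedDomains.has(c.domain));
}

const jar = [
  { domain: "slashdot.org", name: "user" },     // login cookie (name is hypothetical)
  { domain: "slashdot.org", name: "extra_id" }, // hypothetical second cookie from the same site
  { domain: "ads.example", name: "uid" },       // third-party cookie
];

console.log(cullPerCookie(jar, new Set(["slashdot.org|user"])).length); // 1
console.log(cullPerDomain(jar, new Set(["slashdot.org"])).length);      // 2
```

        With a per-domain rule, anything else a whitelisted site sets survives along with the login cookie. A per-cookie rule keeps only what you named.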
  • Protect yourself from tracking websites by this addon that collects all your cookies and sends it to us!

    • by arth1 ( 260657 )

      That was my reaction too.
      Combined with the technology being used, not installing it was a given.

    • It's really about understanding a bit more so that you can then take action to protect yourself if you want to. But yeah... I get the irony. The reason I still thought it was worth going ahead with the project was twofold: the aims of the Collusion team to educate and inform, AND the fact that all the information sent to us is anonymous. I would love to say we could identify people by the sites they visit, but in aggregate it seems like everyone likes internet shopping and porn. :)
  • by Anonymous Coward

    Anyone else read the title and think people were taking a deeper look at why those delicious baked goods are so tantalizing?

  • I read the title and got all excited... and then read the summary to find they're not talking about the Girl Scouts, Nabisco, or other things that might involve sugar and chocolate chips.

    And now that I got my hopes up, I'm going to go see what's in the vending machine. There's usually animal crackers, at the very least.

  • Internet marketing (Score:5, Interesting)

    by Roberticus ( 1237374 ) on Monday April 16, 2012 @10:44AM (#39700599)

    If average folks become aware of how many cookies get set (along with getting a user-friendly way* of turning them off), that could have a huge and entertaining effect on the world of Internet marketing**.

    For example, right now, I can assume enough website visitors have JavaScript enabled to make it almost 100% (and not worth writing HTML for the case where they don't). But if I can only reasonably assume, say, 50% of my visitors/email through-clickers/etc. have cookies active, that plays havoc with my reporting.

    * "User-friendly" defined as "something my dad can do without asking me for help".
    ** I spend all day every workday in this world.
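    A deliberately simple model shows why cookie-less visitors wreck the reporting. The assumptions are illustrative, not real traffic data. Every real visitor makes the same number of visits, and visitors with cookies are recognised on each return. Visitors without cookies are counted as a new unique every time.

```javascript
// Simplified model (assumption): each real visitor makes `visitsEach` visits.
// Visitors with cookies are recognised on return; visitors without cookies
// are counted as a new "unique" on every visit.
function countedUniques(realVisitors, visitsEach, cookieRate) {
  const withCookies = realVisitors * cookieRate;
  const withoutCookies = realVisitors - withCookies;
  return withCookies + withoutCookies * visitsEach;
}

console.log(countedUniques(1000, 4, 1.0)); // 1000: everyone recognised
console.log(countedUniques(1000, 4, 0.5)); // 2500: reported uniques inflated 2.5x
```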

  • Facebook (Score:5, Informative)

    by Lucky75 ( 1265142 ) on Monday April 16, 2012 @10:48AM (#39700637)
    You'd be shocked at how many cookies come from facebook across multiple sites. I use an extension called Ghostery (https://addons.mozilla.org/en-US/firefox/addon/ghostery/) to block most of them.
    • Spoiler: It's practically every site.

    • by Zocalo ( 252965 )
      Yeah, social media sites are particularly obnoxious; you'll often get one cookie for every site that has one of their "Like", "+1" or whatever buttons on a page. Analytics sites are another obvious example where this is going to happen more often than not. "Screw 'em" was my response too, but I went for a deny-all-by-default and whitelist approach rather than trying to manage them on a per-domain basis.

      I've been doing that for a while now as it's much simpler and, once you've gone through the initial setup
    • You'd be shocked at how many cookies come from facebook across multiple sites. I use an extension called Ghostery (https://addons.mozilla.org/en-US/firefox/addon/ghostery/) to block most of them.

      I use Ghostery plus RequestPolicy [requestpolicy.com], which gives you control over every single external request that a web page makes. It is like NoScript for cross-site references of any kind.

        • +1 for RequestPolicy, although I have to say when I restart my browser and immediately find myself staring at a FUBAR page, I usually just hit "temporarily allow all requests" and get on with life, tracked as I may be. I do log out of Facebook each time and delete facebook.com cookies, but I suspect that Facebook still tracks me on other domains they control. I am like a tiny tiny person shaking a tiny tiny fist at the giant.

        • I use AdBlock Plus to nix the Facebook tracking. At the cost of seeing "Like" buttons everywhere I go (yes, that's a joke), these filters (or something similar) will do the trick:

          ||facebook.com^$third-party,domain=~facebook.net|~fbcdn.com|~fbcdn.net
          ||facebook.net^$third-party,domain=~facebook.com|~fbcdn.com|~fbcdn.net
          ||fbcdn.com^$third-party,domain=~facebook.com|~facebook.net|~fbcdn.net
          ||fbcdn.net^$third-party,domain=~facebook.com|~facebook.net|~fbcdn.com

          You will occasionally see a button when the image is hosted
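          Each filter roughly says: block requests to this domain or any of its subdomains (`||...^`), but only when they are third-party (`$third-party`) and not on the listed Facebook-owned domains (`domain=~...`). The sketch below models only that logic, and simplifies it. It treats a request as third-party when the page host is not under the rule's domain. Adblock Plus compares page and request domains more carefully and also gives `^` its separator meaning.

```javascript
// True if host equals domain or is a subdomain of it (the "||domain^" anchor, simplified).
function underDomain(host, domain) {
  return host === domain || host.endsWith("." + domain);
}

// Simplified model of "||<domain>^$third-party,domain=~a|~b":
// block requests to <domain> made from pages that are not under <domain>
// and not under any of the excepted domains.
function shouldBlock(requestHost, pageHost, rule) {
  if (!underDomain(requestHost, rule.domain)) return false;
  const thirdParty = !underDomain(pageHost, rule.domain);
  if (!thirdParty) return false;
  return !rule.exceptOn.some((d) => underDomain(pageHost, d));
}

// Mirrors the first filter above.
const facebookRule = { domain: "facebook.com", exceptOn: ["facebook.net", "fbcdn.com", "fbcdn.net"] };

console.log(shouldBlock("www.facebook.com", "www.guardian.co.uk", facebookRule)); // true
console.log(shouldBlock("www.facebook.com", "www.facebook.com", facebookRule));   // false (first-party)
console.log(shouldBlock("www.facebook.com", "static.fbcdn.net", facebookRule));   // false (excepted)
```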

  • Yo Dawg (Score:3, Funny)

    by Z80xxc! ( 1111479 ) on Monday April 16, 2012 @10:58AM (#39700725)
    Yo dawg... I heard u dislike being tracked, so we put a tracker in your trackers so you could be tracked while we track.
  • by dmomo ( 256005 ) on Monday April 16, 2012 @11:11AM (#39700839)

    It will be interesting to see not only the results of this analysis, but also how they arrive at whatever conclusions they draw.

    Many cookies are used only to store a unique identifier. The data about a user that many websites actually store is housed and maintained on their servers, keyed by that unique ID. This could include "pages visited", "duration of visit", "browser/system specs/settings" along with any derived demographic data.

    It would be hard (though not necessarily impossible) to determine this from a cookie analysis.
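    In the pattern described here, the cookie carries only an opaque ID and the profile lives on the server. A rough sketch follows. The `visitor_id` cookie name, the in-memory `profiles` map and `handlePageView` are hypothetical. A real site would use a database and proper cookie attributes.

```javascript
const crypto = require("crypto");

// Hypothetical server-side store: visitor id -> what the site records about that visitor.
const profiles = new Map();

// Pull the hypothetical "visitor_id" cookie out of a Cookie request header.
function readVisitorId(cookieHeader) {
  if (!cookieHeader) return null;
  for (const part of cookieHeader.split(";")) {
    const [key, value] = part.trim().split("=");
    if (key === "visitor_id") return value;
  }
  return null;
}

// Handle one page view: reuse the visitor's id or mint a random one, record the
// visit on the server, and return the only thing the browser stores.
function handlePageView(cookieHeader, page, userAgent) {
  let id = readVisitorId(cookieHeader);
  if (!id || !profiles.has(id)) {
    id = crypto.randomBytes(16).toString("hex");
    profiles.set(id, { pagesVisited: [], userAgents: new Set() });
  }
  const profile = profiles.get(id);
  profile.pagesVisited.push(page);
  profile.userAgents.add(userAgent);
  return `visitor_id=${id}`; // Set-Cookie value: an opaque id, nothing more
}

const cookie = handlePageView(null, "/news", "ExampleBrowser/1.0");
handlePageView(cookie, "/sport", "ExampleBrowser/1.0");
console.log(cookie.length);                                    // 43 characters, just the id
console.log(profiles.get(readVisitorId(cookie)).pagesVisited); // [ '/news', '/sport' ]
```

    Inspecting the browser's cookies would only reveal the random hex string. That is why a cookie-level analysis says little about what is stored behind it.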

  • by dryriver ( 1010635 ) on Monday April 16, 2012 @11:12AM (#39700855)
    I found out, using its automated "graph-builder", that the 3-4 supposedly "safe" sites I visit most often actually pass my user data on to Google, Facebook, DoubleClick, Mediaplex, AdRoll and other services. It's quite educational to watch the graph go from a blank page to a fairly complex network of interconnections as you continue to browse. It's going to be interesting to see what results from this when the Guardian gets all the aggregate data from Collusion. It does indeed seem that there is such a thing as a "secret world of cookies" out on the internet, and I personally support uncovering this "secret world" fully, so we get to see what entities are clandestinely mining our supposedly "private" user information as we surf. --- The whole thing also reminds me of the book "Brandwashed", where the author explains at length how commercial establishments collect all sorts of data on us and exploit it to sell us more products.
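    The graph Collusion draws can be modeled as a map from each visited site to the other hosts it made the browser contact. Hosts reachable from several sites are the ones in a position to link those visits. This sketch uses hypothetical `.example` hosts and compares hostnames literally. Real tools usually group by registrable domain.

```javascript
// For each visited site, the set of other hosts it caused the browser to contact.
function buildGraph(requests) {
  const graph = new Map(); // visited site -> Set of third-party hosts
  for (const { pageHost, requestHost } of requests) {
    if (requestHost === pageHost) continue; // skip first-party requests
    if (!graph.has(pageHost)) graph.set(pageHost, new Set());
    graph.get(pageHost).add(requestHost);
  }
  return graph;
}

// Invert the graph and keep hosts contacted from two or more visited sites.
function linkingHosts(graph) {
  const sitesFor = new Map(); // third-party host -> visited sites it was contacted from
  for (const [site, hosts] of graph) {
    for (const host of hosts) {
      if (!sitesFor.has(host)) sitesFor.set(host, []);
      sitesFor.get(host).push(site);
    }
  }
  return [...sitesFor].filter(([, sites]) => sites.length > 1);
}

const observed = [
  { pageHost: "news.example", requestHost: "news.example" },
  { pageHost: "news.example", requestHost: "ads.example" },
  { pageHost: "recipes.example", requestHost: "ads.example" },
  { pageHost: "recipes.example", requestHost: "cdn.example" },
];
console.log(linkingHosts(buildGraph(observed))); // ads.example links news.example and recipes.example
```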
  • No research needed, the truth [wikipedia.org] about the unseen world of cookies has been known since 1968. They're made in a hollow tree by elves [keebler.com].

  • I would like to have a FF plug-in that messes up cookie data to make it useless to the trackers. A little bit of revenge...
  • by Sperbels ( 1008585 ) on Monday April 16, 2012 @11:58AM (#39701339)

    finding out what data are they monitoring, and why

    Well, all the porn websites seem to know that I prefer brunettes over blonds.

    • But do they also know that you buy your underwear from Marks & Spencer? That's the interesting sort of thing I'm hoping we'll find out: what companies are tracking over such varied sites and what information (if any) they then sell back to their clients.
  • by isaac ( 2852 ) on Monday April 16, 2012 @12:53PM (#39701903)

    Cookies are not the only evidence of tracking, and neither are Flash LSOs, HTML5 local storage, etc.

    There's a surprising amount of identifying information in request headers and in what's available to JavaScript (see http://panopticlick.eff.org/ [eff.org] for a demonstration). That means one often needn't accept or store a cookie to be tracked.
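    Panopticlick expresses its result in bits of identifying information. The arithmetic behind that is the surprisal of an attribute value: if a fraction f of browsers share your value, it reveals -log2(f) bits. The sketch below uses hypothetical fractions. Adding bits across attributes assumes they are independent. Real attributes such as fonts, plugins and user agent are not, so the sum is only a rough indication.

```javascript
// Identifying information carried by an attribute value that a fraction
// `fractionSharing` of browsers also have: -log2(fraction) bits.
function surprisalBits(fractionSharing) {
  return -Math.log2(fractionSharing);
}

// Sum of per-attribute bits; only meaningful if the attributes are independent.
function combinedBits(fractions) {
  return fractions.reduce((total, f) => total + surprisalBits(f), 0);
}

// Hypothetical: 1 in 1024 share your font list, 1 in 256 your plugins, 1 in 64 your screen size.
console.log(surprisalBits(1 / 1024).toFixed(1));                  // 10.0
console.log(combinedBits([1 / 1024, 1 / 256, 1 / 64]).toFixed(1)); // 24.0
```

    Since 2^24 is about 16.8 million, a 24-bit combination would be expected to match about one browser in a population of that size.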

    A really comprehensive pro-privacy browser extension would munge request headers and the enumeration of fonts, plugins, screen resolutions, etc. to match one of, say, the top 5 most common desktop browser fingerprints, and would change that profile every so often (changing it on every request would itself be a trivially detectable signature).

    -Isaac
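    The "change every so often" idea can be modeled as picking one profile per time window. The presented fingerprint then stays stable within a window and changes only when the window rolls over. The profile names and the six-hour window are placeholder assumptions. A real tool would need measured fingerprint data and would have to keep every reported attribute consistent with the chosen profile.

```javascript
// Hypothetical stand-ins for "the most common desktop browser fingerprints";
// each would really bundle header values, font list, plugins, screen size, etc.
const COMMON_PROFILES = ["profile-A", "profile-B", "profile-C", "profile-D", "profile-E"];

const ROTATION_MS = 6 * 60 * 60 * 1000; // assumption: switch every six hours

// Stay on one profile for the whole window and switch only when it rolls over,
// because switching on every request would itself stand out.
function profileAt(nowMs, profiles = COMMON_PROFILES, windowMs = ROTATION_MS) {
  const windowIndex = Math.floor(nowMs / windowMs);
  return profiles[windowIndex % profiles.length];
}

console.log(profileAt(Date.now()));
```

    Stepping through the list in order is itself a pattern. Choosing the next profile at random at each rotation would be an obvious refinement.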

  • Gooble gooble gooble.
    I love Cookie Monster. He taught me the best places to hide my cookies as a kid. ..Huh, what's that? Wrong type of cookies? Oh....

    Ghostery started tipping me off to how much stuff I was missing. I'm in the process of whitelisting sites, which is a pain with all the underlying stuff lying around.

  • That's the one people should be the most concerned with. When I first started using NoScript, I was stunned at how many supposedly reputable sites were using javascript pulled from ten or twenty different unrelated sites. There's just NO good excuse for that at all.
    • That's the one people should be the most concerned with. When I first started using NoScript, I was stunned at how many supposedly reputable sites were using javascript pulled from ten or twenty different unrelated sites. There's just NO good excuse for that at all.

      Agreed - quite amazing. And how insidious Facebook is...

  • ScoreCard Research Beacon. Without my consent.
  • It's not compatible with 3.6, which I prefer over the UI of later versions.

    Wonder how many data points that will lose them.