Delete Cookies, Inflate Net Traffic Estimates
eldavojohn writes "In my browser, I regularly go to the tools menu and clear my private data. This includes my cookies. As a result, people like me who destroy cookies by the thousands may be inflating estimates of Web traffic by up to 150 percent. People have good reasons for clearing out cookies — we've heard about bad cookies before (and I think the FCC is still investigating the issue). But every time you delete cookies, many of the sites you've visited count you as a new visitor next time."
On the other hand... (Score:2, Informative)
150%? (Score:5, Informative)
I don't do it because it is a pain to constantly log back in everywhere. But I seriously doubt more than 2% of the non-slashdot crowd does it.
FTC, not FCC (Score:4, Informative)
Small businesses (Score:1, Informative)
And believe me, you're not making a rich man richer, you're making a middle-class man better able to support his family.
CookieSafe is my current favourite (Score:4, Informative)
CookieSafe allows me to keep my permanent cookies to a minimum, yet allows me all the functionality of session cookies. Of course, it does inflate the stats as the article mentions. In my previous job I worked with stats quite a bit (using WebSideStory/Hitbox), and it is such an inexact science that it ranks right up there with Lies and Damn Lies.
https://addons.mozilla.org/en-US/firefox/addon/2497 [mozilla.org]
Anyone have other suggested software they prefer?
Re:FTC, not FCC (Score:4, Informative)
http://yro.slashdot.org/article.pl?sid=06/11/15/1
Re:So what? (Score:4, Informative)
This is why there is research out there to use methods other than cookies and IP addresses to identify users -- see this article [slashdot.org] from last September.
I'm sure this concept can get some VC if companies begin distrusting current traffic analyses -- it would be a useful adjunct to traditional traffic monitoring.
Re:Visitors vs. Unique Visitors..anyone? (Score:3, Informative)
Unfortunately IP address doesn't work. NAT can put anywhere from a couple (small home network) to thousands (corporate networks) of individual machines behind a single IP address. The common ISP practice of using dynamic addresses can result in a single machine having anywhere from one address for years at a time to a different address every hour. Most web-statistics companies have abandoned IP addresses as a valid identifier.
Most of them do in fact rely on cookies of one sort or another. Most rely on browser cookies; a few are using Flash or media-player cookies. All of them suffer from the fact that cookie deletion or filtering in the browser corrupts the statistics. Blocking cookies completely is the easiest form to deal with: the server-side code can check whether cookies were in fact set and simply discard data from browsers that don't accept cookies. Cookie deletion, or forcing cookies to have session lifetimes, is harder to deal with, since to the server it looks like the cookies are good but in reality they can't provide information about visitors, only sessions. The worst are one-shot cookies, where the browser will let a new cookie be set but then won't permit it to be modified or removed. The big problem with them is that any test will overlap to some degree with normal cookie behavior, so you end up having to balance how much corruption you're getting relative to how much good data you're throwing out by mistake.
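To make the difference between blocking and deleting concrete, here is a minimal sketch in Python. It assumes each request is reduced to the visitor-ID cookie the browser echoed back (or None if it sent none). The function name and the sample IDs are made up for illustration and are not taken from any real statistics product.

```python
from typing import Iterable, Optional


def count_unique_visitors(echoed_ids: Iterable[Optional[str]]) -> int:
    """Count distinct visitor IDs among requests that echoed a cookie back.

    Requests with no cookie (None) are discarded: they are either a first hit,
    before any cookie could be set, or a browser that blocks cookies outright.
    """
    return len({cid for cid in echoed_ids if cid is not None})


# One person, cookies cleared partway through: the server hands out a second ID.
one_person = [None, "id-1", "id-1", None, "id-2"]
# A browser that blocks cookies never echoes anything back.
blocker = [None, None, None]

print(count_unique_visitors(one_person))  # 2: one person counted as two visitors
print(count_unique_visitors(blocker))     # 0: blocked traffic is simply dropped
```

The blocker is easy to throw out because it never returns a cookie. The deleter looks perfectly healthy to the server and simply shows up as a new visitor each time.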
Most web-statistics firms are working to avoid the worst of the problems by moving their machines into the DNS namespace of the sites they're collecting statistics on. That helps get around third-party cookie behavior in browsers, and should work until browsers either start having extensive host-specific block lists or start allowing cookie filtering based on IP address instead of URL hostname.
I always considered the intricacies an interesting puzzle, and wringing every bit of validity possible out of the system a challenge. Management, unfortunately, doesn't want to hear about the intricacies; they just want to hear that there are no problems, everything's fine and the numbers they're giving their customers are perfect. Customers, even more unfortunately, don't want to hear about any problems; they just want to hear that the numbers they're getting are perfect. Sooner or later the cluebat will get applied.
Re:FTC, not FCC (Score:3, Informative)
Privacy is an illusion (Score:3, Informative)
There are a few fingerprinting companies out there that track you by the stuff plugins give away (dates, versions, etc.; anything the plugin will give up). I've even heard of a company using the time offset of your computer as reported by your web browser (which passes the time back in milliseconds since 1970, IIRC), and combined with some other methods it really helps track people down. Not to mention you can combine all this with your IP address and you're pretty good. But deleting cookies doesn't really help you; it's more of a minor inconvenience to the small companies who don't really care to track you that much, and a tiny hurdle to larger companies who do care, who are already doing it, and some of whom even know you before the cookie. (Don't accept cookies? Check for that, and IP address, Flash version, time offset (if it's possible), what plugins are installed via navigator.plugins, and you're pretty close to a positive ID.) Of course there are many other ways and I don't know any of them. So, delete your cookies if you want, but realize it's not much of a help.
Adblock is, and ultimately those who really want to track you probably can.
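As a rough illustration of the fingerprinting idea above, the sketch below hashes a handful of observable attributes into one identifier. The attribute names, sample values, and the choice of SHA-256 are all assumptions made for the example. Nothing here describes how any particular tracking company actually does it.

```python
import hashlib


def fingerprint(attrs: dict) -> str:
    """Hash a browser's observable attributes into a short, stable identifier."""
    parts = []
    for key in sorted(attrs):  # sort keys so dict ordering cannot change the hash
        value = attrs[key]
        if isinstance(value, (list, tuple, set)):
            value = ",".join(sorted(value))  # plugin order should not matter either
        parts.append(f"{key}={value}")
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values; the IP is from a documentation-only range.
before_clearing = {
    "ip": "203.0.113.7",
    "flash_version": "9.0.28",
    "clock_offset_ms": -4312,
    "plugins": ["QuickTime", "Shockwave Flash", "Java"],
}
# Same machine after clearing cookies: nothing the server can observe has changed.
after_clearing = {
    "plugins": ["Java", "QuickTime", "Shockwave Flash"],
    "clock_offset_ms": -4312,
    "flash_version": "9.0.28",
    "ip": "203.0.113.7",
}

print(fingerprint(before_clearing) == fingerprint(after_clearing))  # True
```

An exact hash like this breaks as soon as any one attribute changes (a plugin upgrade, a new dynamic IP), so a real system would presumably need fuzzier matching. That part is left out of this sketch.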
Re:FTC, not FCC (Score:4, Informative)
The worst part is that they didn't fire Cookie Monster until the letter Q and the number 4 pulled their sponsorship. Of course, I think he didn't need to go on Bert and Ernie's talk radio program either because they're hypocrites themselves.
Re:150%? (Score:3, Informative)
1. Whitelist sites whose cookies you want to keep.
2. Blacklist cookies from some sites (doubleclick, anyone?).
3. Set most other cookies to be killed after you exit FF.
I know Firefox lets you do that anyway, but the difference is that Cookiesafe lets you do it easily.
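The three rules above boil down to a small decision function. This is only a sketch of the policy as described, not CookieSafe's actual code. The list contents and the function name are invented for the example.

```python
# Hypothetical lists; fill in your own sites.
WHITELIST = {"slashdot.org"}      # cookies kept across browser restarts
BLACKLIST = {"doubleclick.net"}   # cookies never accepted


def matches(host: str, domains: set) -> bool:
    """True if host is one of the listed domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def cookie_policy(host: str) -> str:
    """Return 'block', 'keep', or 'session' for a cookie set by host."""
    if matches(host, BLACKLIST):
        return "block"
    if matches(host, WHITELIST):
        return "keep"
    return "session"  # everything else is deleted when the browser exits


for host in ("ad.doubleclick.net", "yro.slashdot.org", "example.com"):
    print(host, cookie_policy(host))
```

Checking the blacklist first means a blocked domain stays blocked even if a parent domain ends up on the whitelist by accident.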
Re:Visitors vs. Unique Visitors..anyone? (Score:3, Informative)
A single TCP connection is identified by a quad: the IP and port of each of the two endpoints.
So, you only really need a new source-port for every internal user who visits the same site.
NAT is implemented by maintaining an internal table of what external ips/ports should be mapped to which internal ip/port. An example:
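A minimal sketch of such a table in Python, assuming a port-translating NAT that hands out external source ports per remote endpoint. The class name, method names, and every address are made up (the IPs come from private and documentation-only ranges). Entry timeouts are left out entirely.

```python
from collections import defaultdict


class NatTable:
    """Toy port-translating NAT: one external IP, ports allocated per remote endpoint."""

    def __init__(self, external_ip: str, first_port: int = 1024, last_port: int = 65535):
        self.external_ip = external_ip
        self.last_port = last_port
        self._next_port = defaultdict(lambda: first_port)  # remote -> next free port
        self._outbound = {}  # (internal, remote) -> external port
        self._inbound = {}   # (external port, remote) -> internal

    def translate_outbound(self, internal, remote):
        """Map an internal (ip, port) talking to remote onto (external_ip, port)."""
        key = (internal, remote)
        if key not in self._outbound:
            port = self._next_port[remote]
            if port > self.last_port:
                raise RuntimeError("no free source ports left for this remote endpoint")
            self._next_port[remote] = port + 1
            self._outbound[key] = port
            self._inbound[(port, remote)] = internal
        return (self.external_ip, self._outbound[key])

    def translate_inbound(self, external_port, remote):
        """Find which internal (ip, port) a reply from remote belongs to."""
        return self._inbound[(external_port, remote)]


nat = NatTable("203.0.113.1")
site_a, site_b = ("198.51.100.10", 80), ("198.51.100.20", 80)
alice, bob = ("10.0.0.2", 40000), ("10.0.0.3", 40000)

print(nat.translate_outbound(alice, site_a))  # ('203.0.113.1', 1024)
print(nat.translate_outbound(bob, site_a))    # ('203.0.113.1', 1025)
print(nat.translate_outbound(bob, site_b))    # ('203.0.113.1', 1024)
print(nat.translate_inbound(1024, site_b))    # ('10.0.0.3', 40000)
```

Note that external port 1024 is used twice without ambiguity, because the remote endpoint differs. A new port is only needed when two internal users talk to the same site.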
Practical result ?
You can use a single external IP for a group of websurfers. The size of the group has a limit: you run into trouble the moment more than 65000 of your internal users want to visit the same website simultaneously, with "simultaneously" being defined as within the timeout of the NAT table (typically 1-5 minutes).
At least a million websurfers can easily hide behind a single IP using this technique; 10 million if they're not hugely active, or if they don't visit the same sites all the time. Not that there's any reason to. IPs aren't *that* hard to come by.
You could increase this by another order of magnitude or two by also taking sequence numbers into the NAT tables. Two different users connecting to the same service at the same time are likely to get sequence numbers different enough that the two connections can be recognized based on this. This ain't really a good idea though, because if you did this, you could get unlucky and have two connections accidentally get sequence numbers close to one another.
Besides, you don't really have a *reason* for hiding a billion websurfers behind a single IP, now do you ?
Comment removed (Score:3, Informative)