Facebook Kills Dataset of Crawled Public Profiles
holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, this means anyone who wants it has made their own version of it.
Very interesting (Score:3, Informative)
I'll let others debate the 'privacy' issues (personally, I think there's nothing wrong with scraping profile information that people have explicitly made 'public').
Anyways, just check what he did with it; very interesting: (FTA)
http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html [typepad.com]
There must be many, many legit uses this data could be put to... shame it's being killed by NIH syndrome.
Re:For an Interesting Exercise in Head Asplosion (Score:4, Informative)
Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.
Re:Robots.txt is insufficient. (Score:3, Informative)
So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.
Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.
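To make the "just a suggestion" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the example.com URLs are all made up for illustration and are not Facebook's actual rules. The check runs entirely inside the crawler, so a client that skips it is never stopped by the file itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler asks before fetching; nothing forces it to.
    for url in ("https://example.com/public/page",
                "https://example.com/private/page"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved crawler calls something like `is_allowed` before every request and honours the answer. An impolite one simply never makes the call, which is why actually keeping content away from crawlers takes a login wall or server-side blocking.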
Re:For an Interesting Exercise in Head Asplosion (Score:3, Informative)
Finding something on the web does not give you the legal authority to publish and redistribute it.
Nonsense.
Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.
Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is
If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.
Re:Yes, by all means, let's stamp out... (Score:3, Informative)
Re:For an Interesting Exercise in Head Asplosion (Score:2, Informative)
Unless something has changed, you have to "log in" to see anything on Facebook. Even if a page is "public" you can't view it without logging in with your own account.
A crawler may or may not bypass that...
Re:On what grounds? (Score:3, Informative)
If your position in entering the above motion was that "I'm right, so I should win" and you offered nothing else, such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.
The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.
You are mostly correct that a lawyer would know these things and how they are done in court. Therefore, yes, almost always a lawyer is required, if for no other reason than to get through the proper procedural format of the court process. You want to do it yourself? You'd better spend some time learning how it is done, what is required to win and how to get there. Without that education, it is like taking someone who doesn't know computer programming and having them debug a program in assembly language.
Don't have the time to learn all this stuff? Well, that is why we have lawyers.
Re:For an Interesting Exercise in Head Asplosion (Score:3, Informative)
They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is
Here. [74.125.95.132]