Facebook Kills Dataset of Crawled Public Profiles

Posted by CmdrTaco on Wednesday March 31, 2010 @11:19AM from the creepy-crawlies dept.

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, anyone else who wanted it has probably made their own version.

This discussion has been archived. No new comments can be posted.

Very interesting (Score:3, Informative)

by Bearhouse ( 1034238 ) writes: on Wednesday March 31, 2010 @11:40AM (#31688552)

I'll let others debate the 'privacy' issues (personally I think there's nothing wrong with scraping profile information that people have explicitly made 'public').
Anyways, just check what he did with it; very interesting (FTA):
http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html [typepad.com]
There must be many, many legit uses this data could be put to... shame it's being killed by NIH syndrome.

Re:For an Interesting Exercise in Head Asplosion (Score:4, Informative)

by Tobor the Eighth Man ( 13061 ) writes: on Wednesday March 31, 2010 @11:52AM (#31688706)

Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.

Re:Robots.txt is insufficient. (Score:3, Informative)

by truthsearch ( 249536 ) writes: on Wednesday March 31, 2010 @11:58AM (#31688772) Homepage Journal

So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.
Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.
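To make the "it's just a suggestion" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name, and the example.com URLs are made up for illustration and are not Facebook's actual file. Note that the check runs inside the crawler: the server is never asked for permission, so a crawler that skips the check gets the page anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler asks itself this question before every request.
# An impolite one simply never calls is_allowed() and fetches the URL regardless.
private_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/page")
public_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public/page")
print(private_ok, public_ok)
```

Nothing here enforces anything; honoring the answer is entirely up to whoever wrote the client.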

Re:For an Interesting Exercise in Head Asplosion (Score:3, Informative)

by Rantastic ( 583764 ) writes: on Wednesday March 31, 2010 @12:25PM (#31689148) Journal

"Finding something on the web does not give you the legal authority to publish and redistribute it."
Nonsense.
Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.
Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is:
"If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it."

Re:Yes, by all means, let's stamp out... (Score:3, Informative)

by thePowerOfGrayskull ( 905905 ) writes: <[moc.liamg] [ta] [esidarap.cram]> on Wednesday March 31, 2010 @12:28PM (#31689192) Homepage Journal

Removing names isn't necessarily enough. The recent Netflix case shows that [securityfocus.com]. I think it's interesting that nobody catches the broader implications of that discussion, namely that whether they're "anonymizing" data for purposes of providing it for research or selling it for marketing, the risk remains that someone can reverse engineer patterns in the data to undo the anonymization.
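A rough sketch of why stripping names may not be enough: if the "anonymized" rows keep a few ordinary attributes, they can be joined against some other public list that still has names. This is a simplified linkage example and not necessarily the method used in the Netflix case; every record, field name, and person below is invented for illustration.

```python
# Toy data, entirely made up for illustration.
anonymized = [
    {"id": "u1", "zip": "80301", "birth_year": 1975, "likes": "jazz"},
    {"id": "u2", "zip": "80301", "birth_year": 1982, "likes": "metal"},
    {"id": "u3", "zip": "10001", "birth_year": 1975, "likes": "folk"},
]
public = [
    {"name": "Alice Example", "zip": "80301", "birth_year": 1975},
    {"name": "Bob Example", "zip": "10001", "birth_year": 1975},
]


def reidentify(anon_rows, public_rows, keys):
    """Map anonymized ids to names when the shared attributes match exactly one person."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # only an unambiguous match counts as re-identified
            matches[row["id"]] = names[0]
    return matches


matches = reidentify(anonymized, public, ("zip", "birth_year"))
print(matches)
```

The more attributes a released record keeps, the more likely it is that a combination of them matches exactly one person somewhere else.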

Re:For an Interesting Exercise in Head Asplosion (Score:2, Informative)

by crashumbc ( 1221174 ) writes: on Wednesday March 31, 2010 @12:48PM (#31689458)

Unless something has changed, you have to log in to see anything on Facebook. Even if a page is "public" you can't view it without logging in with your own account.
A crawler may or may not bypass that...

Re:On what grounds? (Score:3, Informative)

by cdrguru ( 88047 ) writes: on Wednesday March 31, 2010 @01:59PM (#31690466) Homepage

If your position in entering the above motion was that "I'm right, so I should win" and you offered nothing else, such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.
The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.
You are mostly correct that a lawyer would know these things and how they are done in court. Therefore, yes, almost always a lawyer is required, if for no other reason than to get through the proper procedural format of the court process. You want to do it yourself? You better spend some time learning how it is done, what is required to win and how to get there. Without that education, it is like taking someone who doesn't know computer programming and having them debug a program in assembly language.
Don't have the time to learn all this stuff? Well, that is why we have lawyers.

Re:For an Interesting Exercise in Head Asplosion (Score:3, Informative)

by clone53421 ( 1310749 ) writes: on Wednesday March 31, 2010 @02:46PM (#31691208) Journal

"They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is"
Here. [74.125.95.132]
