Social Networks Your Rights Online

Facebook Kills Dataset of Crawled Public Profiles

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he was able to collect it, anyone else who wants this data has probably already crawled their own copy.
This discussion has been archived. No new comments can be posted.

  • Very interesting (Score:3, Informative)

    by Bearhouse ( 1034238 ) on Wednesday March 31, 2010 @11:40AM (#31688552)

    I'll let others debate the 'privacy' issues (personally, I think there's nothing wrong with scraping profile information that people have explicitly made 'public').
    Anyway, just check what he did with it; it's very interesting (FTA):
    http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html [typepad.com]
    There must be many, many legit uses this data could be put to... shame it's being killed by NIH syndrome.

  • by Tobor the Eighth Man ( 13061 ) on Wednesday March 31, 2010 @11:52AM (#31688706)

    Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.

  • by truthsearch ( 249536 ) on Wednesday March 31, 2010 @11:58AM (#31688772) Homepage Journal

    So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.

    Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.
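To illustrate the point that robots.txt is advisory, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page. It uses Python's standard `urllib.robotparser`; the robots.txt rules, the `example-crawler` user agent, and the example.com URLs are invented for illustration and are not Facebook's actual rules. Nothing on the server side enforces the answer: a client that skips this check can still request the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url.

    This is purely a client-side courtesy check; the server does not
    see or enforce the result.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URLs (assumptions, not from the story).
allowed_public = is_allowed(
    ROBOTS_TXT, "example-crawler", "https://example.com/people/alice"
)
allowed_private = is_allowed(
    ROBOTS_TXT, "example-crawler", "https://example.com/private/data"
)

print("public page allowed: ", allowed_public)
print("private page allowed:", allowed_private)
```

A polite crawler would skip any URL for which `is_allowed` returns False; an impolite one simply never calls it, which is why content that must stay unseen has to sit behind a login or be taken offline.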

  • by Rantastic ( 583764 ) on Wednesday March 31, 2010 @12:25PM (#31689148) Journal

    Finding something on the web does not give you the legal authority to publish and redistribute it.

    Nonsense.

    Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.

    Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

    If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.

  • Removing names isn't necessarily enough. The recent Netflix case shows that [securityfocus.com]. I think it's interesting that nobody catches the broader implication of that discussion: whether they're "anonymizing" data to provide it for research or to sell it for marketing, the ability to reverse engineer patterns and undo the anonymization remains a risk.
  • by crashumbc ( 1221174 ) on Wednesday March 31, 2010 @12:48PM (#31689458)

    Unless something has changed, you have to log in to see anything on Facebook. Even if a page is "public", you can't view it without logging in with your own account.

    A crawler may or may not bypass that...

  • Re:On what grounds? (Score:3, Informative)

    by cdrguru ( 88047 ) on Wednesday March 31, 2010 @01:59PM (#31690466) Homepage

    If your position in entering the above motion was that "I'm right, so I should win" and you offered nothing else, such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.

    The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.

    You are mostly correct that a lawyer would know these things and how they are done in court. Therefore, yes, almost always a lawyer is required, if for no other reason than to get through the proper procedural format of the court process. You want to do it yourself? You better spend some time learning how it is done, what is required to win and how to get there. Without that education, it is like taking someone that doesn't know computer programming and having them debug a program in an Assembler language.

    Don't have the time to learn all this stuff? Well, that is why we have lawyers.

  • by clone53421 ( 1310749 ) on Wednesday March 31, 2010 @02:46PM (#31691208) Journal

    They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

    Here. [74.125.95.132]
