Social Networks Your Rights Online

Facebook Kills Dataset of Crawled Public Profiles 158

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, this means anyone who wants it has made their own version of this.

  • by John Hasler ( 414242 ) on Wednesday March 31, 2010 @11:24AM (#31688298) Homepage

    ...you'd be flaming them for invading your "privacy".

  • by Chirs ( 87576 ) on Wednesday March 31, 2010 @11:29AM (#31688398)

    I see very little problem with an automated scan that respects robots.txt.

    By not blocking automated access to the profiles, facebook is squarely at fault.

  • by Anonymous Coward on Wednesday March 31, 2010 @11:30AM (#31688404)

    If Facebook had released this information we would be flaming?

    They did and we still are.

    (yes)

  • by jeffb (2.718) ( 1189693 ) on Wednesday March 31, 2010 @11:32AM (#31688448)

    ...all the researchers who do everything in the open and with proper anonymization.

  • chilling effect (Score:5, Insightful)

    by Anonymous Coward on Wednesday March 31, 2010 @11:33AM (#31688464)
    Don't see Facebook going after Google, even though the data that they possess is ostensibly the same as Warden's. The primary difference that I see is that Warden was offering analysis and results for free, not trying to monetize it. Maybe that's what made them mad.
  • by Altus ( 1034 ) on Wednesday March 31, 2010 @11:35AM (#31688486) Homepage

    why do you think they threatened him? they want to sell this data themselves.

  • by paeanblack ( 191171 ) on Wednesday March 31, 2010 @11:38AM (#31688532)

    Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

    Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

    If you don't like that reality, keep it in mind next time you vote.

  • by Registered Coward v2 ( 447531 ) on Wednesday March 31, 2010 @11:56AM (#31688748)

    Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

    Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

    If you don't like that reality, keep it in mind next time you vote.

    I'm not sure what he did was not legal; but the article is pretty clear he doesn't have the resources to fight it in a court and so decided to destroy it. Maybe someone with more money and time may someday decide to fight it and the legality of scraping information will be clarified by a court.

    To me, the real question is how do TOS square with robots.txt files? Given the generally accepted and followed practice of their use, does not forbidding crawling implicitly allow the data to be collected and used as the scraper sees fit?

    If you view the data as facts, then they are not copyrightable and so aggregating them would be permissible, assuming the TOS is not binding if a scraper follows the robots.txt instructions. If that is the case, I'd guess a lot more robots.txt files would prohibit scraping.

    At any rate, I'd say the real world rules are not real clear here, other than the one that says "avoid picking a legal fight with someone who has a ton more money and lawyers than you."

    Personally, I wouldn't be surprised if someone else already has the same data but, rather than publicizing it, they are simply using it however they see fit.

  • by Anonymous Coward on Wednesday March 31, 2010 @11:57AM (#31688752)

    Or we could do what America did. Violent revolution and genocidal extermination of the existing inhabitants of the lands we wish to own. That works better than voting and is a very, very American thing to do.
     
    Rule of law? You are fucking joking right?

  • by Anonymous Coward on Wednesday March 31, 2010 @12:01PM (#31688800)

    They're not wrong though. People on FB constantly get outraged at new policies, interfaces and features, but I don't know of anyone who has actually left the site. I am just as bad myself; all I've done is remove everything from my profile and just use it as a hub to stay in contact with people all around me, I haven't gone as far as stopping using the site, and I don't think I will. Nor will many people.

  • Yes, but you can collect data and publish it as such. Scientific data, not data in the computer sense.
    He should have kept his mouth shut, compiled the data, and then just submitted it to a number of journals. At that point Facebook needs to go after the journals. Facebook would have a tough time winning, and even if they did win, going after the journals would be bad PR. So no real win there. Their best bet would be to actually help him after the fact and look at the data to ensure that "individuals' privacy has not been violated".

    The data on social networking sites is amazing and could teach us a lot about human nature.

  • by CoffeeDog ( 1774202 ) on Wednesday March 31, 2010 @12:06PM (#31688880)
    Just because something is publicly available doesn't mean just anyone is free to reproduce and distribute it. In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them. By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.
  • by flabordec ( 984984 ) on Wednesday March 31, 2010 @12:07PM (#31688892) Homepage

    Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

    I'm afraid the average Facebook user is a teen who is more worried about getting a higher score in whatever Flash game she is currently playing than about FB's access policies for computers.

  • by Anonymous Coward on Wednesday March 31, 2010 @12:21PM (#31689080)

    Finding something on the web does not give you the legal authority to publish and redistribute it.

    Why not? Copyright?

    Copyright law (at least in the US) does not cover data.

    Which is probably why Facebook said it was a "contract" violation.

  • by dubbreak ( 623656 ) on Wednesday March 31, 2010 @12:24PM (#31689138)

    Not really a meaningful distinction, as contract law is very much an aspect of the law.

    If he was using an account I could see there being an enforceable contract (e.g. if you accept these terms of service we will give you an account). If he was just crawling publicly viewable Facebook pages, then what is the consideration? I'd argue there is none and therefore no contract exists. You aren't forced to log in to view many pages and it's not like they even have a click-through "I agree" TOS on each publicly viewable page. He broke no laws and there is no enforceable contract.

    If Facebook doesn't want people crawling publicly viewable pages then make them private (login required) or at least have a robots.txt that prohibits crawling of those pages.
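
    For anyone unsure what "a robots.txt that prohibits crawling of those pages" looks like from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "ExampleBot" user agent and the example.com URLs are all made up for illustration; this is not Facebook's actual file, and it says nothing about how Warden's crawler worked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this example (not Facebook's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /profile/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that we already have in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and the URLs are placeholders chosen for the sketch.
print(may_crawl(parser, "ExampleBot", "https://example.com/profile/alice"))
print(may_crawl(parser, "ExampleBot", "https://example.com/about"))
```

    A well-behaved crawler runs a check like this before every request and skips whatever is disallowed. Note that robots.txt is only a convention: it signals the site's wishes, and whether following or ignoring it has any legal weight is exactly what is being argued in this thread.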

  • by Anonymous Coward on Wednesday March 31, 2010 @12:55PM (#31689558)

    But if robots.txt disallowed crawling, then Facebook would be able to show that their intent was to not allow this type of data access.

  • by NeutronCowboy ( 896098 ) on Wednesday March 31, 2010 @12:57PM (#31689586)

    Someone ought to mod this up. Facebook's only value is in the information you provide to Facebook about who you are, where you live and who your connections are. As a result, they will defend that little nugget as if their life depended on it - because it does.
