
Facebook Kills Dataset of Crawled Public Profiles (158 comments)

Posted by CmdrTaco on Wednesday March 31, 2010 @11:19AM from the creepy-crawlies dept.

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated 210 million public Facebook profiles and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he could build it, anyone else who wants this data has probably already made their own version.

For an Interesting Exercise in Head Asplosion (Score:5, Interesting)

by eldavojohn ( 898314 ) * writes: <eldavojohn&gmail,com> on Wednesday March 31, 2010 @11:19AM (#31688234) Journal

Fearing costs, Warden has now destroyed his dataset.
Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

Then Facebook could ask the EFF to protect their users' privacy and keep their information from being sold to marketers and corporations (sorry, when you're introduced as an "Internet entrepreneur" that means there's profit to be had).

- Re:For an Interesting Exercise in Head Asplosion (Score:5, Insightful)
  
  by paeanblack ( 191171 ) writes: on Wednesday March 31, 2010 @11:38AM (#31688532)
  
  Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?
  Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.
  If you don't like that reality, keep it in mind next time you vote.
  
  - Re: (Score:2, Redundant)
    
    by truthsearch ( 249536 ) writes:
    
    Except Facebook is claiming he violated its terms of service (a contract), not the law.
    - Re:For an Interesting Exercise in Head Asplosion (Score:4, Informative)
      
      by Tobor the Eighth Man ( 13061 ) writes: on Wednesday March 31, 2010 @11:52AM (#31688706)
      
      Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.
      
      - Re:For an Interesting Exercise in Head Asplosion (Score:5, Insightful)
        
        by dubbreak ( 623656 ) writes: on Wednesday March 31, 2010 @12:24PM (#31689138)
        
        Not really a meaningful distinction, as contract law is very much an aspect of the law.
        If he was using an account I could see there being an enforceable contract (e.g. if you accept these terms of service we will give you an account). If he was just crawling publicly viewable Facebook pages, then what is the consideration? I'd argue there is none and therefore no contract exists. You aren't forced to log in to view many pages, and it's not like they even have a click-through "I agree" TOS on each publicly viewable page. He broke no laws and there is no enforceable contract.
        
        If Facebook doesn't want people crawling publicly viewable pages, then it should make them private (login required) or at least have a robots.txt that prohibits crawling of those pages.
        
        Re: (Score:2)
        
        by Svartalf ( 2997 ) writes:
        
        robots.txt requires that a crawling app HONOR said file.
        
        Re: (Score:2)
        
        by tibman ( 623933 ) writes:
        
        robots.txt just gives you advice; nobody is required to follow it. http://en.wikipedia.org/wiki/Robots.txt#Disadvantages [wikipedia.org]
        That is my understanding of it, anyway. Maybe when it becomes a real standard they can do more with it?
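        A minimal sketch of that point, using Python's standard-library `urllib.robotparser`: the robots.txt check is something the crawler chooses to run, and nothing in the protocol stops a client that skips it. The robots.txt content, the bot name, and the URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real crawler would fetch it from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_fetch_allowed(url: str, agent: str = "ExampleResearchBot") -> bool:
    """A well-behaved crawler asks robots.txt before fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def rude_fetch_allowed(url: str) -> bool:
    """A crawler that never consults robots.txt: honoring it is voluntary."""
    return True


if __name__ == "__main__":
    for url in ("https://example.com/people/alice", "https://example.com/private/bob"):
        print(url, polite_fetch_allowed(url), rude_fetch_allowed(url))
```

The server sees no difference between the two functions' later requests; only the polite one limits itself.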
        
        Re: (Score:2)
        
        by dubbreak ( 623656 ) writes:
        
        The point I was trying to make about robots.txt wasn't that it would make it illegal (I really doubt it does); it's that there wasn't one to even suggest that the pages shouldn't be crawled. There was nothing to prevent access and nothing to suggest one shouldn't.
        
        I don't know about you, but if I don't want people looking at my house I'll grow a huge hedge so people have to trespass to see it. If I don't want people to access my webpages I'll make them private so they have to hack or have an account to see
        
        Re: (Score:2)
        
        by shentino ( 1139071 ) writes:
        
        A robots.txt file is the internet equivalent of a "No Trespassing" sign.
        Access controls are more like locked gates.
        They both make it illegal to enter without permission, but only the second one actually prevents it.
        
        Re: (Score:2)
        
        by tibman ( 623933 ) writes:
        
        Ah, OK, gotcha. My mistake. Yes, I agree with you; it's too bad the threat of legal action is so scary. I don't blame the guy.
        
        Re: (Score:2)
        
        by gknoy ( 899301 ) writes:
        
        robots.txt requires that a crawling app HONOR said file.
        I believe you mean "suggests" or "requests", rather than "requires". Well-behaved robots do obey robots.txt, but even so, if a URL is accessible, it should either be secured or be considered publicly available.
        Unfortunately, I believe some courts have likely considered modifying URLs to be "hacking", in that it's "unauthorized access" -- simply because the server owners *thought* it was inaccessible, rather than actually protected. I hope such lunacy
        
        Re: (Score:2)
        
        by sgbett ( 739519 ) writes:
        
        This is an interesting point. I think it is fair to say that modifying a URL is an intentional act and not an accident?
        I'm thinking of a scenario whereby a user modifies a URL, say changing a user ID, to get access to another person's information. Of course the site should prevent this, but if it doesn't, can it be said that no liability lies with the user doing the URL modifying?
        Not trolling, interested in viewpoints. Sane ones preferably!
        
        Re: (Score:2)
        
        by AHuxley ( 892839 ) writes:
        
        robots.txt requires that a crawling app HONOR said file.
        Think of it more as a 'do not display to public' flag.
        
        Re: (Score:2)
        
        by dubbreak ( 623656 ) writes:
        
        I'm guessing he created an account at some point and by that agreed to the TOS.
        I assumed otherwise, but yes, that could have been the case (I'm assuming neither of us RTFA).
        I still don't see it as a contract.
        That's the fun of law. If you can successfully argue that there was no consideration (something of value exchanged for something else of value), then yeah there was no contract. I think that is a more difficult argument than assuming he didn't create an account. If I were a researcher I would have just accessed public data for my research.
        They are free to cancel his account for violating the terms, of course.
        Yep, assuming he has one. I question what other damages could be assessed anyh
        
        Re: (Score:2)
        
        by Evil Grinn ( 223934 ) writes:
        
        What facebook lets you see without being logged in is extremely limited. It's very unlikely he could collect a useful amount of info without an account.
        
        Re: (Score:2)
        
        by Evil Grinn ( 223934 ) writes:
        
        Ok I RTFA and it does say he did it without logging in. In which case, the information he had was pretty limited and I'd be surprised if Facebook had any reason to fear competition with any scheme of their own to sell the data.
      - Re: (Score:2, Informative)
        
        by crashumbc ( 1221174 ) writes:
        
        Unless something has changed, you have to "login" to see anything on Facebook. Even if a page is "public", you can't view it without logging in with your own account.
        A crawler may or may not bypass that...
        
        Re: (Score:2)
        
        by gorzek ( 647352 ) writes:
        
        Copyright is not absolute. Phone books, for instance, are not copyrighted because they are collections of facts--namely, addresses and phone numbers.
        
        Likewise, he could copy all sorts of factual information about the users on Facebook: their names, contact information, friends, etc. He could likely not get away with copying their photos, status updates, and so forth since those can constitute creative works and are thus copyrighted.
        
        Nevertheless, just because something is online doesn't mean it's automati
        
        Re: (Score:2)
        
        by AHuxley ( 892839 ) writes:
        
        Move to a part of the world where collecting "factual information about the users on Facebook" is not a problem.
        Create a crawler and let him do his research.
        It seems in the US a whole complex set of protections has grown up around the sorting, indexing and selling of data.
        The idea that a set of lower-cost computers can now generate their 'revenue' stream might be upsetting.
        What will the crawler find? Spies, random law enforcement, fake users, complex astro turfing, long term federal task forces, honey pots?
        
        Re: (Score:2)
        
        by shentino ( 1139071 ) writes:
        
        http://xkcd.com/501 [xkcd.com]
        
        Re: (Score:2)
        
        by Intron ( 870560 ) writes:
        
        But you don't give them the right to COPY it. He has the right to view it sure. But to make copies? Nope. The stuff is copyrighted. So what he did was not legal.
        From Facebook's terms:
        "You own all of the content and information you post on Facebook, and you can control how it is shared through your privacy and application settings."
        So how can FB sue the guy for copyright? They have no standing.
      - Re: (Score:2)
        
        by elnyka ( 803306 ) writes:
        
        How does the sticking power of TOS test out in court? Do facebook's TOS actually mean anything, if all you need to do to access their site is to type in a URL? I mean there isn't even a clickthrough to have them pretend like they care. Yes, I seriously would like to know.
        Those are excellent questions that need to be resolved either amicably or in a court of law (which is what was going to happen). The latter is expensive. Unless you have some powerful backer$s, you can't do it alone (though it raises the question of why he didn't contact the EFF in the first place).
        
        Re: (Score:2)
        
        by shentino ( 1139071 ) writes:
        
        In practice if you're a corporation fighting a user, you always win unless that user has another corporation backing him up (such as the EFF).
        Having the ability to drag someone out in court is powerful incentive for your opponent to fold.
        
        Maybe he did ... (Score:2)
        
        by dougmc ( 70836 ) writes:
        
        It's entirely possible that he did contact the EFF.
        But the EFF can't fight every battle -- they go after the ground-breaking ones, the ones that will have the highest benefit/cost ratio. It's not clear that this is such a battle.
        
        Re: (Score:2)
        
        by shentino ( 1139071 ) writes:
        
        If we had a loser-pays system, the EFF would be able to fight a lot more battles without running its treasury down with nonrefundable legal expenses.
    - Re: (Score:3, Funny)
      
      by K. S. Kyosuke ( 729550 ) writes:
      
      Except Facebook is claiming he violated its terms of service (a contract), not the law.
      To me, this claim seems to be as legitimate as a public library claiming that I read too many books and threatening to sue me.
    - Re: (Score:2)
      
      by severoon ( 536737 ) writes:
      
      So Facebook is claiming that their ToS is binding for anyone that can view the publicly available profile information they're posting on the web for all to see? (Basically, anyone with a web connection?)
      I shall now claim everyone in existence is bound by my ToS, then, b/c I too have a web page on the intarpip3s. And you are in violation of it, sir/madam. Pay!
  - Re: (Score:3, Insightful)
    
    by Registered Coward v2 ( 447531 ) writes:
    
    Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?
    Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.
    If you don't like that reality, keep it in mind next time you vote.
    I'm not sure what he did was illegal, but the article is pretty clear that he doesn't have the resources to fight it in court and so decided to destroy it. Maybe someone with more money and time will someday decide to fight it, and the legality of scraping information will be clarified by a court.
    To me, the real question is how TOS square with robots.txt files. Given the generally accepted and followed practice of their use, does not forbidding crawling implicitly allow the data to be collected and used as th
  - Re:For an Interesting Exercise in Head Asplosion (Score:5, Insightful)
    
    by geekoid ( 135745 ) writes: <dadinportland AT yahoo DOT com> on Wednesday March 31, 2010 @12:02PM (#31688820) Homepage Journal
    
    Yes, but you can collect data and publish it as such. Scientific data, not data in the computer sense.
    He should have kept his mouth shut, compiled the data, and then just submitted it to a number of journals. At that point Facebook would need to go after the journals. Facebook would have a tough time winning, and even if they did win, going after the journals would be bad PR. So no real win there. Their best bet would be to actually help him after the fact and look at the data to ensure that an "individual's privacy has not been violated".
    The data on social networking sites is amazing and could teach us a lot about human nature.
    
  - Re: (Score:3, Informative)
    
    by Rantastic ( 583764 ) writes:
    
    Finding something on the web does not give you the legal authority to publish and redistribute it.
    Nonsense.
    Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.
    Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is
    If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.
    - Re: (Score:3, Informative)
      
      by clone53421 ( 1310749 ) writes:
      
      They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is
      Here. [74.125.95.132]
  - Re: (Score:3, Interesting)
    
    by The Moof ( 859402 ) writes:
    
    but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...
    The summary says the crawler simply indexed public information. Why is this relevant? Well, recently I noticed that Facebook Apps, all of which I have disabled and blocked via my privacy settings, have started accessing my information again. Naturally, I assumed something got reset and started hunting for the settings again, until I found this new block of text in all of their privacy settings:
    When you visit a Facebook-enhanced application or website, it may access any information you have made visible to Everyone, as well as your publicly available information. This includes your Name, Profile Picture, Gender, Current City, Networks, Friend List, and Pages. The application will request your permission to access any additional information it needs.
    So they claim they can't stop people from acquiring and using my 'publicly available' information, because
    - Re: (Score:2)
      
      by tsm_sf ( 545316 ) writes:
      
      It's not two-faced at all. One group is providing Facebook with some form of compensation, and the other is not.
      Since money is more important to Americans than a crying eagle with 'Liberty' down one wing and 'Freedom' down the other, this shouldn't come as a gigantic shock.
    - Re: (Score:2)
      
      by NotBornYesterday ( 1093817 ) writes:
      
      They can't do that to our users. Only we can do that to our users [youtube.com].
  - Re: (Score:2)
    
    by blahplusplus ( 757119 ) writes:
    
    "he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law."
    Which are bought and sold by lobbyists. The law is such a joke because it always kowtows in some way or another to private interests.
  - Re: (Score:2)
    
    by CAIMLAS ( 41445 ) writes:
    
    Finding something on the web does not give you the legal authority to publish and redistribute it
    At the same time, if he never agreed to the EULA (and they did not require him to do so in order to read the content), then he's probably overreacting in deleting the data. What laws might he be breaking here? I'm not aware of any, though he was certainly setting himself up for wanton litigation on account of the bad publicity.
    This isn't wanton publishing of said data. It's a 'derivative work'. Think: someone canvassing an area to record which kinds of grass are seeded in people's yards (and how well its
  - No database copyright (Score:2)
    
    by Animats ( 122034 ) writes:
    
    Finding something on the web does not give you the legal authority to publish and redistribute it.
    The US doesn't have "database copyright". The US has Feist vs. Rural Telephone, which says that "facts" can't be copyrighted. It's legal to scan in a phone book and load the address info into a database. You just can't reproduce the page layout; that's covered by copyright. That decision created the third-party phone book industry and began the era of widespread data mining.
    The EULA issue is harder. I
- Re: (Score:2)
  
  by mwvdlee ( 775178 ) writes:
  
  Assuming all those profiles were indeed publicly available without having to log in to facebook, how could he have ever violated terms of service if he never agreed to any terms of service?
  Am I to assume that anybody that has the misfortune to view a facebook profile without being a facebook member is automagically in violation of facebook's terms of service?
If Facebook had done this... (Score:5, Insightful)

by John Hasler ( 414242 ) writes: on Wednesday March 31, 2010 @11:24AM (#31688298) Homepage

...you'd be flaming them for invading your "privacy".

- Facebook *did* do this (Score:5, Insightful)
  
  by Chirs ( 87576 ) writes: on Wednesday March 31, 2010 @11:29AM (#31688398)
  
  I see very little problem with an automated scan that respects robots.txt.
  By not blocking automated access to the profiles, facebook is squarely at fault.
  
  - Robots.txt is insufficient. (Score:5, Interesting)
    
    by way2trivial ( 601132 ) writes: on Wednesday March 31, 2010 @11:46AM (#31688632) Homepage Journal
    
    I'm sorry, it is.
    robots.txt allows you to "refuse a specific named bot" or "refuse everyone" or "allow everything" or "allow these directories" or "only allow these directories"
    (want a fascinating read? try robots.txt at your favorite government site- whitehouse.gov used to be fascinating stuff)
    There is no way in robots.txt to permit crawling based on the intended use of the information, the way a CC license does.
    With photographs, I can have a Creative Commons license that says "use it for anything", "use it with credit to me", "free for non-commercial use", etc.
    I would WANT google to see my site, I would want bing to see my site- for the purposes of indexing in a search engine.
    I can't say in robots.txt
    "come in and index for search engines and relevance- but you may not use the data to collect information on our membership for marketing to or marketing their info to others"
    If I build a website all about coffee, I want the information available to the general public, but from/on my site....
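    As a rough sketch of the rule types listed above, here is a robots.txt that admits one named crawler (except for a members area) and refuses everyone else, read back with Python's `urllib.robotparser`. The bot names and paths are illustrative assumptions. The parser's decision takes only a user agent and a URL, which is the commenter's point: there is no field for what the crawled data will be used for.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one named crawler may index everything except
# /members/; every other crawler is refused entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /members/

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The answer depends only on (user agent, URL path); intended use is not an input.
print(parser.can_fetch("Googlebot", "https://example.com/coffee/brewing"))     # True
print(parser.can_fetch("Googlebot", "https://example.com/members/alice"))      # False
print(parser.can_fetch("DataMinerBot", "https://example.com/coffee/brewing"))  # False
```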
    
    - Re: (Score:3, Informative)
      
      by truthsearch ( 249536 ) writes:
      
      So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.
      Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.
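      To make the "just another client" point concrete, a small sketch with Python's `urllib.request.Request` (nothing is actually sent over the network): a browser-style request and a crawler request are the same HTTP GET, and the User-Agent header that distinguishes them is whatever string the client chooses to send. The URL and both agent strings are hypothetical.

```python
from urllib.request import Request

URL = "https://example.com/profile/12345"  # hypothetical public page

# A "browser" and a "crawler" build the same kind of HTTP GET request;
# the only difference the server sees is a self-declared User-Agent string.
browser_req = Request(URL, headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})
crawler_req = Request(URL, headers={"User-Agent": "ExampleResearchBot/0.1"})

for req in (browser_req, crawler_req):
    # Request stores header names via str.capitalize(), hence "User-agent".
    print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

A crawler could just as easily send the browser's string, which is why access control has to happen on the server side.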
      - You are missing my point (Score:4, Interesting)
        
        by way2trivial ( 601132 ) writes: on Wednesday March 31, 2010 @12:11PM (#31688958) Homepage Journal
        
        and I really think it is worth making.
        Copyright protections are important. The snippet of text that Google uses to let people know my site is relevant is easily fair use.
        I don't have a problem with it; I welcome it, as it's beneficial for both myself and Google for it to be there.
        the ENTIRE TEXT of my site- copied and recopied to put into a web page that exists only to generate ad-sense revenue by a third party is not.
        and if robots.txt had a 'license' mode, I'd have a much stronger case of protections if I chose to pursue a blatant copying and re-publication of my site.
        robots.txt labels that I wish there were include
        'allow function:indexing'
        'disallow function:total and complete reproduction'
        'disallow function: total and complete reproduction for XXX days'
        (so I can allow the Wayback Machine and equivalents)
        'disallow function: aggregate data collection'
        'disallow function: user data collection'
        'disallow function: email collection'
        looking at amazon, http://www.amazon.com/robots.txt [amazon.com]
        they somewhat do this by putting the information they don't want released into the wild in its own directories,
        then disallowing those directories. Actually, now that I look at it, it's a neat way to go.
        But I'd still prefer a robots.txt option that covered different 'intended use of crawled data' permissions.
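        The 'function:' labels above are not part of robots.txt; the Robots Exclusion Protocol has no directive for intended use. Purely as a sketch of how such an extension might be read, here is a tiny parser for hypothetical `Allow-function:` / `Disallow-function:` lines. The directive names and purpose tokens are invented, and like the rest of robots.txt they would only bind crawlers that chose to declare their purpose honestly and honor the answer.

```python
# HYPOTHETICAL: "Allow-function" / "Disallow-function" are NOT real robots.txt
# directives; this only illustrates the proposal made in the comment above.
from dataclasses import dataclass, field


@dataclass
class PurposePolicy:
    allowed: set[str] = field(default_factory=set)
    disallowed: set[str] = field(default_factory=set)

    def decision(self, purpose: str) -> str:
        """Return "disallow", "allow", or "unspecified" for a declared purpose."""
        purpose = purpose.lower()
        if purpose in self.disallowed:  # an explicit refusal wins
            return "disallow"
        if purpose in self.allowed:
            return "allow"
        return "unspecified"


def parse_purpose_policy(text: str) -> PurposePolicy:
    policy = PurposePolicy()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() == "allow-function":
            policy.allowed.add(value.lower())
        elif key.lower() == "disallow-function":
            policy.disallowed.add(value.lower())
        # Real directives (User-agent, Disallow, Crawl-delay, ...) are ignored here.
    return policy


example = """\
User-agent: *
Allow-function: indexing
Disallow-function: aggregate-data-collection
Disallow-function: email-collection
"""

policy = parse_purpose_policy(example)
print(policy.decision("indexing"))                   # allow
print(policy.decision("aggregate-data-collection"))  # disallow
print(policy.decision("archiving"))                  # unspecified
```

Returning "unspecified" rather than a yes/no keeps the gap visible: the file can only speak to purposes its author thought to list.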
        
        Re: (Score:2)
        
        by thePowerOfGrayskull ( 905905 ) writes:
        
        the ENTIRE TEXT of my site- copied and recopied to put into a web page that exists only to generate ad-sense revenue by a third party is not.
        You mean like google cache? I actually agree with you overall -- it's my data, not yours. You may not publicly exhibit copies of it for your own benefit. It's just that it's a difficult line to draw, in large part because of omnibus monetizing service providers like Google.
        
        Re: (Score:2)
        
        by Hatta ( 162192 ) writes:
        
        Copyright protections are important
        Copyright is irrelevant here. Facts are not copyrightable. This data from Facebook is no different than the collection of data in the phone book. Republishing a page from Facebook or the phone book is illegal. Republishing facts sourced from those pages is not.
    - Re: (Score:2)
      
      by Ksevio ( 865461 ) writes:
      
      User-agent: *
      Crawl-delay: 10
      Sitemap: http://www.whitehouse.gov/feed/media/video-audio
      
      That's not such an interesting read these days it seems.
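      For comparison, the three directives quoted above can be read back with Python's `urllib.robotparser`; `crawl_delay()` and `site_maps()` are standard-library methods (the latter since Python 3.8). This sketch parses the quoted text locally rather than fetching the live file, which has presumably changed since.

```python
from urllib.robotparser import RobotFileParser

# The whitehouse.gov robots.txt as quoted in the comment above.
QUOTED = """\
User-agent: *
Crawl-delay: 10
Sitemap: http://www.whitehouse.gov/feed/media/video-audio
"""

parser = RobotFileParser()
parser.parse(QUOTED.splitlines())

print(parser.crawl_delay("AnyBot"))  # 10
print(parser.site_maps())            # ['http://www.whitehouse.gov/feed/media/video-audio']
print(parser.can_fetch("AnyBot", "http://www.whitehouse.gov/anything"))  # True
```

In other words, the file asks every crawler to wait ten seconds between requests and points at a sitemap, but forbids nothing.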
  - Re: (Score:2)
    
    by Inda ( 580031 ) writes:
    
    Back in the late nineties I wouldn't have thought twice about downloading a whole site. It wasn't unusual. I had a program for doing it, although I believe the popular browser of the day had a feature that saved a good portion.
    
    My, how things have changed. GOML.
- Re: (Score:1, Insightful)
  
  by Anonymous Coward writes:
  
  If Facebook had released this information we would be flaming?
  They did and we still are.
  (yes)
- Re:If Facebook had done this... (Score:5, Interesting)
  
  by 2obvious4u ( 871996 ) writes: on Wednesday March 31, 2010 @11:31AM (#31688434)
  
  Isn't this the golden egg of Facebook? I thought this is what they were selling. That data is fascinating; it is completely anonymous, yet at the same time very insightful for marketing purposes. I think Facebook is just upset because they plan on selling the same data that Pete was.
  
  - Re:If Facebook had done this... (Score:5, Interesting)
    
    by NeutronCowboy ( 896098 ) writes: on Wednesday March 31, 2010 @12:49PM (#31689466)
    
    Most likely. Facebook's gold mine isn't even so much the user information itself; it's the networks that they can build out of the relationship data. As of right now, they haven't figured out how to make money from it, but they certainly aren't going to let someone take the most valuable aspect of their system, the network information, and put it out in the open.
    Personally, I hope someone does the same work, but uploads the raw data anonymously to a torrent somewhere.
    
  - Re: (Score:2)
    
    by Late Adopter ( 1492849 ) writes:
    
    Except Pete can't actually sell the data; that would be a derivative work of their copyrighted web pages. Sure, he has the fair-use ability to publish academic studies, but he'd be limited to using the data internally.
- Re:If Facebook had done this... (Score:5, Insightful)
  
  by Altus ( 1034 ) writes: on Wednesday March 31, 2010 @11:35AM (#31688486) Homepage
  
  why do you think they threatened him? they want to sell this data themselves.
  
- Re: (Score:2)
  
  by moteyalpha ( 1228680 ) writes:
  
  It seems that many of these data sets are public and easily accessible to analysis. I would find it interesting to simply use various forums like slashdot and have a ranking of who had the most insightful comments by user name. Certainly the data is available as people make it so. It seems that there is a schizophrenic aspect to this, people want to be recognized for what they represent and when they become too famous they get nervous about it.
  I am sure that much of this data is already available in an org
Yes, by all means, let's stamp out... (Score:4, Insightful)

by jeffb (2.718) ( 1189693 ) writes: on Wednesday March 31, 2010 @11:32AM (#31688448)

...all the researchers who do everything in the open and with proper anonymization.

- - Re: (Score:2)
    
    by geekoid ( 135745 ) writes:
    
    It is, and it's done all the time in the scientific community.
    I don't see why you would think removing people's names isn't possible.
    - Re: (Score:2, Interesting)
      
      by Anonymous Coward writes:
      
      Even with names removed, data like this can often be traced back to the person. Your name isn't the only unique thing that appears in your facebook profile.
      As an example, how many others share your permutation of friends and fan pages?
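      A minimal sketch of that uniqueness argument, on invented toy records with names already stripped: count how many records share the same combination of friends and fan pages. A combination that appears only once (a group of size 1, in k-anonymity terms) singles out one person to anyone who already knows that combination. Treating friends plus fan pages as the quasi-identifier is an assumption for illustration.

```python
from collections import Counter

# Toy, invented records with names already removed: (friend IDs, fan pages).
records = [
    ({"u2", "u3"}, {"Coffee"}),
    ({"u2", "u3"}, {"Coffee"}),
    ({"u1", "u4", "u7"}, {"Coffee", "Chess"}),
    ({"u5"}, {"Jazz"}),
]


def quasi_identifier(friends: set[str], pages: set[str]) -> tuple:
    # Sets are unhashable and unordered, so normalize them to sorted tuples.
    return (tuple(sorted(friends)), tuple(sorted(pages)))


def group_sizes(rows) -> Counter:
    return Counter(quasi_identifier(f, p) for f, p in rows)


def k_anonymity(rows) -> int:
    """Smallest number of records sharing one quasi-identifier (the k)."""
    return min(group_sizes(rows).values())


def unique_rows(rows) -> list[int]:
    """Indices of records whose combination appears exactly once."""
    sizes = group_sizes(rows)
    return [i for i, (f, p) in enumerate(rows) if sizes[quasi_identifier(f, p)] == 1]


print(k_anonymity(records))  # 1
print(unique_rows(records))  # [2, 3]
```

With real friend lists the combinations are far more varied than in this toy, so groups of size 1 would be the norm rather than the exception.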
    - Re: (Score:3, Informative)
      
      by thePowerOfGrayskull ( 905905 ) writes:
      
      Removing names isn't necessarily enough. The recent Netflix case shows that [securityfocus.com]. I think it's interesting that nobody catches the broader implications of that discussion, namely that whether they're "anonymizing" data for purposes of providing it for research, or selling it for marketing... the ability to reverse engineer patterns to undo it remains a risk.
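      One concrete illustration of how "anonymized" data can be reversed, assuming a release that simply swaps each name for an unsalted hash (a common shortcut; nothing in the article says this is what Warden did). Anyone with a list of candidate names can hash them with Python's `hashlib` and match the tokens. The names and data are invented.

```python
import hashlib


def pseudonym(name: str) -> str:
    # Deterministic, unsalted hash: the same name always maps to the same token.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


# "Anonymized" release: names replaced by hash tokens (toy, invented data).
released = {
    pseudonym("Alice Smith"): ["Coffee", "Chess"],
    pseudonym("Bob Jones"): ["Jazz"],
}


def reidentify(release: dict, candidate_names: list[str]) -> dict:
    """Hash every candidate name and look the token up in the release."""
    lookup = {pseudonym(n): n for n in candidate_names}
    return {lookup[token]: data for token, data in release.items() if token in lookup}


# An attacker's guess list, e.g. taken from any public directory.
print(reidentify(released, ["Carol White", "Alice Smith", "Bob Jones"]))
# {'Alice Smith': ['Coffee', 'Chess'], 'Bob Jones': ['Jazz']}
```

A hash like this is a pseudonym, not anonymization; and even a secret, random mapping leaves the friend-graph structure intact for the pattern-matching attacks described above.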
      - Re: (Score:2)
        
        by thePowerOfGrayskull ( 905905 ) writes:
        
        I pretty much agree with you. In fact, I think we're making much the same point. The level of anonymization that is performed -- either by "white hat" hackers, or by the companies who own rights to the data -- isn't sufficient. Removing PII is, as you say, not proper anonymization; and yet that seems to be all that was done with the FB data.
        The common theme in the replies here on slashdot is that the data was "anonymized" so surely there is no harm in allowing the researcher to keep and/or disseminate
  - Re: (Score:2)
    
    by Rob the Bold ( 788862 ) writes:
    
    You assume such anonymization is actually possible, I somehow doubt it.
    If it can only be done clandestinely, then you should definitely doubt it. On the other hand, if it's done above the covers with the lights on, you could evaluate the anonymity of the data yourself. Of course, by then it would be too late; you might want to review the anonymizing scheme before the data is posted. Again, an above-board researcher is more likely to submit his scheme for peer review or, perhaps even better, use a known-good system. If you're really certain that such a claim is bogus, and you
Publicly available (Score:5, Interesting)

by mdsharpe ( 1051460 ) writes: on Wednesday March 31, 2010 @11:32AM (#31688452)

Since this is publicly available information, and all he did was send a program to go grab it (much akin to asking your web browser to download it), does this mean Facebook has essentially threatened him for no more than reading too much of Facebook too quickly? Sounds absurd to me.

- Re: (Score:2, Insightful)
  
  by CoffeeDog ( 1774202 ) writes:
  
  Just because something is publicly available doesn't mean just anyone is free to reproduce and distribute it. In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them. By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.
  - Re: (Score:2)
    
    by Trepidity ( 597 ) writes:
    
    You can't copyright facts though, so it's not clear they would own the dataset, depending on how it were created. For example, while Facebook owns the actual literal webpages on facebook.com, it's questionable whether they own the friend graph, which is simply a fact about how people choose to associate themselves.
  - Re: (Score:2)
    
    by cdrguru ( 88047 ) writes:
    
    By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.
    Legal, maybe not. But it happens every day over the entire planet. And there doesn't seem to be any reasonable way to stop it, so it is going to continue forever.
    Redistribution is the key to the new digital un-economy.
  - Re: (Score:2)
    
    by Dhalka226 ( 559740 ) writes:
    
    In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them.
    If Facebook needs to write into their TOS an implied permission to distribute users' data, it says to me that the owners of such data are the users themselves. That being the case, Facebook wouldn't have any standing to make demands about what is done with that data by third parties; that would be the individual users' problem insofar as any of the data might be subject to copyright at all. (Most of i
  - Re: (Score:2)
    
    by Phrogman ( 80473 ) writes:
    
    This is more like going into a public library and writing down a list of all the books they have by title, ISBN, placement on the shelves, publisher etc., and then relating that information to show connections between the books. It's all publicly available information and anyone can walk in and look at it, write it down etc.
    The difference here is that Facebook is providing its services free to the public so that THEY can go grab all this information and turn it into a dataset they can sell to corporations tha
- Re: (Score:2)
  
  by NeutronCowboy ( 896098 ) writes:
  
  Not really. It means that Facebook needs to have some data publicly available for users to browse, but that it can't let people take that data out of the Facebook realm. In other words, Facebook knows exactly what it is doing, and is acting in both cases in its best interest.
  Now, does that mean that Facebook's approach makes sense, and would stand up in court? I doubt it, but I don't have the cash to test that theory. Which in turn means that the outcome was just as predictable: Facebook makes up random rul
- Re: (Score:2)
  
  by prostoalex ( 308614 ) * writes:
  
  Disclaimer: I work for the company mentioned in the article, though not in a legal role.
  Privacy is dynamic, and "publicly available information" is not set in stone: a user could've chosen to hide specific bits of that information a few minutes later, and there doesn't seem to be any update protocol to remove those bits from the scraped DB.
chilling effect (Score:5, Insightful)

by Anonymous Coward writes: on Wednesday March 31, 2010 @11:33AM (#31688464)

I don't see Facebook going after Google, even though the data they possess is ostensibly the same as Warden's. The primary difference I see is that Warden was offering analysis and results for free, not trying to monetize it. Maybe that's what made them mad.

gray-market black-market (Score:2)

by h00manist ( 800926 ) writes:

All data that exists, and that someone can sell somehow, is for sale somewhere, somehow. That's the law of money, which is rather strong. So forget the right-to-privacy law; it hasn't worked for a long time now, and there is no way to enforce it. Just like the law prohibiting drugs, it just doesn't work. I don't know the solution, or whether it's good or bad, but that's the situation, like it or not. Wikileaks, for example, is a result of this.
Facebook is evil (Score:1)

by trurl7 ( 663880 ) writes:

Besides the obvious (wasting time, too much info being shared with future employers), their privacy and data policies have gotten worse and worse. Once you sign up with them, they own everything you do. Or at least so they believe. From his writing, this researcher was quite open and tried to be as forthcoming as possible. If they had concerns over anonymity, I suspect he would have been happy to discuss the exact data-scrubbing procedure to make sure it's on the level. But instead, these turds reach f
So (Score:2)

by fulldecent ( 598482 ) writes:

(Not that it was actually destroyed.) But why destroy the dataset? Just post to Slashdot, wait for someone to send you a link to Chilling Effects or the EFF, then follow up with Chilling Effects or the EFF, then release the dataset.
Very interesting (Score:3, Informative)

by Bearhouse ( 1034238 ) writes: on Wednesday March 31, 2010 @11:40AM (#31688552)

I'll let others debate the 'privacy' issues (personally, I think there's nothing wrong with scraping profile information that people have explicitly made 'public').
Anyways, just check what he did with it; very interesting: (FTA)
http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html [typepad.com]
There must be many, many legit uses this data could be put too...shame it's being killed by NIH syndrome

- Re: (Score:3, Funny)
  
  by Bearhouse ( 1034238 ) writes:
  
  ahem, put 'to', of course...
- Re: (Score:2)
  
  by dangitman ( 862676 ) writes:
  
  There must be many, many legit uses this data could be put too...shame it's being killed by NIH syndrome
  By "NIH syndrome," I assume you're referring to "Not Invented Here." I don't really see what that has to do with this case.
  - Re: (Score:2)
    
    by Bearhouse ( 1034238 ) writes:
    
    Correct on NIH.
    Well, if they were smart, Facebook would already be marketing this data, and/or services based on it, to their users and others.
    One could imagine all kinds of apps; "hey, 20% of your friends are in town 'x', why not go there for a weekend"
    The links to business could be huge, too...
    "Hey, here's a hotel you could stay in..."
    If they proposed those kinds of things, instead of asinine games, then maybe I'd be prepared to take them more seriously, (and not have a problem with their using my 'public
    - Re: (Score:2)
      
      by dangitman ( 862676 ) writes:
      
      The thing is, I just don't understand why you would use "NIH Syndrome" in this context. It is usually used when somebody at Company X says "Hey, why don't we use this awesome technology to make a better product," but is rebuffed because the technology was invented by Company Y.
      In this example, there is no new technology involved, and Facebook already has the data. What is "not being invented here"? Facebook already invented Facebook, how is Facebook using the data they generated inventin
Facebook does stuff like this a lot (Score:5, Interesting)

by TheSpoom ( 715771 ) writes: <slashdot@u[ ]m00.net ['ber' in gap]> on Wednesday March 31, 2010 @11:41AM (#31688564) Homepage Journal

They did something similar to FB Purity [fbpurity.com], a Greasemonkey script that allows users to filter out apps and other stuff they don't want to see in their feed. Facebook argued that the script was misusing its "FB" trademark... eventually they let it continue under the name "fluff busting purity", probably due to the PR backlash that shutting it down would bring.
They've also shut down the Facebook portion of the Web 2.0 Suicide Machine [suicidemachine.org], which runs scripts that allow a user to delete their social profiles as thoroughly as sites will allow. In that case, they argued that the Suicide Machine was violating their "Statement of Rights and Responsibilities"... which isn't even a law! Nonetheless, the Suicide Machine didn't have the financial ability to fight even frivolous claims like that, so they folded that section.
Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

- Re:Facebook does stuff like this a lot (Score:5, Insightful)
  
  by Anonymous Coward writes: on Wednesday March 31, 2010 @12:01PM (#31688800)
  
  They're not wrong, though. People on FB constantly get outraged at new policies, interfaces and features, but I don't know of anyone who has actually left the site. I am just as bad myself: all I've done is remove everything from my profile and use it as a hub to stay in contact with people all around me. I haven't gone as far as to stop using the site, and I don't think I will. Nor will many people.
  
  - Re: (Score:2)
    
    by ZekoMal ( 1404259 ) writes:
    
    I left the site. Well, I tried to. At first, they told me that I could only "suspend" the account, i.e., people could still send me stuff and FB kept ALL of my data. Outraged, I tried to find an alternative.
    Surprise, surprise. After digging through their FAQ I found an obscure part of it that said you could permanently delete. Here's the problem with it. After you agree to permanently delete, it stays up for two weeks. If you log in even once, it undoes the delete option. Furthermore, there is no guarantee an
  - Re: (Score:2)
    
    by CAIMLAS ( 41445 ) writes:
    
    It's probably something to do with the fact that: eh, you can:
    1) leave the site and have them keep all the data, while at the same time not be able to view your friends' profiles again
    2) stay
- Re: (Score:2, Insightful)
  
  by flabordec ( 984984 ) writes:
  
  Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.
  I'm afraid the average Facebook user is a teen who is more worried about getting a higher score in whatever Flash game she is currently playing than about FB's access policies for computers.
  - Re: (Score:2)
    
    by ZekoMal ( 1404259 ) writes:
    
    This. I tried to convince three friends to quit FB, and they were vehemently against it.
    Three different reasons given:
    1. I have nothing to hide, so why not share everything with everyone?
    2. My privacy settings are on, so it's okay.
    3. I don't care, I want to keep in touch with my friends that live in the same dorm that I also text obsessively and eat every meal with.
    My generation is as anti-privacy as they are anti-copyright; they hate the establishment but love giving said establishment all of their data.
Don't worry... (Score:3, Interesting)

by turbotroll ( 1378271 ) writes: on Wednesday March 31, 2010 @12:06PM (#31688876)

Somebody else will do it again, this time anonymously and with an evil robot that hides its tracks. It only takes Perl, LWP, MySQL, Tor and a little time and imagination to do so.
Fuck you, Zuckerberg.

  - Re: (Score:2)
    
    by Rob the Bold ( 788862 ) writes:
    
    That's an awfully specific set of tools. Don't you think you could have gotten your point across without resorting to dropping names of your pet tools?
    Sure man. He gets a nickel every time someone uses MySQL.
Haha! (Score:2)

by comm2k ( 961394 ) writes:

The most boring of the clusters, the area around Seattle is disappointingly average.
This is data, not protected by copyright (Score:2)

by digitalgimpus ( 468277 ) writes:

I'm not sure copyright law even applies here. No more than it applies to say Google or Yahoo. He scraped DATA from a publicly accessible website as permitted by the robots.txt file. How is this really any different than what Google or Yahoo does? Perhaps the distribution? Though that's hardly significant in this case as the data is already out there. He just organized the presentation. Sounds to me like Facebook just pushing buttons to try and avoid another privacy controversy. /IANAL //Don't use fac
Statement of Rights and Responsibilities, sec. 3-2 (Score:3, Interesting)

by clone53421 ( 1310749 ) writes: on Wednesday March 31, 2010 @02:35PM (#31691046) Journal

You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.
An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.
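
For readers unsure what robots.txt actually expresses, here is a minimal sketch using Python's standard-library urllib.robotparser. It only answers one question: may this user agent fetch this path? An empty file contains no rules, so every path comes back as fetchable; nothing in the format can say anything about terms of service or what may be done with the data afterwards, which is the separate question argued in this thread. The helper name `allowed`, the agent string `ResearchCrawler`, and the example.com URL are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses robots.txt content supplied as a string
    instead of downloading it, so the examples run offline.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "https://www.example.com/people/some-profile"  # hypothetical profile URL

# An empty robots.txt has no rules, so every path is crawlable.
print(allowed("", "ResearchCrawler", URL))  # True

# "Disallow:" with an empty value also permits everything.
print(allowed("User-agent: *\nDisallow:\n", "ResearchCrawler", URL))  # True

# A blanket "Disallow: /" for all agents blocks every path.
print(allowed("User-agent: *\nDisallow: /\n", "ResearchCrawler", URL))  # False

# Rules can single out one crawler while leaving others alone.
rules = "User-agent: ResearchCrawler\nDisallow: /people/\n"
print(allowed(rules, "ResearchCrawler", URL))  # False
print(allowed(rules, "Googlebot", URL))  # True
```

The last pair shows that a site wanting to admit Googlebot but keep out other crawlers can say so in robots.txt itself; whether the absence of such a rule amounts to permission for any particular use is not something the parser can decide.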

- Re: (Score:2)
  
  by Lithdren ( 605362 ) writes:
  
  Putting a line somewhere on your website doesn't mean it applies to everyone who visits your website.
  *Reading this comment entitles the writer of this comment to compensation of no less than 100,000 USD per reading
  I'll assume the check is in the mail, by your logic.
  - Re: (Score:2)
    
    by clone53421 ( 1310749 ) writes:
    
    You are correct. Simply reading it does not mean that.
    If you plan on caching and reusing the data, however, it does mean that you should check for applicable terms and copyrights.
    If I see a nice picture gallery on a website, I’m welcome to click through and admire the pictures. But if I want to save them and publish them elsewhere, I’d better check the bottom of the page and/or the TOS page for any copyright notices. It’s no different.
      - Re: (Score:2)
        
        by clone53421 ( 1310749 ) writes:
        
        First of all, even if there is not a copyright on pure information, there can still be a license on its use. You were given the information under the implicit license that you are a web browser and permitted to do what web browsers do: display the information for someone to read, download, etc. If you vastly expand on that functionality or do something altogether different with the information, you are no longer within the implicit license that was given to you when the server gave you the page. Unless perh
        
      - Re: (Score:2)
        
        by clone53421 ( 1310749 ) writes:
        
        Answering a question in a creative, amusing, or entertaining manner is not a creative work. It's an answer to a question.
        Yes, it is. The fact is not creative, but the presentation is, and if you simply copy the presentation verbatim, you have infringed the copyright on the creative work.
        “Christian” may be a simple fact and not copyrightable, just like phone numbers and addresses are simple fact and you cannot copyright the phone book. However in most phone books some listings are larger and use graphics, colours, and/or borders to emphasize them; this layout is creative and can be copyrighted. Similarly the phrase that someone writ
        
      - Re: (Score:2)
        
        by clone53421 ( 1310749 ) writes:
        
        My point was that what you are calling “raw data” was in fact the copyrightable presentation of raw data.
        If you scanned the pages of the phone book, digitally cropped out the listing for each number (including colours, fonts, graphics, and borders for any listings containing those), re-alphabetized and re-printed those exact duplicates of the listings at 200% for low-sighted individuals (supposing the actual arrangement on the page would be completely different, since you didn’t necessar
- Re: (Score:2)
  
  by Rob the Bold ( 788862 ) writes:
  
  You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.
  An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.
  But has the guy even signed up? We're not talking the Geneva Convention, here. Could Facebook really impose its Facebook Constitution on a non-member? Sure, I understand they'd want to. But wanting and having are two different things, he said, noting the absence of his army of Natalie Portman fembots.
  Do you suggest that this work falls in the realm of unauthorized access? Do you think Facebook has specifically authorized Google? There are Facebook pages in Google's cache. So does Yahoo! And bing, dogp
  - Re: (Score:2)
    
    by clone53421 ( 1310749 ) writes:
    
    I don’t think it falls under unauthorized access... I think it’s unauthorized use of the information.
    Yeah, it’s a much trickier question since a lot of spiders have implicit authorization to use the information. Googlebot will obviously spider it and index it for Google, and this is such a well-established fact — as is the way to prevent it from doing so by robots.txt — that not actively preventing Googlebot from accessing the page is probably pretty good justification for clai
- Re: (Score:2)
  
  by xenobyte ( 446878 ) writes:
  
  An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.
  No, but it's not a ban either.
  Common sense dictates that if data is publicly accessible and not accompanied by a specific usage limitation, you can mine the data and use it for scientific purposes as fair use. This guy did not charge for his results, nor for the compiled data, so it was textbook fair use.
  Remember, he did not use the collected data directly, only the relationships inferred from it. That information is the product of the crawler's compilation, not the data itself, and only the data itself can be
Amusing in light of this story (Score:2)

by Lunix Nutcase ( 1092239 ) writes:

Has anyone else noticed this new banner at the top of Slashdot?
Become a fan of Slashdot on Facebook
It's funny that, for all the railing against Facebook done on Slashdot, Slashdot is advertising for people to become fans of it on Facebook.
It is publicly available (Score:2)

by thetoadwarrior ( 1268702 ) writes:

I fail to see how he did anything wrong. If FB doesn't like it then they can change how their site works.
- Re: (Score:3, Insightful)
  
  by NeutronCowboy ( 896098 ) writes:
  
  Someone ought to mod this up. Facebook's only value is in the information you provide to Facebook about who you are, where you live and who your connections are. As a result, they will defend that little nugget as if their life depended on it - because it does.
  - Re: (Score:3, Informative)
    
    by cdrguru ( 88047 ) writes:
    
    If your position in entering the above motion was that "I'm right, so I should win" and offered nothing else - such as expert witnesses of your own, you are going to war unarmed. Of course you are going to lose.
    The adversarial system is based on the idea that you have to defend your position. Ranting that "I'm right" doesn't count for much - presenting facts, witnesses, expert testimony, etc. is what counts. And doing so in the proper format for the court.
    You are mostly correct that a lawyer would know t
- Re: (Score:2)
  
  by Rob the Bold ( 788862 ) writes:
  
  Twilight was written by a Morman Author.
  Do you mean The Charch of Jesas Chrast of Lattar Day Saants?
