Social Networks Your Rights Online

Facebook Kills Dataset of Crawled Public Profiles 158

holy_calamity writes "Internet entrepreneur Pete Warden wrote a crawler that collated the public profiles of 210 million Facebook users and was set to release an anonymised version to researchers. The pages crawled can be read by any web user, and the robots.txt did not forbid crawling. However, Facebook claimed he had violated its terms of service and threatened legal action. Fearing costs, Warden has now destroyed his dataset. For a snapshot of the insights that data could have allowed, see Warden's post on how the friend networks of the 120 million US users in his data segregated into seven clusters." Of course, if he had it, anyone else who wants it has probably made their own version.
This discussion has been archived. No new comments can be posted.

  • by eldavojohn ( 898314 ) * <eldavojohn@gm a i l . com> on Wednesday March 31, 2010 @10:19AM (#31688234) Journal

    Fearing costs, Warden has now destroyed his dataset.

    Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

    Then Facebook could ask the EFF to protect their users' privacy and information being sold to marketers and corporations (sorry, when you're introduced as "Internet entrepreneur" that means there's profit to be had).

    • by paeanblack ( 191171 ) on Wednesday March 31, 2010 @10:38AM (#31688532)

      Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

      Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

      If you don't like that reality, keep it in mind next time you vote.

      • Re: (Score:2, Redundant)

        by truthsearch ( 249536 )

        Except Facebook is claiming he violated its terms of service (a contract), not the law.

        • by Tobor the Eighth Man ( 13061 ) on Wednesday March 31, 2010 @10:52AM (#31688706)

          Not really a meaningful distinction, as contract law is very much an aspect of the law. We can bicker about whether terms of service are enforceable and to what extent, but the reality is that this guy has better things to do than wage a complex and almost certainly protracted legal battle against a corporation.

          • by dubbreak ( 623656 ) on Wednesday March 31, 2010 @11:24AM (#31689138)

            Not really a meaningful distinction, as contract law is very much an aspect of the law.

            If he was using an account I could see there being an enforceable contract (e.g. if you accept these terms of service we will give you an account). If he was just crawling publicly viewable facebook pages, then what is the consideration? I'd argue there is none and therefore no contract exists. You aren't forced to log in to view many pages and it's not like they even have a click-through "I agree" TOS on each publicly viewable page. He broke no laws and there is no enforceable contract.

            If facebook doesn't want people crawling publicly viewable pages then make them private (login required) or at least have a robots.txt that prohibits crawling of those pages.

            • by Svartalf ( 2997 )

              robots.txt requires that a crawling app HONOR said file.

              • by tibman ( 623933 )

                robots.txt just gives you advice, nobody is required to follow it. http://en.wikipedia.org/wiki/Robots.txt#Disadvantages [wikipedia.org]

                That is my understanding of the thing anyways.. maybe when it becomes a real standard they can do more with it?

                • The point I was trying to make about robots.txt wasn't that it would make it illegal (I really doubt it does), it's that there wasn't one to even suggest that the pages shouldn't be crawled. There was nothing to prevent access and nothing to suggest one shouldn't.

                  I don't know about you, but if I don't want people looking at my house I'll grow a huge hedge so people have to trespass to see it. If I don't want people to access my webpages I'll make them private so they have to hack or have an account to see
                  • A robots.txt file is the internet equivalent of a "No Trespassing" sign.

                    Access controls are more like locked gates.

                    They both make it illegal to enter without permission, but only the second one actually prevents it.

                  • by tibman ( 623933 )

                    ah ok, gotcha. My mistake. Yes i agree with you, it's too bad the threat of legal action is so scary. I don't blame the guy.

              • by gknoy ( 899301 )

                robots.txt requires that a crawling app HONOR said file.

                I believe you mean "suggests" or "requests", rather than "requires". Well-behaved robots do obey robots.txt, but even so, if a URL is accessible, it should be either secured or considered publicly available.

                Unfortunately, I believe some courts likely have considered modifying URLs to be "hacking", in that it's "unauthorized access" -- simply because the server owners *thought* it was inaccessible, rather than actually protected. I hope such lunacy

                • by sgbett ( 739519 )

                  This is an interesting point, I think it is fair to say that modifying of a URL is an intentional act and not an accident?

                  I'm thinking of a scenario whereby a user modifies a URL, say changing a userid to get access to another person's information. Of course the site should prevent this, but if it doesn't, can it be said that no liability lies with the user doing the URL modifying?

                  Not trolling, interested in viewpoints. Sane ones preferably!

              • by AHuxley ( 892839 )
                robots.txt requires that a crawling app HONOR said file.
                Think of it more as a 'do not display to public' flag.
        • Re: (Score:3, Funny)

          Except Facebook is claiming he violated its terms of service (a contract), not the law.

          To me, this claim seems to be as legitimate as a public library claiming that I read too many books and threatening to sue me.

        • So Facebook is claiming that their ToS is binding for anyone that can view the publicly available profile information they're posting on the web for all to see? (Basically, anyone with a web connection?)

          I shall now claim everyone in existence is bound by my ToS, then, b/c I too have a web page on the intarpip3s. And you are in violation of it, sir/madam. Pay!

      • Re: (Score:3, Insightful)

        Couldn't Warden have sent requests to the EFF to provide lawyers so he could fight an evil corporation to use freely publicly available information?

        Finding something on the web does not give you the legal authority to publish and redistribute it. Sure, he could have stuck the whole thing on a torrent somewhere, but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law.

        If you don't like that reality, keep it in mind next time you vote.

        I'm not sure what he did was not legal; but the article is pretty clear he doesn't have the resources to fight it in court and so decided to destroy it. Maybe someone with more money and time may someday decide to fight it and the legality of scraping information will be clarified by a court.

        To me, the real question is how do TOS square with robot files? Given the generally accepted and followed practice of their use; does not forbidding crawling implicitly allow the data to be collected and used as th

      • Yes, but you can collect data and publish it as such. Scientific data, not data in the computer sense.
        He should have kept his mouth shut, compiled the data, and then just submitted it to a number of journals. At that point Facebook would need to go after the journals. Facebook would have a tough time winning, and even if they did win, going after the journals would be bad PR. So no real win there. Their best bet would be to actually help him after the fact and look at the data to ensure that "an individual's privacy has not been violated".

        The data on social networking sites is amazing and could teach us a lot about human nature.

      • Re: (Score:3, Informative)

        by Rantastic ( 583764 )

        Finding something on the web does not give you the legal authority to publish and redistribute it.

        Nonsense.

        Allow me to call your attention to Fair use, a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as for commentary, criticism, news reporting, research, teaching or scholarship.

        Of course, none of that is actually relevant as Facebook is not making a copyright claim. They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

        If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it.

        • Re: (Score:3, Informative)

          by clone53421 ( 1310749 )

          They are claiming he violated their terms of use. I just scanned it and the only seemingly relevant text I can find is

          Here. [74.125.95.132]

      • Re: (Score:3, Interesting)

        by The Moof ( 859402 )

        but if he actually wants to do real work and real research with these data, he's got to play by the rules of the real world...

        The summary says the crawler simply indexed public information. Why is this relevant? Well, recently, I noticed that Facebook Apps, all of which I have all disabled and blocked via my privacy settings, have started accessing my information again. Naturally, I assumed something got reset and started hunting for the settings again. Until I found this new block of text in all of their privacy settings:

        When you visit a Facebook-enhanced application or website, it may access any information you have made visible to Everyone Edit Profile Privacy as well as your publicly available information. This includes your Name, Profile Picture, Gender, Current City, Networks, Friend List, and Pages. The application will request your permission to access any additional information it needs.

        So they claim they can't stop people from acquiring and using my 'publicly available' information, because

      • "he's got to play by the rules of the real world...the one with the big blue ceiling and a concept called the rule of law."

        Which are bought and sold by lobbyists. The law is such a joke because it always kowtows in some way or another to private interests.

      • by CAIMLAS ( 41445 )

        Finding something on the web does not give you the legal authority to publish and redistribute it

        At the same time, if he never agreed to the EULA (and they did not require him to do so in order to read the content) then he's probably over-reacting in deleting the data. What laws might he be breaking, here? I'm not aware of any - though he was certainly setting himself up for wanton litigation on account of the bad publicity.

        This isn't wanton publishing of said data. It's a 'derivative work'. Think: someone canvassing an area for which kinds of grass are seeded in people's yards (and how well its

      • Finding something on the web does not give you the legal authority to publish and redistribute it.

        The US doesn't have "database copyright". The US has Feist vs. Rural Telephone, which says that "facts" can't be copyrighted. It's legal to scan in a phone book and load the address info into a database. You just can't reproduce the page layout; that's covered by copyright. That decision created the third-party phone book industry and began the era of widespread data mining.

        The EULA issue is harder. I

    • by mwvdlee ( 775178 )

      Assuming all those profiles were indeed publicly available without having to log in to facebook, how could he have ever violated terms of service if he never agreed to any terms of service?

      Am I to assume that anybody that has the misfortune to view a facebook profile without being a facebook member is automagically in violation of facebook's terms of service?

  • by John Hasler ( 414242 ) on Wednesday March 31, 2010 @10:24AM (#31688298) Homepage

    ...you'd be flaming them for invading your "privacy".

    • by Chirs ( 87576 ) on Wednesday March 31, 2010 @10:29AM (#31688398)

      I see very little problem with an automated scan that respects robots.txt.

      By not blocking automated access to the profiles, facebook is squarely at fault.

      • by way2trivial ( 601132 ) on Wednesday March 31, 2010 @10:46AM (#31688632) Homepage Journal

        I'm sorry- it is..

        robots.txt allows you to "refuse a specific named bot" or "refuse everyone" or "allow everything" or "allow these directories" or "only allow these directories"
        (want a fascinating read? try robots.txt at your favorite government site- whitehouse.gov used to be fascinating stuff)
        there is no way in robots.txt to permit crawling based on intent of information use like a CC license does

        I can- with photographs, have a creative commons license that says "use it for anything" "use it with credit to me" "free for non-commercial" etc.
        I would WANT google to see my site, I would want bing to see my site- for the purposes of indexing in a search engine.
        I can't say in robots.txt
        "come in and index for search engines and relevance- but you may not use the data to collect information on our membership for marketing to or marketing their info to others"

        If I build a website all about-- coffee- I want the information available to the general public,but from/on my site....
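A rough sketch of the point above, using Python's standard `urllib.robotparser`: the rules a robots.txt can express are keyed on the crawler's name and the URL path, and nothing in them describes what the crawler intends to do with the data. The bot names, the `/members/` directory, and example.com are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one named bot entirely and keep every
# other bot out of a single directory. All names here are illustrative.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /members/
"""


def build_parser(text):
    """Parse robots.txt content that is already in memory (no network)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# The only inputs to the decision are a user-agent string and a URL.
bad_home = rules.can_fetch("BadBot", "http://example.com/coffee.html")
good_home = rules.can_fetch("GoodBot", "http://example.com/coffee.html")
good_members = rules.can_fetch("GoodBot", "http://example.com/members/alice")

print(bad_home, good_home, good_members)  # prints: False True False
```

A crawler that wants to honor the file calls `can_fetch` before each request; a crawler that does not simply never asks, which is why the file works as a request rather than an access control.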

        • Re: (Score:3, Informative)

          by truthsearch ( 249536 )

          So you block all of your content from being indexed by Google? Because Google's also using your content for marketing.

          Also, robots.txt doesn't refuse anything to anyone. It's just a suggestion that any system can ignore. If you don't want systems "seeing" your content, then you must remove your content from the internet or put it behind a wall. A crawler is just another client like a web browser. The internet is intentionally built without discrimination.

          • by way2trivial ( 601132 ) on Wednesday March 31, 2010 @11:11AM (#31688958) Homepage Journal

            and I really think it is worth making.

            Copyright protections are important, the snippet of text that google uses to let people know my site is relevant is easily fair use
            I don't have a problem with it- I welcome it as it's beneficial for both myself and google for it to be there.

            the ENTIRE TEXT of my site- copied and recopied to put into a web page that exists only to generate ad-sense revenue by a third party is not.
            and if robots.txt had a 'license' mode, I'd have a much stronger case of protections if I chose to pursue a blatant copying and re-publication of my site.

            robots.txt labels that I wish there were include
            'allow function:indexing'
            'disallow function:total and complete reproduction'
            'disallow function: total and complete reproduction for XXX days'
            (so I can allow wayback machine and equivalents)
            'disallow function: aggregate data collection'
            'disallow function: user data collection'
            'disallow function: email collection'

            looking at amazon, http://www.amazon.com/robots.txt [amazon.com]
            they somewhat do this by putting the information they don't want out in the wild in its own directories
            then disallowing those directories- actually, now that I look at it- it's a neat way to go..
            but I'd still prefer a robots.txt option that different 'intended use of data to be crawled' permissions covered

            • the ENTIRE TEXT of my site- copied and recopied to put into a web page that exists only to generate ad-sense revenue by a third party is not.

              You mean like google cache? I actually agree with you overall -- it's my data, not yours. You may not publicly exhibit copies of it for your own benefit. It's just that it's a difficult line to draw, in large part because of omnibus monetizing service providers like Google.

            • by Hatta ( 162192 )

              Copyright protections are important

              Copyright is irrelevant here. Facts are not copyrightable. This data from Facebook is no different than the collection of data in the phone book. Republishing a page from Facebook or the phone book is illegal. Republishing facts sourced from those pages is not.

        • by Ksevio ( 865461 )

          User-agent: *
          Crawl-delay: 10

          Sitemap: http://www.whitehouse.gov/feed/media/video-audio

          That's not such an interesting read these days it seems.
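For what it's worth, here is how a standard parser reads the three lines quoted above. This is only a sketch with Python's `urllib.robotparser`, fed the text as pasted in this comment rather than fetched from the live site; `site_maps()` assumes Python 3.8 or later, and the page URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content exactly as quoted in the comment above.
WHITEHOUSE_ROBOTS = """\
User-agent: *
Crawl-delay: 10

Sitemap: http://www.whitehouse.gov/feed/media/video-audio
"""

parser = RobotFileParser()
parser.parse(WHITEHOUSE_ROBOTS.splitlines())

# No Disallow lines, so every path is open to every bot...
allowed = parser.can_fetch("anybot", "http://www.whitehouse.gov/any/page")
# ...but bots are asked to wait 10 seconds between requests.
delay = parser.crawl_delay("anybot")
# The Sitemap line is independent of any User-agent group.
sitemaps = parser.site_maps()

print(allowed, delay, sitemaps)
```

So the file amounts to "crawl anything, slowly, and here is a sitemap", which matches the observation that it is no longer much of a read.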

      • by Inda ( 580031 )
        Back in the late nineties I wouldn't have thought twice about downloading a whole site. It wasn't unusual. I had a program for doing it, although I believe the popular browser of the day had a feature that saved a good portion.

        My, how things have changed. GOML.
    • Re: (Score:1, Insightful)

      by Anonymous Coward

      If Facebook had released this information we would be flaming?

      They did and we still are.

      (yes)

    • by 2obvious4u ( 871996 ) on Wednesday March 31, 2010 @10:31AM (#31688434)
      Isn't this the golden egg of Facebook? I thought this is what they were selling. That data is fascinating, it is completely anonymous, yet at the same time very insightful for marketing purposes. I think Facebook is just upset because they plan on selling the same data that Pete was.
      • by NeutronCowboy ( 896098 ) on Wednesday March 31, 2010 @11:49AM (#31689466)

        Most likely. Facebook's gold mine isn't even so much the user information itself - it's the networks that they can build out of the relationship data. As of right now, they haven't figured out a way how to make money from it, but they certainly aren't going to let someone take the most valuable aspect of their system - the network information - and put it out in the open.

        Personally, I hope someone does the same work, but uploads the raw data anonymously to a torrent somewhere.

      • Except Pete can't actually sell the data, that would be a derivative work of their copyrighted web-pages. Sure he has the fair-use ability to publish academic studies, but he'd be limited to using the data internally.
    • by Altus ( 1034 ) on Wednesday March 31, 2010 @10:35AM (#31688486) Homepage

      why do you think they threatened him? they want to sell this data themselves.

    • It seems that many of these data sets are public and easily accessible to analysis. I would find it interesting to simply use various forums like slashdot and have a ranking of who had the most insightful comments by user name. Certainly the data is available as people make it so. It seems that there is a schizophrenic aspect to this, people want to be recognized for what they represent and when they become too famous they get nervous about it.
      I am sure that much of this data is already available in an org
  • by jeffb (2.718) ( 1189693 ) on Wednesday March 31, 2010 @10:32AM (#31688448)

    ...all the researchers who do everything in the open and with proper anonymization.

  • Publicly available (Score:5, Interesting)

    by mdsharpe ( 1051460 ) on Wednesday March 31, 2010 @10:32AM (#31688452)
    Since this is publicly available information, and all he did was send a program to go grab it (much akin to asking your web browser to download it), does this mean Facebook has essentially threatened him for no more than reading too much of Facebook too quickly? Sounds absurd to me.
    • Re: (Score:2, Insightful)

      by CoffeeDog ( 1774202 )
      Just because something is publicly available doesn't mean just anyone is free to reproduce and distribute it. In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them. By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.
      • by Trepidity ( 597 )

        You can't copyright facts though, so it's not clear they would own the dataset, depending on how it were created. For example, while Facebook owns the actual literal webpages on facebook.com, it's questionable whether they own the friend graph, which is simply a fact about how people choose to associate themselves.

      • by cdrguru ( 88047 )

        By your logic it should be legal to photocopy and distribute any book that is available from the public library or record and distribute MP3s of any song that was broadcast on a radio station.

        Legal, maybe not. But it happens every day over the entire planet. And there doesn't seem to be any reasonable way to stop it, so it is going to continue forever.

        Redistribution is the key to the new digital un-economy.

      • In Facebook's TOS their users agree to give Facebook rights to distribute the data they provide to them.

        If Facebook needs to write into their TOS an implied permission to distribute users' data, it says to me that the owners of such data are the users themselves. That being the case, Facebook wouldn't have any standing to make demands about what is done with that data by third parties; that would be the individual users' problem insofar as any of the data might be subject to copyright at all. (Most of i

      • by Phrogman ( 80473 )

        This is more like going into a public library and writing down a list of all the books they have by title, ISBN, placement on the shelves, publisher etc, and then relating that information to show connections between the books. Its all publicly available information and anyone can walk in and look at it, write it down etc.

        The difference here is that Facebook is providing its services free to the public so that THEY can go grab all this information and turn it into a dataset they can sell to corporations tha

    • Not really. It means that Facebook needs to have some data publicly available for users to browse, but that it can't let people take that data out of the Facebook realm. In other words, Facebook knows exactly what it is doing, and is acting in both cases in its best interest.

      Now, does that mean that Facebook's approach makes sense, and would stand up in court? I doubt it, but I don't have the cash to test that theory. Which in turn means that the outcome was just as predictable: Facebook makes up random rul

    • Disclaimer: I work for the company mentioned in the article, though not in a legal role.

      Privacy is dynamic and "publicly available information" is not set in stone - a user could've chosen to hide specific bits of that information a few minutes later, and there doesn't seem to be any update protocol to remove those bits from the scraped DB.

  • chilling effect (Score:5, Insightful)

    by Anonymous Coward on Wednesday March 31, 2010 @10:33AM (#31688464)
    Don't see Facebook going after Google, even though the data that they possess is ostensibly the same as Warden's. The primary diff that I see is that Warden was offering analysis and results for free, not trying to monetize it. Maybe that's what made them mad.
  • All data that exists, and someone can sell somehow, is for sale somewhere, somehow. That's the law of money, which is rather strong. So forget the right to privacy law, it's not working for a long time now, there is no way to enforce it, just like the law prohibiting drugs, it just doesn't work. I don't know the solution, or if it's good or bad, but that's the situation, like it or not. Wikileaks, for example, is a result of this.
  • Besides the obvious (wasting time, too much info being shared with future employers), their privacy and data policies have gotten worse and worse. Once you sign up with them, they own everything you do. Or at least so they believe. From his writing, this researcher was quite open and tried to be as forthcoming as possible. If they had concerns over anonymity, I suspect he would have been happy to discuss the exact data-scrubbing procedure to make sure it's on the level. But instead, these turds reach f

  • (not that it was actually destroyed), but why destroy the dataset? Just post to slashdot, wait for someone to send you a link to chilling effects or eff, then follow up with chilling effects or eff, then release the dataset.

  • Very interesting (Score:3, Informative)

    by Bearhouse ( 1034238 ) on Wednesday March 31, 2010 @10:40AM (#31688552)

    I'll let others debate the 'privacy' issues; (personally I think there's nothing wrong with scraping profile information that people have explicitly made 'public')
    Anyways, just check what he did with it; very interesting: (FTA)
    http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html [typepad.com]
    There must be many, many legit uses this data could be put too...shame it's being killed by NIH syndrome

    • Re: (Score:3, Funny)

      by Bearhouse ( 1034238 )

      ahem, put 'to', of course...

    • There must be many, many legit uses this data could be put too...shame it's being killed by NIH syndrome

      By "NIH syndrome," I assume you're referring to "Not Invented Here." I don't really see what that has to do with this case.

      • Correct on NIH.
        Well, if they were smart, Facebook would already be marketing this data, and/or services based on it, to their users and others.
        One could imagine all kinds of apps; "hey, 20% of your friends are in town 'x', why not go there for a weekend"
        The links to business could be huge, too...
        "Hey, here's a hotel you could stay in..."
        If they proposed those kinds of things, instead of asinine games, then maybe I'd be prepared to take them more seriously, (and not have a problem with their using my 'public

        • The thing is, that I just don't understand why you would use "NIH Syndrome" in this context. That is usually used when somebody in Company X says "Hey, why don't we use this awesome technology to make a better product," but is rebuffed by Company X because the technology was invented by company Y.

          In this example, there is no new technology involved, and Facebook already has the data. What is "not being invented here"? Facebook already invented Facebook, how is Facebook using the data they generated inventin

  • by TheSpoom ( 715771 ) <slashdot@@@uberm00...net> on Wednesday March 31, 2010 @10:41AM (#31688564) Homepage Journal

    They did something similar to FB Purity [fbpurity.com], a Greasemonkey script that allows users to filter out apps and other stuff they don't want to see in their feed. Facebook argued that they were misusing their "FB" trademark... eventually they let them continue under the name "fluff busting purity", probably due to the PR backlash that shutting them down would bring.

    They've also shut down the Facebook portion of the Web 2.0 Suicide Machine [suicidemachine.org], which runs scripts that allow a user to delete their social profiles as thoroughly as sites will allow. In that case, they argued that the Suicide Machine was violating their "Statement of Rights and Responsibilities"... which isn't even a law! Nonetheless, the Suicide Machine didn't have the financial ability to fight even frivolous claims like that, so they folded that section.

    Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

    • by Anonymous Coward on Wednesday March 31, 2010 @11:01AM (#31688800)

      They're not wrong though. People on FB constantly get outraged at new policies, interfaces and features, but I don't know of anyone who has actually left the site. I am just as bad myself; all I've done is remove everything from my profile and just use it as a hub to stay in contact with people all around me, I haven't gone as far as stopping using the site, and I don't think I will. Nor will many people.

      • I left the site. Well, I tried to. At first, they told me that I could only "suspend" the account; ie, people could still send me stuff and FB kept ALL of my data. Outraged, I tried to find an alternative.

        Surprise, surprise. After digging through their FAQ I found an obscure part of it that said you could permanently delete. Here's the problem with it. After you agree to permanently delete, it stays up for two weeks. If you log in even once, it undoes the delete option. Furthermore, there is no guarantee an

      • by CAIMLAS ( 41445 )

        It's probably something to do with the fact that: eh, you can:

        1) leave the site and have them keep all the data, while at the same time not be able to view your friends' profiles again
        2) stay

    • Re: (Score:2, Insightful)

      by flabordec ( 984984 )

      Facebook apparently believes that its users will continue using the site regardless of the ridiculous access policies that their legal department create and defend. I hope they're wrong.

      I'm afraid the average Facebook user is a teen who is more worried with getting a higher score in whatever Flash game she is currently playing than in FB's access policies for computers.

      • This. I tried to convince three friends to quit FB, and they were vehemently against it.

        Three different reasons given:

        1. I have nothing to hide, so why not share everything with everyone?

        2. My privacy settings are on, so it's okay.

        3. I don't care, I want to keep in touch with my friends that live in the same dorm that I also text obsessively and eat every meal with.

        My generation is as anti-privacy as they are anti-copyright; they hate the establishment but love giving said establishment all of their data.

  • Don't worry... (Score:3, Interesting)

    by turbotroll ( 1378271 ) on Wednesday March 31, 2010 @11:06AM (#31688876)

    Somebody else will do it again, this time anonymously and with an evil robot that hides its tracks. It only takes perl, LWP, MySQL, tor and a little time and imagination to do so.

    Fuck you, Zuckerberg.

  • The most boring of the clusters, the area around Seattle is disappointingly average.

  • I'm not sure copyright law even applies here. No more than it applies to say Google or Yahoo. He scraped DATA from a publicly accessible website as permitted by the robots.txt file. How is this really any different than what Google or Yahoo does? Perhaps the distribution? Though that's hardly significant in this case as the data is already out there. He just organized the presentation. Sounds to me like Facebook just pushing buttons to try and avoid another privacy controversy. /IANAL //Don't use fac
  • by clone53421 ( 1310749 ) on Wednesday March 31, 2010 @01:35PM (#31691046) Journal

    You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.

    An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.
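To separate the technical question from the legal one: a standard parser does treat an empty robots.txt as "nothing is disallowed". The sketch below uses Python's `urllib.robotparser`; the crawler name, the helper `allows`, and the URL are hypothetical. It shows only what the file itself says and has no bearing on whether a terms-of-service clause like the one quoted above applies.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_lines, agent, url):
    """Return what a robots.txt (given as a list of lines) says about agent/url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


url = "http://example.com/people/someone"

# An empty file contains no rules, so the parser reports the URL as fetchable.
ok_when_empty = allows([], "ExampleCrawler", url)

# For contrast, a blanket disallow is the smallest file that asks bots to stay out.
ok_when_disallowed = allows(["User-agent: *", "Disallow: /"], "ExampleCrawler", url)

print(ok_when_empty, ok_when_disallowed)  # prints: True False
```

In other words, the file can only say "please don't fetch this"; permission to reuse what was fetched is a separate matter that robots.txt has no vocabulary for.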

    • Putting a line somewhere on your website doesn't mean it applies to everyone who visits your website.
      *Reading this comment entitles the writer of this comment to compensation of no less than 100,000 USD per reading
      I'll assume the check is in the mail, by your logic.
      • You are correct. Simply reading it does not mean that.

        If you plan on caching and reusing the data, however, it does mean that you should check for applicable terms and copyrights.

        If I see a nice picture gallery on a website, I’m welcome to click through and admire the pictures. But if I want to save them and publish them elsewhere, I’d better check the bottom of the page and/or the TOS page for any copyright notices. It’s no different.

    • You will not collect users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.

      An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.

      But has the guy even signed up? We're not talking the Geneva Convention, here. Could facebook really impose its facebook Constitution on a non member? Sure I understand they'd want to. But wanting and having are two different things, he said, noting the absence of his army of Natalie Portman fembots.

      Do you suggest that this work falls in the realm of unauthorized access? Do you think facebook has specifically authorized Google? There are facebook pages in Google's cache. So does Yahoo! And bing, dogp

      • I don’t think it falls under unauthorized access... I think it’s unauthorized use of the information.

        Yeah, it’s a much trickier question since a lot of spiders have implicit authorization to use the information. Googlebot will obviously spider it and index it for Google, and this is such a well-established fact — as is the way to prevent it from doing so by robots.txt — that not actively preventing Googlebot from accessing the page is probably pretty good justification for clai

    • An empty robots.txt is not blank-check permission to crawl and use the data for whatever you want.

      No, but it's not a ban either.

      Common sense dictates that if data is publicly accessible and not accompanied by a specific usage limitation, you can mine the data and use it for scientific purposes as fair use. This guy did not charge for his results, nor for the compiled data, so it was textbook fair use.

      Remember, he did not use the collected data directly but only the relationships it implied. That information is the product of the crawler's compilation, not the data itself, and only the data itself can be

  • Has anyone else noticed this new banner at the top of Slashdot?

    Become a fan of Slashdot on Facebook

    It's funny that, for all the railing against Facebook that is done on Slashdot, Slashdot is advertising for people to become fans of it on Facebook.

  • I fail to see how he did anything wrong. If FB doesn't like it then they can change how their site works.
