The Courts United States IT

Web Scraping is Legal, US Appeals Court Reaffirms (techcrunch.com) 78

Good news for archivists, academics, researchers and journalists: Scraping publicly accessible data is legal, according to a U.S. appeals court ruling. From a report: The landmark ruling by the U.S. Ninth Circuit Court of Appeals is the latest in a long-running legal battle brought by LinkedIn aimed at stopping a rival company from scraping personal information from users' public profiles. The case reached the U.S. Supreme Court last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case. In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law.

The Ninth Circuit's decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo. But there have been egregious cases of scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users' data scraped over the years.

This discussion has been archived. No new comments can be posted.

  • If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.

    You can't put a photo out for the world to see, then complain someone actually looked.

    • by Merk42 ( 1906718 )
      and if I take a photo of you in a public spot, then the same applies.
      Want privacy? Never leave your home and disconnect everything from the Internet.
      • and if I take a photo of you in a public spot, then the same applies.

        Exactly right.

        Want privacy? Never leave your home and disconnect everything from the Internet.

        That's a bit extreme don't you think?

        If you want privacy, spend more time in private venues where photography is not allowed.

        If you want more privacy, don't post to social media (my wife takes this approach). My wife is on the internet all the time, but you can use much of the internet without having to constantly share your view on things. Stay

    • by bsolar ( 1176767 ) on Monday April 18, 2022 @04:51PM (#62457710)

      If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.

      You can't put a photo out for the world to see, then complain someone actually looked.

      No, they cannot "do anything they like with it" unless the copyright holder either waives copyright or provides the image with a license that allows them to do anything with it. Publishing something does not make that work fall into the public domain.

      • No, they cannot "do anything they like with it" unless the copyright holder either waives copyright or provides the image with a license that allows them to do anything with it.

        Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.

        They can do anything they like with it; what happens if they try to share it further is a different matter, and maybe wholly irrelevant depending on where they are

        • by EvilSS ( 557649 )
          You might want to look up what the word “anything” means.
          • Read responses much?

            Not my fault you cannot distinguish between what people can do, and what there might be legal repercussions for doing.

            • by martynhare ( 7125343 ) on Monday April 18, 2022 @06:57PM (#62458020)
              The licence grants: "A worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish and process, information and content that you provide through our Services and the services of others, without any further consent, notice and/or compensation to you or others. These rights are limited in the following ways: You can end this license for specific content by deleting such content from the Services, or generally by closing your account, except (a) to the extent you shared it with others as part of the Service and they copied, re-shared it or stored it and (b) for the reasonable time it takes to remove from backup and other systems."

              What doesn't that legally allow exactly? IANAL but that looks like "anything goes" as even re-sharing under identical terms is permitted.
              • by EvilSS ( 557649 )
                Linkedin, then yea, that license is kind of shitty for the end user. But this thread didn't limit it to linkedin :

                If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.

                You can't put a photo out for the world to see, then complain someone actually looked.

            • by EvilSS ( 557649 )

              Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.

              Actually, it does control what they can do with it. They can't, for example, scrape it from one site and republish it on another. That would fall under the definition of "do anything they like".

              Every post you just make it more and more obvious you're a fucking moron.

        • by bsolar ( 1176767 )

          Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.

          In most jurisdictions, including the US, derivative works are only one of many aspects of copyright. The fundamental aspects of copyright are the rights to copy and distribute the work.

          Again, there is a difference between "publishing a work" and "putting a work into the public domain". "Putting something out in public" can be the former or latter and in the former case a copyright holder can decide the terms of license they want to apply to whoever wants to obtain a copy of it.

        • Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.

          The first part of this sentence is incorrect on its face - copyright protection covers all forms of copying, not just derivative works.

          As for the second part of this sentence, data is generally not eligible for copyright protection, while photographs generally are eligible.

    • by EvilSS ( 557649 ) on Monday April 18, 2022 @04:58PM (#62457732)

      If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.

      OK that bold part is wrong. Copyright does not go away because something is posted publicly. The owner of the copyright on the photo can certainly DMCA or pursue by other legal means any use they find they don't like that doesn't fall under some form of fair use (which most people on the internet have no understanding of and think it's a get-out-of-jail-free card for anything copyright related).

    • by swell ( 195815 )

      Is my photo public when others post it online? That would mostly be family, friends, co-workers, neighbors, extortionists, etc. What about my phone number? Data about me and you is out there in places you can't find, places that you can't hide. But dedicated data brokers will find it.

      You seem to think that nobody shares information online about others.

      • Is my photo public when others post it online?

        Yes of course it is.

        You seem to think that nobody shares information online about others.

        Incorrect, I just realize the implication is you need to be careful who you give information to, instead of hopelessly claiming public data is not public.

    • by lsllll ( 830002 )
      Who marked this as insightful? You absolutely cannot do anything you want with it. You can't take a photo I put on my web site and sell it to anybody else (on its own, or embedded into your product) and make money off of it, unless I expressly applied a license to it that allows you to do so. "Looking" is different than "do anything they like with it".
      • That's what the defendants are doing. They're selling the data they scraped from Linkedin and 2 decisions in a row say that it's ok.
        • Are they selling data, or are they selling photographs? Data is generally not eligible for copyright protection.
          • Digital photographs ARE data. So are typewritten stories in digital form.
            In fact, if you think about it, a digital photo or text article/book is actually, among other things, a single very large unique integer (the concatenation of all the bits in all the bytes of the image or text file.)

            The way current copyright law seems to be interpreted, some large integers are copyrightable, because they are an encoding of a copyrightable work. I guess it's the arrangement of information that is copyrightable. It's not
            • Digital photographs ARE data.

              When discussing copyright law, this is very much incorrect.

              • When discussing physical reality, it is very much correct though. The only precise definition of the digital photograph is the sequence of bytes that represents its information. If I take your digital photograph, and brighten it up with simple digital filters, I no longer have your photograph, but a derivative work.

                Maybe you know the legal answer to THIS different question though:

                If I go to the same exact location your digital photo was taken from, and wait for very similar lighting conditions, and capture
                • When discussing physical reality, it is very much correct though.

                  Congratulations? This thread is about copyright law, so this is irrelevant.

                  If I go to the same exact location your digital photo was taken from, and wait for very similar lighting conditions, and capture a very similar image of the same subject (assume a static subject that is still there) have I violated your copyright?

                  That is a good question, and I don't know enough case law to know the answer for certain (I'm not a lawyer, I only played one on TV). If you don't know that my photograph exists, it almost certainly isn't a copyright violation, since there's no kind of copying being done. If you go to the same location as my photograph was taken from, holding a copy of my photograph, and do everything you can to recreate the exact same scene, then a

            • by jbengt ( 874751 )
              A photograph is considered a creative work, hence copyrightable. "Data" in the sense used above is just facts, which are not copyrightable.
              • But your so called creative work's only precise definition is the large set of facts it consists of... facts about the light intensities and wavelengths that reflected off things out there just so and reflected in just the precise directions to hit each of the precise sensors of the sensor grid of the digital camera, each sensor which then captured (physically interacted with) the amount of light, creating an amount of electric voltage or current, which was converted to a binary integer number. Then all tho
        • by EvilSS ( 557649 )

          That's what the defendants are doing. They're selling the data they scraped from Linkedin and 2 decisions in a row say that it's ok.

          They say it does not violate Linkedin's rights when they scrape the data. There is nothing here as it pertains to copyright, where applicable (photos for instance), since the copyright would be held by the creator and not linkedin, and as far as I can see no end users are party to this lawsuit.

  • Aaron Swartz (Score:5, Insightful)

    by franzrogar ( 3986783 ) on Monday April 18, 2022 @04:06PM (#62457590)

    Aaron Swartz

    • These youngins probably don't even know who he is.

    • RIP.

      Find a DOI on sci-hub.ru in his honor.

    • by schwit1 ( 797399 )

      Unfortunately, if there are no consequences for prosecutors (Carmen Ortiz), they can run you through a psychological and financial wringer until the case gets thrown out.

    • It doesn't seem like Swartz's case would've been helped by this ruling as the JSTOR documents he scraped are not "publicly accessible on the internet".

      That is of course not to say that his prosecution didn't expose glaring flaws with the CFAA and the way that it's applied.

  • Passing legislation to restrict how companies collect and use data about us has proven to be very thorny. Whereas if the government simply declines to enforce exclusive property rights over these databases, they become less valuable, so less money will be devoted to collecting them at all.
    • Re:Could be huge (Score:4, Insightful)

      by DarkRookie2 ( 5551422 ) on Monday April 18, 2022 @04:08PM (#62457598)
      It is only thorny because the people abusing the data want to keep abusing it.
      • No, you can't settle the issue of what's good or bad just by using a pejorative word to refer to it. If everybody agreed on what constituted abuse, sure, there would be no issue. So what? The tension between the freedom to act, and freedom from being acted upon, is the underlying issue, and it has no obvious solution. (Although there are examples that almost anybody would agree are egregious).
        • It has an obvious solution.
          Collect only the information required to run your service and no more. You are not allowed to let anyone else look at said data unless the gov comes looking for a warrant. Cookies and the like are illegal except for ones containing login info.

          All these tech companies do NOT need all the data they are collecting. They just want it.
  • by blahabl ( 7651114 ) on Monday April 18, 2022 @04:08PM (#62457600)
    It's also good news for everyone out to violate your privacy even further, collating your publicly available data and selling it to the highest bidder.
    • It's also good news for everyone out to violate your privacy

      The "privacy" of all the data you CHOSE to make public?

      Everything LinkedIn has, you entered.

      • by blahabl ( 7651114 ) on Monday April 18, 2022 @04:57PM (#62457726)

        It's also good news for everyone out to violate your privacy

        The "privacy" of all the data you CHOSE to make public?

        Everything LinkedIn has, you entered.

        That photo of you someone posted on Facebook and tagged you - that too? By showing your face in public where someone might photograph it, you agree to get entered into Clearview's database? And I guess if I lose a hair in a public spot it's fair game for someone to pick it up, sequence the DNA and tell all the insurance companies about that mutation making me susceptible to cancer so they can ramp up my insurance rate? After all I just left that DNA data lying around, hey, it's fair game! By not wearing a spacesuit all the time I consent to being sequenced by anyone who feels like it!

        • I think that you can remove your name from someone else's photo if they tagged you.
        • by Okind ( 556066 )

          That photo of you someone posted on Facebook and tagged you - that too? By showing your face in public where someone might photograph it, you agree to get entered into Clearview's database? And I guess if I lose a hair in a public spot it's fair game for someone to pick it up, sequence the DNA and tell all the insurance companies about that mutation making me susceptible to cancer so they can ramp up my insurance rate? After all I just left that DNA data lying around, hey, it's fair game! By not wearing a spacesuit all the time I consent to being sequenced by anyone who feels like it!

          For photographs, that's where portrait/personality rights (US: rights of publicity) come into play. Basically, you can control public use of any data related to you. I'm not certain what this means for private commercial use (like what Clearview is doing).

          Then again, photographs of you are data where you are the data subject. So this definitely falls within the scope of e.g. the European GDPR. So what Clearview is doing is very much illegal where European data subjects are concerned. (not sure whether it's

    • It's good because it reaffirms the underlying principle of the Web: if it's publicly viewable, just viewing it is not a violation of the law no matter what method you use to view it or who's doing the viewing for what purpose. If a site wants the data to not be publicly viewable, it's on the site to restrict access to it so it isn't publicly viewable.

      It also affirms another principle: you aren't bound by a site's terms and conditions merely by viewing pages that're publicly viewable. If the site wants you to

    • You do realise the reason LinkedIn is annoyed is not that your privacy was violated, but that someone else sold it and not them.

    • Set your account to private, then it can't be scraped unless one of your friends does it.
    • Think about it this way: when you walk on the street, you are making yourself "publicly available." Tourists are allowed to take pictures that you are in. If you want to take a photo of a monument, you're not going to wait until everyone is out of the field of the camera just because of "privacy"; the fact that there are identifiable people in tourist photos is "fair use". But if the tourist is a professional photographer and later wants to sell the photo that you are in, the person has to either remov

  • If you scrape for profit, then that wouldn't be fair use. Archiving, academic, etc., then fair use.

    • Since when is "academic" use not for profit? Maybe not the profit of the person writing, but certainly that of the journal publishing your paper, and/or the institution they belong to.

    • by EvilSS ( 557649 )

      If you scrape for profit, then that wouldn't be fair use. Archiving, academic, etc., then fair use.

      The problem is you generally can't copyright a bunch of facts. AT&T tried this ages ago to shut down companies trying to create their own phone books, and the courts shot it down. The content of a LinkedIn profile would mostly be uncopyrightable. Plus the parts that are copyrightable are not actually owned by LinkedIn but by the person who created them (profile photos for instance). The user might have given LinkedIn a license to use them when they signed up, but they didn't give them control over the copyright. So

  • by Joe_Dragon ( 2206452 ) on Monday April 18, 2022 @04:27PM (#62457650)

    Say a scraping system does not read what the robots.txt file says to do?
    The site may be public but the DRM says read only no save.

    • robots.txt is not DRM.
      • by EvilSS ( 557649 )

        robots.txt is not DRM.

        Dumb to even have to ask this question, but has this actually been tested in court? Considering some of the other dumb ass DMCA filings, like Admiral going after ad-blocker domain lists with their domains because they consider their ad-block-blocker DRM and therefore the lists as circumvention, I'd not assume a robots.txt isn't unless it's been litigated.

      • by jd ( 1658 )

        "Breaking and entering" applies when the owner of a property makes a good-faith effort to block entry and this method is bypassed. I know of no country with such a law that imposes a minimum standard of lock.

        robots.txt may well apply here. It is certainly a good-faith effort, although closer to the "no trespassing" signs that you see (since it is a posted restriction rather than a physical constraint), and courts may well see them as equivalent.

        However, I know of no court case that creates the necessary cas
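
For context on the robots.txt question in this subthread: honoring the file is something a scraper has to do voluntarily in its own code, which is why it resembles a posted sign more than a lock. A minimal sketch in Python using the standard library's urllib.robotparser follows. The robots.txt content, the bot name "example-bot", and the example.com URLs are hypothetical and only illustrate the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it
# from the site's /robots.txt before crawling.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/page"):
        print(url, is_allowed(EXAMPLE_ROBOTS_TXT, "example-bot", url))
```

Nothing in the file itself stops a client that skips this check, which is the point made above that robots.txt is not DRM.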

  • The Ninth circuit only has jurisdiction over the following parts of the country link [uscourts.gov]

    1. Alaska
    2. Arizona
    3. Central District of California
    4. Eastern District of California
    5. Northern District of California
    6. Southern District of California
    7. Guam
    8. Hawaii
    9. Idaho
    10. Montana
    11. Nevada
    12. Northern Mariana Islands
    13. Oregon
    14. Eastern District of Washington
    15. Western District of Washington
    • by lsllll ( 830002 )
      That's not how it works. Having jurisdiction doesn't mean its rulings apply only to those states. It's a federal court. Its rulings apply to all of the U.S. If another federal court finds something to the contrary, then the Supreme Court will get involved on an appeal.
      • Except for Qualified Immunity and plenty of other rulings that prove your statement patently false. With Qualified Immunity, the exact nature of a 100% identical prior case has to have been tried in the same circuit to allow a case against a police officer to proceed. So, no, rulings in the circuits clearly do not apply to other circuits - especially when the Supreme Court directly allows differences to exist between the circuits.

        I, of course, agree that is an extremely messed up policy and causes all kin

      • [Citation Needed] you dumbnuts.

        This ruling only applies to the jurisdiction of the Ninth Circuit. If a similar case were filed in a court under another circuit, that court might read the opinion, but they would issue their own opinion, potentially entirely different. Through various means, the ruling can be appealed to the respective appellate court, and if the conflict persists, one of the parties could petition SCOTUS for a writ of certiorari to resolve the split [wikipedia.org].

      • I totally understand why you might find this surprising, but that IS how it works. It's called the law of the circuit doctrine, or the circuit rule. Decisions in the circuit bind only that circuit.

        Strangely enough, the different circuits even have different rules for how precedent works in that circuit. In some circuits, a three-judge appeals court can set aside circuit precedent based on intervening SCOTUS cases. In other circuits, that requires en banc review.

        You may recall for a time some people were upset

  • by fabioalcor ( 1663783 ) on Monday April 18, 2022 @05:15PM (#62457806)

    If my private information is left exposed by those who were supposed to keep it private, did they commit a crime?
    I ask because it reminds me of that case of a journalist who almost got prosecuted for finding a trove of SSNs on a state website.
    https://www.businessinsider.in... [businessinsider.in]

  • is going to be very unhappy about this. I thought of them because they've been fighting particularly hard about this very kind of thing.

  • by Dan East ( 318230 ) on Monday April 18, 2022 @06:18PM (#62457948) Journal

    Someone tell that to the Virginia State Police:

    Legal and Illegal Uses. The information on this web site is made available solely to provide information to the public. Information obtained from this site may not be sold, re-hosted, or aggregated into other products or services without the express written permission of the Virginia State Police. Automated data collection (a. k. a. "scraping") is prohibited.

    https://sex-offender.vsp.virgi... [virginia.gov]

    • Until the case is closed it won't apply in VA.
      Even at that it's unclear due to the remanding.
      But by all means rent a VPS in the 9th Circuit to do your scraping. :)

    • Yes but (in most cases) it's possible to view (and to do that, you have to initiate a copy of it on to your computer) the information on the website without ever seeing the EULA or copyright notice of the website. I should think that makes such a EULA or notice legally invalid, since it was not agreed, and the notice was not prominently posted and likely to be viewed.
  • by www.sorehands.com ( 142825 ) on Monday April 18, 2022 @06:21PM (#62457954) Homepage

    I believe this is a ruling on a preliminary injunction, not on a final judgement. To obtain a preliminary injunction you have to show that you are likely to win and the harm is irreparable.

    In Facebook v. Power Ventures, 844 F.3d 1058 (9th Cir. 2016), the 9th Circuit ruled it illegal to bypass IP-based blocks when scraping. The LinkedIn court didn't appear to address IP blocking, which LinkedIn did and which was also done in Facebook.

    • Re:I'm not dead yet. (Score:4, Interesting)

      by Richard_at_work ( 517087 ) on Monday April 18, 2022 @07:41PM (#62458114)

      The only issue I have with the whole LinkedIn scraping case is that the scraper was granted an injunction preventing LinkedIn from changing its site to stop the scraping.

      The scraping might be legal, but that doesn't mean that the site being scraped has to bend over and take it.

      That was the main problem with this whole thing - the court saying to LinkedIn that they couldn't break the scraper.

  • If you broadcast a video or any other information to the public, it doesn't mean it's the public's right to do whatever they please with it. It's the entire reason copyright exists.
  • Does this also mean I can scrape movies, music, and graphics that are available online?

  • There is not, and never has been, any such thing as "privacy" online. If it's been posted in a place where someone else can see it, that someone will likely use what they see in an unexpected -- and potentially disagreeable -- way.

    'Nuff said.
