Web Scraping is Legal, US Appeals Court Reaffirms (techcrunch.com) 78
Good news for archivists, academics, researchers and journalists: Scraping publicly accessible data is legal, according to a U.S. appeals court ruling. From a report: The landmark ruling by the U.S. Ninth Circuit Court of Appeals is the latest in a long-running legal battle brought by LinkedIn aimed at stopping a rival company from scraping personal information from users' public profiles. The case reached the U.S. Supreme Court last year but was sent back to the Ninth Circuit for the original appeals court to re-review the case. In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law.
The Ninth Circuit's decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo. But there have been egregious cases of scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users' data scraped over the years.
If your photo is public, it is public (Score:2, Insightful)
If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.
You can't put a photo out for the world to see, then complain someone actually looked.
Re: (Score:1)
Want privacy? Never leave your home and disconnect everything from the Internet.
Not that hard (Score:2)
and if I take a photo of you in a public spot, then the same applies.
Exactly right.
Want privacy? Never leave your home and disconnect everything from the Internet.
That's a bit extreme don't you think?
If you want privacy, spend more time in private venues where photography is not allowed.
If you want more privacy, don't post to social media (my wife takes this approach). My wife is on the internet all the time, but you can use much of the internet without having to constantly share your view on things. Stay
Re: (Score:2)
I'll be even more pedantic. The only thing that isn't private while you are out in public is what can be viewed from a distance. This isn't just limited to mental privacy (mental states obviously can't be viewed from a distance), but includes things like the state of your undergarments. General rule - if you are showing it off (making it visible) in public, then you can't claim that it is private. If you aren't showing it off (it isn't plainly visible, from a distance), then it can/should be considered
Re:If your photo is public, it is public (Score:5, Informative)
If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.
You can't put a photo out for the world to see, then complain someone actually looked.
No, they cannot "do anything they like with it" unless the copyright holder either waives copyright or provides the image with a license that allows anything to be done with it. Publishing something does not make that work fall into the public domain.
Practically speaking, does not matter much (Score:1)
No, they cannot "do anything they like with it" unless the copyright holder either waives copyright or provides the image with a license that allows anything to be done with it.
Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.
They can do anything they like with it; what happens if they try to share it further is a different matter, and maybe wholly irrelevant depending on where they are
Re: (Score:2)
Re: (Score:1)
Read responses much?
Not my fault you cannot distinguish between what people can do, and what there might be legal repercussions for doing.
In the context of the article (Score:4, Informative)
What doesn't that legally allow exactly? IANAL but that looks like "anything goes" as even re-sharing under identical terms is permitted.
Re: (Score:2)
If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.
You can't put a photo out for the world to see, then complain someone actually looked.
Re: (Score:3)
Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.
Actually, it does control what they can do with it. They can't, for example, scrape it from one site and republish it on another. That would fall under the definition of "do anything they like".
Every post you just make it more and more obvious you're a fucking moron.
Re: (Score:3)
Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.
In most jurisdictions, including the US, derivative works are only one of many aspects of copyright. The fundamental aspects of copyright are the rights to copy and distribute the work.
Again, there is a difference between "publishing a work" and "putting a work into the public domain". "Putting something out in public" can be the former or latter and in the former case a copyright holder can decide the terms of license they want to apply to whoever wants to obtain a copy of it.
Re: (Score:2)
Copyright only controls if others can legally present derivative works, it doesn't control what they can do with data you put out in public to start with.
The first part of this sentence is incorrect on its face - copyright protection covers all forms of copying, not just derivative works.
As for the second part of this sentence, data is generally not eligible for copyright protection, while photographs generally are eligible.
Re:If your photo is public, it is public (Score:5, Informative)
If you have marked a photo as public, then anyone can view it and do anything they like with it. You have inherently agreed to that by not limiting access to that photo.
OK that bold part is wrong. Copyright does not go away because something is posted publicly. The owner of the copyright on the photo can certainly DMCA or pursue by other legal means any use they find they don't like that doesn't fall under some form of fair use (which most people on the internet have no understanding of and think it's a get-out-of-jail-free card for anything copyright related).
Re: (Score:2)
Is my photo public when others post it online? That would mostly be family, friends, co-workers, neighbors, extortionists, etc. What about my phone number? Data about me and you is out there in places you can't find, places that you can't hide. But dedicated data brokers will find it.
You seem to think that nobody shares information online about others.
Re: (Score:1)
Is my photo public when others post it online?
Yes of course it is.
You seem to think that nobody shares information online about others.
Incorrect, I just realize the implication is you need to be careful who you give information to, instead of hopelessly claiming public data is not public.
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
In fact, if you think about it, a digital photo or text article/book is actually, among other things, a single very large unique integer (the concatenation of all the bits in all the bytes of the image or text file.)
The way current copyright law seems to be interpreted, some large integers are copyrightable, because they are an encoding of a copyrightable work. I guess it's the arrangement of information that is copyrightable. It's not
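To make the "one very large integer" point above concrete, here is a minimal Python sketch. The sample bytes and the helper names `bytes_to_int` and `int_to_bytes` are made up for illustration; any file's contents would behave the same way.

```python
def bytes_to_int(data: bytes) -> int:
    """Interpret a byte string as one big-endian unsigned integer."""
    return int.from_bytes(data, byteorder="big")


def int_to_bytes(value: int, length: int) -> bytes:
    """Recover the original bytes; the length is needed to keep leading zero bytes."""
    return value.to_bytes(length, byteorder="big")


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of an image or text file.
    sample = b"any file is just bytes"
    number = bytes_to_int(sample)
    print(number.bit_length(), "bits in the integer")
    # The mapping is reversible, so the integer carries the same information.
    assert int_to_bytes(number, len(sample)) == sample
```

The round trip is lossless only because the byte length is kept alongside the integer, which is one hint that the "arrangement of information" matters and not the bare number.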
Re: (Score:3)
Digital photographs ARE data.
When discussing copyright law, this is very much incorrect.
Re: (Score:2)
Maybe you know the legal answer to THIS different question though:
If I go to the same exact location your digital photo was taken from, and wait for very similar lighting conditions, and capture
Re: (Score:2)
When discussing physical reality, it is very much correct though.
Congratulations? This thread is about copyright law, so this is irrelevant.
If I go to the same exact location your digital photo was taken from, and wait for very similar lighting conditions, and capture a very similar image of the same subject (assume a static subject that is still there) have I violated your copyright?
That is a good question, and I don't know enough case law to know the answer for certain (I'm not a lawyer, I only played one on TV). If you don't know that my photograph exists, it almost certainly isn't a copyright violation, since there's no kind of copying being done. If you go to the same location as my photograph was taken from, holding a copy of my photograph, and do everything you can to recreate the exact same scene, then a
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
That's what the defendants are doing. They're selling the data they scraped from Linkedin and 2 decisions in a row say that it's ok.
They say it does not violate Linkedin's rights when they scrape the data. There is nothing here as it pertains to copyright, where applicable (photos for instance), since the copyright would be held by the creator and not linkedin, and as far as I can see no end users are party to this lawsuit.
Re: (Score:2)
Aaron Swartz (Score:5, Insightful)
Aaron Swartz
Re: Aaron Swartz (Score:2, Flamebait)
These youngins probably don't even know who he is.
Re: (Score:2)
RIP.
Find a DOI on sci-hub.ru in his honor.
Re: (Score:2)
Unfortunately, if there are no consequences for prosecutors (Carmen Ortiz), they can run you through a psychological and financial wringer until the case gets thrown out.
Re: (Score:2)
It doesn't seem like Swartz's case would've been helped by this ruling as the JSTOR documents he scraped are not "publicly accessible on the internet".
That is of course not to say that his prosecution didn't expose glaring flaws with the CFAA and the way that it's applied.
Could be huge (Score:2)
Re:Could be huge (Score:4, Insightful)
Re: (Score:3)
Re: (Score:2)
Collect only the information required to run your service and no more. You are not allowed to let anyone else look at said data unless the gov comes looking for a warrant. Cookies and the like are illegal except for ones containing login info.
All these tech companies do NOT need all the data they are collecting. They just want it.
Don't know why you're happy (Score:5, Insightful)
Re: (Score:2)
It's also good news for everyone out to violate your privacy
The "privacy" of all the data you CHOSE to make public?
Everything LinkedIn has, you entered.
Re:Don't know why you're happy (Score:4, Insightful)
It's also good news for everyone out to violate your privacy
The "privacy" of all the data you CHOSE to make public?
Everything LinkedIn has, you entered.
That photo of you someone posted on Facebook and tagged you in - that too? By showing your face in public where someone might photograph it, you agree to get entered into Clearview's database? And I guess if I lose a hair in a public spot it's fair game for someone to pick it up, sequence the DNA and tell all the insurance companies about that mutation making me susceptible to cancer so they can ramp up my insurance rate? After all I just left that DNA data lying around, hey, it's fair game! By not wearing a spacesuit all the time I consent to being sequenced by anyone who feels like it!
Re: (Score:2)
Re: (Score:2)
That photo of you someone posted on Facebook and tagged you in - that too? By showing your face in public where someone might photograph it, you agree to get entered into Clearview's database? And I guess if I lose a hair in a public spot it's fair game for someone to pick it up, sequence the DNA and tell all the insurance companies about that mutation making me susceptible to cancer so they can ramp up my insurance rate? After all I just left that DNA data lying around, hey, it's fair game! By not wearing a spacesuit all the time I consent to being sequenced by anyone who feels like it!
For photographs, that's where portrait/personality rights (US: rights of publicity) come into play. Basically, you can control public use of any data related to you. I'm not certain what this means for private commercial use (like what Clearview is doing).
Then again, photographs of you are data where you are the data subject. So this definitely falls within the scope of e.g. the European GDPR. So what Clearview is doing is very much illegal where European data subjects are concerned. (not sure whether it's
Re: (Score:3)
It's good because it reaffirms the underlying principle of the Web: if it's publicly viewable, just viewing it is not a violation of the law no matter what method you use to view it or who's doing the viewing for what purpose. If a site wants the data to not be publicly viewable, it's on the site to restrict access to it so it isn't publicly viewable.
It also affirms another principle: you aren't bound by a site's terms and conditions merely by viewing pages that are publicly viewable. If the site wants you to
Re: Don't know why you're happy (Score:2)
You do realise the reason LinkedIn is annoyed is not that your privacy was violated, but that someone else sold it and not them.
Re: (Score:2)
Re: (Score:3)
Think about it this way: when you walk on the street, you are making yourself "publicly available." Tourists are allowed to take pictures that you are in. If you want to take a photo of a monument, you're not going to wait until everyone is out of the camera's field of view just because of "privacy"; the fact that there are identifiable people in tourist photos is "fair use". But if the tourist is a professional photographer and later wants to sell the photo that you are in, the person has to either remov
Maybe don't put your information out there. (Score:2)
Should be a copyright/fair use issue (Score:2)
If you scrape for profit, then that wouldn't be fair use. Archiving, academic, etc., then fair use.
That's all for profit (Score:1, Redundant)
Since when is "academic" use not for profit? Maybe not the profit of the person writing, but certainly that of the journal publishing your paper, and/or the institution they belong to.
Re: (Score:2)
If you scrape for profit, then that wouldn't be fair use. Archiving, academic, etc., then fair use.
The problem is you generally can't copyright a bunch of facts. AT&T tried this ages ago to shut down companies trying to create their own phone books, and the courts shot it down. The content of a LinkedIn profile would mostly be uncopyrightable. Plus the parts that are copyrightable are not actually owned by LinkedIn but by the person who created them (profile photos for instance). The user might have given LinkedIn a license to use them when they signed up, but they didn't give them control over the copyright. So
what about the DMCA and robots.txt? (Score:3, Insightful)
Say a scraping system does not read what the robots.txt file says to do?
The site may be public but the DRM says read only no save.
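For anyone unsure what "reading robots.txt" involves: it is an advisory text file that a well-behaved crawler consults before fetching a URL. A minimal sketch using Python's standard `urllib.robotparser` follows; the robots.txt content, the `example-bot` user agent, the example.com URLs and the `is_allowed` helper are all hypothetical, and nothing in it says anything about the legal weight of the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/private/page", "https://example.com/public/page"):
        print(url, is_allowed(EXAMPLE_ROBOTS_TXT, "example-bot", url))
```

Nothing technical stops a scraper from skipping this check, which is why the file is a posted request and not an access control.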
Re:what about the DMCA and robots.txt? (Score:4)
Re: (Score:2)
robots.txt is not DRM.
Dumb to even have to ask this question, but has this actually been tested in court? Considering some of the other dumb-ass DMCA filings, like Admiral going after ad-blocker domain lists that include their domains because they consider their ad-block-blocker to be DRM and therefore the lists to be circumvention, I'd not assume a robots.txt isn't DRM unless it's been litigated.
Re: (Score:2)
"Breaking and entering" applies when the owner of a property makes a good-faith effort to block entry and this method is bypassed. I know of no country with such a law that imposes a minimum standard of lock.
robots.txt may well apply here. It is certainly a good-faith effort, although closer to the "no trespassing" signs that you see (since it is a posted restriction rather than a physical constraint), and courts may well see them as equivalent.
However, I know of no court case that creates the necessary cas
Web Scraping is Legal *IN THE NINTH CIRCUIT* (Score:1)
1. Alaska
2. Arizona
3. Central District of California
4. Eastern District of California
5. Northern District of California
6. Southern District of California
7. Guam
8. Hawaii
9. Idaho
10. Montana
11. Nevada
12. Northern Mariana Islands
13. Oregon
14. Eastern District of Washington
15. Western District of Washington
Re: (Score:2)
Re: (Score:2)
Except for Qualified Immunity and plenty of other rulings that prove your statement patently false. With Qualified Immunity, the exact nature of a 100% identical prior case has to have been tried in the same circuit to allow a case against a police officer to proceed. So, no, rulings in the circuits clearly do not apply to other circuits - especially when the Supreme Court directly allows differences to exist between the circuits.
I, of course, agree that is an extremely messed up policy and causes all kin
Re: (Score:3)
[Citation Needed] you dumbnuts.
This ruling only applies to the jurisdiction of the Ninth Circuit. If a similar case were filed in a court under another circuit, that court might read the opinion, but they would issue their own opinion, potentially entirely different. Through various means, the ruling can be appealed to the respective appellate court, and if the conflict persists, one of the parties could petition SCOTUS for a writ of certiorari to resolve the split [wikipedia.org].
Maybe surprising, but that's how it works (Score:3)
I totally understand why you might find this surprising, but that IS how it works. It's called the law of circuit doctrine, or the circuit rule. Decisions in the circuit bind only that circuit.
Strangely enough, the different circuits even have different rules for how precedent works in that circuit. In some circuits, a three-judge appeals court can set aside circuit precedent based on intervening SCOTUS cases. In other circuits, that requires en banc review.
You may recall for a time some people were upset
What about the other way around? (Score:3)
If my private information is left exposed by those who were supposed to keep it private, did they commit a crime?
I ask because it reminds me of that case of a journalist who almost got prosecuted for finding a trove of SSNs on a state website.
https://www.businessinsider.in... [businessinsider.in]
Craigslist (Score:2)
is going to be very unhappy about this. I thought of them because they've been fighting particularly hard about this very kind of thing.
Virginia State Police (Score:3)
Someone tell that to the Virginia State Police:
Legal and Illegal Uses. The information on this web site is made available solely to provide information to the public. Information obtained from this site may not be sold, re-hosted, or aggregated into other products or services without the express written permission of the Virginia State Police. Automated data collection (a. k. a. "scraping") is prohibited.
https://sex-offender.vsp.virgi... [virginia.gov]
Re: (Score:2)
Until the case is closed it won't apply in VA. :)
Even at that it's unclear due to the remanding.
But by all means rent a VPS in the 9th Circuit to do your scraping.
Re: (Score:3)
I'm not dead yet. (Score:3)
I believe this is a ruling on a preliminary injunction, not on a final judgement. To obtain a preliminary injunction you have to show that you are likely to win and the harm is irreparable.
In Facebook v. Power Ventures, 844 F.3d 1058 (9th Cir. 2016), the 9th Circuit ruled it illegal to bypass IP-based blocks when scraping. The LinkedIn court didn't appear to address IP blocking, which LinkedIn did and which was also done in Facebook.
Re:I'm not dead yet. (Score:4, Interesting)
The only issue I have with the whole LinkedIn scraping case is that the scraper was granted an injunction preventing LinkedIn from changing its site to stop the scraping.
The scraping might be legal, but that doesn't mean that the site being scraped has to bend over and take it.
That was the main problem with this whole thing - the court saying to LinkedIn that they couldn't break the scraper.
Copyright exists (Score:2)
does this also mean (Score:2)
Does this also mean I can scrape movies, music, and graphics that are available online?
Re: (Score:2)
if they are on public sites that need no login.
Re: (Score:2)
That would include YouTube.
It bears repeating (Score:2)
There is not, and never has been, any such thing as "privacy" online. If it's been posted in a place where someone else can see it, that someone will likely use what they see in an unexpected -- and potentially disagreeable -- way.
'Nuff said.