
Web Scraping Doesn't Violate Anti-Hacking Law, Appeal Court Rules (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: Scraping a public website without the approval of the website's owner isn't a violation of the Computer Fraud and Abuse Act, an appeals court ruled on Monday. The ruling comes in a legal battle that pits Microsoft-owned LinkedIn against a small data-analytics company called hiQ Labs. HiQ scrapes data from the public profiles of LinkedIn users, then uses the data to help companies better understand their own workforces. After tolerating hiQ's scraping activities for several years, LinkedIn sent the company a cease-and-desist letter in 2017 demanding that hiQ stop harvesting data from LinkedIn profiles. Among other things, LinkedIn argued that hiQ was violating the Computer Fraud and Abuse Act, America's main anti-hacking law.

This posed an existential threat to hiQ because the LinkedIn website is hiQ's main source of data about clients' employees. So hiQ sued LinkedIn, seeking not only a declaration that its scraping activities were not hacking but also an order banning LinkedIn from interfering. A trial court sided with hiQ in 2017. On Monday, the 9th Circuit Appeals Court agreed with the lower court, holding that the Computer Fraud and Abuse Act simply doesn't apply to information that's available to the general public. [...] By contrast, hiQ is only scraping information from public LinkedIn profiles. By definition, any member of the public has authorization to access this information. LinkedIn argued that it could selectively revoke that authorization using a cease-and-desist letter. But the 9th Circuit found this unpersuasive. Ignoring a cease-and-desist letter isn't analogous to hacking into a private computer system.
"The CFAA was enacted to prevent intentional intrusion onto someone else's computer -- specifically computer hacking," a three-judge panel wrote. The court notes that members debating the law repeatedly drew analogies to physical crimes like breaking and entering. In the 9th Circuit's view, this implies that the CFAA only applies to information or computer systems that were private to start with -- something website owners typically signal with a password requirement.

The court notes that when the CFAA was first enacted in the 1980s, it only applied to certain categories of computers that had military, financial, or other sensitive data. "None of the computers to which the CFAA initially applied were accessible to the general public," the court writes. "Affirmative authorization of some kind was presumptively required."

  • by SuperKendall ( 25149 ) on Monday September 09, 2019 @06:17PM (#59175654)

    I am really glad to see this ruling; way too many companies want to publicly put data on websites that anyone can access, but then act like it's confidential.

    If you are going to put data up for all to see anyone should be allowed to examine and make use of that in other contexts.

    • by fahrbot-bot ( 874524 ) on Monday September 09, 2019 @06:50PM (#59175736)

      I am really glad to see this ruling; way too many companies want to publicly put data on websites that anyone can access, but then act like it's confidential.

      If you are going to put data up for all to see anyone should be allowed to examine and make use of that in other contexts.

      Or companies/people that send you a physical letter (or email) with a confidentiality and/or copyright notice tacked on. Um... hate to tell them, it's *my* property now and I'll do what I want with it. (Unless, for pedantic readers, I've already entered into some sort of NDA with the sender.)

      • Well, it is yours, sure, but the copyright remains with the company. You can do as you wish with the copy they sent you, except for making additional copies of it and distributing them, except within narrow parameters allowed for by law.
  • by supernova87a ( 532540 ) <kepler1@@@hotmail...com> on Monday September 09, 2019 @06:24PM (#59175672)
    That is *not* what this court ruled. It would be nice to get someone with kindergarten legal understanding to check these stories first. What this court ruled on was that the arguments being made by LinkedIn weren't persuasive enough to let them continue blocking HiQ or prevent this case from going to trial.

    But this case is so ridiculous on multiple fronts that although this procedural ruling (injunction) seems technically correct (to allow the case to proceed to actual court), it could just as well have been thrown out with no difference in or ultimate harm to the parties.

    First, LinkedIn makes the claim that its users have a right to privacy against scraping by such a 3rd party. That's laughable. As the court saw, their whole business model is made on people sharing their profiles broadly and mostly to the public.

    Secondly, HiQ claims that LinkedIn's efforts to stop it from using the data are tortious interference. That's bold -- suppose someone is taking your assets (you believe illegally) and selling them to others -- can you imagine the gall that the person taking your assets can sue you for interfering with their subsequent sale of your assets?

    Finally, that LinkedIn resorted to using the computer fraud and anti-terrorism statutes to make their argument is ridiculous.

    So much craziness to go around. I would've just tossed the case, but I guess there is the whole bit about due process... Maybe HiQ will fail anyway at the next substantive trial, but what a waste of time.
    • by Retired ICS ( 6159680 ) on Monday September 09, 2019 @06:42PM (#59175714)

      "Secondly, HiQ claims that LinkedIn's efforts to stop it from using the data are tortious interference. That's bold -- suppose someone is taking your assets (you believe illegally) and selling them to others -- can you imagine the gall that the person taking your assets can sue you for interfering with their subsequent sale of your assets? "

      This is bullshit.

      A more correct analogy would be that someone is BUYING something you are offering for sale at the price you are offering to sell it for, but then RESELLING it at a higher price by bundling it with something else which adds value. In such a case, refusing to continue to do business with the party of the first part (the one buying what you are offering for sale under the conditions on which you are making the offer) is, in fact, tortious interference.

      In this particular case, LinkedIn is "selling" their assets for free, and the bundled value added product is the additional analysis provided by HiQ.

      More interesting, does not the threat to commence proceedings under the CFAA by LinkedIn if HiQ does not cease and desist their activity constitute the crime of extortion? If not, then why not? It certainly would qualify as extortion here in Canada.

      • In this particular case, LinkedIn is "selling" their assets for free

        Surely the quick way to solve this would be simply for LinkedIn to put some terms on the free access to the data. This way it would be free for personal use and not free for large scraping companies to come along and resell. This had better be legal, otherwise various Creative Commons licences, particularly those with 'NC', are in serious trouble.

        • This had better be legal, otherwise various Creative Commons licences, particularly those with 'NC', are in serious trouble.

          Facts are not copyrightable.

      • Bullshit. HiQ didn't hack anyone and that's nonsense, but if LinkedIn can technically stop them from doing what they want to do, then good - fuck HiQ, they don't have any kind of contract that forces LinkedIn to provide data in a way that is digestible by them in any particular format.

        Just like "fair use" isn't a right. Say some music provider discovered magic (work with me here) that literally made it impossible for you to do anything but listen to their music on special headphones. You are unable to make a 10

      • by Luthair ( 847766 )

        I don't think your analogy is apt, so here is another way to think of it - just because a store is open to the public doesn't mean someone can show up with a crew and film a movie in it. The store can tell you to leave, and that if you come back it would be considered trespassing.

        While I don't think that scraping should generally be considered under fraud & abuse, don't you feel that when someone has been told not to access a service any longer and is circumventing safeguards to prevent access that a li

        • "I don't think your analogy is apt, so here is another way to think of it - just because a store is open to the public doesn't mean someone can show up with a crew and film a movie in it. The store can tell you to leave, and that if you come back it would be considered trespassing."

          I cannot tell if you are dense or just being obtuse. Just because a store is open and offers to sell things to the general public does not mean that it is offering to be a place where movies can be filmed. If however it IS offe

          • by Luthair ( 847766 )

            I cannot tell if you are dense or just being obtuse. Just because a store is open and offers to sell things to the general public does not mean that it is offering to be a place where movies can be filmed. If however it IS offering to be a place where any member of the public may enter and film a movie, then it cannot exclude you from doing so.

            Right, so why can't LinkedIn tell this company to stop scraping its service? LinkedIn is not making the information accessible so any asshole can scrape the site; the intent is that human beings can view it.

            • by mishehu ( 712452 )

              I cannot tell if you are dense or just being obtuse. Just because a store is open and offers to sell things to the general public does not mean that it is offering to be a place where movies can be filmed. If however it IS offering to be a place where any member of the public may enter and film a movie, then it cannot exclude you from doing so.

              Right, so why can't LinkedIn tell this company to stop scraping its service. LinkedIn is not making the information accessible so any asshole can scrape the site, the intent is that human beings can view it.

              This is a non-sequitur. It doesn't matter who the intended audience is if the show is available publicly. All can view the show. They might as well tell Google not to spider LinkedIn. (robots.txt is a request and is not legally binding)

              • by Luthair ( 847766 )

                This is a non-sequitur. It doesn't matter who the intended audience is if the show is available publicly. All can view the show. They might as well tell Google not to spider LinkedIn. (robots.txt is a request and is not legally binding)

                See my earlier post in the thread about mapping this to something else in our society.

                • by mishehu ( 712452 )

                  This is a non-sequitur. It doesn't matter who the intended audience is if the show is available publicly. All can view the show. They might as well tell Google not to spider LinkedIn. (robots.txt is a request and is not legally binding)

                  See my earlier post in the thread about mapping this to something else in our society.

                  And now we know for certain that you are deliberately being obtuse. That is not an applicable scenario that you attempted to map to, and somebody already had told you why.

      • by DarkOx ( 621550 )

        I would say the correct analogy is more like gathering facts for an almanac. HiQ is presumably combining the information from LinkedIn with data from elsewhere. They are making a material effort in the form of gathering facts. Suppose I was writing a "Who's Who in whatever" and walked into your uni and saw you coming from a classroom. Suppose I then asked who the professor is and what kind of research s/he does. You can't then claim you own that information if I publish it in my book alongside other data.

        Lin

        • by tlhIngan ( 30335 )

          LinkedIn shares data promiscuously; they can't claim ownership of the facts. I can't republish their entire database but I certainly can republish facts from it.

          Why not? The service HiQ provides is making a database available - they scraped LinkedIn and somehow converted that information into a much more useful form for their customers. The value add for their customers is the ability to query the database how and when they like, rather than using LinkedIn's limited views of the same.

          LinkedIn is probably mad b

    • by Falos ( 2905315 )

      >>that LinkedIn resorted to using the computer fraud and anti-terrorism statutes to make their argument is ridiculous

      The CFAA gets resorted to nonstop, it's glorious in a sort of "you can use duct tape to solve ANYTHING" kind of way.

      Except it's weaponized by those with the trappings of legitimacy; so, less glorious about fixing sheds and shoes, more glorious in a ghastly and dystopic kind of way.

    • by Dutch Gun ( 899105 ) on Monday September 09, 2019 @07:19PM (#59175778)

      First, LinkedIn makes the claim that its users have a right to privacy against scraping by such a 3rd party. That's laughable.

      Agreed, that's just absurd. I use LinkedIn to store my public resume. I do this with the full understanding and expectation that anyone can freely access this data. There is ZERO expectation of privacy there, at least on my part - quite the opposite, in fact. It's better for me if as many people as possible have easy access to my professional information.

      • by rtb61 ( 674572 )

        Do you realise they are using that data to target and manipulate employees things like flight risks (what trying to escape a shitty company, targeted for threats of immediate no notice firing, seriously). Clearly M$ wants to take on the whole spy on employee market and active retribution tactics for any infractions of corporate law. Seriously people would be deleting their chain-LinkedIN to the life profile, your are setting yourself up for gross abuse by M$ and the employers who contract out management and

        • There's always a danger of information being used against employees. But I think you just have to be aware of that. I make a 100% separation between personal and professional data when I'm at work, as a matter of absolute policy. I don't look at my LinkedIn page at work, nor any other personal page where I log in. If I browse Slashdot at lunchtime, I do so anonymously. Essentially, I assume an employer is monitoring everything I do on the company computer, and I act accordingly. It's their hardware an

    • This case is analogous to people taking your picture in public. In public you have no expectation of privacy. If you put data on the internet that is supposed to be accessible to the internet (e.g. LinkedIn public profile data), then you should have no expectation of privacy or who accesses it, human or bot. That essentially is why the case didn't go forward.
  • Is the principle I live by.
  • Could not LinkedIn use a scheme where the first five hundred accesses a month are free, and then ramp up the cost for subsequent accesses? That way they can set the threshold for what they consider "legitimate" user of their service by number of free accesses and then ramp up the cost as they see fit?

    Can they make an EULA requirement that prohibits re-sale of data and derivative works? Usage restrictions that the data can only be used for hiring purposes?
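
    As a rough illustration of the "first five hundred accesses free, then ramp up" idea, here is a minimal sketch. The free allowance of 500 comes from the comment above; the base price, block size, and doubling factor are made-up values for the example, not anything LinkedIn is known to use.

```python
def monthly_cost(accesses, free=500, base_price=0.01, block=1000, ramp=2.0):
    """Return the monthly charge for a given number of accesses.

    The first `free` accesses cost nothing. After that, accesses are billed
    in blocks of `block`, and the per-access price is multiplied by `ramp`
    for each successive block. All pricing values are hypothetical.
    """
    billable = max(0, accesses - free)
    cost = 0.0
    price = base_price
    while billable > 0:
        in_this_block = min(billable, block)
        cost += in_this_block * price
        billable -= in_this_block
        price *= ramp  # each further block is more expensive per access
    return cost


if __name__ == "__main__":
    for n in (100, 500, 1500, 2500, 10500):
        print(n, round(monthly_cost(n), 2))
```

    A casual visitor never leaves the free tier, while a bulk scraper's bill grows geometrically with volume. Whether such terms would be enforceable against a visitor who never created an account is a separate, legal question.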

  • Isn't data mining on someone's site also using the website's bandwidth? That's the same as asking the government for papers: you need to pay for the paper and for the employee it takes to get it for you. Data mining costs the website money, right? They can call it scraping, but it's really just data mining for profit using the website's resources. I'm not sure how they are mining, but my guess is that they avoid seeing ads and being mined themselves, which is something the site makes its own money from.
    • by DarkOx ( 621550 )

      Then it's up to the site to control what requests are fulfilled under what conditions. I can't just barge into a government office, local, state, or federal, and demand information. They create a barrier to entry: I have to complete the proper request paperwork first and file it with the proper people/office. Only then can my request be fulfilled, and often only if I pay.

      LinkedIn can ask for a credit card or captcha if they want to stop scraping. They can even do what Google does and let you perform some large

  • by oneiros27 ( 46144 ) on Tuesday September 10, 2019 @09:55AM (#59177016) Homepage

    The article is rather lacking in details.

    I assume LinkedIn had a robots.txt file, and hiQ tried to get around it. LinkedIn would also have had some sort of EULA.

    For those who haven't managed web servers: robots.txt dates from the early days, and in it you'd list the pages that you didn't want web crawlers to scrape. It's roughly equivalent to 'no photography allowed' signs in a museum. You could specify the name of the bot, and then the pages that bot was blocked from looking at. It wasn't supposed to be so you could favor one search engine over another ... it was because there were some that were well behaved, and others that would just run amok and DoS your server.

    The problem was that you ended up listing all of the places that you *didn't* want people looking. So it might be that you were blocking off a bunch of CGIs that were expensive to run if they weren't called with the right POST, but most people would also list the password protected bits of their website. (and in those days, there were a ton of sites using basic auth)
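
    To make the per-bot rules concrete, here is a small sketch using Python's standard `urllib.robotparser`. The bot names and paths are invented for the example; this is not LinkedIn's actual robots.txt. It also shows the point made above: the file only answers "may I fetch this?" for a crawler that bothers to ask.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one misbehaving bot is shut out entirely, and
# everyone else is asked to stay out of two directories.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler checks before each request; nothing forces it to.
print(parser.can_fetch("BadBot", "/index.html"))       # blocked by its own entry
print(parser.can_fetch("GoodBot", "/index.html"))      # allowed by the * entry
print(parser.can_fetch("GoodBot", "/cgi-bin/search"))  # disallowed by the * entry
```

    Note that the `Disallow` lines double as a public list of the places the site owner would rather nobody looked, which is exactly the weakness described above.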

    But there were lots of unsavory scrapers out there (eg, ones looking for e-mail addresses), so some of us would do things like redirect them to a page saying that we know they're a bot, and they should go away ... and then give a long loop with sleep() and slowly printing out bogus e-mail addresses. (with some that had a legit server, but if anyone ever sent e-mail to it, we black-holed the server it came from)
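
    The "long loop with sleep()" trick can be sketched as a generator that drips out bogus addresses. This is only an illustration of the idea, not the original code: the function name, the delay, and the `example.invalid` domain are assumptions, and the honeypot addresses with a real mail server mentioned above are not covered.

```python
import random
import string
import time


def tarpit_lines(count, delay_seconds=5.0, domain="example.invalid", seed=None):
    """Yield `count` bogus e-mail addresses, pausing before each one.

    A web handler could stream these lines to a suspected harvester so that
    it wastes time collecting addresses that can never receive mail.
    """
    rng = random.Random(seed)
    for _ in range(count):
        time.sleep(delay_seconds)  # keep the bot waiting
        local_part = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        yield f"{local_part}@{domain}\n"


if __name__ == "__main__":
    # Short delay here so the demo finishes quickly.
    for line in tarpit_lines(3, delay_seconds=0.1):
        print(line, end="")
```

    Each open tarpit connection also ties up a worker on the server side, so in practice the delay and the number of simultaneous victims would need a cap.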

    It wasn't until many, many years later that someone came up with 'sitemaps [sitemaps.org]' to tell automated tools what pages there were on a given site, and how often search engines should check them.

    To the best of my knowledge, there's never been a court case to say that robots.txt is legally binding, and if ignoring it counts as trespass or otherwise stealing services. But you'd also have to make a judgement on EULAs on websites that allow you to get information without creating an account. Is it like a 'shrink wrap' agreement, where you're automatically bound to it? How obvious or in-your-face do websites have to make it so it's enforceable?

    Anyway, there are lots of bad analogies out there for what's being done here. I'm not sure which is closest -- it's not quite like taking pictures in a museum and selling prints, unless we're dealing with stuff so old it's out of copyright. Maybe it's closer to taking pictures through someone's windows. ... but without knowing what LinkedIn tried to do to stop the other company, and what the other group did to avoid LinkedIn's measures, I can't tell if hiQ was in the right, downright obnoxious, potentially illegal, or just a little bit sleazy.

    • Robots.txt was the equivalent of an unlocked room with a big sign pointing to it that says "All of the best stuff is in here! Please do not help yourself to its contents."

  • This ruling comes too late for some.

    • Aaron Swartz was using credentials. The information was not "public" it was locked behind a password.
  • Let's say it was ruled violating the anti hacking law, if I were to write the information down on a notepad, and then type it into a database, that would be illegal, right?

      After all, I "scraped" the web page with my eyes and entered what I saw into that database.
