
LinkedIn Says It's Illegal To Scrape Its Website Without Permission (arstechnica.com) 167

Posted by msmash on Monday July 31, 2017 @01:40PM from the my-data,-my-rules dept.

A small company called hiQ is locked in a high-stakes battle over web scraping with LinkedIn. It's a fight that could determine whether an anti-hacking law can be used to curtail the use of scraping tools across the web. From a report: HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting. LinkedIn, which was acquired by Microsoft last year, sent hiQ a cease-and-desist letter warning that this scraping violated the Computer Fraud and Abuse Act, the controversial 1986 law that makes computer hacking a crime. HiQ sued, asking courts to rule that its activities did not, in fact, violate the CFAA. James Grimmelmann, a professor at Cornell Law School, told Ars that the stakes here go well beyond the fate of one little-known company. "Lots of businesses are built on connecting data from a lot of sources," Grimmelmann said. He argued that scraping is a key way that companies bootstrap themselves into "having the scale to do something interesting with that data." [...] But the law may be on the side of LinkedIn -- especially in Northern California, where the case is being heard. In a 2016 ruling, the 9th Circuit Court of Appeals, which has jurisdiction over California, found that a startup called Power Ventures had violated the CFAA when it continued accessing Facebook's servers despite a cease-and-desist letter from Facebook.

This discussion has been archived. No new comments can be posted.


then dont' make it public (Score:5, Insightful)

by Anonymous Coward writes: on Monday July 31, 2017 @01:42PM (#54914951)

don't make it public if you don't want it read

- Re:then dont' make it public (Score:5, Interesting)
  
  by Anonymous Coward writes: on Monday July 31, 2017 @02:07PM (#54915221)
  
  don't make it public if you don't want it read
  They want it read. By people. (And search engines.) They don't want it read by companies that take the information and then sell it as their business model.
  If we support hiQ, saying that scraping publicly-accessible content from another site and then using that for profit is permissible, then doesn't that mean it's also applicable to other sites? Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?
  Sorry hiQ, as much as software and internet legislation is behind the times and technically inappropriate, there are some things in law which follow common sense - and one of them is you can't take someone else's stuff and sell it for yourself. If you want to use their content then you need to follow the (common) practice of establishing some sort of licensing agreement.
  But anyways, what about their user agreement?
  You agree that you will not: [...] Develop, support or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology or manual work) to scrape the Services or otherwise copy profiles and other data from the Services;
  Is that not enough for at least an injunction and civil suit?
  
  - Re:then dont' make it public (Score:4, Insightful)
    
    by BronsCon ( 927697 ) writes: <social@bronstrup.com> on Monday July 31, 2017 @02:19PM (#54915327) Journal
    
    They don't want it read by companies that take the information and then sell it as their business model.
    What do search engines do, then?
    
    - Re: (Score:2)
      
      by tattood ( 855883 ) writes:
      
      What do search engines do, then?
      Search engines create an index that is searchable and make money by selling ads on the search page. Search engines are NOT collecting the website data, making correlations about the data on the website, and selling that data to companies.
      - Re: (Score:2)
        
        by omnichad ( 1198475 ) writes:
        
        1) collecting the website data. Check. The spider downloads all of the text content and stores an index along with contextual relationships.
        2) make correlations about the data on the website. Check. The hyperlinks on the web site are used to evaluate the relative importance of the linked web site.
        3) selling that data to companies. Nearly. They don't charge for the search engine directly - they charge for advertisers and then provide the data for free to visitors. More or less the same thing effectively sp
        
        Re: (Score:2)
        
        by kwbauer ( 1677400 ) writes:
        
        but the "less" and "effectively speaking" are the keys to the whole thing. Along with that pesky little thing called copying without permission for the intended usage. For some concrete, everyday examples, wander into any Catholic or Methodist or LDS (Mormon) church and look through the hymnal. You will find something at the beginning explaining how all the music can be copied for non-commercial use except as otherwise noted. And then you will find that some of the songs are marked with phrases that fit the
      - Re: (Score:2)
        
        by saloomy ( 2817221 ) writes:
        
        Even so they do (display titles), and even so they haven't (figured out a way to display excerpts without collecting data), Google respects a site's robots.txt, which LinkedIn has clearly asked hiQ to do, but hiQ is flouting the EULA. On one hand, I agree: if you don't want it read, keep it private.
        On the other hand: Posting something in the public space effectively gives the readers a right to consume it (akin to reading a book). It does not give the reader the license to freely copy or build upon it (akin
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Right, the issue here is the "we want to let search engines use it without license, but want to require a license for anyone else" attitude. Either require everyone who uses it to license it (even if you don't charge some of them), or require nobody to license it, that's more or less how copyright law works.
        
        Re:then dont' make it public (Score:4, Insightful)
        
        by smooth wombat ( 796938 ) writes: on Monday July 31, 2017 @03:10PM (#54915711) Journal
        
        "we want to let search engines use it without license, but want to require a license for anyone else" attitude.
        
        No, that is not correct. Search engines point to a page and may give a very brief line or so from the article, but one still has to click on the link to go to the real page and read everything.
        
        hiQ goes to the LinkedIn site and rather than pointing to the pages in question, takes the data, packages it, and then sells it to someone else, having left LinkedIn to do all the heavy lifting.
        
        The two are not close.
        
        
        Re:then dont' make it public (Score:5, Interesting)
        
        by BronsCon ( 927697 ) writes: <social@bronstrup.com> on Monday July 31, 2017 @03:32PM (#54915889) Journal
        
        The two are not close.
        They really are, though. LinkedIn has copyright on all of their content, in whole and in part, not just as a whole. That's how copyright works, otherwise I could change a single word in a book and republish it as an original work under its own copyright. It is also important to keep in mind that (most) search engines -- and Google specifically -- don't just grab the page title, META description (or first couple lines of content) and a word/phrase count, they grab the entire content of the page, and they do so in order to display the exact part of the content that contains your search term(s) -- as I mentioned earlier -- rather than a likely irrelevant summary or intro.
        
        To do this, search engines must necessarily use the entire page and not just key pieces of data. That is, Google et al. get away with using more of LinkedIn's pages without license than hiQ is using. Therein lies the problem.
        
        
        Re: (Score:2)
        
        by angel'o'sphere ( 80593 ) writes:
        
        The question is not what they grab/scrape but what they copy and redistribute.
        Google creates a catalog, pointing with every search result to the original.
        HiQ is blatantly violating copyright, privacy rights and EULAs/TOSs.
        If you don't grasp the difference I hope you are not a software developer. Ignorance on that scale can easily be the end of your career.
        
        Re: (Score:2)
        
        by American Patent Guy ( 653432 ) writes:
        
        No, that's not quite correct. LinkedIn only has a copyright in that which (1) they acquired from their employees or other sellers and (2) constitutes a "work of authorship". They do not have a copyright in the content acquired from other sources, e.g. data, phrases, images that originate from members or other sources. The arrangement of information on a page may be a work of authorship, but only if there is some creative aspect to it. Data on a web page is not a work of authorship, and no one has a copyrig
        
        Re:then dont' make it public (Score:4, Insightful)
        
        by alzoron ( 210577 ) writes: on Monday July 31, 2017 @11:09PM (#54917849) Journal
        
        This is not a copyright issue. This is a CFAA issue. It's been long determined that you cannot copyright facts. The CFAA deals with unauthorized access to computer systems. LinkedIn told these companies to stop doing it and they kept doing it. That's a pretty clear case of unauthorized access.
        
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Actually, both violate copyright, but the reality is that they can (and should) choose who they go after. And on those grounds, LinkedIn can and should issue a flurry of DMCA notices and sue for an injunction. They can't sue for damages without registering their copyright, including a copy of the work being registered, but they sure can get an injunction. The CFAA simply does not apply to publicly available data until and unless they get said injunction and even then it would be a hell of a stretch.
        
        Also,
        
        Re: (Score:2)
        
        by kwbauer ( 1677400 ) writes:
        
        They do not violate copyright if the copyright holder says they don't violate copyright. Kind of like how the police cannot charge my neighbor for stealing my lawnmower if I don't care that he has borrowed it even if I didn't give my express permission to him prior to the police asking me about it.
        
        Re: (Score:2)
        
        by saloomy ( 2817221 ) writes:
        
        No. The difference between a search engine and hiQ is that a search engine (Google specifically, but I think all of them) respects the robots.txt instruction set. If you don't want a search engine in your content, then there is an "opt out". Maybe one could argue that this should be an "opt in", but there is a way to say "I don't want to authorize you to scan my page for indexing".
        hiQ has not listened to LinkedIn's "opt-out" in the form of a cease and desist which is a pretty strong indication the content o
        
        Wrong! (Score:4, Informative)
        
        by www.sorehands.com ( 142825 ) writes: on Monday July 31, 2017 @08:47PM (#54917503) Homepage
        
        The CFAA applies immediately or when the defendant (or defendant to be) exceeds the permitted access. This could be also through a cease and desist letter. See Facebook, Inc. v. Power Ventures, Inc., No. 13-17102 (9th Cir. July 12, 2016) https://cdn.ca9.uscourts.gov/d... [uscourts.gov]
        You are permitted to grant different people different terms or access. Look at https://qz.com/981029/a-federa... [qz.com]
        
        
        Re: (Score:2)
        
        by russotto ( 537200 ) writes:
        
        Didn't read it, did you? It does say that a cease&desist can trigger the CFAA. It does not say "The CFAA applies immediately or when the defendant (or defendant to be) exceeds the permitted access". In fact, it specifically says violation of terms of use cannot trigger liability under the CFAA.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        You can't copyright facts, but you can copyright collections of them, provided you've done more than simply compile facts you already had. That is, if you had to collect the facts from multiple sources, your representation of those facts is protected by copyright. LinkedIn's data is a collection of facts, which they collected from multiple sources.
        
        I'm sure you're very familiar with patents, but you've missed some nuance of copyright law in your reply.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Precisely this. Especially important in the case of Facebook or LinkedIn, which you have to actively avoid if you wish to not access them.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        You can, in fact, copyright a representation of facts you've done work to compile from multiple sources. LinkedIn has done work to compile their collection of facts and is entitled to copyright protection.
        
        That is unlike, for example, a local phone directory, because the local phone company is the sole source of that data. A nationwide phone directory, comprising data collected from the various ILECs and CLECs, would be a copyrightable work. The bar being so low in that case would likely mean you'd be fine
        
        Re: (Score:2)
        
        by American Patent Guy ( 653432 ) writes:
        
        A compilation of facts can be copyrighted, but not the underlying data. If this company wants to extract those facts, data or other bits of information and create its own compilation, it violates no one's copyright. It doesn't matter whether those facts came from multiple sources or a single one.
        Even if there were to be a copyright here, the doctrine of implied license and the statutory exclusion of fair use upon infringement would probably apply. By making the data available to anyone over the web, LinkedI
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        A compilation of facts can be copyrighted, but not the underlying data. If this company wants to extract those facts, data or other bits of information and create its own compilation, it violates no one's copyright. It doesn't matter whether those facts came from multiple sources or a single one.
        If, in doing so, a copy of the compiled data is made... well...
        The courts have ruled it is a fair use to record movies on your DVR for your own personal viewing
        Yes, it is.
        and it would arguably be the same for extracting a collection of data from an Internet source
        Sure, for your own personal use.
        provided that the entity didn't compete with that source
        You mean provided the use was not commercial or for profit, right? After all, you're:
        an intellectual property lawyer
        Your appeal to authority does not imply correctness or completeness of understanding; especially so given the argument you just made.
        
        Re: (Score:2)
        
        by American Patent Guy ( 653432 ) writes:
        
        If, in doing so, a copy of the compiled data is made... well...
        ... and because that's how LinkedIn provides the data, the scraper operates under the doctrine of fair use. (There's no other way for it to collect the data.)
        Whether your copying of the data is for personal or business use is not distinguished in the law. Your impact on the market is what counts. This scraper isn't affecting LinkedIn's ability to operate or provide the service that it does. You're free to gather information over the web (or another medium) as much as you like, recompile it, and resell it if
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        So wait, what you're saying is if my use of a short sample from a song doesn't affect the sale of that song, it's fair use? Sorry, Robert Van Winkle wouldn't have settled out of court if it looked like he was going to win against Bowie and Queen. What's happening here is practically the same. So, no, I don't think I'd like you to correct me again, since your "corrections" don't align with reality.
        
        Fair use is determined based on four factors, which I'm sure you're quite familiar with (but hoping I'm not a
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Ugh... I had a whole response typed out, clicked Preview and, in my sleep-addled state, closed the tab without posting. I'm not writing it all again, but the long and short of it is that there are four factors used in determining fair use and you're considering only one of them. I would argue that hiQ's use of LinkedIn's data does harm their ability to collect and sell it; I sure don't want hiQ mining shit about me from LinkedIn for the specific purpose for which they are doing so, so I'm less likely to con
        
        Re: (Score:2)
        
        by American Patent Guy ( 653432 ) writes:
        
        Well, then. Post your "whole response" and perhaps I'll have something to respond to other than your sleep-addled insults. You haven't rebutted what I have said.
        As anyone can download LinkedIn's data, HiQ is doing nothing special in the market. You're less likely to use LinkedIn because you've discovered that it can be used by anyone in a way you don't like. HiQ hasn't impacted the market by scraping it. Your analysis applies to the entire compilation. HiQ is downloading information for individual postings,
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        You haven't rebutted what I have said.
        You haven't read what I have written.
        If you posted such wisdom elsewhere, why not post it here?
        Because I did post it here, in this very thread. It's not my fault you've chosen not to read the thread in its entirety before replying to me, nor is it my responsibility to repeat everything I've ever posted here to every dumbass who can't scroll a page to find it himself.
        
        Sorry, I reserve that level of service for my paying clients, not random armchair quarterbacks who claim to be lawyers yet can't do a simple review of what's already on the page they're looking at be
        
        Re: (Score:3)
        
        by buss_error ( 142273 ) writes:
        
        No, that is not correct.
        I'm of two minds about LinkedIn.
        In the first place, I'm required to have an account by my current employer.
        In the second place, LinkedIn in my opinion does a ton of scraping themselves (asking to access your mail box contacts, for instance.) But at least Linkedin ASKs to access it. Still, it feels creepy to me. The "psycho" girl friend kind of creepy.
        On the third hand, LinkedIn told them to stop. So they should stop.
        
        Re: (Score:2)
        
        by KingMotley ( 944240 ) writes:
        
        Uh no. That's not how copyright works. They have the right to grant or deny the rights to copy their data however they see fit. Their EULA clearly states that you are not allowed to do exactly what HiQ is doing. Copyright, unlike trademarks, does not forfeit its right to enforce it just because they haven't in the past, or have chosen to enforce it in specific instances. HiQ has been directly told they don't have permission, end of story.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Read the rest of my comments, then feel silly.
        
        Re: (Score:2)
        
        by kwbauer ( 1677400 ) writes:
        
        No, they have given permission to search engines to do what search engines do which is to direct traffic to the website from which the data was collected. The search engines are operating within the license given to them by LinkedIn.
        LinkedIn has not given hiQ permission to do what it is doing. Kind of like how a songwriter or copyright holder can give permission to an organization to reprint songs/music in a songbook but restrict others from reprinting those same songs or even photocopying them out of the a
    - Re: (Score:2)
      
      by F.Ultra ( 1673484 ) writes:
      
      Where exactly in the complaint by Linkedin are they telling search engines to not index linkedin.com?
      - Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Not relevant to the argument being made, which was in response to someone claiming they want search engines to be able to do what search engines do then, in the very next sentence, claiming they don't want search engines to be able to do what search engines do.
        
        Re: (Score:2)
        
        by F.Ultra ( 1673484 ) writes:
        
        But it seems that LinkedIn wants precisely that, i.e. for search engines to continue to do what they do but not let hiQ do what hiQ does.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        Right. I was replying to this, though:
        They want it read. By people. (And search engines.) They don't want it read by companies that take the information and then sell it as their business model.
        I was pointing out that search engines "take the information and then sell it as their business model."
        
        Sorry you missed that.
        
        Re: (Score:2)
        
        by F.Ultra ( 1673484 ) writes:
        
        I didn't miss that, it just looks like you think that extracting the title of a page constitutes "take the information and then sell it", something that is covered by fair use. It would be a whole different affair if e.g. Google extracted and resold the amount of information that hiQ does, which of course was the point of the GP.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        I didn't miss that, it just looks like you think that extracting the title of a page constitutes "take the information and then sell it"
        No, it looks like you missed where they're taking the entire content of the page and not just the title, since you keep coming back to "just the title".
        It would be a whole different affair if e.g. Google extracted and resold the amount of information that hiQ does, which of course was the point of the GP.
        So, if Google took only key pieces of information, rather than the entire page, that would be problematic? Because Google takes the whole page, while hiQ takes key pieces of data; Google is actually taking, repackaging, and profiting from more of LinkedIn's data than hiQ is.
        
        But, all of that is still highly irrelevant to what I was replying to.
        
        Re: (Score:2)
        
        by kwbauer ( 1677400 ) writes:
        
        But Google is doing something of which LinkedIn approves and has given Google permission to do. hiQ, on the other hand, is doing something of which LinkedIn does not approve and has not given hiQ permission to do. That is entirely the difference here. LinkedIn believes that they benefit from the way Google indexes their pages and allows them to be searched but LinkedIn believes that what hiQ does is harmful to LinkedIn as it will tend to drive people away.
        I understand that people who have never created anyth
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        But Google is doing something of which LinkedIn approves and has given Google permission to do.
        Have they, though? Or have they simply not asked them to stop?
        I understand that people who have never created anything of value or who believe strongly in socialism have no concept of ownership of property
        Lovely assumption, but incorrect. I, in fact, have created quite a bit of value in this world. Just as a small sample, my clients value me enough to keep me employed long-term and my employees value the income and stability I provide them. So, then, you must think I'm a socialist? Why is that? Wait, no, you can't possibly think I have no concept of ownership of property when I've stated that LinkedIn has ownership of the data they've collected.
    - Re: (Score:2)
      
      by bws111 ( 1216812 ) writes:
      
      It doesn't matter what search engines do. The owner of the site is perfectly within his rights to say 'these accesses are allowed, these are not'.
      Being indexed by a search engine is probably beneficial to LinkedIn. Both parties gain from being indexed, it is a symbiotic relationship.
      HiQ is probably not beneficial. By ratting out LinkedIn's users to their employers they are potentially decreasing the number of people who will use LinkedIn. That is a parasitic relationship.
      - Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        The owner of the site is perfectly within his rights to say 'these accesses are allowed, these are not'.
        Yes, and they can do that with HTTP 200 and HTTP 403 status codes, respectively.
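For anyone wondering what that looks like in practice, here is a minimal sketch of a server-side decision between 200 and 403 based on the client's User-Agent. The blocked name `examplescraper` and the helper `status_for` are hypothetical, made up for illustration; nothing here reflects how LinkedIn actually serves or blocks requests.

```python
from http import HTTPStatus

# Hypothetical deny list: product names the site owner refuses to serve.
BLOCKED_AGENTS = {"examplescraper"}


def status_for(user_agent: str) -> HTTPStatus:
    """Return 403 for a blocked client, 200 for everyone else.

    Only the product name before the first "/" is compared,
    case-insensitively (e.g. "ExampleScraper/1.0" -> "examplescraper").
    """
    product = user_agent.split("/")[0].strip().lower()
    if product in BLOCKED_AGENTS:
        return HTTPStatus.FORBIDDEN  # "these accesses are not allowed"
    return HTTPStatus.OK  # "these accesses are allowed"
```

A User-Agent header is trivially spoofed, so a check like this is a signal of the owner's wishes more than a real technical barrier.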
        
        Re: (Score:2)
        
        by bws111 ( 1216812 ) writes:
        
        Sure, they CAN do that, but they don't HAVE to do that. Once you have been told you don't have permission, you don't have permission.
        
        Re: (Score:3)
        
        by BronsCon ( 927697 ) writes:
        
        Actually, anything you're able to view from a public space is fair game under current laws, with the exception of court orders stating otherwise. If hiQ's servers can view the content from the public internet (that is, if LinkedIn's servers serve it to them without them hacking around some technical measure), it's fair game unless LinkedIn gets an injunction against hiQ. That is, what you're claiming is really for the courts to decide.
        
        Or, you know, LinkedIn could just claim copyright on their data and iss
        
        Re: (Score:2)
        
        by American Patent Guy ( 653432 ) writes:
        
        Data is not copyrightable, because it isn't a "work of authorship" under the copyright statutes. That's why LinkedIn is using this hacking law in a contorted way to try to stop the use of this content.
        
        Re: (Score:2)
        
        by kwbauer ( 1677400 ) writes:
        
        A library shelf is a public space and so is a museum wall. Are you claiming that anybody has the right to walk in, take pictures or photocopies of anything in those public spaces and resell those copies and that they are not violating current law? I would be asking those several lawyers that you consulted with for a refund.
        
        Re: (Score:2)
        
        by BronsCon ( 927697 ) writes:
        
        When it comes to reading (viewing in your museum example), which is what was discussed in the argument I was originally replying to, the above is absolutely true. When it comes to copying, it's a little more nuanced than that, of course; but, then, I was writing a Slashdot post, not a fucking dissertation, I certainly was not giving legal advice and, again, was arguing against someone who claimed that merely viewing something viewable from public space, which the owner readily serves up to you with no techn
        
        Re: (Score:2)
        
        by AK Marc ( 707885 ) writes:
        
        I can go into the Louvre, sit in front of the Mona Lisa, and sketch an exact replica of it, down to the brush stroke (except for the fact that there are always people standing in front of it trying to take a selfie), then sell that copy. Wait, what was your point again?
    - Re: (Score:2)
      
      by AHuxley ( 892839 ) writes:
      
      Re "What do search engines do, then?"
      Connecting people who worked on secret mil/gov projects with people looking for staff to work on other secret mil/gov projects.
      So people list all the projects they worked with and can show they are trusted in plain text.
      They used the same methods in the gov/mil and just expect the same results on the net.
    - - Re:then dont' make it public (Score:5, Insightful)
        
        by sexconker ( 1179573 ) writes: on Monday July 31, 2017 @03:02PM (#54915649)
        
        No, only one side has legitimacy.
        If you complain about people using information you post PUBLICLY, you are an idiot.
        This doesn't even rise to copyright infringement.
        
        
        Re: (Score:3)
        
        by Wootery ( 1087023 ) writes:
        
        That something is 'in public' doesn't mean you're free to copy it.
        Walk around a city and you might see countless TVs. That doesn't mean you're allowed to record them and sell the videos - that's still copyright infringement.
    - - Re: (Score:3)
        
        by BronsCon ( 927697 ) writes:
        
        Look again, they're given a lot of explicit restrictions and a handful of explicit permissions. In Google's case those are limited to:
        Allow: /psettings/guest-controls*
        Allow: /psettings/guest-email-unsubscribe*
        Allow: /psettings/sms-unsubscribe*
        Allow: /psettings/guest-controls/retargeting-opt-out*
        Allow: /settings/loid-email-unsubscribe-router*
        Allow: /settings/loid-email-unsubscribe*
        Allow: /help/
        
        For reference, the first 6 are pages where one can unsubscribe from various forms of marketing and the last
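The Allow lines quoted above are robots.txt directives. A rough sketch of how a well-behaved crawler might consult such rules, using Python's standard `urllib.robotparser`. The rules and the `ExampleBot` name are invented for illustration and are not LinkedIn's actual file; the trailing `*` wildcards are left out because, as far as I know, this parser does plain prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch /help/ only,
# every other crawler is disallowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Allow: /help/
Disallow: /

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, path)
```

Honoring the answer is voluntary: robots.txt states the site owner's wishes, and nothing in the protocol stops a crawler from ignoring them.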
  - Re: (Score:2)
    
    by JaneTheIgnorantSlut ( 1265300 ) writes:
    
    I guess you can't print a page either.
  - Re: (Score:2)
    
    by Dog-Cow ( 21281 ) writes:
    
    If MS asserted a copyright claim, that would be different. There is no fraud and no hacking taking place when scraping publicly-accessible data.
  - Re: (Score:2)
    
    by Gavagai80 ( 1275204 ) writes:
    
    Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?
    Copyright law clearly makes that illegal. This case is a little different in that it seems to be about the kind of data that can't be copyrighted.
    - Re: (Score:3)
      
      by AK Marc ( 707885 ) writes:
      
      Then why didn't they file a copyright complaint? Instead, they are claiming "hacking" for viewing public information. (not copyright for using it, but "hacking" for viewing). Copyright is irrelevant, and not the complaint.
  - Re: (Score:2)
    
    by omnichad ( 1198475 ) writes:
    
    Slashdot's content is public: can I scrape everything, host it on my site, insert ads, and make money?
    Plain old Copyright law is enough to put an end to that. However, facts are not copyrightable, and Linkedin has a lot of valuable facts in its database.
    But anyways, what about their user agreement?
    Can you put a EULA in a document folder (page 50 in a stack of 200 pages) and throw it on the ground in the park, and expect to enforce it when it tells people not to read the other pages in the envelope? That's the physical-world equivalent.
  - Re: (Score:2)
    
    by AK Marc ( 707885 ) writes:
    
    So the solution is to provide public APIs, and request scrapers use those, so the data access can be tracked and identified just like when humans and search engines use it.
    
    If they make it public and predictable so search engines point to them, then they have given a robots.txt that allows that use, so it's "licensed" by the lack of controls, same as search engines.
    But anyways, what about their user agreement?
    The search engines never log in or agree to the user agreement, and this use seems to be a search engine that doesn't simply direct views to the
- Exactly! (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  I refuse to use any social media site, including LinkedIn. A lot of companies - such as Goodwill - recruit exclusively from LinkedIn. Fuck'em.
  I don't work for any company that uses social media for recruiting.
Scrape or Scrap? (Score:2)

by DontBeAMoran ( 4843879 ) writes:

Because if it's not illegal to scrap their websites, black hat hackers will have a field day.
I've done several scraping projects (Score:4, Interesting)

by GerryGilmore ( 663905 ) writes: on Monday July 31, 2017 @01:49PM (#54915017)

Using some add-on Python packages, it is ridiculously easy to scrape any web page, even those that use ASP (it's a PITA to get set up the first time, but...). The ONLY thing that stops it - aside from legal action, apparently - is a login mechanism in front. Without authenticating, it's no-go.
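
To make the "ridiculously easy" claim concrete, here is a minimal sketch of the parsing half of a scraper. The comment does not name the add-on packages it used, so this swaps in Python's standard-library html.parser instead; the sample page, the LinkExtractor class, and the extract_links helper are all hypothetical names made up for illustration, and no network access is involved.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content standing in for a fetched public page.
SAMPLE_PAGE = """
<html><body>
  <a href="/in/alice">Alice</a>
  <a href="/in/bob">Bob</a>
  <a name="top">no link here</a>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
```

A real scraper would add a fetch step in front of this, which is exactly the step a login wall or a server-side block interferes with.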

- Re:I've done several scraping projects (Score:5, Interesting)
  
  by iggymanz ( 596061 ) writes: on Monday July 31, 2017 @01:53PM (#54915063)
  
  hahaha, you imagine login is a cure?
  no, scripts can log in. with sites having millions of users you can make as many logins as you need, it's a whack-a-mole the site can't win
  
  - Re:I've done several scraping projects (Score:4, Informative)
    
    by im_thatoneguy ( 819432 ) writes: on Monday July 31, 2017 @02:05PM (#54915197)
    
    You can have terms of service though on a login to make it easily illegal.
    "By logging in you agree to not republish data that you view."
    
    - Re:I've done several scraping projects (Score:4, Informative)
      
      by Gr8Apes ( 679165 ) writes: on Monday July 31, 2017 @02:07PM (#54915217)
      
      That's not illegal, that's merely a violation of the user agreement.
      
      - Re: (Score:2, Insightful)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        Which gives them standing in court. It *might* not be a crime but it creates a contract that doesn't exist without it. This is far from the first time a company has tried the old "The Internet doesn't work the same way for us as it does for the rest of the world. Callsies, no take-backs!" defense.
        
        Re: (Score:2)
        
        by Rockoon ( 1252108 ) writes:
        
        It *might* not be a crime
        
        Breaking a contract is not a crime. Full stop.
        
        Re: (Score:2)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        You couldn't be more wrong.
        
        Re: (Score:2)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        That's not correct. You directing your son to click the button is no different than you directing him to commit a crime. The culpability and responsibility rests with you. You will be held to the contract in the former case and charged with a crime in the latter. Great parenting though!
      - Re: (Score:3)
        
        by im_thatoneguy ( 819432 ) writes:
        
        Not criminal but breach of contract is grounds for a civil cause of action.
  - Re: (Score:2)
    
    by Cajun Hell ( 725246 ) writes:
    
    no, scripts can log in. with sites having millions of users you can make as many logins as you need, it's a whack-a-mole the site can't win
    There's no rule that says getting login credentials needs to be trivial. Can you make a throw-away account at your bank?
    LinkedIn can authenticate people if they want, assuming they don't mind having a barrier to entry that keeps people from using their site. But keeping people from using their site does seem to be the agenda item here...
  - Re: (Score:2)
    
    by GerryGilmore ( 663905 ) writes:
    
    Sure, scripts can login - if you have a valid login credential. If you do, obviously you can scrape away. If not, well....
  - Re: (Score:2)
    
    by h33t l4x0r ( 4107715 ) writes:
    
    Not if they verify with SMS, which I believe they do.
- what about wifi scanning just looking for ssid's (Score:2)
  
  by Joe_Dragon ( 2206452 ) writes:
  
  What about Wi-Fi scanning? Just looking for SSIDs is on by default on many OSes.
Happens in other industries too (Score:5, Interesting)

by ErichTheRed ( 39327 ) writes: on Monday July 31, 2017 @01:54PM (#54915075)

Airline websites have this same problem -- the online "cheap ticket" engines regularly scrape the publicly available data by essentially running the "book a trip" workflow millions of times to try to pull the entire set of fares for different city pairs. It's a cat-and-mouse game because the information has to be available for normal humans to book trips; no one is going to solve a CAPTCHA to look up fares. Basically these engines are looking for any irregularities like mis-filed fares or fares that happen to be a particularly good deal. (Airlines have to publish their fares in advance and make them available to online sources that are available to travel agents. This is why you'll occasionally see stuff like a transatlantic business class ticket for $50 or similar...)
I'm not sure if LinkedIn can actually bar someone from scraping their public data. If that was the case, no one could run wget on a website and pull down all the static content.

- Re: (Score:2)
  
  by shuz ( 706678 ) writes:
  
  I have direct experience with this myself.
  This is why companies like Akamai have products geared specifically for this problem. However, stopping bots is nearly impossible unless you deal with them on a real-time basis. It would be interesting if LinkedIn could get the entire world to make website scrapers illegal and then actually enforce that illegality. As of now, when a bot owner is shut down they just move the operation overnight to the ISP that will take their business in the same country or move countrie
  - Re: (Score:2)
    
    by viperidaenz ( 2515578 ) writes:
    
    Wouldn't it just be easier to run your bots through multiple VPNs with endpoints in different countries?
  - Re: (Score:2)
    
    by h33t l4x0r ( 4107715 ) writes:
    
    No offense, but you are a complete noob if you're trying to scrape sites without connecting through proxies. LinkedIn will start sending 403s almost right away.
This is bonkers! (Score:5, Interesting)

by Zobeid ( 314469 ) writes: on Monday July 31, 2017 @02:05PM (#54915205)

Here's why it seems bonkers to me. . . When you access a website, you are merely sending that site a request for information. That's all. Assuming it responds with the requested information, one must presume that's because the operator (and, by proxy, the owner) of the website set it up for that purpose. So what we have here is effectively. . .
LinkedIn: Don't request information from us!
hiQ: Please send the following information.
LinkedIn: OK, here you go.
LinkedIn: Dammit, you requested information after we told you not to! WE'RE GONNA SUE!!
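
The request-and-response exchange described above can be sketched with a throwaway local server. This is only an illustration under stated assumptions: the ProfileHandler class, the BLOCKED_AGENTS set, and the User-Agent strings are hypothetical, and nothing here reflects how LinkedIn's servers actually behave. It simply shows that a server is able to answer a given requester with an error instead of the data.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical User-Agent string the site operator has decided to refuse.
BLOCKED_AGENTS = {"example-scraper/1.0"}


class ProfileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent in BLOCKED_AGENTS:
            # The server says "no" in the protocol itself.
            self.send_response(403)
            self.end_headers()
            return
        body = b"public profile data"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_status(port, agent):
    """Send one GET to the local server and return the HTTP status code."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/", headers={"User-Agent": agent}
    )
    # An empty ProxyHandler keeps the request local even if proxy env vars are set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    """Start the server on a free port and query it as two different clients."""
    server = HTTPServer(("127.0.0.1", 0), ProfileHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return (
            fetch_status(port, "ordinary-browser/1.0"),
            fetch_status(port, "example-scraper/1.0"),
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Matching on User-Agent is trivially defeated by a client that lies about who it is, which is part of why the thread keeps returning to legal rather than technical remedies.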

- Re:This is bonkers! (Score:5, Interesting)
  
  by bluefoxlucid ( 723572 ) writes: on Monday July 31, 2017 @02:19PM (#54915323) Journal
  
  Actually, LinkedIn has a point.
  LinkedIn supplies service to the public at-large, in the same way that a MicroCenter supplies retail service to the public at-large. All members of the public are allowed to enter a MicroCenter. You walk up to the doors and they open automatically.
  You can be trespassed for no reason by a retail center or other physical location open to the public at-large. The doors still open to you, but you're not allowed in. It's the same with a Web site: it's difficult in-practice to establish a verifiable packet identity on the Internet. IP addresses change, and you can do goofy shit like put the data scrapes in AJAX requests to distribute their source.
  In other words: you're by default authorized to access LinkedIn's public assets. You're not allowed to access stuff requiring a logged-in session until you've gotten log-in credentials, because there are actual systems in place to stop you from doing that, implying that you're not supposed to force access there. Basically, civilized understanding of the expectations of your host on the face.
  If LinkedIn tells you to stop, you've now had your authorization revoked. You can't claim a restraining order is invalid because someone's outside and you can also be anywhere outside, and you also can't claim that LinkedIn can't de-authorize you unless they specifically identify and block you. Blocking an individual entity from a Web site is hard and has collateral damage.
  So the CFAA is actually a valid vehicle here, since "abuse" is essentially defined as "accessing a system to which you are not authorized." The reasonable person test holds up a lot of behavior, largely because it's unreasonable for a person to determine if a certain behavior or function on a Web site might not be something they're allowed to touch, or whatnot, given the reasonable behavior of people at-large. A lot of stuff happens that won't pass CFAA as fraud or abuse, even though it's inconvenient and unintended. By the same token, when somebody has told you to stop accessing their systems in a certain way and you do it anyway, a reasonable person might assume you were, you know, told not to, and not allowed to do that, and that you know damned well you're not allowed to do that.
  That's not to say threats, lawyers, and other anti-social behavior are good business. Poor diplomacy here. Effective in the legal field, but not your best option.
  
  - Re: (Score:2)
    
    by Train0987 ( 1059246 ) writes:
    
    Then blacklist IPs at the firewall(s) for endpoints that are scraping your site.
    - Re:This is bonkers! (Score:4, Insightful)
      
      by tattood ( 855883 ) writes: on Monday July 31, 2017 @02:53PM (#54915581)
      
      Then blacklist IPs at the firewall(s) for endpoints that are scraping your site.
      IP addresses are fairly easy to change. You can use something like Tor, so your public IP always changes.
      
    - Re: (Score:3)
      
      by bluefoxlucid ( 723572 ) writes:
      
      Let's try this again.
      it's difficult in-practice to establish a verifiable packet identity on the Internet. IP addresses change, and you can do goofy shit like put the data scrapes in AJAX requests to distribute their source.
      Blocking an individual entity from a Web site is hard and has collateral damage.
      Wikipedia has tried this, with collateral damage and limited success. I've seen people get sent to jail for harassment and legally barred from accessing certain sites and systems under restraining order, and then continue to access them with no reasonable way to prove their identity (i.e. could be someone else pretending to be said person).
      These days, it's different. Those IP addresses are probably automatically-assigned or internal to cloud infrastructure. IAAS may share address
  - Re: (Score:2)
    
    by Ichijo ( 607641 ) writes:
    
    Except when you enter a MicroCenter, you are stepping foot on their property. When you anonymously request a public web page from a web server, you're standing on the public sidewalk at the walk-up window. Since you as a taxpayer own that sidewalk, can the store owner restrain you from your own property as a way to make you stop placing orders at the window?
    From TFA:
    [Orin Kerr, a legal scholar at George Washington University] argues sites wanting to limit access to their site should be required to use a tec
    - Re: (Score:2)
      
      by bluefoxlucid ( 723572 ) writes:
      
      When you request a public Web page, you're accessing and using their machinery.
      My entire argument was "available to the public" versus "except you; you get the hell out right now." A technical mechanism is infeasible: if they want the data to be publicly-viewable and don't want people to do certain things, then a password doesn't work; and firewalls and the like will have to contend with modern global, auto-scaling, IP-changing data centers where you can't just single out a particular actor by IP addres
  - Re: (Score:2)
    
    by viperidaenz ( 2515578 ) writes:
    
    A trespass notice can't stop you looking at MicroCenter from a public space.
    If you want to restrict someone in a public space, you need a restraining order from a judge.
    Transferring that to the internet, a cease and desist letter is like a trespass notice. Probably appropriate for telling someone to stop creating new logins to access restricted content after you disable their old ones.
    Asking a judge for an injunction would be appropriate to stop someone accessing publicly available content. Of course, this
    - Re: (Score:2)
      
      by bluefoxlucid ( 723572 ) writes:
      
      Actually, transferring that to the Internet, you have to walk into the MicroCenter and turn the display around, then go back outside the window and look at it again to get a view of what's there. Every time you want to see it, you have to walk inside, fiddle with things, then walk back out.
      You do know that nothing is actually "on the Internet", right? Do we need to explain to you how the Internet works?
  - Re: (Score:2)
    
    by Frosty Piss ( 770223 ) * writes:
    
    LinkedIn supplies service to the public at-large
    OK, there's where you're wrong.
  - Re: (Score:2)
    
    by American Patent Guy ( 653432 ) writes:
    
    You're not allowed to access stuff requiring a logged-in session until you've gotten log-in credentials, because there are actual systems in place to stop you from doing that, implying that you're not supposed to force access there.
    Actually, if the scraper used a valid username and password (or other valid credentials) to gain access, access was authorized. It might have violated a user agreement perhaps, but that's a separate civil matter. The Computer Fraud and Abuse Act specifies criminal acts that a private entity (like LinkedIn) can't use as a basis for its suit.
    - Re: (Score:2)
      
      by bluefoxlucid ( 723572 ) writes:
      
      The point wasn't that they used a password; there was a further point down that LinkedIn had de-authorized them from non-password-protected mechanisms: they told them they're now specifically not allowed to do that, which means they're not.
      Imagine if you ssh'd to a bank's accounting system across the 'net and found that it just lets you log in as root, no password. Is that also legal?
- Re: (Score:2)
  
  by bws111 ( 1216812 ) writes:
  
  Many stores have doors that you can open by pushing a button. Assuming the door opens, one must presume that is because the management (and, by proxy, the owner) of the store has set it up for that purpose. So what we have here is effectively..
  Store: You have been banned from this store. Do not come back
  You: Push the button
  Store: Door opens, you go in
  Store: We told you to stay out, we're having you arrested for trespassing
  This, of course, happens all the time (except for the idiotic assertion that the
- Re: (Score:2)
  
  by Gavagai80 ( 1275204 ) writes:
  
  Trying to make it illegal to scrape the data is beside the point -- what LinkedIn really wants to do is prevent others from publishing the data. Just because you can find a book in the library, and the book doesn't fire lasers at your eyes to blind you and stop you reading it, doesn't mean you have permission to sell your own book which consists of photocopies of that book with a few small changes.
Give all a bit of trust and get ripped off (Score:2)

by Trax3001BBS ( 2368736 ) writes:

I refer to the robots.txt file used to tell search engines what's out of bounds. http://www.searchtools.com/rob... [searchtools.com]
- Re: (Score:2)
  
  by HornWumpus ( 783565 ) writes:
  
  But they want to be indexed by Google, just not by the company that tells employers their staff is looking.
  The solution is just to never, ever, stop looking. Even if you love your job, having a current resume on LinkedIn will get you better raises.
  - Re: (Score:2)
    
    by Mandrel ( 765308 ) writes:
    
    But they want to be indexed by Google, just not by the company that tells employers their staff is looking.
    A robots.txt file can state which HTTP User Agent strings are allowed. For example, Slashdot only allows [slashdot.org] access by certain search engines. If you're starting a new one, you have to misrepresent yourself, or you're buggered. The question is when such misrepresentation is legal and moral, and whether it is instead up to sites to more accurately detect who they want to serve, and serve errors to those they don't.
    The solution is just to never, ever, stop looking. Even if you love your job, having a current resume on LinkedIn will get you better raises.
    Again it pays to be the selfish squeaky wheel. The basis of advertising.
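
As a rough illustration of the per-User-Agent robots.txt rules mentioned in the comment above, the sketch below uses Python's standard-library urllib.robotparser. The robots.txt text, the "NewSearchBot" agent name, and the example.com URL are invented for the example and are not copied from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is welcome, everyone else is not.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the rules in robots_txt let this User-Agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("Googlebot", "https://example.com/profile"))
print(allowed("NewSearchBot", "https://example.com/profile"))
```

Note that robots.txt is purely advisory: it only has an effect when the crawler chooses to run a check like this one and honor the answer, and a crawler that announces itself under an allowed name passes it.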
They just went about this the wrong way (Score:2)

by 93 Escort Wagon ( 326346 ) writes:

Now if LinkedIn had instead posted "ecto gammat", all the nerds would be in their corner.
Pot, Kettle. (Score:2)

by AnotherBlackHat ( 265897 ) writes:

LinkedIn's whole business model is "scraping" information from people. It's not like they pay people to enter that information.
When CDDB tried this sort of B.S. it led to FreeDB. Maybe LinkedIn being assholes will lead to something similar.
HiQ (Score:2)

by tylersoze ( 789256 ) writes:

Can we talk about what HiQ is doing with the data for a sec? "HiQ scrapes data about thousands of employees from public LinkedIn profiles, then packages the data for sale to employers worried about their employees quitting" I mean WTF?
- Re: (Score:2)
  
  by American Patent Guy ( 653432 ) writes:
  
  Like it or not, if you (or an employee in your example) choose to publish information about yourself in a publicly-accessible place, then you've voluntarily relinquished whatever privacy rights you had in that information. Whatever you believe about HiQ, they are only organizing and re-releasing public information. LinkedIn has no copyright in it (as they didn't create the data, nor is it a work of authorship), and they were complicit in the act by delivering it up upon request.
No standing (Score:2)

by American Patent Guy ( 653432 ) writes:

The Computer Fraud and Abuse Act is part of the Federal Criminal Code, and no private entity can use it to bring a suit. A prosecuting attorney for the government could make a criminal charge, but LinkedIn would have to persuade him/them to take that act. This is much ado about nothing.
- Re: (Score:3)
  
  by russotto ( 537200 ) writes:
  
  It has not been tested in court that the CFAA covers violating terms of use.
  Yes, it has, but only in the Central District of California as far as I know. The interpretation that the CFAA covers violating TOS was found to be overbroad in U.S. v. Drew, 259 F.R.D. 449 (C. D. Cal. 2009).
