Judge Says LinkedIn Cannot Block Startup From Public Profile Data (reuters.com) 166
A U.S. federal judge on Monday ruled that LinkedIn cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public. Reuters reports: U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles. The dispute between the two tech companies has been going on since May, when LinkedIn issued a letter to hiQ Labs instructing the startup to stop scraping data from its service. HiQ Labs responded by filing a suit against LinkedIn in June, alleging that the Microsoft-owned social network was in violation of antitrust laws. HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit. "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers," Chen's order reads. Meanwhile, LinkedIn said in a statement: "We're disappointed in the court's ruling. This case is not over. We will continue to fight to protect our members' ability to control the information they make available on LinkedIn."
Huh? (Score:5, Insightful)
We will continue to fight to protect our members' ability to control the information they make available on LinkedIn
If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.
Besides, given LinkedIn's past behavior with scraping people's contacts/address books on their PCs and email accounts, it has no lessons to give anyone else.
Re:Huh? (Score:5, Interesting)
If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.
Besides, given LinkedIn's past behavior with scraping people's contacts/address books on their PCs and email accounts, it has no lessons to give anyone else.
LinkedIn doesn't give a good goddamn about "what users want to protect", and their "past behavior" is the proof. LinkedIn cares only about having exclusive use of that mine full of data, (except for the bits and pieces that users gather about each other), because it doesn't want potential competitors to eat a slice of the pie they've come to think of as belonging entirely to them.
Re: (Score:2, Insightful)
Whether or not the user made the info public, does this ruling affect how a website or service can regulate third parties and the extra load they create?
Grabbing one user's public info is a world apart from grabbing a million users' public info. LinkedIn may have a legitimate argument about the undue additional load that scraping public info places on their service.
Re:Huh? (Score:5, Insightful)
Wrongo! It's their server. This ruling is *very* erroneous, and since I'm not in the job market, I'm going to be deleting my account now. Which is actually a shame, because I was using it to keep up with former workmates from previous jobs, but I'll be damned if I'm going to be handing my work history over to asshole companies that specialize in mining through other people's bins looking for evidence to hang me with.
Re: (Score:1)
It's not just "technically" public. Any person in the world can view and download the same data, because it IS public.
Re: (Score:2)
As a LinkedIn user, I'm actually fine with anyone scraping my data and using it. Whatever information I put on LinkedIn, I did so with the full intention of it being available to the public at large. That's the whole point of LinkedIn, at least for me. It's a place to post your public resume + a way of maintaining professional contacts with colleagues. If it were not publicly viewable, I wouldn't have bothered, as I want potential employers to be able to find me.
Obviously, this is very valuable data, but
Re: (Score:2)
What kind of simpleton would give LinkedIn full access to their email account?
Far too many, considering that I get contact suggestions that can only come from the other person having somehow imported my email address into their LinkedIn account.
How else would LinkedIn suggest someone I only know because I went sailing with him a year ago?
Re: (Score:2)
I guess I've never seen that the benefits outweighed the potential risks. E-mail security is absolutely vital to securing your complete online identity. Why someone would entrust that to a third-party is beyond me. If there's someone I want to get in contact with, I can generally do so without potentially compromising my email security.
No offense, as the "simpleton" crack was probably not appropriate. Different people have different priorities, I guess.
Re: (Score:2)
No, they are simpletons.
Re: (Score:2)
Even if the ruling goes against them I'm sure they can think of imaginative ways to fuck around with people scraping their site.
Re: (Score:2)
Back when I was on LinkedIn, so several years ago now, you used to be able to see who was viewing your profile. It was quite interesting to see who was looking at you. Mostly recruiters of course.
If that's still the case then copying the data to another web site means that users of LinkedIn can no longer see who is viewing their profile, or get an accurate "hit count" on the stuff that is public and available to non-logged-in viewers.
I don't know what controls LinkedIn has for privacy. Is public visibility
Re: (Score:2)
If users added their info, and made it public, it's not up to LinkedIn to decide what users want to protect.
It is not absolutely public. Users shared their information with LinkedIn, and possibly chose not to restrict it through the privacy controls LinkedIn offers, BUT LinkedIn itself gets to decide how "Public" its website actually is. That "Public" could very well mean only visible to users who registered and/or accepted some terms prior to viewing.
Re: (Score:2)
If the data they're hosting is uncopyrightable, and it's freely available to the public, then yes.
Re: (Score:1)
From Slashdot's TOS:
By sending or transmitting to us Content, or by posting such Content to any area of the Sites, you grant us and our designees a worldwide, non-exclusive, sub-licensable (through multiple tiers), assignable, royalty-free, perpetual, irrevocable right to link to, reproduce, distribute (through multiple tiers), adapt, create derivative works of, publicly perform, publicly display, digitally perform or otherwise use such Content in any media now known or hereafter developed. You hereby grant the Company permission to display your logo, trademarks and company name on the Sites and in press and other public releases or filings. Further, by submitting Content to the Company, you acknowledge that you have the authority to grant such rights to the Company. PLEASE NOTE THAT YOU RETAIN OWNERSHIP OF ANY COPYRIGHTS, TRADEMARKS AND SERVICE MARKS IN ANY CONTENT YOU SUBMIT.
I'm guessing LinkedIn has something similar. By using their service, you give them permission to use your content and display it (or not display it) any way they want. And your content is copyrighted.
Re: Huh? (Score:2, Interesting)
And your content is copyrighted.
Umm, no. You cannot copyright such data, so any such provisions are meaningless, as a federal court has just reaffirmed for the umpteenth time.
Re: (Score:1)
And your content is copyrighted.
Yeah, by me, not Slashdot.
Re: (Score:2)
Just because information can or cannot be copyrighted doesn't give me the privilege of hijacking your printing press to do the actual copying.
The judge here screwed up. The courts have NO BUSINESS dictating to a website what information it can or cannot publish, and it has even less business attempting to turn the website into a mouthpiece.
LinkedIn should have the right to post what they please and block who they like from accessing it. Barring privacy issues.
Re: (Score:1)
They haven't. The judge simply said that HiQ cannot be blocked from otherwise public information. In other words, if the judge, at his desk, can access the info without logging in, then HiQ should be able to access the same information.
Re: (Score:2)
That makes sense.... LinkedIn can allow HiQ to access the information, BUT employ rate limits to make sure they can't generate more usage than a normal network of humans would, AND employ CAPTCHAs/bot-prevention countermeasures on IP addresses suspected to be something different from the rest of the public.
If HiQ wants to hire an army of humans to manually transcribe data (without an unusually large number of requests from one network), then more power to them.
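The "rate limits per network, CAPTCHA for suspicious addresses" idea can be sketched in a few lines of Python. This is a minimal sketch, not anything LinkedIn is known to run: the `NetworkRateLimiter` name, the /24 (IPv4) and /64 (IPv6) grouping, and the thresholds are all illustrative assumptions. Requests are counted per network over a sliding time window, and once a network exceeds its budget the caller is told to serve a challenge instead of the page.

```python
import ipaddress
import time
from collections import defaultdict, deque


class NetworkRateLimiter:
    """Sliding-window request limiter keyed by client network, not single IP.

    Hypothetical sketch: the /24 (IPv4) and /64 (IPv6) grouping and the
    default thresholds are illustrative assumptions, not LinkedIn's settings.
    """

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock                 # injectable so the logic can be tested
        self.hits = defaultdict(deque)     # network -> timestamps of allowed requests

    @staticmethod
    def network_of(ip):
        """Map an address to the network it is grouped under."""
        prefix = 24 if ipaddress.ip_address(ip).version == 4 else 64
        return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

    def check(self, ip):
        """Return "allow", or "challenge" (e.g. serve a CAPTCHA) once the network is over its limit."""
        now = self.clock()
        recent = self.hits[self.network_of(ip)]
        # Forget requests that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return "challenge"
        recent.append(now)
        return "allow"


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = NetworkRateLimiter(max_requests=3, window_seconds=10, clock=lambda: fake_now[0])
    # Four requests from different hosts in the same /24 share one budget.
    print([limiter.check(ip) for ip in ("203.0.113.5", "203.0.113.77", "203.0.113.9", "203.0.113.200")])
    fake_now[0] = 10.0                     # the window has passed
    print(limiter.check("203.0.113.5"))
```

Grouping by network rather than by single address is what catches a scraper rotating through addresses in one block; a scraper spread across many unrelated networks would still need the other countermeasures mentioned in this thread.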
Re: (Score:2)
Under the covers, however, the link, the JavaScript that controls it, the elements containing visible text, and the layout in general could be engineered to be a pain in the ass to read automatically and scrape into a coherent form. At the very least it would slow d
Re: (Score:2)
According to this, the judge is specifically saying that LinkedIn isn't free to use technical measures to block them.
Re: (Score:2)
They can just put in place technical measures to control all abuse of their services and all bots and then say they are not using any technical measures to specifically block hiQ.
Re: (Score:2)
If the data they're hosting is uncopyrightable, and it's freely available to the public, then yes.
The issue with scraping non-copyrightable data is that it is Theft of service: violations of terms users agree to in order to access the resource.
The owner of the server/website pays for processing time and bandwidth, AND the owner of the server DOES NOT HAVE TO provide a free-for-all --- everyone who owns a computer/network has a right to direct who can use the resources provided by their server/equipmen
They should revise their statement then (Score:2)
By your logic, they should revise their statement then:
"We will continue to fight to protect our data we extracted from our members and the ability to control the information they make available to us, here on LinkedIn"
Quick, there's still time for you to call them and tell them to revise it!
Re: Huh? (Score:2)
Either way, they shouldn't pretend they are doing their users a service.
Re: (Score:2)
LinkedIn are.... HiQ's use of the data is scary and ADVERSE to the users of the website.
They're essentially a surveillance service to help employers spy on workers to suggest when certain people might be a risk.
This is very big-brotherish, and should not be allowed in a more civilized society......
Re: (Score:2)
Actually, the big ones do not pay any network charges when you access their web sites, they get money for it! Search on peering agreements and you will see this is how it works.
Maybe it was inspired by telcos where the one that terminates the call bills the caller, it works the same way anyway.
Re: (Score:2)
B: Creimer's still waiting for the coffee money to roll in. He's focusing on making that Little Debbie money, first. At 25 cents per delicious, chewy Oatmeal Cream Pie, he should start making enough to buy 2 or 3 a month, soon!
When I get my June earnings at the end of the month, I can buy three cases [amzn.to] and still have enough change for a skinny vanilla latte.
Re: (Score:2)
5. Void Where Prohibited; Indemnification
Doesn't apply to what I'm doing. This is standard legal boilerplate to cover Slashdot's collective ass from legal liability.
Also look slashdot.org/robots.txt
Doesn't apply to what I'm doing. My Python script isn't a web crawler and I'm scraping my own comments. If you look at the bottom of each Slashdot page: "Comments owned by the poster." I'm just recovering my own intellectual property that I freely shared with the Slashdot community.
If you seriously believe that I'm violating the Slashdot TOS, file a complaint with management. However
Re: (Score:2)
Doesn't apply to what I'm doing. My Python script isn't a web crawler and I'm scraping my own comments. If you look at the bottom of each Slashdot page: "Comments owned by the poster." I'm just recovering my own intellectual property that I freely shared with the Slashdot community.
If you seriously believe that I'm violating the Slashdot TOS, file a complaint with management. However, considering the shit that Anonymous Cowards get away with, I wouldn't hold my breath.
Your script is sure enough a robot! Whether /. tolerates it or not is irrelevant; you are still not being a nice Christian by not following their robots.txt guidelines.
https://slashdot.org/robots.tx... [slashdot.org]
Your user-agent is *, so your robot should not access the following pages: /authors.pl /index.pl /comments.pl /firehose.pl /journal.pl /messages.pl /metamod.pl /users.pl /search.pl /submit.pl /pollBo
User-agent: *
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
Disallow:
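Whether or not a comment-backup script counts as a robot, honoring these rules takes only a few lines with Python's standard-library `urllib.robotparser`. A minimal sketch, assuming a made-up sample file: the rules use two of the paths named in the comment above plus an invented `Crawl-delay`, not Slashdot's actual robots.txt, and `load_rules`/`polite_urls` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: two of the paths named above plus an invented
# Crawl-delay. This is not a copy of Slashdot's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /comments.pl
Disallow: /users.pl
Crawl-delay: 5
"""


def load_rules(text):
    """Parse robots.txt content already in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def polite_urls(parser, urls, user_agent="*"):
    """Keep only the URLs the rules allow this user agent to fetch."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    candidates = [
        "https://slashdot.org/comments.pl?sid=123",
        "https://slashdot.org/story/123",
    ]
    print(polite_urls(rules, candidates))   # only the /story/ URL survives
    print(rules.crawl_delay("*"))           # seconds to wait between requests
```

Against a live site you would call `set_url()` and `read()` instead of `parse()`; parsing a string keeps the example self-contained.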
Re: (Score:2)
lol didn't you notice the word "guidelines" in my OP?
Re: (Score:2)
Your script is sure enough a robot!
Yet no tutorial on Python web scraping ever mentioned robots.txt.
Whether /. tolerates it or not is irrelevant; you are still not being a nice Christian by not following their robots.txt guidelines.
I'll let God sort it out since He has a better algorithm.
Re: (Score:2)
Your script is sure enough a robot!
Yet no tutorial on Python web scraping ever mentioned robots.txt.
Says the Unabomber: "Your honor, no tutorial mentioned that what I was doing was illegal..."
Whether /. tolerates it or not is irrelevant; you are still not being a nice Christian by not following their robots.txt guidelines.
I'll let God sort it out since He has a better algorithm.
I am god you insensitive clod! A nice Christian at your church asked me to look over you in a prayer she made...
Re: (Score:2)
What shit do AC's get away with?
Dick pics.
[...] Amazon links that nobody clicks on [...]
Let me check... $1,000+ in merchandise this past weekend... not bad for links that nobody clicks on.
[...] while claiming you're going to buy a yacht [...]
Citation, please?
I know which way I'd go if I were you.
I'm here to stay. Especially since you ACs have convinced me that I could easily make coffee money while reading and posting as I normally do. You have no one to blame but yourselves.
Re: (Score:2)
Sure, I believe you. Maybe you should post a pic with proof of that on your blog, creimer. Then maybe we'd believe the utter bullshit you spout here!
https://twitter.com/cdreimer/status/897516205216604160 [twitter.com]
Re: (Score:2)
He's actually quite clever if his claims are true. It would never occur to me to monetize posting and interacting on here.
;)
Re: (Score:2)
A computer scraping will read everything, causing a heavy load and making the website's performance poor.
Depends on how the web server is set up. When I run my Python script to scrape my Slashdot comment history, 16 pages can be requested at the same time. Request more than 16, and the server shuts down the connection.
Re: (Score:2)
Your Python script should not know about that. Connection KeepAlive server settings like:
KeepAlive On
MaxKeepAliveRequests 50
KeepAliveTimeout 5
should be completely transparent to you. Your client library should transparently reconnect when it gets a Connection: close from the server. Heck, some sites don't even use keep alives (KeepAlive Off).
I have written such client software and I never bothered about the MaxKeepAliveRequests setting on the servers, and if KeepAlive was on, the libraries I used were doing the
Re: (Score:2)
An additional note: the same applies if you build an auto-refreshing web page in AJAX etc. Make the page refresh more often than KeepAliveTimeout if you want connections to be reused by your customers' browsers.
Re: (Score:2, Informative)
Your Python script should not know about that.
Someone on Slashdot complained that my script was taking too long to fetch, parse and save each page. So I rewrote the script to use a concurrent queue for each phase, each launching 16 threads. Since 16 was the maximum number of threads that could run without the web server shutting down the connection, I used that number for all the queues in the pipeline. It takes 30 minutes to process 733+ pages (11,000+ comments).
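The fetch/parse/save design described above maps naturally onto Python's `queue.Queue` plus `threading`. The sketch below is a guess at that structure, not the poster's actual script: `fetch`, `parse` and `save` are hypothetical callables you would supply (an HTTP GET, an HTML parser, a file write), and the demo uses trivial stand-ins. Capping the fetch stage at 16 worker threads is what keeps at most 16 requests in flight, the limit the poster reports the server tolerating.

```python
import queue
import threading


def start_stage(work, inbox, outbox, n_workers, errors):
    """Start n_workers threads that take items from inbox, apply work(), and pass results on."""
    def worker():
        while True:
            item = inbox.get()
            if item is None:                    # sentinel: shut this worker down
                inbox.task_done()
                return
            try:
                result = work(item)
            except Exception as exc:            # record the failure, keep the worker alive
                errors.append((item, exc))
            else:
                if outbox is not None:
                    outbox.put(result)          # hand off before marking this item done
            finally:
                inbox.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(page_numbers, fetch, parse, save, n_workers=16):
    """Fetch -> parse -> save, each stage with its own queue and worker pool.

    With n_workers=16, at most 16 fetches are in flight at once.
    Returns a list of (item, exception) pairs for items that failed.
    """
    errors = []
    fetch_q, parse_q, save_q = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        (fetch_q, start_stage(fetch, fetch_q, parse_q, n_workers, errors)),
        (parse_q, start_stage(parse, parse_q, save_q, n_workers, errors)),
        (save_q, start_stage(save, save_q, None, n_workers, errors)),
    ]
    for n in page_numbers:
        fetch_q.put(n)
    # Drain stages in order: when a stage's inbox is fully processed,
    # everything it produced is already queued for the next stage.
    for inbox, threads in stages:
        inbox.join()
        for _ in threads:
            inbox.put(None)
        for t in threads:
            t.join()
    return errors


if __name__ == "__main__":
    pages = {}

    # Hypothetical stand-ins for real HTTP, HTML parsing and disk writes.
    def fake_fetch(n):
        return n, f"<html>page {n}</html>"

    def fake_parse(item):
        return item[0], item[1].upper()

    def fake_save(item):
        pages[item[0]] = item[1]

    failed = run_pipeline(range(1, 734), fake_fetch, fake_parse, fake_save)
    print(len(pages), len(failed))   # 733 0
```

Sending the `None` sentinels only after `Queue.join()` returns ensures no stage shuts down while its upstream neighbor may still hand it work.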
Re: (Score:2)
The real question is, why does any of this matter?
I've gotten quite a few requests for this script. It's a shame that Slashdot doesn't offer the functionality for users to download their own comment history.
Re: (Score:2)
So you spent 3.5 months refactoring your code [...]
I haven't touched my script in two months. After those five user accounts got deleted, I no longer needed to use the script that often.
https://www.kickingthebitbucket.com/2017/06/20/the-confessions-of-slashdot-asshats/ [kickingthebitbucket.com]
Re: (Score:2)
There should be no difference between a human reading the site and a machine. If it is able to be accessed by a person then it should be ok to scrape and aggregate it.
Do you also believe that there should be no difference between a person buying tickets to an event, and a bot doing so? That if it is able to be purchased by a person then it should be ok to use bots to buy up a few thousand tickets in a few seconds and artificially increase the price?
BTW, I agree with what you said; but while I was thinking about your comment that analogy crossed my mind. I'd like the people who use bots to buy up tickets to DIAF, yet I'm happy to let hiQ scrape LinkedIn data. Strange...
robots.txt (Score:2)
Read https://linkedin.com/robots.tx... [linkedin.com]
Especially at the end
Translation (Score:5, Informative)
Translates to
"We will continue to fight to protect our profits and our ability to control and sell the information they make available on LinkedIn "
Re: (Score:2)
"We will continue to fight to protect our profits and our ability to control and sell the information they make available on LinkedIn "
further translates to:
"sell the information they make available for free on LinkedIn
Re: (Score:1)
What's wrong with that?
LinkedIn paid little or nothing to get the data in the first place.
Re: (Score:1)
What's wrong with that?
LinkedIn paid little or nothing to get the data in the first place.
and?
My server, My rules (Score:2, Interesting)
LinkedIn's servers are their private property, and they should have the right to decide who can access them.
In the physical world, there are many places that are generally "open to the public", but they are private property, and the property owner can order you to leave and never come back. If you come back again it's called trespassing, and it's a criminal offense. You can and will be arrested, and if you go to trial, you will be convicted. It's well-settled law.
I don't see why the LinkedIn situation is
Re: (Score:1)
A public profile is more like an item in a display window. If you display things in the windows of the store for people walking outside to see, then they should be available to everyone; someone might stand outside taking notes or pictures of what you have on display.
No, a display window has no marginal cost per viewer, whereas a service like LinkedIn does. Crowds in front of display windows likely cause more people to come want to view the displays. Crawlers cause much higher loads on all sorts of backend systems compared to normal users. Each crawler has a real monetary cost to LinkedIn, and their usage may have a chilling effect on LinkedIn members.
Further, the aggregate of openly available data is often much more valuable than what is simply visible on a profile.
Re: (Score:2)
Stopping a physical trespasser is fairly straight forward. How do you stop a virtual trespasser?
Firewalls, geo-blocking, security, rate and connection limiting, search limits, CAPTCHAs, client-processor-intensive scripts, interactive components, etc. We regularly use a variety of those, depending on the behaviour we are trying to block, against bots trawling some of the sites I look after. It actually is quite easy to block virtual trespassers, or at least make it very difficult for them to automate that.
Re: (Score:1)
Not showing information to users without logging in would be a good start.
What you can see without logging in is public information.
Microsoft using Linkedin users as a human shield (Score:1)
Microsoft bought Linkedin to profit off of users' data. Users on Linkedin specifically post info so it is shared. Most users were members long before MS bought the social network. I certainly didn't have any say in this purchase, or my data. I don't appreciate that they can buy my public data, 3rd party website or not, and then act holier than thou about it.
I'm not sure MS could create a social network that worked based on their past history. They've already changed the behavior of the site to promo
They need a Public Profile API (Score:2)
Re: (Score:2)
So accessing the public profiles is to be allowed unless it's done in such a way as to create unnatural load on their servers, something akin to a DDoS attack. They can set a throttle on hits per minute for programmed access. Or provide an API so HiQ and others can access the public profile info without impacting user-facing servers, except that users get an additional profile security option to allow API access, defaulted to Off for everyone initially so they can opt in.
So, public data, except not accessible to the entire public and not on by default.
Sounds like a great way to give the host company a huge advantage on mining while pretending to give access to others. That API is worthless unless you restrict the host to the same requirements.
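The opt-in part of the API proposal is simple to express. A minimal sketch, assuming a `Profile` record carrying the existing public-visibility flag plus the proposed `api_access` flag, which defaults to off; every name here is hypothetical. As the reply points out, a filter like this constrains only outside consumers of the API, not the host's own access to the same data.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    member_id: int
    name: str
    headline: str
    is_public: bool = True      # existing public-visibility setting
    api_access: bool = False    # proposed extra option: off for everyone until the member opts in


def api_view(profiles):
    """What a hypothetical public-profile API would return: public AND opted-in profiles only."""
    return [
        {"member_id": p.member_id, "name": p.name, "headline": p.headline}
        for p in profiles
        if p.is_public and p.api_access
    ]


if __name__ == "__main__":
    members = [
        Profile(1, "Alice", "Engineer"),                                   # public, never opted in
        Profile(2, "Bob", "Recruiter", api_access=True),                   # public and opted in
        Profile(3, "Carol", "Analyst", is_public=False, api_access=True),  # opted in but private
    ]
    print(api_view(members))   # only Bob's record
```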
I took it half seriously (Score:2)
until the whopper at the end.
Am I the only one here... (Score:2)
The LinkedIn article is here [reuters.com].
Nothing in the ruling prevents... (Score:2)
Re: (Score:2)
Sounds like a barrier to me...
Good to probe the greedy hypocritical ToS (Score:2)
Linkedin wants to have their cake and eat it, too. The users post their data for all interested parties to see, unless they put some explicit restrictions (e.g. friends only). Linkedin then adds all sorts of artificial limits on visibility, search, and god forbid you try to fetch that data with a script. Suddenly it is no longer the person's data shared as they want, but Linkedin's data intended for monetization.
I understand they have expenses incurred by careless bots. It is possible to traffic shape the
Their servers (Score:1)
LinkedIn should have a right to keep anyone from using their property - their servers.
Linkedin needs a better argument (Score:1)
The ruling is certainly a tradeoff for the Internet.
(Lowers content creation funding, but raises content access freedoms.)
I think on balance it's a good thing.
Here's the kernel of hiq's argument.
28. LinkedIn is thus improperly using the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act and related state penal code and trespass law, not as a shield – as intended by those laws – to prevent harmful hacking and unauthorized computer access, but as a sword to stifle competition and asse
we block people from scraping our clients' sites (Score:2)
We block people from scraping our clients' sites all the time, because it places excess load on the server.
We played cat and mouse with one for a while ... eventually, they emailed a generic address at our client and said they weren't going to give up, so we should just make an easy-to-consume feed available to them. I laid it out to the client and said they might want to consider it, but they didn't go for it.
I can't imagine a court order mandating us to allow scrapers.
Re: (Score:2)
We played cat and mouse with one for a while ... eventually, they emailed a generic address at our client and said they weren't going to give up
This is when you get your attorney to write up a Cease and Desist letter and reply to the scraper's e-mail. Now they have been warned and ordered by the owner of the property to stop, and further actions can result in a lawsuit or criminal charges for Unauthorized Access/Access In Excess of Authorization.
Clearly no one read the FA (Score:2)