Lawsuit Stops Headline Scraping
Stephen Larson alerts us to the out-of-court settlement of Gatehouse v NY Times, a lawsuit that attempted to stop the Boston Globe from linking to headlines and excerpting initial sentences from a competitor's Web site. At issue was the Globe's practice — barely distinguishable from those of Google News, Yahoo, and others — of linking to another news source's coverage of local news. The upshot is that the Boston Globe will stop the linking. No judicial precedent was set, because the case was settled before reaching a judge.
Re: (Score:2)
Summary is wrong. Massachusetts District Court cannot set binding precedents [wikipedia.org], which are "findings of law made by a higher court" such as Appeals or Supreme Court that must be honored by a lower court. The lowest court in the hierarchy could not have set a binding precedent even if this had gone to trial!
This seems like a case of "pay off the annoying buzz" rather than anything important in a legal sense. (IANAL, etc., etc.)
Re: (Score:2)
Read the rest of your wikipedia link. Perhaps the summary has been changed, but precedent is precedent, whether binding or not. In this case, because it was settled, absolutely NO precedent is set. Had it proceeded through trial, persuasive precedent could have been set, and if appealed, then even binding precedent could be set. By settling, they avoid all of this.
No Stopping (Score:2)
Google News [google.com]
Web fundamental (Score:3, Insightful)
Since links are so fundamental to the web, wouldn't it be easier if they just GTF off the internet rather than bother with these lawsuits?
Re:Web fundamental (Score:5, Insightful)
I don't know. Screen scrapers can be pretty fucking irritating, particularly in the parallel case of support forums. It's a problem when you search for an issue with some code or a database and the first eight hits are all the same post on different "forums" (usually all ripped off Usenet). How do you know if the replies are the same on all threads? What if *you* want to reply? Which site do you use? And they obscure different answers simply by drowning them out. Ideally, I want a Google or Yahoo search engine plugin which will let me exclude all the scrapers.
Re:Web fundamental (Score:5, Insightful)
I agree, but not everything that is annoying should be made illegal.
Re: (Score:1)
I agree, but not everything that is annoying should be made illegal.
While I agree with you now, years ago my lil sister alone would have caused me to find a way of arguing that all things annoying should indeed be made illegal. LoL
Re:Web fundamental (Score:5, Informative)
I would give almost anything to have a blacklist of domains I could set while logged into google so that those never showed up in my searches ever again...
Exactly what you are looking for, Google's customizable search engine:
http://www.google.com/coop/cse/ [google.com]
Re: (Score:1)
It's a pain, but you can manually remove sites from search results with "-". For example, "yoursearchterms -www.blacklistedhost1.com".
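If you keep a long blacklist, typing the exclusions by hand gets old fast. Here is a minimal sketch that assembles such a query string. The helper name `build_query` and the second host name are made up for illustration, and the sketch only builds the text you would paste into the search box; it does not call any search API.

```python
def build_query(terms, blocked_hosts):
    """Return the search terms followed by a "-host" exclusion per blocked host.

    Hypothetical helper: the "-" prefix follows the manual method described
    in the comment above; nothing here talks to a search engine.
    """
    exclusions = [f"-{host}" for host in blocked_hosts]
    return " ".join([terms, *exclusions])


if __name__ == "__main__":
    blacklist = ["www.blacklistedhost1.com", "www.blacklistedhost2.com"]
    print(build_query("yoursearchterms", blacklist))
```

Keeping the blacklist in one list means you only have to maintain it in one place, which is about as close to a personal blacklist as the manual method gets.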
Re: (Score:2)
I find it a great filtering mechanism actually. When you see the same thing showing up in multiple forums... blacklist those forums.
Like muting trolls in irc or reading /. at +3... I find it greatly increases the signal:noise ratio.
Re: (Score:1)
A similar issue is websites that have dummy pages for all kinds of things. E.g. when you enter the name of any videogame into Google you'll find tons of entries for the likes of IGN, Gamespot, etc., with each entry saying they have reviews, previews, etc., even if they have NOTHING on the game. There's no way to tell whether they have anything without clicking through, because the excerpt Google shows is just the generic header for all pages.
Re: (Score:2)
I thought Google Groups was for searching Usenet? I think they did add their own groups, and you have to click the "more" box, but last time I used it, I think it worked fine. No low-quality web forums...
I have been thinking about a P2P style search system. Why not have all your friends and friends of friends help index sites, then you wouldn't have to depend on some centralized automated system to find sites. I think it could be done with a daemon and firefox plugin. The browser automatically indexes page
Re:Web fundamental (Score:5, Insightful)
This isn't really about the links, though, is it? On a news site, the effort required to identify a story and get the key facts right is a large part of the value of the site. If someone else can come along and copy the headline and intro, they've got most of that same value for nothing. They are just parasites, damaging the people who are doing the real work, and not even adding any useful value for society more generally. This is why places with sensible copyright laws judge fair use by criteria other than just the size of the excerpt.
Re: (Score:2)
You mean like the US? Where the "amount and substantiality of the portion used" is just one of four factors used for fair use analysis?
Re: (Score:3, Informative)
I took a look at this when the first article came out. The plaintiff's site has an RSS feed. The defendant's site looks like it was aggregating the headlines and initial sentence or so of several locally relevant news site
Conflicting interests (Score:4, Insightful)
Yes I'm that cynical (in the case of the news industry at least).
Re: (Score:2)
...flip the voting preferences of 0.1 percent of your viewers by showing the opposing candidate's face from a disadvantageous angle (yes I made that number up).
That's OK, 78.32154% of statistics are made up.
Re:Conflicting interests (Score:5, Insightful)
Actually deep linking helps you in the short term.
You end up with a site like boston.com sending their own customers to your site where their customers read your news articles and you get revenue because you get paid when they see your ads.
Objections to deep linking come from the flawed idea that without deep linking the customers would have come to the main page and read the ads there before going on to the page in question. I find it much more likely that they would never have known about the article at all.
Whoever filed that lawsuit needs to be fired.
Re: (Score:1)
"Whoever filed that lawsuit needs to be fired."
Why? They won.
Re:Conflicting interests (Score:5, Insightful)
They won a battle. It doesn't mean that they've won the war, especially since the settlement was out of court, so the legality of their action hasn't truly been tested. (There are many reasons for settling out of court: you know you can't win; you know you can win but it won't be worth the price; you might win, but the cost of a judgement against you plus legal fees would be higher than what the other party is willing to take in settlement; etc.)
And as other people have pointed out in this thread, there's a good chance that deep linking actually drives increased page views by sending people directly to content they are interested in rather than relying on them to find interesting content on their own via the site's main page.
Re: (Score:1)
Read the complaint. This case isn't about linking; it's about the unauthorized copying of headlines and first sentences of articles from the plaintiff's website and the use of their trademarks without permission. Obviously, I don't know the terms of the settlement, but this seems open and shut to me, so I'd suspect they've gotten most of what they want: an injunction, damages, and/or some sort of ongoing deal with boston.com.
Re: (Score:2)
Exactly. Who else is going to use their RSS feed now? They killed the golden goose, and devalued their online asset, by pursuing someone who was politely playing by the rules.
They claim they were losing money (Score:2)
FTA:
GateHouse had claimed Boston.com's actions violated copyright and trademark laws. Boston.com provided links that sent readers directly to "Wicked Local" stories, meaning readers bypassed ads posted on GateHouse home pages, GateHouse claimed.
Re: (Score:2)
Objections to deep linking come from the flawed idea that without deep linking the customers would have come to the main page and read the ads there before going on to the page in question. I find it much more likely that they would never have known about the article at all.
Indeed.
But what they want is to force people onto their main page, thinking they'll go through their intended cycle of page views to get to it, generating more revenue.
Which is another flawed idea.
Fortunately, Darwinism ought to weed out businesses that spend resources on flawed ideas.
Re: (Score:2)
Despite the financial troubles, The New York Times gets it. That's why they made subscription free and have pages like this one [nytimes.com], which basically let people know what articles are getting deep linked. That's also probably part of the reason they took this case to court.
Re:Conflicting interests (Score:4, Insightful)
Not quite - newspapers, magazines etc exist to sell advertising space. The editorial content is just there to support this aim. It was different prior to the early 70's when magazines/papers genuinely existed to provide information to the readers but now that's a byproduct.
Wrong (Score:3, Insightful)
the real reason of operating a newspaper or site is to make your audience see the world through your goggles.
No it isn't. The real reason is to make money. If your competitor is stealing your work and using it for their own financial gain, I think you have a right to be pissed off and sue.
You give the media too much credit -- its motives are surprisingly shallow. It doesn't really care what you think, you are free to agree or disagree, as long as you are reading/watching/listening, and of course, paying attention to those wonderful advertisers who make the whole thing possible.
General law about search and link services? (Score:2)
- Does it work in law to say "He's doing it too"?
- Isn't it "unconstitutional" (illegal) to have a law that applies to lots and is only enforced on some?
Re: (Score:2, Insightful)
The law is irrelevant: they settled out of court. Nothing to see here.
How is law irrelevant in settling out of court? Do you think they settle based on their personal morals? On a coin toss?
Obviously the settling is legal but the fact that they settled means there was a chance of losing the case in court.
Re:General law about search and link services? (Score:5, Insightful)
Re: (Score:2)
No, they used the following algorithm:
// Settle when going to court is expected to cost more than settling.
if (expected_court_cost > requested_settlement_amount) {
    settle();
} else {
    fight_this();
}
Re: (Score:2)
Legal != Just
Settlements are allowed for good reason. Otherwise there would be an awful lot more SCO-like attempts to damage an opponent. If they can't "settle" just to get you to go away, you end up draining them constantly (and yourself too, of course, but still, you might win). But settlements are not legal precedent.
Re: (Score:2)
No, to both of your questions. Try this with a cop the next time that you get pulled over for speeding. Not only will you get a ticket, you'll get the cop mad and he will probably find half a dozen other things that you can get additional tickets for.
This is ridiculous (Score:5, Informative)
FTA, it sounds like Gatehouse see this as a copyright violation but, as several other posters have pointed out, the same thing goes on at news aggregator sites all the time. In fact most stories on Slashdot contain snippets from other sites. It's an unavoidable and very useful facet of the web.
This is yet another example of 'old' media not really understanding online practices. Most sites benefit tremendously from others linking to them - look at what happens with Slashdot. That is, unless the 'benefit' is so great that their server turns to dust.
Re: (Score:1)
Quite.
A simple nofollow type flag would do it. (Maybe there already is one).
Re: (Score:2)
While I agree that the benefits can be great, it doesn't necessarily mean they are always great. The problem here, I think, is that Gatehouse feel that a possibly significant number of people just browse the headlines. Those people can go to the Boston Globe and browse Gatehouse headlines with no interaction with Gatehouse at all.
Re:This is ridiculous (Score:5, Interesting)
FTA, it sounds like Gatehouse see this as a copyright violation but, as several other posters have pointed out, the same thing goes on on news aggregator sites all the time.
Which doesn't make it any less of a copyright violation. "Him too" is not a defence in law.
In fact most stories on Slashdot contain snippets from other sites.
And sometimes Slashdot does go too far, but at least it's in a grey area, with original content and editorial control as well. Presenting factual information is one thing. Mechanically cloning another's work and using their exact words, while adding no value at all of your own, is another.
It's an unavoidable and very useful facet of the web
What is, the using links part, or the mechanical copying without adding value part?
This is yet another example of 'old' media not really understanding online practices.
It sounds to me like yet another example of 'new' media thinking that by being on the Internet they are somehow exempt from the law.
Most sites benefit tremendously from others linking to them - look at what happens with Slashdot.
In this context? I'd like to see some evidence of the benefits the people doing the original work derive in this sort of case, please.
By the way, Slashdot is a particularly unfortunate example, since people not reading the original article is a running joke and "Slashdot Effect" is not a term used to describe an abundance of ad revenue giving your business a huge boost.
Re: (Score:2)
The only thing that makes the "Slashdot Effect" not an abundance of ad revenue is that the site couldn't serve the spike in traffic. If it could, then it would have served the ads as well, and hence revenue would get a boost.
And whether it can handle the traffic is in the hands of the site owner, not Slashdot.
Re: (Score:2)
Agreed with everything you said except this last sentence. If people didn't really read the article then there would be no slashdot effect. Most of the urls linked from Slashdot don't go down and I'm sure -do- gain an enormous amount of traffic from the link.
Laches? Fair Use? (Score:3, Interesting)
Which doesn't make it any less of a copyright violation. "Him too" is not a defence in law.
Actually Laches [wikipedia.org] could be a defense. Suppose the plaintiff did not sue other entities that engaged in this practice, and the defendant, on seeing that the plaintiff didn't sue, also engaged in that practice. If the plaintiff then suddenly decided to sue the defendant but not the other entities, the defense could claim laches.
(That is in theory, however the facts of this case probably don't support laches because (1) google/yahoo/etc are not competing with the newspaper but the other newspaper is thus
Re: (Score:2)
"Slashdot Effect" is not a term used to describe an abundance of ad revenue giving your business a huge boost.
When your site is "slashdotted", it means demand for your site's content has far exceeded your server infrastructure's ability to supply it to the user.
Now, Slashdot Effect traffic is very much burst-shaped, so if you ever only got hit once there's no good reason to increase your standing capacity to 4000000% of its normal sustained rate. But if it happens repeatedly, it's a sign that your company
Re: (Score:2)
This is yet another example of 'old' media not really understanding online practices.
It sounds to me like yet another example of 'new' media thinking that by being on the Internet they are somehow exempt from the law.
You sound like one more person who fails to understand the concept of fair use and that old laws are not written with new technological possibilities in mind.
There's a BIG difference between taking content and linking to content: The latter is fair.
Most sites benefit tremendously from others linking to them - look at what happens with Slashdot.
Slashdot is a particularly unfortunate example, since people not reading the original article is a running joke and "Slashdot Effect" is not a term used to describe an abundance of ad revenue giving your business a huge boost.
So if [traditional media] would regularly send people "en masse" to [traditional location] in numbers that exceed the room capacity, you wouldn't consider that a huge boost, on account of the physical limitations of the location setting a limit to the amount of e
Re: (Score:2)
You sound like one more person who fails to understand the concept of fair use and that old laws are not written with new technological possibilities in mind.
You might like to reflect on what you wrote there, until you understand the irony.
Re: (Score:3, Interesting)
You sound like one more person who fails to understand the concept of fair use and that old laws are not written with new technological possibilities in mind.
You might like to reflect on what you wrote there, until you understand the irony.
New technologies do not negate fair use; they just add new uses, some of which are fair and some not.
I maintain that linking with an extract is fair:
The 1961 Report of the Register of Copyrights on the General Revision of the U.S. Copyright Law cites examples of activities that courts have regarded as fair use [copyright.gov]: "quotation of excerpts in a review or criticism for purposes of illustration or comment; quotation of short passages in a scholarly or technical work, for illustration or clarification of the
Re: (Score:2)
I'm not arguing that new technologies negate fair use. On the contrary, I think the US fair use law, based on four general principles rather than numerous specific circumstances, is an excellent example of a law that does adapt to new uses well.
I just happen to consider that in this case, the use probably isn't fair according to those criteria. I base this on the four criteria themselves, rather than an analogy to a case nearly five decades ago: if I were the lawyer in court, I would argue that the excerpts
Re: (Score:2)
in this case, the use probably isn't fair according to those criteria. I base this on the four criteria themselves, rather than an analogy to a case nearly five decades ago
In a common law system (which goes for Boston and New York), the rulings of the past are as important as the original text, so you can bring them up to refute claims based on the original text...
But if I were to argue the 4 points, I'd pit the commercial use against the increased value through linking (1 and 4), and I'd argue the merits of the other 2 points to a draw.
Basically, the first point is obviously going to the complaint, but I'd argue that using the material for profit in a different market and li
Re: (Score:3, Interesting)
Depends on what you mean by "the same thing". First off, Boston.com is not a news aggregator. They are a news generator. They make money selling ads because theoretically someone wants to see the content they generate (and up until now at least, the Boston Globe staff has produced quite a lot of important news that people want to read, the whole expose on presidential signing statements was broken by a Globe reporter). The main problem here is
Re: (Score:2)
Google News (since it's the only news aggregator I use) sells no ads next to any page under the news.google.com subdomain that I've been able to find. Yes I just looked. No ads on the news search pages, no nothing, not even when I turned javascript on for google.com.
I just looked too. [google.com] I counted six ads on the right side of that page listed under "Sponsored Links": "Obama Inauguration Art", "Free Obama $500 Gas Card", "Obama Inauguration Date", "Prepare to be Shocked", "Is Barack Obama Dumb?", "Was Obama A Good Choice?". The second page had two more.
But links to your site are good ... (Score:3, Insightful)
Linking to other media sites is a common feature of many news sites. BBC News has links to other sites' reporting for stories. It's just a headline and link, nothing special.
That link boosts the other site's search rankings, and every click-through is a reader that they didn't have before, and an ad-hit, and maybe a repeat visitor.
Taking the headline and the entire article is a different issue altogether, but I don't think that is the situation in this case. It is like all the Belgian (?) newspapers that want to have zero online presence or searchability. It makes no sense! You either participate, or you fade away on the fringes. That's why there is a "web" in "world wide web". Why be a bit of gossamer drifting on the wind when you can be in the web and actually be useful?
Wondering who paid for this?... (Score:1)
Who understands these headlines anyway? (Score:2)
Creating newspaper headlines seems like the only job where creating puns and word play is a requirement. I almost feel sorry for interviewers. But are you really getting anything from scraping a headline?
"PATERSON LYIN' KING OF STATE"
"LOON FLIES COOP"
"PETT-Y CASH"
And these were just today's. Bonus point for guessing the actual story.
Re: (Score:2)
Obama Signals New Tone in Relations With Islamic World
Gates Sets Modest Goals for Afghanistan
New York Says Health Chief Abused Power
are all headlines that make sense.
Re: (Score:2)
Obama Signals New Tone in Relations With Islamic World
Ring Tones for helping relations with the Islamic World?
Gates Sets Modest Goals for Afghanistan
Bill Gates has goals for computer sales in Afghanistan?
New York Says Health Chief Abused Power
Ok, you got me on this one.
Why this makes sense. (Score:3, Informative)
Unlike Google News, the Boston Globe is, itself, a news-reporting organization. Mixing their own stories with those from competitors can lead to confusion. I didn't manage to see the offending page before they took down their linked stories; but I imagine it was done in such a way as to have the original source difficult to identify.
A pure aggregator service, like Google News, is different because it is rather obvious that ALL it is doing is aggregating. There is no 'new reporting' being done by "Google".
When will Gatehouse stop doing it to us? (Score:2)
I work for the Democrat and Chronicle, and Gatehouse scrapes our headlines out of our RSS feed for their right-hand sidebar's "Regional News" block.
Bloody hypocrites.
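For anyone wondering how little work "scraping headlines out of an RSS feed" actually is, here is a minimal sketch. It is not what Gatehouse or Boston.com actually run (nobody in this thread has seen that code); the sample feed, the example.com URLs, and the function names are all made up for illustration. It pulls the headline, link, and first sentence out of each item in an RSS 2.0 document using only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration; not taken from any real site.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Local News</title>
    <item>
      <title>Town Approves New Library</title>
      <link>http://news.example.com/library</link>
      <description>The town voted Tuesday to fund a new library. Construction starts in May.</description>
    </item>
    <item>
      <title>Road Closed for Repairs</title>
      <link>http://news.example.com/road</link>
      <description>Main Street will close this weekend.</description>
    </item>
  </channel>
</rss>"""


def first_sentence(text):
    """Return the text up to and including the first '.', '!' or '?'."""
    text = text.strip()
    match = re.match(r"(.+?[.!?])(\s|$)", text, flags=re.S)
    return match.group(1) if match else text


def scrape_headlines(rss_text):
    """Return a (headline, link, first sentence) tuple for each RSS item."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        lede = first_sentence(item.findtext("description", default=""))
        results.append((title, link, lede))
    return results


if __name__ == "__main__":
    for title, link, lede in scrape_headlines(SAMPLE_RSS):
        print(f"{title}\n  {lede}\n  {link}")
```

The publisher's own feed already hands over the headline, link, and summary in machine-readable form, which is why a sidebar like this takes a couple dozen lines.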
Re: (Score:2)
I agree - it's an absolutely moronic business decision. I'm just disappointed the NY Times didn't tell Gatehouse to stuff it and proceed to court - I'd love to see their response when The Batavian got brought up.
Shortening... (Score:1)
may stop the headline scraping.
YOUR headline is off the mark (Score:1)