


News Orgs Say AI Firm Stole Articles, Spit Out 'Hallucinations' (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: Conde Nast and several other media companies sued the AI startup Cohere today, alleging that it engaged in "systematic copyright and trademark infringement" by using news articles to train its large language model. "Without permission or compensation, Cohere uses scraped copies of our articles, through training, real-time use, and in outputs, to power its artificial intelligence ('AI') service, which in turn competes with Publisher offerings and the emerging market for AI licensing," said the lawsuit (PDF) filed in US District Court for the Southern District of New York. "Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands."
Conde Nast, which owns Ars Technica and other publications such as Wired and The New Yorker, was joined in the lawsuit by The Atlantic, Forbes, The Guardian, Insider, the Los Angeles Times, McClatchy, Newsday, The Plain Dealer, Politico, The Republican, the Toronto Star, and Vox Media. The complaint seeks statutory damages of up to $150,000 under the Copyright Act for each infringed work, or an amount based on actual damages and Cohere's profits. It also seeks "actual damages, Cohere's profits, and statutory damages up to the maximum provided by law" for infringement of trademarks and "false designations of origin."
In Exhibit A (PDF), the plaintiffs identified over 4,000 articles in what they called an "illustrative and non-exhaustive list of works that Cohere has infringed." Additional exhibits provide responses to queries (PDF) and "hallucinations" (PDF) that the publishers say infringe upon their copyrights and trademarks. The lawsuit said Cohere "passes off its own hallucinated articles as articles from Publishers." Cohere said in a statement to Ars: "Cohere strongly stands by its practices for responsibly training its enterprise AI. We have long prioritized controls that mitigate the risk of IP infringement and respect the rights of holders. We would have welcomed a conversation about their specific concerns -- and the opportunity to explain our enterprise-focused approach -- rather than learning about them in a filing. We believe this lawsuit is misguided and frivolous, and expect this matter to be resolved in our favor."
Further reading: Thomson Reuters Wins First Major AI Copyright Case In the US
This is rich. (Score:5, Interesting)
That last sentence of Cohere's statement just drips with liquefied bullshit.
Indeed. <shakes head/>
Let's look at this more closely:
How about, "You're appropriating our content en masse, reworking it, and passing off the resulting mish-mash shit as somehow our product and not yours." Sheesh.
Translation: "We're making a business out of repurposing your content and misrepresenting the results."
This is like playing the victim because the targets of your burglary had the audacity to file charges, instead of coming to you nicely and politely to talk about how much they really would have preferred it if you hadn't ransacked their house.
I understand that there are oodles of issues with current copyright law. That said, I don't think these AI shysters are being unjustly attacked for what certainly looks like their wholesale (soon to be retail!) misuse of content.
Re: (Score:3)
Re: (Score:1)
Re: (Score:2)
It is certainly not unreasonable.
They egregiously broke copyright law. Egregiously. And profited off of that. They have no remorse for what they did and want to continue into the future, not giving content creators anything.
F* them.
This isn't about "reworking the work of others". It's wholesale theft. If you can't see the difference you are too far gone. Cult much?
Re: (Score:1)
Re: (Score:2)
yes?
who is it that you think you're undermining by saying this?
Re: (Score:1)
> How about, "You're appropriating our content en masse, reworking it, and passing off the resulting mish-mash shit as somehow our product and not yours." Sheesh.
Which is exactly what Google did. So you gotta ask, why do these suits not name Google in all this?
Re: (Score:2)
Uh.
No, I don't need to ask.
I think that it's stupid that Google already have publishing deals with these companies, but I know they do. You could have found this out if you'd looked.
Re: (Score:2)
"Republishing" a web page verbatim - advertising and all - as associated with the original URL and as a backup for a way to click through to access the current article if it is still being made available to anyone who walks by is a lot closer to fair use than what most of these AI companies are doing. No one would use the cached version if the original publisher didn't take it down, or shut the website down, or was acquired by a new owner that wanted it placed behind a paywall, or changed the URL for no pa
Re: (Score:1)
Re: (Score:3)
This is wrong. Their business is to create AI tools, it is users who prompt models to do anything they do. LLMs are the worst infringement tools ever invented - they almost never reproduce exactly, cost money and are slow, while good old copying is free, instant and has perfect fidelity. Who would use AI to infringe?
Wow, it must've really been egregious (Score:3)
I mean, The Atlantic and The Republican are suing them together!?
Re: (Score:2)
Re: (Score:2)
Sabbat: 'The Best of Frenemies' was a great tune.
Re: (Score:2)
Many Americans avoid strong past tenses. Count your blessings it didn't say "spitted".
On the other hand (Score:2)
Statistical Data Points, Right? (Score:4, Interesting)
I know this is Slashdot, but I DID actually look at the linked examples whereby the AI tool replicated entire (or almost entire) articles. And... as long as the evidence isn't completely fabricated, it's genuinely surprising. People who take the side of LLM operators always say that it's just statistical data points and the model doesn't contain enough of the original to be able to spit out anything that would count as infringing. Well... not in this case, no sirree - we're talking multiple paragraphs of text lifted verbatim. For one, I cannot see any "fair use" defence (or any other kind) here. And two, a question: is there something about this specific model that enabled it to quote such large chunks of text, or is this an eye-opener that ALL LLMs might do the same thing?
AI as a Plagiarism Umbrella (Score:2)