Reddit Sues AI Startup Anthropic For Breach of Contract, 'Unfair Competition' (cnbc.com) 44

Posted by msmash on Wednesday June 04, 2025 @02:50PM from the tussle-continues dept.

Reddit is suing AI startup Anthropic for what it's calling a breach of contract and for engaging in "unlawful and unfair business acts" by using the social media company's platform and data without authority. From a report: The lawsuit, filed in San Francisco on Wednesday, claims that Anthropic has been training its models on the personal data of Reddit users without obtaining their consent. Reddit alleges that it has been harmed by the unauthorized commercial use of its content.

The company opened the complaint by calling Anthropic a "late-blooming" AI company that "bills itself as the white knight of the AI industry." Reddit follows by saying, "It is anything but."

Fuck reddit. (Score:5, Insightful)

by TrentTheThief ( 118302 ) writes: on Wednesday June 04, 2025 @02:55PM (#65427775)

Just that.

- Re: Fuck reddit. (Score:1)
  
  by lengthylemon ( 10502393 ) writes:
  
  Agreed. Fuck reddit. Absolute dog shit of a community (and organization).
  - Re: (Score:2)
    
    by TrentTheThief ( 118302 ) writes:
    
    I used reddit quite a bit after Digg got so fricking stupid. It was awesome at the time. I bailed when they turned into little girls and decided they needed to clean up the internet. I went to Voat, where things were even better than they'd been at Reddit.
    But Reddit's alt-hater purges turned Voat into a cesspit.
    I stick to the dark web now.
The data is worth billions (Score:2)

by rsilvergun ( 571051 ) writes:

If not trillions. I know that sounds stupid, but being able to train an AI to replace workers is priceless.

The point is Reddit will work something out with them where they get paid but this is barely a bump in the road.
- Re: (Score:3)
  
  by ebunga ( 95613 ) writes:
  
  Imagine assigning positive value to a reddit post.
  - Re: The data is worth billions (Score:2)
    
    by LordofWinterfell ( 90845 ) writes:
    
    It's humans communicating with each other. No matter how you feel about them or the site, that much data built entirely around correspondence is measurably valuable.
- Re: (Score:2)
  
  by nightflameauto ( 6607976 ) writes:
  
  >If not trillions. I know that sounds stupid but being able to train in AI to replace workers is priceless. The point is Reddit will work something out with them where they get paid but this is barely a bump in the road.
  If you can train on reddit data to replace an actual worker in a real job, that job wasn't worth doing in the first place. Reddit is where information goes to not just die, but to be humiliated, tortured, distorted, and brutalized before that ultimate death. If that's "value," I say let the AI data slurpers choke on it.
  - Re: (Score:2)
    
    by msauve ( 701917 ) writes:
    
    >If you can train on reddit data to replace an actual worker in a real job
    
    No reason it couldn't replace a slashdot editor.
    - Re: (Score:2)
      
      by nightflameauto ( 6607976 ) writes:
      
      >>If you can train on reddit data to replace an actual worker in a real job
      >No reason it couldn't replace a slashdot editor.
      I'm not convinced there are editors here anymore. I think they were replaced by scripts, not even AI agents.
- Re: (Score:2)
  
  by Njovich ( 553857 ) writes:
  
  Your car would be worth a billion dollars if there were no other cars. However given that cars are available everywhere it is not worth even 0.01% of that.
  Reddit's data would only have this huge value if there was no other place to get it. Right now, pretty much all that can be learned from Reddit is embedded in free models from Facebook, DeepSeek, and others. Cost: 0. You can get indexed data from something like 90% of Reddit nearly for free from Common Crawl, at the cost of bandwidth.
  There is definitely value to Reddit
  - Re: (Score:2)
    
    by Pinky's Brain ( 1158667 ) writes:
    
    The question/answer format of a lot of the content makes it very valuable, same as Stack Overflow.
    Within its specific type of training data it makes up a huge chunk. I could easily see Anthropic settling for a billion.
- Re: (Score:2)
  
  by geekmux ( 1040042 ) writes:
  
  >If not trillions. I know that sounds stupid but being able to train in AI to replace workers is priceless.
  This would be a good time to refresh ourselves on the concept of GIGO. (Garbage In, Garbage Out)
  Quite frankly, I’m not so sure the latest version of humanity is the right one. Perhaps we should consider going back 50 or 100 years if we’re looking to teach AI rather than merely train it.
  Would exposing a virgin AI to 4chan be something you pay to get, or pay to get rid of? One ideology's treasure may be another ideology's brainworm.
How do they know? (Score:4, Interesting)

by ZipNada ( 10152669 ) writes: on Wednesday June 04, 2025 @03:00PM (#65427793)

Do they have some evidence? Or is it just a fishing expedition?
"It’s asking for a jury trial," so there would be discovery. But it seems like there would need to be a solid basis for the suit in order for it to move forward.

- Re: (Score:2)
  
  by Samantha Wright ( 1324923 ) writes:
  
  There's nothing much to doubt. The evidence is always the same: "our web server logs show scrapers originating from IP addresses owned by someone who didn't pay us."
  The Verge article is a little clearer [theverge.com]. 100,000 threads pilfered over the past year with scraping! Oh no!
  (See also: the actual legal filing [documentcloud.org]. I have to admit the headings sound a little unstable.)
  - Re: (Score:2)
    
    by ZipNada ( 10152669 ) writes:
    
    Thanks for that info.
- Re: (Score:2)
  
  by TheWho79 ( 10289219 ) writes:
  
  What does "evidence" have to do with it? I bet Anthropic will say they downloaded the info; they don't need to hide here. The DMCA gives them full safe harbor, the same safe harbor Google and all search engines get. Hell, Google republished the internet for 25 years and called it caching without remorse, apology, or permission.
Cute lapdogs (Score:2)

by Pinky's Brain ( 1158667 ) writes:

I like how they avoid saying "copyright" even once, because setting a copyright precedent is really not something their OpenAI/Google sponsors would want.
Bravely protecting the "privacy" of their users instead ...
- Re: (Score:3)
  
  by Random361 ( 6742804 ) writes:
  
  The "privacy" of their users on a public platform with search results that show up prominently in Google searches?
Filing (Score:2)

by ISayWeOnlyToBePolite ( 721679 ) writes:

I couldn't find it in the posted article so, link to filing: https://redditinc.com/hubfs/Re... [redditinc.com] from https://www.engadget.com/ai/re... [engadget.com]
Good luck (Score:2)

by Austerity Empowers ( 669817 ) writes:

Their business model is to make their content public and rank high on Google. And they are mad that someone is using that public knowledge.
The solution is simple: put it behind a paywall. Problem solved. They will go bankrupt in a few hours and Anthropic will have to use something else. Literally everyone wins, except holders of RDDT, but they kind of have that coming.
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  >Literally everyone wins
  The the fucking LLMs seeking to shoot themselves in the foot by including reddit in their fucking training corpus.
  - Re: (Score:2)
    
    by DamnOregonian ( 963763 ) writes:
    
    *Even the
  - Re: (Score:3)
    
    by nightflameauto ( 6607976 ) writes:
    
    >Literally everyone wins
    >The the fucking LLMs seeking to shoot themselves in the foot by including reddit in their fucking training corpus.
    That'd be a big win for the rest of society. Let the rot and infestation of Reddit taint AI training data. Might as well include 4chan while you're at it.
  - Re: (Score:2)
    
    by thoriumbr ( 1152281 ) writes:
    
    It makes more sense than you think. Most Reddit content is stupidity, but human stupidity. LLMs need human intelligence (or the lack of it) to be trained, because training on LLM-generated content quickly degrades the model. And Reddit has one of the largest human-generated repositories out there.
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      It's based on a dubious assumption.
      
      That they can gain from the naturally generated language, and then use fine-tuning to undo the fact that they've pushed the model closer to a confused NaziWokeMAGATransBot or some shit: just the worst of everyone's predilections.
      
      It's a bad idea, but you are correct- they need the natural language.
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        i.e., it's not just token count that matters. There's a very large emphasis on the quality of tokens when trying to get your hands on a training corpus. Reddit is about as low-quality as you can get. All forums are.
        
        - Re: Good luck (Score:2)
          
          by drinkypoo ( 153816 ) writes:
          
          I'm not sure that's a fair assessment of reddit. You certainly wouldn't want to treat all of the data there the same, but they have conveniently scored it already. Some subs would be generally worthless, but they are already segregated.
          - Re: (Score:2)
            
            by DamnOregonian ( 963763 ) writes:
            
            How do you pick which subs to use, and which subs not to use?
            How do we score, on a per-post basis, how unbiased, informative, or well-reasoned the content is?
            
            Popularity scores are most certainly not the answer.
            - Re: (Score:2)
              
              by drinkypoo ( 153816 ) writes:
              
              >How do you pick which subs to use, and what subs not to use?
              With a human, of course. You wouldn't have a human vet everything; you'd use some kind of filter that at least got the biggest turds out of the pool. But for large content sources, it's worth spending some time doing a little hands-on classification.
              - Re: (Score:2)
                
                by DamnOregonian ( 963763 ) writes:
                
                OK, fair. When I say that forums are low-quality tokens, that is (perhaps inappropriately) qualified. The qualifier is "in aggregate".
                I don't deny that a person can definitely clean up that data.
“Users’” Consent (Score:2)

by nick_davison ( 217681 ) writes:

Reddit is upset that Anthropic is taking a dubious approach to Reddit's users' consent, when that's Reddit's job.
A site that's switched its terms to grant itself the right to sell its users' content, and blocked accounts for trying to delete their content… is upset that someone else is acting similarly dubiously.
By all means, Reddit, call it for what it is: You have something you think is valuable, others think is valuable, and you want to force
Let me guess ... (Score:5, Insightful)

by allo ( 1728082 ) writes: on Wednesday June 04, 2025 @04:00PM (#65427936)

"has been training its models on the personal data of Reddit users without obtaining their consent"
"Their" means Reddit's consent, not the users' consent, doesn't it?

- Re:Let me guess ... (Score:5, Informative)
  
  by EvilSS ( 557649 ) writes: on Wednesday June 04, 2025 @04:37PM (#65427994)
  
  >"has been training its models on the personal data of Reddit users without obtaining their consent"
  >"Their" means Reddit's consent, not the users' consent, doesn't it?
  100%. Reddit already has at least one deal (reportedly worth around $60M/yr) to sell data to an AI firm. They don't care about privacy, they care about getting paid.
  
Boo hoo (Score:2)

by smooth wombat ( 796938 ) writes:

If you can steal music, video, and software by claiming nothing was stolen, you can use publicly available data without issue.
Just Call It Caching Like Google Did for 20+ years (Score:3)

by TheWho79 ( 10289219 ) writes: on Wednesday June 04, 2025 @05:12PM (#65428072)

All Anthropic has to do is what Google did: republish the page and call it a 'cached' page. If they just send Reddit a few trinkets in referrals, poof, the burden has been met for safe harbor/caching/512, call it what you want.
This will get kicked to the curb by the courts. They will want no part of it.

- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Images aside, it's pretty different, as search engines used a snippet and drove traffic to the sites.
Little dictators owning their subreddits (Score:2)

by thesjaakspoiler ( 4782965 ) writes:

and now claiming that other people's contributions also belong to them.
- Re: (Score:2)
  
  by Digital Avatar ( 752673 ) writes:
  
  Based on the text of the complaint, no, they're not claiming that. Rather, they're claiming that pursuant to their terms of service Anthropic didn't have the right to scrape that data. If they were claiming they owned the data, they'd be adding copyright infringement claims to their complaint, but they didn't. Their claims are Breach of Contract, Unjust Enrichment, Trespass to Chattels (read: your bots damaged us), Tortious Interference (read: Reddit has an obligation to respect the privacy of its users, An
