
Anthropic Bags Key 'Fair Use' Win For AI Platforms, But Faces Trial Over Damages For Millions of Pirated Works (aifray.com) 74

A federal judge has ruled that Anthropic's use of copyrighted books to train its Claude AI models constitutes fair use, but rejected the startup's defense for downloading millions of pirated books to build a permanent digital library.

U.S. District Judge William Alsup granted partial summary judgment to Anthropic in the copyright lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. The court found that training large language models on copyrighted works was "exceedingly transformative" under Section 107 of the Copyright Act. Anthropic downloaded over seven million books from pirate sites, according to court documents. The startup also purchased millions of print books, destroyed the bindings, scanned every page, and stored them digitally.

Both sets of books were used to train various versions of Claude, which generates over $1 billion in annual revenue. While the judge approved using books for AI training purposes, he ruled that downloading pirated copies to create what Anthropic called a "central library of all the books in the world" was not protected fair use. The case will proceed to trial on damages related to the pirated library copies.


Comments Filter:
  • by xack ( 5304745 ) on Tuesday June 24, 2025 @11:31AM (#65472481)
    Pretty much all of tech history is piracy groups "going legit" through negotiations and licences, or even being employed by the companies they pirated from. We just have to deal with the fact that data is inherently copyable and all DRM gets cracked. It just so happens that instead of grabbing a few MP3s over P2P, we're now copying the whole internet. We've seen all the sci-fi coming true, and eventually we'll just have to move to new economic realities.
    • Basically: buy the books before feeding them to the AI. Fair enough.
      • by AvitarX ( 172628 )

        If they weren't uploading torrents while doing the downloading, I assume their (non-punitive) damages are limited to the cost of the books (my memory is weak on this, but I believe the statutory damages come from distributing illegal copies).

        I assume they get away with paying what they would have anyway, plus court fees (since the actual use of the copy was fair use).
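For rough scale (not legal advice): U.S. statutory damages under 17 U.S.C. § 504(c) run from $750 to $30,000 per infringed work, up to $150,000 per work for willful infringement. A naive back-of-the-envelope calculation, assuming every one of the roughly seven million downloaded books counts as a separately registered work (which a real case would not):

```python
# Naive scale estimate from the statutory ranges in 17 U.S.C. § 504(c).
# Assumes ~7 million separately registered works, purely for illustration.
works = 7_000_000
floor = works * 750            # statutory minimum per work
willful_cap = works * 150_000  # willful-infringement maximum per work
print(f"${floor:,} to ${willful_cap:,}")  # $5.25 billion to $1.05 trillion
```

Even the statutory floor dwarfs the retail price of the books, which is why the damages trial matters far more than the cost of buying seven million copies.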

    • It's how Hollywood started. Nowadays if you even look at a movie wrong, they'll sue. To make matters worse, if you're a California taxpayer (like me) you're now being forced, under penalty of prison, to donate part of your income to Hollywood.

      https://www.latimes.com/entert... [latimes.com]

      So the "workers" (mostly already wealthy writers and actors) successfully priced themselves out of the market, and campaigned for politicians to regulate the studios away. Obviously the taxpayer's fault.

      So even if you don't look at the

      • by Anonymous Coward

        if you're a California taxpayer (like me) you're now being forced, under penalty of prison, to donate part of your income to Hollywood.

        When you say "forced to donate", do you mean you have to pay your taxes? That's how taxes work. It's not a la carte.

        • Maybe you like giving multimillionaire movie producers free money, but I sure as shit don't. I don't know about you but, when I looked at my last pay statement, on the YTD line there were two numbers that kind of irked me:

          Deductions: $77k
          Taxes: $79k
          Net pay: $17k

          That's what happens when you get taxed on money you never even had, and may never even see. And even if you never do see it, FTB ain't giving anything back. May as well have just set that money on fire, especially with the crap Hollywood produces tod

  • AI companies should pay for ALL data that they train on, unless that data is covered by a permissive licence. This means there should be a licencing system set up to pay artists etc. This will actually allow them to train on more data in the future. Otherwise, they'll only be training on AI slop in the future, as real artists go broke.
  • Scummy (Score:4, Insightful)

    by bill_mcgonigle ( 4333 ) * on Tuesday June 24, 2025 @11:38AM (#65472505) Homepage Journal

    "I need the content of your book for AI but I won't pay you $14 for a copy."

    That's scummy.

    Not the ones they bought and scanned.

    • As people have been saying all along, it is fair use. Scummy or not.
      I'm not against fixing Copyright law to distinguish between me and OpenAI, though.
      • The illegality of downloading for personal use hinges on me making a copy on my personal device. The original intent of these laws was to forbid reproduction, as reproduction for the purposes of selling was the expected outcome of that. Making a copy for my personal device does not violate the spirit of those laws, and saying it's copyright violation is a weasely abuse of the intent of copyright law. Which in itself is just an outright theft from public domain and a violation of the intents of the constitut
        • The illegality of downloading for personal use hinges on me making a copy on my personal device.

          Not quite that simple. Fair use is a consideration for every copy made, period.

          Was your copy made to deprive the owner of money they would have otherwise made? Then no, it's not fair use, and any copy you make for that purpose is illegal.
          Was your copy made for a use that's protected under fair use? Then any copy you make for that purpose is legal.

          • by AvitarX ( 172628 )

            So basically the copy they made off the pirate site: illegal (to avoid paying)

            The internal copies they made after that (or from purchased and scanned books): legal and fair use?

            • Bingo.

              But specifically- copies made for the purpose of training their model.
              There are still internal copies that could be made of legitimately acquired stuff that would also not be fair use.
    • ...assuming the book can still be found in print form
  • by djp2204 ( 713741 ) on Tuesday June 24, 2025 @11:47AM (#65472531)

    The law is the law until it is changed. If you don't like the law, then lobby Congress to change it. Companies that break the law should be bankrupted via litigation. Innovation is not an excuse. We all know that won't happen in the tiered justice system in the USA.

    • Yeah, my congress-critter is totally going to take my letter as more important than the percentage of greenbacks a lobbyist has paid him to ignore me.
  • by kwelch007 ( 197081 ) on Tuesday June 24, 2025 @11:59AM (#65472555) Homepage

    I wonder, had Anthropic bought a copy of all of those books, e-book or physical, and then done the same, would that have been considered fair use? $1 billion of profits can buy a lot of books, and would constitute several years' worth of revenue-creating value.

    • I wonder, had Anthropic bought a copy of all of those books, e-book or physical, and then done the same, would that have been considered fair use?

      Yes.

      What's at question here is how the data is used to train the model. Regardless of how it's acquired, the training is fair use, because it's highly transformative.
      There are other considerations on top of that- but acquisition isn't one of them.

      If you borrow a book from a friend and feed it into a hashing algorithm, that act, while technically a copy under copyright law, is fair use.
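A minimal sketch of that point, using only the Python standard library and a made-up snippet of text: hashing consumes the entire work but produces a fixed-size fingerprint from which nothing of the original can be read back.

```python
import hashlib

def digest_book(text: str) -> str:
    """Return the SHA-256 hex digest of a book's full text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The digest is always 64 hex characters, no matter how long the book is,
# and the original text cannot be recovered from it.
fingerprint = digest_book("Call me Ishmael. " * 10_000)
print(len(fingerprint))  # 64
```

The output bears no resemblance to the input, which is the intuition behind calling such whole-work "copies" transformative.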

      • Also at issue is that copyright holders aren't wanting to be paid for copying their works, they're wanting control of how it's used after the sale, which they are not legally entitled to. That's not what "sell" means. It's legally no business of theirs whether I use their art in an unapproved manner after it's sold to me, whether that's training a LLM or squashing a spider.

        • Also at issue is that copyright holders aren't wanting to be paid for copying their works, they're wanting control of how it's used after the sale, which they are not legally entitled to. That's not what "sell" means. It's legally no business of theirs whether I use their art in an unapproved manner after it's sold to me, whether that's training a LLM or squashing a spider.

          Incorrect. Sale does not remove copyright.
          i.e., if you buy a picture, you are not granted license to produce additional copies of it.
          You have a fairly low UID, I'm a bit surprised this concept is foreign to you.

        • by Holi ( 250190 )

          they're wanting control of how it's used after the sale, which they are not legally entitled to.

          Actually, they do. If you buy a copy of a book, does that give you the right to turn it into a screenplay and make a movie from it? No, it does not. You would need to license the work from the author for that purpose.

        • Since most AIs I know of won't spit out a copyrighted book (hell, some won't even do a non-copyrighted one), and they only use the data to make their own stories, there's no copyright to enforce, and the judge sees that.
    • by Holi ( 250190 )

      I don't believe so. Using entire works for a commercial enterprise should never be considered fair use.

  • The copyright holders trying to shut down AI don't really understand how our court system works.

    As an overall rule, judges side with property. The old saying "possession is 9/10 of the law" has a real meaning, but it also has an underlying meaning: whoever has the most money is going to get preferential treatment by our court system.

    AI is worth trillions. As a technology the possibility of it replacing workers means that it is the single most profitable thing our species has ever produced.

    Mind you I
  • Judge Alsup has a good understanding of copyright law -- he presided over Oracle's lawsuit against Google about Java and Android -- so it's surprising that he made this ruling, which apparently ignores all the unauthorized copies made while training an LLM. The same logic implies that, for example, a very common clause in licenses for industry standards (that the license may not put a copy on networked storage) is unenforceable because that kind of copy is fair use for reading the standard.

    • Because LLMs don't spit out the book unless it's not copyrighted; they will give a summary or tell you what it's about.
      • by Entrope ( 68843 )

        You seem to be suggesting that the significant difference is whether a copy of a work is published, but copyright laws limit reproduction even without publication or public display/performance. This is basic black-letter law, and is why federal statutes explicitly authorize various copies of software (for backup purposes, or running software for purposes of maintaining or repairing a computer, as examples).

    • You need to read the decision before commenting. It's a model of clarity, which Alsup summarizes thusly:

      To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.

      The sort of copying you appear to be talking about, namely ephemeral "copies" made in computer memory etc., has long been established as fair use.

      • by Entrope ( 68843 )

        The sort of copying you appear to be talking about, namely ephemeral "copies" made in computer memory etc,. has long been established as fair use.

        I have no idea where you got that idea, but it is 100% wrong. MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511 (9th Cir. 1993) held that such ephemeral copies are not fair use and were not implicitly authorized. This led to the creation of 17 USC section 117 to legally authorize those copies -- as I said. But that authorization is limited to copying software for specific purposes, so it doesn't help Anthropic here.

  • Anthropic asked for early summary judgment in the case, and their gamble paid off, almost. Judge Alsup agreed that training LLMs and digitizing legally purchased books constituted fair use. But when it came to the pirated library they compiled and retained for "research purposes," the judge was unequivocal: there is no fair use defense for mass copyright infringement cloaked in scientific pretensions. That part of the case survives, and so does the potential for a high-stakes class action.

  • The biggest win for AI companies is the ruling that AI is "highly transformative". If that holds then there's no ambiguity regarding "derivative works" vs "transformative works" and that means that we can use source code or movies generated by AI without being worried that the original authors of the works the AI was trained on will come and get us for copyright infringement.

    I generally agree with this stance, but am a bit worried about when AI "memorizes". For example it could dump out a verbatim implement
