The Courts

Authors Sue Anthropic For Copyright Infringement Over AI Training (reuters.com) 57

AI company Anthropic has been hit with a class-action lawsuit in California federal court by three authors who say it misused their books and hundreds of thousands of others to train its AI-powered chatbot Claude. From a report: The complaint, filed on Monday by writers and journalists Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, said that Anthropic used pirated versions of their works and others to teach Claude to respond to human prompts.

The lawsuit joins several other high-stakes complaints filed by copyright holders including visual artists, news outlets and record labels over the material used by tech companies to train their generative artificial intelligence systems. Separate groups of authors have sued OpenAI and Meta over the companies' alleged misuse of their work to train the large-language models underlying their chatbots.

Comments Filter:
  • Comment removed (Score:4, Interesting)

    by account_deleted ( 4530225 ) on Tuesday August 20, 2024 @10:28AM (#64720746)
    Comment removed based on user account deletion
    • Comment removed based on user account deletion
      • by 0xG ( 712423 )

We should know by now that copyright holders are a bunch of fuckin whiners with lawyers.
Want to bet the complainants' 'books' are self-published and didn't sell? But it's all the fault of [deep pockets]! Waaaaaaa....

• This is quite simple, jackass: Anthropic doesn't want to pay people for their products in order to train its AI gadget. But as soon as Anthropic's AI gadget is ready, they sure as hell want you to pay for it.

          See how that works. They want something for free but want you to pay for their shit.
    • by Rei ( 128717 )

It's not clear what you're asking. If you're asking whether there have been lots of lawsuits, the answer is yes [chatgptise...eworld.com]. None have concluded.

If you're asking whether it's common for AIs to be trained on copyrighted data, yes. In the same way it's common for Google to download copyrighted data to build its search engine, and a million other things (perhaps the most extreme being the Google Books case). The defendants argue that the same fair use exemption for the automated processing of copyrighted data to create transformative goods and services applies to them. Plaintiffs variously allege that it doesn't, that the outputs infringe, or various other claims. Most claims haven't been going very well for plaintiffs pretrial, but of the claims that make it to trial, it's too early to say how those will go.

      • by Visarga ( 1071662 ) on Tuesday August 20, 2024 @11:30AM (#64720932)
If judges side with the authors, they hand copyright owners rights over the ideas and styles they want to protect, not just specific expression: a power grab that will kneecap creativity. Anything less still leaves authors in a losing position. You either block a wide space around all protected works, or you allow AI to generate transformed works without fear of infringement. What do you do if they use synthetic text generated with AI from copyrighted works as training data? In the new regime, any human creator would be suspected of having used AI.
        • It's actually simpler than that. You can PAY the authors. See how that works. All these AI businesses want free stuff to train their AI gadgets. They don't want to pay but they sure as hell want you to pay for their AI gadget.

          Here's a fair compromise: Anthropic can use these works for free, as long as Anthropic gives their AI gadget away for free. Problem solved.
It is easy to be creative without infringing on others' copyrights.
          Unless you consider creativity to mean: how to make a copy of something that was successful, make lots of money, and get away with it.

My father was a tax consultant ... he used to tell people: if you put your mind into how to make more money, instead of how to save taxes, you would have much more money.

      • by Anonymous Coward
There is a difference, one which takes this case out of reach of a frequent Slashdot defense of Google Books and fair use:

        "27. For both the post- and pre-training processes in developing Claude, Anthropic created multiple, unlicensed copies of the training data."

        That allegation was not present in many of the prior lawsuits and, I believe, is a very important allegation to make. Even if "reading" the books, "talking about" the books, making the books "searchable" etc is all fair use via application of pr
        • This is my primary objection as well. I have zero issues with $company going to a bookstore and buying or going to a library and checking out 50,000 books, scanning them, OCRing them, and using them to create a search index for books or training an AI on it. I don't have an issue if they buy 50,000 ebooks either. Imho, those would be reasonable fair use of a legally acquired works. (e.g. google book scanning). Where it crosses the line is when $company builds a billion-dollar product on "50,000 eBooks Cl

          • The stuff on the internet may be freely distributed, but it's not freely distributable. That's the apparent meaning of the allegation noted above- sure, I can back up the e-book that the author gave me in the hopes I'd publicize it, I can format shift it, if it made any sense I could time shift it, I could lend it out or sell it. But what I can't do is make 10 copies for my family so we can all read it at the same time. If there exists an implied right to copy and distribute material protected only by a TOS
            • Understood and agreed. Verbatim copies of a copyright work are not ok. AIs (and people) should not do that when reading/learning/digesting publicly posted content. That said, learning is a fundamentally different process from copying, and I don't see an issue with non-copying learning from publicly displayed works. I.e. It's not ok for me to go to the MOMA, duplicate (non-public domain) $ART, and sell prints of it. My understanding is that there is no restriction on making "inspired by" copies though,

• Publicly distributed does not imply "no copyright"

            Everything I put on the internet is copyrighted by me, without any special notice.
            That is common sense. And law!

            • I understand this, and counter that posting this content here, publicly, gives the expectation that others can read, consider, learn from, and form opinions on this content the same as if you'd shouted it in the public square.

              This brings some questions to the fore:
Assuming I have an eidetic memory, do I not have license to remember your content?

              If I'm not permitted to remember your content, do I have license to consider it in forming my own opinions?

              Does a machine, without an eidetic memory, have fair use arg

              • No, you do not need a license to memorize it. Or to use your memory.

                ONE
                The questions arise when you make a "copy" and if something you consider your genuine own work is proclaimed by others a copyright violation.

If you scan a paper book of mine, one that I intentionally did not publish as a $1 ebook, and you put it up for free, or sell it on ebook stores, then you violate my copyright. I point out: it does not matter whether it is free or costs $1.

Let's look at ONE again
                There is a kind of famous law case betw

• [I]s it common for AIs to be trained on copyrighted data, yes. In the same way it's common for Google to download copyrighted data to build its search engine, and a million other things (perhaps the most extreme being the Google Books case). The defendants argue that the same fair use exemption for the automated processing of copyrighted data to create transformative goods and services applies to them. Plaintiffs variously allege that it doesn't, that the outputs infringe, or various other claims. Most claims haven't been going very well for plaintiffs pretrial, but of the claims that make it to trial, it's too early to say how those will go.

        Very well summed up.

        It will be interesting to see how all of this shakes out in court. It will take many years, many lawsuits, and many appeals before we have anything resembling a real answer.

• The problem is a matter of perspective between those championing AI and the various providers of the art that is feeding it.

      The AI champions have made an argument that all they are doing is what people do normally. I do not need to pay for a book if I go to the library and check it out, but that book may inspire me to write my own book. For my new book do I need to pay a license fee back to the artist who inspired me? No. So by that very nature, information is free when it inspires others to create. All th

      • because they are not mixing their own human ingenuity with inspiration from other humans to create something original

What do you think a prompt is? Do you think AI just generates random photos on its own? No, it's human ingenuity mixed with inspiration from other humans to create something original. Prompt engineering is real. I have friends who are very good at it and can get almost exactly what they want out of the image generator. Prompts can be simple, or extremely complex.

        Also, LLMs are modeled after the human brain and how neurons work, so how can you refute that it's any different than how a human brain s

  • by schwit1 ( 797399 ) on Tuesday August 20, 2024 @10:48AM (#64720828)

    Judge orders AI to be untrained or wiped of all learned offending material.

• It seems like we hear about a new version of this type of lawsuit every day. It'd be nice to know if they are generally winning or losing, or if nobody has a verdict yet. Yeah, I also wonder what happens if we declare all this training material verboten and then only countries with weak or no copyright laws are able to host LLMs. Notice I don't consider the distant, distant possibility that all the LLMs shut down and just say "Darn, we thought it was cool, but since you guys won a single lawsuit in the U
    • That would be the least of their worries. Absent new laws there are only two options, fair use or bankruptcy.

If the Supreme Court rules copying content into the training set is not fair use, it will become the biggest legal mess in history. Companies working on LLMs have an additional problem, because they didn't just copy content off the legal internet, they copied it from pirate sites too (books1/2, shadow libraries etc.). So they don't just need fair use, they need an exemption on copyright law just for t

  • by FeelGood314 ( 2516288 ) on Tuesday August 20, 2024 @11:13AM (#64720882)
    I learn by going out and interacting with my environment but I've also learned by reading and watching what others have created. The authors of my textbooks and even of fiction expect me to use what I read to create new things and what I create is considered fair use. The authors of the textbooks and other creative works did not expect an AI to be able to remember (make exact copies) of their work and use that to create new work. In some cases they didn't consent to a computer even reading their works. The copyright law that the USA has pushed on most of the western world is so draconian that the AI companies will likely lose with the mandatory fines in the trillions. This probably isn't ideal. Also buying a single copy of every creative work isn't enough. For example a textbook can only be physically read by a few people and they can only create so much. An AI can create multiple instances of itself.
    Ideally we need a mechanism for the AI models to pay the authors for the content they learned on but this is likely impossible. Failing that perhaps an agreement that society (i.e. communism) owns say 75% of all AI models.

    I don't have a solution but I know that the courts don't have a mandate to find one. This is a problem for our legislators to work out.

    Also an open source AI that everyone can own and use does not cause this problem to go away. Some entities are in much better positions to benefit from an AI than others.
    • The AI companies didn't even buy the books ... they pirated everything.

Other parts of the world will start moving forward while the capitalism-obsessed worry over value lost. Not that I think it needs to be a free-for-all for the LLM companies, but if we expect legislation to make this work, well, I'd ask you to witness the absolute train wreck we have in the US government. Nothing positive is coming out of that hellhole. Certainly not anything positive when it comes to tech. They'll either gridlock their way into some stupidity, or just flat out believe whoever has the most

    • "I don't have a solution but I know that the courts don't have a mandate to find one."

Umm, yes they do. Just because you are dazzled by the words "AI" does not mean the law does not apply to businesses that break it. Anthropic did not pay for the material it used to train its AI gadget. That's breaking the law. Had they paid the authors, or had the authors given Anthropic permission to use their works, Anthropic would not have broken the law. But Anthropic wanted free stuff. Ironically, Anthro
• Book authors earn shit, like $5,000-$10,000 per book, which takes a long time to write. Not enough to survive on. So this is the reference against which to judge the effect of generative AI. The task of writing books is already in the red for most authors.
  • by BrendaEM ( 871664 ) on Tuesday August 20, 2024 @12:28PM (#64721096) Homepage
"AI" in its present form is a computer program to plagiarize and steal from the masses, to give to one company.
• All the unknown musicians, artists, or authors whose works were also scanned will get nothing: https://www.genolve.com/design... [genolve.com]
• They PRESUME their material was used for training. PROVE it.
    And they presume that, because it was used, it was pirated (do they keep records of who bought their books?!)

    Remember innocent until proven guilty? The burden of proof is on the accuser.

    I got my popcorn.

• So nerds, let's get back to basics.

    Suppose you invented an AI like Commander Data, HAL-9000, Jarvis, Johnny-5, KITT, or whatever. Now it sits on your desk and knows nothing. What would you do to train it? Honestly, I think I would feed it every book, encyclopedia, academic paper, TV show transcript, yo-mama joke, and line of code I could get my hands on. Who wouldn't?

    And that's fine if you control it and it sits on your desk and you own those things above. Now you setup a web server and anyone in the world

• Maybe these authors, whom most people have never heard of, should embrace it, because it might elevate their own work out of obscurity.
