The Courts

Authors Sue Anthropic For Copyright Infringement Over AI Training (reuters.com) 57

AI company Anthropic has been hit with a class-action lawsuit in California federal court by three authors who say it misused their books and hundreds of thousands of others to train its AI-powered chatbot Claude. From a report: The complaint, filed on Monday by writers and journalists Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, said that Anthropic used pirated versions of their works and others to teach Claude to respond to human prompts.

The lawsuit joins several other high-stakes complaints filed by copyright holders including visual artists, news outlets and record labels over the material used by tech companies to train their generative artificial intelligence systems. Separate groups of authors have sued OpenAI and Meta over the companies' alleged misuse of their work to train the large-language models underlying their chatbots.

  • Comment removed (Score:4, Interesting)

    by account_deleted ( 4530225 ) on Tuesday August 20, 2024 @10:28AM (#64720746)
    Comment removed based on user account deletion
    • Comment removed based on user account deletion
      • by 0xG ( 712423 )

        We should know, by now, that copyright holders are a bunch of fuckin whiners with lawyers.
        Want to bet the complainants' 'books' are self-published and didn't sell? But it's all the fault of [deep pockets]! Waaaaaaa....

        • This is quite simple, jackass: Anthropic doesn't want to pay people for their products in order to train its AI gadget. But as soon as Anthropic's AI gadget is ready, they sure as hell want you to pay for it.

          See how that works? They want something for free but want you to pay for their shit.
    • by Rei ( 128717 )

      It's not clear what you're asking. If you're asking whether there have been lots of lawsuits, the answer is yes [chatgptise...eworld.com]. None have concluded.

      If you're asking is it common for AIs to be trained on copyrighted data, yes. In the same way it's common for Google to download copyrighted data to build its search engine, and a million other things (perhaps the most extreme being the Google Books case). The defendants argue that the same fair use exemption for the automated processing of copyrighted data to create transformative goods and services applies to them. Plaintiffs variously allege that it doesn't, that the outputs infringe, or various other claims. Most claims haven't been going very well for plaintiffs pretrial, but of the claims that make it to trial, it's too early to say how those will go.

      • by Visarga ( 1071662 ) on Tuesday August 20, 2024 @11:30AM (#64720932)
        If judges side with the authors, then they hand copyright owners rights over the ideas and styles they want to protect, not just specific expression: a power grab that will kneecap creativity. Doing anything less still leaves authors in a losing position. You either block a wide space around all protected works, or you allow AI to generate transformed works without fear of infringement. And what do you do if they use synthetic text generated with AI from copyrighted works as training data? In the new regime, any human creator would be suspected of having used AI.
        • It's actually simpler than that. You can PAY the authors. See how that works? All these AI businesses want free stuff to train their AI gadgets. They don't want to pay, but they sure as hell want you to pay for their AI gadget.

          Here's a fair compromise: Anthropic can use these works for free, as long as Anthropic gives their AI gadget away for free. Problem solved.
        • It is easy to be creative without infringing on others' copyrights.
          Unless you consider creativity to mean: how to make a copy of something that was successful, make lots of money, and get away with it.

          My father was a tax consultant ... he used to tell people: if you put your mind into how to make more money, instead of how to save on taxes, you would have much more money.

      • by Anonymous Coward
        There is a difference here, one which takes it outside a frequent Slashdot defense built on Google Books and fair use:

        "27. For both the post- and pre-training processes in developing Claude, Anthropic created multiple, unlicensed copies of the training data."

        That allegation was not present in many of the prior lawsuits and, I believe, is a very important allegation to make. Even if "reading" the books, "talking about" the books, making the books "searchable" etc is all fair use via application of pr
        • This is my primary objection as well. I have zero issues with $company going to a bookstore and buying, or going to a library and checking out, 50,000 books, scanning them, OCRing them, and using them to create a search index for books or to train an AI. I don't have an issue if they buy 50,000 ebooks either. Imho, those would be reasonable fair uses of legally acquired works (e.g. Google book scanning). Where it crosses the line is when $company builds a billion-dollar product on "50,000 eBooks Cl

          • The stuff on the internet may be freely distributed, but it's not freely distributable. That's the apparent meaning of the allegation noted above: sure, I can back up the e-book that the author gave me in the hopes I'd publicize it, I can format-shift it, if it made any sense I could time-shift it, I could lend it out or sell it. But what I can't do is make 10 copies for my family so we can all read it at the same time. If there exists an implied right to copy and distribute material protected only by a TOS
            • Understood and agreed. Verbatim copies of a copyrighted work are not OK. AIs (and people) should not do that when reading/learning/digesting publicly posted content. That said, learning is a fundamentally different process from copying, and I don't see an issue with non-copying learning from publicly displayed works. For example, it's not OK for me to go to the MoMA, duplicate (non-public-domain) $ART, and sell prints of it. My understanding is that there is no restriction on making "inspired by" copies though,

          • Publicly distributed does not imply "no copyright"

            Everything I put on the internet is copyrighted by me! Without any special notice.
            That is common sense. And: law!

            • I understand this, and counter that posting this content here, publicly, gives the expectation that others can read, consider, learn from, and form opinions on this content the same as if you'd shouted it in the public square.

              This brings some questions to the fore:
              Assume I have an eidetic memory: do I not have license to remember your content?

              If I'm not permitted to remember your content, do I have license to consider it in forming my own opinions?

              Does a machine, without an eidetic memory, have fair use arg

              • No, you do not need a license to memorize it. Or to use your memory.

                ONE
                The questions arise when you make a "copy," and when something you consider your own genuine work is proclaimed by others to be a copyright violation.

                If you scan a paper book of mine that I intentionally did not publish as a $1 ebook, and you put it up for free or to make money on ebook stores, then you violate my copyright. I point out: it does not matter whether it is for free or for $1.

                Let's look at ONE again
                There is a kind of famous law case betw

      • [I]s it common for AIs to be trained on copyrighted data, yes. In the same way it's common for Google to download copyrighted data to build its search engine, and a million other things (perhaps the most extreme being the Google Books case). The defendants argue that the same fair use exemption for the automated processing of copyrighted data to create transformative goods and services applies to them. Plaintiffs variously allege that it doesn't, that the outputs infringe, or various other claims. Most claims haven't been going very well for plaintiffs pretrial, but of the claims that make it to trial, it's too early to say how those will go.

        Very well summed up.

        It will be interesting to see how all of this shakes out in court. It will take many years, many lawsuits, and many appeals before we have anything resembling a real answer.

    • The problem is a matter of perspective between those championing AI and the various providers of the art that is feeding AI.

      The AI champions have made the argument that all they are doing is what people do normally. I do not need to pay for a book if I go to the library and check it out, but that book may inspire me to write my own book. For my new book, do I need to pay a license fee back to the artist who inspired me? No. So by that very nature, information is free when it inspires others to create. All th

      • because they are not mixing their own human ingenuity with inspiration from other humans to create something original

        What do you think a prompt is? Do you think AI just generates random photos on its own? No, it's human ingenuity mixed with inspiration from other humans to create something original. Prompt engineering is real. I have friends who are very good at it and can get almost exactly what they want out of the image generator. Prompts can be simple or extremely complex.

        Also, LLMs are modeled after the human brain and how neurons work, so how can you refute that it's any different than how a human brain s

  • by schwit1 ( 797399 ) on Tuesday August 20, 2024 @10:48AM (#64720828)

    Judge orders AI to be untrained or wiped of all learned offending material.

    • It seems like we hear about a new version of this type of lawsuit every day. It'd be nice to know if they are generally winning or losing, or if nobody has a verdict yet. Yeah, I also wonder what happens if we declare all this training material verboten and then only countries with weak or no copyright laws will be able to host LLMs. Notice I don't consider the distant, distant possibility that all the LLMs shut down and just say "Darn, we thought it was cool, but since you guys won a single lawsuit in the U
    • That would be the least of their worries. Absent new laws, there are only two options: fair use or bankruptcy.

      If the Supreme Court rules that copying content into the training set is not fair use, it will become the biggest legal mess in history. Companies working on LLMs have an additional problem, because they didn't just copy content off the legal internet; they copied it from pirate sites too (books1/2, shadow libraries, etc.). So they don't just need fair use, they need an exemption from copyright law just for t

  • by FeelGood314 ( 2516288 ) on Tuesday August 20, 2024 @11:13AM (#64720882)
    I learn by going out and interacting with my environment but I've also learned by reading and watching what others have created. The authors of my textbooks and even of fiction expect me to use what I read to create new things and what I create is considered fair use. The authors of the textbooks and other creative works did not expect an AI to be able to remember (make exact copies) of their work and use that to create new work. In some cases they didn't consent to a computer even reading their works. The copyright law that the USA has pushed on most of the western world is so draconian that the AI companies will likely lose with the mandatory fines in the trillions. This probably isn't ideal. Also buying a single copy of every creative work isn't enough. For example a textbook can only be physically read by a few people and they can only create so much. An AI can create multiple instances of itself.
    Ideally we need a mechanism for the AI models to pay the authors for the content they learned on but this is likely impossible. Failing that perhaps an agreement that society (i.e. communism) owns say 75% of all AI models.

    I don't have a solution but I know that the courts don't have a mandate to find one. This is a problem for our legislators to work out.

    Also an open source AI that everyone can own and use does not cause this problem to go away. Some entities are in much better positions to benefit from an AI than others.
    • The AI companies didn't even buy the books ... they pirated everything.

    • I learn by going out and interacting with my environment but I've also learned by reading and watching what others have created. The authors of my textbooks and even of fiction expect me to use what I read to create new things and what I create is considered fair use. The authors of the textbooks and other creative works did not expect an AI to be able to remember (make exact copies) of their work and use that to create new work. In some cases they didn't consent to a computer even reading their works. The copyright law that the USA has pushed on most of the western world is so draconian that the AI companies will likely lose with the mandatory fines in the trillions. This probably isn't ideal. Also buying a single copy of every creative work isn't enough. For example a textbook can only be physically read by a few people and they can only create so much. An AI can create multiple instances of itself. Ideally we need a mechanism for the AI models to pay the authors for the content they learned on but this is likely impossible. Failing that perhaps an agreement that society (i.e. communism) owns say 75% of all AI models. I don't have a solution but I know that the courts don't have a mandate to find one. This is a problem for our legislators to work out. Also an open source AI that everyone can own and use does not cause this problem to go away. Some entities are in much better positions to benefit from an AI than others.

      Other parts of the world will start moving forward while the capitalism-obsessed worry over value lost. Not that I think it needs to be a free-for-all for the LLM companies, but if we expect legislation to make this work, well, I'd ask you to witness the absolute train wreck we have in the US government. Nothing positive is coming out of that hellhole. Certainly not anything positive when it comes to tech. They'll either gridlock their way into some stupidity, or just flat-out believe whoever has the most

    • "I don't have a solution but I know that the courts don't have a mandate to find one."

      Umm, yes they do. Just because you are dazzled by the words "AI" does not mean the law does not apply to businesses that break it. Anthropic did not pay for the material it used to train its AI gadget. That's breaking the law. Had they paid the authors, or had the authors given Anthropic permission to use their works, Anthropic would not have broken the law. But Anthropic wanted free stuff. Ironically, Anthro
  • Book authors earn shit, like $5,000-$10,000 per book, which takes a long time to write. Not enough to survive on. So this is the baseline against which to judge the effect of generative AI. The business of writing books is already in the red for most authors.
  • by BrendaEM ( 871664 ) on Tuesday August 20, 2024 @12:28PM (#64721096) Homepage
    "AI" in it's present form, is a computer program to plagiarizer and steal from the masses--to give to one company.
  • All unknown musicians, artists or authors whose works were also scanned, will get nothing: https://www.genolve.com/design... [genolve.com]
  • They PRESUME their material was used for training. PROVE it.
    And they presume that, because it was used, it was pirated (do they keep records of who bought their books?!)

    Remember innocent until proven guilty? The burden of proof is on the accuser.

    I got my popcorn.

  • So, nerds, let's get back to basics.

    Suppose you invented an AI like Commander Data, HAL-9000, Jarvis, Johnny-5, KITT, or whatever. Now it sits on your desk and knows nothing. What would you do to train it? Honestly, I think I would feed it every book, encyclopedia, academic paper, TV show transcript, yo mama joke, and line of code I could get my hands on. Who wouldn't?

    And that's fine if you control it and it sits on your desk and you own those things above. Now you set up a web server and anyone in the world

  • Maybe these authors, whom most people have never heard of, should embrace it, because it might elevate their work out of obscurity.
