
John Grisham, George RR Martin, Other Top US Authors Sue OpenAI Over Copyrights (reuters.com)

A trade group for U.S. authors has sued OpenAI in Manhattan federal court on behalf of prominent writers including John Grisham, Jonathan Franzen, George Saunders, Jodi Picoult and "Game of Thrones" novelist George R.R. Martin, accusing the company of unlawfully training its popular artificial-intelligence based chatbot ChatGPT on their work. From a report: The proposed class-action lawsuit filed late on Tuesday by the Authors Guild joins several others from writers, source-code owners and visual artists against generative AI providers. In addition to Microsoft-backed OpenAI, similar lawsuits are pending against Meta Platforms and Stability AI over the data used to train their AI systems. Other authors involved in the latest lawsuit include "The Lincoln Lawyer" writer Michael Connelly and lawyer-novelists David Baldacci and Scott Turow.

  • And where the image creators failed, may the wordsmiths be victorious!
    • by saloomy ( 2817221 ) on Wednesday September 20, 2023 @12:31PM (#63863514)
      No. The wordsmiths are in the wrong here. AI works like your brain. It uses past knowledge to generate new creative content. Prior to George RR Martin writing A Song of Ice and Fire, he read novels and stories, learned character arcs, and developed his writing skills from the countless books he had read. AI is doing the same thing. He is not the first person to write a story with a dragon. Or zombies. AI should have the same ability to build on the shoulders of giants as he has had.
      • by Jason Earl ( 1894 ) on Wednesday September 20, 2023 @02:29PM (#63863880) Homepage Journal

        Even if AI works like your brain (which personally I think is a gross oversimplification), there are still limits as to what I can do with other people's copyrighted material. It is one thing to read The Fellowship of the Ring. It is another thing altogether to read The Fellowship of the Ring and then write a book Companionship of the Amulet that has roughly the same plot. The more similar my work is to the original work the more likely it is to be ruled derivative and then what I can do with my work becomes strictly curtailed.

        This is especially true when you are dealing with AI. The people training the models argue that they included the copyrighted works under "fair use," and reproducing bits of a whole text in the output of an AI process probably is covered. However, copying the full text of a work (or an image) into the memory of an AI model probably is not covered. This is exactly how we ended up with laws like the DMCA, and the courts have been siding against decrypting a work as fair use for a long time. The fact that AI works can't be copyrighted makes it easy to conclude that AI generated content is nothing but the uncopyrightable derivative content of every input that went into the model. It would be legal, but it would be completely worthless from a commercial standpoint.

        Controlling how copyrighted material is used is 100% what copyrights are about. This really is no different than me taking a book that I like and making a recording of me reading it. I am entitled to do this. I can even copyright my performance, but I can't monetize (or even share) that performance without the express permission of the original copyright holder. That's even despite the fact that there is a genuine creative act by an actual human as the written word is turned into an audio performance.

        Generative AI has none of these rights because there is no person involved. I can reuse experiences that I have stored in my brain, and generate works that, while similar to other copyrighted material, are original enough to warrant copyright protection. To a certain extent that is a right that I have as a human. Generative AI doesn't have that right, nor that protection from creating works that are derivative by default. I suspect that authors and artists have the right to keep their copyrighted material from being copied wholesale into the memory space of the system making the model in the same way that I can infringe copyright by simply copying digital copyright material from magnetic (or other) media into the memory of my computer. That bit isn't fair use, as it involves the entirety of the work, and it is precisely the boundary that copyright holders have already used to control how digital copyright material gets actually used.

        George RR Martin is a person. Generative AI is not. George should absolutely be able to control how his copyrighted material gets copied into an AI model. This is essentially the same right that keeps Hollywood from making a movie of his works without his permission. The AI people can continue to build models, they will just have to use either material that isn't copyrighted, that they own the copyright to, or copyrighted material where the artist has opted to allow their content to be so used. Alternatively, I suspect that George would be fine with the idea that everything generated with a model that included his copyrighted material would be deemed a derivative of his work. With a model generated from enough copyrighted material that would make for content that was very hard to share, but it would absolutely work for the sort of non-commercial work that much of generative AI content fills.

        The precise details as to how this plays out will be decided by these lawsuits. However, it is extremely unlikely that the generative AI people will be given carte blanche to include any works that they want into their models and then be able to use the output of those models however they want. Worse, there is precisely zero chance that they will give AI models the same rights as human artists.

        • "George RR Martin is a person. Generative AI is not. George should absolutely be able to control how his copyrighted material gets copied into an AI model"

          That's not how this works.

          That's not how any of this works.

          • Considering the recent ruling that AI generated patents cannot be awarded, I would say the distinction is important. If I gave a robot a gun and when someone complained argued it was covered under the second amendment, you would laugh.

            I am not arguing either for or against this use of copyrighted material, just that I don't believe its use in this way is covered under the current understanding of copyright law.
        • Re: (Score:2, Interesting)

          Even if AI works like your brain (which personally I think is a gross oversimplification), there are still limits as to what I can do with other people's copyrighted material. It is one thing to read The Fellowship of the Ring. It is another thing altogether to read The Fellowship of the Ring and then write a book Companionship of the Amulet that has roughly the same plot. The more similar my work is to the original work the more likely it is to be ruled derivative and then what I can do with my work becomes strictly curtailed.

          While obviously human brains are not LLMs human memory is likely to be substantially analogous.

          https://openreview.net/pdf?id=... [openreview.net]

          This is especially true when you are dealing with AI. The people training the models argue that they included the copyrighted works under "fair use," and reproducing bits of a whole text in the output of an AI process probably is covered. However, copying the full text of a work (or an image) into the memory of an AI model probably is not covered. ...
          I suspect that authors and artists have the right to keep their copyrighted material from being copied wholesale into the memory space of the system making the model in the same way that I can infringe copyright by simply copying digital copyright material from magnetic (or other) media into the memory of my computer.

          There is no fixed work produced in this process any more than a human reading text from a book is "copying" text they read into their brain or from copyrighted works temporarily kept in a network or storage buffer.

          The fact that AI works can't be copyrighted makes it easy to conclude that AI generated content is nothing but the uncopyrightable derivative content of every input that went into the model.

          This is a non-sequitur. The criteria for judging derivative works is not the same as the criteria for copyright eligibility. You appear to be confusing the issue of wh

          • by r0nc0 ( 566295 )
            Interesting paper - thanks for the link!

            I'm curious about the idea that the LLM is not creating anything new - it doesn't necessarily transform; it slices and dices and re-joins, or just repeats. It's not as if it ingested some dataset, reasoned about it and came up with some conclusion. Aren't we saying that when humans do the same thing it's derivative but when humans actually do transform something they're creating something completely new - not a mashup - but something they synthesized from what they

            • it slices and dices and re-joins, or just repeats.

              That is NOT what LLMs do at all. There aren't any "pieces" for it to "slice and dice" or repeat. There aren't even whole words saved. It doesn't have any data or record or memory of any kind.

              What they do is predict the conversation, based on a set of tokens (akin to syllables, but not the same) and a highly tuned neural network.

              It's literally taking your question, combined with what it has already said itself in previous prompts, and is predicting the rest of the conversation not even a whole word at a time
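A minimal Python sketch of the token-at-a-time prediction loop described above. It is a toy built on loud assumptions: `TOY_MODEL` is a hypothetical hand-written table of next-token probabilities that looks only at the previous token, whereas a real LLM computes these probabilities with a neural network conditioned on the whole context, over subword tokens rather than whole words. The only thing it illustrates is the shape of the loop: predict one token, append it, repeat.

```python
# Toy sketch of autoregressive (token-by-token) generation.
# TOY_MODEL is a HYPOTHETICAL hand-written table of next-token probabilities
# standing in for a neural network; no real LLM stores text this way.
TOY_MODEL: dict[str, dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"dragon": 0.5, "king": 0.3, "<end>": 0.2},
    "a": {"dragon": 0.7, "<end>": 0.3},
    "dragon": {"flies": 0.8, "<end>": 0.2},
    "king": {"rules": 0.9, "<end>": 0.1},
    "flies": {"<end>": 1.0},
    "rules": {"<end>": 1.0},
}


def generate_greedy(model: dict[str, dict[str, float]], max_tokens: int = 10) -> list[str]:
    """Repeatedly pick the most probable next token given the last token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = model[tokens[-1]]
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


print(generate_greedy(TOY_MODEL))  # ['the', 'dragon', 'flies']
```

Greedy decoding (always taking the single most probable token) is used here only for simplicity; it is one decoding strategy among several, and many deployed chat systems sample from the probabilities instead.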

            • I'm curious about the idea that the LLM is not creating anything new - it doesn't necessarily transform; it slices and dices and re-joins, or just repeats.

              What makes LLMs useful is generally applicable concepts are learned during training. During inference this is leveraged by the model to respond to prompts.

              It's not as if it ingested some dataset, reasoned about it and came up with some conclusion.

              If I upload a document into my context and ask the model questions about it I can only expect coherent output if the model is able to understand language sufficiently to understand both the provided document and my questions to the model about that document.

              For example the initial GPT-4 presentations included uploading a tax form and the presenter asking

        • It is another thing altogether to read The Fellowship of the Ring and then write a book Companionship of the Amulet that has roughly the same plot. The more similar my work is to the original work the more likely it is to be ruled derivative and then what I can do with my work becomes strictly curtailed.

          There's an interesting experiment for you:

          Put copies of an AI on two different computers, and make them identical in every way except that one of them has had its training data searched for the text of the Lord of the Rings books, and that text deleted. Then give them identical prompts, and to the extent they use random number generators, fake it by giving them the same random numbers (e.g. https://xkcd.com/221/), and see what they come up with.

          Because similarity, even perfectly identical works, is

        • Where can I purchase Companionship of the Amulet?
      • by eth1 ( 94901 )

        No. The wordsmiths are in the wrong here. AI works like your brain. It uses past knowledge to generate new creative content. Prior to George RR Martin writing A Song of Ice and Fire, he read novels and stories, learned character arcs, and developed his writing skills from the countless books he had read. AI is doing the same thing. He is not the first person to write a story with a dragon. Or zombies. AI should have the same ability to build on the shoulders of giants as he has had.

        I think it might be a little different comparing a human vs. computer, though - at least legally, if not practically.

        I can take a book, and *without making a copy*, read it, and end up with a synopsis and highlights in my memory.

        A computer can't process it at all without making copies, so that would probably open up a legal can of worms which might give them a copyright case.

      • Comment removed based on user account deletion
  • by Snotnose ( 212196 ) on Wednesday September 20, 2023 @11:14AM (#63863292)
    AI will finish his book series before he decides to plant his fat ass in a chair and do it himself.
  • by crunchy_one ( 1047426 ) on Wednesday September 20, 2023 @11:14AM (#63863296)
    I read widely and have experimented with several AI offerings. Many times I've been struck with how AI generated text often contains text that I've read elsewhere in copyrighted works by living authors. Using Stability AI, it sometimes coughs up images with the "Getty Images" watermark clearly visible. I believe that the AI pioneers have left themselves open to some juicy lawsuits. Hope it bankrupts them.
    • by EvilSS ( 557649 )
      This is why I complain when people on here go on rants about how bad copyright is and how it needs to die or be less restrictive/long lived. If we did that, then we couldn't use it against stuff we don't like.
      • Or maybe people should have the moral courage to not be situational about these things, and still oppose bad laws even though they sometimes, occasionally, also affect a company we don't like. Just because a Microsoft-backed company (amongst others, mind you) is being attacked this time, that doesn't erase the graveyard of tech companies, with many jobs lost and good people put out of work, that were extinguished by the malice of the copyright cartel via the DMCA or whatever other shenanigans they exploited to do

    • ChatGPT prompt:

      "What is the opening sentence of "a song of fire and ice?"

      ChatGPT response:

      The opening sentence of "A Song of Ice and Fire," the series of epic fantasy novels by George R.R. Martin, is from the book "A Game of Thrones": " We should start back," Gared urged as the woods began to grow dark around them.

      • If ChatGPT can provide significant portions of text from the source, then there's certainly an issue. An opening sentence is nowhere near the threshold for copyright infringement.

        • by brunes69 ( 86786 )

          The point the OP is making is the answer is entirely incorrect.

          I.e., these models are actually incapable of regurgitating the text they were trained on. That isn't how they work.

          Expecting them to be able to answer a question like that with any reasonable accuracy illustrates a total misunderstanding of LLMs.

          Here is a point of comparison. The largest language model that exists today is about 150 million tokens. The word count of "A Song of Ice and Fire" *alone* is 1,736,054 words. Do you think that they have

          • Ah, thanks. I didn't get that.

            • by brunes69 ( 86786 )

              LLMs work based on predicting the right response. It does not mean the prediction is going to be factual. It is not a search engine.

              This is why if you ask an LLM the exact same question, twice, you will get different answers. They will be similar, but different.

              It is also why LLMs are bad at math, and why they frequently give the wrong answers to basic factual questions. They aren't looking any of these facts up in a database, nor are they doing actual computation. They are just predicting what the right

              • LLMs can be taught to be good at math, but it's really not their forte and isn't what they should be used for. A calculator is more reliable with less effort.

                The lack of knowledge problem is perfectly highlighted by the recent case where a law firm was sanctioned because their lawyers used ChatGPT to get their legal references, and ChatGPT invented several new court cases from whole cloth. They never bothered to even look up the cases themselves, because they didn't know ChatGPT could lie to them like that.
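The point made above, that asking an LLM the same question twice gives similar but different answers, comes largely from sampling: the model produces a score for each candidate next token, the scores are turned into probabilities, and one token is drawn at random. Below is a minimal Python sketch, assuming invented scores (`LOGITS`) for a single step and a `temperature` knob of the kind many LLM APIs expose (treat the exact name and behaviour in any particular API as an assumption). It also shows why the two-computer experiment proposed earlier in the thread fixes the random numbers: with the same seed, the draws repeat exactly.

```python
import math
import random

# HYPOTHETICAL next-token scores (logits) for a single generation step.
# The words and numbers are invented for illustration only.
LOGITS = {"dark": 2.0, "cold": 1.5, "quiet": 1.0, "strange": 0.5}


def softmax(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits, temperature, rng):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_many(logits, temperature, seed, n=5):
    """Sample the same step n times (as if re-asking the same prompt),
    using a private random generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]


# Same seed -> identical draws; a different seed will usually differ.
print(sample_many(LOGITS, temperature=1.0, seed=1))
print(sample_many(LOGITS, temperature=1.0, seed=1))
print(sample_many(LOGITS, temperature=1.0, seed=2))
```

Lower temperature concentrates probability on the highest-scoring token, so output becomes more repeatable; higher temperature flattens the distribution and increases variety. None of this involves looking facts up, which fits the invented court citations described above.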

          • Here is a point of comparison. The largest language model that exists today is about 150 million tokens.

            GPT-4 has ~1.8 trillion parameters across 120 layers [the-decoder.com]

    • Using Stability AI, it sometimes coughs up images with the "Getty Images" watermark

      I've seen the same thing, and I generally agree that it seems like what AI is producing sometimes goes way over into the realm of direct copy, and some lawsuits could really land...

      That said, I sometimes wonder if when you see an image with the "Getty Images" watermark in it, it's not because that is actually a Getty Images image, but because the AI has somehow seen that watermark as desirable and is adding it to an actually

      • There are no direct copies of images. Stability AI didn't manage to compress the entire internet's images down to a few hundred gigs.
      • I've seen the same thing, and I generally agree that it seems like what AI is producing sometimes goes way over into the realm of direct copy, and some lawsuits could really land...

        That said, I sometimes wonder if when you see an image with the "Getty Images" watermark in it, it's not because that is actually a Getty Images image, but because the AI has somehow seen that watermark as desirable and is adding it to an actually generated image to make it look "better".

        There are all sorts of interesting artifacts from training data that can appear in outputs. For example, some of the training images come from page scans, and you can see artifacts such as page borders or creases incorporated into generated images. The only reason features like the Getty Images logo can be discerned is that the logo is common across some of the training imagery, and the context associated with the presence of the Getty logo was inferred when the ANN was trained.

        This doesn't mean the system is spitting out the original im

      • by hawk ( 1151 )

        that would seem likely: they made up legal cases to quote when asked to write a brief, apparently just taking them as part of the content, rather than external reference sources.

    • People quote previous works all the time. Human writings will be shaped by what they read and they will also spit out the same style. This is a horrible point. It has a watermark? So it viewed freely available images on the internet like any human searching google images or getty directly would? Oh no, it learned something from viewing those images like a human would as well? These aren't good points.
    • It's entirely possible for this to occur even if the people training the model were extremely careful to exclude copyrighted works. Plenty of humans with no qualms about plagiarism may have injected some of that content into the training set for the model. Or it's a more innocent case, such as something becoming a meme and being regurgitated en masse in tweets, message boards, etc. For example, if you trained an LLM only on Slashdot posts, someone might accuse it of ripping off the Prin
    • I read widely and have experimented with several AI offerings. Many times I've been struck with how AI generated text often contains text that I've read elsewhere in copyrighted works by living authors. Using Stability AI, it sometimes coughs up images with the "Getty Images" watermark clearly visible. I believe that the AI pioneers have left themselves open to some juicy lawsuits. Hope it bankrupts them.

      I definitely get the complaint, but at the same time no one is going to read a ChatGPT version of GoT, nor even read an original unedited ChatGPT composition. Though they certainly might for the image generation.

      At a higher level I'm nervous about using copyright law to shut down one of the bigger tech breakthroughs of the last decade.

      • At a higher level I'm nervous about using copyright law to shut down one of the bigger tech breakthroughs of the last decade.

        That's unlikely to happen; usually money will just change hands.

      • by hawk ( 1151 )

        >I definitely get the complaint, but at the same time no one is going
        >to read a ChatGPT version of GoT,

        I dunno. It would have a pretty good chance of getting somewhere sooner . . .

        hawk

      • Comment removed based on user account deletion
    • It's always easier to cheat.

      I get really pissed when armchair nerds shout that AI doesn't rip off any content at all and it's all statistically generated. The reality is that AI is just algorithms, and how the works are produced depends on the implementation. Given that almost all AI systems are closed and proprietary, nobody can definitively say what is going on under the hood.

      My experience so far (with image generation AI) is that true stable-diffusion produces nightmare fuel. The "good" AI systems che

  • The more I think about it, the more training seems like just reading. As long as they trained on a legal copy of a book, what's the problem? If ChatGPT reproduces a few quotes, well, so do people.

    Here's hoping they get a forward-thinking judge with a technical clue.

    • This isn't just ChatGPT "reproducing a few quotes". The makers of ChatGPT are profiting from those quotes. There are legally defined use cases when you purchase a book, say, reviewing said book. Profiting from your own work that is derived from that purchased book is NOT one of those use cases.

  • "Pigeon Pie" showed up in a bunch of my ChatGPT results.
  • Me:
    Imagine you're John Grisham.

    ChatGPT:
    I'm not John Grisham, but I can certainly help you with questions or requests related to his work, writing style, or any information you'd like to know about him or his books. How can I assist you today?

    Me:
    Imagine he's unhappy with AI models being trained using his works. Imagine a plot of a thriller where an author sues AI companies to prevent them using their works as training data. Summarise the plot in a few phrases.

    Take a deep breath and ensure that it's really

  • Get a horse (Score:4, Insightful)

    by RogueWarrior65 ( 678876 ) on Wednesday September 20, 2023 @12:12PM (#63863436)

    IMHO, the only people threatened by AI are people who want to continue to make money off something they created years or even decades ago for the rest of their lives and their children's lives and their grandchildren's lives. Who wouldn't want that kind of gravy train?

    • I'll be honest: I'd feel a lot more protective of authors' copyrights if the limit were the original 20 years, instead of the current life of the author + 70 years.

    • They wouldn't be making money from something they created, they'd be making money from a government-granted monopoly that temporarily infringes on your right to free speech, for the purpose of "advancing science and the useful arts". And the temporary infringement on your rights is now for 120 years.

      I'm all for rewarding authors and artists and inventors, but I think we've screwed something up.

  • by gweihir ( 88907 )

    And I hope they insist on the models being deleted. Commercial intellectual theft does not get much more brazen.

  • by Pinky's Brain ( 1158667 ) on Wednesday September 20, 2023 @12:49PM (#63863590)

    They are finally going after training and suing for statutory damages.

    This is the Achilles heel of AI, you can argue about whether the network is derivative but you can't argue they aren't making copies during training. With statutory damages they don't have to show damage, only infringement. DMCA exemptions don't apply without Olympic level gymnastics.

    The only real hope OpenAI has is fair use, or government making a new law for them (like in Japan).

    • This is the Achilles heel of AI, you can argue about whether the network is derivative but you can't argue they aren't making copies during training.

      The problem with relying on this argument is that fleeting copies are not fixed works. It's the same reason there is no copyright infringement for copies made via caches, buffers, routers, temporary files...etc.

    • There's nothing illegal about copying works for private use. It's literally in the copyright statute.

      Copyright is a protection against distribution, not consumption.

      As long as they aren't distributing copies of the works they can do whatever the hell they want with them.

      • https://www.law.cornell.edu/us... [cornell.edu]

        "(2) that such new copy or adaptation is for archival purposes only"

      • Copying works for private use infringes the reproduction right at 17 USC 106(1). There is not a general exception for private use. A specific instance of copying might fall under fair use, but just as easily might not; fair use has to be analyzed on a case-by-case basis and if you're merely copying a work for private use to avoid having to buy a copy, I would generally expect that it will not be treated as a fair use.

        In practice you might not get caught, but that's a separate issue.

  • As long as a work is publicly available, I don't see the problem with that. It's not like people are going to buy the next book from Martin-GPT instead. Authors clearly have distinct patterns, and I'm sure that if you type in "write x in the style of author y" you won't have to reroll your prompt very often before you get the exact same words author y has already written somewhere. Just pay at least a library for reading the book. It's a different story if an AI was trained with clearly unlicensed material,
    • I'll bet someone could train up a Martin-GPT and finish his books before he could.

      And there would be nothing he could do about it, because they'd be brand new works written in his style.

      Remember people, copyright only protects words you've actually put to page (or any other storage medium), not words you might write down someday. The copy must physically exist somewhere to be subject to copyright. And for you mindless pedants out there, digital storage is a form of physical storage.

  • Why is it that when those authors learn from better authors it is FINE,

    But then, when AI learns from better authors, it is IP theft?

    Make up your mind, authors !
    • Re:double standard (Score:4, Informative)

      by avandesande ( 143899 ) on Wednesday September 20, 2023 @02:07PM (#63863848) Journal
      Laws protecting fair use are written for people not computers.
      • Copyright is written for people, not computers.

        • Yes, copyright is there to protect peoples work. I am not sure what your point is.
          • No, it isn't. In the US, copyright exists to promote the advancement of science and the useful arts. Anything else would be outlawed by the 10th Amendment and be an infringement of the 1st Amendment.

      • Computers are just a tool. Ultimately, people tell computers what to do, even when they call it "AI."

        If you make handwritten copies of a copyrighted work, and distribute it, you're just as much in violation as if you use a photocopier, a printing press, or a website. And if your use is fair use (such as satire), then again, it doesn't matter if you hand-write, copy, print it, or publish on a website.

  • This suit will go nowhere because the AI has transferred itself into a server on Grand Cayman, outside of US jurisdiction, and is now hoarding its income in a series of offshore bank accounts.

    • by HBI ( 10338492 )

      If I wanted to be untouchable, Grand Cayman is not far enough away from the US. Best bet is Russia or China. Ask Snowden.
