Topics: AI, The Courts, Technology

OpenAI Disputes Authors' Claims That Every ChatGPT Response Is Derivative Work

OpenAI has responded to a pair of nearly identical class-action lawsuits from book authors -- including Sarah Silverman, Paul Tremblay, Mona Awad, Chris Golden, and Richard Kadrey -- who earlier this summer alleged that ChatGPT was illegally trained on pirated copies of their books. From a report: In OpenAI's motion to dismiss (filed in both lawsuits), the company asked a US district court in California to toss all but one claim alleging direct copyright infringement, which OpenAI hopes to defeat at "a later stage of the case." The authors' other claims -- alleging vicarious copyright infringement, violation of the Digital Millennium Copyright Act (DMCA), unfair competition, negligence, and unjust enrichment -- need to be "trimmed" from the lawsuits "so that these cases do not proceed to discovery and beyond with legally infirm theories of liability," OpenAI argued.

OpenAI claimed that the authors "misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence." According to OpenAI, even if the authors' books were a "tiny part" of ChatGPT's massive dataset, "the use of copyrighted materials by innovators in transformative ways does not violate copyright." Unlike plagiarists who seek to directly profit off distributing copyrighted materials, OpenAI argued that its goal was "to teach its models to derive the rules underlying human language" in order to do things like help people "save time at work," "make daily life easier," or simply entertain themselves by typing prompts into ChatGPT.

The purpose of copyright law, OpenAI argued, is "to promote the Progress of Science and useful Arts" by protecting the way authors express ideas, but "not the underlying idea itself, facts embodied within the author's articulated message, or other building blocks of creative," which are arguably the elements of authors' works that would be useful to ChatGPT's training model. Citing a notable copyright case involving Google Books, OpenAI reminded the court that "while an author may register a copyright in her book, the 'statistical information' pertaining to 'word frequencies, syntactic patterns, and thematic markers' in that book are beyond the scope of copyright protection."
  • by Aighearach ( 97333 ) on Wednesday August 30, 2023 @03:12PM (#63809746)

    misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use)

    Fair use is a defense, but you have to have otherwise violated the copyright to claim it. It is not a sound legal argument to say that fair use is outside the scope of copyright.

    Lawyers are expected to always file a motion to dismiss... that they included this argument shows how weak their case is. It seems pretty obvious that they copied all these authors' works without permission. The bot can recite whole sections...

    • by Xenx ( 2211586 )

      Fair use is a defense, but you have to have otherwise violated the copyright to claim it. It is not a sound legal argument to say that fair use is outside the scope of copyright.

      It is fair to say that if something is an exception to a rule/law, it's outside of the scope of it. That isn't a statement as to the validity of their claim, only one of how they worded it.

      Lawyers are expected to always file a motion to dismiss... that they included this argument shows how weak their case is.

      It does nothing of the sort. You just appear to have an incomplete understanding.

      It seems pretty obvious that they copied all these authors' works without permission.

      They go on to show there is legal precedent for being able to use existing works, assuming they're acquired legally, to derive data from them: "while an author may register a copyright in her book, the 'statistical information' pertaining to 'word frequencies, syntactic patterns, and thematic markers' in that book are beyond the scope of copyright protection."

    • by Tora ( 65882 )

      Does a person who's good at memorizing books need permission to recite sections of said book from memory?

      • According to Big Brother live feed, just a few seconds is enough to get them to block the feed.

        "Stop singing", and reciting other copyrighted stuff gets the slam.

      • Of course they do. Otherwise anybody could just pay someone to memorize a book one paragraph at a time and retype it and it wouldn't be copyrighted anymore. The WORK is copyrighted. Even if a human reproduces the work, it is still subject to copyright.
    • by cowdung ( 702933 )

      "It seems pretty obvious that they copied all these authors' works without permission. The bot can recite whole sections..."

      Ok.. I've studied transformers like GPT. Can you explain to me how they "copy" authors' works?

      • A better question is: how can ChatGPT come up with anything that is based on direct real-world experience and not on someone else's work?
      • by BranMan ( 29917 )

        I think (and I have NOT studied transformers or GPT) that what happens is the probabilities collapse to 100%. What I mean is: it's like when I put a super-specific search phrase into Google and it comes back with ONE, and only one, answer. (That happened to me about six months ago and was amazing at the time.)

        GPT can only match on what it finds - if it's a subject no one anywhere writes about, and there is only one work for GPT to draw from that fits, then it only has one sequence of words to "choose fro
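The "probability collapse" intuition above can be sketched with a toy bigram model. This is illustrative only: real GPT-style models use learned neural-network weights over tokens, not raw bigram counts, but the collapse effect is the same when the training data contains exactly one continuation for a context.

```python
# Toy next-token predictor built from bigram counts (not how GPT works
# internally, but it shows the idea of probability collapse).
from collections import Counter, defaultdict

corpus = "the only book on this obscure topic says the moon is made of cheese".split()

# Count which word follows each word.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def next_token_distribution(word):
    counts = followers[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# "obscure" was seen only once, so its continuation collapses to 100%:
print(next_token_distribution("obscure"))  # {'topic': 1.0}
# "the" was seen twice with different continuations, so it stays spread out:
print(next_token_distribution("the"))      # {'only': 0.5, 'moon': 0.5}
```

When the data offers exactly one continuation, the model has only one sequence of words to choose from, which is the one-result search situation described above.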

    • The bot can recite whole sections...

      Your ability to recite a whole section is not a copyright infringement. I can remember exactly how a Muse song sounds and can even sing the lyrics. My ability to do so is not copyright infringement.

      ACTUALLY DOING SO, would be.

      Also I'm suing you because your brain just copied this post into your memory without my permission you hypocrite!

      • If you recreated the song based on your memory then it should be infringement, however.
      • Your ability to recite a whole section is not a copyright infringement.

        This is the sort of stupid non-argument arguments people make on slashdot.

        No, the ability to recite whole sections is not copyright infringement. Actually doing it is. These bots don't have agency, and they don't have any code that blocks them from reciting those sections. So they do recite them.

        Why reply if you're gonna say something completely obtuse?

  • by presidenteloco ( 659168 ) on Wednesday August 30, 2023 @03:16PM (#63809754)
    An LLM's output is effectively a novel recombination of micro-patterns from thousands or millions of authors, literary and otherwise.

    This is strongly analogous to how a human's pondering and then utterances on a topic, during conversation, are a very complex function of many elements including the combined particular knowledge they've absorbed and abstracted and re-mixed. A good chunk of the knowledge (and fragments of ways of expressing) that the human has absorbed into their associative memory come from the many copyrighted works of literature and audio or video that they have imbibed.

    So should humans be constantly accused of copyright violation for expounding on things, based on a huge combination and re-mixing of copyrighted (and uncopyrighted) works/information experiences?

    If not, then why should the similarly functioning ChatGPT be accused of such violation?
    • Yes. Your thoughts were derived from two other authors. You are going to jail for a very long time.
    • by EvilSS ( 557649 )

      So should humans be constantly accused of copyright violation for expounding on things, based on a huge combination and re-mixing of copyrighted (and uncopyrighted) works/information experiences?

      Don't give the Authors Guild any ideas. This is, after all, the same group that sued over the Kindle's text-to-speech feature.

    • by taustin ( 171655 )

      Since the first cave man painted the first picture of a bison on the wall of his cave with black soot from his fire, all art has been informed by other art. That's how it works.

      The question here is: derivative or transformative?

      Lawyers on both sides know this. All else is waving flags at the potential jury pool.

      • Not true. If an artist sits down and paints their own interpretation of a bowl of fruit that is real and that they are looking at, how is that derivative?
    • by Ichijo ( 607641 )

      So should humans be constantly accused of copyright violation for expounding on things...?

      They often are.

      If not, then why should the similarly functioning ChatGPT be accused of such violation?

      Because ChatGPT can always tell you where it got the idea from, and can also be made to forget something if the original author wishes it. It would be unjust to try to force a human to do the same just because ChatGPT can do it.

      • by xwin ( 848234 )

        So should humans be constantly accused of copyright violation for expounding on things...?

        They often are.

        This just shows how backward the system is. Copyright is not a natural thing; it is a legal thing, so someone can extort money from someone else. It is well and good if it is done for a limited time, but the current system is broken. Copyright should end, say, 10 years after registration, and certainly should end upon the death of the author.
        Regardless of copyright, an LLM is not copying anything, as the GP stated. It creates new text which is similar to the text it was trained on. If I register copyrights on p

      • So curious, you suggest ChatGPT can forget some author's work. Then does ChatGPT store all the data it used for training? Then would it be true that every new copy of the data set would violate copyright since each new training set has an unlicensed copy of the original author's work? I've no idea. But it seems like only one copy of the book could be stored for each training set if the training set contains a copy of all the inputs. Unless of course they buy a copy of all the input data with each copy of th
      • "Because ChatGPT can always tell you where they got the idea from" you claimed.

        That is incorrect. LLMs like ChatGPT store only a statistical abstraction of all of the sequences of words (from many billions of sources) that have been read into them in the neural-net training.
        • by Ichijo ( 607641 )

          LLMs like ChatGPT store only a statistical abstraction of all of the sequences of words (from many billions of sources) that have been read into them in the neural-net training.

          That is incorrect. ChatGPT can also summarize the content of almost any book written before 2021 [medium.com].

          • Summarizing is based on abstraction of the content, which is consistent with what I said.
            ChatGPT may keep around, and have the ability to access, some or all of its source material; I don't know.
            But when it comes up with an answer to a typical prompt from a user, it is not referencing back to all of the original material (trillions of words) that it "read". Instead it is only consulting parts of its trained neural net that have had weights influenced by some of that material; whatever instances and portions of th
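The "weights, not the original text" point can be made concrete with a toy model: after training, only aggregate statistics survive, and generation consults those statistics rather than the source material. This is a deliberately simplified sketch; real LLMs store learned neural-net weights, not co-occurrence counts.

```python
# Train a toy bigram "model", discard the training text entirely, and
# generate from the surviving statistics alone.
from collections import Counter, defaultdict

text = "to be or not to be that is the question".split()
weights = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    weights[prev][nxt] += 1

del text  # the training data is gone; only co-occurrence statistics remain

word, generated = "to", ["to"]
for _ in range(5):
    word = weights[word].most_common(1)[0][0]  # follow the likeliest continuation
    generated.append(word)
print(" ".join(generated))  # → "to be or not to be"
```

The generated phrase echoes the training text only because the statistics for those contexts collapsed to a single continuation; the text itself was deleted before generation.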
    • The comparison to an automated index or contextual search is a lot more appropriate than to a human. There are decently defined rules for when those do and do not violate copyright. At a minimum I think if one can get the LLM to reproduce a copyrighted work, then the author can receive damages. This means that the company that makes the LLM must also design it to not violate copyright with its output. Otherwise copyright is dead because every copyrighted work will just be imported into an LLM and then everyone can just buy access to the LLM and use the appropriate prompt, like "please recite the work ... by ...". So every copyrighted work will be sold one time, to the company that makes the dominant LLM. The LLM just becomes a copyright washing machine.
      • At a minimum I think if one can get the LLM to reproduce a copyrighted work, then the author can receive damages.

        If you ask an LLM for the lyrics to a popular theme song, the response provided is a conveyance of fact, not a performance.

        It's no different than a person memorizing the same theme song and reciting it when asked.

        This means that the company that makes the LLM must also design it to not violate copyright with its output. Otherwise copyright is dead because every copyrighted work will just be imported into an LLM and then everyone can just buy access to the LLM and use the appropriate prompt, like "please recite the work ... by ...". So every copyrighted work will be sold one time, to the company that makes the dominant LLM. The LLM just becomes a copyright washing machine.

        LLMs don't work this way. They may have better memories than some of us but none of them are that good.

        To give you an idea try asking an LLM something very specific but not widely known. Ask it for example to tell you the callsigns of a random cruise ship. If you dump the context window between pr

        • But that only proves further that it is derivative, because it can't tell you about anything that it hasn't sampled. What has it sampled that isn't someone else's work?
          • But that only proves further that it is derivative, because it can't tell you about anything that it hasn't sampled. What has it sampled that isn't someone else's work?

            LLMs are not a search index. What makes the technology useful is that generally applicable principles are learned during training and can then be applied within and across domains in response to prompting. The ability to apply knowledge is what sets AI apart from a search through a database.

            If you ask it to write you a joke or story a thousand times and dump the context window between each attempt you will get a thousand jokes and stories from the same exact prompt. Perhaps some by
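The "same prompt, different outputs" behavior follows from sampling: generation draws from a probability distribution over continuations, so unless one continuation has probability 1.0, repeated runs diverge. A minimal sketch, with a made-up distribution for illustration:

```python
# Sampling from a next-token distribution: identical "prompts" produce
# varied outputs because the choice is probabilistic, not a lookup.
import random

# Hypothetical distribution over continuations of "Tell me a joke about..."
distribution = {"cats": 0.4, "programmers": 0.35, "Mondays": 0.25}

def sample(dist, rng):
    # Draw one continuation with probability proportional to its weight.
    return rng.choices(list(dist), weights=list(dist.values()))[0]

rng = random.Random(0)
outputs = {sample(distribution, rng) for _ in range(50)}
print(outputs)  # over 50 draws, all three continuations appear
```

Dumping the context window between attempts corresponds to reseeding nothing and sharing nothing: each run is an independent draw from the same distribution.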

            • A six-sided die gives you a different answer every time. Is a ten- or twenty-sided die more creative?

              In the case of mathematics, it is not finding new mathematical theories. It may be stringing X mathematical methods together but all methods would be someone else's work. But it is not able to conclude something that no one has ever known or written, like exactly how dark matter works. It can only spit out combinations of what has been written before. It can make inferences but that is not in it self cre
    • by evanh ( 627108 )

      If there's money involved, yes, humans are subject to copyright violation for expounding on things.

  • If AI is primed to do everything for us then yes, we will break copyright to train AI. Why? Because other countries will do it and their AI will be better.

    We live in a global economy; we will not stop progress.
  • ...learn by studying the work of others
    Using human-generated work to train AI is fair use
    I would never want to read a book created by AI. Only people can make creative art
    Unfortunately, those who control entertainment hate creative work and prefer sequels, reboots, remakes, spinoffs, etc. Much of what they produce might as well be created by AI
    Hopefully, people will get bored and demand original, creative work by human artists

    • by ShanghaiBill ( 739463 ) on Wednesday August 30, 2023 @03:34PM (#63809834)

      Only people can make creative art

      If that is true, then human judges of creativity should be able to easily distinguish between human art and AI art.

      Guess what? They can't.

      They'll be even less able in the future.

      • Well, in the US, an AI cannot be granted a patent https://www.nature.com/article... [nature.com] and I believe copyright is likewise not permitted for AI. So today, legally, only people can make creative works that are protected.
      • Guess what? They can't.

        I'm not sure which judges you're talking to, but largely AI-generated art is dead easy to spot. In any case the topic has nothing to do with the quality of the output and everything to do with creative expression. There's a reason e.g. midjourney images are easy to spot, they all *look the same*. They lack any kind of creativity, and above all they require creative input in order to generate any useful outcome at all.

        • I'm not sure which judges you're talking to, but largely AI-generated art is dead easy to spot.
          In any case the topic has nothing to do with the quality of the output and everything to do with creative expression. There's a reason e.g. midjourney images are easy to spot, they all *look the same*. They lack any kind of creativity, and above all they require creative input in order to generate any useful outcome at all.

          Literally every point made above is completely backwards.

        • Midjourney images "look the same" because of product management decisions made by Midjourney's product engineering and marketing teams to offer a consistent, solid user experience that can reliably deliver art that meets or exceeds the needs and expectations of the product's target audience, at a price they would be willing to pay.

          Try installing Stable Diffusion locally on your computer, and you can tailor the "AI art style" to whatever you see fit. No need to make inaccurate generalizations about AI.

      • Care to tell us where you get the numbers?
        Studies?

  • Time to change how society works, as AI is coming for all our jobs.
    To the tired old "Capitalism promotes innovation and efficient distribution of goods and services":
    AGI will do it better!
  • by bubblyceiling ( 7940768 ) on Wednesday August 30, 2023 @03:26PM (#63809804)
    AI is sure starting to look like the next bubble. All the claims seem to have fallen flat on their face, and now there are legal troubles. What comes next?
    • All the claims seem to have fallen flat on their face

      If you believe that, you're not paying attention.

      LLMs make mistakes. Sometimes hilarious mistakes. But more often than not, they are correct, and the error rate will fall rapidly with improved training and faster hardware.

    • by gweihir ( 88907 )

      For ChatAI-type AI, it has been the "next bubble" for quite some time. There are grand, massively overstated claims, and when you look, rather simplistic demos on the skill level of a beginner. There are massive, massive problems that have been demonstrated, like model poisoning, unfixable damage to models by recursion (https://arxiv.org/pdf/2305.17493v2.pdf), the impossibility of making LLMs safe (https://arxiv.org/pdf/2209.15259.pdf), ChatAI getting dumber in most/all other areas when you try to fix proble

  • Also obviously, the claim is true and can be mathematically proven. Statistical models _cannot_ be original. That is fundamentally impossible. Only deductive AI models can theoretically be original and they drown in complexity before they get there.

    • Also obviously, the claim is true and can be mathematically proven. Statistical models _cannot_ be original. That is fundamentally impossible. Only deductive AI models can theoretically be original and they drown in complexity before they get there.

      I could claim the opposite. There is no creativity in deductive models since all the true statements are already determined by the axioms. If you claim that the creativity is finding interesting true statements, given the complexity, then the creativity is in

      • by gweihir ( 88907 )

        That is stupid. "Creativity" != "creating original information". https://en.wikipedia.org/wiki/... [wikipedia.org]

        Incidentally, you just stated that either creativity is impossible or limited to sentient beings _and_ that sentience is an extra-physical phenomenon. Are you sure you wanted to do that?

        • Creativity requires experiencing the real world.
          • Creativity requires experiencing the real world.

            Would you agree someone who has never been able to move, see, feel, smell but can listen can't be creative because they've never experienced any of the things described to them in the real world?

            Is Star Trek creative? After all, nobody has ever been in a starship before or gone to any strange new worlds. Without experiencing a real starship or other planets, without any relevant experience, how can sci-fi be creative? If the answer is some form of extrapolation and application of learned experience the foll

            • Well if they could only listen, then they could draw (assuming they can move) a visualization of what they think the sound that they heard looks like and it would be creative. They could draw anything as described to them and it would be creative.

              As for Star Trek, we witness the stars and space. We witness the laws of physics and different species on Earth. We witness gravity, and by 1965 knew about the lack of gravity in space. The first being in orbit was in 1957; the first spacewalk was in 1965. Sure there w
        • That is stupid. "Creativity" != "creating original information"

          So you don't think the LLMs create original information? Deduction sure doesn't do that. In Shannon's information theory, randomness plays an essential role, so if you actually defined things, you might be stuck with probability/statistics.

          Also, most LLMs provably generate original text. The output range of these LLMs is enormous. They can't help but generate original text. Of course, a monkey can easily generate original "text", so th

    • Also obviously, the claim is true and can be mathematically proven. Statistical models _cannot_ be original. That is fundamentally impossible. Only deductive AI models can theoretically be original and they drown in complexity before they get there.

      It can write bedtime stories.

      PROMPT: Please write a bedtime story about a grump who shits on everything he doesn't understand.

      Once upon a time, there was a grumpy old man named Mr. Grumps. He lived in a small town with his cat and dog, but he wasn't very happy. You see, Mr. Grumps didn't like anything new or different. If something didn't fit into his idea of how things should be, he would get angry and start complaining.

      Mr. Grumps was especially grumpy about technology. He thought it was a

      • It was a good little robot. The story ended well for tech. All is good.
      • by gweihir ( 88907 )

        Also obviously, the claim is true and can be mathematically proven. Statistical models _cannot_ be original. That is fundamentally impossible. Only deductive AI models can theoretically be original and they drown in complexity before they get there.

        It can write bedtime stories.

        It can do a lot of simplistic things (with low reliability), because simplistic things can be easily and statistically derived from its training data. Most people forget how _much_ training data went into these systems. These systems however cannot go beyond that training data and always fall short of what the training data would have allowed something with real intelligence to do with it. Statistical derivation of things is always incredibly shallow. There is not even one real deduction step in there.

        • It can do a lot of simplistic things (with low reliability), because simplistic things can be easily and statistically derived from its training data.

          Well the machine did manage to create an original bedtime story despite your claim it can't be original.

          Come to think of it, you previously admitted to never even having tried GPT-4.

          You previously claimed "ChatGPT cannot even do a simple addition of two arbitrary numbers, the model is simply incapable of doing something like that. " which didn't age well when you were instantly proven wrong.

          Before that you said "The only impressive thing about LLMs is the language interface, not the utterly dumb "reasoning

          • by gweihir ( 88907 )

            Well the machine did manage to create an original bedtime story despite your claim it can't be original.

            Your evaluation is flawed. This story is not original, but deeply derivative. All this shows is your lack of insight.

            • Your evaluation is flawed. This story is not original, but deeply derivative. All this shows is your lack of insight.

              What if anything is the objective basis for this claim? If it was deeply derivative and not original what was it deeply derived from? What objective criteria do you believe must be met for something to be considered original that was not met?

              Can you for example point to an original bedtime story and contrast that with the machine generated story showing how and why the definition applies to the "original" yet why the machine generated story falls short?

              Is there an objective falsifiable means of discrimina

  • Things like "Please output page 1 paragraph 2 of the Book ____" just won't work. The entire works are not contained within the existing system.
    • by Njovich ( 553857 )

      Wow, Robin Thicke should have picked you as a lawyer for his copyright case, since clearly you know better than his lawyers, the judges, and the experts. Making a full copy of a work is not needed for copyright infringement. The fact is that if it can be shown that the training included a work, and that the model can be triggered to output something very similar, that's a lawsuit that can go either way.

    • what about asking "Please provide a paragraph of text in the style of __author__ about a child's feelings toward a good father". Repeat for the style of __author2__ and an abusive father. I just tried the following successfully on ChatGPT: Write a four line poem in the style of Bob Dylan's acoustic period about the hard life of a teenage boy trying to find a job after high school? If you repeat the same four more times giving slightly different conditions for the content (finding a wife, raising a child,
      • ...go on strike if you are worried by the idea of repeating the same idea but asking for TV show script lines in the style of named person when that named person is yourself and you work as a TV writer. Currently the overhead is lower to ask the writer for the lines but it will get easier to the point where you can give it text documents with multiple questions and drive the authoring from there without much real invention involved beyond the broadest definition of the scenario involved. Which suggests to m
  • So if I borrow a book from a friend, or read an article in the book store, or download a PDF of a book and read it, am I then liable for what I read if I tell someone else or summarize it? If I watch a TV show, say on YouTube, is that copyrighted material in my brain if I go and use it?

    • You aren't liable for a memory, of course not. But if you write a story based on that memory without applying any real world experience of your own then it is a violation.
  • We need laws that clarify it is ok for AI to be trained on publicly accessible and/or purchased content. If you make it publicly accessible I should have the right to train my AI on it. When you put information out into the world you can’t expect a cut based on what someone does with that info. If I read a book on aerodynamics and build & sell airplanes I don’t owe the author of that book any money.

  • We should be able to recreate anything they found.

    Reviews and comments for books were certainly ingested and they can be lengthy and contain considerable plot detail and summarization. So I would expect the LLMs to be aware of these works and be able to summarize them.

    I'd like to see how they "know" a certain text was used. The devil is in the details (not the summaries).

    I don't have any of these books, nor have I read them. I might try looking for Stephen King details; he has also made this claim.

  • ... artist or private enthusiast using whatever machinery on copyrighted input. But if the machinery is owned by a large corporation, then of course it's all fine. That's why Google won similar lawsuits before...
  • Would be interesting if one of those authors ever tried to copyright an "original" blues song.

    Yes... I know. "All works are derivative" is annoying click-bait. The statement is both completely true and entirely useless.

    • by xwin ( 848234 )
      The only non-derivative work would be if an author learned the alphabet, memorized 100K words from a dictionary, and then wrote a piece of literature or science. Everything else is derivative work.
  • by ErikKnepfler ( 4242189 ) on Wednesday August 30, 2023 @07:00PM (#63810594)
    More fun would be to claim that ChatGPT is a life form, and then simply argue that it's being taught like a child reading from the same books, just faster.
    • But it would be easy to prove that false, given that ChatGPT is unable to understand even the most basic concepts. People need to realise that fundamentally repeating something, mixing something, and understanding the context of something are not the same thing.

  • I don't see how it can be a derivative work. Derivative works are copyrighted works based on something else. The OpenAI responses were created by an AI, which cannot be the author of a copyrighted work, so they can't be derivative works.

    • What does ChatGPT do that isn't based on something else? It's just a calculation of a million things someone else did. It cannot interject anything it has learned from the real world, because it cannot sense the real world.
  • All human creativity is derivative to a large extent.
