The Courts

OpenAI Gets Some of Sarah Silverman's Suit Cut in Mixed Ruling (bloomberglaw.com) 64

OpenAI must face a claim that it violated California unfair competition law by using copyrighted books from comedian Sarah Silverman and other authors to train ChatGPT without permission. From a report: But US District Judge Araceli Martinez-Olguin on Monday also dismissed a number of Silverman and her coplaintiffs' other legal claims, including allegations of vicarious copyright infringement, violations of the Digital Millennium Copyright Act, negligence, and unjust enrichment. The judge gave the authors the opportunity to amend their proposed class action by March 13 to fix the defects in the complaint.

The core of the lawsuit remains alive, as OpenAI's motion to dismiss, filed last summer, didn't address Silverman's claim of direct copyright infringement for copying millions of books across the internet without permission. Courts haven't yet determined whether using copyrighted work to train AI models falls under copyright law's fair use doctrine, shielding the companies from liability. Although Martinez-Olguin allowed the unfair competition claim to advance, she said the claim could be preempted by the federal Copyright Act, which prohibits state law claims that allege the same violation as a copyright claim.

This discussion has been archived. No new comments can be posted.

  • But... (Score:1, Offtopic)

    by christoban ( 3028573 )

    Her book is the funniest comedian authored book I've ever listened to. The only funny one, actually.

    Mostly about her bedwetting. A lot funnier than that sounds.

  • If it's not fair use (Score:2, Interesting)

    by Anonymous Coward

    If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?

    How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?

    Am I misreading this, or is Sarah Silverman's argument really that she doesn't want machines reading her work without a pay-per-read license?

    • by myowntrueself ( 607117 ) on Tuesday February 13, 2024 @05:18PM (#64237484)

      If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?

      How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?

      Am I misreading this, or is Sarah Silverman's argument really that she doesn't want machines reading her work without a pay-per-read license?

      This would cause problems for text-to-speech used by the blind... a machine reading her work and then plagiarising it, reading it out loud to someone? THIEVES!

      • This would cause problems for text-to-speech used by the blind... a machine reading her work and then plagiarising it, reading it out loud to someone? THIEVES!

        I remember this being a big area of interest about 15 years ago, but there doesn't seem to have been much about it lately. So long as it's done in real time and just for the reader, and not an audience, I would be inclined to say that it's not infringing the reproduction right, derivative right, or most significantly, the public performance right.

        • This would cause problems for text-to-speech used by the blind... a machine reading her work and then plagiarising it, reading it out loud to someone? THIEVES!

          I remember this being a big area of interest about 15 years ago, but there doesn't seem to have been much about it lately. So long as it's done in real time and just for the reader, and not an audience, I would be inclined to say that it's not infringing the reproduction right, derivative right, or most significantly, the public performance right.

          As for training LLMs, honestly I don't see that the output of an LLM is anything more than an opinion or review (outside of bugs or faults in the software where it inadvertently, and not by design, regurgitates training data), and hence it is not a 'derivative work' of any kind. Therefore, it should be considered fair use. Not a lawyer, but I wish I were, because it's basically a license to print money.

    • by jdagius ( 589920 ) on Tuesday February 13, 2024 @05:25PM (#64237498)

      "How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?"

      Interesting question. It is well known that there are tons of copyrighted intellectual property (IP) embedded in the datasets used to train LLMs. And I think it is also known that some clever users of these LLMs have figured out ways to coerce the models to regurgitate significant portions of this IP verbatim (more or less), which could (theoretically) violate the "fair use" standards of copyright laws.

      So the administrators and governments will probably see the need to create additional regulations, rules and laws to minimize the impact of this problem.

      • by dskoll ( 99328 ) on Tuesday February 13, 2024 @05:40PM (#64237518) Homepage

        The act of reading a book is not copyright violation. A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation. But the act of memorizing it and then reciting it to large groups of people for pay probably is. Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?

        • by EvilSS ( 557649 )

          A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation.

          Don't forget Amazon was sued for doing this with their Kindle. https://www.theguardian.com/te... [theguardian.com]

          Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?

          What constitutes a derivative work though? A quote? An analysis? Using the ideas of a book in abstract to answer a question the book touches on?

          • What constitutes a derivative work though? A quote? An analysis? Using the ideas of a book in abstract to answer a question the book touches on?

            Gotcha covered:

            17 USC 501(a): Anyone who violates any of the exclusive rights of the copyright owner as provided by sections 106 through 122 ... is an infringer of the copyright ....

            17 USC 106: Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following: ... to prepare derivative works based upon the copyrighted work ....

            17 USC 101: A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a “derivative work”.

            So there's your answer.

            A quote is not a derivative work because it's not based on a preexisting work. Instead, that's a reproduction of part of the work (a separate exclusive right under 106, however). A literary analysis is not a derivative work, but if you dug too deep and merely produced an annotated work or adaptation, then it would be. It's not too hard to stay on the correct side of that line.

            Using the ideas of a book in abstract to answer a question the book touches on?

            Ideas can always be used. Facts -- or things claimed to be a real-life fact -- can always b

          • by dgatwood ( 11270 )

            A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation.

            Don't forget Amazon was sued for doing this with their Kindle. https://www.theguardian.com/te... [theguardian.com]

            Amazon caved because they lacked sufficient concern for the blind, and felt that they had to suck up to the Authors Guild or else they would lose content. Right after that, Arizona State University got sued by blind users [arstechnica.com] for violating the ADA because they used Kindle in their classes, and shortly thereafter, the DOJ encouraged three other universities to stop using Kindle [slashdot.org].

            I'm surprised nobody has sued Amazon over that decision, because it looks like a pretty deliberate action on Amazon's part that enabled

        • The act of reading a book is not copyright violation. A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation. But the act of memorizing it and then reciting it to large groups of people for pay probably is.

          I don't see any evidence that OpenAI is trying to have ChatGPT recite memorized work. Outside of some fairly specific queries that would probably be a non-useful output.

          Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?

          I don't think that's what derivative means in a legal sense. It's not just copying someone's style; you need to copy a specific work.

          Consider all the satirical news shows that spun off from the Daily Show (often on different networks). You might consider them "derivative" in the creative sense, but none of them got sued by Comedy Central.

          • You can't copyright the concept of "a late night comedy show".

            Someone else posted the legal definition of derivative work, or you can google it. That will clear up your understanding of the term. And keep in mind it is a legal term and will be determined by a court, and judges do not play word games like "how is me reading a book and memorizing it not also a copyright violation?". That kind of sophomoric shit will get a lawyer crushed by the judge.

        • But the act of memorizing it and then reciting it to large groups of people for pay probably is.

          Memorizing a book is not infringing. Reciting to a large group of people -- by any means, whether from memory or not -- would be, if the book is copyrighted.

          • by dgatwood ( 11270 )

            But the act of memorizing it and then reciting it to large groups of people for pay probably is.

            Memorizing a book is not infringing. Reciting to a large group of people -- by any means, whether from memory or not -- would be, if the book is copyrighted.

            Similarly, one could reasonably argue that using copyrighted works in training data is not a copyright violation, at least up to the point where inadequate training data size or inadequate limits on its behavior causes it to recite such a work to a large group of people. This lawsuit seems premature.

        • Whether it is derivative or transformative will be decided on a case-by-case basis. Not all LLM outputs are derivative; LLMs are known for compositional generalization and can create novel combinations of concepts. The simple fact that they train on huge amounts of text shows that any one piece of training text is not very important for the final result.
        • It might be wise to contemplate finding the complete text and images from the book in the LLM's storage. The images and text may be fragmented, but it must all be there given the phrasing in the complaint. The real question here is did OpenAI purchase a copy for the AI that was not shared around to other readers? But even THAT is not a problem. I can share books I purchase with friends and family as long as I do not get remuneration for doing so.

          So-and-so smells money, talks to lawyer who smells money,

        • Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?

          Swap "is" and "may be" and you'd be more on point. There's a lot of grey area in what is a derivative work, and in many cases fair use does in fact apply. /Disclaimer: This post is a derivative work of the Oxford English dictionary.

      • You need to read a book in order to copy it, whether you are a scribe in a monastery or a machine. The claim here is that OpenAI is copying the book and then storing it using a lossy algorithm, which is basically what an LLM is. The question is how lossy should lossy be before it becomes no longer an issue of copyright. The same conversations were had about JPEG a long time ago.

        Of course {Open}AI and its supporters will claim it doesn't have a full copy of anything, but it is a graph database of wo

      • > some clever users of these LLM's have figured out ways to coerce the models to regurgitate significant portions of this IP verbatim

        That sounds complicated and not very precise; LLM regurgitations are known to change some words here and there. Why not simply download the book from a pirate site or from the Books3 dataset they use in training? On top of that, only a few samples can be extracted verbatim; most of them are simply hallucinated based on the prompt. Is this the worst copying machine ever
    • by EvilSS ( 557649 )

      If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?

      "Yes please!" - Text Book Publishers

    • How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?

      How does a machine storing the entire contents of a book differ from an eBook, which is also an electronic copy of the entire work? Ultimately I think it will depend on how much of the work is stored. If an AI can recite large passages of a book then it is hard to see that as fair use, but if it only remembers the gist of the story and characters plus a few fragments of text then that sounds much more like a human and so clearly fair use.

    • by r0nc0 ( 566295 )
      Isn't this about getting access to the material in the first place? Did OpenAI purchase a legal copy of each of those books? Are they permitted to use a library pass or equivalent to access those books? (IANAL; this seems like a violation of terms, but I have no idea.) Once you have the book you don't need a license to read it again -- unless you bought it with electronic DRM you can no longer access or something, right?
    • How does a machine reading a book fundamentally differ from a human

      For this, we'd have to delve into the fringes of epistemology. What exactly does it mean "to read"? Is this the transfer of written word into concepts in the mind?

      If so, then must not a machine be confirmed to have a mind before it is said to be reading rather than merely observing and using an input (text) to create an output (word probability map)?

      Assuming no mind, does the act of a computerized camera observing text as an input equate to a copyright violation? If so, there are probably millions of c

      • The issue isn't about the AI reading the book or not.

        If I photocopy a book I have clearly violated copyright but absolutely no one would say the photocopier has read it. The machine, be it AI or photocopier is only a device used in the process of copyright violation. They simply make it easier and faster than copying a book by hand with a pen.

        • The issue here is - will we allow copyright holders to extend their rights over all possible texts generated by AI? It is one thing to hold a right over a specific wording or expression of an idea. It is another to hold all other descriptions of the same idea in infringement. Like moving my claims from "my-document.txt" to "*-similar-document.txt". Is this a power grab or just defending against AI competition?
          • That's an interesting framing but I doubt the court will see it that way.

            They fed copyrighted works into their tool. (Given how many, it is unlikely they bought a copy of each.) Then they resold that tool's output to generate profit. And despite their protests to the contrary, people have been able to extract both PII and large chunks of whole text from those systems, making the "we don't store it" defense moot. How is what they're doing any different than what I can do with a photocopier? I apply copyright

    • If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?

      You've never had to buy a license to read a book. Perhaps if you're doing other things in association with your book reading, you might need a license for that but not for just reading.

      How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?

      We've yet to invent a general-purpose AI. These things aren't reading books the way we do. My understanding is that they're basically compiling statistical models, and figuring out how each word or part of word relates to others. Kind of a more complicated version of playing the autocomplete game on your phone, where you

    • by godrik ( 1287354 )

      I do not think a fair use defense will work here.

      There are four factors considered when deciding if a fair use defense applies:
      1/ the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
      Here OpenAI charges for using their LLM, so it is unlikely to be considered nonprofit educational purposes, but rather commercial. Though the purpose is different

      2/ the nature of the copyrighted work;
      Not sure how that plays here.

      3/ the amount and sub

      • 2/ the nature of the copyrighted work;

        Users are allowed more freedom with regard to non-fiction, since the facts therein are uncopyrightable, and the general organization may be as well. Fiction, being more creative, enjoys a bit more protection from potential fair users.

        There is no way a fair use defense will pass muster here. Of course, I'm not a lawyer. But I have talked to a few about the scope of fair use before reusing data in my projects.

        I'm a lawyer. I would suggest you take a look at the court opinions concerning Google Book Search. You may be surprised. Try Authors Guild, Inc. v. Google Inc., 954 F. Supp. 2d 282 (S.D.N.Y. 2013) which was at the trial level, and then Authors Guild, Inc. v. Google Inc., 80

        • by Anonymous Coward
          Google Books is pretty directly distinguishable in that OpenAI can't avail themselves of the initial step being obvious fair use. Format shifting your principal's dead trees to digital form is hardly arguable as infringement. Copying something from the internet likely covered by a terms of use you didn't read let alone comply with and distributing it (hell, probably copying it a few more times) as you mess with it "to train your AI," well, that's a different kettle.

          Unless you're a copyright lawyer I'd av
          • Unless you're a copyright lawyer I'd avoid the "I'm a lawyer" credential in these discussions.

            Oh boy, guess what? Even have an LL.M in IP.

            the initial step being obvious fair use

            There's really no such thing as an obvious fair use. It's always fact-intensive, always case-by-case. And there's always the risk of times changing. Format shifting comes to us from the RIAA v. Diamond case, and it's terrifying to think of how differently that might have gone had it been litigated just a few years later when the iTunes Music Store was open. I'll agree though that the use of pirated books for training was a bad idea, in that it does not help how a court wi

      • Assume the book is 300K tokens, that comes out to $9, at the brave price of 0.03/1K tokens. It's a steal!
    • For the last time, (just kidding, I'm sure I'll have to type this at least a hundred more times) a machine is not a human. Machines don't have the right to fair use. If a monkey is not a human, then a machine is not a human either, and if a monkey doesn't deserve human rights, neither does a machine, so attempting to classify it as fair use is false equivalence. Your line of questioning is just a subversive attempt at social engineering an end-run around the legal proceedings that would first have to occur

      • When a search engine web crawler indexes a web page and presumably loads its contents into memory for the purpose of indexing, has its creator just committed a copyright violation? Most would say "no, that's fair use", but by your line of reasoning the machine doing the web crawling doesn't have fair use protections.

        • No, you're misrepresenting my reasoning with another straw-man argument that is also false equivalence, but for what it's worth, the search engine giants have lost plenty of court cases over their scraping of news data from commercial news sources. They do still drive some traffic though so they usually get to adjust their delivery methods slightly in one or two countries and then keep going about "business as usual" but it is absolutely, definitively, not because they didn't violate "fair-use" doctrine and

          • What is your reasoning then, if not that "Machines don't have the right to fair use"?

            Re: scraping news data -

            Those lawsuits have been about scraping data and then presenting takeaways/summaries of it to users on the search page, so they never need to click through to the source website. To my knowledge there haven't been successful lawsuits against search engines over the fact that they indexed a page so that it can appear in search results.

    • Machines and humans do not work the same way; this argument is tiresome. LLMs need to copy their training data. Also, on top of preventing competition in the same market, the doctrine of fair use does not exist outside the US, and I doubt that OpenAI only used US-based materials.
    • "If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?"

      Only if you pissed yourself laughing.

  • We read to learn, and an AI must also read to learn. If we want AI to be useful, it has to read /more/ than a human can.

    • Funny thing, Microsoft tried to see what would happen if you train a model on purely synthetic data. So they generated 150B tokens with ChatGPT and trained a model called Phi. Turns out the new model was pretty good and efficient. At some point LLMs will just train on their own outputs. It won't degenerate if it is only done once or the outputs are created with extra help (tools, human in the loop). This issue of pre-training on copyrighted data will go away; it's just growing pains for a new field.
  • by Required Snark ( 1702878 ) on Tuesday February 13, 2024 @09:08PM (#64237856)
    Consider what happens when fan fiction is posted. Take-downs can happen in milliseconds, accounts are deleted, and individuals banned permanently. This can happen even by mistake or maliciously and there is no recourse.

    Now a well-funded gang of tech-bro types pillage copyrighted work and somehow it's OK. There is one law for the peasants and another for the powerful and connected. This is just one more example of how you have no rights to anything (like your browsing history) and the rich can steal from damn near anyone.

    • by hawk ( 1151 )

      >Consider what happens when fan fiction is posted.
      >Take-downs can happen in milliseconds,

      and almost always, the world is better for this!

      Particularly, those who might have otherwise read the drivel . . .

      hawk

  • I hope they bought her a new one. Business formal attire is expensive.
