OpenAI Gets Some of Sarah Silverman's Suit Cut in Mixed Ruling (bloomberglaw.com) 64
OpenAI must face a claim that it violated California unfair competition law by using copyrighted books from comedian Sarah Silverman and other authors to train ChatGPT without permission. From a report: But US District Judge Araceli Martinez-Olguin on Monday also dismissed a number of Silverman and her coplaintiffs' other legal claims, including allegations of vicarious copyright infringement, violations of the Digital Millennium Copyright Act, negligence, and unjust enrichment. The judge gave the authors the opportunity to amend their proposed class action by March 13 to fix the defects in the complaint.
The core of the lawsuit remains alive, as OpenAI's motion to dismiss, filed last summer, didn't address Silverman's claim of direct copyright infringement for copying millions of books across the internet without permission. Courts haven't yet determined whether using copyrighted work to train AI models falls under copyright law's fair use doctrine, shielding the companies from liability. Although Martinez-Olguin allowed the unfair competition claim to advance, she said the claim could be preempted by the federal Copyright Act, which prohibits state law claims that allege the same violation as a copyright claim.
But... (Score:1, Offtopic)
Her book is the funniest comedian authored book I've ever listened to. The only funny one, actually.
Mostly about her bedwetting. A lot funnier than that sounds.
If it's not fair use (Score:2, Interesting)
If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?
How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?
Am I misreading this, or is Sarah Silverman's argument really that she doesn't want machines reading her work without a pay-per-read license?
Re:If it's not fair use (Score:4, Insightful)
If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?
How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?
Am I misreading this, or is Sarah Silverman's argument really that she doesn't want machines reading her work without a pay-per-read license?
This would cause problems for text-to-speech used by the blind... a machine reading her work and then plagiarising it, reading it out loud to someone? THIEVES!
Re: (Score:2)
This would cause problems for text-to-speech used by the blind... a machine reading her work and then plagiarising it, reading it out loud to someone? THIEVES!
I remember this being a big area of interest about 15 years ago, but there doesn't seem to have been much about it lately. So long as it's done in real time and just for the reader, and not an audience, I would be inclined to say that it's not infringing the reproduction right, derivative right, or most significantly, the public performance right.
Re: (Score:2)
This would cause problems for text-to-speech used by the blind... a machine reading her work and then plagiarising it, reading it out loud to someone? THIEVES!
I remember this being a big area of interest about 15 years ago, but there doesn't seem to have been much about it lately. So long as it's done in real time and just for the reader, and not an audience, I would be inclined to say that it's not infringing the reproduction right, derivative right, or most significantly, the public performance right.
As for training LLMs, honestly I don't see that the output of an LLM (outside of bugs or faults in the software where it inadvertently, and not by design, regurgitates training data) is anything more than an opinion or review, and hence not a 'derivative work' of any kind. And, therefore, it should be considered fair use. Not a lawyer, but I wish I were, because it's basically a license to print money.
Re: (Score:1)
It's not magic. Those reproductions are proof they used originals somewhere.
Re: (Score:3)
Do you violate copyright every time you remember a fact you learned from a book in school? Does watching Bob Ross videos mean every landscape painting you create violate copyright? Does visualizing the periodic table of elements in your head mean you're violating the copyright of the creator of the poster you remember?
Copyright means you cannot take the original and reproduce it exactly. Using it for derivative works is allowed, otherwise the first person to draw a horse would claim copyright for all hor
Re: (Score:1)
Your first sentence starts with more false equivalence, so I didn't bother reading the rest. See my other post under this story.
Re:If it's not fair use (Score:5, Interesting)
"How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?"
Interesting question. It is well known that there are tons of copyrighted intellectual property (IP) embedded in the datasets used to train LLMs. And I think it is also known that some clever users of these LLMs have figured out ways to coerce the models to regurgitate significant portions of this IP verbatim (more or less), which could (theoretically) violate the "fair use" standards of copyright law.
So the administrators and governments will probably see the need to create additional regulations, rules and laws to minimize the impact of this problem.
Re:If it's not fair use (Score:5, Insightful)
The act of reading a book is not copyright violation. A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation. But the act of memorizing it and then reciting it to large groups of people for pay probably is. Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?
Re: (Score:2)
A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation.
Don't forget Amazon was sued for doing this with their Kindle. https://www.theguardian.com/te... [theguardian.com]
Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?
What constitutes a derivative work though? A quote? An analysis? Using the ideas of a book in abstract to answer a question the book touches on?
Re: (Score:3)
What constitutes a derivative work though? A quote? An analysis? Using the ideas of a book in abstract to answer a question the book touches on?
Gotcha covered:
17 USC 501(a): Anyone who violates any of the exclusive rights of the copyright owner as provided by sections 106 through 122 ... is an infringer of the copyright ....
17 USC 106: Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following: ... to prepare derivative works based upon the copyrighted work ....
17 USC 101: A "derivative work" is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a "derivative work".
So there's your answer.
A quote is not a derivative work because it's not based on a preexisting work. Instead, that's a reproduction of part of the work (a separate exclusive right under 106, however). A literary analysis is not a derivative work, but if you dug too deep and merely produced an annotated work or adaptation, then it would be. It's not too hard to stay on the correct side of that line.
Using the ideas of a book in abstract to answer a question the book touches on?
Ideas can always be used. Facts -- or things claimed to be a real-life fact -- can always b
Re: (Score:2)
If an infringing use is also a fair use, it is rendered non-infringing, and that's the end of the analysis. There are no additional steps after fair use.
Re: (Score:2)
It is?
I assume that you're replying to the part about fair use turning prima facie infringements into non-infringing uses, but it's difficult to tell.
If so, well, that's the statute at work:
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work ... is not an infringement of copyright.
You don't think it's at all strange that the Supreme Court insists on explicit analysis of all four fair use factors, yet if while doing so you refer to or even rely on the work's transformative nature, hey, don't sweat thinking about the right to prepare derivative works, close enough I'm sure?
No factor is determinative in fair use, even though often enough the fourth factor is. Always do the full test, every time. And recognize that much comes down to which side of the bed the judge got up on. It's not as bad as the copyright utility doctrine and conceptual separability, but it's not great. Look at time shifting;
Re: (Score:2)
A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation.
Don't forget Amazon was sued for doing this with their Kindle. https://www.theguardian.com/te... [theguardian.com]
Amazon caved because they lacked sufficient concern for the blind, and felt that they had to suck up to the Authors Guild or else they would lose content. Right after that, Arizona State University got sued by blind users [arstechnica.com] for violating the ADA because they used Kindle in their classes, and shortly thereafter, the DOJ encouraged three other universities to stop using Kindle [slashdot.org].
I'm surprised nobody has sued Amazon over that decision, because it looks like a pretty deliberate action on Amazon's part that enabled
Re: (Score:2)
The act of reading a book is not copyright violation. A machine that reads a book to someone who has the right to read the book but can't because they're blind isn't copyright violation. But the act of memorizing it and then reciting it to large groups of people for pay probably is.
I don't see any evidence that OpenAI is trying to have ChatGPT recite memorized work. Outside of some fairly specific queries that would probably be a non-useful output.
Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?
I don't think that's what derivative means from a legal sense. It's not just copying someone's style, you need to copy a specific work.
Consider all the satirical news shows that spun off from the Daily Show (often on different networks). You might consider them "derivative" in the creative sense, but none of them got sued by Comedy Central.
Re: (Score:2)
You can't copyright the concept of "a late night comedy show".
Someone else posted the legal definition of derivative work, or you can google it. That will clear up your understanding of the term. And keep in mind that it is a legal term whose meaning will be determined by a court, and judges do not play word games like "how is me reading a book and memorizing it not also a copyright violation?". That kind of sophomoric shit will get a lawyer crushed by the judge.
Re: (Score:2)
But the act of memorizing it and then reciting it to large groups of people for pay probably is.
Memorizing a book is not infringing. Reciting to a large group of people -- by any means, whether from memory or not -- would be, if the book is copyrighted.
Re: (Score:2)
But the act of memorizing it and then reciting it to large groups of people for pay probably is.
Memorizing a book is not infringing. Reciting to a large group of people -- by any means, whether from memory or not -- would be, if the book is copyrighted.
Similarly, one could reasonably argue that using copyrighted works in training data is not a copyright violation, at least up to the point where inadequate training data size or inadequate limits on its behavior cause it to recite such a work to a large group of people. This lawsuit seems premature.
Re: (Score:2)
It might be wise to contemplate finding the complete text and images from the book in the LLM's storage. The images and text may be fragmented, but it must all be there given the phrasing in the complaint. The real question here is: did OpenAI purchase a copy for the AI that was not shared around to other readers? But even THAT is not a problem. I can share books I purchase with friends and family as long as I do not get remuneration for doing so.
So-and-so smells money, talks to lawyer who smells money,
Re: (Score:2)
Even the act of making a derivative work and then selling it is a violation, and that's what OpenAI is alleged to be doing, is it not?
Swap "is" and "may be" and you'd be more on point. There's a lot of grey area in what is a derivative work, and in many cases fair use does in fact apply. /Disclaimer: This post is a derivative work of the Oxford English dictionary.
Re: If it's not fair use (Score:2)
You need to read a book in order to copy it, whether you are a scribe in a monastery or a machine. The claim here is that OpenAI is copying the book and then storing it using a lossy algorithm, which is basically what an LLM is. The question is how lossy should lossy be before it becomes no longer an issue of copyright. The same conversations were had about JPEG a long time ago.
Of course {Open}AI and its supporters will claim it doesn't have a full copy of anything, but it is a graph database of wo
Re: (Score:2)
That sounds complicated and not very precise, LLM regurgitations are known to change some words here and there. Why not simply download the book from a pirate site or from the Books3 dataset they use in training? On top of that, only a few samples can be extracted verbatim, most of them are simply hallucinated based on the prompt. Is this the worst copying machine ever
Re: (Score:2)
If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?
"Yes please!" - Text Book Publishers
Depends on what is remembered (Score:2)
How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?
How does a machine storing the entire contents of a book differ from an eBook, which is also an electronic copy of the entire work? Ultimately I think it will depend on how much of the work is stored. If an AI can recite large passages of a book then it is hard to see that as fair use, but if it only remembers the gist of the story and characters plus a few fragments of text then that sounds much more like a human and so clearly fair use.
Eidetic Memory (Score:3)
Yeah, %*@$ people with eidetic memories.
You might want to look up exactly what eidetic memory [wikipedia.org] is because it is not what you seem to think it is. The ability to briefly look at a page and then recall everything on it is more correctly called photographic memory and, so far, it has never been proven to exist.
Rote memorization and recitation used to be a desirable trait in learned folk;
Yes back in the time before printing was cheap enough to render the effort pointless. As for Monty Python fans being the closest we have today, I think there are a lot of actors out there who would strenuously disagree with you, but I'm sure
Re: If it's not fair use (Score:2)
I'd like to revoke every degree issued in the past two decades to anyone who used pirated textbooks to obtain it.
Re: (Score:2)
How does a machine reading a book fundamentally differ from a human
For this, we'd have to delve into the fringes of epistemology. What exactly does it mean "to read"? Is this the transfer of written word into concepts in the mind?
If so, then must not a machine be confirmed to have a mind before it is said to be reading rather than merely observing and using an input (text) to create an output (word probability map)?
Assuming no mind, does the act of a computerized camera observing text as an input equate to a copyright violation? If so, there are probably millions of c
Re: (Score:2)
The issue isn't about the AI reading the book or not.
If I photocopy a book I have clearly violated copyright, but absolutely no one would say the photocopier has read it. The machine, be it AI or photocopier, is only a device used in the process of copyright violation. They simply make it easier and faster than copying a book by hand with a pen.
Re: (Score:2)
That's an interesting framing but I doubt the court will see it that way.
They fed copyrighted works into their tool. (Given how many, it is unlikely they bought a copy of each.) Then they resold that tool's output to generate profit. And despite their protests to the contrary, people have been able to extract both PII and large chunks of whole text from those systems, making the "we don't store it" defense moot. How is what they're doing any different than what I can do with a photocopier? I apply copyright
Re: (Score:3)
If it's not fair use, and the court rules that it's not, do we then have to buy a new license every time we want to read a book once more?
You've never had to buy a license to read a book. Perhaps if you're doing other things in association with your book reading, you might need a license for that but not for just reading.
How does a machine reading a book fundamentally differ from a human, and why would the act of reading constitute a copyright violation?
We've yet to invent a general-purpose AI. These things aren't reading books the way we do. My understanding is that they're basically compiling statistical models, and figuring out how each word or part of word relates to others. Kind of a more complicated version of playing the autocomplete game on your phone, where you
Re: (Score:2)
I do not think a fair use defense will work here.
There are four factors considered when deciding if a fair use defense applies:
1/ the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
Here OpenAI charges for using their LLM, so it is unlikely to be considered nonprofit educational purposes, but rather commercial. Though the purpose is different
2/ the nature of the copyrighted work;
Not sure how that plays here.
3/ the amount and sub
Re: (Score:2)
2/ the nature of the copyrighted work;
Users are allowed more freedom with regard to non-fiction, since the facts therein are uncopyrightable, and the general organization may be as well. Fiction, being more creative, enjoys a bit more protection from potential fair users.
There is no way a fair use defense will pass muster here. Of course, I'm not a lawyer. But I have talked to a few about the scope of fair use before reusing data in my projects.
I'm a lawyer. I would suggest you take a look at the court opinions concerning Google Book Search. You may be surprised. Try Authors Guild, Inc. v. Google Inc., 954 F. Supp. 2d 282 (S.D.N.Y. 2013) which was at the trial level, and then Authors Guild, Inc. v. Google Inc., 80
Re: (Score:1)
Unless you're a copyright lawyer I'd av
Re: (Score:2)
Unless you're a copyright lawyer I'd avoid the "I'm a lawyer" credential in these discussions.
Oh boy, guess what? Even have an LL.M in IP.
the initial step being obvious fair use
There's really no such thing as an obvious fair use. It's always fact-intensive, always case-by-case. And there's always the risk of times changing. Format shifting comes to us from the RIAA v. Diamond case, and it's terrifying to think of how differently that would have gone had it been litigated just a few years later when the iTunes Music Store was open. I'll agree though that the use of pirated books for training was a bad idea, in that it does not help how a court wi
Re: (Score:2)
Personally, I like their chances. Because you're focusing on the output.
No, because the data is incorporated as weights, not as text, and because both the storage form and the output make it pretty clearly transformative use, IMO.
In my mind, using copyrighted works as training data doesn't seem like it should be a copyright violation, though it is easily possible that if someone took an LLM trained with copyrighted works and created something that incorporates too much of a single copyrighted work, then distributing the resulting new work might be a copyright violation (and ost
Re: (Score:2)
For the last time (just kidding, I'm sure I'll have to type this at least a hundred more times), a machine is not a human. Machines don't have the right to fair use. If a monkey is not a human, then a machine is not a human either, and if a monkey doesn't deserve human rights, neither does a machine, so attempting to classify it as fair use is false equivalence. Your line of questioning is just a subversive attempt at social engineering an end-run around the legal proceedings that would first have to occur
Re: (Score:2)
When a search engine web crawler indexes a web page and presumably loads its contents into memory for the purpose of indexing, has its creator just committed a copyright violation? Most would say "no, that's fair use", but by your line of reasoning the machine doing the web crawling doesn't have fair use protections.
Re: (Score:1)
No, you're misrepresenting my reasoning with another straw-man argument that is also false equivalence, but for what it's worth, the search engine giants have lost plenty of court cases over their scraping of news data from commercial news sources. They do still drive some traffic though so they usually get to adjust their delivery methods slightly in one or two countries and then keep going about "business as usual" but it is absolutely, definitively, not because they didn't violate "fair-use" doctrine and
Re: (Score:2)
What is your reasoning then, if not that "Machines don't have the right to fair use"?
Re: scraping news data -
Those lawsuits have been about scraping data and then presenting takeaways/summaries of it to users on the search page, so they never need to click through to the source website. To my knowledge there haven't been successful lawsuits against search engines over the fact that they indexed a page so that it can appear in search results.
Re: (Score:2)
Only if you pissed yourself laughing.
A human must read copyrighted works to learn, how (Score:2)
We read to learn, and an AI must also read to learn. If we want AI to be useful, it has to read /more/ than a human can.
You try posting derivative work (Score:4, Funny)
Now a well-funded gang of tech-bro types pillage copyrighted work and somehow it's OK. There is one law for the peasants and another for the powerful and connected. This is just one more example of how you have no rights to anything (like your browsing history) and the rich can steal from damn near anyone.
Re: (Score:2)
>Consider what happens when fan fiction is posted.
>Take-downs can happen in milliseconds,
and almost always, the world is better for this!
Particularly, those who might have otherwise read the drivel . . .
hawk
OpenAI Gets Some of Sarah Silverman's Suit Cut (Score:2)
I hope they bought her a new one. Business formal attire is expensive.