Sarah Silverman Hits Stumbling Block in AI Copyright Infringement Lawsuit Against Meta (hollywoodreporter.com)
Winston Cho writes via The Hollywood Reporter: A federal judge has dismissed most of Sarah Silverman's lawsuit against Meta over the unauthorized use of authors' copyrighted books to train its generative artificial intelligence model, marking the second ruling from a court siding with AI firms on novel intellectual property questions presented in the legal battle. U.S. District Judge Vince Chhabria on Monday offered a full-throated denial of one of the authors' core theories that Meta's AI system is itself an infringing derivative work made possible only by information extracted from copyrighted material. "This is nonsensical," he wrote in the order. "There is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs' books."
Another of Silverman's arguments that every result produced by Meta's AI tools constitutes copyright infringement was dismissed because she didn't offer evidence that any of the outputs "could be understood as recasting, transforming, or adapting the plaintiffs' books." Chhabria gave her lawyers a chance to replead the claim, along with five others that weren't allowed to advance. Notably, Meta didn't move to dismiss the allegation that the copying of books for purposes of training its AI model rises to the level of copyright infringement. In July, Silverman and two authors filed a class action lawsuit against Meta and OpenAI for allegedly using their content without permission to train AI language models.
What a gain for AI (Score:1)
Now AI can digest immature, boring fart jokes.
Re: (Score:3)
Re: (Score:2)
and let them pass?
Well played, and thanks for the laugh - now where are those damned mod points when I need them?
Re: (Score:3)
I dislike her stand up for that reason, nothing but fart and talking vagina idiocy. But her book is hysterical, and The Sarah Silverman Program was one of the funniest things I've ever watched! I seriously loved both!
Re: (Score:2)
Re: (Score:2)
Kaka-doody! I definitely get it!
I don't understand this (Score:4, Interesting)
Author: You trained your AI on my books!
Meta: We bought your book and our AI read it. So?
And...where do we go from here? If they purchased a copy of the books, the AI "reads" the books and can quote limited amounts. Just like people. WTF is the issue?
Re: (Score:2)
And if they did? Sarah's book is available in print (for which scanners are available), digitally on Kindle, and as an audiobook. All of these are trivial for a computer to consume.
Re: (Score:2)
Which case law are you relying on for the proposition that format shifting or backup duplication may be undertaken for more than personal use?
Probably Authors Guild, Inc. v. Google, Inc. [wikipedia.org], which, remember, Google won at trial.
Re: (Score:2)
Only one real factor applies here, the effect upon the potential market, and even that doesn't apply directly in any of the existing lawsuits.
When someone opens up a "generate a fiction book" service, that becomes questionable.
But ultimately, complaining about an LLM digesting a book is like complaining that the neurons in your head are now slightly altered due to you reading some book, and that author now gets to
Re: (Score:2)
Whether a factor applies to the current argument is a separate question from whether the factors are interdependent.
Re: (Score:2)
Whether a factor applies to the current argument is a separate question from whether the factors are interdependent.
That all must be explored and weighed together (not independently) does not imply that all apply to the current case. That does not follow, and precedent does not say that.
You're conflating 2 separate concepts.
See Betamax. I'm certain you're familiar with the case.
Re: I don't understand this (Score:2)
Re: (Score:2)
Are you implying that AI can't parse the formats used for e-books?
Re: (Score:3)
Most humans do not have photographic memories.
Neither does a deep learning network.
The database of a billion dollar company could easily last much longer than a human life.
How is that relevant to copyright law?
Re: (Score:2)
The databases used for training the AI contain copies of copyrighted works, and those copies are being used to create a commercial product.
Turns out, that's not a copyright violation, provided the commercial product being created is transformative rather than derivative. [wikipedia.org]
Re: (Score:2)
Databases are not used to train AI, data sets are. And neither databases nor data sets are part of any complaint, only the training itself.
"That is the only point that these copyright suits have working in their favour."
Only when you don't understand. If there were a copyright issue in storing copyrighted works, then let the copyright holders sue for that.
Re: (Score:2)
>> Neither does a deep learning network.
>Prove it.
Prove a human doesn't either, rather than just being unable to recall it word for word. In fact it's probably easier to prove it for AI, since we have a better understanding of that than of the human brain.
Re: (Score:1)
Re: (Score:2)
Deep learning networks do not "memorize"; just because you don't know that doesn't mean others have to prove it to you. You're the one claiming that neural networks have photographic memories. They would NOT work as designed if they did.
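To make that concrete, here is a minimal, purely illustrative sketch in Python (a toy network invented for this comment; it has nothing to do with LLaMA's actual architecture or Meta's code). All a trained network ever persists are fixed-size arrays of weights feeding sigmoid-style activations; there is no slot anywhere in which a passage of text could sit.

```python
# Toy example only: a tiny feed-forward network with sigmoid activations.
# The point is that the model's entire "memory" is these weight arrays.
import numpy as np

rng = np.random.default_rng(0)

class TinyNet:
    def __init__(self, n_in, n_hidden, n_out):
        # Everything the network ever keeps: two fixed-size float arrays.
        self.w1 = rng.normal(size=(n_in, n_hidden))
        self.w2 = rng.normal(size=(n_hidden, n_out))

    def forward(self, x):
        # Sigmoid activations over weighted sums.
        h = 1.0 / (1.0 + np.exp(-(x @ self.w1)))
        return 1.0 / (1.0 + np.exp(-(h @ self.w2)))

net = TinyNet(n_in=256, n_hidden=64, n_out=256)
out = net.forward(rng.normal(size=256))
print(net.w1.shape, net.w2.shape, out.shape)  # fixed-size parameters, no stored text
```

Real LLMs are vastly larger and use different building blocks, but the same basic point holds: the parameters are shaped by the training data rather than being a copy of it.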
Re: (Score:2)
Prove it.
This is flatly simple.
An NN is simply a big network of sigmoid functions and weights.
It cannot possibly memorize any input in any form that would make any sense to anyone.
The original argument was that a computer "reading" a book is the same as a human reading a book.
They are grossly analogous.
When was a human reading a book ever considered a copyright violation?
Mmhmm ;)
Copying words from a book into a computer database has always been a copyright violation; it does not matter if it is a fancy database that keeps a lot more than just the words themselves, or even if the database is lossy.
Well that's just not true. I thought you were an IP lawyer.
You have format-shift fair use rights.
And in this case, the argument that the format-shift is utterly destructive of any identifying properties of the source material will be compelling.
MP3s are lossy but making your own copy and giving it away is still a copyright violation.
MP3s reproduce a facsimile of the input.
An NN does not.
A
Re: (Score:2)
Silverman and company are claiming that any output produced by the ANN is a derivative work of theirs.
Betamax clearly established that there is no liability for the format shifter if it has significant non-infringing uses.
I.e., if you're not using MyFavoriteLLMGoesHere to produce works that compete with the person whose works may have gone into training it, then there is simply no argument at all.
A Betamax makes a copy o
Re: (Score:2)
It's like claiming copyright over a painting I made because I may have seen yours at some point in my life. It's fucking absurd.
Bravo, man!!!
Re: (Score:2)
Most humans do not have photographic memories.
Neither does a deep learning network.
The database of a billion dollar company could easily last much longer than a human life.
How is that relevant to copyright law?
Both are extremely salient points, sir!
Re: (Score:2)
>Most humans do not have photographic memories.
Disney will now charge you more based on how good your memory is. People with photographic memories will pay up to 10x the normal rate.
Re: (Score:2)
Most humans do not have photographic memories.
That is irrelevant. We don't go after humans who do have photographic memories either. What we do is go after infringing works produced by said humans. Your abilities are irrelevant in this case. Sarah Silverman is upset that a computer read her book. There's no legal grounds for that. She has yet to show an example of an actual infringing result of what the computer did with its "knowledge".
Re: (Score:2)
All of that is true, but pointing out that neural networks do not store exact copies was done specifically to refute a significant misunderstanding, along with all the bad assumptions that flow from it.
Re: (Score:2)
There are different licenses for commercial use of a book versus personal use (just "reading the book"). They likely need a license to use a book for commercial purposes, which is clearly what they are doing.
Re: (Score:2)
Re: (Score:2)
If a literature major reads a book, is that commercial use? Or only after they get a degree?
Prove that an LLM ever "uses" its "reading" of the book. Prove it, even knowing how training data is used.
Re: (Score:2)
Commercial Use: Any use of copyrighted material for commercial purposes, such as selling, distributing, or using it in a business context, typically requires permission from the copyright owner. Commercial use without permission may lead to legal consequences.
I'll leave it to you to decide the difference between a literature major and the major corporation OpenAI.
Re: (Score:2)
If I were a budding standup comic and bought Silverman's books to expose myself to her style of comedy, do I need to buy it under a different kind of license?
No. Silverman would have to sue me if/when I create material that's an infringing reproduction of her work and include it in my Netflix special.
Re: (Score:2)
Re: (Score:2)
AI is not human and not sentient. A program that uses copyrighted property to build its functionality without compensation to the rights holder is committing theft.
Re: (Score:1)
What counts as sentient still needs to be defined. NNs work similarly to human brains, so ....
We do not charge writers extra for books just because they can learn from what they read.
Re: (Score:1)
Yea, but we also don't give human rights to machines. And human rights include being fairly compensated for your work... be careful where you tread with this argument, it's going to loop back on itself and eat your point.
Re: (Score:1)
Truly, I do not want to spiral down a hole here. Laws are complicated things. But considering that NNs do not explicitly duplicate the information, bu
Re: (Score:2)
Well, you got one thing right. I guess you think sentient beings do not commit theft but non-sentient ones do?
Re: (Score:1)
Come on, the computer didn't go shopping for books to read on its own. It was fed them by a human who knew full well he was breaking the law.
Re: (Score:2)
If I buy her book, then go on a comedy tour using her jokes in that book....that's an issue.
Re: (Score:3)
Ask Disney. They are not too happy that Bing's AI image generation can produce Disney characters and the Disney logo. Bing tried to stop it by blocking the word "Disney" from prompts, but if you just say "a popular children's cartoon company whose name starts with D and ends in Y logo" and "a cartoon mermaid with red hair and a shell bikini" you get a picture of Ariel with their logo on it.
The key to these lawsuits is going to be showing that if you give the AI the right prompt, it produces copyright infrin
Re: (Score:2)
There are claims GitHub's Copilot exactly reproduces chunks of code it was fed as training data. That lawsuit is still ongoing.
Re: (Score:2)
It's a chatbot, not AI.
The chatbots shouldn't be smarter than you, but here we are.
ghost-writing (Score:4, Interesting)
You are probably wrong about the ghost writing thing. Comedians know how to write things. It's easy. The Bedwetter was published by Harper, so they probably had an editor make it flow better and give feedback about what belongs and what doesn't belong. That actually is a major and time-consuming task (FYI I've been a book editor with 15+ years of experience). Other celebrities, like athletes, politicians, etc., might have difficulty doing the writing part, so they team up with someone who interviews them and transcribes their answers. But it's unlikely that happened here.
Re: (Score:3)
Sarah Silverman may be wrong on the AI lawsuit, but she's not dumb, she's not "washed up" and it should be noted that many comedians are employed as professional writers. The OP made about the most stupid comment possible.
Re: (Score:2)
She may or may not be dumb; I've known lots of stupid funny people. She's not washed up. But she IS acting dumb when it comes to this AI shit: she doesn't even vaguely know how it works, and that shows every time she talks about it. This has been coming for a long time, and these authors and other creators are just now discovering the whole idea. It's definitely dumb to restrict yourself to your own little world and not pay attention to anything going on outside of it. That just leaves you with a high chance
Copyright purpose (Score:2)
Preventing training using copyrighted materials would be counter to the stated purpose of US Copyright.
Re: (Score:2)
samples of what qualifies as "bad" are useful in training
Re: (Score:2)
You just need to wait for that "limited time" to expire.
It's a shame that "limited time" now means "longer than anyone has ever lived"
AI or HUMAN is irrlevant. (Score:3)
If I buy a book and keep it in my brain and use the knowledge from it to make decisions, I don't owe the author any additional money.
For that matter, if I start an auto shop repairing cars and read library books to gain the knowledge, I STILL don't owe the author a dime.
AI vs HUMAN is 100% irrelevant.
Re: (Score:2)
I agree and offer another example:
When I do research, I most often use the Internet, where I find enormous amounts of information that does not belong to me. I may quote snippets to serve as a reference or I may make my own conclusions based directly on source material, be it news, chatbots, YouTube, or others.
Because presentations are based entirely on what I glean from the Internet, I'm sure I'm violating some IP pseudo-copyright if the law is stretched far enough.
I think this person's search for copyrigh
Re: (Score:2)
If I read the Linux code and write my own operating system, is it under the GPL?
If an AI reads open-source code and then spits out a program I use in a proprietary program, am I violating any open source licenses?
If co
Re: (Score:2)
If I read the Linux code and write my own operating system, is it under the GPL?
That depends. The questions you and the OP are posing are fundamentally different. The OP has done nothing with their knowledge. You have produced something with yours. Now there's a specific case to investigate: does what you *produced* specifically violate copyright? So far no claims have been made that any production has violated a copyright in the books case; rather, the claim is being made that the act of reading is the copyright infringement.
Re: (Score:2)
If I read the Linux code and write my own operating system, is it under the GPL?
If an AI reads open-source code and then spits out a program I use in a proprietary program, am I violating any open source licenses?
Modality is irrelevant. Whether or not a work is derivative depends on the judgement of the court.
If by unlucky circumstance you happen to produce something someone else has copyrighted, then whether you had any prior knowledge, whether it came from your mind or an AI model, and whether you previously had any exposure to the work are all mostly irrelevant. If the work is deemed copyrightable and a court deems your work close enough to be judged a copy or derivative, you lose.
Re: (Score:2)
Actually, to be found to have copied something in a copyright case does require having been exposed to the work that you are accused of copying. For common works it can be a hard defence to claim that you were never exposed to it. Consider how the IBM PC BIOS was reverse engineered by Phoenix using a clean room approach. To quote https://en.wikipedia.org/wiki/... [wikipedia.org],
Re: (Score:2)
>If I read the Linux code and write my own operating system, is it under the GPL?
Only if you, I don't know, release it as GPL. Or actually use code from the GPL'd Linux project...
If I felt masochistic enough, in theory I could re-write Linux from scratch, implementing all of the functions of the kernel and OS without using actual GPL'd code, and I could then release it as software that is as closed source as Windows is. You know, exactly like Linux was originally written as a clone of UNIX...
>If an AI rea
Re: (Score:2)
According to /. logic, if you repair that car using TOOLs, then because the tools are not sentient, the tools are committing theft.
AI is different than human interpretation. (Score:1)
Re: (Score:2)
If I buy a book and keep it my brain and use the knowledge from it to make decisions I don't owe the author any additional money.
That is sane and currently correct; however, look at what is going on around you. We are leaving the enlightened era and moving into very dark times. The progeny of your progeny will have to pay the author every time that knowledge is used. (of course, that author will have been long dead, but a corporation will be there to collect the dues)
Somewhere along the line, humanity sold its soul (and not for rock and roll either!). Boring and depressing.
Re: (Score:2)
If I build a machine that uses the knowledge of books I've bought to make decisions based on the information in that book, then sell the machine to someone else, do I owe the creator of the information my machine used anything? Without that information used to create it, it would be useless and no one would buy it.
Custom query to force infringement (Score:4, Interesting)
Meta's AI model is equivalent to a Library of Congress where all pages (or even words, sentences, paragraphs) are stored in random order, with a massive index to query them (this is just a crude analogy to illustrate the copyright issue discussed here).
What the court seems to imply is that storing information this way is not, by itself, copyright infringement, and that the authors have not demonstrated it is possible to query the model in a way that returns more than fair use allows.
Could someone craft a query that produces output recognizable as copyrighted material from her books? In that case, you could prove the harm done.
Re:Custom query to force infringement (Score:4, Informative)
Meta's AI model is equivalent to a Library of Congress where all pages (or even words, sentences, paragraphs) are stored in random order, with a massive index to query them (this is just a crude analogy to illustrate the copyright issue discussed here).
Actually, that's a very accurate description of Authors Guild, Inc. v. Google, Inc. [wikipedia.org], which Google won at trial (and on appeal as well) on the basis that a) the portions of books it returned were small enough to qualify as fair use, and b) the searchable index was not the same thing as the books themselves, and was thus transformative (which does not require permission) rather than derivative (which does).
I suspect that case was cited by the defense. Extensively.
Re: (Score:2)
Could someone craft a query that produces output recognizable as copyrighted material from her books? In that case, you could prove the harm done.
Could someone set up a series of rules to produce copyrighted material from a dictionary?
Or as @pjayevans put it:
Nice book. Too bad it was all plagiarized from the dictionary.
Re: (Score:2)
Training an AI is about generating information. That generated information is then stored, but the training data is NOT stored. The AI does not "store" the training data; it learns how to make inferences by being "taught" with it.
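As a rough illustration of that distinction, here is a toy gradient-descent loop in Python (entirely made up for this comment; the model, data, and learning rule are placeholders, not anything Meta actually uses). Each example nudges the weights and is then discarded; only the weights survive training.

```python
# Toy sketch: streaming training where only the weights persist.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=8)            # the only state that is ever kept
learning_rate = 0.01

def training_step(weights, example, target):
    prediction = example @ weights
    error = prediction - target
    gradient = error * example          # gradient of squared error
    return weights - learning_rate * gradient

for _ in range(1000):
    example = rng.normal(size=8)        # stand-in for a batch of training text
    target = example.sum()              # stand-in for the learning signal
    weights = training_step(weights, example, target)
    # `example` is dropped here; nothing about it is stored verbatim.

print(weights)                          # what training leaves behind: numbers
```

Whether anything from a given example can still be coaxed back out of those numbers is exactly what the plaintiffs would need to show with evidence of infringing outputs.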
Re: (Score:2)
And therefore, it's impossible to "remove" a single book from the model, just as when you calculate an average of 10 numbers, discard the original 10 values and then want to recalculate the average with one of the numbers removed. Right?
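For what it's worth, here is a tiny numeric version of that analogy (illustrative values only):

```python
# Averaging throws away the individual values; you cannot later subtract
# one specific contribution unless you still have the original numbers.
values = [3, 7, 4, 9, 2, 8, 5, 6, 1, 10]
avg = sum(values) / len(values)   # 5.5

del values                        # the inputs are gone, only `avg` remains

# To "remove" one value now you would need to know which value it was,
# i.e., go back to the originals and recompute from scratch.
print(avg)
```

Whether that analogy is fair to an LLM is debated below, but it does capture why "unlearning" a single book from a trained model is not a simple delete operation.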
Re: (Score:2)
It's such a crude analogy that it's worthless. It's worse than worthless, it's wrong.
Also, if the amount that could be reproduced didn't matter, there would be nothing left to copyright: 7-bit ASCII would already cover all possible works.
Don't twist copyright to justify theft (Score:2)
If Microsoft Word scanned every page of Stephen King's library of works, and used them to formulate help articles and suggest sentences while you type, Mr. King is entitled to compensation.
Fair use does not apply when intellectual property is used for profit, as Vanilla Ice found out in the David Bowie lawsuit.
Re: (Score:2)
"If Microsoft Word scanned every page of Stephen King's library of works, and used it to formulate help articles and suggest sentences while you type, Mr. King is entitles to compensation."
False, unless "Mr. King" actually wrote the suggestions.
"Fair use does not apply when intellectual property is used for profit..."
Yes it does.
"...as Vanilla Ice found out in the David Bowie lawsuit."
LOL now we know not only that you're wrong, but why you're wrong.
News flash! (Score:2)
Everyone who has written a book read books first; that doesn't make them copyright infringers.
Same goes for artists appreciating art.
All works somehow build upon what came before and that's a good thing. Get over it already!
It's not intelligent - just like Sarah Silverman (Score:1)
Imagine if you made a thing (so it's been created by you)... and that makes it artificial. Sarah Silverman was created by her parents.
Imagine if the thing you made isn't sentient, self-aware, or capable of it. That means it's not intelligent. Sarah Silverman is not intelligent.
How can one non-AI claim another non-AI is somehow infringing rights a non-AI has?
Boggles the mind. Good thing the judge is intelligent and not artificial.
Just put some AI generated stuff in there (Score:2)
Model collapse will teach the thieving scum at Meta.
Plagiarism is Legal if You Hide Behind a Machine? (Score:2)
Re: (Score:2)
Well, if the machine produced an exact copy, or even approximate copy, that argument might have been relevant. But that's not what's happening.
A kid's homework about a book (or any other resource) would include reading the book, summarizing it, and quoting parts from it. That's completely within fair use. If it weren't, the school system couldn't exist. That's also what the AI is doing: reading the source material and using it both to learn language and to produce answers.