Microsoft Accuses the New York Times of Doom-Mongering in OpenAI Lawsuit (engadget.com) 55
Microsoft has filed a motion seeking to dismiss key parts of a lawsuit The New York Times filed against the company and OpenAI, accusing them of copyright infringement. From a report: If you'll recall, The Times sued both companies for using its published articles to train their GPT large language models (LLMs) without permission or compensation. In its filing, Microsoft accuses The Times of pushing "doomsday futurology" by claiming that AI technologies pose a threat to independent journalism. It follows OpenAI's court filing from late February that's also seeking to dismiss some important elements of the case.
Like OpenAI before it, Microsoft accused The Times of crafting "unrealistic prompts" in an effort to "coax the GPT-based tools" to spit out responses matching its content. It also compared the media organization's lawsuit to Hollywood studios' efforts to "stop a groundbreaking new technology": the VCR. Instead of destroying Hollywood, Microsoft explained, the VCR helped the entertainment industry flourish by opening up revenue streams. LLMs are a breakthrough in artificial intelligence, it continued, and Microsoft collaborated with OpenAI to "help bring their extraordinary power to the public" because it "firmly believes in LLMs' capacity to improve the way people live and work."
Who'd have thought? (Score:1)
Just a (minor) rant (Score:5, Insightful)
OpenAI is not "open" at all. (Neither is it "AI", but that's beside the point.)
They refuse to even let a person play with a prompt without first giving a non-VoIP phone number for tracking purposes.
They refuse to expose the algorithms underlying the model.
In a perfect world, Microsoft would be mauled and eaten by bears.
--
I stick my neck out for everybody. [With apologies to Humphrey Bogart, "Casablanca".]
Re: (Score:3)
I wouldn't inflict that foul taste on some poor innocent bears.
Re: (Score:2)
WaySmarter,
Did I say "bears"? I meant "chairs"... electric chairs.
(Damned autocorrect.)
Sincerely,
Mrs. Velma Fissbinder
Porridge, Arkansas
--
Why the World Should Be Run by the Banking Cartel [rumble.com]
Re: (Score:2)
Lol
Let 'em go at it (Score:3)
They can peck at each other until they're both raw to the benefit of most of us, and I say this in the best possible way.
Re: (Score:2)
Re: (Score:3)
I don't know why your response made me think this, but:
The only thing that stops a bad guy with an AI model is a good guy with an AI model.
In this case, the NYT appears to have used Microsoft's own AI model to prove that it's being ripped off by Microsoft. I don't care what the prompt was. Well done.
(*golf clap*)
Re: (Score:2)
Only if we have a good software company and a good newspaper waiting to fill in the void they leave behind.
Why would we replace shit with something good? Replace it with more shit! That would be much more profitable!
Wikipedia (Score:1)
The suit is without merit (Score:1)
People learn by studying the work of others, and have been doing it for all of history
We need fewer IP laws, not more
That said, people who use AI for malicious purposes are a very serious threat, and we need strong defenses
Re: (Score:2, Insightful)
People learn by studying the work of others
That's right. They studied the works of others. They did not regurgitate word for word or line by line what they studied. If they did, they cited their sources.
Re: (Score:2)
If the defense can sustain the allegation that the prompts were specifically crafted to pull up entire articles verbatim, their job is done. There's lots of case law saying that just because a tool can be used illegally doesn't make the tool itself illegal.
Re: (Score:2)
A lot of web sites legitimately license their content (which is common practice throughout the "news" media). That's what the defense is talking about - the articles at issue are all over the place. Depending on how the training material is weighted, that could make it a lot easier to get the AI to regurgitate it verbatim. But even then (according to the defense), the investigators had to work at it.
Fuck the NY Times (Score:3)
It would be so nice if the effort for accountability were NOT being led by a paper whose opinion pages propagate the false conservative-liberal divide, that publishes less and less reportage of any significance and that manipulates its readers the same way that social media do. It's not easy to root for the asshole, but, it seems, one has no choice when both sides are assholes.
Hooray for the lesser asshole!
Re: (Score:2)
The issue is fair use, and it'll result ultimately in required licensing of content to be used in training an LLM. The basic reason why is that it'll be impossible to gauge what is fair use and what isn't within the constraints of that model. The judges will have to throw in the towel and just err on the side of copyright law.
This presumes no law is passed which obviates the issue. I don't expect that; too many actors on the copyright side who aren't going to want that.
Re: (Score:3)
The issue is fair use, and it'll result ultimately in required licensing of content to be used in training an LLM.
Fair use only applies to production or performance of works and derivatives. It doesn't impose any limits on how copyrighted material can be used beyond that. Neither can courts create new requirements out of thin air.
The basic reason why is that it'll be impossible to gauge what is fair use and what isn't within the constraints of that model.
AI models are clearly transformative.
The judges will have to throw in the towel and just err on the side of copyright law.
There is no such thing as "we can't prove you didn't, so we'll assume you did" in the US legal system.
Re: (Score:2)
I suppose we'll find out when the first ruling happens.
Re: (Score:3)
I'm stunned that all of the people that use Google's apps, use their mail, surf Google streams, aren't up in arms that 100% of their life content there is inside their training data.
The provenance of training data is protected conceptually, and not subject to fair use unless you consider AI derivative or a performance. In the proven cases-- it's totally NOT THAT. It's verbatim regurgitation.
Every "content producer" alive whose work could be sucked into the vacuum of a training model has been similarly rippe
Re: Fuck the NY Times (Score:2)
Re: (Score:2)
You can convert it to ASCII, a token sequence, or an encrypted form, and still bring it back with its original meaning, intent, and punctuation.
Theft is theft is theft is theft.
Permission is permission is permission is permission.
It's someone else's. Their ownership wasn't protected. It is not derivative.
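The reversible-encoding point above can be sketched concretely. A minimal Python example (purely illustrative; real LLM tokenizers such as BPE are far more elaborate, and a cryptographic hash, unlike the encodings shown here, is not reversible):

```python
import base64
import zlib

text = 'The quick "brown" fox, 1984.'

# UTF-8 bytes -> base64 -> back: a reversible re-encoding
b64 = base64.b64encode(text.encode("utf-8"))
assert base64.b64decode(b64).decode("utf-8") == text

# Compression is also lossless: the exact punctuation survives
packed = zlib.compress(text.encode("utf-8"))
assert zlib.decompress(packed).decode("utf-8") == text

# A toy "tokenizer": map each distinct character to an integer id
vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
ids = [vocab[ch] for ch in text]
inverse = {i: ch for ch, i in vocab.items()}
assert "".join(inverse[i] for i in ids) == text
```

Every one of these transformations round-trips byte-for-byte, which is the commenter's point: changing the representation doesn't change what is stored.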
Re: Fuck the NY Times (Score:2)
Re: (Score:2)
Re: (Score:2)
Do you realize how silly this is? You could COPY the articles with less effort than using an LLM and the opening phrase as a trigger to induce regurgitation. Copying costs less, it's faster, it doesn't make errors, and you can probably find the articles online somewhere. But the regurgitation method only works a few times out of many; most of the time the LLM will hallucinate
Re: (Score:2)
Re: (Score:2)
What you are suggesting is something along the lines of the clean room design used to copy the IBM PC BIOS back in the 1980s.
I don't think that works well with literature. And reference material essentially can't be copyrighted. I don't see how this ever ends up with a good LLM. But yes, they can skirt copyright that way, I suppose. With great human effort...
Re: (Score:2)
DJs and rappers don't have to pay royalties for songs they were inspired by.
Re: (Score:3)
No, but if they sample the original song, they do. This issue with LLMs is going to turn out rather like that.
Re: (Score:2)
Uhm, no. Sampling is "copying and duplicating", which is not at all how LLMs work.
Re: (Score:2)
The problem is proving that. If you can get significant portions of the original text out of it... it's a recording mechanism. It's going to be _really_ hard to get any other kind of result out of the courts.
Re: (Score:3)
But they happen to be correct here. OpenAI's business has derived much value from NY Times content, among the work of many, many others.
So what? Is deriving value a crime?
If djs and rappers have to pay royalties for the songs and bits of songs they sample, then why wouldn't OpenAI have to do the same thing?
Facts and knowledge are not subject to copyright.
Re: (Score:2)
So what? Is deriving value a crime?
No, but not acknowledging the creator with money is. It's considered theft.
Facts and knowledge are not subject to copyright.
Those are not the only things that chatgpt serves.
Re: (Score:2)
No, but not acknowledging the creator with money is. It's considered theft.
Copyright law does not require acknowledgement. It simply imposes constraints on who can perform or (re)produce a (derivative) work.
If you spent millions of dollars to surface new facts or labor to compile a book of phone numbers copyright law does nothing to prevent others from benefiting from your labor. I can use those facts and data any way I want without acknowledging or paying you anything.
Ah yes, the many, many others (Score:2)
Furthermore it is impossible to pay everyone who contributed, just processing that many payments would cost millions in and of itself.
This is just a cash grab by NYT
The meme: https://www.genolve.com/design... [genolve.com]
Re: (Score:2)
Power grab. They want to make copyright more powerful. For example, now that I wrote this phrase, it is automatically copyrighted to me. But if someone were to say the same thing in other words, would that constitute an infringement? Not under current law. But they want it to become so. They want to expand protection from "exact phrase" to any phrase(!) that conveys the same concepts.
If they get their wish and have copyright protection on ideas, not expression, then
Re: (Score:2)
And that is exactly why it doesn't matter as much. A few million tokens in a sea of trillions of tokens don't carry so much impact. Yes, the NYT content is beneficial to LLMs, but it's not that large to really matter. Because combining one million pieces taken from one million works of art in a mosaic doesn't really infringe on any of them.
Unrealistic prompts: bullshit (Score:3)
If a user can enter it then it is a valid and realistic prompt.
Just because you didn't think to block or filter it doesn't mean someone won't eventually figure it out, share it on the net and make it a very realistic prompt.
Re: (Score:2)
Re: (Score:2)
Because articles are partially behind paywalls today, with the opening paragraph and a broken AI I can get the whole thing.
And either way it demonstrates the lie that they don't store and have the ability to retrieve large sections of text in violation of copyright.
These broken AI can also retrieve PII. I shouldn't have to explain why that's bad.
Strawman (Score:2)
> It also compared the media organization's lawsuit to Hollywood studios' efforts to "stop a groundbreaking new technology": the VCR. Instead of destroying
> Hollywood, Microsoft explained, the VCR helped the entertainment industry flourish by opening up revenue streams.
Re: (Score:2)
You're wrong.
How much Hollywood benefited from VCRs isn't particularly relevant, but the fact that the lawsuits against it were based on the same claims is. The claim was that the VCR could be used to commit infringement, and was therefore illegal. The defense pointed out that being able to use a tool to do illegal things does not make the tool illegal (because if it did, all tools would be illegal). The defense also sustained the claim that illegal use was not the primary use of VCRs, and those who used one i
Re: (Score:2)
Re: (Score:2)
The real legal question is whether the output is derivative of the training material, which would require permission from the copyright holders, or transformative, which would not.
It's new technology, and the law doesn't address it. Even when it gets to the Supreme Court (and it will), it won't be settled, because this is statutory law, not constitutional, and Congress can then decide whether or not to change the law.
You will note that the NYT lawsuit doesn't really address that issue, only that if you go o
Re: (Score:2)
Transformative - if it takes so much data to train the model, it is not mere derivation. The distinction between transformative and derivative is crucial here. One is fair use, the other is infringement. The gradients from trillions of tokens have been averaged up to make the trained model, they all compose and interact, unlike JPEG or MPEG where slices of the input are encoded separately; in an LLM they are all s
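The "averaged gradients" contrast with JPEG/MPEG slice encoding can be sketched with a toy example (purely illustrative; `fake_gradient` is a stand-in for a real language-model loss gradient, not an actual training objective):

```python
# Toy sketch: every training example nudges ONE shared parameter
# vector, so no example occupies its own separately-encoded slot
# (unlike JPEG/MPEG, which encode blocks of the input independently).
import random

dim = 8
weights = [0.0] * dim  # single shared parameter vector

def fake_gradient(doc_id):
    """Stand-in for a per-document gradient (not a real LLM loss)."""
    rng = random.Random(doc_id)
    return [rng.uniform(-1, 1) for _ in range(dim)]

lr = 0.01
for doc_id in range(1000):  # 1000 "documents"
    grad = fake_gradient(doc_id)
    weights = [w - lr * g for w, g in zip(weights, grad)]

# The final weights blend all 1000 contributions; there is no
# per-document slot to read any single document back out of.
print(len(weights), "shared parameters after 1000 documents")
```

Whether the courts accept that this blending makes the result transformative rather than derivative is, of course, exactly what the lawsuit is about.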
Re: (Score:2)
Re: (Score:2)
I don't know about you (Score:2)
but I don't spend much time looking at historical news stories from any newspaper. News is supposed to be current events, and the "newspaper" should be constantly generating related content at that pace. If OpenAI or anyone else uses it after the fact, it's by definition "old news," otherwise known as "history." I'm fine with AI compiling historical information. I don't think it is particularly trustworthy for doing that, but it's not harming "news."
Um, what about the tech bros (Score:1)
But, when non-tech bros try to argue that AI is destroying their little slice of the world, they are doom-mongering?
That is definitely one of the key characteristics of people who believe themselves to be the ruling class: they are never wrong and the peasants are never right. It's a form of gaslighting used to try to keep the peasants in line.