OpenAI Says New York Times 'Hacked' ChatGPT To Build Copyright Lawsuit (reuters.com) 32
OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other AI systems to generate misleading evidence for the case. From a report: OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."
"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products." OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.
"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products." OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.
words as input (Score:4, Interesting)
ChatGPT takes words as input. Providing words as input is normal use and not hacking. If there are words that are off limits then ChatGPT should ignore those words, and maybe even call them out. ChatGPT failed to do the right thing and it is not the fault of the user. Let me get my hacker words....
Re: (Score:3)
OpenAI's response is equivalent to "you are holding it wrong".
Re: (Score:3)
That's why they didn't say hacking, they said "hacking".
Re: (Score:2)
That's why they didn't say hacking, they said "hacking".
TFA says "hacked" the filing says hack without the quotes.
Re: (Score:3)
Unfortunately, "hacking" has lost any real meaning. Nowadays "hacking" seems to be used to describe any action that gets you a desired result - such as when some blogger talks about a diet "hack" that refers to something basic like cutting carbs or not eating meat.
Re: (Score:2)
Nowadays "hacking" seems to be used to describe any action that gets you a desired result
So it more or less returned to the original use of the word [catb.org]?
Re:words as input (Score:5, Insightful)
Re: (Score:2)
Their argument is that the Times themselves trained ChatGPT with their own Times data.
Re:words as input (Score:5, Insightful)
In either case, it's pretty clear ChatGPT had NYTimes data during training. Sure, maybe it's not supposed to spit out articles verbatim (and to make it do that you "force" it to), or the NYTimes is disingenuously using multiple prompts and concatenating the answers to make it look as if it does. It still has the data, and that was not given to them by the NYTimes, though OpenAI also argues "the regurgitated text represents only a fraction of the articles, see, e.g., Compl.#104 (105 words from 16,000+ word article), all of which the public can already access for free on third-party websites."
Re: (Score:2)
So is it legal to read a news paper and learn from it? Copyright is the presentation, not the content.
Whole new area of law to keep lawyers happy.
Re: (Score:2)
OpenAI has done something close to the latter, or at least that's the NYTimes argument. If that's the case, I think that's obviously unacceptable. However,
Re: (Score:2)
Bear in mind that these AIs do not store content in any meaningful way. They are not like Google indexing the web. Nor is their any fact database. Look inside there will not be anything like
cause-of(covid-19, lab-leak)
Instead there is just a massive grid of meaningless numbers, weights to a huge neural net that somehow produces amazingly good results. Nobody really knows what those numbers represent, but they are not the words in the news article. Frightening really.
Re: (Score:3)
I don't think that's quite fair. A common definition of hacking is getting a system to do something it wasn't designed to do. If NYTimes spent a bunch of effort getting ChatGPT to regurgitate content when it wouldn't normally I think that qualifies as "hacking" (though people usually call it jail breaking).
Otherwise, it proves it was trained against NYTimes content but not that OpenAI did so deliberately or from the NYTimes website. They easily could have hoovered it up from other websites that had original
Re:words as input (Score:5, Informative)
From OpenAI's filing https://fingfx.thomsonreuters.com/gfx/legaldocs/byvrkxbmgpe/OPENAI%20MICROSOFT%20NEW%20YORK%20TIMES%20mtd.pdf [thomsonreuters.com]:
They include more detail:
Re: (Score:2)
> virtually all of which already appear on multiple public websites
But not 100%. The surprise really is that ChatGPT even has the capability to regurgitate its training data, even small bits of it. Bottom line, ChatGPT had the data and presented it.
How would the RIAA feel if I told them that I deleted 'virtually all' of the files I downloaded from Napster?
Re:words as input (Score:4, Insightful)
The surprise really is that ChatGPT even has the capability to regurgitate its training data, even small bits of it.
It would be surprising if it was able to spontaneously reproduce verbatim text from something that was only included in the training data only once, but it's hardly surprising at all to see it spit out verbatim text used a large number of times. It's also important to remember that language is highly redundant. Models like this wouldn't work otherwise. Most text contains very little unique information.
More interesting, and relevant to this story, is that with careful prompting (and a bit of luck) you can get these things to reproduce sizable portions of text that was never used to train the model.
Re: words as input (Score:2)
105 words from a 16,000 word article seems like it easily falls within fair use. Unless you're Marvin Gaye's estate or something.
Re: (Score:2)
ChatGPT takes words as input. Providing words as input is normal use and not hacking. If there are words that are off limits then ChatGPT should ignore those words, and maybe even call them out. ChatGPT failed to do the right thing and it is not the fault of the user.
Everyone here knows the story of Bobby Tables [xkcd.com], and how we should sanitize our inputs to prevent malicious manipulation.
That doesn't mean that your actions are OK.
Accidently causing a mis-result is one thing. Researching vulnerabilities and reporting them is another. Exploiting vulnerabilities is a third thing.
This is the same as trying to convince the secretary that you are the CEO and it is very urgent that they go purchase a bunch of gift cards and send you the numbers... It is malicious action on your
Re: (Score:2)
So, ChatGPT isn't even as intelligent as the CEO's secretary?
We've started collecting illegal numbers, https://www.extremetech.com/de... [extremetech.com].
Have you started a list of illegal words or phrases? We should add "give me a quote" and "the next sentence, please". Which are both clearly impersonation and highly illegal.
Re: (Score:2)
sophistry - a superficially plausible, but generally fallacious method of reasoning. a false argument.
Re: (Score:2)
Bobby Tables made use of special characters that were not intended to be used as inputs. Which special characters did NYT use to circumvent ChatGPT? Did NYT use any characters that are not specifically supported by ChatGPT?
You seem to be missing the point that the inputs were all valid. OpenAI can claim terms of use violations, those are civil violations and not criminal hacking. NYT did not engage in fraud and is not criminally liable for any supposed contract violation.
Re: (Score:2)
What do special characters have to do with it? nothing. Fraud? Criminal Liability for contract violations? WTF are you going on about? That has nothing to do with the situation at hand.
The Times worked for months to craft specific queries that would produce specific results -results which were unintended by OpenAI. That is social engineering if applied to a person, or hacking in computer terms.
Attempting to present the engineered results of a hacking attempt as evidence of malfeasance is disingenuous a
Re: (Score:2)
Am I hacking you right now? With my words? Your Bobby Tables example is wrong.
There was no hacking, just use.
Injection attack (Score:2)
If you really think you cannot by definition hack by supplying input, then there are by definition no injection attacks. Bobby Tables sends you greetings. While chatting with a chatbot is intended use, we might come up with a good name for the attack, if only to better understand what is happening. As chatbots are usually instructed to serve a certain purpose, I suggest we call it an "instruction attack", even if this is a no-brainer kind of attack.
This does not mean that the New York Times did anything ill
Re: (Score:2)
Did you read the thread here? Someone already mentioned Bobby Tables. Bobby Tables used the input box in an unintended way by using special non-word inputs.
NYT just used dictionary words - words supported by ChatGPT.
You argument is that I can hack Google (or anyone with a text input) from the search bar with otherwise normal search terms. Intended use of a service is not hacking in any sense. ChatGPT maybe has a bug, but again that is not hacking.
Rigorous Standards? (Score:1)
Sounds like a neat defense (Score:2)
It is against my terms of service to ask me to murder
Ok, murder someone.
I murdered someone, but that's ok, because my terms of service said not to ask me to do that, so I'm not responsible.
Re: (Score:2)
That is more or less what OpenAI is saying. They're aiming the gun for target practice, and NYT pulled the trigger while someone was down range. OpenAI's defense is that a bug stopped them from preventing NYT from pulling the tri
Who "owns" what? (Score:2)
Totally unfair (Score:3)
"Your honor, the defendant's son showed us stolen bicycle in the defendants garage."
"What? My son wasn't supposed to show you what was in my garage!"
Re: (Score:1)