AI The Courts Technology

OpenAI Says New York Times 'Hacked' ChatGPT To Build Copyright Lawsuit (reuters.com) 32

OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other AI systems to generate misleading evidence for the case. From a report: OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."

"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products." OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.


  • words as input (Score:4, Interesting)

    by awwshit ( 6214476 ) on Tuesday February 27, 2024 @01:49PM (#64273104)

    ChatGPT takes words as input. Providing words as input is normal use and not hacking. If there are words that are off limits then ChatGPT should ignore those words, and maybe even call them out. ChatGPT failed to do the right thing and it is not the fault of the user. Let me get my hacker words....

    • OpenAI's response is equivalent to "you are holding it wrong".

    • That's why they didn't say hacking, they said "hacking".

    • Unfortunately, "hacking" has lost any real meaning. Nowadays "hacking" seems to be used to describe any action that gets you a desired result - such as when some blogger talks about a diet "hack" that refers to something basic like cutting carbs or not eating meat.

    • Re:words as input (Score:5, Insightful)

      by AnOnyxMouseCoward ( 3693517 ) on Tuesday February 27, 2024 @02:05PM (#64273160)
      Translation of OpenAI's argument: "Yes ok we did use your data to train our model... but it wasn't supposed to be detectable, and the model wasn't supposed to spit out the training data verbatim. The fact you accessed it means you input some prompt that we didn't design against, and that unveiling of our sneakiness is tantamount to hacking. You're a hacker, you're a bad person, you're wrong, QED."
      • by ls671 ( 1122017 )

        Their argument is that the Times themselves trained ChatGPT with their own Times data.

        • Re:words as input (Score:5, Insightful)

          by AnOnyxMouseCoward ( 3693517 ) on Tuesday February 27, 2024 @03:55PM (#64273632)
          No, not at all. If you read the actual filing [thomsonreuters.com], it says the Times made the model reproduce passages of articles ("regurgitation"), but either 1) they could have done so by asking ChatGPT to "act like a New York Times reporter and reproduce verbatim text from news articles" or 2) asking questions about specific Times articles and requesting quotes, which ChatGPT gave, but then "reordered those outputs (and used ellipses to obscure their original location) to create the false impression that ChatGPT regurgitated sequential and uninterrupted snippets of the articles".

          In either case, it's pretty clear ChatGPT had NYTimes data during training. Sure, maybe it's not supposed to spit out articles verbatim (and to make it do that you "force" it to), or the NYTimes is disingenuously using multiple prompts and concatenating the answers to make it look as if it does. It still has the data, and that was not given to them by the NYTimes, though OpenAI also argues "the regurgitated text represents only a fraction of the articles, see, e.g., Compl. ¶ 104 (105 words from 16,000+ word article), all of which the public can already access for free on third-party websites."
          • So is it legal to read a newspaper and learn from it? Copyright is the presentation, not the content.

            Whole new area of law to keep lawyers happy.

            • That's fair, I do think it's a new area of law and the rules are not well defined. It's legal for a human to read newspapers and other sources, and slowly build their writing expertise. It would be illegal for a human to read newspapers, memorize the content of an article, and then manually transcribe it into their own newspaper and sell that newspaper.

              OpenAI has done something close to the latter, or at least that's the NYTimes argument. If that's the case, I think that's obviously unacceptable. However,
              • Bear in mind that these AIs do not store content in any meaningful way. They are not like Google indexing the web. Nor is there any fact database. Look inside and there will not be anything like

                cause-of(covid-19, lab-leak)

                Instead there is just a massive grid of meaningless numbers, weights to a huge neural net that somehow produces amazingly good results. Nobody really knows what those numbers represent, but they are not the words in the news article. Frightening really.

      • I don't think that's quite fair. A common definition of hacking is getting a system to do something it wasn't designed to do. If NYTimes spent a bunch of effort getting ChatGPT to regurgitate content when it wouldn't normally, I think that qualifies as "hacking" (though people usually call it jailbreaking).

        Otherwise, it proves it was trained against NYTimes content but not that OpenAI did so deliberately or from the NYTimes website. They easily could have hoovered it up from other websites that had original

    • Re:words as input (Score:5, Informative)

      by Col. Klink (retired) ( 11632 ) on Tuesday February 27, 2024 @03:09PM (#64273420)

      From OpenAI's filing https://fingfx.thomsonreuters.com/gfx/legaldocs/byvrkxbmgpe/OPENAI%20MICROSOFT%20NEW%20YORK%20TIMES%20mtd.pdf [thomsonreuters.com]:

      It took them tens of thousands of attempts to generate the highly anomalous results that make up Exhibit J to the Complaint. They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI's terms of use. And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites.

      They include more detail:

      The Complaint includes two examples of ChatGPT allegedly regurgitating training data consisting of Times articles. In both, the Times asked ChatGPT questions about popular Times articles, including by requesting quotes. See, e.g., id. ¶ 106 (requesting "opening paragraphs," then "the next sentence," then "the next sentence," etc.). Each time, ChatGPT provided scattered and out-of-order quotes from the articles in question. In its Complaint, the Times reordered those outputs (and used ellipses to obscure their original location) to create the false impression that ChatGPT regurgitated sequential and uninterrupted snippets of the articles. In any case, the regurgitated text represents only a fraction of the articles, see, e.g., Compl. ¶ 104 (105 words from 16,000+ word article), all of which the public can already access for free on third-party websites.

      • > virtually all of which already appear on multiple public websites

        But not 100%. The surprise really is that ChatGPT even has the capability to regurgitate its training data, even small bits of it. Bottom line, ChatGPT had the data and presented it.

        How would the RIAA feel if I told them that I deleted 'virtually all' of the files I downloaded from Napster?

        • Re:words as input (Score:4, Insightful)

          by narcc ( 412956 ) on Tuesday February 27, 2024 @04:48PM (#64273816) Journal

          The surprise really is that ChatGPT even has the capability to regurgitate its training data, even small bits of it.

          It would be surprising if it were able to spontaneously reproduce verbatim text from something that was included in the training data only once, but it's hardly surprising at all to see it spit out verbatim text used a large number of times. It's also important to remember that language is highly redundant. Models like this wouldn't work otherwise. Most text contains very little unique information.

          More interesting, and relevant to this story, is that with careful prompting (and a bit of luck) you can get these things to reproduce sizable portions of text that was never used to train the model.

      • 105 words from a 16,000 word article seems like it easily falls within fair use. Unless you're Marvin Gaye's estate or something.

    • ChatGPT takes words as input. Providing words as input is normal use and not hacking. If there are words that are off limits then ChatGPT should ignore those words, and maybe even call them out. ChatGPT failed to do the right thing and it is not the fault of the user.

      Everyone here knows the story of Bobby Tables [xkcd.com], and how we should sanitize our inputs to prevent malicious manipulation.

      That doesn't mean that your actions are OK.

      Accidentally causing a mis-result is one thing. Researching vulnerabilities and reporting them is another. Exploiting vulnerabilities is a third thing.

      This is the same as trying to convince the secretary that you are the CEO and it is very urgent that they go purchase a bunch of gift cards and send you the numbers... It is malicious action on your

      • So, ChatGPT isn't even as intelligent as the CEO's secretary?

        We've started collecting illegal numbers, https://www.extremetech.com/de... [extremetech.com].

        Have you started a list of illegal words or phrases? We should add "give me a quote" and "the next sentence, please". Which are both clearly impersonation and highly illegal.

        • sophistry - a superficially plausible, but generally fallacious method of reasoning. a false argument.

          • Bobby Tables made use of special characters that were not intended to be used as inputs. Which special characters did NYT use to circumvent ChatGPT? Did NYT use any characters that are not specifically supported by ChatGPT?

            You seem to be missing the point that the inputs were all valid. OpenAI can claim terms of use violations, those are civil violations and not criminal hacking. NYT did not engage in fraud and is not criminally liable for any supposed contract violation.
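
For anyone who hasn't seen the Bobby Tables pattern spelled out, here is a minimal sketch of what "special characters used as input" means in that case. It assumes a hypothetical `students` table in an in-memory SQLite database, and the helper names (`make_db`, `insert_unsafe`, `insert_safe`, `table_exists`) are made up for illustration. It only shows classic SQL injection; it says nothing about how ChatGPT handles prompts.

```python
import sqlite3


def make_db():
    # Hypothetical schema, loosely following the xkcd strip.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    return conn


def insert_unsafe(conn, name):
    # Vulnerable: the input is pasted into the SQL text, so a quote or
    # semicolon in the name is parsed as SQL syntax rather than as data.
    conn.executescript(f"INSERT INTO students (name) VALUES ('{name}');")


def insert_safe(conn, name):
    # Parameterized: the driver passes the name as a value, never as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
    conn.commit()


def table_exists(conn):
    row = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name = 'students'"
    ).fetchone()
    return row is not None


payload = "Robert'); DROP TABLE students;--"

unsafe_db = make_db()
insert_unsafe(unsafe_db, payload)
print("unsafe insert, table still exists:", table_exists(unsafe_db))

safe_db = make_db()
insert_safe(safe_db, payload)
print("safe insert, table still exists:", table_exists(safe_db))
```

The same string is harmless or destructive depending on whether the program treats it as data or as code, which is the distinction the two sides of this thread are arguing about.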

            • What do special characters have to do with it? Nothing. Fraud? Criminal liability for contract violations? WTF are you going on about? That has nothing to do with the situation at hand.

              The Times worked for months to craft specific queries that would produce specific results, results which were unintended by OpenAI. That is social engineering if applied to a person, or hacking in computer terms.

              Attempting to present the engineered results of a hacking attempt as evidence of malfeasance is disingenuous a

              • Am I hacking you right now? With my words? Your Bobby Tables example is wrong.

                There was no hacking, just use.

    • If you really think you cannot by definition hack by supplying input, then there are by definition no injection attacks. Bobby Tables sends you greetings. While chatting with a chatbot is intended use, we might come up with a good name for the attack, if only to better understand what is happening. As chatbots are usually instructed to serve a certain purpose, I suggest we call it an "instruction attack", even if this is a no-brainer kind of attack.

      This does not mean that the New York Times did anything ill

      • Did you read the thread here? Someone already mentioned Bobby Tables. Bobby Tables used the input box in an unintended way by using special non-word inputs.
        NYT just used dictionary words - words supported by ChatGPT.

        Your argument is that I can hack Google (or anyone with a text input) from the search bar with otherwise normal search terms. Intended use of a service is not hacking in any sense. ChatGPT maybe has a bug, but again that is not hacking.

  • The New York Times' rigorous standards were long since abandoned in the internet age. The standards are the same as anything else on the internet. Does it attract eyeballs for advertisers? Does it serve our sources that help feed us stories that attract eyeballs? Does it please our target audience and help affirm their world view? The New York Times has a reputation for "rigorous standards" because it pleases their audience to think they do. So they go to great lengths to appear to have "rigorous standards" t
  • It is against my terms of service to ask me to murder

    Ok, murder someone.

    I murdered someone, but that's ok, because my terms of service said not to ask me to do that, so I'm not responsible.

    • by Xenx ( 2211586 )
      That isn't an accurate analogy. This would be closer to them asking you to aim a loaded gun, in a general direction, and then they pull the trigger. Your involvement up until the gun is fired could be legal, depending on the specifics, but what happens after that point is in question.

      That is more or less what OpenAI is saying. They're aiming the gun for target practice, and NYT pulled the trigger while someone was down range. OpenAI's defense is that a bug stopped them from preventing NYT from pulling the tri
  • This is a basic argument over who owns information and images. We allow the media to use our information about us for their own purposes and then watch as they claim ownership of that information when someone else tries to use it for some other purpose of which they don't approve. I remember stories of people taking photos and being accused of stealing the soul of the person. This was treated as silly. But it may have been very close to the truth. What if sharing an image of a person, naked or not, was vio
  • by medusa-v2 ( 3669719 ) on Tuesday February 27, 2024 @04:14PM (#64273708)

    "Your honor, the defendant's son showed us the stolen bicycle in the defendant's garage."
    "What? My son wasn't supposed to show you what was in my garage!"
