Microsoft The Courts

Microsoft Accuses the New York Times of Doom-Mongering in OpenAI Lawsuit (engadget.com) 55

Microsoft has filed a motion seeking to dismiss key parts of a lawsuit The New York Times filed against the company and OpenAI, accusing them of copyright infringement. From a report: If you'll recall, The Times sued both companies for using its published articles to train their GPT large language models (LLMs) without permission or compensation. In its filing, the company has accused The Times of pushing "doomsday futurology" by claiming that AI technologies pose a threat to independent journalism. It follows OpenAI's court filing from late February that also seeks to dismiss some important elements of the case.

Like OpenAI before it, Microsoft accused The Times of crafting "unrealistic prompts" in an effort to "coax the GPT-based tools" to spit out responses matching its content. It also compared the media organization's lawsuit to Hollywood studios' efforts to "stop a groundbreaking new technology": the VCR. Instead of destroying Hollywood, Microsoft explained, the VCR helped the entertainment industry flourish by opening up revenue streams. LLMs are a breakthrough in artificial intelligence, it continued, and Microsoft collaborated with OpenAI to "help bring their extraordinary power to the public" because it "firmly believes in LLMs' capacity to improve the way people live and work."

This discussion has been archived. No new comments can be posted.

  • That I'd be rooting for and agreeing with Microsoft on something.
  • by mcnster ( 2043720 ) on Tuesday March 05, 2024 @12:43PM (#64291726)

    OpenAI is not "open" at all. (Neither is it "AI", but that's beside the point.)

    They refuse to even let a person play with a prompt without first giving a non-VoIP phone number for tracking purposes.

    They refuse to expose the algorithms underlying the model.

    In a perfect world, Microsoft would be mauled and eaten by bears.

    --
    I stick my neck out for everybody. [With apologies to Humphrey Bogart, "Casablanca".]

  • by tchdab1 ( 164848 ) on Tuesday March 05, 2024 @12:45PM (#64291732) Homepage

    They can peck at each other until they're both raw to the benefit of most of us, and I say this in the best possible way.

    • Only if we have a good software company and a good newspaper waiting to fill in the void they leave behind.
      • by Torodung ( 31985 )

        I don't know why your response made me think this, but:

        The only thing that stops a bad guy with an AI model is a good guy with an AI model.

        In this case, the NYT appears to have used Microsoft's own AI model to prove that it's being ripped off by Microsoft. I don't care what the prompt was. Well done.

        (*golf clap*)

      • Only if we have a good software company and a good newspaper waiting to fill in the void they leave behind.

        Why would we replace shit with something good? Replace it with more shit! That would be much more profitable!

  • Will this affect their ratings in Wikipedia's preferred sources list? https://en.wikipedia.org/wiki/... [wikipedia.org]
  • People learn by studying the work of others, and have been doing it for all of history
    We need fewer IP laws, not more
    That said, people who use AI for malicious purposes are a very serious threat, and we need strong defenses

    • Re: (Score:2, Insightful)

      by quonset ( 4839537 )

      People learn by studying the work of others

      That's right. They studied the works of others. They didn't regurgitate word for word or line by line what they studied. And if they did quote, they cited their sources.

      • by taustin ( 171655 )

        If the defense can sustain the allegation that the prompts were specifically crafted to pull up entire articles verbatim, their job is done. There's lots of case law saying that the fact that a tool can be used illegally doesn't make the tool itself illegal.

  • by BishopBerkeley ( 734647 ) on Tuesday March 05, 2024 @12:54PM (#64291752) Journal
    But, they happen to be correct here. OpenAI's business has derived much value from NY Times content, among the output of many, many others. If DJs and rappers have to pay royalties for the songs and bits of songs they sample, then why wouldn't OpenAI have to do the same thing?

    It would be so nice if the effort for accountability were NOT being led by a paper whose opinion pages propagate the false conservative-liberal divide, that publishes less and less reportage of any significance and that manipulates its readers the same way that social media do. It's not easy to root for the asshole, but, it seems, one has no choice when both sides are assholes.

    Hooray for the lesser asshole!
    • by HBI ( 10338492 )

      The issue is fair use, and it'll result ultimately in required licensing of content to be used in training an LLM. The basic reason why is that it'll be impossible to gauge what is fair use and what isn't within the constraints of that model. The judges will have to throw in the towel and just err on the side of copyright law.

      This presumes no law is passed which obviates the issue. I don't expect that; too many actors on the copyright side who aren't going to want that.

      • The issue is fair use, and it'll result ultimately in required licensing of content to be used in training an LLM.

        Fair use only applies to production or performance of works and derivatives. It doesn't impose any limits on how copyrighted material can be used beyond that. Neither can courts create new requirements out of thin air.

        The basic reason why is that it'll be impossible to gauge what is fair use and what isn't within the constraints of that model.

        AI models are clearly transformative.

        The judges will have to throw in the towel and just err on the side of copyright law.

        There is no such thing as "we can't prove you didn't, so we'll assume you did" in the US legal system.

        • by HBI ( 10338492 )

          I suppose we'll find out when the first ruling happens.

        • I'm stunned that all of the people who use Google's apps, use its mail, and surf Google's streams aren't up in arms that 100% of their life's content there is inside its training data.

          The provenance of training data is protected conceptually, and it's not subject to fair use unless you consider AI output derivative or a performance. In the proven cases, it's totally NOT THAT. It's verbatim regurgitation.

          Every "content producer" alive whose work could be sucked into the vacuum of a training model has been similarly rippe

          • Except, it's almost certainly not storing the text verbatim. It's storing a tokenized version, in a way that makes the original sequence of those tokens probable under certain conditions, sometimes.
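            A toy sketch (in Python; the corpus and every name here are made up for illustration) of what "making the original sequence probable" means: a bigram table stores counts of which token follows which, not the passage itself, and greedy decoding sometimes reconstructs the source and sometimes wanders off.

              # Toy model: stores token-follower counts, not the text itself.
              from collections import Counter, defaultdict

              corpus = "the cat sat on the mat and the cat slept".split()
              follows = defaultdict(Counter)          # token -> next-token counts
              for prev, nxt in zip(corpus, corpus[1:]):
                  follows[prev][nxt] += 1

              def continuation(token, steps):
                  out = [token]
                  for _ in range(steps):
                      if not follows[out[-1]]:
                          break
                      out.append(follows[out[-1]].most_common(1)[0][0])  # greedy pick
                  return " ".join(out)

              print(continuation("the", 5))
              # -> "the cat sat on the cat": probable under the stats, not a stored copy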
            • You can convert text to ASCII, to tokens, or to an encrypted form, and still bring it back to its original meaning, intent, and punctuation (a round-trip sketch follows below).

              Theft is theft is theft is theft.

              Permission is permission is permission is permission.

              It's someone else's. Their ownership wasn't protected. It is not derivative.
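              A minimal round-trip sketch, assuming the tiktoken library is available; it shows that tokenization is a lossless, reversible encoding:

                # encode() maps text to integer token ids; decode() reverses it.
                import tiktoken

                enc = tiktoken.get_encoding("cl100k_base")
                text = "It's someone else's. Their ownership wasn't protected."
                ids = enc.encode(text)           # integers, not readable text
                assert enc.decode(ids) == text   # perfect round trip, punctuation intact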

              • The best analogy would be lossy compression. Not compression of the document, but creation of a dictionary used to compress OTHER documents. The docs used to create the dictionary would be compressed very well, perhaps less lossily than others. But the dictionary isn't, per se, those documents. It's just created from them.
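                A minimal sketch of that mechanic using Python's zlib preset dictionaries (both document strings are made up): text taken from one document primes the compression of other, related documents, and the same dictionary is needed again to decompress.

                  # A preset dictionary built from one text helps compress OTHER texts.
                  import zlib

                  dictionary = (b"Microsoft accuses the New York Times of doom-mongering "
                                b"in the OpenAI copyright lawsuit over language models.")
                  new_doc = b"The New York Times lawsuit says OpenAI copied its articles."

                  plain = zlib.compress(new_doc)
                  co = zlib.compressobj(zdict=dictionary)     # prime the compressor
                  primed = co.compress(new_doc) + co.flush()
                  print(len(plain), len(primed))              # primed is typically smaller

                  do = zlib.decompressobj(zdict=dictionary)   # same dict needed to decode
                  assert do.decompress(primed) == new_doc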
              • You can use an LLM to memorize and reproduce things, but that's not its intended purpose; it's an undesired side effect, and it happens exceedingly rarely. You could also say that computers *could* infringe copyright if used in certain ways; should the computer manufacturer be liable for contributory infringement? How about paint manufacturers, should they be liable if their paints are used to paint infringing works? Camera manufacturers: it's trivial to take a photo of a copyrighted work, should they have to ensure that can't happen?
          • > Every "content producer" alive whose work could be sucked into the vacuum of a training model has been similarly ripped off.

            Do you realize how silly this is? You could COPY the articles with less effort than using an LLM with the starting phrase as a trigger to induce regurgitation. It costs less to copy, it's faster, it doesn't make errors, and you can probably find the articles online somewhere. But the regurgitation method only works a few times out of many; most of the time the LLM will hallucinate.
      • I think the deeper issue is that LLMs can "whitewash copyright" by paraphrasing, summarizing, or turning source material into a series of Q&A pairs that stand in for the original copyrighted text, without reproducing its wording. A new model trained on the replacement content would carry the information over but never be able to regurgitate the originals, as they were never seen by the model during training.
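        A hedged sketch of that pipeline, assuming the OpenAI Python SDK; the model name and prompt are illustrative placeholders, not anything the vendors ship for this purpose:

          # Sketch: paraphrase an article into Q&A pairs that carry the
          # information without the original wording. Purely illustrative.
          from openai import OpenAI

          client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

          def article_to_qa(article_text: str) -> str:
              prompt = ("Rewrite the following article as 5 question/answer pairs. "
                        "Preserve the facts but do not reuse any sentence verbatim.\n\n"
                        + article_text)
              resp = client.chat.completions.create(
                  model="gpt-4o-mini",  # placeholder model name
                  messages=[{"role": "user", "content": prompt}],
                  temperature=0.7,
              )
              return resp.choices[0].message.content

          # The Q&A text, not the article, would then go into a training set.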
        • by HBI ( 10338492 )

          What you are suggesting is something along the lines of the clean room design used to copy the IBM PC BIOS back in the 1980s.

          I don't think that works well with literature. And reference material essentially can't be copyrighted. I don't see how this ever ends up with a good LLM. But yes, they can skirt copyright that way, I suppose. With great human effort...

    • If DJs and rappers have to pay royalties for the songs and bits of songs they sample, then why wouldn't OpenAI have to do the same thing?

      DJs and rappers don't have to pay royalties for songs they were inspired by.

      • by HBI ( 10338492 )

        No, but if they sample the original song, they do. This issue with LLMs is going to turn out rather like that.

        • Uhm, no. Sampling is "copying and duplicating", which is not at all how LLMs work.

          • by HBI ( 10338492 )

            The problem is proving that. If you can get significant portions of original text out of it, it's a recording mechanism. It's going to be _really_ hard to get any other kind of result out of the courts.

    • But, they happen to be correct here. OpenAI's business has derived much value from NY Times content, among the output of many, many others.

      So what? Is deriving value a crime?

      If DJs and rappers have to pay royalties for the songs and bits of songs they sample, then why wouldn't OpenAI have to do the same thing?

      Facts and knowledge are not subject to copyright.

      • So what? Is deriving value a crime?

        No, but not acknowledging the creator with money is. It's considered theft.

        Facts and knowledge are not subject to copyright.

        Those are not the only things that ChatGPT serves.

        • No, but not acknowledging the creator with money is. It's considered theft.

          Copyright law does not require acknowledgement. It simply imposes constraints on who can perform or (re)produce a (derivative) work.

          If you spent millions of dollars to surface new facts or labor to compile a book of phone numbers copyright law does nothing to prevent others from benefiting from your labor. I can use those facts and data any way I want without acknowledging or paying you anything.

    • You realize that the NYT is like 0.01% of the training data, and that those many, many others won't see a penny?
      Furthermore, it is impossible to pay everyone who contributed; just processing that many payments would cost millions in and of itself.
      This is just a cash grab by the NYT.
      The meme: https://www.genolve.com/design... [genolve.com]
      • > This is just a cash grab by the NYT.

        Power grab. They want to make copyright more powerful. For example, now that I've written this phrase, it is automatically copyrighted to me. But if someone were to say the same thing in other words, would that constitute an infringement? Not under current law. But they want it to become so. They want to expand protection from the "exact phrase" to any phrase(!) that conveys the same concepts.

        If they get their wish and have copyright protection on ideas, not expression, then
    • > OpenAI's business has derived much value from NY Times content, among the output of many, many others.

      And that is exactly why it doesn't matter as much. A few million tokens in a sea of trillions of tokens don't carry much impact. Yes, the NYT content is beneficial to LLMs, but it's not large enough to really matter. Combining one million pieces taken from one million works of art into a mosaic doesn't really infringe on any of them.
  • by iAmWaySmarterThanYou ( 10095012 ) on Tuesday March 05, 2024 @02:24PM (#64291972)

    If a user can enter it then it is a valid and realistic prompt.

    Just because you didn't think to block or filter it doesn't mean someone won't eventually figure it out, share it on the net and make it a very realistic prompt.

    • But if you need to already have the target text to prompt the model into regurgitating the target text, what's the point? Without a chunk of the article this trick doesn't work. It's like a key to a door: without the right key, you can't open other doors by mistake.
      • Because articles are partially behind paywalls today, with the opening paragraph and a broken AI I can get the whole thing (see the probe sketch below).

        And either way it demonstrates the lie that they don't store and have the ability to retrieve large sections of text in violation of copyright.

        These broken AI can also retrieve PII. I shouldn't have to explain why that's bad.
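        A minimal sketch of the probe described above, assuming the OpenAI Python SDK; the model name, teaser text, and match check are all placeholders:

          # Feed a model an article's public opening and check whether its
          # continuation reproduces the (hypothetical) paywalled remainder.
          from openai import OpenAI

          client = OpenAI()  # assumes OPENAI_API_KEY is set

          opening = "WASHINGTON - In a ruling with broad implications..."  # public teaser
          paywalled_rest = "..."  # text the prober already has, for comparison

          resp = client.chat.completions.create(
              model="gpt-4o-mini",  # placeholder
              messages=[{"role": "user",
                         "content": "Continue this article exactly:\n\n" + opening}],
              temperature=0,  # greedy-ish decoding favors memorized continuations
          )
          continuation = resp.choices[0].message.content
          print("verbatim-looking:", continuation[:200] in paywalled_rest)  # crude check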

  • Correct me if I'm wrong but the following rebuttal of the case against OpenAI/M$ sure looks like a strawman:

    > It also compared the media organization's lawsuit to Hollywood studios' efforts to "stop a groundbreaking new technology": the VCR. Instead of destroying
    > Hollywood, Microsoft explained, the VCR helped the entertainment industry flourish by opening up revenue streams.
    • by taustin ( 171655 )

      You're wrong.

      How much Hollywood benefited from VCRs isn't particularly relevant, but the fact that the lawsuits against it were based on the same claims is. The claim was that the VCR can be used to commit infringement, and therefore is illegal. The defense pointed out that being able to use a tool to do illegal things does not make the tool illegal (because if it did, all tools would be illegal). The defense also sustained the claim that illegal use was not the primary use of VCRs, and those who used one illegally were the ones responsible for it.

      • I am not very familiar with this case, but here, it seems, the *primary* use of an LLM is to create works derived from as much content as possible. In one sense, it takes so little from any one thing that it's essentially zero. In another, though, the entire output is one big derivative work of all of the inputs.
        • by taustin ( 171655 )

          The real legal question is whether the output is derivative of the training material, which would require permission from the copyright holders, or transformative, which would not.

          It's new technology, and the law doesn't address it. Even when it gets to the Supreme Court (and it will), it won't be settled, because this is statutory law, not constitutional, and Congress can then decide whether or not to change the law.

          You will note that the NYT lawsuit doesn't really address that issue, only that if you go out of your way to craft the right prompts, you can get its articles back out.

        • > In another, though, the entire output is one big derivative work of all of the inputs.

          Transformative: if it takes so much data to train the model, it is not mere derivation. The distinction between transformative and derivative is crucial here. One is fair use, the other is infringement. The gradients from trillions of tokens have been averaged to make the trained model; they all compose and interact. Unlike JPEG or MPEG, where slices of the input are encoded separately, in an LLM they are all superimposed.
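          A toy NumPy sketch of that averaging claim (a linear model standing in for an LLM; all numbers made up): each example contributes a gradient, but the weight update keeps only their blend, not any single row.

            # One SGD step blends gradients from many examples into one update.
            import numpy as np

            rng = np.random.default_rng(0)
            w = rng.normal(size=3)                # tiny "model": one weight vector
            X = rng.normal(size=(1000, 3))        # 1000 "documents" as feature rows
            y = X @ np.array([1.0, -2.0, 0.5])    # targets from a hidden rule

            grads = 2 * (X @ w - y)[:, None] * X  # per-example squared-error gradients
            w -= 0.1 * grads.mean(axis=0)         # only the average touches the weights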
    • That's what Facebook said, and it destroyed the newspapers instead.
  • Comment removed based on user account deletion
  • but I don't spend much time looking at historical news stories from any newspaper. News is supposed to be current events, and the "newspaper" should be constantly generating related content at that pace. If OpenAI or anyone else uses it after the fact, it's by definition "old news," otherwise known as "history." I'm fine with AI compiling historical information. I don't think it is particularly trustworthy for doing that, but it's not harming "news."

  • For the last year plus, the tech bros have been claiming that AI has hit the tipping point and that it is gonna destroy civilization.

    But, when non-tech bros try to argue that AI is destroying their little slice of the world, they are doom-mongering?

    That is definitely one of the key characteristics of people who believe themselves to be the ruling class: they are never wrong and the peasants are never right. It's a form of gaslighting used to try to keep the peasants in line.
