The Courts

OpenAI Accidentally Deleted Potential Evidence in New York Times Copyright Lawsuit (techcrunch.com) 45

An anonymous reader shares a report: Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without permission, say OpenAI engineers accidentally deleted data potentially relevant to the case. Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets.

In a letter, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI's training data. But on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. OpenAI tried to recover the data -- and was mostly successful. However, because the folder structure and file names were "irretrievably" lost, the recovered data "cannot be used to determine where the news plaintiffs' copied articles were used to build [OpenAI's] models," per the letter. "News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time," counsel for The Times and Daily News wrote.

  • You misspelled.. (Score:4, Insightful)

    by The Faywood Assassin ( 542375 ) <benyjr@@@yahoo...ca> on Thursday November 21, 2024 @09:07AM (#64962101) Homepage

    on purpose.

    "Accidentally" my aunt fanny.

    • by HiThere ( 15173 )

      Maybe. Considering the problems I've read about where there was NO benefit, I'm willing to believe that it was unintentional. But it's still OpenAI's responsibility, and they need to pay all relevant expenses, including any legal expenses (extra lawyer hours), etc., more expenses for additional court time, etc., etc. And there should be notice by the court that it MAY have been intentional.

      • by burtosis ( 1124179 ) on Thursday November 21, 2024 @10:33AM (#64962343)

        Maybe. Considering the problems I've read about where there was NO benefit, I'm willing to believe that it was unintentional. But it's still OpenAI's responsibility, and they need to pay all relevant expenses, including any legal expenses (extra lawyer hours), etc., more expenses for additional court time, etc., etc. And there should be notice by the court that it MAY have been intentional.

        Intentional should barely factor into anything. If the deleted materials meet the definition of reasonable suspicion then it should be the full crackdown of the law with felony charges. People should be shitting themselves continuously until they have backed up and preserved it seven ways to Sunday. None of this oopsies crap should ever fly.

        • Reasonable suspicion is an absurd standard that only applies to officer safety for detained subjects. So long as the lawyers are able to recreate the necessary analysis this will just be a billing problem. It gets messy if openai succeeds on defending their position because they won't likely be on the hook for opposing legal fees except for these extra ones caused by their negligence(?)

          • by HiThere ( 15173 )

            Yes, but they *should* be on the hook for these expenses before the trial even starts.

            • Yes, but they *should* be on the hook for these expenses before the trial even starts.

              Precisely. By inaction or malicious action a crime has been committed beyond reasonable doubt.

          • Reasonable suspicion is an absurd standard that only applies to officer safety for detained subjects. So long as the lawyers are able to recreate the necessary analysis this will just be a billing problem. It gets messy if openai succeeds on defending their position because they won't likely be on the hook for opposing legal fees except for these extra ones caused by their negligence(?)

            So a cop pulls me over because I’m driving erratically. After having reasonable suspicion to search the vehicle, as soon as it starts I push a button and disappear lots of items from the vehicle. That is guilt, tautologically and would be a felony if that’s how physics worked. Now I have that same device and after being advised of the search I know it has a touchy delete button and it goes off accidentally. That’s also an action that should be a felony because I knew the risk and faile

      • When the penalties for "accidentally" deleting the data will be FAR, FAR less than they would be for those from all of the IP they used without permission, "accidentally" becomes rather suspiciously convenient.

    • Absolutely.
  • by khchung ( 462899 ) on Thursday November 21, 2024 @09:09AM (#64962109) Journal

    "My dog ate my homework"

    Yeah, we totally believed it. /s

    Did OpenAI hire teenagers to work as "engineers"? Have they ever heard of taking backups? Do they have no disaster recovery plan? Oh wait, is this their disaster recovery plan against the disaster of being sued?

    • > the disaster of being sued?

      You got it, bud.

      The LLM companies appear to be taking the Uber strategy - burn VC money doing something wildly illegal but wildly popular to force a legal reform.

      Can I root against both these companies somehow?

      • Fake hiring without paying taxes is on a different level than doing statistical analysis of text. Nobody could prove harm from generative AI "infringement" in court so far. I mean, even if you wanted to recreate the whole training set from the model, it would not be possible. Or any long text, impossible to recreate. If someone wanted to infringe copyright, we have internet for free copying that works 1000x better than AI models, and it copies things precisely. Who would prefer to read the LLM hallucinated
    • Do they have no disaster recovery plan?

      Of course they do. ChatGPT came up with it for them.

    • Actually in this case it's more "Teacher's dog ate your homework" and now you need to do it again.

  • That's a great way to turn a civil issue into a criminal one. Sounds like it can be resolved monetarily but they better watch those fat fingers next time.

    • Unfortunately, the 'whoopsie, I lost the incriminating evidence' cases are very hard to prove (which is why so many of these accidents happen...)
      • by cob666 ( 656740 )

        Unfortunately, the 'whoopsie, I lost the incriminating evidence' cases are very hard to prove (which is why so many of these accidents happen...)

        I would normally agree with you, but in this case (yes, pun intended) what was deleted wasn't potentially incriminating evidence, it was the results of the plaintiff's search for incriminating evidence on their system.

      • I think the dude in NY who said "I changed the password to keep the evidence safe, but then I forgot it" had the best excuse. The burden of proof is on the government.

  • Bad For OpenAI (Score:5, Interesting)

    by StormReaver ( 59959 ) on Thursday November 21, 2024 @09:16AM (#64962133)

    In most cases, failure to preserve evidence results in an adverse inference against the failing party. If that happens here, the court could instruct the jury to assume the allegations against OpenAI are true. That would end OpenAI.

    I would think that people who destroy evidence have concluded that the deletion of evidence would result in a penalty less severe than if the evidence had been preserved. I suspect there are some really, really devastating proofs in that deleted data, so much so that OpenAI concluded that facing the consequences of deleting it is better than the consequences of it being made public.

    • Re:Bad For OpenAI (Score:5, Informative)

      by DarkOx ( 621550 ) on Thursday November 21, 2024 @09:35AM (#64962197) Journal

      IANAL, but I have been involved in enough discovery processes, given testimony, and seen the outcome of enough cases that I can say: once you have been ordered to preserve something or facilitate discovery, if someone tells you to destroy evidence instead, that is bad advice.

      This will hurt their cause in court, and it will hurt it a lot if the judge comes to believe it was wilful.

    • Except literally nobody wants there to be an assumption. Per the article, "The plaintiffs’ counsel makes clear that they have no reason to believe the deletion was intentional."

      Usually the goal is to win the case. Unless you want to set a precedent. You want rock solid proof, not any sort of default assumption. Because you then use this to go after everyone else doing the same thing.

    • It wouldn't end OpenAI. This is all about $$$. OpenAI scraped a ton of protected info because it would help them make $$$, and the legal owners of that info want their share of the $$$. OpenAI is essentially a huge linear algebra matrix, connected to the internet by a bit of clever programming, supported by a small team of humans. Their assets are as intangible as it gets, but investors are irrational and have priced the company based on the assumption that it will eventually dominate “all the business o
      • I bet NYT will still lose. What they are after is copyright over abstract ideas. That would kill creativity. Copyright is a dead man walking, it means nothing to generative models. You can't reverse the trend, we used to be passive consumers of books, music, radio and TV. Now we are interactive, we prefer to play games, use social networks and web search. We create the content we are reading, like here on /. This process has been going on for 25 years, generative AI is just the last nail in the coffin. Auth
    • If it was discoverable, then failure to protect that information is as much as admitting the plaintiff's claims. This could lead to such wonderful things as summary judgment, which should happen. Generative AI companies need to understand this and while the amount of data may be in the petabyte range, it still has to be preserved.

      • Why are you rooting for copyright hoarders? Authors have been unable to make a living off royalties for a very long time. Books, music, art - they are all incapable of paying for a decent living. Instead we have corporations collecting as much of these copy rights as they can. And they use it for ad revenue which leads to shit content, just attention grabbers. The high quality work is now done collaboratively in open source and public domain.

        On the other hand, as a user of generative AI you get all the p
    • Never attribute to malice...

      This doesn't really pass a smell test. Not just the fact that they only deleted data from one virtual machine, but they didn't delete source data just the work completed, meaning that work can with some effort be redone, and they also attempted to recover the data.

      If this was a coverup it was truly an incompetent one.

    • I think everyone aside from OpenAI would love to be surprised and see that a massive $billion firm is held to the same standard, and punished the same way that normal people would be punished if we destroyed material evidence in a case.

    • In most cases, failure to preserve evidence results in an adverse inference against the failing party. If that happens here, the court could instruct the jury to assume the allegations against OpenAI are true. That would end OpenAI.

      I would think that people who destroy evidence have concluded that the deletion of evidence would result in a penalty less severe than if the evidence had been preserved. I suspect there are some really, really devastating proofs in that deleted data, so much so that OpenAI concluded that facing the consequences of deleting it is better than the consequences of it being made public.

      Read the summary more closely.

      OpenAI gave the NYTimes a couple VMs so its experts could search through OpenAI's training data.

      OpenAI accidentally deleted some of the analysis that the experts generated, but the original training data is still there.

      The only consequence of this is that the experts need to spend some time regenerating that analysis. As "failure to preserve evidence" goes, this is largely the equivalent of accidentally knocking over a stack of papers on someone's desk.

  • The AI has a mind of its own! That's Open AI's excuse.
    • Maybe it was the very same NYT articles that gave the AI this idea.... we need to investigate this for 10 years to learn.
  • Teehee
  • I'm surprised the author of this resisted the urge to surround accidentally with quotes.
  • Ooopsie. I'm so sorry that relevant evidence was destroyed.
    happens all the time, especially when it could be damaging to us. .... chuckles...
  • Even if this was actually accidental rather than 'accidental'; it's a bad look. Someone just deleted litigation-critical data? No data classification? No "let the retention policy handle the deletion instead of just cowboy purging stuff"? And then they tried to recover the data in a way that lost all file and folder names, rather than just restoring backups from your backup mechanism?

    I'd honestly be curious to know what OpenAI's IT operations look like. Charitably presuming that it's not just outright de
  • And no one so much as implemented, much less followed, 3 2 1 backup schemes?

  • Until the penalties for destroying evidence exceed the penalties for the crimes they are accused of, this will always be a thing.

    Companies simply weigh which is the lesser of two evils and go with that.

    That said, if OpenAI is so incompetent with data they've been ordered to retain, imagine how incompetent they will be with any
    of the data they will collect and store on you.

    • Exactly, this says more about OpenAI's incompetence than anything else. They only destroyed the lawyers' work (which can be redone), so no evidence was destroyed, but how can you trust that OpenAI provided all of the files used for training if they can't even perform a backup properly?
  • #OOOPS

  • Hanlon's Razor [wikipedia.org] suggests that yes, this was an accident due to incompetence.

    On the other hand, all real-world evidence coming from OpenAI strongly indicates that no one should give them the benefit of the doubt.
  • Didn't read the article, but from the summary it seems OpenAI only deleted some of the analysis results of the evidence. The evidence (OpenAI's training data potentially containing NYT IP) is still intact; the foul up just means the search for NYT IP in that evidence has to be redone.

  • Unbelievable, unretrievable and not even the devil could get that lucky.

    OpenAI version of magical realism for an excuse
