OpenAI Accidentally Deleted Potential Evidence in New York Times Copyright Lawsuit (techcrunch.com)
An anonymous reader shares a report: Lawyers for The New York Times and Daily News, which are suing OpenAI for allegedly scraping their works to train its AI models without permission, say OpenAI engineers accidentally deleted data potentially relevant to the case. Earlier this fall, OpenAI agreed to provide two virtual machines so that counsel for The Times and Daily News could perform searches for their copyrighted content in its AI training sets.
In a letter, attorneys for the publishers say that they and experts they hired have spent over 150 hours since November 1 searching OpenAI's training data. But on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the U.S. District Court for the Southern District of New York late Wednesday. OpenAI tried to recover the data -- and was mostly successful. However, because the folder structure and file names were "irretrievably" lost, the recovered data "cannot be used to determine where the news plaintiffs' copied articles were used to build [OpenAI's] models," per the letter. "News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time," counsel for The Times and Daily News wrote.
You misspelled.. (Score:4, Insightful)
on purpose.
"Accidentally" my aunt fanny.
Re: (Score:3)
Maybe. Considering the problems I've read about where there was NO benefit, I'm willing to believe that it was unintentional. But it's still OpenAI's responsibility, and they need to pay all relevant expenses, including any legal expenses (extra lawyer hours), etc., more expenses for additional court time, etc., etc. And there should be notice by the court that it MAY have been intentional.
Re:You misspelled.. (Score:4, Insightful)
Maybe. Considering the problems I've read about where there was NO benefit, I'm willing to believe that it was unintentional. But it's still OpenAI's responsibility, and they need to pay all relevant expenses, including any legal expenses (extra lawyer hours), etc., more expenses for additional court time, etc., etc. And there should be notice by the court that it MAY have been intentional.
Intentional should barely factor into anything. If the deleted materials meet the definition of reasonable suspicion then it should be the full crackdown of the law with felony charges. People should be shitting themselves continuously until they have backed up and preserved it seven ways to Sunday. None of this oopsies crap should ever fly.
Re: You misspelled.. (Score:2)
Reasonable suspicion is an absurd standard that only applies to officer safety for detained subjects. So long as the lawyers are able to recreate the necessary analysis this will just be a billing problem. It gets messy if openai succeeds on defending their position because they won't likely be on the hook for opposing legal fees except for these extra ones caused by their negligence(?)
Re: (Score:2)
Yes, but they *should* be on the hook for these expenses before the trial even starts.
Re: (Score:2)
Yes, but they *should* be on the hook for these expenses before the trial even starts.
Precisely. By inaction or malicious action a crime has been committed beyond reasonable doubt.
Re: (Score:2)
Reasonable suspicion is an absurd standard that only applies to officer safety for detained subjects. So long as the lawyers are able to recreate the necessary analysis this will just be a billing problem. It gets messy if openai succeeds on defending their position because they won't likely be on the hook for opposing legal fees except for these extra ones caused by their negligence(?)
So a cop pulls me over because I’m driving erratically. After having reasonable suspicion to search the vehicle, as soon as it starts I push a button and disappear lots of items from the vehicle. That is guilt, tautologically and would be a felony if that’s how physics worked. Now I have that same device and after being advised of the search I know it has a touchy delete button and it goes off accidentally. That’s also an action that should be a felony because I knew the risk and faile
Re: (Score:2)
When the penalties for "accidentally" deleting the data will be FAR, FAR less than they would be for those from all of the IP they used without permission, "accidentally" becomes rather suspiciously convenient.
"Accidentally" (Score:3)
"My dog ate my homework"
Yeah, we totally believed it. /s
Did OpenAI hire teenagers to work as "engineers"? Have they ever heard of taking backups? Do they have no disaster recovery plan? Oh wait, is this their disaster recovery plan against the disaster of being sued?
Re: (Score:3)
> the disaster of being sued?
You got it, bud.
The LLM companies appear to be taking the Uber strategy - burn VC money doing something wildly illegal but wildly popular to force a legal reform.
Can I root against both these companies somehow?
Re: (Score:2)
Of course they do. ChatGPT came up with it for them.
Re: (Score:3)
Actually in this case it's more "Teacher's dog ate your homework" and now you need to do it again.
Crime (Score:2)
That's a great way to turn a civil issue into a criminal one. Sounds like it can be resolved monetarily but they better watch those fat fingers next time.
Re: (Score:2)
Unfortunately, the 'whoopsie, I lost the incriminating evidence' cases are very hard to prove (which is why so many of these accidents happen...)
I would normally agree with you, but in this case (yes, pun intended) what was deleted wasn't potentially incriminating evidence, it was the results of the plaintiff's search for incriminating evidence on their system.
Re: (Score:2)
I think the dude in NY who said "I changed the password to keep the evidence safe, but then I forgot it" had the best excuse. The burden of proof is on the government.
Re: (Score:2)
The government? It is a civil case.
Re:Crime (Score:4, Funny)
Wake me when a suit from a company does jail time.
That would add a whole new - and very welcome - connotation to the word "lawsuit".
Bad For OpenAI (Score:5, Interesting)
In most cases, failure to preserve evidence results in an adverse inference against the failing party. If that happens here, the court could instruct the jury to assume the allegations against OpenAI are true. That would end OpenAI.
I would think that people who destroy evidence have concluded that the deletion of evidence would result in a penalty less severe than if the evidence had been preserved. I suspect there are some really, really devastating proofs in that deleted data, so much so that OpenAI concluded that facing the consequences of deleting it is better than the consequences of it being made public.
Re:Bad For OpenAI (Score:5, Informative)
IANAL, but I have been involved in enough discovery processes, given testimony, and seen the outcome of enough cases that I can say: once you have been ordered to preserve something or facilitate discovery, if someone tells you to destroy evidence instead, that is bad advice.
This will hurt their cause in court; it will hurt their cause a lot if the judge comes to believe it was wilful.
Re: (Score:2)
Except literally nobody wants there to be an assumption. Per the article, "The plaintiffs’ counsel makes clear that they have no reason to believe the deletion was intentional."
Usually the goal is to win the case. Unless you want to set a precedent. You want rock solid proof, not any sort of default assumption. Because you then use this to go after everyone else doing the same thing.
Re: (Score:2)
If it was discoverable, then failure to protect that information is as much as admitting the plaintiff's claims. This could lead to such wonderful things as summary judgment, which should happen. Generative AI companies need to understand this and while the amount of data may be in the petabyte range, it still has to be preserved.
Re: (Score:2)
On the other hand, as a user of generative AI you get all the p
Re: (Score:2)
Never attribute to malice...
This doesn't really pass a smell test. Not just the fact that they only deleted data from one virtual machine, but they didn't delete source data just the work completed, meaning that work can with some effort be redone, and they also attempted to recover the data.
If this was a coverup it was truly an incompetent one.
Re: (Score:2)
I think everyone aside from OpenAI would love to be surprised and see that a massive $billion firm is held to the same standard, and punished the same way that normal people would be punished if we destroyed material evidence in a case.
Re: (Score:2)
In most cases, failure to preserve evidence results in an adverse inference against the failing party. If that happens here, the court could instruct the jury to assume the allegations against OpenAI are true. That would end OpenAI.
I would think that people who destroy evidence have concluded that the deletion of evidence would result in a penalty less severe than if the evidence had been preserved. I suspect there are some really, really devastating proofs in that deleted data, so much so that OpenAI concluded that facing the consequences of deleting it is better than the consequences of it being made public.
Read the summary more closely.
OpenAI gave the NYTimes a couple VMs so its experts could search through OpenAI's training data.
OpenAI accidentally deleted some of the analysis that the experts generated, but the original training data is still there.
The only consequence to this is the experts need to spend some time regenerating that analysis. As "failure to preserve evidence" goes, this is largely the equivalent of accidentally knocking over a stack of papers on someone's desk.
ChatGPT Did it! (Score:2)
Oopsie (Score:2)
Quote (Score:2)
ha ha! I'd say that too! (Score:2)
happens all the time, especially when it could be damaging to us.
Sloppy at best (Score:2)
I'd honestly be curious to know what OpenAI's IT operations look like. Charitably presuming that it's not just outright de
All those big brains (Score:2)
And no one so much as implemented, much less followed, a 3-2-1 backup scheme?
Of course it was :| (Score:2)
Until the penalties for destroying evidence exceed the penalties for the crimes they are accused of, this will always be a thing.
Companies simply weigh which is the lesser of two evils and go with that.
That said, if OpenAI is so incompetent with data they've been ordered to retain, imagine how incompetent they will be with any of the data they will collect and store on you.
Hashtag (Score:2)
#OOOPS
Hanlon's Razor (Score:2)
On the other hand, all real-world evidence coming from OpenAI strongly indicates that no one should give them the benefit of the doubt.
Didn't delete the evidence (Score:1)
Didn't read the article, but from the summary it seems OpenAI only deleted some of the analysis results of the evidence. The evidence (OpenAI's training data potentially containing NYT IP) is still intact; the foul-up just means the search for NYT IP in that evidence has to be redone.
meh, hand of God made them do it (Score:2)
Unbelievable, unretrievable and not even the devil could get that lucky.
OpenAI version of magical realism for an excuse