

Meta Claims Torrenting Pirated Books Isn't Illegal Without Proof of Seeding (arstechnica.com) 170
An anonymous reader quotes a report from Ars Technica: Just because Meta admitted to torrenting a dataset of pirated books for AI training purposes, that doesn't necessarily mean that Meta seeded the file after downloading it, the social media company claimed in a court filing (PDF) this week. Evidence instead shows that Meta "took precautions not to 'seed' any downloaded files," Meta's filing said. Seeding refers to sharing a torrented file after the download completes, and because there's allegedly no proof of such "seeding," Meta insisted that authors cannot prove Meta shared the pirated books with anyone during the torrenting process.
[...] Meta ... is hoping to convince the court that torrenting is not in and of itself illegal, but is, rather, a "widely-used protocol to download large files." According to Meta, the decision to download the pirated books dataset from pirate libraries like LibGen and Z-Library was simply a move to access "data from a 'well-known online repository' that was publicly available via torrents." To defend its torrenting, Meta has basically scrubbed the word "pirate" from the characterization of its activity. The company alleges that authors can't claim that Meta gained unauthorized access to their data under CDAFA. Instead, all they can claim is that "Meta allegedly accessed and downloaded datasets that Plaintiffs did not create, containing the text of published books that anyone can read in a public library, from public websites Plaintiffs do not operate or own."
While Meta may claim there's no evidence of seeding, there is some testimony that might be compelling to the court. Previously, a Meta executive in charge of project management, Michael Clark, had testified (PDF) that Meta allegedly modified torrenting settings "so that the smallest amount of seeding possible could occur," which seems to support authors' claims that some seeding occurred. And an internal message (PDF) from Meta researcher Frank Zhang appeared to show that Meta allegedly tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers. Once this information came to light, authors asked the court for a chance to depose Meta executives again, alleging that new facts "contradict prior deposition testimony." Meta has been "silent so far on claims about sharing data while 'leeching' (downloading) but told the court it plans to fight the seeding claims at summary judgement," notes Ars.
Show me the seed (Score:4, Funny)
We didn't fuck unless I seeded.
Re:Show me the seed (Score:4, Funny)
Re:Show me the seed (Score:4, Funny)
Depends on the definition of "is".
bs (Score:4, Insightful)
Nothing was "stolen" (Score:3, Insightful)
Nothing was stolen, everyone still has all the files they had before. This is about copyright infringement. Copyright is (speaking in digital terms) a temporary monopoly on certain manipulations of certain strings of bits. Infringement is stepping on that.
Re: (Score:2, Insightful)
Arguing with definitions is a useless way to not say anything at all.
Engage with the argument; stamping your feet and demanding that people use words in the way you approve of contributes nothing.
Re:Nothing was "stolen" (Score:4, Informative)
Re:Nothing was "stolen" (Score:4, Insightful)
What is your point? How does that add anything constructive to the point the person was making?
Pretending you don't understand what is being said because you pretend you don't understand their meaning is a disingenuous way to engage in discourse.
Re: (Score:3)
But your quoted use of "stolen" was intended only to claim that a crime was committed, not what the legal definition of that crime was. Knowing that, you chose to engage in a pointless argument over precisely the definition of the crime, as if the crime not being "theft" invalidates that a crime was committed. Worse yet, your argument is tired and useless, it is decades old. What you are doing is worse than merely "stamping your feet", you're having the same temper tantrum that countless people more clev
Re:Nothing was "stolen" (Score:4, Interesting)
The lack of precision in speech is exactly what led to the current Zeitgeist.
Pirating is not theft. It not being theft does not absolve it of ethical misconduct, but theft it ain't.
There is a reason we have many different words for killing a human, too. Not every death by the hand of another is murder.
Re: (Score:3)
"The lack of precision in speech is exactly what led to the current Zeitgeist."
No, it is merely a symptom.
"Pirating is not theft. It not being theft does not absolve it of ethical misconduct, but theft it ain't."
It may not be theft of the material, but it is theft of the copyright holder's right to control distribution. And "theft" is not a legal term nor is "pirating", and the question is not whether conduct is "ethical". You know, someone recently said "the lack of precision in speech is exactly what l
Re: (Score:2, Insightful)
The author's legal right to control the distribution of their works was taken from them. That's stealing.
Re: (Score:2)
LOL Went straight to it's not a crime because they deserve it. Sure, copyRIGHT is not a legal RIGHT. The intellect shows through once the copy-paste ends.
Re:bs (Score:4, Insightful)
No sense pretending the laws apply to a company as profitable and useful for propaganda as Facebook.
Re: (Score:2)
The movie industry only goes after people who seed.
Re:bs (Score:5, Insightful)
Fine them $250,000 per violation just like that scary warning before a movie.
Re:bs (Score:4, Informative)
That's about the size of the US national debt, based on the size of LibGen
Re: bs (Score:3)
Re:bs (Score:5, Informative)
Hmm, but the only conceivable defense Meta has in this action is "Fair Use" under copyright law.
The number 1 criterion under Fair Use - and, typically, the one weighted the most by judges - is precisely Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes. So, this one goes 100% to the authors.
It also strongly supports bs's point, that using "content for business purpose, i.e. to make money" is one of the places where the line is drawn as to whether it is copyright infringement or not. This is, indeed, one of the lines.
Just for the curious, here are the remaining 3 factors in determining whether a use of copyrighted material can be considered "Fair Use":
#2. Nature of the copyrighted work. If it is highly creative this weighs against fair use; if rote or formulaic (i.e., a telephone book) it weighs in favor. Since they copied literally everything, including many novels, poetry, and other highly creative and individual works, this factor strongly favors the authors and disfavors Fair Use.
#3. Amount and substantiality of the portion used in relation to the copyrighted work as a whole. They copied the whole damn thing for every work, and an incredibly vast number of works as well. This factor, too, weighs strongly against a finding of Fair Use.
#4. Effect of the use upon the potential market for or value of the copyrighted work. Since the end result of this copying is the creation of a machine that can essentially replicate various others' voices and works, this also weighs very, very strongly against a Fair Use defense.
It is hard to imagine how Meta wins this one. If they somehow do, their lawyers have definitely earned the hundreds of millions they are going to charge.
For the curious, here is the explanation of Fair Use from the U.S. Copyright Office: https://www.copyright.gov/fair... [copyright.gov]
Re: (Score:2, Insightful)
No one cares about your line, we only care about the line defined by the law.
Funny you mention that, the commercial nature of an activity is a line drawn throughout all aspects of copyright law.
Re: (Score:3)
"(1) Authors of literary and artistic works protected by this Convention shall have the exclusive right of authorizing the reproduction of these works, in any manner or form."
"Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following:
(1) to reproduce the copyrighted work in copies or phonorecords;"
You are confusing the GPL with copyright, copyright is not about distribution.
Re: (Score:3)
I'm not going to argue sides with you two, but this sentence stood out:
Beware: the law and its definitions aren't very well coded, and a lot of it comes from judicial decisions. Let me ask a very simple question:
How do you determine if an act is fair use?
Just kidding. I lied! That question is actually pretty hard.
So we end up with judges asking about the "four factors" (purpose and character of use, nature of the copyrighted work,
Re: (Score:3, Insightful)
What does it change? At the end of the day the law counts and not your moral views. Not even the moral views of the judge, as they can and should only use the law to decide the case.
has the law changed? (Score:5, Informative)
I thought that "making a copy", i.e. downloading copyrighted works without permission from the copyright holder, is illegal. To not clog up the courts, generally the end user is left alone unless they made a profit from the infringement or infringed copyright on a large scale. You could argue that both are true for Meta. So unless the law has changed, Meta is clearly guilty and should be made to compensate the rights holders and pay fines or have the exec in charge go to jail, just like anyone else.
Otherwise anyone could just leech movies legally, by Meta's argument.
We stopped being a nation of laws (Score:3, Interesting)
Re:We stopped being a nation of laws (Score:5, Interesting)
Russia still has laws, very strict ones even. All dictatorships do. But they don't apply to Putin's friends, those doing his bidding, and those who've paid their oligarchal dues. Unless of course they fall out with Putin and have an accident.
And it's the same in the US now. Just because the president and his cronies are violating the law and trampling on the constitution doesn't mean that you, a mere citizen, can do the same. It's been this way for quite a while (best justice money can buy), but certainly it's going to get a lot worse now.
Re: We stopped being a nation of laws (Score:2)
Re: (Score:2)
Russia still has laws, very strict ones even. All dictatorships do. But they don't apply to Putin's friends, those doing his bidding, and those who've paid their oligarchal dues. Unless of course they fall out with Putin and have an accident.
And it's the same in the US now. Just because the president and his cronies are violating the law and trampling on the constitution doesn't mean that you, a mere citizen, can do the same. It's been this way for quite a while (best justice money can buy), but certainly it's going to get a lot worse now.
It's been that way in the US for some time now. The only thing that's changed is that they're not even bothering to hide it any more.
Re: (Score:2)
But it was explicitly codified last year. We all witnessed the brazen criminality of Trump 45 followed by the refusal of the Biden administration or the Congress to do anything about those crimes, but now the courts have ruled that the President is above the Constitution.
Re: (Score:3)
He probably meant to say that you're no longer a nation with "rule of law", i.e. the idea that the law applies to everyone equally. As you say, all countries have laws. There's no country where the laws are truly applied equally and fairly. The US has had a problem with rich people being able to get away with things poor people can't for a long time. You've now decided that the president is above the law. You seem to have dropped the pretense of rule of law altogether. Things are only going to get wor
Re: (Score:2)
Technically, it was when SCOTUS invalidated the Constitution in July. They declared the President immune from law and therefore immune from the Constitution (which is merely part of the law). And it wasn't in November that it took effect, it was the following January. We are now in a post-constitutional period, Americans simply don't realize it yet. Instead, we are cosplaying like Trump is going to have any respect for law at all. He does not.
The laughable part is the talking heads arguing over what ha
Re: (Score:2)
Facebook doesn't have to make a copy of the book. They could feed the download stream straight to the ai tokenizer.
Re: has the law changed? (Score:3)
So if you directly stream, but do not save media it's fine? So if I set up an auditorium and have people pay me to see a movie, and I stream a movie (but do not save it) it is fine? So if I stream a movie and save a deep analysis of the movie that could be used to mostly recreate that movie a little differently without giving any credit to the studio or actors (but only watch the new version) it is fine? What if I root through the media files on your computer and use them to train a model to let me see a sl
Re: has the law changed? (Score:2)
Bro you just described a performance. Copyright law was enshrined in case law which gave a very clear definition on what violates copyright. A performance where you show the material to other people is in violation. Distribution is also a violation. Seeding is distribution, streaming is a performance. Both violate copyright. Meta was doing neither.
Chilling effects: now the people understand why they must leech and never seed anything they pirate on torrent. More chilling effects: the people can train their
Re: (Score:2)
People (sic) seem to forget that the idea of copyright law (any law in fact) is to benefit WE THE FUCKING PEOPLE.
WE give the copyright privilege to creators to ENCOURAGE CREATIVITY and thus to benefit the people.
We do not give them that privilege to fuck us over. Repeatedly. For 95 years.
Re: (Score:2)
The stream is a copy already.
Re: (Score:3)
Facebook doesn't have to make a copy of the book. They could feed the download stream straight to the ai tokenizer.
Legally, that is called "making a copy". It does not matter what you do with that copy afterwards.
Re: (Score:2, Insightful)
So people with eidetic memory can only read stuff that is in the Public Domain or perhaps some stuff with Free-like CC licenses attached?
Re: (Score:3)
Re: (Score:2)
Not sure what the law in the US is, but in Canada downloading isn't actually illegal. It's the uploading (aka sharing) that gets you in actual trouble.
That kind of sounds like what's happening in this instance. They didn't share anything.
Re: has the law changed? (Score:2)
Training an AI model on data and then sharing that model IS sharing it. If I can type "write me a version of Little Women in the writing style of Stephen King" into an AI model and get workable results, they are sharing that data.
Re: (Score:2)
Is that an exception for private downloads or for commercial ones as well? Because I somehow doubt the latter and what Meta did was commercial.
Also, LLMs have shown time and again to sometimes output their training data. Hence the "sharing" is there as well.
Re: (Score:2)
Yes, copyright restricts all reproduction except where exempted. That's why there are specific exemptions for caching, private backup, fair use etc etc.
Fair use or bankruptcy from statutory fines on registered works alone, those are the options for most of the industry.
Re: (Score:2)
If you just download a copyrighted work, the actual losses suffered by the copyright owner are pretty minimal, maximum the cost of a legal copy of the work.
If you use it for commercial purposes or distribute it to others, then the losses suffered by the copyright owner are much higher. In the UK at least, it also makes it a criminal rather than merely civil matter.
Seems like a pointless point (Score:2, Insightful)
Aaron Swartz (Score:5, Informative)
So when Meta does this it's altruistic, but when Aaron Swartz does it, it's a federal crime with 25-life. Make it make sense.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re:Aaron Swartz (Score:4, Insightful)
Ok:
Rich and powerful businesses operate under a different set of laws than individual people.
That's pretty much it. I can elaborate a bit though:
Laws aren't handed to us by God. They aren't discovered by the scientific method. They are invented by human beings. In particular, they are invented by rich and powerful human beings who all share a common motivation: to remain rich and powerful. So, the purpose of the law is to protect their wealth and power.
Ostensibly it is to ensure fair and equal treatment for everyone, keep everyone safe, etc. That's mostly true only inasmuch as such fairness and safety are necessary to keep powerful people powerful. It's not true in some lofty, philosophical, "everyone is equally important" sense. Sentiments like that are just there to get public buy-in so that people don't revolt.
Meta, as a legal entity, is simply more important than any of the individual authors of those works. Their ambitions of creating a better LLM, are more important. So, the little people will be made to move. Their grievances will be heard, paid lip-service-to, and then ignored. There might be some token efforts, like some kind of legal clarity that will make it crystal clear that no other little people are allowed to do this sort of thing. There might even be something of a slap on the wrist to satisfy the mob's desire for vengeance.
But Meta will not be treated like some uppity nobody. Meta will be permitted to pursue its ambitions.
Re: Aaron Swartz (Score:2)
Re: (Score:2)
Have you experienced the intelligence of the average human? George Carlin was right when he said we get the rulers we deserve in a democracy - we elected them after all did we not? If the average person had any intelligence there would have been mass protest votes for third parties since at least the 1980's
Re: (Score:2)
Re: (Score:2, Interesting)
Let me flip that around for you. When Aaron Swartz did it, his actions were morally justified and his alleged crimes completely excused by the Slashdot crowd. But if Meta trains its AI on the same pirated material that Swartz acquired and distributed on the Internet, suddenly it's a federal crime that needs to be prosecuted.
I personally despise Meta, but you could slice the hypocrisy with a knife on th
Re: (Score:3)
Aaron Swartz copied stuff to give everyone access to science that was largely funded by the public purse.
Meta copied a bunch of stuff which wasn't publicly funded so they could profit off it.
Do you really not see that beyond "copying" they are different?
Re: (Score:2)
I for one hope we finally have someone with deep pockets who can fight this bullshit. After all we just had a story about the RIAA getting personal information about internet subscribers again.
You don't need to like Meta to support them using their money to upend the currently broken interpretation of copyright law.
Re:Aaron Swartz (Score:4, Informative)
Except Swartz did it non-commercially. Meta should have the Rico Act thrown at them.
This... looks bad (Score:5, Insightful)
I've actually been somewhat understanding towards the LLM companies when it comes to the data acquisition. Like you can't really train these models without hoovering up the Internet and it is kinda tricky making sure you don't scoop up some improperly shared copyrighted material along the way.
But Meta didn't accidentally download copyrighted materials. They deliberately sought out (and reshared) pirated materials, and tried to hide their involvement.
They should get hammered big time for this.
Re: (Score:2)
You're far too forgiving of these cunts.
Re: (Score:2)
Indeed. A criminal conviction is the least they should get. I mean, they did commercial copyright infringement on mass-scale, perfectly knowing what they did. Should result in prison time for the decision makers at the very least.
Re: (Score:2)
Public access is not public domain. Before the DMCA you had implicit license to make copies for the purpose of viewing the web plus fair use, with the DMCA there are some explicit exemptions and fair use.
The only difference between websites and books is that most websites aren't registered works, so the owner has to prove damages to get awards.
Re: (Score:2)
But that's not necessary. Ever heard of an HTTP request? You're asking for, not taking, content. The server, which is under the control of the copyright holder, decides whether to provide you with a copy. If they're not the copyright holder, it may be illegal copying and distribution.
Re: (Score:2)
Public access is not public domain. They gave one copy to a router, there's a dozen more transient copies made before you even see it ... and the implicit license ended there.
Re: (Score:2)
>Public access is not public domain.
Yes, but that only means you can't make more copies than the one you were ultimately given.
Re: (Score:3)
So what? I do exactly the same when training my flesh and blood LLM. Does the implementation really matter that much? And if yes, why?! Just about all "knowledge" is encompassed in copyrighted stuff. As is just about anything you consume with your eyes that is not nature, ranging from your clothes to this comment and from the method you used to learn arithmetic to you singing your favorite song or reading a book from the library.
Is everything I do a derivative of a copyrighted work? Yes. Should I therefore
Re: (Score:3, Informative)
If people had to pay for content instead of simply pirating it, they'd have to spend more, too.
>The issue is Facebook was never a customer lost
Same as the guy who steals money from the till.
This is so Facebook (Score:5, Interesting)
Not only did they torrent a ton of shit like petty pirates, they didn't even contribute bandwidth back by seeding for others.
Incongruities (Score:4, Insightful)
Meta ... is hoping to convince the court that torrenting is not in and of itself illegal, but is, rather, a "widely-used protocol to download large files."
True, if you want to exchange or download large files, a protocol is needed - but that does not imply your legitimacy to download any file you want or what you can do with it.
I can lift a can of beans off the grocery shelf and put it in my cart - a "widely-used protocol to obtain canned food". I can then go to checkout and pay, or I can go to the door and hope not to get caught.
I can drive my car to the bank - a "widely-used protocol to obtain cash". I can then do a legal transaction at the teller window and drive away happy, or I can rob the teller and get away in the car.
Does Meta believe the PR shit it spews? And - with all due respect to legitimate attorneys - do lawyers believe this self-serving nonsense when they make it up and spew it?
"Meta allegedly accessed and downloaded datasets that Plaintiffs did not create, containing the text of published books that anyone can read in a public library, from public websites Plaintiffs do not operate or own."
Yes, that is what libraries are for. What I cannot do is go to the library, borrow a book, photocopy or scan it, then republish and sell it as my own work.
Of course, libraries are probably useless to the illiterate, and being a modern tech company seems to make its execs functionally illiterate, unable to read the law or any code of decency.
Re: (Score:2)
unable to read the law or any code of decency
Ah, so laws represent decency? That's cute.
Just out of curiosity, let's say it's your job to write the laws around the downloading of copyrighted material.
If your son was to download a copyrighted book from a publicly accessible URL, but then delete the file without sharing it, what should his punishment be?
What if he downloads a zip file with 1000 books?
What if he downloads a zip with every book in existence?
What if he writes an app that allows people to search for the title of a book they're looking
Re: (Score:2)
The whole thing is a nonsense argument. They are stalling. Torrenting is not and has never been illegal. What is illegal is downloading stuff and sharing stuff without permission. And they did it commercially and on mass-scale.
The same about random numbers... (Score:2)
The same about random numbers, not random unless the random number generator is seeded.
Or, it's not illegal or wrong, unless you get caught seeding/doing it...
More simply, companies in the past caught doing something illegal, well they made $1-billion and then had to pay a $1-million fine, but denied any wrong doing.
JoshK.
This is why (Score:2)
commercial companies can get away with anything. It's ridiculous, all you need is a good lawyer and there you go... even pirating is ok for the wealthy. And the crazy thing is Zuck could win this one because Facebook's value is 7x that of the movie industry. What isn't fair is all the people buried by other companies.
Don't you share as you download? (Score:4, Interesting)
I'm not a particularly experienced seeder, but on the occasions that I did, I noticed that uploads of packets started pretty quickly, even though the file was incomplete. "Seeding" happens when the download is finished... but doesn't sharing start right away?
Re: (Score:3)
Not really, you can just disable seeding in most clients even with just regular clients.
Re: (Score:2)
Ahh... news to me. :)
But haven't torrented since what.cd went out of business.
Finally (Score:3)
As someone who is generally for filesharing and very much against the near-infinite copyrights we have right now, I think it is a good thing that a massive mega-corporation is finally on "our" side.
It used to be Disney and Microsoft vs the common people so obviously Disney won. Now the Disney attorneys can battle with the Meta attorneys, and I'm sure Microsoft is going to keep its mouth shut unless someone makes them admit to doing pretty much the same thing.
Re: Finally (Score:4, Interesting)
Re: (Score:2)
They are not. The enemy of my enemy is not my friend.
What's far more likely is Meta will either get some kind of exception, or a token fine, nothing like the equivalent of 10 years in a federal prison.
I can guarantee you this will not lead to loosening of anything for individuals or companies smaller than megacorps.
I truly wish it were not so but I think that is the world we live in. Twas ever thus
Bold strategy, let's see if that pays off (Score:2)
Honestly I'm hoping Facebook "wins" and also "loses"
I hope they win, in that "using torrents is not in itself copyright infringement", but also lose, in that "ingesting unlicensed materials without attempting to license them is still copyright infringement".
Re: (Score:2)
I'm too lazy to look it up but I recall one of the *AAs using the IP addresses of users downloading from a torrent to try and force ISPs to identify the customers so they could sue them. I however don't recall that seeding was a requirement, jus
Hosting costs (Score:2)
"I didn't do it!" (Score:2)
and if I did, it wasn't illegal! and if it was illegal, you've got no proof!
Sounds like something Bart Simpson would say?
I would love to see how they managed this (Score:5, Interesting)
I would really love to see how they managed to download such a vast trove of works WITHOUT seeding a single packet of data.
As anyone who torrents can attest... turning seeding off or throttling it essentially kills the download... at the file sizes they were downloading, to torrent those with seed off would take years.
They're flat out lying and full of shit.
And no.. stopping it at the firewall would have also killed it...
So them saying there is no proof means they simply torrented from an IP not linked to Meta, which they essentially admitted when they said they took care not to use Meta servers... which would also mean a guilty conscience and an active attempt to conceal that they were breaking the law. They're making the case against themselves. If the court finds that making copies of others' works for your own profit is legal (by the simple act of downloading them, any copies made during the training process, putting the works into memory, a drive backup, etc.), it will cause a drastic shift in how the laws are interpreted. NOT GOING TO HAPPEN.
But that's only if META's actions are treated similarly to how a person would be prosecuted. (which on a side note... the people who did it should be charged, not just the company... it was a human doing it... might make people think twice about doing shady things on behalf of companies thinking they have immunity.)
Meta's only course at this stage is to go sealed court records, NDA's and a cash settlement.
Re: (Score:2)
Indeed. Sounds like a highly criminal endeavor. Probably need to find that part of Meta to be a criminal enterprise.
They need courage if they want to win (Score:3)
Charging for access to a derivative work would be a big no-no. Is an LLM a derivative work, or is it more like a child you are teaching to read by going to the library? I wouldn't think it illegal for a minor to learn how to read or become a writer by downloading books and not reselling them, though some writers or some countries might say you should go to a library where they have bought a copy. I don't think the "I only smoked second-hand" defense is worthy of them (or maybe it is *just* like them..) but if they had the courage to make it, they might actually have a point. That in the modern day there *might* be use-cases for massive data acquisition and processing that are to the public good. I don't think the current Wild West approach is.
Back in a writing class I remember the story of one famous writer who (on a mechanical typewriter, in the day) typed in the books of other authors they admired to get used to their writing. I don't know if Meta could argue they are poor, but they did release it for free, so I'd guess they are in a better situation than closed source vendors. But ultimately there probably would need to be a law about accepted uses for massive datasets regardless of provenance. Some such uses might be for private use, for LLM training if given back to the public as open source, statistical analysis, or to enable noncommercial search engines such as for home use or in a library or school. Certainly there is apparently no other legal way to search through book content than having the data on your hard drive or going to Google Books or maybe Google Cite if that is still a thing, not even sure if search works there.
So in that sense, it is either utterly illegal or there are shenanigans going on with special cases ironed out for corporations with billions of dollars. This lawsuit might be a good opportunity to specify some use cases where authors are not being infringed but rather promoted, where learning is promoted, and where discovery of works, based on either knowing the exact words you want to find or being able to describe them to an AI, is promoted. Those all seem like good things. The problems come from billionaires making more billions from authors and artists without compensating them, and creating works based on their styles that effectively put them out of business and limit opportunities for young writers and artists to make a living. Generative AI currently has boundless opportunity for expansion, and without recognizing those dangers I cannot see people whose creativity is their livelihood being complacent about teaching AIs with their work.
Google Books lets you preview the inside of books. I don't know how it has changed over the years and there were a bunch of lawsuits at one time, but I notice The Stand by Stephen King is there, including its cover art, it doesn't seem to reproduce a page about no copying allowed though. The Cat Who Walks Through Walls by Robert Heinlein doesn't seem to have a preview (though it has different versions, maybe one does) but it does have links to purchase on Amazon and an online bookstore local to me it seems. The Catcher in the Rye has a weird cartoon font cover page saying it is from Bibliomania Publishing in Egypt.. no clue if this is a weird scofflaw publisher or the real deal.
Anyway, I leave it to Meta and others, probably the billions mean there will be a legal loophole for LLMs and this is one of the tech bros' interests in the current U.S. administration. But if such loopholes can include education and provision for private use then I could see the Library of Congress and similar national libraries in other countries playing a role in scanning and hosting the torrents. Since there actually are good arguments for being able to search and index by concept the collected culture of the world, while promoting authors' rights and livelihoods in a balanced, legal manner.
Re: (Score:3)
Machines are not humans and do not have the privileges humans have. "Machine learning" is not learning. The term is used as a simplifying analog.
Of course they seeded, or will seed (Score:2)
Re: (Score:2)
Indeed. This is commercial copyright infringement on a mass scale, and that comes with criminal penalties. Might even find that part of Meta to be a criminal enterprise.
I mean, if this was some single mom downloading something (i.e. non-commercial downloading), she would be threatened with prison time and a few million in fines.
So we can now pirate all we like? (Score:2)
As long as we disable seeding, that is? Well, in that case AI would have had at least one positive effect.
Looking forward to torrenting the last Disney movies! Or not.
FWIW in other countries (Score:3)
FWIW that is the situation in Switzerland. If something is available on the internet, you can download it. It is, however, illegal to *provide* content that you do not have the right to distribute.
For the consumer, this seems like a completely fair solution. Whether it should apply to companies is, perhaps, a different question.
Re: (Score:2)
So if you download a Windows ISO or a copy of Oracle or whatever, you can just use it in perpetuity, then, right?
So why does any company or person in the entire country pay for software, books, artwork or anything else?
I don't think that's how it works at all, or you're greatly oversimplifying it.
Re: (Score:2)
LEECHers (Score:2)
Worse than copyright felons, they are a bunch of damned LEECHERS.
Just shun them . . .
Re: (Score:2)
Logic? (Score:2)
OK, so copying something isn't violating copyright, as long as you don't let others copy what you copied. Everyone should be copying their school text books instead of buying them then, because that's legal. Libraries have a problem, though.
Re: (Score:3)
If running one of history's worst acts of industrial-scale piracy for profit was fine because they didn't upload, then pirating anything from Napster or Megaupload should've been 100% legal.
Uhm, really? (Score:2)
I bet a lot of pirates find this interesting, as the people suing usually seed the files themselves and wait to see who begins to download from them. That way they only prove the download.
I think the usual argument is "Only whoever is seeding causes 1 fantastillion in damages," so downloading is not worth going to court over, as the case would only be about something like 30 USD in damages if the user was not seeding (and thus starting the chain reaction of everyone else downloading the content!). This defense will be hard if you downloaded terabytes of u
Did they make any copies? (Score:2)
I would like Facebook to show how they distributed the files to their training systems.
Did they not make any copy that a human in their company had access to?
That's great! (Score:2)
I didn't rob a bank. I just took money from someone else who robbed a bank, but didn't spend any of the money. No crime! /s
Maybe what they should have done (Score:2)
Odd claim (Score:3)
Re: (Score:2)
Multiply that by the 90+ million copyrighted works Meta torrented.
Re: (Score:3)