Nvidia Denies Pirate e-Book Sites Are 'Shadow Libraries' To Shut Down Lawsuit (arstechnica.com) 105
An anonymous reader quotes a report from Ars Technica: Some of the most infamous so-called shadow libraries have increasingly faced legal pressure to either stop pirating books or risk being shut down or driven to the dark web. Among the biggest targets are Z-Library, which the US Department of Justice has charged with criminal copyright infringement, and Library Genesis (Libgen), which was sued by textbook publishers last fall for allegedly distributing digital copies of copyrighted works "on a massive scale in willful violation" of copyright laws. But now these shadow libraries and others accused of spurning copyrights have seemingly found an unlikely defender in Nvidia, the AI chipmaker among those profiting most from the recent AI boom.
Nvidia seemed to defend the shadow libraries as a valid source of information online when responding to a lawsuit from book authors over the list of data repositories that were scraped to create the Books3 dataset used to train Nvidia's AI platform NeMo. That list includes some of the most "notorious" shadow libraries -- Bibliotik, Z-Library (Z-Lib), Libgen, Sci-Hub, and Anna's Archive, authors argued. However, Nvidia hopes to invalidate authors' copyright claims partly by denying that any of these controversial websites should even be considered shadow libraries.
"Nvidia denies the characterization of the listed data repositories as 'shadow libraries' and denies that hosting data in or distributing data from the data repositories necessarily violates the US Copyright Act," Nvidia's court filing said. The chipmaker did not go into further detail to define what counts as a shadow library or what potentially absolves these controversial sites from key copyright concerns raised by various ongoing lawsuits. Instead, Nvidia kept its response brief while also curtly disputing authors' petition for class-action status and defending its AI training methods as fair use. "Nvidia denies that it has improperly used or copied the alleged works," the court filing said, arguing that "training is a highly transformative process that may include adjusting numerical parameters including 'weights,' and that outputs of an LLM may be based, at least in part, on such 'weights.'" "Nvidia's argument likely depends on the court agreeing that AI models ingesting published works in order to transform those works into weights governing AI outputs is fair use," notes Ars. "However, authors have argued that 'these weights are entirely and uniquely derived from the protected expression in the training dataset' that has been copied without getting authors' consent or providing authors with compensation."
"Authors suing Nvidia have taken the next step, linking the chipmaker to shadow libraries by arguing that 'these shadow libraries have long been of interest to the AI-training community because they host and distribute vast quantities of unlicensed copyrighted material. For that reason, these shadow libraries also violate the US Copyright Act.'"
Books have value but ... (Score:5, Insightful)
Nvidia, "Your books are really important to our AI models, but we're not going to pay you anything for your effort creating them. We want your work for free. Paying you would hurt our bottom line".
Re:Books have value but ... (Score:5, Informative)
Nah, we've seen the same thing with MP3 and movie downloads. People are willing to pay a reasonable amount, but copyright law and markets lag behind technology.
When the iPod was released in 2001, it promised "1000 songs in your pocket" for $399, but who was going to pay thousands of dollars to fill it?
Here in Australia, there was not even a legal way to do so at any price. Only in 2003 did iTunes launch, and Spotify many years later.
Australians were among the world's most prolific pirate movie downloaders, until 2015 when Netflix launched.
One day, there will be reasonable terms negotiated for AI companies to pay authors, just as radio stations and Spotify found a way to pay musicians.
Re: (Score:3)
Australians were among the world's most prolific pirate movie downloaders
Bunch of thieves and criminals the lot of them. We should round them all up and go put them on a deserted* island somewhere where they can't harm anyone else.
*We'll just claim it's deserted.
Re: (Score:2)
Terra nullius does not mean uninhabited, but unowned. Practically, it means there were no chiefs to sign a treaty with, or to buy land from, as they had done elsewhere. The tribal social structures that existed in the Americas or Pacific islands simply did not exist. But the people all became full British subjects when we took the land. Did they thank us?
Now don't insult us, or I'll slip some iocaine powder in your drink.
Re: (Score:2)
Terra nullius does not mean uninhabited, but unowned.
You turning a joke into a serious matter aside, it's a distinction without a difference. The world (or at least many in Australia) has recognised traditional ownership of the land, even if the colonisers didn't.
Shit I just had this discussion yesterday how I was standing at Perth airport looking for my gate to Brisbane without being able to find it and critically without being able to recognise any of the place names on the board. Turns out Perth Airport was writing the traditional owners above the gate so
Re: (Score:2)
That just leads to a silly semantic argument over the word "ownership".
Stone-age territory was simply whatever you could hold by force, and that ebbed and flowed. If their land was taken, they lost it, be it the next tribe or foreign settlers. The idea of a higher authority, a state, recording and enforcing agreed boundaries and rights was utterly unknown to pre-civilisation cultures.
Re: (Score:2)
But in the meantime, a $2 trillion company can just steal the collected works of humanity?
Re: (Score:2)
But in the meantime, a $2 trillion company can just steal the collected works of humanity?
Every writer derives their work from others. We all stand on the shoulders of giants.
(And FYI, stealing means to permanently deprive, not to copy.)
Re: (Score:2)
In most cases the writer provided some value back to "the others" in the process. A direct book sale is the most straightforward way. But local taxes or tuition dollars that support a legit library are another. The ads embedded in this or that website. Subscription fees to magazines or online services. Even intangibles, like a shoutout, recommendation, or citation. The list goes on.
And writers in turn hope for some payment or value back when people use
Re: (Score:2)
You were expected to rip your CDs. That's why it launched with iTunes and its built in CD ripper. You weren't expected to spend thousands of dollars filling it as you probably already have over the years with your CD collection.
And while ripping laws were questionable, so it's entirely possible CD ripping wa
Re: (Score:2)
Indeed. 5 years, 10 years, but at that point things should go into the public domain. Same for patents, and no way to extend them.
Hmm (Score:5, Interesting)
Actually interesting legal question.
On one hand, distribution of copyrighted works openly on the internet violates US law. On the other hand, learning from another person's copyrighted material has never required permission. For a very good reason.
All human creativity comes from learning from the works of those that came before. Making learning from existing knowledge require permission wouldn't just be a massive expansion of copyright that requires legislative changes. It would be a change that would destroy all future creativity. Because most creative people aren't rich, while authors have every incentive to maximally rent seek on that which they created by learning from those that came before them for free.
So is learning from libraries that violate copyright law by distributing books actually against the law? I don't think anyone knows, and precedent will need to be set. And that's going to be a hell of a dangerous one, because if it's even slightly too broad, the court will risk destroying the main source of creativity in the US: the ability to learn from those that came before. Completely.
This is one case where the public good is so clear and massive, while the private harm is so minuscule and massively accepted, that this should be a clear cut case. But this is a court of US law, so it's not that simple.
Re: (Score:2, Informative)
The books are not being used to learn. Humans learn. Dogs learn. Computers transform data between formats; they do not learn.
What is the great public good here when private trillion dollar companies are taking the works of thousands of people without compensation?
If it was your life's work being taken without compensation, you might feel the harm wasn't minuscule.
Re: (Score:2)
We have long established that neural nets learn. We modelled them on living beings.
You're in denial of well established facts.
P.S. What's better at learning, bacteria or neural net?
Re: (Score:2)
Distinction without meaning for this discussion.
Re: (Score:2, Insightful)
And point 2: what is the great public good in fucking over private citizens who spent months or years writing books and letting a trillion dollar company take them for free?
Still waiting on that one, too.
Re: (Score:2)
Since you're obviously running a far left feelings based argument, let's play your game.
What is the great public good in allowing books published by a massive company to mandate lifelong royalty payments from all those that learned valuable lessons from those books?
Notably unlike your scenario, mine isn't a hypothetical. Entire school and university book publishing model is built on trying to get as close to the scenario described above as possible, best the ultimate rent seeking is rent seeking on learning f
Re: (Score:2)
So you don't have an answer as to why a mega corporation should get free shit at the expense of the little guy.
Nothing but deflection, as expected.
Re: (Score:2)
>So you don't have an answer as to why a mega corporation should get free shit at the expense of the little guy.
And neither did you. And there's nothing hypothetical about my scenario. I literally listed an example of scenario I described happening for several decades at this point that still is going on. That is widely known, documented, complained about, and one of the main reasons people are pushed into massive debt in places like US when they go to a school/university.
Re: (Score:2)
Lucky attempts to engage in debate, "I know you are but what am I?"
Lol, your deflections are childish. My kid was better at this when she was 7. I didn't call you a hypocrite. Do you even know what that word means?
You didn't even read what I said, did you? You have no idea what my point was, you're just reflexively going off because you got called out for saying dumb shit and being a Big Corp shill.
Here, I'll tell you AGAIN: you think it's cool for trillion dollar corporations to take the work of author
Re: (Score:2)
The obviously hilarious part here is that you're projecting. You didn't in fact read what I said. And you are in fact shilling for Big Corp. Not hypothetically, as you claim I do. Practically, for a specific form of rent seeking that has been a central complaint of everyone studying for decades.
And yet you're filled with this hilarious Righteous Fury. And it has to go somewhere. And so it does. And it's hilarious.
Re: (Score:2)
Dude, give it up. You said something stupid, got called out, got embarrassed and now you need to put the shovel down.
This thread is so old only you n me even know it's still active. No one else cares and you're not convincing either of us of anything.
Let it go. You posted dumb Big Corp Cock Sucking Shill bullshit. Own it. It's ok.
You will never answer how a trillion dollar company taking author's works for no compensation is a public good. You won't even acknowledge you said it or that is the thing I
Re: (Score:2)
>You will never answer how a trillion dollar company taking author's works for no compensation is a public good.
Notably the answer has been given above and hasn't changed since. You just can't accept it. That's why you're babbling about "little guy getting hypothetically screwed", and ignoring the actual travesty that is current publishing model of school books.
Re: (Score:2)
You have not given an answer to my question. You gave an answer to what you wanted my question to be and then asked me some unrelated nonsense.
I will try again. I think this is the 6th time:
What is the specific public good that occurs when trillion dollar companies do not pay authors for their works?
Re: (Score:2)
Keep going. Let's see how long you can keep up this hilarious charade of yours of caring about small people as you stomp on them.
Re: (Score:2)
What is the public good when a trillion dollar company takes the work of authors?
Re: (Score:2)
C'mon, you gotta do better. At least say "steal" or "mob" or "beat the shit out of".
Re: (Score:2)
Tesla is fucked again?
Ok. I'll add this time to the list.
Re: (Score:2)
It's still not five. No matter how much you polish your missile to thoughts of having intimate time with me, and imagine it being real.
Re: (Score:2)
Lol, so you got nothing.
He has a point. At best you are making a semantic argument about the word "learning", and a childish retort. Actually, to call it an argument would be too kind - you merely made an unsupported assertion. I'm sure a smart guy like you can come up with a definition of learning that excludes machines, but what would be the point?
If "transform" means finding patterns in the data, and using those to answer new questions, then we choose to call that learning. You are arguing over whether a submarine can swim.
Re:Hmm (Score:4, Insightful)
Can you not say that learning for humans is not simply transforming data and storing it in your brain? I see no difference apart from the assumptions that living things learn, and computers can't. Just because we don't know how it is stored (it clearly is since you can recall it), doesn't make it learning.
Re: (Score:3)
>Can you not say that learning for humans is not simply transforming data and storing it in your brain?
No, because we know from fMRI and a large body of psychological testing of how humans learn that it in fact is just that. That's what we modelled AI on. That's likely why OpenAI folks are so certain that if they can get enough compute, they can get to AGI without figuring it out properly as evolution did with humans. On the most fundamental level, hardware is vastly different but general principle of le
Re: (Score:3)
I think that is what I was saying. I am not sure we are disagreeing, but in a sense we are transforming data. It's not a lossless transform, but somehow it's stored; you learn something. It adjusts the way your neurons fire, and the connections between them. As you said, that is what a neural net in AI is modeled on.
As for self awareness I am not sure what even self aware is, is it just a learned behavior, genetically not as a survival trait. There are still many animals that we can't prove they are self aware that we
Re: (Score:2)
Self awareness is the issue theoretical AI developers discovered back in the 1960s, when they were trying to model and predict what AI would actually look like, and what it would need to have.
Essentially, it's awareness of self, and calibration of the world in relation to itself. This allows for incredible amount of shortcuts in abstract thinking, but also generates quite a few errors typical to humans (and to which human societies always struggle to adapt).
A good example of shortcuts this allows is autonom
Re: (Score:2)
You're laying out the case I laid out on why the likes of OpenAI folks believe they can brute force the "General" part of AGI.
Reality is though, we have no idea. It's just the best shot we have at AGI, because we have no idea what makes the underlying chemical/electric computational neural network that is human body as efficient as it is in reasoning with as little information as it requires to do it.
P.S. Reminder: human intelligence is NOT just the brain. It's not even just the human in human body. Just th
Re: (Score:2)
Living things draw on their own experiences, as well as things they have read. An LLM is only the sum of its parts, it has no life experience, no brain that dreams at night, no ability to seek out new experiences based on its interests.
Re: (Score:2)
And now, we're redefining life to mean only the most complex mammals, if even that.
Bacteria? Not alive. Micro-organisms? Not alive. Etc.
This is the sad state of people with that specific political leaning today. They learned to think with words rather than concepts. And they think that by changing words and their meanings, they can change the concepts that words describe. It never even occurs to them that their world is backwards. Words are merely tools used to describe concepts, and changing words and thei
Re: (Score:2)
Well, not quite. Each training example is indeed transformed from raw text or image to model gradients. So AI's don't "learn" but they "model the data". That model is also highly reliant on combining gradients from billions of examples, it's far from simply changing data formats. They merge knowledge from many sources. And the end result is not a duplicate of the original in a
Re: (Score:2)
Humans learn. Dogs learn. Computers transform data between formats
All you did was transform words from a dictionary to a storage form in your brain and regurgitate them back on your keyboard.
Calling training an AI model "transforming data" is almost as ignorant as all those people using the term "AI" for actually generating something with that model.
Re: (Score:2)
The books are not being used to learn. Humans learn. Dogs learn. Computers transform data between formats; they do not learn.
The physicalists think that humans, dogs and computers are essentially the same thing. They are pretty much a bizarre nihilism cult.
Re: (Score:1)
Learning? You mean copying to local storage, copying into the training set and copying into memory during training?
Have no fear, no one else is going to confuse computers copying with humans reading.
Re: (Score:2)
Have no fear, no one else is going to confuse computers copying with humans reading.
Really?
Re: (Score:3)
>Have no fear, no one else is going to confuse computers copying with humans reading.
There's no fear, because there's certainty. Processes are remarkably similar, as one is directly modelled on the other. Primary difference is not in process, but what each system is optimized for. Biological brains of humans, being a General Intelligence save a lot of energy on having awareness of self and projecting learning from said awareness of self. Big Data Machine Learning processes do not, so they have to brute fo
Re: (Score:2)
You forgot all the copying at each hop and repeater along the internet, and only then to memory, cache, the video card, the monitor, an image onto the retina, and then weird lossy encoding into neural signals. That's for a human on the internet; for AI learning some of those copies are skipped or replaced, and the lossiness of the encoding can be quantified.
Re: (Score:2)
It looks to me like a lot of people and even people here are not smart enough to see the difference. "Learning" requires insight. But understanding that learning requires insight also requires insight. A lot of people do not have that tool available in any significant amount.
Re: (Score:2)
A dog can mostly just train, which is not learning for the purpose of the discussion here. The level of insight a dog can have is rather limited, but it is not zero, so a dog can also learn.
Incidentally, you just confirmed my statement. Well done. Even if you are not smart enough to be able to understand what you just did.
Re: (Score:3)
On one hand, distribution of copyrighted works openly on the internet violates US law. On the other hand, learning from another person's copyrighted material has never required permission. For a very good reason
Humans and computers are not treated the same under the law.
It's an interesting philosophical point, but until the law changes, AI is not treated like a human.
Re: (Score:2)
Corporations are treated as people under the law. A lot of ML AI is in fact corporate.
Re: (Score:2)
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
When an AI is charged with copyright violation, then that will be something notable.
Re: (Score:2)
No one has been charged with anything. This is a discussion about a hypothetical scenario.
You seemed to understand this concept in the first post, and promptly forgot it in the second.
Re: (Score:2)
Charged with what? We're talking civil law here, not criminal.
Re: (Score:2)
I concur with the recommendation, and add "and comprehend what it says" to it.
Re: (Score:2)
Neither is stated in the OP, and neither is true.
Re: (Score:2)
The corporation may be treated like a person. Any AI it owns will not be, anymore than an AI owned by an individual will be.
Re: (Score:2)
Human may be treated like a person. Any body part human has won't be.
Ok. It's still a true distinction, and it's still one that has no meaning in this context.
Re: (Score:2)
LLM's learn to the same degree that compression algorithms learn.
Re: (Score:2)
Indeed. That is the same "learning" any automated mechanism does. May as well claim a page of paper "learns" when some text is printed on it. Complete nonsense.
Re: (Score:2)
And both learning concepts are directly derived from human learning. Denial of reality in the name of hatred of LLMs and ML AI is getting to the point where people blinded by their hatred reject reality and history.
Re: (Score:1)
So is learning from libraries that violate copyright law by distributing books actually against the law?
What makes you think libraries are illegal?
Re: (Score:3)
Now copyright defenders are in a t
Re: (Score:2)
Being able to do things and understanding things are two different things. The second typically empowers the first, but it is a separate activity. No "society" or "culture" needed at all.
Re: (Score:2)
There is no "learning" here. "Learning" takes an entity capable of insight. AI cannot do insight. What they really are doing is calibrating a machine based on data from somebody else they have not gotten permission for to use in that way.
Re: (Score:2)
You just declared that no simple animal is capable of learning, and that evolution is incapable of learning as a process.
This is why blanket statements from AI haters are so dangerous. They don't think their assertions through at all. And if we take them seriously, they stand a very real chance of causing massive damage to society.
Re: (Score:2)
Learning, in the form used in this discussion, is not about rote-"learning", which simply is training. Stop being a dumb asshole.
Incidentally, it is quasi-religious insightless "true believers" like you that are a danger to society.
Re: (Score:2)
You really have not a single clue how the world works, do you? I know that your ideology makes you an evolution denier, but surely you can at least agree that fish can and do learn?
So what do you think that learning entails?
Re: (Score:2)
You obviously have not understood anything I wrote. No surprise. That you post a question directly under the relevant answer from my side is quite telling. Also, I have a) not presented any "ideology" and b) never denied evolution. In fact, I am pretty sure Evolution is what makes human bodies what they are today. Whether the human mind is a product of evolution is unclear however and that is not even in dispute among scientists. Some aspects certainly are (concrete memory is mostly or completely a thing th
Re: (Score:2)
>I have never denied evolution
And then straight into denial of evolution:
>but we have not a single example of general intelligence in the animal kingdom. That makes it rather unlikely that general intelligence comes from evolution.
How did I know you would deny evolution? What is my secret?
Re: (Score:2)
On the other hand, learning from another person's copyrighted material has never required permission.
That's why most of these suits include samples of direct reproduction. The algorithms overmatch, so with the appropriate triggers large slabs of exact source data will be produced. The courts haven't decided yet, but one argument being put forward is that the overmatches show infringing reproduction, not just learning.
Re: (Score:2)
Notice how you have to really, really split hairs to get a differentiation.
All while publishers and writers have long wanted to rent seek on learning processes. One merely needs to look at publishing and writing principles in academic book writing to see just how badly they want to get this into the system, by any means necessary.
Chances of legal system declaring learning in violation of copyright, and then not following up with massive amount of studies and lawsuits that everyone who read your book in scho
Re: (Score:2)
And being a human, I made a mistake in the number of negatives in that statement. The first sentence of the third paragraph should be:
>Chances of legal system declaring learning in violation of copyright, and then following up with massive amount of studies and lawsuits that everyone who read your book in school owes you royalties for life as they earn a part of their living from information they learned from it is 100%.
But the point I'm making is obvious enough.
Re: (Score:2)
I'm not sure how you can claim that appropriating the work of just about every published author is "minuscule and massively accepted".
Re: (Score:2)
Because entire human history and every single successful society, from first man who invented fire to today is based on it.
The fact that you don't think this to be sufficient evidence says nothing about my claim, and everything about your understanding of the world.
"Social Credit" by CH Douglas supports your point (Score:2)
https://en.wikipedia.org/wiki/... [wikipedia.org]
"Social credit is a distributive philosophy of political economy developed by C. H. Douglas. Douglas attributed economic downturns to discrepancies between the cost of goods and the compensation of the workers who made them. To combat what he saw as a chronic deficiency of purchasing power in the economy, Douglas prescribed government intervention in the form of the issuance of debt-free money directly to consumers or producers (if they sold their product below cost to consu
"intellectual property" is trouble (Score:5, Interesting)
It was quite obvious long ago already that no matter what new and useful thing you do, you must "stand on the shoulders of giants".
With the "intellectual property" legal fiction, the shoulders of giants are replaced with very thin ice, and the lawyered-up equivalent of Brick Top is under it, helping to break it.
Re: (Score:2)
Well, yes. At the same time authors and creators must have some form of profiting reasonably off their works. I do agree that "intellectual property" is not a good solution for that.
Oops... (Score:2)
Sounds like criminal commercial copyright infringement by Nvidia. Nice.
if it's a valid point, however (Score:2)