Piracy The Courts

Nvidia Denies Pirate e-Book Sites Are 'Shadow Libraries' To Shut Down Lawsuit (arstechnica.com) 105

An anonymous reader quotes a report from Ars Technica: Some of the most infamous so-called shadow libraries have increasingly faced legal pressure to either stop pirating books or risk being shut down or driven to the dark web. Among the biggest targets are Z-Library, which the US Department of Justice has charged with criminal copyright infringement, and Library Genesis (Libgen), which was sued by textbook publishers last fall for allegedly distributing digital copies of copyrighted works "on a massive scale in willful violation" of copyright laws. But now these shadow libraries and others accused of spurning copyrights have seemingly found an unlikely defender in Nvidia, the AI chipmaker among those profiting most from the recent AI boom.

Nvidia seemed to defend the shadow libraries as a valid source of information online when responding to a lawsuit from book authors over the list of data repositories that were scraped to create the Books3 dataset used to train Nvidia's AI platform NeMo. That list includes some of the most "notorious" shadow libraries -- Bibliotik, Z-Library (Z-Lib), Libgen, Sci-Hub, and Anna's Archive, authors argued. However, Nvidia hopes to invalidate authors' copyright claims partly by denying that any of these controversial websites should even be considered shadow libraries.

"Nvidia denies the characterization of the listed data repositories as 'shadow libraries' and denies that hosting data in or distributing data from the data repositories necessarily violates the US Copyright Act," Nvidia's court filing said. The chipmaker did not go into further detail to define what counts as a shadow library or what potentially absolves these controversial sites from key copyright concerns raised by various ongoing lawsuits. Instead, Nvidia kept its response brief while also curtly disputing authors' petition for class-action status and defending its AI training methods as fair use. "Nvidia denies that it has improperly used or copied the alleged works," the court filing said, arguing that "training is a highly transformative process that may include adjusting numerical parameters including 'weights,' and that outputs of an LLM may be based, at least in part, on such 'weights.'"
"Nvidia's argument likely depends on the court agreeing that AI models ingesting published works in order to transform those works into weights governing AI outputs is fair use," notes Ars. "However, authors have argued that 'these weights are entirely and uniquely derived from the protected expression in the training dataset' that has been copied without getting authors' consent or providing authors with compensation."

"Authors suing Nvidia have taken the next step, linking the chipmaker to shadow libraries by arguing that 'these shadow libraries have long been of interest to the AI-training community because they host and distribute vast quantities of unlicensed copyrighted material. For that reason, these shadow libraries also violate the US Copyright Act.'"

  • by iAmWaySmarterThanYou ( 10095012 ) on Tuesday May 28, 2024 @06:36PM (#64506555)

    Nvidia, "Your books are really important to our AI models, but we're not going to pay you anything for your effort creating them. We want your work for free. Paying you would hurt our bottom line".

    • by quenda ( 644621 ) on Tuesday May 28, 2024 @08:22PM (#64506753)

      Nah, we've seen the same thing with MP3 and movie downloads. People are willing to pay a reasonable amount, but copyright law and markets lag behind technology.
      When the iPod was released in 2001, "1000 songs in your pocket for $399", but who was going to pay thousands of dollars to fill it?
      Here in Australia, there was not even a legal way to do so at any price. Only in 2003 did iTunes launch, and Spotify many years later.
      Australians were among the world's most prolific pirate movie downloaders, until 2015 when Netflix launched.

      One day, there will be reasonable terms negotiated for AI companies to pay authors, just as radio stations and Spotify found a way to pay musicians.

      • Australians were among the world's most prolific pirate movie downloaders

        Bunch of thieves and criminals the lot of them. We should round them all up and go put them on a deserted* island somewhere where they can't harm anyone else.

        *We'll just claim it's deserted.

        • by quenda ( 644621 )

          Terra nullius does not mean uninhabited, but unowned. Practically, it means there were no chiefs to sign a treaty with, or to buy land from, as they had done elsewhere. The tribal social structures that existed in the Americas or Pacific islands simply did not exist. But the people all became full British subjects when we took the land. Did they thank us?

          Now don't insult us, or I'll slip some iocaine powder in your drink.

          • Terra nullius does not mean uninhabited, but unowned.

            You turning a joke into a serious matter aside, it's a distinction without a difference. The world (or at least many in Australia) have recognised traditional ownership of the land, even if the colonisers didn't.

            Shit I just had this discussion yesterday how I was standing at Perth airport looking for my gate to Brisbane without being able to find it and critically without being able to recognise any of the place names on the board. Turns out Perth Airport was writing the traditional owners above the gate so

            • by quenda ( 644621 )

              That just leads to a silly semantic argument over the word "ownership".
              Stone-age territory was simply whatever you could hold by force, and that ebbed and flowed. If their land was taken, they lost it, be it the next tribe or foreign settlers. The idea of a higher authority, a state, recording and enforcing agreed boundaries and rights was utterly unknown to pre-civilisation cultures.

      • by necro81 ( 917438 )

        One day, there will be reasonable terms negotiated for AI companies to pay authors

        But in the meantime, a $2 trillion company can just steal the collected works of humanity?

        • by quenda ( 644621 )

          But in the meantime, a $2 trillion company can just steal the collected works of humanity?

          Every writer derives their work from others. We all stand on the shoulders of giants.
          (And FYI, stealing means to permanently deprive, not to copy.)

          • by necro81 ( 917438 )

            Every writer derives their work from others

            In most cases the writer provided some value back to "the others" in the process. A direct book sale is the most straightforward way. But local taxes or tuition dollars that support a legit library are another. So are the ads embedded in this or that website, and subscription fees to magazines or online services. Even intangibles count, like a shoutout, recommendation, or citation. The list goes on.

            And writers in turn hope for some payment or value back when people use

      • by tlhIngan ( 30335 )

        When the iPod was released in 2001, "1000 songs in your pocket for $399", but who was going to pay thousands of dollars to fill it?
        Here in Australia, there was not even a legal way to do so at any price.

        You were expected to rip your CDs. That's why it launched with iTunes and its built in CD ripper. You weren't expected to spend thousands of dollars filling it as you probably already have over the years with your CD collection.

        And while ripping laws were questionable, so it's entirely possible CD ripping wa

  • Hmm (Score:5, Interesting)

    by Luckyo ( 1726890 ) on Tuesday May 28, 2024 @06:45PM (#64506567)

    Actually interesting legal question.

    On one hand, distribution of copyrighted works openly on the internet violates US law. On the other hand, learning from another person's copyrighted material has never required permission. For a very good reason.

    All human creativity comes from learning from the works of those who came before. Making learning from knowledge require permission wouldn't just be a massive expansion of copyright that requires legislative changes. It would be a change that would destroy all future creativity, because most creative people aren't rich, while authors have every incentive to maximally rent seek on what they created by learning, for free, from those who came before them.

    So is learning from libraries that violate copyright law by distributing books actually against the law? I don't think anyone knows, and precedent will need to be set. And that's going to be a hell of a dangerous one, because if it's even slightly too broad, the court will risk completely destroying the main source of creativity in the US: the ability to learn from those who came before.

    This is one case where the public good is just so clear and massive, while the private harm is so minuscule and massively accepted, that this should be a clear-cut case. But this is a court of US law, so it's not that simple.

    • Re: (Score:2, Informative)

      The books are not being used to learn. Humans learn. Dogs learn. Computers transform data between formats; they do not learn.

      What is the great public good here when private trillion dollar companies are taking the works of thousands of people without compensation?
      If it was your life's work being taken without compensation, you might feel the harm wasn't minuscule.

      • Re:Hmm (Score:4, Insightful)

        by ewibble ( 1655195 ) on Tuesday May 28, 2024 @07:18PM (#64506641)

        Can you not say that learning for humans is not simply transforming data and storing it in your brain? I see no difference apart from the assumptions that living things learn, and computers can't. Just because we don't know how it is stored (it clearly is since you can recall it), doesn't make it learning.

        • by Luckyo ( 1726890 )

          >Can you not say that learning for humans is not simply transforming data and storing it in your brain?

          No, because we know from fMRI and a large body of psychological testing of how humans learn that it in fact is just that. That's what we modelled AI on. That's likely why OpenAI folks are so certain that if they can get enough compute, they can get to AGI without figuring it out properly as evolution did with humans. On the most fundamental level, hardware is vastly different but general principle of le

          • I think that is what I was saying, and I am not sure we are disagreeing. In a sense we are transforming data. It's not a lossless transform, but somehow it's stored, and you learn something. It adjusts the way your neurons fire and the connections between them, which, as you said, is what a neural net in AI is modeled on.

            As for self awareness I am not sure what even self aware is, is it just a learned behavior, genetically not as a survival trait. There are still many animals that we can't prove they are self aware that we

            • by Luckyo ( 1726890 )

              Self awareness is the issue theoretical AI developers discovered back in the 1960s, when they were trying to model and predict what would AI actually look like, and what it would need to have.

              Essentially, it's awareness of self, and calibration of the world in relation to itself. This allows for incredible amount of shortcuts in abstract thinking, but also generates quite a few errors typical to humans (and to which human societies always struggle to adapt).

              A good example of shortcuts this allows is autonom

        • by AmiMoJo ( 196126 )

          Living things draw on their own experiences, as well as things they have read. An LLM is only the sum of its parts, it has no life experience, no brain that dreams at night, no ability to seek out new experiences based on its interests.

          • by Luckyo ( 1726890 )

            And now, we're redefining life to mean only the most complex mammals, if even that.

            Bacteria? Not alive. Micro-organisms? Not alive. Etc.

            This is the sad state of people with that specific political leaning today. They learned to think with words rather than concepts. And they think that by changing words and their meanings, they can change the concepts that words describe. It never even occurs to them that their world is backwards. Words are merely tools used to describe concepts, and changing words and thei

      • > The books are not being used to learn. Humans learn. Dogs learn. Computers transform data between formats; they do not learn.

        Well, not quite. Each training example is indeed transformed from raw text or image to model gradients. So AI's don't "learn" but they "model the data". That model is also highly reliant on combining gradients from billions of examples, it's far from simply changing data formats. They merge knowledge from many sources. And the end result is not a duplicate of the original in a
      • Humans learn. Dogs learn. Computers transform data between formats

        All you did was transform words from a dictionary to a storage form in your brain and regurgitate them back on your keyboard.

        Calling training an AI model "transforming data" is almost as ignorant as all those people using the term "AI" for actually generating something with that model.

      • by gweihir ( 88907 )

        The books are not being used to learn. Humans learn. Dogs learn. Computers transform data between formats; they do not learn.

        The physicalists think that humans, dogs and computers are essentially the same thing. They are pretty much a bizarre nihilism cult.

    • Learning? You mean copying to local storage, copying into the training set and copying into memory during training?

      Have no fear, no one else is going to confuse computers copying with humans reading.

      • Have no fear, no one else is going to confuse computers copying with humans reading.

        Really?

      • by Luckyo ( 1726890 )

        >Have no fear, no one else is going to confuse computers copying with humans reading.

        There's no fear, because there's certainty. Processes are remarkably similar, as one is directly modelled on the other. Primary difference is not in process, but what each system is optimized for. Biological brains of humans, being a General Intelligence save a lot of energy on having awareness of self and projecting learning from said awareness of self. Big Data Machine Learning processes do not, so they have to brute fo

      • You forgot all the copying at each hop and repeater along the internet, and only then to memory, cache, the video card, the monitor, an image onto the retina, and then weird lossy encoding into neural signals. That's for a human on the internet; for AI learning some of those copies are skipped or replaced, and the lossiness of the encoding can be quantified.

      • by gweihir ( 88907 )

        It looks to me like a lot of people and even people here are not smart enough to see the difference. "Learning" requires insight. But understanding that learning requires insight also requires insight. A lot of people do not have that tool available in any significant amount.

    • On one hand, distribution of copyrighted works openly on the internet violates US law. On the other hand, learning from another person's copyrighted material has never required permission. For a very good reason

      Humans and computers are not treated the same under the law.

      It's an interesting philosophical point, but until the law changes, AI is not treated like a human.

    • LLM's learn to the same degree that compression algorithms learn.

      • by gweihir ( 88907 )

        Indeed. That is the same "learning" any automated mechanism does. May as well claim a page of paper "learns" when some text is printed on it. Complete nonsense.

      • by Luckyo ( 1726890 )

        And both learning concepts are directly derived from human learning. Denial of reality in the name of hatred of LLMs and ML AI is getting to the point where people blinded by their hatred reject reality and history.

    • by Anonymous Coward

      So is learning from libraries that violate copyright law by distributing books actually against the law?

      What makes you think libraries are illegal?

    • Good points. To add to that: intelligence is social. Both language and culture are based on many individual contributions across space and time. A human taken apart from humanity can't achieve much. Just like an untrained model, if nobody taught him language, he would not rediscover it individually. Rediscovering our scientific progress from scratch would take the same amount (took us 200k years!), we're not that smart even across a whole generation to do it in one step.

      Now copyright defenders are in a t
      • by gweihir ( 88907 )

        Being able to do things and understanding things are two different things. The second typically empowers the first, but it is a separate activity. No "society" or "culture" needed at all.

    • by gweihir ( 88907 )

      There is no "learning" here. "Learning" takes an entity capable of insight. AI cannot do insight. What they really are doing is calibrating a machine based on data from somebody else they have not gotten permission for to use in that way.

      • by Luckyo ( 1726890 )

        You just declared that no simple animal is capable of learning, and that evolution is incapable of learning as a process.

        This is why blanket statements from AI haters are so dangerous. They don't think their assertions through at all. And if we take them seriously, they stand a very real chance of causing massive damage to society.

        • by gweihir ( 88907 )

          Learning, in the form used in this discussion, is not about rote-"learning", which simply is training. Stop being a dumb asshole.

          Incidentally, it is quasi-religious insightless "true believers" like you that are a danger to society.

          • by Luckyo ( 1726890 )

            You really have not a single clue how world works, do you? I know that your ideology makes you an evolution denier, but surely you can at least agree that fish can and do learn?

            So what do you think that learning entails?

            • by gweihir ( 88907 )

              You obviously have not understood anything I wrote. No surprise. That you post a question directly under the relevant answer from my side is quite telling. Also, I have a) not presented any "ideology" and b) never denied evolution. In fact, I am pretty sure Evolution is what makes human bodies what they are today. Whether the human mind is a product of evolution is unclear however and that is not even in dispute among scientists. Some aspects certainly are (concrete memory is mostly or completely a thing th

              • by Luckyo ( 1726890 )

                >I have never denied evolution

                And then straight into denial of evolution:

                >but we have not a single example of general intelligence in the animal kingdom. That makes it rather unlikely that general intelligence comes from evolution.

                How did I know you would deny evolution? What is my secret?

    • by lordlod ( 458156 )

      On the other hand, learning from another person's copyrighted material has never required permission.

      That's why most of these suits include samples of direct reproduction. The algorithms overmatch, so with the appropriate triggers large slabs of exact source data will be produced. The courts haven't decided yet, but one argument being put forward is that the overmatches show infringing reproduction, not just learning.

      • by Luckyo ( 1726890 )

        Notice how you have to really, really split hairs to get a differentiation.

        All while publishers and writers have long wanted to rent seek on learning processes. One merely needs to look at publishing and writing principles in academic book writing to see just how badly they want to get this into the system, by any means necessary.

        Chances of legal system declaring learning in violation of copyright, and then not following up with massive amount of studies and lawsuits that everyone who read your book in scho

        • by Luckyo ( 1726890 )

          And being a human, I made a mistake in the number of negatives in that statement. The first sentence of the third paragraph should be:

          >Chances of legal system declaring learning in violation of copyright, and then following up with massive amount of studies and lawsuits that everyone who read your book in school owes you royalties for life as they earn a part of their living from information they learned from it is 100%.

          But the point I'm making is obvious enough.

    • by necro81 ( 917438 )

      This is one case where the public good is just so clear and massive, while the private harm is so minuscule and massively accepted, that this should be a clear-cut case.

      I'm not sure how you can claim that appropriating the work of just about every published author is "minuscule and massively accepted".

      • by Luckyo ( 1726890 )

        Because entire human history and every single successful society, from first man who invented fire to today is based on it.

        The fact that you don't think this to be sufficient evidence says nothing about my claim, and everything about your understanding of the world.

    • https://en.wikipedia.org/wiki/... [wikipedia.org]
      "Social credit is a distributive philosophy of political economy developed by C. H. Douglas. Douglas attributed economic downturns to discrepancies between the cost of goods and the compensation of the workers who made them. To combat what he saw as a chronic deficiency of purchasing power in the economy, Douglas prescribed government intervention in the form of the issuance of debt-free money directly to consumers or producers (if they sold their product below cost to consu

  • by Mr. Dollar Ton ( 5495648 ) on Tuesday May 28, 2024 @06:58PM (#64506603)

    It was quite obvious long ago already that no matter what new and useful thing you do, you must "stand on the shoulders of giants".

    With the "intellectual property" legal fiction, the shoulders of giants are replaced with very thin ice, and the lawyered-up equivalent of Brick Top is under it, helping to break it.

    • by gweihir ( 88907 )

      Well, yes. At the same time authors and creators must have some form of profiting reasonably off their works. I do agree that "intellectual property" is not a good solution for that.

  • Sounds like criminal commercial copyright infringement by Nvidia. Nice.

  • Doesn't that mean they (Nvidia) are guilty of receiving and handling stolen property, or at the least profiting from the proceeds of a crime?
