
Mark Zuckerberg Gave Meta's Llama Team the OK To Train On Copyright Works, Filing Claims (techcrunch.com) 68

Plaintiffs in Kadrey v. Meta allege that Meta CEO Mark Zuckerberg authorized the team behind the company's Llama AI models to use a dataset of pirated ebooks and articles for training. They further accuse the company of concealing its actions by stripping copyright information and torrenting the data. TechCrunch reports: In newly unredacted documents filed (PDF) with the U.S. District Court for the Northern District of California late Wednesday, plaintiffs in Kadrey v. Meta, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, recount Meta's testimony from late last year, during which it was revealed that Zuckerberg approved Meta's use of a data set called LibGen for Llama-related training. LibGen, which describes itself as a "links aggregator," provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued a number of times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.

According to Meta's testimony, as relayed by plaintiffs' counsel, Zuckerberg cleared the use of LibGen to train at least one of Meta's Llama models despite concerns within Meta's AI exec team and others at the company. The filing quotes Meta employees as referring to LibGen as a "data set we know to be pirated," and flagging that its use "may undermine [Meta's] negotiating position with regulators." The filing also cites a memo to Meta AI decision-makers noting that after "escalation to MZ," Meta's AI team "[was] approved to use LibGen." (MZ, here, is rather obvious shorthand for "Mark Zuckerberg.")

The details seemingly line up with reporting from The New York Times last April, which suggested that Meta cut corners to gather data for its AI. At one point, Meta was hiring contractors in Africa to aggregate summaries of books and considering buying the publisher Simon & Schuster, according to the Times. But the company's execs determined that it would take too long to negotiate licenses and reasoned that fair use was a solid defense. The filing Wednesday contains new accusations, like that Meta might've tried to conceal its alleged infringement by stripping the LibGen data of attribution.

  • The cancer is deep, and spreading!

    • Brevity is the soul of wit, but too much brevity may not lead anywhere? The story is about to fall off the front page of Slashdot and your potentially significant comment on the story didn't even get a funny mod for dark humor, let alone any follow up comments.

      In the form of a question, I think you should have made it clear in what way this cancerous behavior is new, different, or significant. From what I can understand of the situation, this is supposed to be public information and it isn't obvious who is

  • Shocked! (Score:4, Informative)

    by slipped_bit ( 2842229 ) on Thursday January 09, 2025 @05:24PM (#65076573) Homepage

    I'm shocked! Shocked, I tell you!

    Well, not that shocked.

    • Re:Shocked! (Score:4, Insightful)

      by 93 Escort Wagon ( 326346 ) on Thursday January 09, 2025 @10:42PM (#65077119)

      This is how Facebook has always behaved. It's the old principle "it's better to beg forgiveness than ask permission", but carried to ridiculous extremes. Historically, they have broken both laws and norms... then, when they get caught, said "mea culpa" - but with the damage already done and not recoverable, as seems to be their intent.

      So, unfortunately, your joke/meme doesn't work with Facebook-related news simply because no one could possibly be shocked by their behavior after all this time.

      • Apparently breaking laws and norms is what it takes to get elected president of the US as well as start any successful business. It has become clear that laws are now just about crowd control rather than right and wrong.
        • Re: Shocked! (Score:5, Insightful)

          by nightflameauto ( 6607976 ) on Friday January 10, 2025 @08:52AM (#65077791)

          Apparently breaking laws and norms is what it takes to get elected president of the US as well as start any successful business. It has become clear that laws are now just about crowd control rather than right and wrong.

          Laws in America, and in fact the entire judicial system, are based on the concept of protecting the owner class. Always have been, always will be. When the rabble gets upset enough to threaten violence, they may feed one of their own into the system, but for the most part it's about keeping the facade wrapped up securely so the owners can continue to fleece the rest of us.

  • That proven orangeNoser [youtu.be] should be locked up.

  • Anyone infringing immorally-long copyright terms is doing the world a moral service.

    Even Zuck.

    Pirate on until copyright is a fit-for-purpose 5 years.
  • I have no idea what to do about the legal and moral problems. But I would greatly prefer an AI whose knowledge is not limited to non-copyrighted work.

  • At this point, the only logical thing to do is to abolish copyright entirely. If corporations don't have to follow it, why should anybody else? If anything, AI has proven that the romantic idea of a scientific/engineering/artistic genius was just an illusion, most creative work is easily automated. So why should it get special protections? Culture has existed before copyright was a thing, and will exist afterwards. Without IP reform, humanity will end up being slaves to megacorps that can ignore it and then

    • by Rinnon ( 1474161 ) on Thursday January 09, 2025 @06:51PM (#65076771)
      Abolishing copyright outright is throwing the baby out with the bathwater. I totally agree the current system is absurd, but no copyright isn't the answer. Without copyright, a record label, or publishing house, has no obligation to pay a musician or author when they sell their work. If they can get their hands on it, they can sell it and keep 100% of the profit. Basically, if you upload a song to the internet (anywhere, even your own server) YouTube can snag it, put it up, monetize it, and give you nothing. If you write a book, you'd be able to sell about 1 copy online before Amazon grabbed it, threw it up on Kindle, and kept any profits for itself. Unless your intention is to take any money currently going to musicians and authors and redirect it to the likes of Google and Amazon, a more nuanced solution is required.
      • by tlhIngan ( 30335 )

        Abolishing copyright outright is throwing the baby out with the bathwater. I totally agree the current system is absurd, but no copyright isn't the answer. Without copyright, a record label, or publishing house, has no obligation to pay a musician or author when they sell their work. If they can get their hands on it, they can sell it and keep 100% of the profit. Basically, if you upload a song to the internet (anywhere, even your own server) YouTube can snag it, put it up, monetize it, and give you nothing

      • Indeed. The problem is not the concept of copyright, IMO, but with what it has become.

        Take the duration and bring it back to well within an author's lifetime, as the implementation in the U.S. Constitution intended.

        This was crucial - as it encouraged authors to not just create, but to keep creating rather than just sitting on their asses for the rest of their life. It was also crucial because it meant the public domain got consistent, and regular additions from which anyone could take portions, pieces
  • plaintiffs in Kadrey v. Meta, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates

    And if someone figures out how to remove their stuff from the dataset, nothing of value would be lost ...

  • They trained on copyrighted books, so what? Is this any problem? If we want to infringe those works, we can get them from LibGen directly, faster, and more exact. LLaMA does not recreate those works unless you prompt with a passage, and that means you already have the text. But when you do chat, and put your data into the session, then LLaMA simply doesn't reproduce those works, instead it works on what the user requested. The users add new intent on top of what the model learns. It's transformative
  • If we humans are allowed to learn from copyrighted materials why would AI not be allowed to use the same texts, videos and audio to learn the same way?
    • Humans are allowed to learn from *authorized* copyrighted materials, i.e., you bought a book so you have an authorized copy. Meta is using *unauthorized* copyrighted materials, hence guilty of copyright infringement. But you would also be guilty of copyright infringement if you obtained an unauthorized copy of a copyrighted material.
      • Humans are allowed to learn from *authorized* copyrighted materials,

        The distinction seems kind of odd - wouldn't borrowing a book be legally allowed but not necessarily authorized - and a way for someone to learn from a work?

        • Assuming the entity you borrowed the book from had an authorized copy to begin with, the First Sale doctrine allows said entity to do anything they want with it: lend it, destroy it, put it under the short leg of a table, whatever. There is only ever one authorized copy in the current scenario.
  • Based on what I read in the complaint, this motion includes evidence from discovery around Meta's use of copyrighted material in training Llama. This evidence raises real questions about Meta's approach to training Llama, and it does not look good for Meta's fair use defense. Training an AI model could be transformative -- Llama abstracts patterns from data instead of just copying works, yes, but the evidence uncovered in discovery presents a much darker picture. It clearly shows practices that defy the p

  • from using the pirated content will far exceed the amount they will have to pay to settle the lawsuit.
