
Meta Hit With New Author Copyright Lawsuit Over AI Training (reuters.com) 47

Novelist Christopher Farnsworth has filed a class-action lawsuit (PDF) against Meta, accusing the company of using his and other authors' pirated books to train its Llama AI model. Farnsworth seeks damages and an order to stop the alleged copyright infringement, joining a growing group of creators suing tech companies over unauthorized AI training. Reuters reports: Farnsworth said in the lawsuit on Tuesday that Meta fed Llama, which powers its AI chatbots, thousands of pirated books to teach it how to respond to human prompts. Other authors including Ta-Nehisi Coates, former Arkansas governor Mike Huckabee and comedian Sarah Silverman have brought similar class-action claims against Meta in the same court over its alleged use of their books in AI training. [...] Several groups of copyright owners including writers, visual artists and music publishers have sued major tech companies over the unauthorized use of their work to train generative AI systems. The companies have argued that their AI training is protected by the copyright doctrine of fair use and that the lawsuits threaten the burgeoning AI industry.
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • They even admit it (Score:5, Insightful)

    by evanh ( 627108 ) on Wednesday October 02, 2024 @07:37PM (#64835867)

    If they can blatantly copy what they like, then average Joes like us should be allowed to copy what we like too.

    • by evanh ( 627108 ) on Wednesday October 02, 2024 @07:39PM (#64835873)

      And my platform of choice - Usenet.

    • You pretty much can.
      What you cannot do is redistribute it.

      If you want to ban this usage, you should also take away degrees from anyone who pirated their books for university.

      • Re: (Score:2, Interesting)

        by evanh ( 627108 )

        An AI is redistribution.

        • by GigaplexNZ ( 1233886 ) on Wednesday October 02, 2024 @07:46PM (#64835893)
          Redistribution of knowledge, not of the copyrighted content. Designing a bridge after reading an engineering textbook doesn't make that bridge a redistribution of copyrighted material.
          • Re: (Score:1, Troll)

            by evanh ( 627108 )

            AI is just a delivery system, and yes, of copyrighted content too. Nothing more. Even if it was locked up for personal use only, it's still just a delivery system.

          • by evanh ( 627108 )

            AI is no different to a search engine.

          • LLMs don't design bridges, they create passages of text, and that text came from somewhere. People plagiarized in my high school. They'd copy, but then change the words so it didn't look copied. Rearrange sentences, paragraphs.

            An LLM spitting out text on how to design a bridge after reading an engineering textbook is more like transcoding a Blu-ray movie into a low-quality rip. The bits aren't the same, but the output is basically the same. Same with that engineering textbook: if the LLM is at all accurate, it'll re-present the information in it. Maybe in a different way, but if it's accurate the meaning will not be changed from the source material.

            • by cstacy ( 534252 )

              LLMs don't design bridges, they create passages of text, and that text came from somewhere. People plagiarized in my high school.
              [...]
              If the LLM is at all accurate, it'll re-present the information in it. Maybe in a different way, but if it's accurate the meaning will not be changed from the source material.

              Plagiarism is not illegal. And you can't copyright ideas or facts.

              Occasionally we see that less slicing/dicing/blending happened in the model than was expected, and the model can be coaxed into reproducing intact passages. Also, the training may have involved copyright violations. (Whether they broke the law, got ahead of the law, or did something morally unacceptable are ongoing questions.)

              The main problem with LLMs is that they are unreliable regurgitators, because they "understand" nothing and do not "reason".

    • by chuckugly ( 2030942 ) on Wednesday October 02, 2024 @07:42PM (#64835883)

      I'm pretty sure it's OK to read whatever you want as long as you buy or legally borrow a copy of the book. In fact you can read that book all you want, and then, if you bought it, you could even loan it to a friend. Websites that are open to all are also OK to read. I hope you spend your time wisely now that you know all this material is available to you. Good health to you.

    • You already do. Digital content is copied to your device for the purposes of display and copied into your memory for the purposes of consumption by you. The latter isn't verbatim; the former is transient. AI works in a similar way.

      I suspect this will hinge on how Facebook got access to the books. If they actually pirated them they may be in trouble. If they bought them then this copyright lawsuit will go nowhere just like all the others have gone nowhere.

    • Using an AI to reconstruct the source material it was built from is too much like hard work to justify the gold-digging of the authors and publishers. Copyright exists, according to the US Constitution, to 'promote the Progress of Science and useful Arts'. This use of it is blocking that advance.

    • by gweihir ( 88907 )

      Where common people get hanged for minor things, the mighty can do as they please. Sad but often true.

  • Honeypot? (Score:3, Interesting)

    by Black Parrot ( 19622 ) on Wednesday October 02, 2024 @08:03PM (#64835925)

    I wonder if it would be useful to set up some honeypots that would make robots think they've hit on a motherlode of text documents, but were in fact being fed a trove of machine-generated text that looked plausible if a human skimmed a small portion, but as a whole taught any LLM a bunch of nonsense.
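
    Purely as an illustration of the idea (none of this comes from the comment itself, and the seed_corpus.txt filename is hypothetical), a tiny word-level Markov chain in Python is enough to churn out text that skims as plausible but carries no reliable meaning:

      import random
      from collections import defaultdict

      def build_chain(text, order=2):
          # Map each `order`-word prefix to the words observed to follow it.
          words = text.split()
          chain = defaultdict(list)
          for i in range(len(words) - order):
              chain[tuple(words[i:i + order])].append(words[i + order])
          return chain

      def generate(chain, length=200):
          # Walk the chain from a random prefix, emitting plausible-looking nonsense.
          prefix = random.choice(list(chain))
          out = list(prefix)
          for _ in range(length):
              followers = chain.get(tuple(out[-len(prefix):]))
              if not followers:                     # dead end: jump to a new prefix
                  followers = chain[random.choice(list(chain))]
              out.append(random.choice(followers))
          return " ".join(out)

      if __name__ == "__main__":
          # seed_corpus.txt is a hypothetical file of ordinary prose to imitate.
          seed = open("seed_corpus.txt", encoding="utf-8").read()
          print(generate(build_chain(seed)))

    Whether real crawlers would swallow it is another question; as a reply below points out, they already filter for quality.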

    • by evanh ( 627108 )

      Heh, I bet Meta doesn't use its own troll farms.

    • Re:Honeypot? (Score:4, Insightful)

      by Miles_O'Toole ( 5152533 ) on Wednesday October 02, 2024 @09:19PM (#64836005)

      As far as I know, "poisoning the well" is by far the best widely-available means to fight back against data harvesters. We owe them nothing except our ill will.

    • by allo ( 1728082 )

      You don't need to do that. The internet is full of spam, bot farms, SEO websites and more. But the AI crawlers prefilter for quality to avoid that. Avoiding spam targeted at AI would not even require new measures; it would just be detected as low-quality text.

    • by cstacy ( 534252 )

      I wonder if it would be useful to set up some honeypots that would make robots think they've hit on a motherlode of text documents [...] but as a whole taught any LLM a bunch of nonsense.

      This is referred to as "Reddit", I believe.

      How many countries in Africa have names that begin with the letter "K"?

    • I wonder if it would be useful to set up some honeypots that would make robots think they've hit on a motherlode of text documents, but were in fact being fed a trove of machine-generated text that looked plausible if a human skimmed a small portion, but as a whole taught any LLM a bunch of nonsense.

      Point them at the Slashdot comments section.

  • Was it not Meta that was storing passwords without even MD5? Storing them in plain text? How embarrassing in 2024. Not even SHA-512.
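
    For what it's worth, neither MD5 nor SHA-512 on its own is considered adequate for passwords anyway; the usual practice is a salted, deliberately slow key-derivation function. A minimal sketch using Python's standard-library hashlib.scrypt, purely illustrative and unrelated to whatever Meta's systems actually do:

      import hashlib, hmac, os

      def hash_password(password: str) -> tuple[bytes, bytes]:
          # scrypt is a salted, deliberately slow key-derivation function.
          salt = os.urandom(16)
          digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
          return salt, digest

      def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
          # Recompute with the stored salt and compare in constant time.
          candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
          return hmac.compare_digest(candidate, digest)

      salt, digest = hash_password("correct horse battery staple")
      assert verify_password("correct horse battery staple", salt, digest)
      assert not verify_password("wrong guess", salt, digest)

    The point of the salt and the slow hash is that a leaked table of digests can't simply be reversed or brute-forced the way plain text or a fast unsalted hash can.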

  • And Meta is probably the closest to open source of the big models out there, so why just sue them? Does the lawsuit split all damages evenly among all authors of the world? Didn't think so. https://www.genolve.com/design... [genolve.com]
  • by nicubunu ( 242346 ) on Thursday October 03, 2024 @12:40AM (#64836161) Homepage

    So if I train myself using some pirated books, should I expect to be sued too?

    • ... pirated books ...

      Yes, because you knew they were the fruit of a crime. If anything raises suspicion, always claim ignorance. But since the hypothetical books were in your possession, you know something about them, so the "I know nothing, nothing!" defense is inadmissible. You need an answer that shows you were an innocent bystander.

    • Not because you read the copy, no. Copyright is literally about the right to copy. It's right there in the name. If you made an unauthorized copy, then and only then would you have committed copyright infringement. Whether you read the copy, or print it and put it under the short leg of your table, or burn it is irrelevant.

      If Meta is found guilty of copyright infringement, it will be solely because they made an unauthorized copy. That they subsequently used it to train their AI is irrelevant.

  • So, if a bunch of humans decide to voraciously consume books from a local library or bookstore, it’s perfectly acceptable use for the author. But if “AI” reads the books, it’s considered... theft? Is AI really in the wrong for reading here? If a single human buys and reads 100 books out loud to homeless people, are the homeless going to be charged for stealing too?

    Yeah. No shit I’m not a lawyer. Still barely makes sense; I thought authors actually wanted people reading their stuff.

    • If a single human buys and reads 100 books out loud to homeless people, are the homeless going to be charged for stealing too?

      The homeless people, no. But - by the copyright laws - the person who bought and read the books aloud can be sued, yes. Even more so if this person profits from the public reading.

      Was AI some kind of “unauthorized” user and a book has an implied single use license, good only for one customer?

      There are all kinds of licenses. But generally speaking, yes. Most books are single-user licensed.

      Tell me how we enable future audiobooks without absolutely needing a human reading it.

      This one is pretty obvious. Audiobook publishers have to buy books rights in order to publish them. And audiobooks are also licensed, just like printed books.

      I thought authors actually wanted people reading their stuff.

      Of course they do. But just like any other professional, they would prefer to be paid for their work.

      • If a single human buys and reads 100 books out loud to homeless people, are the homeless going to be charged for stealing too?

        The homeless people, no. But - by the copyright laws - the person who bought and read the books aloud can be sued, yes. Even more so if this person profits from the public reading.

        So the volunteer at the retirement home who loves to read aloud their books to those who enjoy listening, should be concerned about being sued? What’s next, FBI random raids at book clubs and coffee shops looking for plot pimps and cracking down on those illegal spoiler alerts?

        Was AI some kind of “unauthorized” user and a book has an implied single use license, good only for one customer?

        There are all kinds of licenses. But generally speaking, yes. Most books are single-user licensed.

        A concept that sounds absolutely stupid when it comes to the printed book. If we allowed copyright clowns to preach their gospel as far and wide as they wanted, books would not merely be single-user licensed. They would be si

      • by cstacy ( 534252 )

        But - by the copyright laws - the person who bought and read the books aloud can be sued, yes.

        This of course is not true. Copyright law is much more complicated than you are suggesting.

      • But generally speaking, yes. Most books are single-user licensed.

        FALSE.

        The concept of "copyright exhaustion" applies under the "doctrine of first sale": essentially, a copyright holder's right to control a particular legally produced copy of a work is exhausted after the first sale of the item. This is why it is legal to give, loan, or resell a copy of a book, musical recording, or video performance. Note that this does not allow for making additional copies of that work, but applies solely to the disposition of the existing copy of that work.

        • Actually, my bad. You're right. This applies especially to printed books. E-books, on the other hand, can be a bit different.

    • by cstacy ( 534252 )

      So, if a bunch of humans decide to voraciously consume books from a local library or bookstore, it’s perfectly acceptable use for the author. But if “AI” reads the books, it’s considered... theft?

      I’m not a lawyer. Still barely makes sense.

      The laws (such as Copyright) embody and are used to execute our morality. Those laws did not anticipate LLMs, and people feel morally violated. A machine reading all the books is not the same as humans reading books. It is not the action that matters, nor the fact that humans are a kind of machine. It is the morality of the totality of the circumstances and what it means for human rights and society. Right now there is outrage.

    • Your memory must be a lot better than mine. I'll remember the general content of a book, but won't remember it word for word. I can't quote a sentence directly from that book if someone asks me the right question - at least not without going back and finding it in the book for them. LLMs don't think back about what they read and give you their understanding of historical context. LLMs keep a full copy of all of that text and regurgitate parts of it. They might mix and match parts of sentences, but the words still come from the original texts.

  • The US constitution gives the Federal government the right to offer copyright protection 'To promote the Progress of Science and useful Arts'. Blocking the availability of text online for the advance of AI is thus an abuse of the power. Authors and Publishers are trying it on to get an extra source of revenue that they had no reason to expect when they created / published the works. The courts need to strike down this gold-digging.

    • The US constitution gives the Federal government the right to offer copyright protection 'To promote the Progress of Science and useful Arts'. Blocking the availability of text online for the advance of AI is thus an abuse of the power

      By that logic, should I also be allowed to pirate, print, and distribute any educational text? Distributing knowledge would very likely promote progress. In fact, you could probably justify piracy of just about any media: consumption of a wide range of content would likely lead to inspiration and progress in Useful Arts.

      It doesn't make much sense to read the constitutional clause granting Congress the right to issue copyright as saying that copyright is invalid in the vast majority of cases.

      • The basic business model of authors and publishers of educational texts is to 'promote the Progress of Science and useful Arts' by creating and selling those texts; to have them stolen by copying destroys their ability to 'promote the Progress of Science and useful Arts'. By contrast, whether a text is used as input to an LLM AI does not destroy that ability, so it should not be protected by copyright unless the LLM can be enticed into easily regurgitating large amounts of the text - above the threshold that fair use would allow.

  • It seems like most people think AI is just a search engine. They think it just stores a copy of the book in memory and can reference it all later. However, the entire memory footprint of models like this is in the range of 1-200GB. So the authors seem to think their books are so important that, out of the entire internet of data, their shitty book would be preserved in that 100GB of weights.
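
    For rough scale, a back-of-the-envelope sketch; the figures are assumptions noted in the code comments (roughly 15 trillion training tokens, as reportedly used for Llama 3, about 4 bytes of raw text per token, and a 70B-parameter model stored at 2 bytes per weight):

      # Back-of-the-envelope: training-corpus size vs. model size.
      # Assumed figures: ~15e12 training tokens (reportedly used for Llama 3),
      # ~4 bytes of raw text per token, 70e9 parameters at 2 bytes each (bf16).
      tokens, bytes_per_token = 15e12, 4
      params, bytes_per_param = 70e9, 2

      corpus_bytes = tokens * bytes_per_token     # ~60 TB of raw text
      weight_bytes = params * bytes_per_param     # ~140 GB of weights

      print(f"corpus  ~{corpus_bytes / 1e12:.0f} TB")
      print(f"weights ~{weight_bytes / 1e9:.0f} GB")
      print(f"weights / corpus ~ {weight_bytes / corpus_bytes:.2%}")

    That works out to well under one percent, so there is nowhere near enough room to store the corpus verbatim, although, as noted elsewhere in the thread, models can sometimes still be coaxed into reproducing intact passages.
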
  • You can just post a notice on your FB page that says you do not consent to this. But hurry, the deadline for posting this is coming up soon!

  • It probably is fair use to train AI on legally obtained books. If it weren't, gleaning any sort of knowledge by reading would be illegal. Training AI on pirated books, however, is probably not legit. The training part is fine; the "pirated content" part isn't.
