Government AI United States

US Copyright Office to AI Companies: Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Works' (yahoo.com) 121

Business Insider tells the story in three bullet points:

- Big Tech companies depend on content made by others to train their AI models.

- Some of those creators say using their work to train AI is copyright infringement.

- The U.S. Copyright Office just published a report that indicates it may agree.

The office released on Friday its latest in a series of reports exploring copyright laws and artificial intelligence. The report addresses whether the copyrighted content AI companies use to train their AI models qualifies under the fair use doctrine. AI companies are probably not going to like what they read...

AI execs argue they haven't violated copyright laws because the training falls under fair use. According to the U.S. Copyright Office's new report, however, it's not that simple. "Although it is not possible to prejudge the result in any particular case, precedent supports the following general observations," the office said. "Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs — all of which can affect the market."

The office made a distinction between AI models for research and commercial AI models. "When a model is deployed for purposes such as analysis or research — the types of uses that are critical to international competitiveness — the outputs are unlikely to substitute for expressive works used in training," the office said. "But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

The report says outputs "substantially similar to copyrighted works in the dataset" are less likely to be considered transformative than when the purpose "is to deploy it for research, or in a closed system that constrains it to a non-substitutive task."

"A day after the office released the report, President Donald Trump fired its director, Shira Perlmutter, a spokesperson told Business Insider."


  • Effective enforcement will not be easy.

    The other problem that the spiders do is to overload servers. They do not seem to be gentle in the way that most search engine spiders are.

  • Repeat after me (Score:2, Insightful)

    by blahabl ( 7651114 )
    Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.
    • by reanjr ( 588767 ) on Monday May 12, 2025 @05:15AM (#65369893) Homepage

      It may be a "valid" response, but it's hardly compelling. Your argument is that copyright should be completely upended and author protections should simply vanish into an LLM. To make that argument, you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind.

      • A moderate amount of compensation and protection wouldn't be bad. However, the current "annuity for my great-great-grandchildren" is anything but. It is the product of lobbying by the biggest corporations, does nothing for authors and artists, and only ensures rentseeking is profitable while stifling innovation and new arts. Sampling in hip hop being a case in point, but there are hundreds of examples if you search a bit.

        Rent seeking is a problem because it eventually paralyzes any economy. And it is expeci

      • It's not "completely upended". It's currently perfectly legal for a human to read all the copyrighted works they want, learn from them, and produce their own works based on what they've read. It's not immediately obvious that a machine doing the same thing is contrary to the principles of copyright. Of course in either case if the output is too closely based on a small set of works then it's an infringement, but that doesn't mean that the training itself is.
        That's not to say that there isn't an issue of cou

        • in either case if the output is too closely based on a small set of works then it's an infringement,

          These bots are getting better at guardrails for speech and not saying things that are factually false. Like if you ask a newer LLM how many Rs in strawberry, it will produce a convincing but wrong answer but then correct itself mid-output.

          Like a human, there should be "self-awareness" when there are violations and avoidance should be programmed in. But also like a human, a human employee that does the work for hire and unintentionally copies something - the hiring company is allowed to be sued rather than

      • by dfghjk ( 711126 )

        Did "author protections" get "upended" when you read a book? How is it different when it's an LLM doing the reading?

        Your response is neither valid nor compelling.

        "..you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind."

        No, all you need to do is point out the hypocrisy of your position. You believe the benefit is for you, not for them. It's pulling up the ladder, nothing more.

        • An LLM is different than an individual because said individual cannot read entire libraries in minutes, and cannot programmatically regurgitate those works en-masse.

          Starting from first principles here is good, it shows how a well-read author is distinct from wholesale monetized copyright infringement.
          • Great examples. Here’s another: the individual doesn’t have hundreds of millions of users issuing billions of queries like “write me a short story in the style of Author X because I want to read one but don’t want to buy a copy.”

            Or “write me a summary of Topic T using all the info you learned from Published Books X Y and Z, because I don’t want to buy a copy.”

            • Or “write me a thesis of the current thought around Theory T by summarizing all the info you learned from Published Theses X, Y and Z, because I don’t want to buy a copy.”

              Which is how science works.
      • I had a thought while reading what you wrote:

        you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind.

        The thought was this: Why do artists think they deserve compensation? If someone had asked for the art, then the person who asked should have paid or why would the artist do such a thing and expect money?

        I only responded because you said, "let's go back to First Principles"

    • by PseudoThink ( 576121 ) on Monday May 12, 2025 @05:20AM (#65369901)

      Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.

      A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.

      A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem right, either.

      • by tlhIngan ( 30335 )

        Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.

        A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.

        A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem right,

        • So instead of life+70, the copyright term can be reduced to something reasonable

          I think that I would accept life+70 or 30 years from the point that a work is commercially successful, whichever is smaller. It's complex, but reasonable. However, with strengthened trademark laws regarding use of public domain works.

          This would prevent someone else from profiting off your work before you can. If you suddenly find fame and success after decades of hard work, it wouldn't be right before the clock runs out.

          So Disney would still be the primary Mickey owner even if someone else can give away o

    • Re:Repeat after me (Score:4, Interesting)

      by Entrope ( 68843 ) on Monday May 12, 2025 @06:13AM (#65369931) Homepage

      That's not an argument, that's just emoting.

      How is that different in substance from "maybe copyright infringement should be allowed because I don't want to pay for a newspaper subscription. That would benefit mankind, i.e. me."?

      • by dfghjk ( 711126 )

        It's different because the argument is that you get access to the articles for free, but the AI company doesn't. Straw manning the argument is not a win, data being scraped for training isn't behind a "newspaper subscription", and if it were, AI use of the articles should be fine, by your standard, if the company paid the subscription.

        • by Entrope ( 68843 )

          Copyright laws and licenses govern more than just what somebody pays for the copyrighted work. Your argument is based on ignoring violations of licenses, all the secondary copying that goes on during training, and the propensity of AI models to regurgitate training material.

    • Re:Repeat after me (Score:4, Insightful)

      by gweihir ( 88907 ) on Monday May 12, 2025 @06:28AM (#65369947)

      Well, maybe it should if the resulting models are available and also under fair-use. Most are not, hence criminal commercial copyright infringement.

      • by dfghjk ( 711126 )

        You think "fair use" is a license? LOL

        "Most are not" LOLOL says who? You think AI publishers get to decide whether you get "fair use".

        You never fail to impress with the stupid.

        • by gweihir ( 88907 )

          How pathetic. You do not even have basic reading comprehension. Dumb and aggressive. Nice!

    • by msauve ( 701917 )
      I see your argument. Ownership of any sort of property is not a natural right. So someone with a bigger gun taking your stuff is very valid.
      • by znrt ( 2424692 )

        there are no natural rights, someone with a gun can make a very valid point about your right to life too.

    • Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.

      Copyright is being abused and it's no longer about encouraging innovation. It concentrates wealth and is a key driver of inequality. It has been extended and extended. It went from a reasonable 7 years to 14 years renewable to 28, then life of the author plus 50 to life plus 70 years.

      It was extended to buildings which is completely ridiculous. The industry tried but failed to extend it to clothing. There is absolutely no reason why an AI shouldn't read a work because it doesn't compete by selling a copy of

    • Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators

      Absolutely agree. Copyright's purpose is to give creative people with good ideas an incentive to realise or at least flesh out those ideas. Its purpose is not to give people money for nothing - but of course people will never stop trying to find something that gives them exactly that.

      What I very much dispute, however, is that so-called AI is a benefit to mankind. As I see it, it's not only not beneficial but very dangerous. It has potential to bring great harm to mankind

      Let's start by assuming that just t

    • I'm out of mod points, so I will just have to applaud this and link, once again, Thomas Babington Macaulay's 1841 speech to the House of Commons on this subject:

      https://www.thepublicdomain.org/2014/07/24/macaulay-on-copyright/ [thepublicdomain.org]
  • by greytree ( 7124971 ) on Monday May 12, 2025 @03:51AM (#65369793)
    People of the world to the US Copyright office:

    95 years is a fucking abomination.

    Copyright is not fit for its purpose of encouraging creativity.
  • Interesting take (Score:5, Interesting)

    by thegarbz ( 1787294 ) on Monday May 12, 2025 @04:07AM (#65369821)

    Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use. That said I'm right there with them when it comes to illegal access.

    How much did grandma have to pay for downloading one mp3? I hope Meta pays the same amount multiplied by all the works they pirated.

    • by pjt33 ( 739471 )

      Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use.

      I've only read the summary, but I'm not seeing anything related to speed or volume of production. Am I overlooking something?

      • by Entrope ( 68843 )

        No, you're not. The assertion you quoted is entirely false; it is a straw man to stand in for the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement of a very traditional character.

        • Except it's not, think about it. If you remove the word "troves" from the quote, then the entire argument is basically completely at odds with decades of established case law saying such work would be permitted under copyright rules. You can use materials to inspire new works and sell those works competing with the original. That is something that has been permitted since the beginning, it is the basis of fair use.

          If the transformative aspect is the same, and the commercial aspect is the same, then there's

          • Thank you very much.
            Fair use is you reading a book and maybe even applying knowledge you gleaned.
            The vast industrial scale of harvesting the web is the defining difference between you reading a book and Big AI slurping it up as training data.
          • by Entrope ( 68843 )

            If you remove the "troves" bit from the quote, then the argument becomes:

            But making commercial use of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.

            This is still trivially true under copyright law. If you want to point to supposed decades of established case law, then do so. Argument by unbacked assertion is a fallacy. Inspiration is separate from "fair use" -- and there are decades of litigation over where the boundaries are for characters or plots "inspired by" copyrighted material.

        • by dfghjk ( 711126 )

          IT IS LITERALLY IN THE TITLE! "Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Data'."

          What the fuck do you think that means, you moron?

          "...the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement..."

          You literally just admitted that the quote was "entirely true". It's not fair use because it's "large-scale", literally a judgement made on "volume of production". You are an idiot.

          • by Entrope ( 68843 )

            The title means that infringing copyright involving "vast troves" doesn't make it stop being an infringement of copyright. The logic is right in your second quote: AI companies are engaging in large-scale copyright infringement, not small-scale infringement. It's the core of their business model, not a side line.

      • Re:Interesting take (Score:4, Interesting)

        by buck-yar ( 164658 ) on Monday May 12, 2025 @06:29AM (#65369949)
        Volume was mentioned in court against mp3.com.

        Sep 7, 2000 A federal judge Wednesday ordered MP3.com to pay as much as $250 million to Universal Music Group for violating the record company's copyrights by making thousands of CDs available for listening over the Internet.

        U.S. District Judge Jed S. Rakoff punished the online music-sharing service at $25,000 per CD, saying it was necessary to send a message to Internet companies.

        Universal Music Group, the world's largest record company, had urged a stiff penalty in a case closely watched by Napster and other businesses that share music or other copyrighted material over the Internet.

        The judge said some Internet companies "may have a misconception that, because their technology is somewhat novel, they are somehow immune from the ordinary applications of laws of the United States, including copyright law."

        He added: "They need to understand that the law's domain knows no such limits."

        MP3.com said it will appeal. The company had argued that a penalty of any more than $500 per CD would be a virtual "death sentence."

        Shares of MP3.com were halted before the decision; the most recent trade was at $7.88 per share, down 68.8 cents on the Nasdaq Stock Market. https://www.utdailybeacon.com/... [utdailybeacon.com]

        Imagine $25k per infringement against Meta? Neither can I. They probably lobbied and had the law changed. Or the judge doesn't want to crash the stock market (who doesn't hold meta stock directly or indirectly).

      • Yeah not directly, but it was implicit in their decision. The world has a long LONG standing precedent that if work is transformative it is permitted for fair use, even if the result competes with the original. The only thing new here is someone claiming it shouldn't be the case because of "troves" of data being used.

        Volume is the whole basis for the argument here. You can read a book and write a book with similar story elements and styling selling it to compete against the one which inspired yours and it wo

        • by pjt33 ( 739471 )

          The "troves" of data being used is volume of consumption, not volume of production, and volume of consumption has always been a factor in fair use and similar concepts. One rule of thumb is that if you're copying more than 5% of the original work, that weighs against the use being fair. (And, anticipating one common argument, legally the training process copies 100% of all the works used, even if less than that ends up encoded directly in the weights).

        • Yeah except it doesn't actually and in fact generate works. It regurgitates bits of other people's work, often verbatim, and because those other people's works were taken without compensation, any model trained on said literally stolen data is violating copyright law.

          Again, this has nothing to do with volume, stolen data is stolen data whether you're stealing a small amount or large.

          Personally I hard agree that any model trained on stolen data should be made available for similar "fair use" access by
    • by dfghjk ( 711126 )

      Definitely, one of the most obvious flaws here is the "two rights make a wrong" logic. It's a shitty take on fair use but it serves the interest of the Office.

      At risk of sounding like a victim was raped because she asked for it, here a copyright holder's rights were violated, according to the Office, because they asked for it. You have to realize that, by the standards set forth in the document, copyright infringement doesn't occur when the material is consumed, it occurs before that. It occurs when a by

      • > But this data exists expressly for that purpose and for NO OTHER purpose.
        Like.. the data here is content on a website?

        I think you just defined the difference between consensual sex and gang rape.
  • by el84 ( 10322963 ) on Monday May 12, 2025 @04:09AM (#65369823)
    While it is patently immoral for big internet companies to basically steal the entirety of human creative output, in order to train their stupid (so-called) ai models and fill their future pockets, if we don't do it then bad guys with absolutely no morals in far flung parts of the world will do it anyway. OK, we can have the smug sense that we have done the right thing when we are the penniless vassals of our current global technology competitors, who will remain unnamed out of courtesy.
    • by martin-boundary ( 547041 ) on Monday May 12, 2025 @04:26AM (#65369835)
      While it's immoral to [steal from|rape|torture|slander|kill|enslave] my neighbours, there are people out there, somewhere, who are absolutely willing to [steal from|rape|torture|slander|kill|enslave] my neighbours. I can be smug about not doing it to them myself, but it's just a matter of time until they become victims, so it's really ok if I also [steal from|rape|torture|slander|kill|enslave] my neighbours. Besides, I'm bored thinking about implications.
      • It's also immoral not to free trade with them, so as long as the raping is economically efficient you should rape too, otherwise getting outcompeted is only just.

        Free trade, raping all boats.

      • by dfghjk ( 711126 )

        The rule of law cannot possibly apply to everyone, so it should apply to no one. Except when I'm in power, then what I say goes. And I'll gain that power by [steal from|rape|torture|slander|kill|enslave] my neighbours.

      • You're not wrong. Just because you stick to some sort of moral code or rule of law doesn't mean someone else will.

        So maybe we should apply the rules of law that make our society livable, and just use the AIs coming out of those other places that have no such rules. Then presumably their society implodes under the natural pressures of harvesting the knowledge of their fine peoples and renting it back to them, and we retain a human scale livable society?

        Like your father said to you: Just because your friends
    • No, they wouldn't. The amount of money spent on this shit only makes sense if you're going to widely commercially target the U.S. and the wider West. Bad actors would never be able to come to market.

    • OK we can have the smug sense that we have done the right thing when we are the penniless vassals

      You were going to be a penniless vassal one way or the other. What difference does it make how it happens?

  • In the report: "Commenters cited several examples of AI tools trained on licensed or public domain content, such as Adobe’s Firefly (an image generator), Boomy (a music generator), Getty Images’ AI image generator, and Stability AI’s Stable Audio (a music generator)."

    Often only the infringers get mentioned.

  • Only to get ridiculed by some AI fanboi assholes, with deranged claims about "learning" and other ludicrous claims. Nice to see the actual experts recognize the problem as well. Take that, AI morons.

    • The Librarian of Congress is an "expert" on AI now? It's quite early, do you always smoke for breakfast? And where can I get that good stuff from.

    • by dfghjk ( 711126 )

      This is a shitty, ignorant take made clearly without reading any of the document. Or worse, maybe you did read it and this is what you came up with.

  • First, the Copyright Office is not part of the judicial branch. They can voice their opinion, but they are not empowered to say what the law is.

    Second, there is clearly a biased narrative at work here. If you look at the very start of the infringement discussion (page 26, Section A) the very first thing you see is allegations of "right of reproduction", with the Office saying "commenters agreed with or did not dispute that copying during the acquisition and curation process implicates the reproduction right"

  • This looks like an alarming extension of copyright overreach if such restrictions are applied to AI. AI reads content (which may be copyrighted, as this post you are reading is, as nearly everything on the Internet is) and learns from it, and that's how it can process a book and provide a summary within a few minutes.

    If this were an infringement of copyright, basically any form of human learning would also be. Just reviewing a book, a game, anything copyrighted could be construed as infringement and prose

    • by dfghjk ( 711126 )

      Great comments.

      "If an AI generates text that is substantially a copy of a copyrighted training input, that's a copyright breach; but AIs can be trained to avoid this, just like people can - learn the concept, avoid copying the form."

      This is the most important point. Infringement occurs when an AI vomits up sufficiently large portions of a copyrighted work. AIs must be developed to avoid this, as this is what we require of people as well. You can read a book and you can have a photographic memory, you ca

  • ...reading books and using the knowledge commercially?
    None of the training data is copied

    • by ledow ( 319597 )

      It's not.

      If a human regurgitates vast portions of a copyrighted work - whether directly from memory or otherwise - and then sells it as part of a commercial service to other people, they will get in just as much trouble for being outside the bounds of fair-use.

      This isn't about "what an AI can do" vs "what a human can do", it's literally about "is the company's end usage covered under fair use", and wholesale regurgitation of source data (books) can be coaxed out of all LLMs *and* some of these companies are d

  • ... only outlaws will have AIs.

    Welcome to the William Gibson future.
