Government AI United States

US Copyright Office to AI Companies: Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Works' (yahoo.com) 86

Business Insider tells the story in three bullet points:

- Big Tech companies depend on content made by others to train their AI models.

- Some of those creators say using their work to train AI is copyright infringement.

- The U.S. Copyright Office just published a report that indicates it may agree.

The office released on Friday its latest in a series of reports exploring copyright laws and artificial intelligence. The report addresses whether the copyrighted content AI companies use to train their AI models qualifies under the fair use doctrine. AI companies are probably not going to like what they read...

AI execs argue they haven't violated copyright laws because the training falls under fair use. According to the U.S. Copyright Office's new report, however, it's not that simple. "Although it is not possible to prejudge the result in any particular case, precedent supports the following general observations," the office said. "Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs — all of which can affect the market."

The office made a distinction between AI models for research and commercial AI models. "When a model is deployed for purposes such as analysis or research — the types of uses that are critical to international competitiveness — the outputs are unlikely to substitute for expressive works used in training," the office said. "But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

The report says outputs "substantially similar to copyrighted works in the dataset" are less likely to be considered transformative than when the purpose "is to deploy it for research, or in a closed system that constrains it to a non-substitutive task."

"A day after the office released the report, President Donald Trump fired its director, Shira Perlmutter, a spokesperson told Business Insider."

  • Effective enforcement will not be easy.

    The other problem the spiders cause is overloading servers. They do not seem to be gentle in the way that most search engine spiders are.

    • by buck-yar ( 164658 ) on Monday May 12, 2025 @05:28AM (#65369907)
      Sure it is. Encourage whistleblowers to come forward. "I did this" from a software engineer in court. There might even be hard evidence in communications, as some of these employees knew what they were doing was wrong and raised objections.

      Penalties, per the FBI warning on home VHS tapes: 5 years, a $250,000 fine, and a felony, so you lose your voting and gun rights for life. If it's per violation, some AI execs might be looking at hundreds of years in prison and fines exceeding the value of these large tech companies' market capitalization.

      • Re: (Score:3, Interesting)

        by gweihir ( 88907 )

        Indeed. Also, under some conditions, LLMs can be made to regurgitate parts of their training data....
        The second thing is, you must delete all the stolen content. And that means the whole LLM in this case.

      • Suchir tried that.

    • by AmiMoJo ( 196126 )

      Effective enforcement is easy. Just make some severe penalties for doing it, and crucially for using AI that has been trained on unlicensed material. Then any business that wants to sell AI services will need to certify that they trained it legally, because their customers will demand it for fear of being hit by penalties themselves.

      The same rules will apply to foreign made AIs of course.

      The EU does that and it has proven successful with things like GDPR.

    • And it certainly won't stop Chinese spiders... So even though it seems right and a win for content providers to stop the spiders, it will just give China a huge advantage.
  • Repeat after me (Score:5, Insightful)

    by blahabl ( 7651114 ) on Monday May 12, 2025 @03:45AM (#65369783)
    Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.
    • A well-made factual point.

      So of course it is downvoted.
      Any mod who did so should be banned immediately.
      • Agreed. The entire moderation system on this platform is screwed.
        • This place is generally screwed. It doesn't work, it isn't well managed, you can't block posters, it allows anonymous posts and you can't block those either. Broken by design.

          • by Zak3056 ( 69287 )

            you can't block posters, it allows anonymous posts and you can't block those either.

            I mean, you can't "block" anything, but you effectively can. Change your comment modifiers [slashdot.org] and set ACs to -6, set "foes" to -6, mark commenters you don't like as foes, and browse at 0 or above. You will no longer see AC comments or comments from people you don't like.

            • I mean, you can't "block" anything, but you effectively can.

              You can't but you can? Take two steps forward, and two steps back, and then two steps forward, and two steps back, and now we're doing the cha-cha.

              I want their content to disappear as if it never was, whether it was modded up or not. That would be blocking, unlike what you propose.

              • by Zak3056 ( 69287 )

                Take two steps forward, and two steps back, and then two steps forward, and two steps back, and now we're doing the cha-cha.

                Thank you, Chris Knight.

                I want their content to disappear as if it never was, whether it was modded up or not. That would be blocking, unlike what you propose.

                Yes, I understand--that is not something you can do. You can, however, effectively achieve this by following the instructions given.
                You know, the adverb form of "effect." Specifically, definition 2, "in effect; virtually" which rests upon definition 4 of the root word, "the power to bring about a result." So, to answer your snarky question, yes, in effect you can't but you can.

                Is your brain not firing on all cylinders this morning, or are you just purposefully being obtuse?

      • by msauve ( 701917 )
        >A well-made factual point.

        It is not factual. In the US, copyright is meant to benefit both the author and mankind: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

        It (deliberately?) ignored that in order to encourage authorship and still benefit society, copyrights are granted for limited terms, after which the works become public domain. Let the AIs freely train on all the
        • You misread it.

          It parses as "To promote A, by securing (for limited times) to B thing C"
          where
          A = The Progress of Science and useful Arts
          B = Authors and Inventors
          C = the exclusive Right to their respective Writings and Discoveries

          i.e. The law is to promote A.
          It does that by securing C for B.

          i.e. The law is to promote Progress of Science and useful Arts.
          It does that by securing the exclusive Right to their respective Writings and Discoveries for Authors and Invento
          • by msauve ( 701917 )
            LOL. You're arguing that granting limited exclusive rights is "not for the benefit of content creators." You fail.
      • by dfghjk ( 711126 )

        If objective fairness were the policy you would have been gone long ago.

    • by reanjr ( 588767 ) on Monday May 12, 2025 @05:15AM (#65369893) Homepage

      It may be a "valid" response, but it's hardly compelling. Your argument is that copyright should be completely upended and author protections should simply vanish into an LLM. To make that argument, you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind.

      • A moderate amount of compensation and protection wouldn't be bad. However, the current "annuity for my great-great-grandchildren" is anything but. It is the product of lobbying by the biggest corporations, does nothing for authors and artists, and only ensures rentseeking is profitable while stifling innovation and new arts. Sampling in hip hop being a case in point, but there are hundreds of examples if you search a bit.

        Rent seeking is a problem because it eventually paralyzes any economy. And it is expeci

      • It's not "completely upended". It's currently perfectly legal for a human to read all the copyrighted works they want, learn from them, and produce their own works based on what they've read. It's not immediately obvious that a machine doing the same thing is contrary to the principles of copyright. Of course in either case if the output is too closely based on a small set of works then it's an infringement, but that doesn't mean that the training itself is.
        That's not to say that there isn't an issue of cou

      • by dfghjk ( 711126 )

        Did "author protections" get "upended" when you read a book? How is it different when it's an LLM doing the reading?

        Your response is neither valid nor compelling.

        "..you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind."

        No, all you need to do is point out the hypocrisy of your position. You believe the benefit is for you, not for them. It's pulling up the ladder, nothing more.

        • An LLM is different than an individual because said individual cannot read entire libraries in minutes, and cannot programmatically regurgitate those works en masse.

          Starting from first principles here is good, it shows how a well-read author is distinct from wholesale monetized copyright infringement.
    • Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.

      A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.

      A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem righ

      • by dfghjk ( 711126 )

        Because you see everything as zero sum. AI companies could win on merit, we don't have to accept your false choice.

      • by tlhIngan ( 30335 )

        Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.

        A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.

        A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem right,

    • Re:Repeat after me (Score:4, Interesting)

      by Entrope ( 68843 ) on Monday May 12, 2025 @06:13AM (#65369931) Homepage

      That's not an argument, that's just emoting.

      How is that different in substance from "maybe copyright infringement should be allowed because I don't want to pay for a newspaper subscription. that would benefit mankind, i.e. me."?

      • by dfghjk ( 711126 )

        It's different because the argument is that you get access to the articles for free, but the AI company doesn't. Straw manning the argument is not a win, data being scraped for training isn't behind a "newspaper subscription", and if it were, AI use of the articles should be fine, by your standard, if the company paid the subscription.

        • by Entrope ( 68843 )

          Copyright laws and licenses govern more than just what somebody pays for the copyrighted work. Your argument is based on ignoring violations of licenses, all the secondary copying that goes on during training, and the propensity of AI models to regurgitate training material.

    • Re:Repeat after me (Score:4, Insightful)

      by gweihir ( 88907 ) on Monday May 12, 2025 @06:28AM (#65369947)

      Well, maybe it should if the resulting models are available and also under fair-use. Most are not, hence criminal commercial copyright infringement.

      • by dfghjk ( 711126 )

        You think "fair use" is a license? LOL

        "Most are not" LOLOL says who? You think AI publishers get to decide whether you get "fair use".

        You never fail to impress with the stupid.

        • by gweihir ( 88907 )

          How pathetic. You do not even have basic reading comprehension. Dumb and aggressive. Nice!

    • by msauve ( 701917 )
      I see your argument. Ownership of any sort of property is not a natural right. So someone with a bigger gun taking your stuff is very valid.
      • by znrt ( 2424692 )

        there are no natural rights, someone with a gun can make a very valid point about your right to life too.

    • Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.

      Copyright is being abused and it's no longer about encouraging innovation. It concentrates wealth and is a key driver of inequality. It has been extended and extended. It went from a reasonable 7 years to 14 years renewable to 28, then life of the author plus 50 to life plus 70 years.

      It was extended to buildings, which is completely ridiculous. The industry tried but failed to extend it to clothing. There is absolutely no reason why an AI shouldn't read a work because it doesn't compete by selling a copy of

    • Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators

      Absolutely agree. Copyright's purpose is to give creative people with good ideas an incentive to realise or at least flesh out those ideas. Its purpose is not to give people money for nothing - but of course people will never stop trying to find something that gives them exactly that.

      What I very much dispute, however, is that so-called AI is a benefit to mankind. As I see it, it's not only not beneficial but very dangerous. It has potential to bring great harm to mankind

      Let's start by assuming that just t

  • "A day after the office released the report, President Donald Trump fired its director, Shira Perlmutter,

    Which means the report could soon flip 180.

  • by greytree ( 7124971 ) on Monday May 12, 2025 @03:51AM (#65369793)
    People of the world to the US Copyright office:

    95 years is a fucking abomination.

    Copyright is not fit for its purpose of encouraging creativity.
  • Interesting take (Score:5, Interesting)

    by thegarbz ( 1787294 ) on Monday May 12, 2025 @04:07AM (#65369821)

    Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use. That said I'm right there with them when it comes to illegal access.

    How much did grandma have to pay for downloading one mp3? I hope Meta pays the same amount multiplied by all the works they pirated.

    • by pjt33 ( 739471 )

      Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use.

      I've only read the summary, but I'm not seeing anything related to speed or volume of production. Am I overlooking something?

      • by Entrope ( 68843 )

        No, you're not. The assertion you quoted is entirely false; it is a straw man to stand in for the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement of a very traditional character.

        • Except it's not, think about it. If you remove the word "troves" from the quote, then the entire argument is basically completely at odds with decades of established case law saying such work would be permitted under copyright rules. You can use materials to inspire new works and sell those works competing with the original. That is something that has been permitted since the beginning, it is the basis of fair use.

          If the transformative aspect is the same, and the commercial aspect is the same, then there's

          • Thank you very much.
            Fair use is you reading a book and maybe even applying knowledge you gleaned.
            The vast industrial scale of harvesting the web is the defining difference between you reading a book and Big AI slurping it up as training data.
          • by Entrope ( 68843 )

            If you remove the "troves" bit from the quote, then the argument becomes:

            But making commercial use of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.

            This is still trivially true under copyright law. If you want to point to supposed decades of established case law, then do so. Argument by unbacked assertion is a fallacy. Inspiration is separate from "fair use" -- and there are decades of litigation over where the boundaries are for characters or plots "inspired by" copyrighted material.

        • by dfghjk ( 711126 )

          IT IS LITERALLY IN THE TITLE! "Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Data'."

          What the fuck do you think that means, you moron?

          "...the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement..."

          You literally just admitted that the quote was "entirely true". It's not fair use because it's "large-scale", literally a judgement made on "volume of production". You are an idiot.

          • by Entrope ( 68843 )

            The title means that infringing copyright involving "vast troves" doesn't make it stop being an infringement of copyright. The logic is right in your second quote: AI companies are engaging in large-scale copyright infringement, not small-scale infringement. It's the core of their business model, not a side line.

      • Re:Interesting take (Score:4, Interesting)

        by buck-yar ( 164658 ) on Monday May 12, 2025 @06:29AM (#65369949)
        Volume was mentioned in court against mp3.com.

        Sep 7, 2000 A federal judge Wednesday ordered MP3.com to pay as much as $250 million to Universal Music Group for violating the record company's copyrights by making thousands of CDs available for listening over the Internet.

        U.S. District Judge Jed S. Rakoff punished the online music-sharing service at $25,000 per CD, saying it was necessary to send a message to Internet companies.

        Universal Music Group, the world's largest record company, had urged a stiff penalty in a case closely watched by Napster and other businesses that share music or other copyrighted material over the Internet.

        The judge said some Internet companies "may have a misconception that, because their technology is somewhat novel, they are somehow immune from the ordinary applications of laws of the United States, including copyright law."

        He added: "They need to understand that the law's domain knows no such limits."

        MP3.com said it will appeal. The company had argued that a penalty of any more than $500 per CD would be a virtual "death sentence."

        Shares of MP3.com were halted before the decision; the most recent trade was at $7.88 per share, down 68.8 cents on the Nasdaq Stock Market. https://www.utdailybeacon.com/... [utdailybeacon.com]

        Imagine $25k per infringement against Meta? Neither can I. They probably lobbied and had the law changed. Or the judge doesn't want to crash the stock market (who doesn't hold meta stock directly or indirectly).

      • Yeah not directly, but it was implicit in their decision. The world has a long LONG standing precedent that if work is transformative it is permitted for fair use, even if the result competes with the original. The only thing new here is someone claiming it shouldn't be the case because of "troves" of data being used.

        Volume is the whole basis for the argument here. You can read a book and write a book with similar story elements and styling, selling it to compete against the one which inspired yours and it wo

        • by pjt33 ( 739471 )

          The "troves" of data being used is volume of consumption, not volume of production, and volume of consumption has always been a factor in fair use and similar concepts. One rule of thumb is that if you're copying more than 5% of the original work, that weighs against the use being fair. (And, anticipating one common argument, legally the training process copies 100% of all the works used, even if less than that ends up encoded directly in the weights).

    • by dfghjk ( 711126 )

      Definitely, one of the most obvious flaws here is the "two rights make a wrong" logic. It's a shitty take on fair use but it serves the interest of the Office.

      At risk of sounding like a victim was raped because she asked for it, here a copyright holder's rights were violated, according to the Office, because they asked for it. You have to realize that, by the standards set forth in the document, copyright infringement doesn't occur when the material is consumed, it occurs before that. It occurs when a by

      • > But this data exists expressly for that purpose and for NO OTHER purpose.
        Like.. the data here is content on a website?

        I think you just defined the difference between consensual sex and gang rape.
  • by el84 ( 10322963 ) on Monday May 12, 2025 @04:09AM (#65369823)
    While it is patently immoral for big internet companies to basically steal the entirety of human creative output in order to train their stupid (so-called) AI models and fill their future pockets, if we don't do it then bad guys with absolutely no morals in far-flung parts of the world will do it anyway. OK, we can have the smug sense that we have done the right thing when we are the penniless vassals of our current global technology competitors, who will remain unnamed out of courtesy.
    • by martin-boundary ( 547041 ) on Monday May 12, 2025 @04:26AM (#65369835)
      While it's immoral to [steal from|rape|torture|slander|kill|enslave] my neighbours, there are people out there, somewhere, who are absolutely willing to [steal from|rape|torture|slander|kill|enslave] my neighbours. I can be smug about not doing it to them myself, but it's just a matter of time until they become victims, so it's really ok if I also [steal from|rape|torture|slander|kill|enslave] my neighbours. Besides, I'm bored thinking about implications.
      • It's also immoral not to free trade with them, so as long as the raping is economically efficient you should rape too, otherwise getting outcompeted is only just.

        Free trade, raping all boats.

      • by dfghjk ( 711126 )

        The rule of law cannot possibly apply to everyone, so it should apply to no one. Except when I'm in power, then what I say goes. And I'll gain that power by [steal from|rape|torture|slander|kill|enslave] my neighbours.

      • You're not wrong. Just because you stick to some sort of moral code or rule of law doesn't mean someone else will.

        So maybe we should apply the rules of law that make our society livable, and just use the AIs coming out of those other places that have no such rules. Then presumably their society implodes under the natural pressures of harvesting the knowledge of their fine peoples and renting it back to them, and we retain a human scale livable society?

        Like your father said to you: Just because your friends
    • No, they wouldn't. The amount of money spent on this shit only makes sense if you're going to widely commercially target the U.S. and the wider West. Bad actors would never be able to come to market.

  • In the report: "Commenters cited several examples of AI tools trained on licensed or public domain content, such as Adobe’s Firefly (an image generator), Boomy (a music generator), Getty Images’ AI image generator, and Stability AI’s Stable Audio (a music generator)."

    Often only the infringers get mentioned.

  • Only to get ridiculed by some AI fanboi assholes, with deranged claims about "learning" and other ludicrous claims. Nice to see the actual experts recognize the problem as well. Take that, AI morons.

    • The Librarian of Congress is an "expert" on AI now? It's quite early, do you always smoke for breakfast? And where can I get that good stuff from.

    • by dfghjk ( 711126 )

      This is a shitty, ignorant take made clearly without reading any of the document. Or worse, maybe you did read it and this is what you came up with.

  • First, the Copyright Office is not part of the judicial branch. They can voice their opinion, but they are not empowered to say what the law is.

    Second, there is clearly a biased narrative at work here. If you look at the very start of the infringement discussion (page 26, Section A) the very first thing you see is allegations about the "right of reproduction", with the Office saying "commenters agreed with or did not dispute that copying during the acquisition and curation process implicates the reproduction right"

  • This looks like an alarming extension of copyright overreach if such restrictions are applied to AI. AI reads content (which may be copyrighted, as this post you are reading is, as nearly everything on the Internet is) and learns from it, and that's how it can process a book and provide a summary within a few minutes.

    If this were an infringement of copyright, basically any form of human learning would also be. Just reviewing a book, a game, anything copyrighted could be construed as infringement and prose

    • by dfghjk ( 711126 )

      Great comments.

      "If an AI generates text that is substantially a copy of a copyrighted training input, that's a copyright breach; but AIs can be trained to avoid this, just like people can - learn the concept, avoid copying the form."

      This is the most important point. Infringement occurs when an AI vomits up sufficiently large portions of a copyrighted work. AIs must be developed to avoid this, as this is what we require of people as well. You can read a book and you can have a photographic memory, you ca

  • Just a whole fuck load of pro-AI astroturfers circle-jerking in here.