US Copyright Office to AI Companies: Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Works' (yahoo.com)

Business Insider tells the story in three bullet points:

- Big Tech companies depend on content made by others to train their AI models.

- Some of those creators say using their work to train AI is copyright infringement.

- The U.S. Copyright Office just published a report that indicates it may agree.

The office released on Friday its latest in a series of reports exploring copyright laws and artificial intelligence. The report addresses whether the copyrighted content AI companies use to train their AI models qualifies under the fair use doctrine. AI companies are probably not going to like what they read...

AI execs argue they haven't violated copyright laws because the training falls under fair use. According to the U.S. Copyright Office's new report, however, it's not that simple. "Although it is not possible to prejudge the result in any particular case, precedent supports the following general observations," the office said. "Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs — all of which can affect the market."

The office made a distinction between AI models for research and commercial AI models. "When a model is deployed for purposes such as analysis or research — the types of uses that are critical to international competitiveness — the outputs are unlikely to substitute for expressive works used in training," the office said. "But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."

The report says outputs "substantially similar to copyrighted works in the dataset" are less likely to be considered transformative than when the purpose "is to deploy it for research, or in a closed system that constrains it to a non-substitutive task."

Business Insider adds that "A day after the office released the report, President Donald Trump fired its director, Shira Perlmutter, a spokesperson told Business Insider."


Comments Filter:
  • Effective enforcement will not be easy.

    The other problem is that the spiders overload servers. They do not seem to be gentle in the way that most search engine spiders are.

    • by buck-yar ( 164658 ) on Monday May 12, 2025 @05:28AM (#65369907)
      Sure it is. Encourage whistleblowers to come forward. "I did this" from a software engineer in court. There might even be hard evidence in communications, as some of these employees knew what they were doing was wrong and raised objections.

      Penalties, per the FBI warning on home VHS tapes: 5 years, a $250,000 fine, and a felony, so you lose your voting and gun rights for life. If it's per violation, some AI execs might be looking at hundreds of years in prison and fines exceeding these large tech companies' entire market capitalization.

    • by AmiMoJo ( 196126 ) on Monday May 12, 2025 @07:25AM (#65370029) Homepage Journal

      Effective enforcement is easy. Just create severe penalties for doing it and, crucially, for using AI that has been trained on unlicensed material. Then any business that wants to sell AI services will need to certify that it trained its models legally, because its customers will demand it for fear of being hit by penalties themselves.

      The same rules will apply to foreign-made AIs, of course.

      The EU does that and it has proven successful with things like GDPR.

    • And it certainly won't stop Chinese spiders... So even though it seems right and a win for content providers to stop the spiders, it will just give China a huge advantage.
    • by mysidia ( 191772 ) on Monday May 12, 2025 @11:06AM (#65370679)

      The other problem is that the spiders overload servers. They do not seem to be gentle

      There is also a technical issue that calls for a technical solution IMO.

      Most web servers are based on open-source programs such as Apache or Nginx. My suggestion is that those projects develop mitigations against automated crawlers and spiders, and ship a default configuration that spanks abusive ones hard while still allowing well-known crawlers within reasonable limits.

      Essentially, people's web server daemons should have functionality added to identify and classify crawlers, and to penalize or block those crawlers or IP addresses identified as having acted in certain shady manners, including:

      1. Crawlers that make an excessive number of requests per second or per minute.

      2. Crawlers that fail to maintain a stable, distinctive User-Agent string, especially any crawler spoofing a standard browser UA string, and crawler IPs known to have taken on another major provider's UA string, such as a non-Google robot using a Googlebot UA.

      3. Crawlers that never request robots.txt, or that disregard it by crawling a directory it does not allow (if present) or one it explicitly disallows.

      I'm suggesting that standard web server software gain modules to detect these cases and share IP address and User-Agent information with centralized trackers, so that crawlers can be locally classified as violative.

      Finally, they should add functionality so that DNSBLs and shared classification repositories can be used to automatically block IP addresses and UA strings that others have flagged as nefarious or unruly crawlers; see the sketch below for the kind of checks involved.
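
      (A sketch of the sort of module being proposed, for illustration only: this is a stand-alone Python example rather than an Apache or Nginx module, and every name and threshold in it -- CrawlerScorer, MAX_REQ_PER_MIN, the Googlebot address prefix -- is a hypothetical placeholder, not anything that exists today.)

      import time
      from collections import defaultdict, deque

      MAX_REQ_PER_MIN = 120   # assumed rate ceiling before an IP is penalized
      UA_CHURN_LIMIT = 5      # assumed number of distinct UAs tolerated per IP

      class CrawlerScorer:
          def __init__(self):
              self.requests = defaultdict(deque)   # ip -> timestamps of recent requests
              self.user_agents = defaultdict(set)  # ip -> distinct User-Agent strings seen
              self.fetched_robots = set()          # ips that have requested /robots.txt
              self.blocked = set()

          def observe(self, ip, path, user_agent):
              """Record one request; return True if the client should be blocked."""
              if ip in self.blocked:
                  return True
              now = time.monotonic()

              # 1. Excessive requests per minute (sliding one-minute window).
              window = self.requests[ip]
              window.append(now)
              while window and now - window[0] > 60:
                  window.popleft()
              if len(window) > MAX_REQ_PER_MIN:
                  self.blocked.add(ip)
                  return True

              # 2. Unstable or spoofed User-Agent strings.
              self.user_agents[ip].add(user_agent)
              if len(self.user_agents[ip]) > UA_CHURN_LIMIT:
                  self.blocked.add(ip)
                  return True
              if "Googlebot" in user_agent and not ip.startswith("66.249."):
                  # A real module would verify via reverse DNS, not a prefix check.
                  self.blocked.add(ip)
                  return True

              # 3. Heavy crawling without ever fetching robots.txt.
              if path == "/robots.txt":
                  self.fetched_robots.add(ip)
              elif len(window) > 30 and ip not in self.fetched_robots:
                  self.blocked.add(ip)
                  return True

              return False

      A real deployment would hook something like this into the server's request path and, as suggested above, publish its verdicts to shared DNSBL-style repositories.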

  • Repeat after me (Score:5, Insightful)

    by blahabl ( 7651114 ) on Monday May 12, 2025 @03:45AM (#65369783)
    Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.
    • by reanjr ( 588767 ) on Monday May 12, 2025 @05:15AM (#65369893) Homepage

      It may be a "valid" response, but it's hardly compelling. Your argument is that copyright should be completely upended and author protections should simply vanish into an LLM. To make that argument, you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind.

      • A moderate amount of compensation and protection wouldn't be bad. However, the current "annuity for my great-great-grandchildren" is anything but. It is the product of lobbying by the biggest corporations, does nothing for authors and artists, and only ensures rent-seeking is profitable while stifling innovation and new arts. Sampling in hip hop being a case in point, but there are hundreds of examples if you search a bit.

        Rent-seeking is a problem because it eventually paralyzes any economy. And it is expeci

      • It's not "completely upended". It's currently perfectly legal for a human to read all the copyrighted works they want, learn from them, and produce their own works based on what they've read. It's not immediately obvious that a machine doing the same thing is contrary to the principles of copyright. Of course in either case if the output is too closely based on a small set of works then it's an infringement, but that doesn't mean that the training itself is.
        That's not to say that there isn't an issue of cou

        • in either case if the output is too closely based on a small set of works then it's an infringement,

          These bots are getting better at guardrails for speech and not saying things that are factually false. Like if you ask a newer LLM how many Rs are in "strawberry", it will produce a convincing but wrong answer but then correct itself mid-output.

          Like a human, there should be "self-awareness" when there are violations, and avoidance should be programmed in. But also, as with a human employee who does work for hire and unintentionally copies something - the hiring company is allowed to be sued rather than

      • by dfghjk ( 711126 )

        Did "author protections" get "upended" when you read a book? How is it different when it's an LLM doing the reading?

        Your response is neither valid nor compelling.

        "..you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind."

        No, all you need to do is point out the hypocrisy of your position. You believe the benefit is for you, not for them. It's pulling up the ladder, nothing more.

        • An LLM is different from an individual because said individual cannot read entire libraries in minutes, and cannot programmatically regurgitate those works en masse.

          Starting from first principles here is good, it shows how a well-read author is distinct from wholesale monetized copyright infringement.
          • Great examples. Here’s another: the individual doesn’t have hundreds of millions of users issuing billions of queries like “write me a short story in the style of Author X because I want to read one but don’t want to buy a copy.”

            Or “write me a summary of Topic T using all the info you learned from Published Books X Y and Z, because I don’t want to buy a copy.”

            • Or “write me a thesis of the current thought around Theory T by summarizing all the info you learned from Published Theses X, Y and Z, because I don’t want to buy a copy.”

              Which is how science works.
      • I had a thought while reading what you wrote:

        you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind.

        The thought was this: Why do artists think they deserve compensation? If someone had asked for the art, then the person who asked should have paid or why would the artist do such a thing and expect money?

        I only responded because you said, "let's go back to First Principles"

    • by PseudoThink ( 576121 ) on Monday May 12, 2025 @05:20AM (#65369901)

      Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.

      A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.

      A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem right, either.

      • by tlhIngan ( 30335 )

        Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.

        A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.

        A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem right,

        • So instead of life+70, the copyright term can be reduced to something reasonable

          I think that I would accept life+70 or 30 years from the point that a work is commercially successful, whichever is shorter. It's complex, but reasonable. However, it should come with strengthened trademark laws regarding use of public-domain works.

          This would prevent someone else from profiting off your work before you can. If you suddenly find fame and success after decades of hard work, it wouldn't be right for the clock to have already run out.

          So Disney would still be the primary Mickey owner even if someone else can give away o

    • Re:Repeat after me (Score:4, Interesting)

      by Entrope ( 68843 ) on Monday May 12, 2025 @06:13AM (#65369931) Homepage

      That's not an argument, that's just emoting.

      How is that different in substance from "maybe copyright infringement should be allowed because I don't want to pay for a newspaper subscription. that would benefit mankind, i.e. me."?

      • by dfghjk ( 711126 )

        It's different because the argument is that you get access to the articles for free, but the AI company doesn't. Straw manning the argument is not a win, data being scraped for training isn't behind a "newspaper subscription", and if it were, AI use of the articles should be fine, by your standard, if the company paid the subscription.

        • by Entrope ( 68843 )

          Copyright laws and licenses govern more than just what somebody pays for the copyrighted work. Your argument is based on ignoring violations of licenses, all the secondary copying that goes on during training, and the propensity of AI models to regurgitate training material.

    • Re:Repeat after me (Score:4, Insightful)

      by gweihir ( 88907 ) on Monday May 12, 2025 @06:28AM (#65369947)

      Well, maybe it should if the resulting models are available and also under fair-use. Most are not, hence criminal commercial copyright infringement.

      • by dfghjk ( 711126 )

        You think "fair use" is a license? LOL

        "Most are not" LOLOL says who? You think AI publishers get to decide whether you get "fair use".

        You never fail to impress with the stupid.

        • by gweihir ( 88907 )

          How pathetic. You do not even have basic reading comprehension. Dumb and aggressive. Nice!

    • by msauve ( 701917 )
      I see your argument. Ownership of any sort of property is not a natural right. So someone with a bigger gun taking your stuff is very valid.
      • by znrt ( 2424692 )

        there are no natural rights, someone with a gun can make a very valid point about your right to life too.

    • Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.

      Copyright is being abused and it's no longer about encouraging innovation. It concentrates wealth and is a key driver of inequality. It has been extended and extended. It went from a reasonable 7 years to 14 years renewable to 28, then life of the author plus 50, then life plus 70 years.

      It was extended to buildings, which is completely ridiculous. The industry tried but failed to extend it to clothing. There is absolutely no reason why an AI shouldn't read a work because it doesn't compete by selling a copy of

    • Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators

      Absolutely agree. Copyright's purpose is to give creative people with good ideas an incentive to realise or at least flesh out those ideas. Its purpose is not to give people money for nothing - but of course people will never stop trying to find something that gives them exactly that.

      What I very much dispute, however, is that so-called AI is a benefit to mankind. As I see it, it's not only not beneficial but very dangerous. It has the potential to bring great harm to mankind.

      Let's start by assuming that just t

    • I'm out of mod points, so I will just have to applaud this and link, once again, Thomas Babington Macaulay's 1841 speech to the House of Commons on this subject:

      https://www.thepublicdomain.org/2014/07/24/macaulay-on-copyright/ [thepublicdomain.org]
  • by greytree ( 7124971 ) on Monday May 12, 2025 @03:51AM (#65369793)
    People of the world to the US Copyright office:

    95 years is a fucking abomination.

    Copyright is not fit for its purpose of encouraging creativity.
  • Interesting take (Score:5, Interesting)

    by thegarbz ( 1787294 ) on Monday May 12, 2025 @04:07AM (#65369821)

    Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use. That said, I'm right there with them when it comes to illegal access.

    How much did grandma have to pay for downloading one mp3? I hope Meta pays the same amount multiplied by all the works they pirated.

    • by pjt33 ( 739471 )

      Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use.

      I've only read the summary, but I'm not seeing anything related to speed or volume of production. Am I overlooking something?

      • by Entrope ( 68843 )

        No, you're not. The assertion you quoted is entirely false; it is a straw man standing in for the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement of a very traditional character.

        • Except it's not; think about it. If you remove the word "troves" from the quote, then the entire argument is completely at odds with decades of established case law saying such work would be permitted under copyright rules. You can use materials to inspire new works and sell those works competing with the original. That is something that has been permitted since the beginning; it is the basis of fair use.

          If the transformative aspect is the same, and the commercial aspect is the same, then there's

          • Thank you very much.
            Fair use is you reading a book and maybe even applying knowledge you gleaned.
            The vast industrial scale of harvesting the web is the defining difference between you reading a book and Big AI slurping it up as training data.
          • by Entrope ( 68843 )

            If you remove the "troves" bit from the quote, then the argument becomes:

            But making commercial use of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.

            This is still trivially true under copyright law. If you want to point to supposed decades of established case law, then do so. Argument by unbacked assertion is a fallacy. Inspiration is separate from "fair use" -- and there are decades of litigation over where the boundaries are for characters or plots "inspired by" copyrighted material.

        • by dfghjk ( 711126 )

          IT IS LITERALLY IN THE TITLE! "Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Works'."

          What the fuck do you think that means, you moron?

          "...the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement..."

          You literally just admitted that the quote was "entirely true". It's not fair use because it's "large-scale", literally a judgement made on "volume of production". You are an idiot.

          • by Entrope ( 68843 )

            The title means that infringing copyright involving "vast troves" doesn't make it stop being an infringement of copyright. The logic is right in your second quote: AI companies are engaging in large-scale copyright infringement, not small-scale infringement. It's the core of their business model, not a side line.

      • Re:Interesting take (Score:5, Interesting)

        by buck-yar ( 164658 ) on Monday May 12, 2025 @06:29AM (#65369949)
        Volume was mentioned in court against mp3.com.

        Sep 7, 2000 A federal judge Wednesday ordered MP3.com to pay as much as $250 million to Universal Music Group for violating the record company's copyrights by making thousands of CDs available for listening over the Internet.

        U.S. District Judge Jed S. Rakoff punished the online music-sharing service at $25,000 per CD, saying it was necessary to send a message to Internet companies.

        Universal Music Group, the world's largest record company, had urged a stiff penalty in a case closely watched by Napster and other businesses that share music or other copyrighted material over the Internet.

        The judge said some Internet companies "may have a misconception that, because their technology is somewhat novel, they are somehow immune from the ordinary applications of laws of the United States, including copyright law."

        He added: "They need to understand that the law's domain knows no such limits."

        MP3.com said it will appeal. The company had argued that a penalty of any more than $500 per CD would be a virtual "death sentence."

        Shares of MP3.com were halted before the decision; the most recent trade was at $7.88 per share, down 68.8 cents on the Nasdaq Stock Market. https://www.utdailybeacon.com/... [utdailybeacon.com]

        Imagine $25k per infringement against Meta? Neither can I. They probably lobbied and had the law changed, or the judge doesn't want to crash the stock market (and who doesn't hold Meta stock, directly or indirectly?).
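
        (For scale, the arithmetic implied by that award, and what the same per-work rate would mean for a purely hypothetical one million infringed works -- an illustrative figure, not a reported one:)

        \[
        \frac{\$250{,}000{,}000}{\$25{,}000 \text{ per CD}} = 10{,}000 \text{ CDs}, \qquad \$25{,}000 \times 1{,}000{,}000 \text{ works} = \$25{,}000{,}000{,}000.
        \]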

      • Yeah, not directly, but it was implicit in their decision. The world has a long, LONG-standing precedent that if a work is transformative it is permitted as fair use, even if the result competes with the original. The only thing new here is someone claiming it shouldn't be the case because of "troves" of data being used.

        Volume is the whole basis for the argument here. You can read a book and write a book with similar story elements and styling, selling it to compete against the one which inspired yours, and it wo

        • by pjt33 ( 739471 )

          The "troves" of data being used is volume of consumption, not volume of production, and volume of consumption has always been a factor in fair use and similar concepts. One rule of thumb is that if you're copying more than 5% of the original work, that weighs against the use being fair. (And, anticipating one common argument, legally the training process copies 100% of all the works used, even if less than that ends up encoded directly in the weights).

        • Yeah, except it doesn't, in fact, actually generate works. It regurgitates bits of other people's work, often verbatim, and because those other people's works were taken without compensation, any model trained on that literally stolen data is violating copyright law.

          Again, this has nothing to do with volume, stolen data is stolen data whether you're stealing a small amount or large.

          Personally I hard agree that any model trained on stolen data should be made available for similar "fair use" access by
    • by dfghjk ( 711126 )

      Definitely, one of the most obvious flaws here is the "two rights make a wrong" logic. It's a shitty take on fair use but it serves the interest of the Office.

      At risk of sounding like a victim was raped because she asked for it, here a copyright holder's rights were violated, according to the Office, because they asked for it. You have to realize that, by the standards set forth in the document, copyright infringement doesn't occur when the material is consumed, it occurs before that. It occurs when a by

      • > But this data exists expressly for that purpose and for NO OTHER purpose.
        Like... the data here is content on a website?

        I think you just defined the difference between consensual sex and gang rape.
  • by el84 ( 10322963 ) on Monday May 12, 2025 @04:09AM (#65369823)
    While it is patently immoral for big internet companies to basically steal the entirety of human creative output in order to train their stupid (so-called) AI models and fill their future pockets, if we don't do it then bad guys with absolutely no morals in far-flung parts of the world will do it anyway. OK we can have the smug sense that we have done the right thing when we are the penniless vassals of our current global technology competitors, who will remain unnamed out of courtesy.
    • by martin-boundary ( 547041 ) on Monday May 12, 2025 @04:26AM (#65369835)
      While it's immoral to [steal from|rape|torture|slander|kill|enslave] my neighbours, there are people out there, somewhere, who are absolutely willing to [steal from|rape|torture|slander|kill|enslave] my neighbours. I can be smug about not doing it to them myself, but it's just a matter of time until they become victims, so it's really ok if I also [steal from|rape|torture|slander|kill|enslave] my neighbours. Besides, I'm bored thinking about implications.
      • It's also immoral not to free trade with them, so as long as the raping is economically efficient you should rape too, otherwise getting outcompeted is only just.

        Free trade, raping all boats.

      • by dfghjk ( 711126 )

        The rule of law cannot possibly apply to everyone, so it should apply to no one. Except when I'm in power, then what I say goes. And I'll gain that power by [steal from|rape|torture|slander|kill|enslave] my neighbours.

      • You're not wrong. Just because you stick to some sort of moral code or rule of law doesn't mean someone else will.

        So maybe we should apply the rules of law that make our society livable, and just use the AI's coming out of those other place that have no such rules. Then presumably their society implodes under the natural pressures of harvesting the knowledge of their fine peoples and renting it back to them, and we retain a human scale livable society?

        Like your father said to you: Just because your friends
    • No, they wouldn't. The amount of money spent on this shit only makes sense if you're going to widely commercially target the U.S. and the wider West. Bad actors would never be able to come to market.

    • OK we can have the smug sense that we have done the right thing when we are the penniless vassals

      You were going to be a penniless vassal one way or the other. What difference does it make how it happens?

  • by balaam's ass ( 678743 ) on Monday May 12, 2025 @06:16AM (#65369933) Journal

    In the report: "Commenters cited several examples of AI tools trained on licensed or public domain content, such as Adobe’s Firefly (an image generator), Boomy (a music generator), Getty Images’ AI image generator, and Stability AI’s Stable Audio (a music generator)."

    Often only the infringers get mentioned.

  • Only to get ridiculed by some AI fanboi assholes, with deranged claims about "learning" and other ludicrous claims. Nice to see the actual experts recognize the problem as well. Take that, AI morons.

    • The Librarian of Congress is an "expert" on AI now? It's quite early, do you always smoke for breakfast? And where can I get that good stuff from?

    • by dfghjk ( 711126 )

      This is a shitty, ignorant take made clearly without reading any of the document. Or worse, maybe you did read it and this is what you came up with.

  • First, the Copyright Office is not part of the judicial branch. They can voice their opinion, but they are not empowered to say what the law is.

    Second, there is clearly a biased narrative at work here. If you look at the very start of the infringement discussion (page 26, Section A), the very first thing you see is allegations about the "right of reproduction", with the Office saying "commenters agreed with or did not dispute that copying during the acquisition and curation process implicates the reproduction right"

  • by orzetto ( 545509 ) on Monday May 12, 2025 @07:45AM (#65370075)

    This looks like an alarming extension of copyright overreach if such restrictions are applied to AI. AI reads content (which may be copyrighted, as this post you are reading is, as nearly everything on the Internet is) and learns from it, and that's how it can process a book and provide a summary within a few minutes.

    If this were an infringement of copyright, basically any form of human learning would also be. Just reviewing a book, a game, anything copyrighted could be construed as infringement and prosecuted. Parodies, tributes, quotations. Imagine Leni Riefenstahl suing George Lucas for the final scene of the original Star Wars.

    If an AI generates text that is substantially a copy of a copyrighted training input, that's a copyright breach; but AIs can be trained to avoid this, just like people can - learn the concept, avoid copying the form.
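
    (One way to picture "avoid copying the form": a guardrail that rejects any output sharing a long verbatim word run with indexed training text. The sketch below is a minimal, hypothetical Python illustration of that idea; the index, threshold, and function names are assumptions, not a description of how any actual model's guardrail works.)

    NGRAM = 8  # assumed length, in words, above which verbatim overlap counts as copying

    def ngrams(text, n=NGRAM):
        # Every n-word sequence in the text, lowercased for matching.
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def build_index(training_docs):
        # Index every n-word sequence appearing in the training corpus.
        index = set()
        for doc in training_docs:
            index |= ngrams(doc)
        return index

    def copies_training_text(candidate_output, index):
        # True if the candidate output reproduces any indexed sequence verbatim.
        return not ngrams(candidate_output).isdisjoint(index)

    A generated sentence that lifts an eight-word run from a training document would be flagged and regenerated, while a paraphrase of the same idea would pass -- which is roughly the "learn the concept, avoid copying the form" distinction.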

    The report of the Copyright Office contains the following statement on page 26:

    The steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction. Developers make multiple copies of works by downloading them; transferring them across storage mediums; converting them to different formats; and creating modified versions or including them in filtered subsets. In many cases, the first step is downloading data from publicly available locations, but whatever the source, copies are made—often repeatedly.

    That's the same way any browser operates. For that matter, a lot of browsers pre-fetch linked pages, so that copies are made locally before any action is taken by the user. Proxy servers also make local copies of often-requested files. If this is infringement, anyone who has ever accessed the Internet is a criminal. What if you move a legally-owned copyrighted file from one hard disk partition to another? That would technically require creating a copy.

    In practice, the line is drawn when you start distributing (other people's) copyrighted works, which is also the only enforceable line. That is what should be required of AI engines.

    Obviously the real reason is something else: owners of copyrighted work do not want AI to learn their concepts and re-express them (which has always been legal for humans), because their customers will find it easier to ask the AI than to pay for/read the original documents themselves, busting their business model.

    • by dfghjk ( 711126 )

      Great comments.

      "If an AI generates text that is substantially a copy of a copyrighted training input, that's a copyright breach; but AIs can be trained to avoid this, just like people can - learn the concept, avoid copying the form."

      This is the most important point. Infringement occurs when an AI vomits up sufficiently large portions of a copyrighted work. AIs must be developed to avoid this, as this is what we require of people as well. You can read a book and you can have a photographic memory, you ca

  • ...reading books and using the knowledge commercially?
    None of the training data is copied

    • by ledow ( 319597 )

      It's not.

      If a human regurgitates vast portions of a copyright work - whether directly from memory or otherwise - and then sells it as part of a commercial service to other people, they will get in just as much trouble for being outside the bounds of fair-use.

      This isn't about "what an AI can do" vs "what a human can do", it's literally about "is the company's end usage covered under fair use", and wholesale regurgitation of source data (books) can be coaxed out of all LLMs *and* some of these companies are d

  • ... only outlaws will have AIs.

    Welcome to the William Gibson future.
