


US Copyright Office to AI Companies: Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Works' (yahoo.com)
Business Insider tells the story in three bullet points:
- Big Tech companies depend on content made by others to train their AI models.
- Some of those creators say using their work to train AI is copyright infringement.
- The U.S. Copyright Office just published a report that indicates it may agree.
The office released on Friday its latest in a series of reports exploring copyright laws and artificial intelligence. The report addresses whether the copyrighted content AI companies use to train their AI models qualifies under the fair use doctrine. AI companies are probably not going to like what they read...
AI execs argue they haven't violated copyright laws because the training falls under fair use. According to the U.S. Copyright Office's new report, however, it's not that simple. "Although it is not possible to prejudge the result in any particular case, precedent supports the following general observations," the office said. "Various uses of copyrighted works in AI training are likely to be transformative. The extent to which they are fair, however, will depend on what works were used, from what source, for what purpose, and with what controls on the outputs — all of which can affect the market."
The office made a distinction between AI models for research and commercial AI models. "When a model is deployed for purposes such as analysis or research — the types of uses that are critical to international competitiveness — the outputs are unlikely to substitute for expressive works used in training," the office said. "But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries."
The report says outputs "substantially similar to copyrighted works in the dataset" are less likely to be considered transformative than when the purpose "is to deploy it for research, or in a closed system that constrains it to a non-substitutive task."
"A day after the office released the report, President Donald Trump fired its director, Shira Perlmutter, a spokesperson told Business Insider."
I cannot see this stopping the AI spiders (Score:4, Insightful)
Effective enforcement will not be easy.
The other problem that the spiders do is to overload servers. They do not seem to be gentle in the way that most search engine spiders are.
Re:I cannot see this stopping the AI spiders (Score:5, Interesting)
Penalties, per the FBI warning on home VHS tapes: 5 years, a $250,000 fine, and a felony conviction, so you lose your voting and gun rights for life. If it's per violation, some AI execs might be looking at hundreds of years in prison and fines exceeding the market capitalization of these large tech companies.
Re: (Score:3, Interesting)
Indeed. Also, under some conditions, LLMs can be made to regurgitate parts of their training data....
The second thing is, you must delete all the stolen content. And that means the whole LLM in this case.
Re: (Score:2)
Suchir tried that.
Re: (Score:3)
Effective enforcement is easy. Just make some severe penalties for doing it, and crucially for using AI that has been trained on unlicensed material. Then any business that wants to sell AI services will need to certify that they trained it legally, because their customers will demand it for fear of being hit by penalties themselves.
The same rules will apply to foreign made AIs of course.
The EU does that and it has proven successful with things like GDPR.
Re: I cannot see this stopping the AI spiders (Score:2)
Re: I cannot see this stopping the AI spiders (Score:2)
Repeat after me (Score:5, Insightful)
Re: (Score:2)
So of course it is downvoted.
Any mod who did so should be banned immediately.
Re: Repeat after me (Score:2)
Re: Repeat after me (Score:2)
This place is generally screwed. It doesn't work, it isn't well managed, you can't block posters, it allows anonymous posts and you can't block those either. Broken by design.
Re: (Score:3)
you can't block posters, it allows anonymous posts and you can't block those either.
I mean, you can't "block" anything, but you effectively can. Change your comment modifiers [slashdot.org] and set ACs to -6, set "foes" to -6, mark commenters you don't like as foes, and browse at 0 or above. You will no longer see AC comments or comments from people you don't like.
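A rough sketch of why this works, assuming comment scores run from -1 to 5 and that a modifier is simply added to a comment's score before it is compared with your browsing threshold. This is not Slashdot's actual code; the names `Comment`, `effective_score`, and `visible` are made up for illustration.

```python
from dataclasses import dataclass

AC = "Anonymous Coward"  # assumed display name for anonymous posts


@dataclass
class Comment:
    author: str
    score: int  # moderation score, assumed to range from -1 to 5


def effective_score(comment, foes, ac_modifier=-6, foe_modifier=-6):
    """Apply the reader's personal modifiers to a comment's score."""
    score = comment.score
    if comment.author == AC:
        score += ac_modifier
    if comment.author in foes:
        score += foe_modifier
    return score


def visible(comments, foes, threshold=0):
    """Keep only comments whose adjusted score meets the browsing threshold."""
    return [c for c in comments if effective_score(c, foes) >= threshold]


thread = [
    Comment(AC, 5),        # top-rated anonymous post: 5 - 6 = -1, hidden
    Comment("troll", 5),   # top-rated foe: 5 - 6 = -1, hidden
    Comment("alice", 2),   # unaffected, shown
    Comment("bob", -1),    # already below threshold, hidden
]
shown = [c.author for c in visible(thread, foes={"troll"})]
```

Under those assumptions, -6 is enough because even a comment at the maximum score of 5 ends up at -1, below a threshold of 0.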
Re: (Score:2)
I mean, you can't "block" anything, but you effectively can.
You can't but you can? Take two steps forward, and two steps back, and then two steps forward, and two steps back, and now we're doing the cha-cha.
I want their content to disappear as if it never was, whether it was modded up or not. That would be blocking, unlike what you propose.
Re: (Score:3)
Take two steps forward, and two steps back, and then two steps forward, and two steps back, and now we're doing the cha-cha.
Thank you, Chris Knight.
I want their content to disappear as if it never was, whether it was modded up or not. That would be blocking, unlike what you propose.
Yes, I understand--that is not something you can do. You can, however, effectively achieve this by following the instructions given.
You know, the adverb form of "effect." Specifically, definition 2, "in effect; virtually" which rests upon definition 4 of the root word, "the power to bring about a result." So, to answer your snarky question, yes, in effect you can't but you can.
Is your brain not firing on all cylinders this morning, or are you just purposefully being obtuse?
Re: (Score:3)
It is not factual. In the US, copyright is meant to benefit both the author and mankind: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."
It (deliberately?) ignored that in order to encourage authorship and still benefit society, copyrights are granted for limited terms, after which the works become public domain. Let the AIs freely train on all the
Re: (Score:3)
It parses as "To promote A, by securing (for limited times) to B thing C"
where
A = The Progress of Science and useful Arts
B = Authors and Inventors
C = the exclusive Right to their respective Writings and Discoveries
i.e. The law is to promote A.
It does that by securing C for B.
i.e. The law is to promote Progress of Science and useful Arts.
It does that by securing the exclusive Right to their respective Writings and Discoveries for Authors and Invento
Re: (Score:2)
Re: (Score:2)
If objective fairness were the policy you would have been gone long ago.
Re: (Score:2)
IOW Fuck off.
Re: (Score:2)
But they don't need to, copyright is about making copies of the cake, not eating shit.
Re: Repeat after me (Score:5, Insightful)
It may be a "valid" response, but it's hardly compelling. Your argument is that copyright should be completely upended and author protections should simply vanish into an LLM. To make that argument, you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind.
Re: Repeat after me (Score:3)
A moderate amount of compensation and protection wouldn't be bad. However, the current "annuity for my great-great-grandchildren" is anything but. It is the product of lobbying by the biggest corporations, does nothing for authors and artists, and only ensures rentseeking is profitable while stifling innovation and new arts. Sampling in hip hop being a case in point, but there are hundreds of examples if you search a bit.
Rent seeking is a problem because it eventually paralyzes any economy. And it is expeci
Re: (Score:2)
It's not "completely upended". It's currently perfectly legal for a human to read all the copyrighted works they want, learn from them, and produce their own works based on what they've read. It's not immediately obvious that a machine doing the same thing is contrary to the principles of copyright. Of course in either case if the output is too closely based on a small set of works then it's an infringement, but that doesn't mean that the training itself is.
That's not to say that there isn't an issue of cou
Re: (Score:2)
Did "author protections" get "upended" when you read a book? How is it different when it's an LLM doing the reading?
Your response is neither valid nor compelling.
"..you're going to have to start at first principles to explain why compensating artists is no longer beneficial to mankind."
No, all you need to do is point out the hypocrisy of your position. You believe the benefit is for you, not for them. It's pulling up the ladder, nothing more.
Re: (Score:2)
Starting from first principles here is good, it shows how a well-read author is distinct from wholesale monetized copyright infringement.
Re: Repeat after me (Score:3)
Agreed. It is a valid discussion and reducing it to a black and white generalization is absurd.
A complete win for Western content creators would likely leave AI development and advancement crippled compared to countries where it is unfettered. Our content creators can sip their kombuchas while foreign AI dominates the future.
A complete win for AI companies would likely result in continued, flagrant abuse of created content for profit in a manner which competes with the content creators. Doesn't seem righ
Re: (Score:2)
Because you see everything as zero sum. AI companies could win on merit, we don't have to accept your false choice.
Re: (Score:2)
Re: Repeat after me (Score:2)
That statement isn't supported by the history of copyright. It would have been easier to not have copyright, if you wanted to censor things.
Re: (Score:2)
The origin of copyright law in most European countries lies in efforts by the church and governments to regulate and control the output of printers.[10] Before the invention of the printing press, a writing, once created, could only be physically multiplied by the highly laborious and error-prone process of manual copying by scribes. An elaborate system of censorship and control over scribes did not exist, as scribes were scattered and worked on single manuscripts.[11] Printing allowed for multiple exact copies of a work, leading to a more rapid and widespread circulation of ideas and information (see print culture).[10] In 1559 the Index Expurgatorius, or List of Prohibited Books, was issued for the first time.[11]
https://en.wikipedia.org/wiki/... [wikipedia.org]
(and btw, case in point, another factually correct statement downvoted by zealots)
Re:Repeat after me (Score:4, Interesting)
That's not an argument, that's just emoting.
How is that different in substance from "maybe copyright infringement should be allowed because I don't want to pay for a newspaper subscription. That would benefit mankind, i.e. me."?
Re: (Score:2)
It's different because the argument is that you get access to the articles for free, but the AI company doesn't. Straw manning the argument is not a win, data being scraped for training isn't behind a "newspaper subscription", and if it were, AI use of the articles should be fine, by your standard, if the company paid the subscription.
Re: (Score:2)
Copyright laws and licenses govern more than just what somebody pays for the copyrighted work. Your argument is based on ignoring violations of licenses, all the secondary copying that goes on during training, and the propensity of AI models to regurgitate training material.
Re:Repeat after me (Score:4, Insightful)
Well, maybe it should if the resulting models are available and also under fair-use. Most are not, hence criminal commercial copyright infringement.
Re: (Score:2)
You think "fair use" is a license? LOL
"Most are not" LOLOL says who? You think AI publishers get to decide whether you get "fair use".
You never fail to impress with the stupid.
Re: (Score:2)
How pathetic. You do not even have basic reading comprehension. Dumb and aggressive. Nice!
Re: (Score:3)
Re: (Score:2)
there are no natural rights, someone with a gun can make a very valid point about your right to life too.
Copyright is being abused (Score:2)
Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators. And to anyone saying "copyright does not allow use by AI" an answer of "well, maybe it should" is very valid.
Copyright is being abused and it's no longer about encouraging innovation. It concentrates wealth and is a key driver of inequality. It has been extended and extended. It went from a reasonable 7 years to 14 years renewable to 28, then life of the author plus 50 to life plus 70 years.
It was extended to buildings which is completely ridiculous. The industry tried but failed to extend it to clothing. There is absolutely no reason why an AI shouldn't read a work because it doesn't compete by selling a copy of
Re: (Score:2)
Copyright is not a natural right. It is a privilege given to content creators for the benefit of mankind not for the benefit of content creators
Absolutely agree. Copyright's purpose is to give creative people with good ideas an incentive to realise or at least flesh out those ideas. Its purpose is not to give people money for nothing - but of course people will never stop trying to find something that gives them exactly that.
What I very much dispute, however, is that so-called AI is a benefit to mankind. As I see it, it's not only not beneficial but very dangerous. It has potential to bring great harm to mankind
Let's start by assuming that just t
Entrumpy keeps one guessing (Score:1)
Which mean the report could soon flip 180.
Correction (Score:1)
"means"
Re: (Score:2)
Also a nice example of what deep government corruption looks like.
People to US Copyright office (Score:5, Insightful)
95 years is a fucking abomination.
Copyright is not fit for its purpose of encouraging creativity.
Re:People to US Copyright office (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2)
People of the world to the US Copyright office: 95 years is a fucking abomination. Copyright is not fit for its purpose of encouraging creativity.
The alternative was apparently "forever minus one day" as proposed by Jack Valenti many years ago. So it's better than that. And at least it's finally settled. But yeah, 95 years is too long. Actually, in a few cases it will end up being over 100 years due to the way the law was written, but that only applies to some music in the 1940s if I remember correctly.
Interesting take (Score:5, Interesting)
Copyright law has never considered speed or volume of production, yet now the copyright office is claiming that precisely this implicates fair use. That said I'm right there with them when it comes to illegal access.
How much did grandma have to pay for downloading one mp3? I hope Meta pays the same amount multiplied by all the works they pirated.
Re: (Score:2)
I've only read the summary, but I'm not seeing anything related to speed or volume of production. Am I overlooking something?
Re: (Score:2)
No, you're not. The assertion you quoted is entirely false; it is a straw man to stand in for the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement of a very traditional character.
Re: (Score:2)
Except it's not, think about it. If you remove the word "troves" from the quote, then the entire argument is basically completely at odds with decades of established case law saying such work would be permitted under copyright rules. You can use materials to inspire new works and sell those works competing with the original. That is something that has been permitted since the beginning, it is the basis of fair use.
If the transformative aspect is the same, and the commercial aspect is the same, then there's
Re: (Score:2)
Fair use is you reading a book and maybe even applying knowledge you gleaned.
The vast industrial scale of harvesting the web is the defining difference between you reading a book and Big AI slurping it up as training data.
Re: (Score:2)
If you remove the "troves" bit from the quote, then the argument becomes:
But making commercial use of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.
This is still trivially true under copyright law. If you want to point to supposed decades of established case law, then do so. Argument by unbacked assertion is a fallacy. Inspiration is separate from "fair use" -- and there are decades of litigation over where the boundaries are for characters or plots "inspired by" copyrighted material.
Re: (Score:2)
IT IS LITERALLY IN THE TITLE! "Fair Use Isn't 'Commercial Use of Vast Troves of Copyrighted Data'."
What the fuck do you think that means, you moron?
"...the Copyright Office's observation that AI companies are engaging in large-scale copyright infringement..."
You literally just admitted that the quote was "entirely true". It's not fair use because it's "large-scale", literally a judgement made on "volume of production". You are an idiot.
Re: (Score:2)
The title means that infringing copyright involving "vast troves" doesn't make it stop being an infringement of copyright. The logic is right in your second quote: AI companies are engaging in large-scale copyright infringement, not small-scale infringement. It's the core of their business model, not a side line.
Re:Interesting take (Score:4, Interesting)
Sep 7, 2000 A federal judge Wednesday ordered MP3.com to pay as much as $250 million to Universal Music Group for violating the record company's copyrights by making thousands of CDs available for listening over the Internet.
U.S. District Judge Jed S. Rakoff punished the online music-sharing service at $25,000 per CD, saying it was necessary to send a message to Internet companies.
Universal Music Group, the world's largest record company, had urged a stiff penalty in a case closely watched by Napster and other businesses that share music or other copyrighted material over the Internet.
The judge said some Internet companies "may have a misconception that, because their technology is somewhat novel, they are somehow immune from the ordinary applications of laws of the United States, including copyright law."
He added: "They need to understand that the law's domain knows no such limits."
MP3.com said it will appeal. The company had argued that a penalty of any more than $500 per CD would be a virtual "death sentence."
Shares of MP3.com were halted before the decision; the most recent trade was at $7.88 per share, down 68.8 cents on the Nasdaq Stock Market. https://www.utdailybeacon.com/... [utdailybeacon.com]
Imagine $25k per infringement against Meta? Neither can I. They probably lobbied and had the law changed. Or the judge doesn't want to crash the stock market (who doesn't hold Meta stock directly or indirectly?).
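Back-of-the-envelope check on the figures in the quoted article, assuming the "as much as $250 million" ceiling is just the $25,000 per-CD award multiplied by the number of CDs at issue. The article only says "thousands of CDs", so the count below is inferred, not reported.

```python
# Figures taken from the quoted article
award_per_cd = 25_000          # dollars per CD set by the judge
max_total = 250_000_000        # "as much as $250 million"
defense_cap_per_cd = 500       # MP3.com's claimed survivable per-CD penalty

# Inferred: how many CDs would produce the maximum total at $25,000 each
implied_cds = max_total // award_per_cd

# What the same number of CDs would cost at the $500 figure MP3.com argued for
total_at_defense_cap = implied_cds * defense_cap_per_cd

print(f"Implied CDs: {implied_cds:,}")
print(f"Total at $500 per CD: ${total_at_defense_cap:,}")
```

On that assumption the ceiling corresponds to about 10,000 CDs, and the $500 figure would have put the total around $5 million, a 50x difference.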
Re: (Score:2)
Yeah not directly, but it was implicit in their decision. The world has a long LONG standing precedent that if work is transformative it is permitted for fair use, even if the result competes with the original. The only thing new here is someone claiming it shouldn't be the case because of "troves" of data being used.
Volume is the whole basis for the argument here. You can read a book and write a book with similar story elements and styling selling it to compete against the one which inspired yours and it wo
Re: (Score:2)
The "troves" of data being used is volume of consumption, not volume of production, and volume of consumption has always been a factor in fair use and similar concepts. One rule of thumb is that if you're copying more than 5% of the original work, that weighs against the use being fair. (And, anticipating one common argument, legally the training process copies 100% of all the works used, even if less than that ends up encoded directly in the weights).
Re: (Score:2)
Definitely, one of the most obvious flaws here is the "two rights make a wrong" logic. It's a shitty take on fair use but it serves the interest of the Office.
At risk of sounding like a victim was raped because she asked for it, here a copyright holder's rights were violated, according to the Office, because they asked for it. You have to realize that, by the standards set forth in the document, copyright infringement doesn't occur when the material is consumed, it occurs before that. It occurs when a by
Re: (Score:2)
Like... the data here is content on a website?
I think you just defined the difference between consensual sex and gang rape.
The problem isn't only about tech giants (Score:3, Insightful)
Re:The problem isn't only about tech giants (Score:5, Insightful)
Re: (Score:2)
It's also immoral not to free trade with them, so as long as the raping is economically efficient you should rape too, otherwise getting outcompeted is only just.
Free trade, raping all boats.
Re: (Score:2)
The rule of law cannot possibly apply to everyone, so it should apply to no one. Except when I'm in power, then what I say goes. And I'll gain that power by [steal from|rape|torture|slander|kill|enslave] my neighbours.
Re: (Score:2)
So maybe we should apply the rules of law that make our society livable, and just use the AIs coming out of those other places that have no such rules. Then presumably their society implodes under the natural pressures of harvesting the knowledge of their fine peoples and renting it back to them, and we retain a human scale livable society?
Like your father said to you: Just because your friends
Re: The problem isn't only about tech giants (Score:2)
No, they wouldn't. The amount of money spent on this shit only makes sense if you're going to widely commercially target the U.S. and the wider West. Bad actors would never be able to come to market.
Re: (Score:3)
I voted for Trump because he said he would hurt the right people, he won't hurt me!
Nice to see the licensed models mentioned for once (Score:3)
In the report: "Commenters cited several examples of AI tools trained on licensed or public domain content, such as Adobe’s Firefly (an image generator), Boomy (a music generator), Getty Images’ AI image generator, and Stability AI’s Stable Audio (a music generator)."
Often only the infringers get mentioned.
I have said that for ages (Score:2, Insightful)
Only to get ridiculed by some AI fanboi assholes, with deranged claims about "learning" and other ludicrous claims. Nice to see the actual experts recognize the problem as well. Take that, AI morons.
Re: (Score:2)
The Librarian of Congress is an "expert" on AI now? It's quite early, do you always smoke for breakfast? And where can I get that good stuff from.
Re: (Score:2)
No. An expert on copyright. Try to keep up.
Re: (Score:2)
This is a shitty, ignorant take made clearly without reading any of the document. Or worse, maybe you did read it and this is what you came up with.
Re: (Score:2)
Hahaha, no. I am just not as dumb and disconnected as you are.
not impressed (Score:2)
First, the Copyright Office is not part of the judicial branch. They can voice their opinion, but they are not empowered to say what the law is.
Second, there is clearly a biased narrative at work here. If you look at the very start of the infringement discussion (page 26, Section A) the very first thing you see is allegations of "right of reproduction", with the Office saying "commenters agreed with or did not dispute that copying during the acquisition and curation process implicates the reproduction right"
Dangerous extension of copyright concept (Score:2)
This looks like an alarming extension of copyright overreach if such restrictions are applied to AI. AI reads content (which may be copyrighted, as this post you are reading is, as nearly everything on the Internet is) and learns from it, and that's how it can process a book and provide a summary within a few minutes.
If this were an infringement of copyright, basically any form of human learning would also be. Just reviewing a book, a game, anything copyrighted could be construed as infringement and prose
Re: (Score:2)
Great comments.
"If an AI generates text that is substantially a copy of a copyrighted training input, that's a copyright breach; but AIs can be trained to avoid this, just like people can - learn the concept, avoid copying the form."
This is the most important point. Infringement occurs when an AI vomits up sufficiently large portions of a copyrighted work. AIs must be developed to avoid this, as this is what we require of people as well. You can read a book and you can have a photographic memory, you ca
Wow. (Score:1)
Just a whole fuck load of pro-AI astroturfers circle-jerking in here.