Senator Introduces Bill To Compel More Transparency From AI Developers 71

Posted by BeauHD on Monday November 25, 2024 @08:10PM from the held-to-account dept.

A new bill introduced by Sen. Peter Welch (D-Vt) aims to make it easier for human creators to find out if their work was used without permission to train artificial intelligence. NBC News reports: The Transparency and Responsibility for Artificial Intelligence Networks (TRAIN) Act would enable copyright holders to subpoena training records of generative AI models, if the holder can declare a "good faith belief" that their work was used to train the model. The developers would only need to reveal the training material that is "sufficient to identify with certainty" whether the copyright holder's works were used. Failing to comply would create a legal assumption -- until proven otherwise -- that the AI developer did indeed use the copyrighted work. [...]

In a news release, Welch said the TRAIN Act has been endorsed by several organizations -- including the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA), the American Federation of Musicians, and the Recording Academy -- as well as major music labels -- including Universal Music Group, Warner Music Group and Sony Music Group.

Senator Introduces Bill To Compel More Transparency From AI Developers

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 71 Comments Log In/Create an Account

Comments Filter:

What about a bill (Score:5, Insightful)

by Valgrus Thunderaxe ( 8769977 ) writes: on Monday November 25, 2024 @08:23PM (#64972373)

to compel more transparency in the senate? That seems like a far more pressing issue.

- Re: (Score:1)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- The Transparency of Corruption. (Score:2)
  
  by geekmux ( 1040042 ) writes:
  
  to compel more transparency in the senate? That seems like a far more pressing issue.
  When the group of Americans known as Lawmakers and Representatives can stand in front of those they (allegedly) represent and blatantly dismiss insider trading corruption as some kind of fucking job perk, you should know that transparency is hardly the pressing issue. Blatantly allowing and accepting open corruption, is.
  And when it’s out in the open like that, you have a MUCH larger problem than mere corruption. When they don’t even bother hiding the corruption, it’s because they already
  - Re: (Score:2)
    
    by BringsApples ( 3418089 ) writes:
    
    "because they already know you can’t and won’t do a fucking thing about it."
    BullSHIT we can't do anything about it. We can SCREAM on Slashdot about it. Ooo, VOTE, yeah do that too. You can VOTE for another pig. Another thing that we can do is IGNORE it all, and go back to drinking beer and being... ya know, the BEST! Weeeeee yeHAW buddy!!
- Re: (Score:2)
  
  by BringsApples ( 3418089 ) writes:
  
  This will work in that way, once AI runs for senate. Laugh if you want, but it'll happen. Eventually our entire judicial system will run off of a hybrid bio-chip. I shit you, not.
Flawed legal theory (Score:1)

by starworks5 ( 139327 ) writes:

This is alleged subpoena power, to be able to confer the article III standing upon the court issuing the subpoena to compel the records, has to establish that a training an ai model is violating one of the exclusive rights of the copyright holder in 17 USC 106, which you may have noticed does not include using the copyrighted works for training / learning. This means essential hurdle is that machine learning is on the opposite side of the idea/expression paradigm from the creative works themselves, because
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  You're quoting existing law, but congress has power to rewrite the law. That's what one (at least) senator here is trying to do.
  - Re: (Score:2)
    
    by WaffleMonster ( 969671 ) writes:
    
    You're quoting existing law, but congress has power to rewrite the law. That's what one (at least) senator here is trying to do.
    The bill in question does NOT change copyright law.
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      Ok? I have no idea what your point is though.
    - Re: (Score:2)
      
      by geekmux ( 1040042 ) writes:
      
      You're quoting existing law, but congress has power to rewrite the law. That's what one (at least) senator here is trying to do.
      The bill in question does NOT change copyright law.
      Damn shame, because even a moron AI would grasp the fact that our copyright laws are about as valid as the concept of a patent war chest.
    - Re: (Score:2)
      
      by mysidia ( 191772 ) writes:
      
      The bill does not have to. It is already against the law to create a reproduction of somebody else's work.
      That includes private reproductions: Such as copying a DVD for personal reasons - IF your reason for making that copy Or
      anything you do with that reproduction is not protected by an exception such as fair use. Fair use does Not
      protect uses of the work that harm the potential market for the original copyright works by usurping it. Training an AI to that has a chance to generate new works compet
      - Re: (Score:1)
        
        by starworks5 ( 139327 ) writes:
        
        The copyright act does not give a person a monopoly on all creative works, it only creates a monopoly on their specific expression, and it doesn't matter if new works usurp the original work, what matters is a copy of the original authors expression usurps the market for that specific expression. Otherwise websites like Slashdot could not exist, because the news reporters would claim that independently written new news article, learned facts from the original news article and was therefore a "copy" of the o
- Re: (Score:2)
  
  by dfghjk ( 711126 ) writes:
  
  "...which you may have noticed does not include using the copyrighted works for training / learning"
  LOL wut?
  "....an AI model is just a list of statistical measurements from the training data."
  LOL wut?
  Is an exact copy merely a "statistical measurement? By this standard, no copyright holder could control reproduction.
  Bad faith arguments are apparently profitable, but lies are still lies.
  - Re: (Score:1)
    
    by starworks5 ( 139327 ) writes:
    
    I suggest you should read up on the literature
    https://en.wikipedia.org/wiki/Idea%E2%80%93expression_distinction
  - Re: (Score:3)
    
    by WaffleMonster ( 969671 ) writes:
    
    LOL wut?
    This isn't a difficult concept. Copyright law only addresses copying and performing creative works and derivatives not using or otherwise benefiting from.
    Is an exact copy merely a "statistical measurement? By this standard, no copyright holder could control reproduction.
    Copyright is not a grant of exclusive use of information or insights or data. It is a grant of exclusivity to the work itself.
    - Re: (Score:3)
      
      by BadDreamer ( 196188 ) writes:
      
      And copying covered by copyright includes copying from the 'net to your disk, and from your disk to your RAM, and thus into your model trainer.
      Training a generative AI on unlicensed work is breach of copyright and license.
      - Re: (Score:1)
        
        by starworks5 ( 139327 ) writes:
        
        Why do you even need to touch the disk, you can copy it directly to ram and do the calculation in ram, which is essentially what happens anyways because you misapprehend the scale of the data and the problems with disk latency. Either way so does literally every single ISP that transmits packets over the wire, and the courts have thus held that copies such as these are "transitoriy", and therefore not fixed to a tangible medium of expression because no person can actually apprehend the expression in the cop
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        Why do you even need to touch the disk, you can copy it directly to ram and do the calculation in ram,
        I am confused. What part of that doesn't sound like "copying" to you?
        
        Re: (Score:1)
        
        by starworks5 ( 139327 ) writes:
        
        The reproduction right applies to the creation of “copies,” which are defined in 17 U.S.C. 101 as “material objects, . . . in which a work is fixed . . . and from which the work can be perceived, reproduced, or otherwise communicated.” Section 101 further provides that a work is “‘fixed’ in a tangible medium of expression when its embodiment . . . is sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of
        
        Re: (Score:2)
        
        by BadDreamer ( 196188 ) writes:
        
        No, the courts have held that they are transitory because they are functionally transitory and not used in any way except being moved between parties. The speed is irrelevant, and what a person could do isn't a factor.
        Copying for use is not transitory, even if they use is very fast, and even if a person couldn't do that use directly. It's copying in the sense copyright protects against. Even if it is directly to RAM.
      - Re: (Score:2)
        
        by aldousd666 ( 640240 ) writes:
        
        we have prior art, and prior litigation on exactly this premise: Google images and Google Books used the 'copying for the purpose of assimilating' as "fair use" and they won in court.
        
        Re: (Score:2)
        
        by BadDreamer ( 196188 ) writes:
        
        What Google won was the right to assimilate for search purposes to provide a service where the public can find specific books and images, but not access books or read them unless they are licensed to do so. Allowing search was deemed sufficient public good that even though Google creates income from this, it's warranted.
        This ruling was subject to the strict limitations of the service, and the nonexistent use of the material for any other purpose than providing a search service.
        And even then it wasn't clear
    - Re: (Score:2)
      
      by mysidia ( 191772 ) writes:
      
      Copyright law only addresses copying and performing
      Training an AI is not possible without creating an unauthorized copy which is then imported to the training system's memory to start the training process.
      The copy of the work in the training system's RAM is also an unauthorized copy that would not be created within the normal course of playing the work, And finally the version of the work Encoded in the training process' output is argued to be a reproduction.
      To be clear: one way one can way of being liabl
  - Re: (Score:1)
    
    by starworks5 ( 139327 ) writes:
    
    >Is an exact copy merely a "statistical measurement? By this standard, no copyright holder could control reproduction.
    No, it is not possible, for example, to reverse a 6GB image model, and pull out of it any of the 2 petabytes worth of training images out of it, because the model generalizes correlations between the statistics of the features in the training data, and it does not instantiate each one of them into the model weights.
    For example if I have a class of 500 people, and I pull from that the mean
    - Re: (Score:2)
      
      by cmseagle ( 1195671 ) writes:
      
      For example if I have a class of 500 people, and I pull from that the mean and standard deviation, that does not mean that I can reverse the process and tell you the height of each person.
      True, but the fact that you're able to state those statistical facts implies that you somehow got your hands on the heights of everyone in the class. If there were hypothetically laws controlling the way that kind of information is allowed to be acquired and used, you could well be violating them.
- Re: (Score:2)
  
  by mysidia ( 191772 ) writes:
  
  an AI model is just a list of statistical measurements from the training data.
  I disagree. As for the implication that an AI model does not represent creative elements from training data.
  An AI model has a way of encoding all kinds of information within a system that is measured by statistics.
  ANY kind of data can end up encapsulated in such a model through the training process; for example Large Language Models have been known to spit out API keys and secret values that accidentally got leaked into code th
  - Re: (Score:1)
    
    by starworks5 ( 139327 ) writes:
    
    It doesn't matter what the statistical distribution of a color of a real apple or a fictional apple is, that statistic is an idea and not an expression, and therefore outside of the subject matter of copyright.
    see e.g. 17 USC 102(b)
    (b)In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.
    - Re: (Score:2)
      
      by mysidia ( 191772 ) writes:
      
      that statistic is an idea and not an expression, and therefore outside of the subject matter of copyright.
      No. There is nothing in the law that says a unique color distribution is an idea and not tangible expression.
      Copyright law is much more nuanced than you have presented. It is always possible to write down what you call an "idea" that turns out to be so overly specific that it merges with the tangible expression causing the idea to vanish, and it's just an encoding of the tangible expression still pr
she's a witch! (Score:2)

by dfghjk ( 711126 ) writes:

All you need to do is produce material that doesn't exist and if you don't, you're guilty of using material as it was intended. Good thing we'll soon have a convicted felon as president.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by rossdee ( 243626 ) writes:
  
  But only if she weighs the same as a duck, duck, go
Prove a negative much? (Score:3)

by jvkjvk ( 102057 ) writes: on Monday November 25, 2024 @09:24PM (#64972497)

So they have to turn over everything.
Otherwise, how do you know that it wasn't in what wasn't turned over?

- Re: (Score:2)
  
  by misnohmer ( 1636461 ) writes:
  
  And of course it's going to be so much data, this data will have to be fed into another AI to prove said negative. If this other AI happens to retain some of the training materials, well, there is no choice, cannot un-ring a bell. Now, I bet Google thinks Open AI trained their models on at least one Google content. Here comes the subpoena for all of Open AI's training data.
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  Otherwise, how do you know that it wasn't in what wasn't turned over?
  If the LLM quotes something from your work and they didn't turn that over, then you know they lied. It might not happen, but it definitely can happen.
  - Re: (Score:2)
    
    by HiThere ( 15173 ) writes:
    
    How long a quote? For a sufficiently short quote that's definitely wrong. Even some longer quotes are constrained by context into only a few forms.
Transparency for Who(m), or What? (Score:2)

by geekmux ( 1040042 ) writes:

Who's the transparency for? The human owner forced by rule of transparency law to tell the truth, or the AI subtlety learning to hide truths from even the human owner?
(Hopefully we don’t forget to corral the other mind we’re developing here. Preferably before “it” realizes just how easy it is to fool and lie to humans. Every time.)
- Re: (Score:2)
  
  by aldousd666 ( 640240 ) writes:
  
  Guilty until proven innocent
  I believe that's why this bill cannot actually become law, and if it does, it'll be tossed by the supreme court.
Regulatory Capture at work... (Score:3)

by tiqui ( 1024021 ) writes: on Monday November 25, 2024 @11:03PM (#64972655)

Let's see here... ANY copyright holder (or hundreds of them or thousands of them...) will be able to issue legal subpoenas to creators of AI projects (who will, of course, want lawyers to handle the raft of subpoenas in order to avoid stepping upon legal landmines) which will, in turn, essentially force the AI developers to prove their innocence.
Well, companies like Apple and Google have tons of staff lawyers to deal with such stuff and will be able to afford all the legal overhead costs; they'll probably setup departments to build and maintain databases of evidence of their innocence and to mine and use that info in routine responses to the subpoenas.
If Tom and Harry and Sue setup an AI start-up in a garage somewhere, however...
It won't be worth it for anybody who is not a megacorp to play in the AI space if a law like this goes into effect.

- Maybe...or maybe not (Score:2)
  
  by Roger W Moore ( 538166 ) writes:
  
  ANY copyright holder ... will be able to issue legal subpoenas to creators of AI projects
  Yes, but the consequence for refusing such a subpoena is only the legal assumption that the work was used to train the model. Since the use of works for training and learning is not a right granted by copyright there seems to be little consequence to refusing the subpoena unless I am missing something.
  - ah, people forget stuff, but the law does not. (Score:2)
    
    by tiqui ( 1024021 ) writes:
    
    "copyright" is the legal right to make copies.
    We all usually see this as a right to make, publish, and distribute for sale, books (and more recently, films, videos, and audio recordings). The law, however, does not make this rather narrow interpretation. Copyright law in the United States is actually in the Constitution itself and pre-dates any form of recording - it's very broad and simply covers the copying of a published work. Any clever lawyer in some future AI related case will easily be able to show t
- Re: (Score:2)
  
  by strikethree ( 811449 ) writes:
  
  It won't be worth it for anybody who is not a megacorp to play in the AI space if a law like this goes into effect.
  Congratulations! You figured out the intent of the law. Now, what are you going to do about it? Absolutely nothing. Have a nice day. :)
What about derived/synthetic text? (Score:2)

by Visarga ( 1071662 ) writes:

I can use a LLM to reword texts, they won't come up in a n-gram based search. They can only find similar ideas not expression. What do they do then? Technically, it won't be training on copyrighted data, but the model would learn all the same.
How would that work? (Score:2)

by SoftwareArtist ( 1472499 ) writes:

The developers would only need to reveal the training material that is "sufficient to identify with certainty" whether the copyright holder's works were used.
If you didn't use their works, what are you supposed to reveal? The only way they could "identify with certainty" you didn't use their works is to provide them the entire training dataset, plus the complete training code including all random number seeds, plus the weights of your trained model, so they can repeat the complete training process (costing ~$100 million for a state of the art LLM) and verify they end up with exactly the same model. Unless there's anything nondeterministic in the training proce
- Re: (Score:2)
  
  by aldousd666 ( 640240 ) writes:
  
  yeah this won't pass the constitutional smell test. There's no such thing as 'guilty until you prove you are innocent' in the USA, and let's hope it stays that way
Taking it further... (Score:3)

by physicsphairy ( 720718 ) writes: on Tuesday November 26, 2024 @12:55AM (#64972759)

Can we hold the copyright conglomerates to the same standard and they have to prove, e.g., that every movie produced is not reliant on no member of the production having seen any previous copyrighted movie?
Notably, according to the US Constitution, the raison d’être of copyright is
To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries;
So if generative AI is more prolific in those domains, arguably, we may have reached the end of the public utility of patents and copyright.

- Re: (Score:1)
  
  by starworks5 ( 139327 ) writes:
  
  That may be, but model weights are not in the subject matter of copyright at all see e.g. 17 USC 102(b)
  (b)In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.
- Re: (Score:2)
  
  by BadDreamer ( 196188 ) writes:
  
  Generative AI trained on stolen works is not more prolific in science and useful arts, so this is as of now science fiction.
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  arguably, we may have reached the end of the public utility of patents and copyright.
  Yeah but you have to go to quite some effort to prove it, since the courts give congress wide range to their judgement to interpret that phrase.
  
  It seems very unlikely that the courts are going to allow LLMs to be trained without some kind of compensation to copyright holders.
Payola (Score:2)

by bradley13 ( 1118935 ) writes:

Too many ill-informed bills are started, because dome lobbyist made a fat contribution. This is obviously one of them. Said Senator doesn't know an LLM from an M&M, but we can pretty much guess what names are on the check.
DMCA 2.0 (Score:2)

by codebase7 ( 9682010 ) writes:

Here we go again. Yet another way for Hollywood, the world's biggest organized thieves guild (by their own definitions), to veto anything they dislike.

It doesn't take a rocket scientist to figure out how this will be abused. Every AI that comes out will have claims made against it by Hollywood the second any single artTiest thinks, in their own subjective opinion, that some random output kinda sorta maybe could be confused for one of their own works. Then the AI developer will have to completely reveal th
- Re: (Score:2)
  
  by geekmux ( 1040042 ) writes:
  
  Here we go again. Yet another way for Hollywood, the world's biggest organized thieves guild (by their own definitions), to veto anything they dislike. It doesn't take a rocket scientist to figure out how this will be abused. Every AI that comes out will have claims made against it by Hollywood the second any single artTiest thinks, in their own subjective opinion, that some random output kinda sorta maybe could be confused for one of their own works. Then the AI developer will have to completely reveal the entire training set to the courts to prove otherwise. (Note, the law says "sufficient to identify with certainty" but we all know "certainty" to Hollywood will mean absolute certainty for the purposes of calculating Hollywood's paychec.....err... damages.) Now repeat this thousands of times over for each AI system made.
  IANAL, but you know that whole “any resemblance to living” line you often see in movie credits? Or how the ENTIRE professional impersonator profession is still a legal one, and Hollywood isn’t credited with throwing droves of impersonators in prison by now?
  Is the a valid reason Hollywood cannot be told to fuck right off and go choke on that same legal dick of valid defense, backed by precedent?
Yeah, that will work (Score:2)

by gweihir ( 88907 ) writes:

The "AI developers" have no clue themselves...
- Re: (Score:2)
  
  by phantomfive ( 622387 ) writes:
  
  In the worst case, they can turn over their entire training set.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    They can. But will they without really knowing what is in there?
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      From my reading, it doesn't matter, as long as they turn over what they have.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        There may be things illegal to have in there. And it does not even need to be CP.
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      btw, it would be kind of weird if the training set wasn't organized in some way that would allow a person to find documents.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Why would it be? This is LLM training data. You essentially just dump it in.
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        You collect your training data so you can experiment retraining it with different methods. This is pretty normal for neural networks.
Presumption of innocence? (Score:2)

by aldousd666 ( 640240 ) writes:

This bill should, and hopefully will, get tossed if it gets passed because there is no 'guilty until proven innocent.'

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

What about a bill (Score:5, Insightful)

Re: (Score:1)

The Transparency of Corruption. (Score:2)

Re: (Score:2)

Re: (Score:2)

Flawed legal theory (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:3)

Re: (Score:3)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

she's a witch! (Score:2)

Re: (Score:2)

Re: (Score:2)

Prove a negative much? (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Transparency for Who(m), or What? (Score:2)

Re: (Score:2)

Regulatory Capture at work... (Score:3)

Maybe...or maybe not (Score:2)

ah, people forget stuff, but the law does not. (Score:2)

Re: (Score:2)

What about derived/synthetic text? (Score:2)

How would that work? (Score:2)

Re: (Score:2)

Taking it further... (Score:3)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Payola (Score:2)

DMCA 2.0 (Score:2)

Re: (Score:2)

Yeah, that will work (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Presumption of innocence? (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals