The Courts

Getty Asks London Court To Stop UK Sales of Stability AI System (reuters.com) 29

Stock photo provider Getty Images has asked London's High Court for an injunction to prevent artificial intelligence company Stability AI from selling its AI image-generation system in Britain, court filings show. From a report: The Seattle-based company accuses Stability AI of breaching its copyright by using its images to "train" its Stable Diffusion system, according to the filing dated May 12. Stability AI has yet to file a defence to Getty's lawsuit, but filed a motion to dismiss Getty's separate U.S. lawsuit last month. It did not immediately respond to a request for comment.

Comments Filter:
  • by gweihir ( 88907 ) on Friday June 02, 2023 @01:23PM (#63571231)

    And it matters very much where you got it. As the current crop of hyped, trained "AI" systems needs excessive amounts of training data, this will be a real issue for all of them.

    • I was about to make this same statement. The larger battle will be over what training data is used, and AI systems MUST be required to disclose this going forward. You may even see works placed in the public domain with the sole limitation being a restriction on AI training usage.
      • by mysidia ( 191772 )

        and AI systems MUST be required to disclose this going forward

        Why? No. The only way you can make them disclose is if they decided to keep the data after training; then, if they're being sued, they can be asked for the data during discovery.

        If they discarded the data after training and kept no records afterwards, then it's gone, and they not only can't be required to disclose -- there physically is no way to disclose what they don't have.

        There's no law that can force a company that creates digital works to retain...

        • by Bobknobber ( 10314401 ) on Friday June 02, 2023 @02:55PM (#63571475)

          In SD’s case, though, they might have shot themselves in the foot.
          A recently released research paper claims to utilize the infamous LAION dataset used by Stable Diffusion:

          https://arxiv.org/pdf/2306.006... [arxiv.org]

          “The authors wish to express their thanks to Stability AI Inc. for providing generous computational resources for our experiments and LAION gemeinnütziger e.V. for dataset access and support.”

          Granted, this could have been distributed before they were told to delete the dataset, but it does confirm that copies of the datasets used by popular image-generation models abound, even if only used for research. And it’s likely that, if need be, prosecutors can obtain these datasets through other means.

          In any case, I could see lawmakers demanding that companies retain their datasets so that it can be verified the models were not trained on copyrighted materials. Having no dataset available could potentially be grounds for having the model destroyed and replaced/retrained. And even if companies try to be sneaky, all it takes is a whistleblower/leaker to expose the training set before it gets deleted. We’ve already seen it with the Meta leaks as an example.
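
          A rough sketch of what such a retention requirement could look like in practice, using only the Python standard library: a manifest of one content hash per training file, so an auditor can later verify exactly which items a model was trained on. The directory path here is a hypothetical placeholder.

          ```python
          # Sketch of a training-data manifest for auditability: one SHA-256
          # digest per file, written to JSON. A regulator or court could later
          # compare these digests against a disputed work.
          import hashlib
          import json
          from pathlib import Path

          def build_manifest(data_dir: str, out_file: str = "manifest.json") -> None:
              manifest = {}
              for path in sorted(Path(data_dir).rglob("*")):
                  if path.is_file():
                      digest = hashlib.sha256(path.read_bytes()).hexdigest()
                      manifest[str(path)] = digest
              Path(out_file).write_text(json.dumps(manifest, indent=2))

          # build_manifest("training_data/")  # hypothetical dataset directory
          ```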

          • What's your point? This is a research paper, not a commercial product. The use of LAION's dataset for non-commercial research purposes is a slam dunk.

        • by noodler ( 724788 )

          No. The only way you can make them disclose is if they decided to keep the data after training;

          I think you can probe the network to produce information that would make it clear that particular works were present in the training data.
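
          A minimal sketch of one such probe, a loss-based membership-inference test: models tend to assign lower loss to items they were trained on than to similar unseen items. `load_image` and `model.loss` are hypothetical stand-ins here, not any real library's API.

          ```python
          # Hypothetical membership-inference probe. If the candidate image gets
          # a much lower loss than comparable reference images the model never
          # saw, that is evidence it was present in the training set.

          def membership_score(model, candidate_path, reference_paths):
              """Z-score of the candidate's loss against reference losses."""
              candidate_loss = model.loss(load_image(candidate_path))  # stand-in API
              ref_losses = [model.loss(load_image(p)) for p in reference_paths]
              mean = sum(ref_losses) / len(ref_losses)
              var = sum((l - mean) ** 2 for l in ref_losses) / len(ref_losses)
              std = var ** 0.5 or 1e-9
              # Strongly negative score => suspiciously well memorized.
              return (candidate_loss - mean) / std
          ```

          Published extraction attacks against diffusion models work roughly along these lines, combined with prompting the model to regenerate near-duplicates of suspected training images.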

    • by Tailhook ( 98486 )

      "AI" systems needs excessive amounts of training data

      Today.

      Tomorrow it won't. This is probably the key remaining unsolved problem with machine learning. Thing is it won't always be true. There are too many areas of knowledge that lack vast quantities of cheap data for training, so the need to solve this is compelling. There is no reason to think it is insurmountable; pick up a stick and smack a dog on the nose with it, and that's the last time that dog will take you carrying a stick for granted. One lesson: learning complete.

      Machine learning...

      • by mysidia ( 191772 )

        Tomorrow it won't. This is probably the key remaining unsolved problem with machine learning. Thing is it won't always be true.

        You are assuming that a computational solution can exist - that making a modified ML algorithm which learns better from less data is even a generally feasible problem to solve in the first place. That is not necessarily true.

        There's in fact a chance that general machine learning algorithms cannot be updated to make them better at using less data within feasible hardware costs...

      • by gweihir ( 88907 )

        Actually, it will. How much training data a statistical model needs for a certain performance does not depend on technology; it is a purely mathematical question. The real advances fuelling the current hype come from the ability to use more training data as training has gotten faster.

        Also, after 70 years of research and industrial application, calling machine learning "nascent" and "primitive" seems pretty inappropriate.

        • It's pretty well understood from the loss functions that current training methods are *extremely* inefficient mathematically. We are pretty certain, but can't prove, that there is huge scope for improvement.

          There ARE limitations on ML arising out of thermodynamics and economics. The fact that GPT-4 is so expensive to run compared to GPT-3.x (GPT-2 can be run on a domestic CPU [albeit slowly], and they can pretty much give access to 3.5 away, but 4 carries a seriously steep price per month) indicates prob...
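
          A back-of-the-envelope illustration of why per-query cost tracks model size, using the standard rule of thumb of roughly 2 FLOPs per parameter per generated token for a dense transformer. The parameter counts are illustrative guesses (OpenAI has not published GPT-4's), not established figures.

          ```python
          # Rough inference-cost comparison for dense transformer LLMs.
          # Rule of thumb: one forward pass costs ~2 * params FLOPs per token.
          # Parameter counts below are illustrative assumptions only.
          MODELS = {
              "GPT-2-class": 1.5e9,
              "GPT-3-class": 175e9,
              "GPT-4-class (guess)": 1.0e12,
          }

          TOKENS = 1000  # one medium-length reply

          for name, params in MODELS.items():
              flops = 2 * params * TOKENS
              print(f"{name}: ~{flops:.1e} FLOPs per {TOKENS}-token reply")
          ```

          On that rough arithmetic, a trillion-parameter model costs several hundred times more compute per reply than a GPT-2-class model, which is consistent with the pricing gap described above.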

      • by Luckyo ( 1726890 )

        Today's "AI" is technically defined as "Machine Learning and Big Data". Notice the complete absence of "Artificial Intelligence" in that definition, because this isn't intelligent. It's brute force compute to learn relationship between all data points, and the accuracy comes from Big Data. The Bigger the Data the better the outcome.

        What you're talking about is AGI, where the machine actually develops self-awareness and is able to process data like humans do, removing the need for Big Data. We have not the faintest clue how to make one.

        • by gweihir ( 88907 )

          What you're talking about is AGI, where the machine actually develops self-awareness and is able to process data like humans do, removing the need for Big Data. We have not the faintest clue how to make one.

          Ah, that was what he meant. Yes, we do indeed have not the faintest clue how to make that. There is not even a credible theory at this time, and humans have been looking into this with some intensity, using an actually scientific approach, for more than half a century now.

    • by Luckyo ( 1726890 )

      Utterly irrelevant though. Copyright specifically does not forbid learning from copyrighted material.

      The reason is so obvious, it's strange that there are people who genuinely entertain the idea to the contrary. Everything creative that exists today came into being because the people who made it learned from the totality of humanity's production that came before them. You cannot ban access to learning material without breaking civilization as it exists today, as it would sever the link between knowledge...

      • by gweihir ( 88907 )

        First, ChatAI does not "learn" in the legal sense. What it does is data and parameter extraction. That is fundamentally different. Learning, in the legal sense, requires insight and hence a sentient entity. (If you dispute that I will simply ignore you.)

        Hence everything ChatAI produces is derivative work. There are labelling requirements, in some cases the sources have to be stated, some stuff must be marked as citation, and there are rather tight limits on what is still fair use, especially so when the result is...

        • by Luckyo ( 1726890 )

          I'm intrigued to see where "learning" is defined in such an extremely limited way in legal code, and how that part of legal code is relevant to copyright.

        • First, ChatAI does not "learn" in the legal sense. What it does is data and parameter extraction. That is fundamentally different. Learning, in the legal sense, requires insight and hence a sentient entity. (If you dispute that I will simply ignore you.)

          I know you said you will ignore this, but...

          "In the legal sense" means that a law, or a legal finding, establishes the meaning.

          Unless you can cite a law that defines learning as such, or a court case that finds this to be the definition, it is not.

          In the scientific sense, Machine Learning (a subset of the field of study known as Artificial Intelligence) is a field devoted to understanding and building methods that let machines "learn" – that is, methods that leverage data to improve computer performance on some set of tasks.
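
          As a toy illustration of that scientific definition (performance improving as the method leverages more data), a minimal, runnable sketch using scikit-learn; the dataset and classifier choices are arbitrary:

          ```python
          # Demonstrates "learning" in the ML sense: test accuracy improves as
          # the model is given more training data, with no hand-coded rules.
          from sklearn.datasets import load_digits
          from sklearn.linear_model import LogisticRegression
          from sklearn.model_selection import train_test_split

          X, y = load_digits(return_X_y=True)
          X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

          for n in (50, 200, len(X_train)):
              clf = LogisticRegression(max_iter=2000).fit(X_train[:n], y_train[:n])
              print(f"{n:5d} training examples -> test accuracy {clf.score(X_test, y_test):.2f}")
          ```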

        • Simple test: with the individual copyrighted work itself excluded from the training data, if the output is still the same, or substantially similar to the output which caused the initial complaint, that is enough to prove the output is not actually derivative of the complainant's copyrighted work.

          This would be the ML equivalent of what Ed Sheeran recently demonstrated in terms of chord progressions: how almost any new song or musical performance can sound similar to what someone else has already...
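
          A sketch of just the comparison step of that test, assuming the expensive part (retraining without the disputed image) has already been done. It uses the real Pillow and imagehash libraries (`pip install imagehash`); `generate`, the two models, and the prompt are hypothetical stand-ins.

          ```python
          # Compare the output of the original model against a model retrained
          # without the disputed work. A small perceptual-hash distance means
          # the outputs are substantially similar even without that work.
          import imagehash  # operates on PIL.Image objects

          def substantially_similar(img_a, img_b, threshold: int = 8) -> bool:
              # Hamming distance between perceptual hashes; smaller = more alike.
              return imagehash.phash(img_a) - imagehash.phash(img_b) <= threshold

          # Hypothetical sampling calls; generate() is a stand-in, not a real API.
          out_original = generate(original_model, prompt)
          out_retrained = generate(retrained_model, prompt)
          if substantially_similar(out_original, out_retrained):
              print("Same output without the work: weakens the derivation claim.")
          ```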
          • by gweihir ( 88907 )

            Ah, no. You would need to re-train the whole thing, at very large expense, and it would need to be done in a clean room. It would also need to be done with exactly the same software, and the results would be invalid if, say, they patched something not precisely documented partway through the training process.
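
            The reproducibility concern is real: without pinning every source of randomness (and the exact library versions), two training runs diverge. A minimal sketch of the usual determinism knobs, assuming a PyTorch training setup:

            ```python
            # Pin the common sources of nondeterminism in a PyTorch run. Even
            # with all of these set, results are only reproducible on identical
            # software and hardware, which is the clean-room point above.
            import random

            import numpy as np
            import torch

            def make_deterministic(seed: int = 0) -> None:
                random.seed(seed)
                np.random.seed(seed)
                torch.manual_seed(seed)
                torch.use_deterministic_algorithms(True)
            ```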

    • by MrL0G1C ( 867445 )

      Are Getty et al. going to sue every artist for copyright infringement who so much as looked at their copyrighted works and was heavily influenced by them? Because that is essentially what this amounts to.

      I lost any respect for copyrights when the term became too long. Copyrights were supposedly granted to encourage creators to create, with the knowledge that they could be rewarded before their work went into the public domain - where creative works naturally belong.

  • by RossCWilliams ( 5513152 ) on Friday June 02, 2023 @03:05PM (#63571523)
    So is a student training themselves by going to a museum to study the art there a violation of the museum's or the artists' intellectual property? What the argument about using "intellectual property" to train AI reveals is the flaw in the whole notion of property rights for non-tangible goods. Whole art movements can be traced to violations of the "intellectual property" of their predecessors.
    • by Luckyo ( 1726890 )

      Exactly. I'm going to repost the same thing I posted elsewhere in this thread, because their argument is not just insane, it's anti-progress:

      "Copyright specifically does not forbid learning from copyrighted material.

      The reason is so obvious, it's strange that there are people who genuinely entertain the idea to the contrary. Everything creative that exists today came into being because the people who made it learned from the totality of humanity's production that came before them. You cannot ban access to learning material without breaking civilization as it exists today, as it would sever the link between knowledge..."

    • Agreed; if humans can learn from data, AI can too. AI, at least, won't do weird things like humans do (copyright infringement, perjury, etc.).
    • From a legal perspective, whether it learns like a human or not is irrelevant. Copyright, IP, etc. are human rights, not animal or machine rights. It falls on those claiming it learns like a human to actually prove it is human.

      If the machine is not exactly like a human, then it will not be considered as such and therefore cannot claim any of the same rights and benefits as a human. Just being able to emulate an aspect of the human mind does not grant it the same privileges a human creator has. Otherwise we would...

      • This means that the data of every individual, institution, and enterprise is being treated as fair game for training.

        How do you think a baby learns to interact with the world? Should it pay money to every person it ever hears speaking for the privilege of learning English, just because those persons could be saying something original?

        • Firstly, a baby is considered a human, not a machine. Comparisons between a machine and the human mind are irrelevant precisely because they are two entirely different things from a legal standpoint. There are many legal precedents that emphasize the protection of human-created works, not animal- or machine-created ones.

          Secondly, this is about data accountability, privacy, and trust. LLMs are tools, and as such need to be used in a way that does not endanger, exploit, or harm people. The tech industry has done a poor...

  • by Travelsonic ( 870859 ) on Friday June 02, 2023 @03:11PM (#63571545) Journal
    Just a pedantic reminder... this is the same Getty that will probably argue (or is already arguing?) that the re-creation of watermarks proves infringement, while ignoring, or leaving out, that there are also lots of public domain images on their site with Getty's watermark smeared all over them (images that Getty tries to license out), which seems really weird on so many levels.

    Getty is not an ally, IMO; Getty is scum.
    • by Luckyo ( 1726890 )

      To be fair, they had a far better case with that than they have with this. Copyright specifically does not constrain one's ability to learn from any material. It's well understood that such a limit would destroy the natural progress of civilization, which requires those who come in the future to be able to learn from the totality of knowledge gained in the past in order to progress.

      There are literal educational exceptions even within actual, existing copyright law. Not for this hypothetical "right to block learning from..."
