The Courts

Judge Dismisses Lawsuit Over GitHub Copilot AI Coding Assistant (infoworld.com) 83

A US District Court judge in San Francisco has largely dismissed a class-action lawsuit against GitHub, Microsoft, and OpenAI, which challenged the legality of using code samples to train GitHub Copilot. The judge ruled that the plaintiffs failed to establish a claim for restitution or unjust enrichment but allowed the claim for breach of open-source license violations to proceed. InfoWorld reports: The lawsuit, first filed in Nov. 2022, claimed that GitHub's training of the Copilot AI on public GitHub code repositories violated the rights of the "vast number of creators" who posted code under open-source licenses on GitHub. The complaint (PDF) alleged that "Copilot ignores, violates, and removes the Licenses offered by thousands -- possibly millions -- of software developers, thereby accomplishing software piracy on an unprecedented scale." [...]

In a decision first announced on June 24, but only unsealed and made public on July 5, California Northern District judge Jon S. Tigar wrote that "In sum, plaintiffs' claims do not support the remedy they seek. Plaintiffs have failed to establish, as a matter of law, that restitution for any unjust enrichment is available as a measure of plaintiffs' damages for their breach of contract claims." Judge Tigar went on to state that "[the] court dismisses plaintiffs' section 1202(b) claim, this time with prejudice. The Court declines to dismiss plaintiffs' claim for breach of contract of open-source license violations against all defendants. Finally, the court dismisses plaintiffs' request for monetary relief in the form of unjust enrichment, as well as plaintiffs' request for punitive damages."

This discussion has been archived. No new comments can be posted.

Judge Dismisses Lawsuit Over GitHub Copilot AI Coding Assistant

Comments Filter:
  • by Rujiel ( 1632063 ) on Wednesday July 10, 2024 @03:03AM (#64614991)
    "...therefore there are no damages and Microsoft wins again."
    • by nsbfikwjuunkifjqhm ( 8274554 ) on Wednesday July 10, 2024 @03:43AM (#64615051)
      When you're a tech company, they let you do it. You can do anything.
    • In the same way, you cannot prove that your single vote influences the outcome of an election. Is that a reason to abolish voting altogether?
      • In the same way, you cannot prove that your single vote influences the outcome of an election. Is that a reason to abolish voting altogether?

        Yes, you can prove your single vote influences the outcome of an election. Several cases in fact [npr.org]. This one [npr.org] as well.

        • Still it is the same way: a lot of data is used, and only in rare cases is it clear that one specific data item was decisive. The problem is getting the evidence out when the AI is (a) not yours, (b) the rest of the data is unknown, and (c) the algorithms are unknown or prohibitively difficult to understand as well. It is a bit like looking for an iron molecule of what was once your needle in a haystack.
      • Voting isn't an individual activity, it's a group activity.

    • by Rei ( 128717 )

      The standard for copyright is not "did you homeopathically influence the product?".

      Another AI lawsuit bites the dust, despite the relentless cheering of "this will be an open and shut case against AI!" from the Luddites, who seem to think that copyright grants you some sort of a dictatorship to control anything ever created by anyone who so much as glances at your work.

      This was one of the strongest cases against AI, as you have a large model, a proportionally small training dataset, lots of dataset duplication

      • Another AI lawsuit bites the dust, despite the relentless cheering of "this will be an open and shut case against AI!" from the Luddites, who seem to think that copyright grants you some sort of a dictatorship to control anything ever created by anyone who so much as glances at your work.

        This is pretty much the argument Disney and the other copyright extenders have used over the decades. So it's not like this came up out of nowhere. It's funny how it only applies legally when it's a massive behemoth company claiming copyright infringement. All the little guys pretty much get told their copyright means nothing at all, because someone else has made more profit from it.

        Behold the true winner in all this: Greed, our new God. Profit above all!

        • by znrt ( 2424692 )

          it only applies legally when it's a massive behemoth company claiming copyright infringement. All the little guys pretty much get told their copyright means nothing at all

          copyright was always about big money, little guys never were anything but useful idiots. some of them get little crumbs for the service now and then, most not even that, they can even get ripped off by big money anytime.

      • It's not enough to show that it's theoretically possible for a user to hack the model into replicating something by force-feeding it enough of the original (and hoping that it just happened to be heavily overtrained on that specific original so that it's even capable of doing so - the larger the dataset vs. model size, the less likely that becomes). Or even sneakier exploits, like glitch tokens or whatnot.

        It's not Adobe's fault if you use Photoshop to draw Donald Duck; it's your fault, and you don't even have to spend hours trying to find ways to sneakily manipulate Photoshop and rely on a lot of luck to be able to do so. If you hack a Google server that's hosting copyrighted Disney data and copy all the data off it, the copyvio isn't Google's fault. They were trying to stop you from doing so. It's not the normal usage of the product.

        In the normal use case of these models, they don't replicate works, and the copyright system is based on the replication of specific works (there's also character copyright, which is sort of a special case around well-delineated characters, but even that is considered to still stem from works). They're not compositors. They don't collage works together.

        That's quite a lot of conclusion based on such a narrow judgement.

      • by flink ( 18449 )

        If you hack a Google server that's hosting copyrighted Disney data and copy all the data off it, the copyvio[lation] isn't Google's fault. They were trying to stop you from doing so. It's not the normal usage of the product.

        Did Google have the right to make and retain that copy of the image in the first place? Did they have a right to profit off its use? If they weren't profiting off of it, what was it doing on a server being used for commercial purposes alongside millions of other copyrighted images?

  • This is correct (Score:3, Interesting)

    by nashv ( 1479253 ) on Wednesday July 10, 2024 @04:56AM (#64615137) Homepage

    No human is born with the ability to speak, code, do sports or pretty much anything. We are educated throughout our lives to pick up skills - you start using words, snippets and sentences your parents used. You may even move on to quoting Shakespeare. Using the same words as someone else does not mean you owe them anything.
    If I read a particular piece of code and type it out, line by line in a different document, am I reusing your code? How will you prove that I even read your code?

    The same goes for AI - it "reads" code, and establishes correlations and patterns. The new thing here is the correlations and patterns it has established. It then will use those same words/code in a completely different instance. That's not very different from what everyone is doing all the time. There is no liability here.

    • Re:This is correct (Score:4, Insightful)

      by Dragonslicer ( 991472 ) on Wednesday July 10, 2024 @05:38AM (#64615179)

      If I read a particular piece of code and type it out, line by line in a different document, am I reusing your code?

      Yes, you are.

      How will you prove that I even read your code?

      By comparing the two pieces of code side by side and showing that they're identical.

      • However, that's not what AI is going to do. The code it "read" was used to train the weights in the LLM. You can't recover the original text because it is mixed in with millions of other examples. If you recover something similar, I suspect it is common enough that you could not claim copyright, like i = 1. The original open source licenses never covered someone reading the code and writing something similar. That's more along the lines of a patent violation. I believe that AI could be guilty of software patent violations
      • What about all the cases where there's only one rational way to write a particular line? Plenty of people have written identical lines of code independently because they were trying to accomplish the same thing and they all wanted their code to work.

        • No, you aren't going to be found liable for copyright infringement if you have "i = 0" in your code. Once you get beyond a couple lines of trivial code, though, the probability of multiple people writing character-for-character the same thing becomes vanishingly small.
          • by Rei ( 128717 )

            Honestly, that's not true. The similarity between many frequently-implemented algos (such as, say, automated growing of buffers in C, which has probably been implemented tens if not hundreds of millions of times) guarantees that plenty will be exact copies.

            Also, copyright is based on a standard of creative endeavour, not "sweat of the brow". Rote or noncreative work cannot be copyrighted (among many other things).
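            The "automated growing of buffers in C" idiom mentioned above is usually some variant of the following (a minimal sketch; the struct and function names are illustrative, not taken from any particular codebase):

```c
#include <stdlib.h>

/* Classic grow-on-demand buffer: double the capacity whenever it runs out.
   This pattern has been independently written countless times, which is why
   two authors who have never seen each other's code can easily end up with
   near-identical functions. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} buf_t;

/* Append one byte, growing the backing allocation as needed.
   Returns 0 on success, -1 on allocation failure. */
int buf_push(buf_t *b, char c) {
    if (b->len == b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 16;  /* start small, then double */
        char *nd = realloc(b->data, ncap);
        if (nd == NULL)
            return -1;                           /* original buffer untouched */
        b->data = nd;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    return 0;
}
```

            Nearly every hand-rolled dynamic array in C converges on this shape (a capacity check, a doubling realloc, an append), which is the point being made: exact or near-exact duplicates can arise without any copying.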

      • That's why I always change the variable names in "copy pasta" code :)~

      • by znrt ( 2424692 )

        But comparing the two pieces of code side by side and showing that they're identical.

        that might be irrelevant.

        oracle tried quite hard to sue google for copying this code literally:

        private static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
            if (fromIndex > toIndex)
                throw new IllegalArgumentException("fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
            if (fromIndex < 0)
                throw new ArrayIndexOutOfBoundsException(fromIndex);
            if (toIndex > arrayLen)
                throw new ArrayIndexOutOfBoundsException(toIndex);
        }

        to no avail. google even admitted that it was a direct copy. still, in this case common sense prevailed, since the judge correctly understood that what google actually copied was the api (which aren't copyrightable as anyone is entitled to provide alternative implementations) and that the code leftover was an oversight besides being the bloody obvious implementation for such a function. "mine mine mine" bullshit d

        • As you said, Google admitted that the code was a direct copy. From there, you get into other questions, such as fair use and damages. In your example, it would be a reasonable outcome by a jury to find that yes, the code is an infringing copy, but that damages are $0.
      • My copyrighted code includes

        for(int i=0:ij;++i) {

        My attorneys are preparing the lawsuits now

        • Thank auto correct for screwing up my loop statement. :(

        • You're neglecting a very important part of copyright infringement - the copy part. You would still need to show that someone copied your code and didn't independently write it. Copyright is not the same as a patent.
          • by micheas ( 231635 )

            You're neglecting a very important part of copyright infringement - the copy part. You would still need to show that someone copied your code and didn't independently write it. Copyright is not the same as a patent.

            Nope. Not in the USA. You only have to prove that it is identical and subject to copyright. Not that they had access to the original. However, a clean room implementation would have a strong argument that there was a lack of creative element in the copyrighted work and therefore the work isn't copyrightable.

            • Nope. Not in the USA. You only have to prove that it is identical and subject to copyright. Not that they had access to the original.

              Do you have a citation for this? I've never heard that before, and I don't see anything in the law that would indicate that this is true.

              However, a clean room implementation would have a strong argument that there was a lack of creative element in the copyrighted work and therefore the work isn't copyrightable.

              No, you do a clean room implementation so that the people writing the new implementation couldn't have possibly copied any part of the original implementation, which would make copyright infringement impossible. And it's completely absurd to argue that because two different (groups of) programmers wrote different implementations that neither implementation has copyright protection.

          • by flink ( 18449 )

            Memorizing a copyrighted work and regurgitating it is a violation. It doesn't need to be a literal copy-paste copy. It also doesn't matter if you claim you came up with a word for word identical work independently if the other was published first. e.g. If I happen to write a song called "Shake it Off" that is word for word identical to the one by Ms. T. Swift, I wouldn't get very far claiming it's a distinct original work, even if it was true.

            But I doubt just a for loop in isolation is considered a creative work.

            • Memorizing a copyrighted work and regurgitating it is a violation. It doesn't need to be a literal copy-paste copy.

              Yes, this is true.

              It also doesn't matter if you claim you came up with a word for word identical work independently if the other was published first. e.g. If I happen to write a song called "Shake it Off" that is word for word identical to the one by Ms. T. Swift, I wouldn't get very far claiming it's a distinct original work, even if it was true.

              You wouldn't get very far with that claim because the probability of two people independently writing identical lyrics are so close to zero that no jury would ever believe you.

              • by flink ( 18449 )

                You wouldn't get very far with that claim because the probability of two people independently writing identical lyrics are so close to zero that no jury would ever believe you.

                Right, so why is the same thing not true for code? You said "You would still need to show that someone copied your code and didn't independently write it." I think that for business logic, not rote boilerplate for standing up a framework, "the probability of two people independently writing identical code are so close to zero that no jury would ever believe you."

                • It is the same for software, and I mentioned that in another post. I was only responding to the stupid post about suing everyone that had ever written a for loop. I didn't feel like also repeating the discussion about trivial/non-trivial code.
    • by gweihir ( 88907 )

      The same goes for AI - it "reads" code, and establishes correlations and patterns. The new thing here is the correlations and patterns it has established. It then will use those same words/code in a completely different instance. That's not very different from what everyone is doing all the time. There is no liability here.

      Bullshit. Electronic storage and processing of data is nowhere legally regarded the same as a human looking at it. While you may be too stupid to see it, the law recognizes that humans and machines are different.

      • by Rei ( 128717 )

        "Electronic storage and processing of data" is widely considered fair use, which is why, among countless other things, developing a search engine isn't illegal.

        • by gweihir ( 88907 )

          Actually, no. There are specific exceptions for web-browsers and search engines are only legal because it is assumed that by publishing you give permission to index. There are no general permissions for anybody to store or process anything they got from the web.

          • by Rei ( 128717 )

            Actually, no. There are specific exceptions for web-browsers and search engines are only legal because it is assumed that by publishing you give permission to index.

            Try suing a search engine for ignoring your robots.txt and let me know how well that goes for you. *eyeroll*. Yes, you can scrape websites. [natlawreview.com]

            It's not just search engines, it's "all Big Data". One of the most extreme examples was Authors Guild, Inc v. Google Inc. Here Google was mass-scanning copyrighted books, against explicit demands from the rights holders, and the scanning was still held to be fair use.

          • LOL so just like how humans can read, copy and learn from something that was published.
            • by gweihir ( 88907 )

              Nope. Are you mentally challenged?

              • You're the one that can't see the obvious likeness on points of comparison and have fabricated errant distinctions. If anyone is challenged here it would be you.
      • by cstacy ( 534252 )

        The same goes for AI - it "reads" code, and establishes correlations and patterns. The new thing here is the correlations and patterns it has established. It then will use those same words/code in a completely different instance. That's not very different from what everyone is doing all the time. There is no liability here.

        Bullshit. Electronic storage and processing of data is nowhere legally regarded the same as a human looking at it. While you may be too stupid to see it, the law recognizes that humans and machines are different.

        Companies have been scraping the web since it was invented, feeding that into algorithms, and producing analytical products. With GPTs, the scraped data is probably even more diffused and ground up in the model. And this has always been perfectly legal.

        I don't like it either.

        If there had been licenses on the allowed use of the materials that said specifically they could only be used for humans to directly read, and not allowed for the training of algorithms, there would be a leg to stand on. What has happened

    • by HiThere ( 15173 )

      I think the question is "How large does an identical chunk need to be before it's infringing?". And I don't think there's a reasonable answer.

      • by cstacy ( 534252 )

        I think the question is "How large does an identical chunk need to be before it's infringing?". And I don't think there's a reasonable answer.

        That is ALWAYS the question of "Fair Use".
        There is no specific amount, large or tiny, that determines whether it's a Fair Use.
        There are other factors in the consideration.
        That's why every Fair Use has to be adjudicated.

        In the case of the GPT, there is usually no exact quotation in the model that can be attributed. The amount of copied material is legally ZERO, from the standpoint of Copyright.

        Sometimes we (society) think this is all well and good, and sometimes we don't like it. Different countries have different rules.

        • by flink ( 18449 )

          In the case of the GPT, there is usually no exact quotation in the model that can be attributed. The amount of copied material is legally ZERO, from the standpoint of Copyright.

          Except derivative works are a thing too and can run afoul of copyright. I don't think anyone has litigated whether the model or its output is a derivative work of the training material yet.

    • by engun ( 1234934 )

      That's not very different from what everyone is doing all the time. There is no liability here.

      I think there's a key point missing here. The amount of information a human can process is naturally constrained, whereas a machine is not. So while superficially, what these machines are doing is roughly the same as a human brain, that doesn't mean that our laws ever envisioned use of this nature and scale.

      We are just dealing with something unprecedented here, and everyone is learning how to deal with it. I wouldn't dismiss these concerns off hand.

      • by cstacy ( 534252 )

        That's not very different from what everyone is doing all the time. There is no liability here.

        I think there's a key point missing here. The amount of information a human can process is naturally constrained, whereas a machine is not. So while superficially, what these machines are doing is roughly the same as a human brain, that doesn't mean that our laws ever envisioned use of this nature and scale. We are just dealing with something unprecedented here, and everyone is learning how to deal with it. I wouldn't dismiss these concerns off hand.

        You have hit the nail on the head: nobody quite saw this coming (even though it's not even that different from what's been going on for many years).

        As for the models already trained, and maybe even the continuing and future revision of those models, the horse is already out the burning barn. You lose.

        When there are new laws about this, ten years from now, will they apply to GPT 211.13 which was trained between 2017-2034? Maybe if you can create a GPT before the new laws come into effect, you can claim every

        • You are missing the experience-flywheel effect. As millions of programmers solve their tasks with Copilot and GPT-4, lots of traces of (problem, approach) are going to be created and iteratively corrected in the LLM session. Training on these traces will make LLMs smarter and smarter, while Stackoverflow will slowly die off. Everyone will NEED to use AI to solve tasks if they are to remain competitive, bringing even more experience into the chat logs. This is not just for coding, but for all LLM tasks, even
      • > The amount of information a human can process is naturally constrained

        Used to be so before 1995, but now you got search engines to connect you to an unbounded amount of information. You can't read everything, but you can find anything in a huge amount of web text.
      • That doesn't seem like much of a key point. So what if they can do something faster and at larger scales? It doesn't change the underlying principles of fairness.
  • Would the decision be different if the AI was writing songs with complete lines picked from one song but a few other lines from some other song?

    • by phantomfive ( 622387 ) on Wednesday July 10, 2024 @05:32AM (#64615171) Journal
      From the judge's decision [courtlistener.com], it seems like they failed to find even a single example of copilot producing copied code. So presumably if the AI were writing complete lines, it would be different.
    • That would be a transformative work. The bar is pretty low for transformative work.
    • by Rei ( 128717 ) on Wednesday July 10, 2024 @06:45AM (#64615253) Homepage

      Oh God, the "AI music lawsuit" (UMG Recordings, Inc v. Uncharted Labs, Inc) is terrible. The recording industry is arguing exactly the opposite of what they were arguing like six years ago with the Blurred Lines lawsuit**. I can't wait for the defense response, because they're surely going to just massively quote the plaintiffs in their defense ;)

      Basically, Marvin Gaye's family sued Robin Thicke and Pharrell Williams for copyright infringement over the song "Blurred Lines", for sharing some musical similarities. The recording industry realized that having a lax standard on what musical similarities count as infringement would be disastrous for them, as huge amounts of their catalogs could be considered infringing. So they filed an amicus brief [techdirt.com] arguing:

      new songs incorporating new artistic expression influenced by unprotected, pre-existing thematic ideas must also be allowed.

      and:

      Most compositions share some elements with past compositions—sequences of three notes, motifs, standard rhythmic passages, arpeggios, chromatic scales, and the like. Likewise, all compositions share some elements of “selection and arrangement” defined in a broad sense. The universe of notes and scales is sharply limited. Nearly every time a composer chooses to include a sequence of a few notes, an arpeggio, or a chromatic scale in a composition, some other composer will have most likely “selected” the same elements at some level of generality.

      To keep every work from infringing — and to keep authors from being able to claim ownership of otherwise unprotected elements — this Court has stressed that selection and arrangement is infringed only when there is virtual identity between two works, not loose resemblance. The same principle should be recognized for music.

      and:

      To prevent nearly every new composition being at risk for liability, copyright claims based on “original contributions to ideas already in the public domain,” Satava v. Lowry, 323 F.3d 805 (9th Cir. 2003), are seen as involving a “thin copyright that protects against only virtually identical copying.” Id. at 812; see also Ets-Hokin v. Skyy Spirits, Inc., 323 F.3d 763, 766 (9th Cir. 2003) (“When we apply the limiting doctrines, subtracting the unoriginal elements, Ets-Hokin is left with . . . a ‘thin’ copyright, which protects against only virtually identical copying.”); Rentmeester v. Nike, Inc., 883 F.3d 1111, 1128-29 (9th Cir. 2018). This Court has long recognized this principle in claims involving visual art that allegedly creatively combines public domain elements, as with the sculptures in Satava or the photographs in Ets-Hokin and Rentmeester. The same should apply to music.

      But then they base UMG Recordings, Inc v. Uncharted Labs, Inc on exactly what they said should not be the standard, taking things that aren't "virtually identical copying" (sequences of three notes, motifs, standard rhythmic passages, arpeggios, chromatic scales, and the like) and declaring that to be copyright infringement.

      But it's even worse - and I can't wait to see the defense response and the judge's reaction to it - because on top of that they're playing a game of "Million Monkeys On A Million Typewriters" without telling the court. E.g. not only deliberately trying to get the model to create a copyrighted work by "leading it on" (a concept the judge in this GitHub case just smacked down), but doing so over and over and over again until they can get a few seconds of similarity - but not mentioning that they did this in the filing.

      Here's an example - they cite this 4-second clip [youtube.com] claiming that when

      • by Rei ( 128717 )

        Whoops, sorry, that was the Stairway to Heaven case, not the Blurred Lines case.

        • by cstacy ( 534252 )

          Whoops, sorry, that was the Stairway to Heaven case, not the Blurred Lines case.

          Yeah, because Marvin Gaye Estate *PREVAILED* in Blurred Lines. Which I thought was a bullshit outcome.

      • by Rei ( 128717 )

        Actually, I forgot my favourite part of the case:

        ***NOTICE TO ATTORNEY REGARDING DEFICIENT COPYRIGHT FORM. Notice to Attorney Moez M. Kaba to RE-FILE Document No. 5 AO 121 Form Copyright - Notice of Submission by Attorney. The filing is deficient for the following reason(s): the PDF attached to the docket entry for the AO 121 Copyright form is not correct; Do not use the sample form. Re-file the document using the event type AO 121 Form Copyright - Notice of Submission by Attorney found under the event list

    • by cstacy ( 534252 )

      Would the decision be different if the AI was writing songs with complete lines picked from one song but few other lines from some other song ?

      We will see, with the image/video GPTs.

      The code-based ones are too hard to tell (the usual problem with copied code), so, No.

      The text-based ones don't usually have sufficient attributable output, so, No.

  • by AleRunner ( 4556245 ) on Wednesday July 10, 2024 @06:25AM (#64615229)

    So in fact the judge didn't dismiss the lawsuit, just some aspects of it.

    • by gweihir ( 88907 )

      Indeed. And the license claims could be disastrous. Usually, you can either force them to respect the license (which they probably cannot do) or to stop using the work (which means retraining the whole model).

      • by Rei ( 128717 )

        This wasn't a ruling on the merits of the claim - it was a ruling on whether it's even possible to make such a claim. 20 of the 22 claims have been thrown out before the trial even begins.

        And "they probably cannot" put in a filter for the specific works of the plaintiffs in this case? They can't stop using the work in training? Really? *eyeroll*. Also, you seem to have confused "stop using" with "simulate going back in time and undo the effects of having used it in the past".

        • by gweihir ( 88907 )

          You really are clueless, aren't you? The only way to remove anything from an LLM is to delete the LLM. Hence they would have to do a full retraining of their models.

          • by Rei ( 128717 )

            I'll repeat:

            You seem to have confused "stop using" with "simulate going back in time and undo the effects of having used it in the past".

            Nobody is using their code anymore, except possibly to train new future models. The code does not exist in the models.

            • by gweihir ( 88907 )

              They are continuing to use that data if they use an LLM trained on that data. Try to keep up. Legally, an LLM is just a processing result of its input data. Copyright violations do not go away because you use data derived from the original data you had no business using. Seriously, it is not that hard: You steal the data, you have to stop using _everything_ you made from it when caught.

    • by Rei ( 128717 )

      (22-claim lawsuit whittled down to just two claims before the trial even starts, including most of the potential damages)

      Luddites: "Here's why this is good news for our case!..."

  • It's not piracy if the output is not verbatim copied from copyright-protected content. Training on "A" and generating "B" is not piracy. And if that were piracy or copyright infringement, then we would all be liable for code that even remotely resembles copyrighted code.
    • by flink ( 18449 )

      It's not piracy if the output is not verbatim copied from copyright-protected content. Training on "A" and generating "B" is not piracy. And if that were piracy or copyright infringement, then we would all be liable for code that even remotely resembles copyrighted code.

      If I trace over your painting and color it in differently and then sell it, I'm profiting off of a derivative work of your original copyrighted material, and you would probably win if you sued me. It's yet to be seen whether producing B after training on a similar thing A constitutes a legally equivalent situation.

  • Unfortunately, a company can just drop almost any claims they want into their ToS and it seems to empower them with whatever rights they claimed. So did Microsoft not even bother to update the Github ToS, did their actions exceed whatever legal rights they gave themselves, or are the plaintiffs challenging the power of the ToS? After investigating the linked PDF with the complaints, claim 191 states:

    GitHub made certain representations to Plaintiffs and the Class to induce them to publicly post their code on GitHub.

  • The real test is to start typing the code of a GPL project and see if it suggests the rest of the copylefted code. If it does, they're caught red-handed.
