
Is AI Training on Libraries of Pirated Books? (nytimes.com)

The New York Times points out that so-called "shadow libraries," like Library Genesis, Z-Library or Bibliotik, "are obscure repositories storing millions of titles, in many cases without permission — and are often used as A.I. training data." A.I. companies have acknowledged in research papers that they rely on shadow libraries. OpenAI's GPT-1 was trained on BookCorpus, which has over 7,000 unpublished titles scraped from the self-publishing platform Smashwords. To train GPT-3, OpenAI said that about 16 percent of the data it used came from two "internet-based books corpora" that it called "Books1" and "Books2." According to a lawsuit by the comedian Sarah Silverman and two other authors against OpenAI, Books2 is most likely a "flagrantly illegal" shadow library.

These sites have been under scrutiny for some time. The Authors Guild, which organized the authors' open letter to tech executives, cited studies in 2016 and 2017 that suggested text piracy depressed legitimate book sales by as much as 14 percent.

Efforts to shut down these sites have floundered. Last year, the F.B.I., with help from the Authors Guild, charged two people accused of running Z-Library with copyright infringement, fraud and money laundering. But afterward, some of these sites were moved to the dark web and torrent sites, making it harder to trace them. And because many of these sites are run outside the United States and anonymously, actually punishing the operators is a tall task.

Tech companies are becoming more tight-lipped about the data used to train their systems.

  • by serviscope_minor ( 664417 ) on Monday July 24, 2023 @07:47AM (#63710850) Journal

    Ignoring where they got the books from for now.

    It's not even clear that training on a copyright protected book is infringement. Legally speaking, the legislation doesn't cover it, and there have been no significant court rulings to set precedent.

    Aggregate statistics are not coined as a derived work. If you publish the word or letter frequency of various copyrighted books, you're in the clear. If you make a system which emits long fragments of text from one of those books, you would not be. No one knows where sufficiently advanced statistics leave fair use and become infringement.

    Lots of people have opinions, but until we get precedent or legislation it's still up in the air.

    H4x0ring the books first though, that's clear.
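
    To make the distinction above concrete, here is a minimal, hypothetical Python sketch (not anything the AI companies are known to run). `word_frequencies` reduces a text to word counts, the kind of aggregate statistic described above; `longest_shared_run` measures the longest run of consecutive words a generated text shares with a source, a crude proxy for "emitting long fragments". Both function names are invented for illustration, and neither says where the legal line falls.

```python
from collections import Counter
import re


def word_frequencies(text: str) -> Counter:
    """Reduce a text to lowercase word counts: an aggregate statistic.

    Word order is discarded, so the original text cannot be rebuilt from this.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def longest_shared_run(generated: str, source: str) -> int:
    """Length, in words, of the longest run of consecutive words shared by both texts.

    Classic longest-common-substring dynamic programming over word tokens.
    """
    x = generated.lower().split()
    y = source.lower().split()
    best = 0
    prev = [0] * (len(y) + 1)  # run lengths ending at the previous word of x
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run that ended one word earlier
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    book = "It was the best of times, it was the worst of times"
    print(word_frequencies(book).most_common(3))
    print(longest_shared_run("he said it was the best day", "it was the best of times"))
```

    A word-count table can't be read back into the book, while a long shared run is the sort of output the "long fragments" test worries about; where between those two a court would draw the line is exactly the open question.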

    • by mysidia ( 191772 ) on Monday July 24, 2023 @07:59AM (#63710870)

      It's not even clear that training on a copyright protected book is infringement.

      Well, the act of downloading the works unauthorized by the copyright owner creates an infringing copy, so it could be said that the company is infringing by retrieving the books from the internet and saving a copy in order to train with them.

      The act of training is probably not infringement. I'm guessing they would be highly unlikely to prevail suing for infringement on the basis of training alone. But downloading an illegal copy of something, on the other hand, is an infringing act, and it's kind of necessary before the training process can be done, so you could say that the resulting dataset is the product of illegal activity.

      • by Pinky's Brain ( 1158667 ) on Monday July 24, 2023 @08:13AM (#63710894)

        The only defense against infringement of the copies made during the act of training is fair use and I think it's a very weak defense.

        I think courts would be making a mockery of separation of powers to even deem themselves fit to rule it fair use. This clearly belongs in Congress.

        • by dfghjk ( 711126 ) on Monday July 24, 2023 @09:51AM (#63711170)

          "I think courts would be making a mockery of separation of powers to even deem themselves fit to rule it fair use. This clearly belongs in congress."

          That would be true only if existing law had nothing to say. It clearly does. There is no entity more "fit", by definition, to rule on "fair use" than the courts. Why you think it is a "fair use" issue is another matter.

          • It's an entirely new use case which has no parallel in use during the original framing of the law.

            The level of framing needed to construe it as coming from the law itself or the intent of the lawgiver would be pure sophistry. This is lawmaking, not judging.

            • If the scenario is even remotely covered by fair use, then this is judging by existing law. Even adjudicating that this is not within fair use scope under current law is a judicial act that belongs in the courts, not an act of legislating.

              Now you can make the case current law is insufficient and the legislature needs to step in and revise and evolve the law. You might even argue they should get ahead of this and act now, instead of waiting for bad outcomes (or bad precedents) to force their hand.

              But the system [...]

              • I didn't say they weren't fit to rule it not fair use, I said they weren't fit to rule it fair use. Fair use is the exception; just going with the default is fine. Adding a new category for the exception with not a hint in the law or the lawmaking to justify it is not.

                • by mysidia ( 191772 )

                  I didn't say they weren't fit to rule it not fair use, I said they weren't fit to rule it fair use.

                  That really just sounds like "I don't like fair use or free speech". Of course the courts can find anything fair use. Fair use stems from the 1st Amendment, although there is now also a statute codifying fair use, and it's explicitly the courts' job to interpret what copyright law and the Constitution mean, make the findings of law about how these apply to different cases, and balance 1st Amendment [...]

          • PS. I think it's a fair use issue, because OpenAI&Co have no other defense. It's fair use or they're proper fucked.

        • The courts invented fair use. See Folsom v. Marsh and (going back further) the Statute of Anne. One could argue they should instead have just struck down copyright-enabling legislation on the basis of it conflicting with the First Amendment.
          • by danda ( 11343 )

            ...and conflicting with basic human freedom.

          • by DarkOx ( 621550 )

            It would be nonsense to suggest that 1A prevents copyright legislation. Sure, it's one of the original amendments, but like all other amendments it could have done more than just append to the document. If 1A was intended to be understood to conflict with copyrights, it would have modified Article 1 Section 8; since it did not, the only sensible thing for the courts to do is to read 1A in a way that does not prohibit copyright.

            • by mysidia ( 191772 )

              If 1A was intended to be understood to conflict with copyrights, it would have modified Article 1 Section 8

              They wouldn't have to amend it; the 1st Amendment comes later, so anything in it is understood to supersede prior text. Article 1 Section 8 only allows Congress to create an exclusive right for authors and inventors; that is not the actual copyright statute.

              The copyright law was passed later as a normal law, and like all laws it is still subject to all the restrictions in the Constitution as a whole, [...]

        • by mysidia ( 191772 )

          The only defense against infringement of the copies made during the act of training is fair use and I think it's a very weak defense.

          Well, the act of making extra temporary copies during your training process will almost certainly be fair use, provided they actually received a legal copy in the first place. It's well established that it would be fair use to make a backup copy, and ephemeral copies on your system for your own purposes which you DON'T provide anyone else access to.

          It's not fair use to [...]

      • by Rei ( 128717 ) on Monday July 24, 2023 @08:31AM (#63710918) Homepage

        Well, the act of downloading the works unauthorized by the copyright owner creates an infringing copy

        Except that it's not. Google's entire business is built around downloading, storing, and reading other people's copyrighted works without permission. There's a massive hole in copyright law carved out for automated processing of copyrighted works. Even when Google started posting entire pages of books that they scanned without author permission and against author wishes, verbatim, a court found in their favour.

        • Implied license doctrine applies, as people generally want their pages found and read. Also, robots.txt was established long before Google.

          People don't generally want their content used to train AI without being compensated.
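
          For reference, robots.txt is a plain-text file a site publishes to tell crawlers which paths they may fetch. A minimal sketch using Python's standard-library `urllib.robotparser` shows how a well-behaved crawler would check it; the rules, the bot names (`ExampleAIBot`, `SomeSearchBot`) and the example.com URLs are made up for illustration. Whether honouring robots.txt amounts to an implied license is a legal question the code can't answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI-training bot entirely,
# and let every other crawler fetch everything except /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/books/chapter1.html"
    for agent in ("ExampleAIBot", "SomeSearchBot"):
        print(agent, allowed(agent, page))
```

          Note that robots.txt is purely advisory: nothing in the file stops a crawler that chooses not to check it.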

          • by Rei ( 128717 )

            It has nothing to do with "implied license doctrine". In the EU, it's TDM. In the US, it's Fair Use.

            TDM = Text and Data Mining, a specific exemption to copyright law, first introduced in the UK, then spread to major EU powers, and now being implemented in the EU as a whole. It explicitly codifies that data mining is A-OK.

            The exemptions that companies like Google have fallen under in the US (incl. the aforementioned book-scanning case) are Fair Use: it's okay to copy a work to extract information that's not [...]

            • It's a good thing that in the EU (note the UK is not in the EU) we have the right to ask companies what personal information they collect about us, and require them to erase / stop using our personal information any time we like.

              Would be a shame if people in the EU started requiring the AI companies to remove individual data points containing private information from their training sets, and to cut out the mathematical influence of said data points from the trained model. A kind of targeted AI lobotomy [...]

              • by dfghjk ( 711126 )

                What precedent exists for ordering people to forget what they've learned? Why would a trained AI be different?

                Illegally obtained data could be ordered removed from training sets, and I don't see what would be the "shame" in doing that. "Cutting out the mathematical influence" would essentially be court-ordered destruction of property, though. Could happen, so people who generate that property should consider whether they should use illegally obtained information in doing so. Making the information [...]

                • by mysidia ( 191772 )

                  What precedent exists for ordering people to forget what they've learned? Why would a trained AI be different?

                  It's not that people have to forget what they learned. The issue is that they connected to a website, the shadow library, and the website they downloaded files from committed copyright infringement (by distributing files it wasn't allowed to distribute). And if the person downloading knew about the infringement, then the person downloading is liable as well.

                  It is like if you paid for a CD and [...]

            • That was for scanned books. For crawling and search, in Field v. Google they did use implied license as a defense in addition to fair use.

            • by dfghjk ( 711126 )

              "Google's servers aren't copying a book to read and enjoy it"
              But it is an assumption that this holds for AI training as well, and what's the legal definition of "read and enjoy"?

              Also, "fair use" permits "limited use" of copyrighted works without prior permission. The question here is legal access to the works AT ALL.

              From the link that you told others to read:

              "authors and publishers had expressed concern that Google had not sought their permission to make scans of the books still under copyright"

              • by Rei ( 128717 )

                All of your "but they were focusing on Google posting the stuff online!" arguments imply that there wasn't even any challenge to the notion that Google possessing copies of the books was illegal, which would itself undercut your point. Except:

                "authors and publishers had expressed concern that Google had not sought their permission to make scans of the books still under copyright"

                ... is itself an explicit declaration that they considered Google's copying of the books to be illegal. Except they lost.

                [...]

        • That's incorrect. Google have a special one-off-only-for-google dispensation in the USA due to having won the class action suit you mention: https://en.wikipedia.org/wiki/... [wikipedia.org]. You on the other hand are not Google, and you do not get to break copyright law. Neither do AIs. Maybe if you get sued first by some American author who purports to represent all book authors world-wide and if you also win, then you get to grab anything you want for the "public interest" too. And only in the USA.
          • by Rei ( 128717 )

            There is no section of the law that singles out Google itself. Indeed, it would be unconstitutional to do so.

          • by mysidia ( 191772 )

            That's incorrect. Google have a special one-off-only-for-google dispensation in the USA ...

            No. Google WANTED a special dispensation. That proposed settlement was eventually rejected by the judge; it was far too advantageous for them (almost more a reward than a punishment), so Google never got that dispensation. The case could not be settled; then the courts looked at the questions of class certification and fair use. It was determined by the court to be fair use, and the [...]

        • Books that they scanned? So physical copies that were already paid for.
        • by mysidia ( 191772 )

          Google's entire business is built around downloading, storing, and reading other people's copyrighted works without permission

          The facts surrounding Google's search engine are different.

          Generally, you could rely on the fact that publishing a website implies that the website operator grants users permission to download and read the files. And you can make use of materials obtained legally due to fair use.

          However, spidering the web is very different from downloading materials you have reason to know are being [...]

      • by tlhIngan ( 30335 )

        But the courts have said the downloader is not responsible for the copyright infringement; it's the one who made it available and sent it to the downloader that is actually infringing.

        Thus, I suppose it's infringing if the AI decides to cough up the text of the book without attribution to the source. And if you then take that and publish it, there's secondary infringement as well.

        And AI has been known to do it. GitHub's Copilot has at times spit out GPL2 code, complete with the license block. Many image [...]

        • I'm not sure the "clearance tools" method will work. I can easily imagine ambulance-chasing lawyers getting access to each AI tool and coaxing it to spit out GPL2 code (or a copyrighted book, etc.). At that point they have proof of a violation, and they can go to court or "settle" with the AI company. Afterwards, they can search for any known client of the AI company and write a letter saying they suspect that said client is using infringing output from the AI, since they've already proved that the AI is [...]
        • by cstacy ( 534252 )

          But the courts have said the downloader is not responsible for the copyright infringement; it's the one who made it available and sent it to the downloader that is actually infringing.

          Can you elaborate on which court said that?

          In the US, people are prosecuted for copyright infringement all the time for downloading media like music recordings, TV shows, movies, and books, because downloading is making a copy. There are additional crimes for the providers who are making and distributing copies.

        • by narcc ( 412956 )

          And AI has been known to do it

          It gets a lot less interesting when you learn why this happens. See, there's not room in the model to store more than a very tiny fraction of the information from the training data. It varies, of course, but that usually translates to a few bytes at most. The information that is ultimately encoded isn't even what is most unique, but what is most common.

          For the model to have enough information to reproduce anything verbatim, one of two things needs to happen: either the work in question was included in the [...]

      • You can argue that, but I believe that generally speaking the accepted legal view is that the person distributing an unauthorized copy is the infringer, not the person receiving it. In the pre-internet age that meant the person selling bootleg copies of books, tapes, etc. was the criminal rather than the person buying them. And online that became the person providing the copy to download.

        Which is why virtually all of the MAFIAA copyright lawsuits targeted at private individuals were for using peer-to-peer [...]

      • "Ignoring where they got the books from for now." Literally the first sentence.
      • by m00sh ( 2538182 )

        I do not authorize you to read this copyrighted text.

        • by mysidia ( 191772 )

          I do not authorize you to read this copyrighted text.

          That's okay; in order to post, you had to agree to the Slashdot Terms of Service, where you warranted that you have the rights to the text and granted Slashdot irrevocable, sublicensable rights to it.

          Slashdot in turn authorizes me to receive and read the text pursuant to the Terms of Service.

      • You get in legal trouble for sharing (uploading) copyrighted content you don't have rights to share. When you get a copyright notice, it's for sharing. If you were liable for receiving content then I could compel people into breaking the law with a silly banner ad campaign.
        • by mysidia ( 191772 )

          If you were liable for receiving content then I could compel people into breaking the law with a silly banner ad campaign.

          You can be liable for receiving when you connect to someone else's server and initiate an action which causes them to infringe copyright. If you know the content to be infringing and request a download of it, then it's exactly like any other case where someone conspires with a third party to break the law for their own benefit, such as murder for hire or any other kind of "illegal [...]

      • If this is upheld by the courts, training will simply move to data centers in countries that allow it and nothing else will change.
        • by mysidia ( 191772 )

          If this is upheld by the courts, training will simply move to data centers in countries that allow it

          They can, but different countries' copyrights are governed by international treaties, and the second some major rightsholders feel threatened, we're going to see a DMCA version 2 and a WIPO treaty version 2 passed to provide a rapid takedown process for any service found doing that.

          Even with no extra actions, the second they start offering services in the US or copy the training data into the US [...]

    • If training is fair use, you still need the digital copy. Even if it is fair use, breaking the DRM to get access to the text might not be legal.

      If it's not fair use, the industry is proper fucked. Whether or not the model is a derived work is beside the point; they made dozens of intermediary copies in the process of training. They have no implied license for that.

      • by dfghjk ( 711126 )

        "...they made dozens of intermediary copies in the process of training. They have no implied license for that."

        You have a lot of work to do to prove that. You assume they make "intermediary copies" at all, then assume that those copies aren't fair use.

    • Watch corporate personhood be extended to AI machines. If AI has some sort of personhood, why can't it watch movies, observe, consume and create its own art? Why can't it have the right to speech, give opinions, etc?

      • Very interesting. Of course an AI can easily be cloned, just copy the files and bring it up on another server. How would that work with personhood? Does right-to-life apply in this case? (we're talking asexual reproduction) Do all the new copies also get personhood? What if one of the copies kills a human being, do all the copies go to jail or just one of them?
    • by dfghjk ( 711126 )

      "Aggregate statistics are not coined as a derived work"

      Who has mentioned "aggregate statistics", "coining" or "derived works"? The issue in the article is "where they got the books", which you've decided to "ignore".

      "It's not even clear that training on a copyright protected book is infringement."

      Nor is that the subject.

      "No one knows where sufficiently advanced statistics leave fair use and become infringement."

      And your use of "infringement" is unrelated to that of the article.

      "Lots of people have opinio

      • I love how you were so angry at me not addressing the point you wanted me to address that you posted your rant anyway when you realised I had addressed it.

        Do you need me to be more explicit? Pirating books is bad mmmkay. If you pirate books you're bad mmmkay.

        Then downloading pirated copies isn't especially interesting, but whether or not training on in-copyright books is piracy is interesting. Do you normally flip your shit and act like a wanker when your friends want to talk around a topic, or do you [...]

      • There's a fair argument to be made that where they got the books doesn't legally matter.

        Copyright is easily enforced against anyone making and distributing unauthorized copies, and "pirate libraries" are definitely guilty of distributing unauthorized copies of copyrighted works.

        However, *receiving* an unauthorized copy is generally not regarded as a crime, or at least not a crime worth trying to prosecute. Which is why you see lots of lawsuits against individuals downloading from Napster and other [...]

        • And in the digital world making copies alone *can't* be regarded as a crime - not without completely rewriting copyright law to explicitly allow for the multiple copies made by your computer whenever you play a song or "move" it to a different drive.

          The entire premise of the validity of EULAs is that you're not authorised to make copies at all, including the copy made when the software is loaded from the storage device into RAM. Therefore, to have permission to make that copy you need a license, hence the EULA. This is why [...]

          • I stand corrected. I should know better.

            EULAs' entire existence is based on extremely shaky legal reasoning (starting with the idea that someone can be bound by a contract they've never read or consented to, and degrading from there). It's probably a bad idea to base any further reasoning on anything decided around them.

            Sadly, thanks to the incompetence and/or "motivated reasoning" of courts and legislatures pressured by copyright and patent maximalists, we've gotten into a situation where adding the words [...]

    • #1 Simple possession, for not-for-profit reasons, of large amounts of copyrighted goods has been a copyright violation per se since the mid-90s. It's one of the laws that ol' Bob Goodlatte (the author of the DMCA) helped pass.

      #2 This is a commercial venture, and the works are part of the training data. I can't see how that is anything but a commercial copyright violation because those works were not licensed for that purpose. The books themselves were licensed to be enjoyed in a particular capacity, and that [...]

      • #2 This is a commercial venture, and the works are part of the training data. I can't see how that is anything but a commercial copyright violation because those works were not licensed for that purpose.

        Let's say google or whoever bought one copy of each book. They probably didn't in this case, but to address your point let's assume that.

        At that point you don't need a "license agreement" (probably). Copyright protects (legally) copying of works, but nothing else about them. If you own a copy of the work, [...]

    • Yes, I agree. The AI model generates such averaged content that if somebody manages to prove copyright infringement, then we can prove that every copyrighted author infringed on our language and cultural heritage as well.
    • by m00sh ( 2538182 )

      It's called a library. You don't have to buy every book that you want to read a single paragraph of.

      Ethically, I don't see a clear abuse of copyright if an AI or a person learns from a purchased copy of a book. There is a violation of copyright if a person or AI consumes a stolen copy of a work.

      We make an explicit exception to copyright for libraries because they serve the public good. The question is "What defines a library?" Is it physical copies on shelves in the Library of Congress? Digital scans of physical books in the Internet Archive? Zbook?

      As a book hoarder from a time when "information wants [...]

  • So the shadow libraries can exist but not to train AI with?
  • by ZERO1ZERO ( 948669 ) on Monday July 24, 2023 @08:28AM (#63710914)
    So folk are all upset about AI being trained on pirate books. (Are people upset at the pirate books, or at the AI being trained on them? What if the books were bought?)

    Give it a short while and these AIs will be training on audio recordings, videos, and every Hollywood movie ever made. When will the day come when we can request whatever movie we like, with whatever actors, and have a custom movie delivered via Netflix 3000 16K (tm)? Or via my neural implant?

    I, for one, welcome our AI overlords and will be toiling in underground sugar caves.

  • by HamidPayaamAbbasi ( 7143815 ) on Monday July 24, 2023 @08:38AM (#63710930)
    These libraries are boons to humanity and the people trying to restrict knowledge, often that they just bought the rights to and didn't even create, are the villains of humanity.
    • by nagora ( 177841 )

      These libraries are boons to humanity and the people trying to restrict knowledge, often that they just bought the rights to and didn't even create, are the villains of humanity.

      How many of your books are on there and how do you propose to eat without an income?

      • by kaur ( 1948056 )

        How many of your books are on there and how do you propose to eat without an income?

        Offtopic, but:
        In our country, libraries count book loans, and authors get paid by the government for their work being used. The system tries to compensate publishers and authors for (some of the) lost sales.

        Libraries and other copyright-related systems are of our own design. They are not set in stone. We can change them to benefit society as we see fit.

        • by edis ( 266347 )

          Meanwhile, I have come across vast amounts of stolen books collected on sites clearly originating in Russia. Their public figure, and the toy president used by Putin, Dmitry Medvedev, doesn't hesitate to announce openly that this is their way of making an impact on the West by disrupting publishers' income. Obviously, he supports and encourages the act. Most of the books are copies of Western publications. You pick the author, and you find all his works available. This is part of a hybrid war, nothing else.

      • It's hard to make a living from writing nowadays, but aside from the structural inequalities built into the publishing model, books have been used as training material for centuries. I remember when I took a creative writing course, and the professor said to read as much as possible, understand the stories as a whole, and deconstruct how they convey emotion and point of view down to the order of words in a sentence. In other words, it's the human equivalent of the LLM training process.

        Today, artists will [...]
        • by narcc ( 412956 )

          Again the human equivalent of an LLM training scenario.

          While I'll agree that training a model on copyrighted works shouldn't be considered infringement, I don't think this is the right argument. That is, I don't think that you can meaningfully compare an artist examining a painting or an author studying a book to training a model. Those things are similar in only the most superficial ways.

          So, my question would be: does a person who learns how to write by reading other authors' works owe the creators of those works any money? Does the artist who learned painting by looking at works in a museum owe those creators money?

          Ignoring the other problems for a moment, I'll point out that things not protected by copyright include style and technique.

          While we're on the subject, neither can you [...]

        • My question to you would be: why do you want to treat a non-human as a human? An LLM is clearly non-human; it cannot be deemed a member of the club of humans. Now the members of the club of humans have certain rights that entities which are not members of this club do not. One such right is to read and deconstruct a written work, usually under the proviso that the work was bought and paid for in advance. There is no reason to extend this right to non-human entities, and many humans will in fact vehemently [...]
          • I think language is evolving, it uses humans as the medium, but it essentially replicates ideas. Ideas have their own lifecycle, they evolve faster than biological agents. Now AI has been feeding at the same language corpus and can do a lot of tasks, and retains much general knowledge. I attribute all abilities to language, not humans or LLMs.

            In this context, I believe LLMs should be permitted to access all knowledge (ideas) from copyrighted sources, while taking care not to memorise the expression [...]
          • Yes, I do want to treat a non-human as a human. LLM systems are not parrots, because they will produce original concepts. Anytime one appears to regurgitate something it learned, it's acting like a human. Ask any teacher how many students have regurgitated content they were studying but had not yet understood. As LLM systems improve, there will be better anti-plagiarism capabilities.

            Like it or not, LLM systems act like humans when they train on information and are then asked to solve problems. For that reason, I [...]
      • How many of your books are on there and how do you propose to eat without an income?

        Whatever the answer to this question, it doesn't invalidate the original point of your parent poster.

        Also, there are a lot of false assumptions baked into your question.

        • by nagora ( 177841 )

          How many of your books are on there and how do you propose to eat without an income?

          Whatever the answer to this question, it doesn't invalidate the original point of your parent poster.

          Also, there are a lot of false assumptions baked into your question.

          There are a lot of generalisations, but that doesn't make them false. I do assume that the poster I replied to is a lazy cunt, but I'm pretty confident in that one. His post has many superficial and false assumptions baked into it too, wrapped up in a knee-jerk phrasing to pull in sympathy from the hard of thinking.

          Writers spend time writing, much more than you do reading the output. They need to eat and that time is an opportunity cost unless they are compensated for it. Sure, there are problems with the [...]

          • I do assume that the poster I replied to is a lazy cunt, but I'm pretty confident in that one. His post has many superficial and false assumptions baked into it too, wrapped up in a knee-jerk phrasing to pull in sympathy from the hard of thinking.

            Being obscene doesn't make you right, you know.

            Writers spend time writing - much more than you do reading the output. They need to eat and that time is an opportunity cost unless they are compensated for it.

            Says who? Nobody owes anybody a living. I stay awake all day and spew nonsense, but I also have a job. I suggest to the average writer, if they want to eat, they should get one, too.

            Just stop for a second and think -- or try to argue -- about what legitimacy an artist, any artist, has in asking for money.

            I'll wait here.

            And when you're done, you'll realize that there is only one: because generally, we consider art and culture kind of "nice to have" enough, so we, as

  • It's not like Google doesn't already have an existing, massive corpus of books to train their AI.
  • And precious few of you have explored other economic systems besides capitalism:
    Island - Aldous Huxley - https://en.wikipedia.org/wiki/...
    The Dispossessed - Ursula K. Le Guin - https://en.wikipedia.org/wiki/...
    Culture series - https://en.wikipedia.org/wiki/...
    • It is set in a post-scarcity economy where every material need is available to citizens for free. As such, it is irrelevant to our societies, where scarcity remains the order of the day because we don't have AIs clever enough to do every job or drones able to do the manipulation and heavy lifting.

    • Interesting:

      Dispossessed first occurred to Le Guin through a vision, revealed as if seen from a distance, first as a male physicist, his thin face, clear eyes, large ears, possibly recalling a memory of Robert Oppenheimer, and a vivid personality

  • If someone makes a copy of a book and gives it to me, I am not the infringer. If I read the book and note how many times word X occurs near Y (and do some other fancy statistics), I am still not infringing. If I report these statistics or use them in an app, that too is not infringing. That seems pretty clear.

  • So the AI that sniffs out license-plate criminal tendencies was trained on illegal (criminal) material?

    Where does it begin? Isn't it true that all of us are criminals, down to things like a "California stop" at a stop sign?

    Doesn't training AI on humans create an ethical quandary when it is already being used to track down more "unethical", law-violating humans? And people think this is somehow going to better society?

  • I trained on libraries of pirated books.

  • Forgetting for a minute that GPT-x is not anything like "AGI" at this point, and may never be, though that may be beside the point, practically speaking.

    Our minds are unarguably "NGI" -- when we read books (copyrighted or not) our minds/brains compile that information down into memories, such that we (given the aptitude/inclination) could write another book in the same style, same characters, etc., which GPT-x+n presumably at some point could also do, even while still not being anything like an "AGI".

    Does a

    • by cstacy ( 534252 )

      Since we now have competing vested interests beyond "The People vs. Copyright Holders", maybe this is a watershed moment for copyright law to get things back to something reasonable, like 7-year terms (with a limited number of extensions for living authors: maybe three or four) for copyright.

      If the AIs are not COPYING the books -- and things like ChatGPT are not -- then there is no Copyright case. Abstract information is explicitly NOT protected by Copyright.

      Your suggestion of changing the law to reduce Copyright is exactly the opposite of any changes that might be made.

      The more likely outcome would be new restrictions that make it illegal to "use any machine, now known or later conceived, to compile statistics or otherwise analyze any protected work".

      Meanwhile, the Copyright owners will have

  • The costs of publishing (i.e., editing, printing, distributing, and advertising) have all dropped steadily, especially over the past two decades.

    The cost of published material however has risen steadily.

    And that's the problem. So when publishers charge $30+ for a hardcover, $25+ for a softcover, $100+ for a textbook, almost as much for the E-COPY as for the paper version, and $30+ per journal paper (even though those papers are written, edited, and submitted for free), they should expect people to seek out alternatives.
