Are AI-Generated Search Results Still Protected by Section 230? (msn.com) 63
Starting this week, millions will see AI-generated answers in Google's search results by default. But the announcement Tuesday at Google's annual developer conference suggests a future that's "not without its risks, both to users and to Google itself," argues the Washington Post:
For years, Google has been shielded from liability for linking users to bad, harmful or illegal information by Section 230 of the Communications Decency Act. But legal experts say that shield probably won't apply when its AI answers search questions directly. "As we all know, generative AIs hallucinate," said James Grimmelmann, professor of digital and information law at Cornell Law School and Cornell Tech. "So when Google uses a generative AI to summarize what webpages say, and the AI gets it wrong, Google is now the source of the harmful information," rather than just the distributor of it...
Adam Thierer, senior fellow at the nonprofit free-market think tank R Street, worries that innovation could be throttled if Congress doesn't extend Section 230 to cover AI tools. "As AI is integrated into more consumer-facing products, the ambiguity about liability will haunt developers and investors," he predicted. "It is particularly problematic for small AI firms and open-source AI developers, who could be decimated as frivolous legal claims accumulate." But John Bergmayer, legal director for the digital rights nonprofit Public Knowledge, said there are real concerns that AI answers could spell doom for many of the publishers and creators that rely on search traffic to survive — and which AI, in turn, relies on for credible information. From that standpoint, he said, a liability regime that incentivizes search engines to continue sending users to third-party websites might be "a really good outcome."
Meanwhile, some lawmakers are looking to ditch Section 230 altogether. [Last] Sunday, the top Democrat and Republican on the House Energy and Commerce Committee released a draft of a bill that would sunset the statute within 18 months, giving Congress time to craft a new liability framework in its place. In a Wall Street Journal op-ed, Reps. Cathy McMorris Rodgers (R-Wash.) and Frank Pallone Jr. (D-N.J.) argued that the law, which helped pave the way for social media and the modern internet, has "outlived its usefulness."
The tech industry trade group NetChoice [which includes Google, Meta, X, and Amazon] fired back on Monday that scrapping Section 230 would "decimate small tech" and "discourage free speech online."
The digital law professor points out Google has traditionally escaped legal liability by attributing its answers to specific sources — but it's not just Google that has to worry about the issue. The article notes that Microsoft's Bing search engine also supplies AI-generated answers (from Microsoft's Copilot). "And Meta recently replaced the search bar in Facebook, Instagram and WhatsApp with its own AI chatbot."
The article also notes that several U.S. Congressional committees are considering "a bevy" of AI bills...
Does AI create, or doesn't it? (Score:3)
In the linked article, Gorsuch is quoted as saying:
[AI] generates polemics today that would be content that goes beyond picking, choosing, analyzing or digesting content. And that is not protected.
*Does it* go beyond analyzing or digesting content? If so, why can't AI-generated content be copyrighted? If it doesn't, then why wouldn't it be protected by Section 230?
Re:Does AI create, or doesn't it? (Score:5, Interesting)
Re: (Score:2)
Re:Does AI create, or doesn't it? (Score:5, Informative)
The judge is consistent by saying that a machine is not eligible to hold a copyright. Another example is your coffee spoon. It is not eligible for holding a copyright either. However, people who direct a machine to produce output *may* be eligible to hold copyright in the output.
You are begging the question by sneakily introducing the phrase "since that author is not human". There is no author here, just a machine.
Reality - Congress seeking to tax and control AI (Score:4, Funny)
Given any new growth area for business, Congress will talk regulations and taxes to get inbound lobbying money from the industry and setup a future framework to tax (most important) and regulate (only important to ensure future campaign donation payments) the industry.
Oddly, Canada lawmakers apparently used AI to generate thousands of amendments to bills in order to block the bills.
https://www.ctvnews.ca/politic... [ctvnews.ca]
"OTTAWA - Members of Parliament are expected to vote for up to 15 hours in a row Thursday and Friday on more than 200 Conservative amendments to the government's sustainable jobs bill.
The amendments are what's left of nearly 20,000 changes the Conservatives proposed to Bill C-50 last fall at a House of Commons committee.
Liberals now contend the Conservatives came up with the myriad amendments using artificial intelligence, in order to gum up the government's agenda."
Re: (Score:2)
I think he's making a useful point. Corporations are not human but they can hold copyrights. Clearly there are subtle distinctions being made here either intentionally or implicitly.
Re: (Score:3)
All these arguments about whether the *algorithms* are slicing and dicing data in a novel way are irrelevant in the big picture. The companies that run the algorithms are breaking the law *today* by copying data onto their servers without a license. They are pirates, and the AI models are proceeds of crime.
Re: (Score:1)
Analyzing and digesting content that doesn't belong to you is going beyond merely analyzing and digesting content: into the territory of criminal copyright infringement.
Oof- best blindfold yourself. You're analyzing and digesting content for which I have the copyright, right this very second.
Re: (Score:3)
None of which is relevant to AI machines.
Re: (Score:1)
That's correct. And ordinarily an organization like Slashdot would not be able to show it to me, but luckily you have agreed to let them have limited publishing rights to your prose, and there are other provisions such as the very section of law that we are discussing to keep you from asserting your full rights in this instance.
It's true, I have granted Slashdot such a right. I have not granted you the right to read it though- and that is the point. You don't need me to grant you that right, and neither does a website scraper.
Presumably, nothing that AI looks at wasn't similarly published.
It's not like it's reaching into your hard drive and looking at your nudes. It's scraping the publicly available internet- exactly like a search engine does.
None of which is relevant to AI machines.
You're trying to create a distinction between different uses of the data, because you
Re: (Score:2)
There's only one publisher interacting with me, and only one publisher interacting with you in this case. The rights have been agreed, as long as we all stay within the agreement.
Fo
Re: (Score:1)
Still no. The comment I read was published by slashdot to me, because you gave slashdot limited rights to do so. Slashdot gave me, with your permission, limited rights to read and respond to the forum posts it publishes to me. Slashdot published my response to you (and everyone here). That's what the terms of service are for.
I gave slashdot rights to publish- you can read the terms yourself. I gave them no rights they can confer on to you, and indeed they did not.
If someone publishes a thing publicly, you may read it. You may not copy it, of course, but you may read it.
It's like picking up a book at a library. It's the legal theory that allows web scraping.
For anything else I decide to do with your comment, I will need your permission. For example, if I were to scrape this site for all your past comments, and write a blog post on my own server quoting everything you've written, maybe commenting or editing or misattributing it, that would be a clear copyright violation.
Correct.
However, once that content is in your brain, any inspiration you derive from it is in fact your own.
I do not have the right to publish your comments myself (you only gave that right to slashdot, remember). I do not have the right to scrape slashdot's forums (that would be against the terms of service). I do not have the right to replicate all your comments and mix them with my own writings on my own blog server (that's not fair use). The list goes on.
Indeed you do not. And again- slashdot conferred no right to you
Re: (Score:2)
Scraping to make a full-text copy of something so that you can distribute it (as your own, or otherwise) is definitely not protected. Re-reading, I see room for someone to claim that I'm trying to say that. I am not.
However, caselaw shows indexing, and even scanning for the purpose of indexing, are transformative, and fair use.
In the case of training a NN, it's hard to imagine there is any use that could possibly be more transformativ
Re: (Score:2)
IMHO that is debatable. It seems to me that the current T&Cs (Section 6, paragraph 5) state a much more comprehensive set of rights granted:
Re: (Score:2)
By sending or transmitting to us Content, or by posting such Content to any area of the Sites, you grant us and our designees a worldwide, non-exclusive, sub-licensable (through multiple tiers), assignable, royalty-free, perpetual, irrevocable right to link to, reproduce, distribute (through multiple tiers), adapt, create derivative works of, publicly perform, publicly display, digitally perform or otherwise use such Content in any media now known or hereafter developed.
Ok- that's fair.
It seems I did give them the right to sublicense (grant you license) to my work.
But still, they have not.
Not all consuming of a work is licensed, or needs to be.
As far as I can tell if Slashdot wants to operate at a basic functionality level it probably only needs the right to create derivative works, eg to display your comment with changed fonts, reformatted or hidden paragraphs, but also to be able to publish later comment responses that happen to blockquote some pieces of the original.
Agreed.
However, we users have granted them much more than that. Slashdot can assign the rights to the posted content to anyone they like through sublicensing. Why wouldn't they choose to assign derivative rights from the posted content to other users? Seems like an obvious way to make the discussion part of the site work as intended. But IANAL, this is as far as I am willing to argue.
Agreed, and noted above regarding sublicensing. They certainly retained that right.
Regarding neural networks, from the mathematical point of view it is quite clear and well understood that these systems are lossy compression models. The links and equivalences between information theory, statistical models and compression have been worked out more than 50 years ago. The specific details of the AI systems today are irrelevant, these models are derivative works of the training data in a literal and very explicit iterative way.
These systems are *not* lossy compression! Where is this understood?
The representation they build doesn't compress data; it transforms it into relationships, like an index of unimaginable complexity with an unimaginable number of links.
It does
Re: Does AI create, or doesn't it? (Score:2)
Licenses can be implied, and need not be in writing. Unless your grant to slashdot required an explicit, written sublicense?
Re: (Score:2)
Licenses can be implied, and need not be in writing.
Oh absolutely- you have an implied license to copy this data to your computer RAM right now.
And since there is no Terms gating your viewing of it, that implied license can't be revoked.
That's generally how "the public internet" is regarded, anyway.
There have been arguments that the existence of "terms behind a link at the bottom of the page" would imply explicit license terms, however, that's been struck down by the courts on multiple occasions. There's even a fancy name for it- I just forget what it is
Re: Does AI create, or doesn't it? (Score:2)
You are mixing two different, and in this case directly contradictory things.
Section 230 has nothing to do with copyright; it's about whether the "defendant" (service) is a passive distributor (bulletin board), a publisher with editorial discretion (The New York Times), or something weird in between (Reddit).
Historically the first is not liable for content, the second is liable for certain content under tort and criminal law - and the third is protected under section 230 because algorithmic or crowdsourced r
No. (Score:3, Insightful)
As AI is integrated into more consumer-facing products, the ambiguity about liability will haunt developers and investors,
This should already be enough evidence that it is fundamentally wrong. We shouldn't be bending over backwards to make this legal, we should be explicitly banning its use. He's right at least about the part that there should be no ambiguity.
Re: (Score:2)
Yeah they're both the publisher and the source, can't dodge liability for that.
However, I think there's room for treating private responses to detailed user prompts as user-generated.
No S230 for AI (Score:3)
This law is meant to protect online publishers, not programs. You want an AI law? Make one that can stand on its own.
Re: (Score:3)
Re: (Score:3, Informative)
the safe harbor provision is meant to allow online services to not be treated as publishers but rather as common carriers.
It's not meant to do that in the slightest.
You should read it. [cornell.edu]
It's quite different, conceptually, from that of the common carrier [house.gov].
Particularly, in the duty to serve the public.
Rather, S230 is quite far from a common carrier, and that's the problem people have with it- they want "information content providers"/"interactive computer services" to have to follow the rules of a common carrier, rather than the unusually high level of freedom they currently enjoy.
Re: (Score:3)
There is considerable disagreement, including between the Eleventh and Fifth Circuit courts about the common carrier element present in Section 230. You should read the opinions.
What my brief comment was intended to mean was that in practice the "platforms" want to be treated like common carriers from a liability perspective, as opposed to "publishers" who can be held liable for what goes out on their servers. Section 230 was bought by the tech industry for this purpose. There
Re:No S230 for AI (Score:4, Insightful)
Another Slashdot lawyer?
Nope. Just someone who can read and assign meaning to different concepts.
There is considerable disagreement, including between the Eleventh and Fifth Circuit courts about the common carrier element present in Section 230. You should read the opinions.
You are incorrect. You yourself should probably read those opinions.
The 5th circuit does not claim that S.230 creates a common carrier. Rather, the court created that status itself, based on what the plaintiffs do: transport speech.
The 11th circuit correctly rejected the claim that S.230 makes a common carrier.
The 11th and 5th circuit are not in disagreement, here.
In my opinion, the 5th circuit's holding that anything that transports speech must be a common carrier is dubious, but not entirely without merit.
But regardless- their opinion did not in any way imply that S.230 indicated that Congress intended for them to be common carriers as defined by Federal Law, but rather held that their transport of speech made them common carriers in common law, and thus subject to regulation by the State of Texas.
From the 5th circuit opinion:
The Platforms’ contention that federal law does not treat them as common carriers is similarly beside the point. See 47 U.S.C. § 223(e)(6) (clarifying that certain provisions of federal law should not “be construed to treat interactive computer services as common carriers”). No party is arguing that the Platforms’ common carrier obligations stem from federal law. The question is whether the State of Texas can impose common carrier obligations on the Platforms. And no party has argued that § 223(e)(6) preempts state common carrier regulation.
As you can see, your interpretation of the opinion is simply wrong.
The opinion is additionally odd, because Federal Law *does* preempt state common carrier regulation. I can see why all the online law board/school discussion on the topic has called the opinion a giant pile of shit.
What my brief comment was intended to mean was that in practice the "platforms" want to be treated like common carriers from a liability perspective, as opposed to "publishers" who can be held liable for what goes out on their servers.
What they want is what the law provides. The law literally provides for them to be as you describe.
Section 230 was bought by the tech industry for this purpose.
S.230 was not bought by anyone.
It was rushed through Congress with broad bipartisan support after a disastrous court case that held that forum operators were liable for the speech on them, which would have literally been the end of internet forums.
There is a larger discussion around how they want to have it both ways, but I was responding to a specific claim at the top of this thread.
The law gives it to them both ways. What they want is what is law. Disagreeing with that law is just fine. Wanting it to be changed is fine. Trying to pretend like the law wasn't designed for this purpose, despite the fact that it very, very clearly says that it was, is asinine.
Re: (Score:2)
Section 230 was bought by the tech industry for this purpose.
The way I heard it, Section 230 was really about politicians who were squeamish about pornography. They wanted to give the tech industry the ability to censor pornography in a way which wouldn't make companies liable for everything that their users posted.
I'm sure that the industry had a lot of input into how it was written, but describing it as bought favor really doesn't seem very accurate.
Re: (Score:3)
This was long before the era of what we now call social media.
A judge decided that since a company filtered posts, it was exercising editorial review, and was thus a publisher, and should thus be treated like a newspaper, even though it'd be a newspaper.... full of nothing but other random people's opinions, with zero association
Re: Rephrasing a bit (Score:2)
That would result in all user generated content platforms being almost immediately overrun by spam, porn, and porn spam, because the liability for moderating anything would be crippling. So basically do that if you want to turn the Internet into a push only medium like TV.
Re: (Score:2)
* Section 230: big tech can censor whatever it wants without consequences
Correct. Not just big tech. Small tech too. You. Me. Anyone who operates a server that people connect to.
* Common carrier: if you censor, you are liable for all content, but if you do not, you are not liable
Incorrect.
If you censor, then you're *not* a common carrier, and then you're a publisher, and therein lay the problem.
Common carrier was the Prodigy rule and worked a lot better.
No. That's literally why S.230 was passed. Because forcing a forum operator to either behave as a common carrier, or as a publisher was fucking stupid.
Treating someone's server as a common carrier is fucking stupid, and about as anti-freedom as you can get.
Not quite (Score:1)
S230 isn't common carrier. Some of the worst people on earth *want* it to be because then they could crap flood every forum on the planet and dominate online discussion with their specific political and social views.
No, S230 is different because it allows the platform to both moderate content and not be legally responsib
Re: (Score:2)
Also, you are aware that common carriers like rail lines can kick unruly passengers off, right?
Re: Stepping on a rake (Score:2)
The "censorship" "problem" is because at the scale social media operates, it's impossible to do moderation correctly.
Just have AI provide all sources (Score:3)
Re: (Score:3)
You know very well that ain't happening - no way the search engines of the world will let you check how they put together the "AI-generated" responses you see.
Re: (Score:2)
Re: (Score:2)
Sigh... I don't even know where to begin, given that you very clearly don't know how any of this works. Bing chat will do an ordinary search and include text from some of those results as context. Those are the URLs that it's linking. That's all there is to it. It doesn't collect facts, determine what is relevant, and compile them into a reply. It can't tell you where any particular part of its reply came from because that's not how these things work.
It cannot "fact check" because it has no ability to
Re: (Score:3)
Q: How fast is the total student debt in the US rising? Is that rate accelerating or slowing down?
A: According to the Federal Reserve, the total student debt in the US was more than $1.5 trillion in the second quarter of 2022 [https://www.usatoday.com/money/blueprint/student-loans/average-student-loan-debt-statistics/]. This was an increase of about 4% compared to the second quarter of 2021, when the total student debt was about $1.45 trillion [https://www.nbcnews.com/data-graphics/student-loan-debt-america-charts-rcna44439]. The annual growth rate of student debt has been slowing down since 2013, when it peaked at 14.7% [https://www.cfr.org/backgrounder/us-student-loan-debt-trends-economic-impact]. In 2021, the annual growth rate was 3.9% [https://www.cfr.org/backgrounder/us-student-loan-debt-trends-economic-impact]. However, the growth rate may change depending on the economic conditions, the policy responses, and the borrowing behavior of students and graduates. Some of the factors that may affect the growth rate of student debt are: the COVID-19 pandemic and its impact on the labor market, the income, and the repayment capacity of borrowers [https://www.cnbc.com/2020/12/22/us-student-debt-has-increased-by-more-than-100percent-over-past-10-years.html]; the federal student loan payment pause and interest waiver, which was extended until January 31, 2022, and may be extended further; the potential student loan forgiveness or cancellation plans, which could reduce the outstanding debt for some or all borrowers; and the enrollment trends and the cost of attendance of postsecondary education, which could influence the demand and supply of student loans. Therefore, it is difficult to predict how fast the total student debt in the US will rise in the future, as it depends on many factors and uncertainties. I hope this information was helpful to you. blush
Note the sources of information linked right in the text (the actual answer contained footnotes with just numbers which were links, also listed at the bottom of the answer, which was better formatted than the above dumb paste). So while I don't claim specifics of how copilot got the answer, was it some software on top of LLM which provides the links, or additional hidden prompts, or was Bill Ga
Re: (Score:2)
It's like I'm talking to a wall.
Believe whatever dumbass thing you want about chatbots. You're clearly not interested in reality.
Re: (Score:2)
Once again, these things don't operate on facts. They don't understand things. There is no way to know, let alone for the model to know, what bits of training data influenced any particular response.
So no, it can't attribute a source to its responses. That's impossible. So how does that appear to happen on Bing? It's not complicated.
Search. For each result, generate response including page text in context, output url, repeat without old page text in context. You'll notice that there is no guarantee tha
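The parent's described loop can be sketched roughly like this. This is a hypothetical illustration of search-then-generate attribution, not Bing's actual code; `search` and `llm` are stand-in callables supplied by the caller, not real APIs:

```python
# Rough sketch of the search-then-generate loop described above.
# NOT Bing's implementation: search() and llm() are stand-in
# callables the caller supplies, not real APIs.

def answer_with_sources(question, search, llm, max_results=3):
    """For each search hit, prompt the model with that page's text and
    record the hit's URL. The model itself never knows its source; the
    wrapper around it attaches the URLs."""
    parts, urls = [], []
    for hit in search(question)[:max_results]:
        # Each pass puts only the current page's text in context; the
        # previous page's text is not carried over.
        prompt = f"Page text:\n{hit['text']}\n\nQuestion: {question}"
        parts.append(llm(prompt))
        urls.append(hit["url"])  # URL attached by the wrapper, not the LLM
    return " ".join(parts), urls
```

Note that nothing in this loop guarantees the generated text is actually supported by the URL attached next to it, which is the parent's point.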
Re: (Score:2)
When AI answers the question, it can simply provide (link when possible) all sources used to generate the answer.
No- it can't. That's not how an LLM works.
If it has no source to back up its answer, it can simply skip it, or clearly flag it as unverified speculation.
No. That's not how an LLM works.
Re: (Score:3)
Re: (Score:2)
The links are not generated by it, though.
Re: (Score:2)
Re: (Score:2)
Then it seems Microsoft has figured out how to attribute the LLM answers with source links.
Not quite- it augments and guides LLM output with regular search indexes. It's a neat system.
The LLM itself cannot store any kind of source- like I said, it just doesn't work that way.
The aggregate product that is Bing Chat, that uses traditional indexing to prompt and augment the LLM output is absolutely do-able, but it's hard to legislate that as a requirement for an LLM.
Re: (Score:2)
I suspect it works the other way.
It searches the text it generates and plugs in sources that come up in the results.
It seems like a pretty nifty way to make sure the results are credible and do further investigation.
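A minimal sketch of that generate-then-look-up approach, purely illustrative; `search` is an assumed callable returning ranked hits, and the sentence splitting is deliberately naive:

```python
# Illustrative sketch of the reverse approach the parent describes:
# generate the answer first, then search each sentence and plug in
# whatever source the search returns. search() is an assumed callable.

def plug_in_sources(answer, search):
    """Annotate each sentence with the top search hit's URL, or mark
    it unverified when the search comes back empty."""
    out = []
    for sentence in filter(None, answer.split(". ")):
        hits = search(sentence)
        tag = f"[{hits[0]['url']}]" if hits else "[unverified]"
        out.append(f"{sentence} {tag}")
    return " ".join(out)
```

The design trade-off is the same either way: the link certifies only that a search hit exists for similar text, not that the model's claim came from, or is supported by, that page.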
Re: (Score:2)
When AI answers the question, it can simply provide (link when possible) all sources used to generate the answer.
That's not how AI works.
If it has no source to back up its answer, it can simply skip it
That's not how AI works.
or clearly flag it as unverified speculation.
Say it with me ... That's not how AI works.
You're misunderstanding something about S230 (Score:2)
So yes, platform holders are still protected from user generated content, regardless of how the users generated it
Re:You're misunderstanding something about S230 (Score:4, Insightful)
That's only true when the content isn't created by the platform holder themselves.
If I run a forum, you can't sue me for what the forum posters post; you have to sue them. But if I post something (say, by running an AI that creates content), then I can be sued for what the AI I operate posts. If I'm the platform holder, then I can be sued. Section 230 is not really relevant in this case, because when Google posts their own content the S230 safe harbor provisions just don't apply. They're not rehosting content, they are creating it.
Re: (Score:3)
That's only true when the content isn't created by the platform holder themselves.
This is correct. The statute can be summed up as, "person who wrote the shit is responsible for it, not the people who own the medium the person posted on."
If I'm the platform holder, then I can be sued. Section 230 is not really relevant in this case, because when Google posts their own content the S230 safe harbor provisions just don't apply. They're not rehosting content, they are creating it.
This is most likely the case.
It *is* however arguable.
Even a search engine, to an extent, "generates content" while linking.
Calling the output of an LLM "generated" is kind of a gray area. It's more like a really, really good search engine, but also a really really bad one- at the same time.
It uses bits of information digested from other people, and
Why should it? (Score:2)
Why should Section 230 protect Google or others when their AIs do bad things? Section 230 was meant to be an extension of the principle that the printing press is not responsible for the content, but the creator of the content is. With AI, the owner of the computer should be responsible for allowing it to be published. Otherwise, how do we get accountability?
Re: Why should it? (Score:2)
Oh wait... Are you saying that the person doing the search is responsible for the output? That's interesting. So no longer Copyright Google... Copyright whomever entered the search and generated the answer.
Why should we extend Section 230? (Score:2)
Section 230 was written with a specific end in mind: ensuring that the creator of the content was the one liable for it, not the platform hosting it. I don't see a compelling reason to change that just because platforms are using AI to generate their own content. If AI is, as its proponents claim, not just copying and mixing other people's content but creating something new from it, then what it creates belongs to whoever's running the AI and they should be liable for it just as Section 230 says. If they w
Re: (Score:1)
In practice it's not a legal issue (Score:2)
It will be decided by lobbyists shoving mostly incompetent lawmakers in different directions. Whoever shoves the hardest wins.
Lawmakers write horrible legislation. Or to be more accurate, congressional aides who are not the sharpest legal minds write the text. And in many cases non-profits (or even for profit entities) write model legislation that is copy/pasted into black letter law. Often it is not reviewed by anyone in the government, e
Indeed (Score:2)
but it's not just Google that has to worry about the issue. The article notes that Microsoft's Bing search engine also supplies AI-generated answers (from Microsoft's Copilot). "And Meta recently replaced the search bar in Facebook, Instagram and WhatsApp with its own AI chatbot."
And Brave ... and DuckDuckGo (well, via some kind of linkage with ChatGPT or Claude, you get to configure which ... which is actually kinda cool).
Good luck playing whack a mole with every search engine out there.
EULA for the win? (Score:2)
We sign away all rights to anything but arbitrations in tons of situations (meaning you can't sue them in a regular court, and the company will stack the deck). It's hard to imagine anything important/big might happen here.
"Please use your brain and research any big decisions you make based on information we offer you here."