NYC's Government Chatbot Is Lying About City Laws and Regulations (arstechnica.com) 57
An anonymous reader quotes a report from Ars Technica: NYC's "MyCity" ChatBot was rolled out as a "pilot" program last October. The announcement touted the ChatBot as a way for business owners to "save ... time and money by instantly providing them with actionable and trusted information from more than 2,000 NYC Business web pages and articles on topics such as compliance with codes and regulations, available business incentives, and best practices to avoid violations and fines." But a new report from The Markup and local nonprofit news site The City found the MyCity chatbot giving dangerously wrong information about some pretty basic city policies. To cite just one example, the bot said that NYC buildings "are not required to accept Section 8 vouchers," when an NYC government info page says clearly that Section 8 housing subsidies are one of many lawful sources of income that landlords are required to accept without discrimination. The Markup also received incorrect information in response to chatbot queries regarding worker pay and work hour regulations, as well as industry-specific information like funeral home pricing. Further testing from BlueSky user Kathryn Tewson shows the MyCity chatbot giving some dangerously wrong answers regarding treatment of workplace whistleblowers, as well as some hilariously bad answers regarding the need to pay rent.
MyCity's Microsoft Azure-powered chatbot uses a complex process of statistical associations across millions of tokens to essentially guess at the most likely next word in any given sequence, without any real understanding of the underlying information being conveyed. That can cause problems when a single factual answer to a question might not be reflected precisely in the training data. In fact, The Markup said that at least one of its tests resulted in the correct answer on the same query about accepting Section 8 housing vouchers (even as "ten separate Markup staffers" got the incorrect answer when repeating the same question). The MyCity Chatbot -- which is prominently labeled as a "Beta" product -- does tell users who bother to read the warnings that it "may occasionally produce incorrect, harmful or biased content" and that users should "not rely on its responses as a substitute for professional advice." But the page also states front and center that it is "trained to provide you official NYC Business information" and is being sold as a way "to help business owners navigate government." NYC Office of Technology and Innovation Spokesperson Leslie Brown told The Markup that the bot "has already provided thousands of people with timely, accurate answers" and that "we will continue to focus on upgrading this tool so that we can better support small businesses across the city."
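As a rough illustration of what "guess at the most likely next word" means in practice, here is a toy Python sketch of greedy next-token prediction; the probabilities are invented for illustration and this is not the MyCity bot's actual code, which is not public:

    # Toy next-token predictor: pick whichever word is statistically most
    # likely to follow, with no notion of whether the result is true.
    # The probability table is invented for illustration only.
    toy_model = {
        ("landlords", "are"): {"required": 0.55, "not": 0.45},
        ("are", "required"):  {"to": 0.95, "by": 0.05},
        ("are", "not"):       {"required": 0.90, "liable": 0.10},
    }

    def next_word(prev_two):
        candidates = toy_model.get(prev_two, {})
        # Greedy choice: the single most probable continuation, true or not.
        return max(candidates, key=candidates.get) if candidates else None

    words = ["landlords", "are"]
    while (nxt := next_word(tuple(words[-2:]))) is not None:
        words.append(nxt)

    # Prints "landlords are required to"; nudge the 0.55/0.45 split the other
    # way and it would just as fluently print "landlords are not required".
    print(" ".join(words))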
But the Bot Says Otherwise... (Score:5, Funny)
This information didn't happen to be provided by the bot?
The bot should run for office (Score:4, Insightful)
Since the bot tells lies it would make the perfect politician.
This is the real danger (Score:4, Interesting)
All of these boomers say we need to limit AI research because AI will "kill us all".
But here we find the real danger: people who wantonly deploy AI without realizing its limits, including hallucinated answers. AI as we know it today should never exist in the role they have placed it in, given how it can simply make up information.
Instead, some kind of advanced search engine should have been applied to look up the online docs, or else the AI should have been heavily constrained to point to origin sources, with a secondary system deciding whether the origin source agreed with what it said (rough sketch below).
Until we are anywhere close to eliminating hallucination, we must allow open, ongoing research and be more cautious about rolling out AI in positions of public trust.
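Roughly what I have in mind, as a toy Python sketch; the page data, URL, and overlap check are all made up for illustration, and a real verifier would need to do much more than compare words:

    # Sketch of "look up official docs first, then have a secondary system
    # decide whether the source actually backs the answer". All data and
    # thresholds here are invented; a real system needs a real verifier,
    # since plain word overlap would not even catch a negated sentence.
    OFFICIAL_PAGES = [
        {"url": "nyc.gov/example-source-of-income",   # hypothetical URL
         "text": "Landlords are required to accept Section 8 vouchers as lawful income."},
    ]

    def retrieve(question):
        """Classic keyword search over official pages -- no LLM involved."""
        q = set(question.lower().split())
        return max(OFFICIAL_PAGES, key=lambda p: len(q & set(p["text"].lower().split())))

    def supported_by(answer, page):
        """Secondary check: does the answer share enough wording with the source?"""
        a = set(answer.lower().split())
        p = set(page["text"].lower().split())
        return len(a & p) / max(len(a), 1) >= 0.5   # arbitrary threshold

    def respond(question, model_output):
        page = retrieve(question)
        if supported_by(model_output, page):
            return model_output + "\nSource: " + page["url"]
        return "I can't confirm that against an official page. See: " + page["url"]

    # A fabricated answer about an unrelated topic gets refused and pointed
    # at the source; a grounded answer goes through with its citation.
    print(respond("Do landlords have to accept Section 8 vouchers?",
                  "Funeral homes must cap all their prices at five hundred dollars."))
    print(respond("Do landlords have to accept Section 8 vouchers?",
                  "Landlords are required to accept Section 8 vouchers."))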
Re: (Score:3)
LLMs don't hallucinate. They malfunction.
Re: (Score:1)
They malfunction, but I would say they are especially prone to malfunctioning in a way that people have termed "hallucination," where the results contain made-up facts.
Re:This is the real danger (Score:5, Interesting)
...in a way that people have termed "hallucination"....
I agree with you, and my comment wasn't directed at you. I've seen so many people think that LLMs have some kind of intelligence (and therefore sentience) that I am actively fighting the use of anything that might suggest these programs are anything other than pattern matching algorithms.
Great point (Score:1)
I am actively fighting the use of anything that might suggest these programs are anything other than pattern matching algorithms
I see, and I have to agree that is a really great point. I will stop using that term myself, as I agree with your overall goal.
I think I would maybe say, malfunctioning with sometimes backwards output.
Re: This is the real danger (Score:2)
Re: (Score:1)
FTFY
Which means everyone, including you.
That's one problem, but hardly the only one. The "training" materials are effectively a closed loop of reinforcing the correctness or flaws of the training materials...
Bad typo on my part (Score:1)
All of these boomers which have given me everything good I have in the world and a safe place to indulge myself
I can understand your offense at this, but please know I myself am a boomer as well, and that what happened there is I wrote "doomer" and auto-correct altered it for me.
Doomers have never given us anything good, only fear.
The "training" materials are effectively a closed loop of reinfocing the correcteness or flaws of the training materials,
I totally agree with this, in fact I think general LLMs will get much worse over time because of this.
However, domain-specific LLMs carefully trained not on their own bullshit, so to speak, might offer quick insights into a body of understanding.
Along with totally wrong and misleading "hallucinated" stuff mixed in, in an undetectable way, so that the user is fooled.
Re:This is the real danger (Score:4, Informative)
The problem fundamentally is that the LLM doesn't "know" anything. It strings words together in a way that is statistically similar to the way a human would do it, according to training data.
But they don't know what the words mean. They don't know what the sentences mean. They don't know the difference between true and false, right and wrong, real and fake.
They are fluid bullshit machines.
Re: (Score:2)
Google in terrible danger here (Score:1)
Advanced search engines will not help you. The AIs are already writing junk into the very web pages that your search engine will read.
I'm thinking of advanced search engines that only index real documents written by humans. Many places that offer SDKs, for example, have extensive documentation that, today at least, was all written by humans... so a classic search engine that only referenced that catalog would still be reliable.
Otherwise I agree with you, and in fact if you think abo
Re: (Score:2)
Re: This is the real danger (Score:1)
All we need to do is hold the bots to their word. If a company or government implements one and it gives an answer that is not in favor of that entity, it must nevertheless accept it. Whether that is a discount or an ordinance: this bot has now held that certain regulations are invalid, that's great, keep it in place.
Some day, in the future... (Score:2)
AI chatbots will give excellent customer service
Unfortunately, today people fall for the hype and put crappy, preliminary prototypes into service
Great job! (Score:5, Insightful)
Natural morons installing an artificial moron.
New expansion for "AI" (Score:4, Funny)
I often find myself thinking that "AI" anymore stands for "Artificial Ignorance".
Re: New expansion for "AI" (Score:2)
How about incompetence? Because an AI "knows" tons of things in the sense that the information is in there somewhere, but it doesn't know anything in the sense of being able to project the data into rational information.
Re: (Score:2)
Well, if it was just incompetent, that would not be so bad. But instead it has delusions of being great at things and starts to hallucinate far too often.
I mean, a lexicon is "incompetent" (cannot do anything), but really useful. A version where a bit more obscure articles are often pure fabrications loses that usefulness.
Re: (Score:2)
Artificial Idiot is also pretty fitting. Also takes into account that an Artificial Idiot may be able to take the job of a natural idiot and do it cheaper.
Re: (Score:3)
I often find myself thinking that "AI" anymore stands for "Artificial Ignorance".
I myself prefer “Autocomplete Insanity”
Re: (Score:2)
Nice one!
Re: (Score:2)
Well yeah - the name is kind of the problem, and for my money, the whole "because people are stupid" meme does not apply here.
The tech industry put the word "Intelligence" right in there in the name and now we're acting surprised that people assume some relationship to the actual meaning of the word?
If a company sold contracts for something called "Car Insurance" and it turned out to be time-shares, would we be sitting around joking about how people are too dumb to understand legal contracts, or would we th
Lawsuits (Score:5, Insightful)
We've yet to even really begin to see the lawsuits resulting from the wide-scale adoption of LLMs like this.
Everyone thinks jobs will be replaced en masse and that AI will pose an existential threat for humanity, but I think those theories are vastly, wildly speculative at this point - and definitely not anything that we'll see in practice in the next 5, 10 years. Why?
They're only looking at a very small subset of utility and basing their assumptions solely on the technology, not on how it will be used or the assumptions people will make based on it.
You all saw the NVIDIA "AI nurses" bit recently, I'm sure. Why nurses, and not doctors? Well, nurses don't practice medicine, and doctors do. Nurses can't be sued for malpractice, and do not need malpractice insurance.
Now imagine for a second the lawsuits that would result from tens of thousands of people getting bad medical advice, even harmful medical advice, due to hallucinations. Imagine the lawsuits from discrimination against minorities, or some other bias which was programmed in. A poorly trained workforce which makes mistakes in reading and comprehension of the laws, rules, etc. is one thing - you'll have a spread of abilities across all employees - but a singular AI with the same biases writ large is another.
Now remove the filter of experience from these chatbots and other tools I'm sure they'll try to create, and you start to see a broader problem: AI controlled military drones which, maybe 5% of the time, intentionally target civilians. AI which will hallucinate the wrong license plate and send a ticket to someone unrelated. People getting told by an automatic "nurse" a cancer prognosis with admonitions that it's not a big deal and they don't need further care. And so on...
Those liabilities will sink any company relying on the technology extremely quickly.
LLMs have a long way to go before they're more than a hype gimmick for broad adoption. We'll find general utility in them to improve our own workflows in the immediate to near term (1-5 years), but we may find a limit to techniques and models which makes broad adoption impossible at a societal scale.
Re: (Score:2)
We've yet to even really begin to see the lawsuits resulting from the wide-scale adoption of LLMs like this.
Well, if the lawyers would just use LLMs for all their casework, this lawsuit would get moving a lot faster.
Re: Lawsuits (Score:2)
Re: (Score:2)
So far, only the legal profession has seen strong liability for uncritical use of AI. For example, a lawyer had an AI write a brief supporting his client's position and submitted it to the court. When the judge found that the cited case law was fictional (a hallucination), the lawyer was summoned for a show cause hearing to decide if he would be allowed to continue practicing law.
Re: (Score:2)
Right, and that's half the point.
We haven't seen general applicability of LLMs yet because they're not very good. The results are (often) obviously wrong, or wrong often enough that "AI as a service" isn't tenable.
But "AI as a service" - doctors, nurses, lawyers, mechanic diagnosticians, etc. - is the objective here. They want a turnkey solution that you can turn to as an expert source of truth. That's both the holy grail, and the only commercially viable outcome which won't lead to lawsuits and mass disill
Well that was lame (Score:2)
I thought I'd be able to get some hilarious answers by asking absurd questions, but each time it just responded with a generic "I'm sorry Dave, I'm afraid I can't do that" response. Now I'll never know the answers to my important burning questions about NYC, such as:
If a homeless person steals my Cheetos, am I obligated to buy him a Tesla model 3?
Can I fly a drone inside a cardboard box?
If my business runs out of beer, will I have to climb on the roof and sing an oompa loompa song about the dangers of alco
Flawed by design, coz doesn't know it doesn't know (Score:1)
AI-powered chatbots are flawed by design because they don't know that they don't know. How can one be stupid enough to use a chatbot without knowing that?
Re: (Score:2)
It's the Kunning-Druger effect. :D
Re: Flawed by design, coz doesn't know it doesn't (Score:1)
Re: (Score:3)
The solution is don't assume the chat bot knows anything. Any prompt should be treated as a query against the primary law text. The one and only valid result is a citation of that text. The bot can summarize it and relate it back, but that citation has to be a bottleneck in the system.
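Concretely, something like this, as a minimal Python sketch where the citation gates the reply; the statute snippet, the lookup, and the summarize() stand-in are invented for illustration and are not a real NYC corpus or API:

    # Citation-as-bottleneck sketch: the only thing the user ever gets is a
    # verbatim excerpt of the primary text, optionally with a summary bolted
    # on. No matching excerpt means no answer. The statute text and the
    # summarize() stand-in are invented for illustration.
    LAW_TEXT = {
        "NYC Admin Code 8-107(5)": "It shall be unlawful to refuse to rent based on "
                                   "any lawful source of income, including Section 8 vouchers.",
    }

    def find_excerpt(question):
        """Keyword lookup against the primary text only."""
        q = set(question.lower().split())
        scored = [(len(q & set(text.lower().split())), cite, text)
                  for cite, text in LAW_TEXT.items()]
        best = max(scored)
        return (best[1], best[2]) if best[0] > 0 else None

    def summarize(text):
        # Stand-in for an LLM paraphrase; whatever it says, the excerpt below
        # it is the only part presented as authoritative.
        return "In short: " + text

    def respond(question):
        hit = find_excerpt(question)
        if hit is None:
            return "No matching provision found. Please consult the agency directly."
        cite, text = hit
        return summarize(text) + "\n\nCited text (" + cite + '): "' + text + '"'

    print(respond("Do I have to accept Section 8 vouchers?"))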
This is impossible, because there is no connection between the law (original input), and the output from the chatbot. The output is the input, run through a word blender. There is no place in the system where what the chatbot says can be related to the input. There is no place to put a citation or reference or anything like that. All an LLM does is make word salad out of the input, with no meaning at all, and statistically you hope it comes out saying something similar to the original input. It does NOT kno
Re: Flawed by design, coz doesn't know it doesn't (Score:2)
Re: Flawed by design, coz doesn't know it doesn' (Score:3)
It is making people feel like the insights are within reach, but nothing is ever going to replace reading the book because it just takes a lot of Engli
Re: (Score:3)
You're right but you are also a bit wrong. Let me explain. The LLM model does not contain factual information, but it does capture the structure of language.
It's just doing statistical auto-complete on word fragments. It's a pile of shit.
Re: Flawed by design, coz doesn't know it doesn't (Score:2)
Worth reading (Score:2)
In their defence.. (Score:1)
ChatGPT gives better answers (Score:2)
Re: (Score:3)
There is something seriously wrong with how this problem was tackled. The bot seems to have no cognizance of how cities are typically structured bureaucratically and does not recognize entities like "City Attorney Office" ...
I hate to break it to you, but the bot does not know what any of the words mean. Not your prompt, not the output it generates, nothing. Ever.
That is not how LLMs work.
They do not contain facts.
Just meaningless word fragments that it strings together without any possible regard for what anything means at any point.
It would be possible, of course, to make a system (which I suppose you could call "AI' if that gives you tingles) that does what you propose here. Have a "knowledge" of what the pieces of the Gover
So ask MyCity ... (Score:2)
GIGO (Score:2)
Basically, if you're looking for consensus & a reflection of what most people claim to be true
Re: GIGO (Score:2)
Re: (Score:2)
AI will never unlearn this behavior (Score:2)
Because there are no means to punish it into learning NOT to lie. We punish kids when they lie to untrain that behavior and we punish adults who lie. Lying is a natural state. How are you going to punish a computer program?
A "lie" is intentionally false (Score:2)
Is that what the story is alleging?
Ten correct answers don't cancel out an error. (Score:3)
Leslie Brown told The Markup that the bot "has already provided thousands of people with timely, accurate answers"
When asked about the fallen debris, city engineers told reporters "we already made thousands of correct measurements..."
When asked about the customers suffering from food poisoning, the kitchen owners told reporters "we've already provided thousands of satisfactory meals..."
I get that mistakes will be made -- hell, the chatbot might even make fewer mistakes. What puts a bee in my bonnet is this attitude that nobody can be held accountable for hallucinations and errors made by the clockwork device you purchased hoping to avoid giving a human a job.