AI Businesses Privacy

Salesforce Study Finds LLM Agents Flunk CRM and Confidentiality Tests

A new Salesforce-led study found that LLM-based AI agents struggle with real-world CRM tasks, achieving only 58% success on simple tasks and dropping to 35% on multi-step ones. They also demonstrated poor confidentiality awareness. "Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance," a paper published at the end of last month said. The Register reports: The Salesforce AI Research team argued that existing benchmarks failed to rigorously measure the capabilities or limitations of AI agents, and largely ignored an assessment of their ability to recognize sensitive information and adhere to appropriate data handling protocols.

The research unit's CRMArena-Pro tool uses a pipeline of realistic synthetic data to populate a Salesforce organization, which serves as the sandbox environment. At each turn, the agent takes the user's query and decides whether to make an API call or to respond to the user, either to ask for clarification or to provide an answer.
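In rough outline (an illustrative sketch, not the paper's actual harness), a decision loop of that shape might look like the following Python, where llm_complete and call_crm_api are hypothetical stand-ins for the model call and the sandboxed Salesforce API:

import json

def llm_complete(prompt):
    # Stand-in for the real model call (hypothetical name); a real agent would send
    # the running transcript to an LLM and get back a JSON action to take next.
    return json.dumps({"action": "respond",
                       "text": "Which account is this case attached to?"})

def call_crm_api(name, args):
    # Stand-in for the sandboxed Salesforce org's API surface (hypothetical name).
    return {"status": "ok", "call": name, "args": args}

def run_agent(user_query, max_turns=10):
    transcript = ["User: " + user_query]
    for _ in range(max_turns):
        decision = json.loads(llm_complete("\n".join(transcript)))
        if decision["action"] == "api_call":
            # Ground the next model turn on the API result.
            result = call_crm_api(decision["name"], decision["args"])
            transcript.append("API " + decision["name"] + " -> " + str(result))
        else:
            # Either a clarifying question back to the user or a final answer.
            return decision["text"]
    return "Sorry, I could not complete that request."

print(run_agent("Why was case 00123 closed?"))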

"These findings suggest a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios," the paper said. [...] AI agents might well be useful, however, organizations should be wary of banking on any benefits before they are proven.


Comments:
  • by rsilvergun ( 571051 ) on Monday June 16, 2025 @06:40PM (#65454375)
    When I first connect to a business for support, they hand me over to a barely functional chatbot. If that doesn't work, they escalate me to another chatbot with more computing power behind it. If that doesn't work, there's usually at least one more layer of chatbot before a human being. Sometimes two.

    The entire thing is miserable as a customer, but because we've had 40-plus years of market consolidation I don't have a lot of options. I could shop at boutique places, but they are usually 20 to 30% more expensive, not because they are small businesses but because they have to eke out a niche in order to survive, and so they tend to sell more expensive stuff for specific purposes.

    The end result is that any company I do business with has managed to use chatbots to reduce my interaction with their customer service reps by somewhere between 20 and 50%.
  • AI is garbage, and those who rashly implement it now had better enjoy smelling like shit tomorrow, when we all realize that no, it's not a new shiny; it's just a pile of smelly garbage.
    • by quenda ( 644621 )

      I bet if you got an AI to write that article, or the /. summary, it would at least have had the brains to define "CRM".

      Salesforce researchers tested how well AI agents handle real-world business tasks, especially in areas like customer service

      If only we could train humans to write so clearly.

  • I use GitHub Copilot frequently. It's useful for a lot of small tasks that I spell out in detail. But anything that goes beyond one step or one logical leap, it flunks.

    For example, when removing a parameter from a function signature, it's not smart enough to locate that parameter in the callers, or even within the function body itself. It suggests edits anyway, which (if I didn't stop it) would destroy variables and values that I did NOT choose to delete. The point is, it takes a second level of logic to realize the ripple effects of removing a parameter from a function signature (a rough sketch of that kind of ripple effect is below). LLMs are worse than useless with this kind of complexity, and it's not really that complex.

    So, all you developers who are worried about LLMs taking your jobs, relax. It's nowhere near that level of sophistication, even if you are a junior developer.
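    To make the ripple effect concrete, here's a made-up Python sketch (price_order and its parameters are hypothetical names, not from any real codebase): dropping a parameter forces edits in the function body and at every call site, not just on the signature line.

    # Before the refactor: callers pass three arguments.
    def price_order(quantity, unit_price, discount):
        return quantity * unit_price * (1 - discount)

    total = price_order(3, 9.99, 0.10)

    # After removing the discount parameter, the body AND every call site must
    # change as well; editing only the signature leaves the callers broken.
    def price_order(quantity, unit_price):
        return quantity * unit_price

    total = price_order(3, 9.99)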

    • by dfghjk ( 711126 )

      "So, all you developers who are worried about LLMs taking your jobs, relax. It's nowhere near that level of sophistication, even if you are a junior developer."

      Your opinion on that doesn't matter; what matters is the opinion of that developer's company. AI companies aren't trying to sell AI on merit; they are selling it to management on hype.

      • You're right, of course. And if you're working for such a company, you're probably better off working just about anywhere else. The company will soon find out that AI can't run a dev team, but not before things get really, really bad for the few who are left behind after the RIFs. Find a company that's doing real work and cares about its customers, and you'll find a company that doesn't quickly fall for the AI hype. Find a PE company that's in the process of being flipped...and beware. Those guys will fall for the hype every time.

    • by RobinH ( 124750 )
      The thing is, we already have advanced refactoring tools. If I want to change a function signature and update it everywhere, that's a solved problem, at least in Visual Studio. It's right 100% of the time and I don't need to check its work in excruciating detail. Why would I ask an LLM to do it?
      • Your refactoring tools can't figure out what to do with the removed or added parameters. For example, say you add a parameter called employeeId, and the function is called from a for loop whose index variable is also named employeeId; one would expect AI to figure out that it should change the *caller* to pass that variable in the call to the function. Sometimes AI will figure this out, sometimes not. Sometimes it figures it out but inserts employeeId in the wrong position in the parameter list. A rough sketch of that scenario follows.
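        To make that concrete, here's a made-up Python sketch (record_attendance, the date, and the ID range are all hypothetical): the tool has to notice that the loop's employeeId should be passed through at the call site, in the correct position.

        # Hypothetical names throughout. The function gains an employeeId parameter...
        def record_attendance(date, employeeId):
            print(employeeId, "present on", date)

        # ...and the caller is a loop whose index variable is also named employeeId.
        # The fix a human expects is to pass that variable through in the right slot;
        # that is the step the LLM only sometimes gets right.
        for employeeId in range(100, 105):
            record_attendance("2025-06-16", employeeId)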

  • Yep, it went like that in the last few AI hypes as well: grand promises, tons of morons thinking the world will fundamentally change, and small actual results and impact.

  • That is code for "it doesn't lie well", meaning it doesn't misrepresent capabilities or features and doesn't hide faults. Sounds to me like LLMs are functioning exactly the way consumers would like them to work. But this brings to mind an interesting concept: with human sales or support staff who lie or misrepresent, in consumer-centered cases it should be possible to force companies to cough up their LLM training modules and prove in court that they are lying!

  • They are admitting that what they say is bullshit.

    https://www.entrepreneur.com/b... [entrepreneur.com]
