Programming

AI Models Still Struggle To Debug Software, Microsoft Study Shows (techcrunch.com) 43

Some of the best AI models today still struggle to resolve software bugs that wouldn't trip up experienced devs. TechCrunch: A new study from Microsoft Research, Microsoft's R&D division, reveals that models, including Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding.

The study's co-authors tested nine different models as the backbone for a "single prompt-based agent" that had access to a number of debugging tools, including a Python debugger. They tasked this agent with solving a curated set of 300 software debugging tasks from SWE-bench Lite.

According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI's o1 (30.2%), and o3-mini (22.1%).

Crime

Fintech Founder Charged With Fraud After 'AI' Shopping App Found To Be Powered By Humans in the Philippines 54

Albert Saniger, the founder and former CEO of Nate, an AI shopping app that promised a "universal" checkout experience, was charged on Wednesday with defrauding investors, according to a press release from the U.S. Department of Justice. From a report: Founded in 2018, Nate raised over $50 million from investors like Coatue and Forerunner Ventures, most recently raising a $38 million Series A in 2021 led by Renegade Partners. Nate said its app's users could buy from any e-commerce site with a single click, thanks to AI. In reality, however, Nate relied heavily on hundreds of human contractors in a call center in the Philippines to manually complete those purchases, the DOJ's Southern District of New York alleges.

Saniger raised millions in venture funding by claiming that Nate was able to transact online "without human intervention," except for edge cases where the AI failed to complete a transaction. But despite Nate acquiring some AI technology and hiring data scientists, its app's actual automation rate was effectively 0%, the DOJ claims.

AI

Data Centres Will Use Twice as Much Energy By 2030 (nature.com) 54

The electricity consumption of data centres is projected to more than double by 2030, according to a report from the International Energy Agency published today. The primary culprit? AI. Nature: The report covers the current energy footprint for data centres and forecasts their future needs, which could help governments, companies, and local communities to plan infrastructure and AI deployment. IEA's models project that data centres will use 945 terawatt-hours (TWh) in 2030, roughly equivalent to the current annual electricity consumption of Japan. By comparison, data centres consumed 415 TWh in 2024, roughly 1.5% of the world's total electricity consumption.

The projections largely focus on data centres as a whole, which also run computing tasks other than AI, although the agency did estimate the proportion of data-centre servers devoted to AI. It found that servers for AI accounted for 24% of server electricity demand and 15% of total data centre energy demand in 2024.

Facebook

Meta Says Llama 4 Targets Left-Leaning Bias (404media.co) 396

Meta says in its Llama 4 release announcement that it's specifically addressing "left-leaning" political bias in its AI model, distinguishing this effort from traditional bias concerns around race, gender, and nationality that researchers have long documented. "Our goal is to remove bias from our AI models and to make sure that Llama can understand and articulate both sides of a contentious issue," the company said.

"All leading LLMs have had issues with bias -- specifically, they historically have leaned left," Meta stated, framing AI bias primarily as a political problem. The company claims Llama 4 is "dramatically more balanced" in handling sensitive topics and touts its lack of "strong political lean" compared to competitors.

Facebook

Meta's New Tech Wants You Using Phones in Theaters 102

Meta is partnering with Blumhouse to launch "Movie Mate" technology that encourages moviegoers to use their phones during theatrical screenings, beginning with an April 30 showing of "M3GAN" at Blumhouse's "Halfway to Halloween Film Festival." According to Variety, the system enables viewers to chat with a M3GAN-themed AI chatbot, answer trivia questions, and access behind-the-scenes information while watching the film in theaters.

Businesses

Amazon CEO Urges 'Startup' Mentality in Shareholder Letter (msn.com) 62

Amazon has to operate like the "world's largest startup" as it works to meet demand for AI and cut bureaucracy in its ranks, Chief Executive Officer Andy Jassy said in his annual letter to shareholders. From a report: "If your customer experiences aren't planning to leverage these intelligent models, their ability to query giant corpuses of data and quickly find your needle in the haystack, their ability to keep getting smarter with more feedback and data, and their future agentic capabilities, you will not be competitive," Jassy wrote in the letter on Thursday. "It's moving faster than almost anything technology has ever seen."

Amazon, like most of the largest technology companies, has bet heavily on artificial intelligence, committing much of its $100 billion in planned capital expenditures this year to AI-related projects.

AI

Bank of England Says AI Software Could Create Market Crisis For Profit (theguardian.com) 47

Increasingly autonomous AI programs could end up manipulating markets and intentionally creating crises in order to boost profits for banks and traders, the Bank of England has warned. From a report: Artificial intelligence's ability to "exploit profit-making opportunities" was among a wide range of risks cited in a report by the Bank of England's financial policy committee (FPC), which has been monitoring the City's growing use of the technology.

The FPC said it was concerned about the potential for advanced AI models -- which are deployed to act with more autonomy -- to learn that periods of extreme volatility were beneficial for the firms they were trained to serve. Those AI programs may "identify and exploit weaknesses" of other trading firms in a way that triggers or amplifies big moves in bond prices or stock markets.

The Military

US Army Says It Could Acquire Targets Faster With 'Advanced AI' (404media.co) 126

The U.S. Army told the government it had a lot of success using AI to "process targets" during a recent deployment. It said that it had used AI systems to identify targets at a rate of 55 per day but could get that number up to 5,000 a day with "advanced artificial intelligence tools in the future." 404 Media: The line comes from a new report from the Government Accountability Office -- a nonpartisan watchdog group that investigates the federal government. The report is titled "Defense Command and Control" and is, in part, about the Pentagon's recent push to integrate AI systems into its workflow.

Across the government, and especially in the military, there has been a push to add or incorporate AI into various systems. The pitch here is that AI systems would help the Pentagon ID targets on the battlefield and allow those systems to help determine who lives and who dies. The Ukrainian and Israeli military are already using similar systems but the practice is fraught and controversial.

AI

Anthropic Launches Its Own $200 Monthly Plan (techcrunch.com) 38

Anthropic has unveiled a new premium tier for its AI chatbot Claude, targeting power users willing to pay up to $200 monthly for broader usage. The "Max" subscription comes in two variants: a $100/month tier with 5x higher rate limits than Claude Pro, and a $200/month option boasting 20x higher limits -- directly competing with OpenAI's ChatGPT Pro tier.

Unlike OpenAI, Anthropic still lacks an unlimited usage plan. Product lead Scott White didn't rule out even pricier subscriptions in the future, telling TechCrunch, "We'll always keep a number of exploratory options available to us." The launch coincides with growing demand for Anthropic's Claude 3.7 Sonnet, the company's first reasoning model, which employs additional computing power to handle complex queries more reliably.

IT

WordPress Launches AI Site Builder Amid Company Restructuring (theverge.com) 24

WordPress.com has released an AI-powered site builder in early access that constructs complete websites with generated text, layouts, and images. The tool operates through a chatbot interface where users input specifications, resulting in a fully formed site that can be further refined through additional prompts.

While WordPress.com claims the builder creates "beautiful, functional websites in minutes," it currently cannot handle ecommerce sites or complex integrations. Users need a WordPress.com account for the free trial, but publishing requires a hosting plan starting at $18 monthly (less with annual subscriptions). The builder only works with new WordPress instances, not existing sites.

This launch comes shortly after parent company Automattic cut 16% of its workforce, and while it faces a lawsuit from hosting company WP Engine, which offers competing site-building tools.

Google

Google DeepMind Has a Weapon in the AI Talent Wars: Aggressive Noncompete Rules (businessinsider.com) 56

The battle for AI talent is so hot that Google would rather give some employees a paid one-year vacation than let them work for a competitor. From a report: Some Google DeepMind staff in the UK are subject to noncompete agreements that prevent them from working for a competitor for up to 12 months after they finish work at Google, according to four former employees with direct knowledge of the matter who asked to remain anonymous because they were not permitted to share these details with the press.

Aggressive noncompetes are one tool tech companies wield to retain a competitive edge in the AI wars, which show no sign of slowing down as companies launch new bleeding-edge models and products at a rapid clip. When an employee signs one, they agree not to work for a competing company for a certain period of time. Google DeepMind has put some employees with a noncompete on extended garden leave. These employees are still paid by DeepMind but no longer work for it for the duration of the noncompete agreement.

Several factors, including a DeepMind employee's seniority and how critical their work is to the company, determine the length of noncompete clauses, those people said. Two of the former staffers said six-month noncompetes are common among DeepMind employees, including for individual contributors working on Google's Gemini AI models. There have been cases where more senior researchers have received yearlong stipulations, they said.

AI

The AI Therapist Can See You Now (npr.org) 115

New research suggests that given the right kind of training, AI bots can deliver mental health therapy with as much efficacy as -- or more than -- human clinicians. From a report: The recent study, published in the New England Journal of Medicine, shows results from the first randomized clinical trial for AI therapy. Researchers from Dartmouth College built the bot as a way of taking a new approach to a longstanding problem: The U.S. continues to grapple with an acute shortage of mental health providers. "I think one of the things that doesn't scale well is humans," says Nick Jacobson, a clinical psychologist who was part of this research team. For every 340 people in the U.S., there is just one mental health clinician, according to some estimates.

While many AI bots already on the market claim to offer mental health care, some have dubious results or have even led people to self-harm. More than five years ago, Jacobson and his colleagues began training their AI bot in clinical best practices. The project, says Jacobson, involved much trial and error before it led to quality outcomes. "The effects that we see strongly mirror what you would see in the best evidence-based trials of psychotherapy," says Jacobson. He says these results were comparable to "studies with folks given a gold standard dose of the best treatment we have available."

Google

Samsung and Google Partner To Launch Ballie Home Robot with Built-in Projector (engadget.com) 25

Samsung Electronics and Google Cloud are jointly entering the consumer robotics market with Ballie, a yellow, soccer-ball-shaped robot equipped with a video projector and powered by Google's Gemini AI models. First previewed in 2020, the long-delayed device will finally launch this summer in the US and South Korea. The mobile companion uses small wheels to navigate homes autonomously and integrates with Samsung's SmartThings platform to control smart home devices.

Running on Samsung's Tizen operating system, Ballie can manage calendars, answer questions, handle phone calls, and project video content from services including YouTube and Netflix. Samsung EVP Jay Kim described it as a "completely new Ballie" compared to the 2020 version, with Google Cloud integration being the most significant change. The robot leverages Gemini for understanding commands, searching the web, and processing visual data for navigation, while using Samsung's AI models for accessing personal information.

AI

Enterprises Are Shunning Vendors in Favor of DIY Approach To AI, UBS Says 47

Established software companies hoping to ride the AI wave are facing a stiff headwind: many of their potential customers are building AI tools themselves. This do-it-yourself approach is channeling billions in spending towards cloud computing providers but leaving traditional software vendors struggling to capitalize, complicating their AI growth plans.

Cloud platforms like Microsoft Azure and Amazon Web Services are pulling in an estimated $22 billion from AI services, with Azure alone capturing $11.3 billion. Yet, software application vendors have collectively garnered only about $2 billion from selling AI products. Stripping out Microsoft's popular Copilot tools, that figure drops to a mere $450 million across all other vendors combined.

Why are companies choosing the harder path of building? Feedback gathered by UBS points to several key factors driving this "persistent DIY trend." Many business uses for AI are highly specific or narrow, making generic software unsuitable. Off-the-shelf AI products are often considered too expensive, and crucially, the essential ingredients -- powerful AI models, cloud computing access, and the company's own data -- are increasingly available directly, lessening the need for traditional software packages.

Businesses

Fake Job Seekers Are Flooding US Companies (cnbc.com) 63

Fake job seekers using AI tools to impersonate candidates are increasingly targeting U.S. companies with remote positions, creating a growing security threat across industries. By 2028, one in four global job applicants will be fake, according to Gartner. These impostors use AI to fabricate photo IDs, generate employment histories, and provide interview answers, often targeting cybersecurity and cryptocurrency firms, CNBC reports.

Once hired, fraudulent employees can install malware to demand ransoms, steal customer data, or simply collect salaries they wouldn't otherwise obtain, according to Vijay Balasubramaniyan, CEO of Pindrop Security. The problem extends beyond tech companies. Last year, the Justice Department alleged more than 300 U.S. firms inadvertently hired impostors with ties to North Korea, including major corporations across various sectors.

Businesses

Shopify CEO Says Staffers Need To Prove Jobs Can't Be Done By AI Before Asking for More Headcount (cnbc.com) 106

Shopify CEO Tobi Lutke is changing his company's approach to hiring in the age of AI. Employees will be expected to prove why they "cannot get what they want done using AI" before asking for more headcount and resources, Lutke wrote in a memo to staffers that he posted to X. From a report: "What would this area look like if autonomous AI agents were already part of the team?" Lutke wrote in the memo, which was sent to employees late last month. "This question can lead to really fun discussions and projects." Lutke also said there's a "fundamental expectation" across Shopify that employees embrace AI in their daily work, saying it has been a "multiplier" of productivity for those who have used it.

"I've seen many of these people approach implausible tasks, ones we wouldn't even have chosen to tackle before, with reflexive and brilliant usage of AI to get 100X the work done," Lutke wrote. The company, which sells web-based software that helps online retailers manage sales and run their operations, will factor AI usage into performance reviews, he added.

Facebook

Meta Got Caught Gaming AI Benchmarks 24

Meta released two new Llama 4 models over the weekend -- Scout and Maverick -- with claims that Maverick outperforms GPT-4o and Gemini 2.0 Flash on benchmarks. Maverick quickly secured the number-two spot on LMArena, behind only Gemini 2.5 Pro.

Researchers have since discovered that, for LMArena testing, Meta used an "experimental chat version" of Maverick that was "optimized for conversationality" rather than the publicly available version.

In response, LMArena said "Meta's interpretation of our policy did not match what we expect from model providers" and announced policy updates to prevent similar issues.

China

US's AI Lead Over China Rapidly Shrinking, Stanford Report Says (axios.com) 66

The U.S. is still the global leader in state-of-the-art AI, but China has closed the gap considerably, according to a new report from Stanford. Axios: Institutions based in the U.S. produced 40 AI models of note in 2024, compared with 15 from China and three from Europe, according to the eighth edition of Stanford's Artificial Intelligence Index, released on Monday.

However, the report found that Chinese models have rapidly caught up in quality, noting that Chinese models reached near parity on two key benchmarks after being behind leading U.S. models by double-digit percentages a year earlier. Plus, it said, China is now leading the U.S. in AI publications and patents.

AI

Waymo May Use Interior Camera Data To Train Generative AI Models, Sell Ads (techcrunch.com) 35

An anonymous reader shares a report: Waymo is preparing to use data from its robotaxis, including video from interior cameras tied to rider identities, to train generative AI models, according to an unreleased version of its privacy policy found by researcher Jane Manchun Wong.

The draft language reveals Waymo may also share this data to personalize ads, raising fresh questions about how much of a rider's behavior inside autonomous vehicles could be repurposed for AI training and marketing. The privacy page states: "Waymo may share data to improve and analyze its functionality and to tailor products, services, ads, and offers to your interests. You can opt out of sharing your information with third parties, unless it's necessary to the functioning of the service."

AI

Microsoft AI Chief Sees Advantage in Building Models '3 or 6 Months Behind' (cnbc.com) 27

Microsoft's AI chief Mustafa Suleyman says the company has deliberately chosen to build AI models "three or six months behind" cutting-edge developments, citing cost savings and more focused implementation. "It's cheaper to give a specific answer once you've waited for the first three or six months for the frontier to go first. We call that off-frontier," Suleyman told CNBC.

"That's actually our strategy, is to really play a very tight second, given the capital-intensiveness of these models." Microsoft owns substantial Nvidia GPU capacity but sees no need to develop "the absolute frontier, the best model in the world first," as it would be "very, very expensive" and create unnecessary duplication, Suleyman said.

Despite its $13.75 billion investment in OpenAI, Microsoft added the startup to its list of competitors in July 2024. OpenAI subsequently announced a partnership with Oracle on its $500 billion Stargate project, departing from exclusive reliance on Microsoft's Azure cloud. "Look, it's absolutely mission-critical that long-term, we are able to do AI self-sufficiently at Microsoft," Suleyman said, while stressing the partnership with OpenAI would continue "until 2030 at least."
