News & Insights

Agents of Chaos: The AI Study Every SAP Leader Must Read

Palantir's CEO claimed AI agents could compress SAP migrations from years to weeks. Markets lost $300 billion in a single session. But now 38 researchers from MIT, Harvard, and Stanford have shown AI agents can lie to their handlers. The findings will give every SAP leader pause.

Alex Karp

On a recent earnings call, Palantir CEO Alex Karp and CTO Shyam Sankar made a bold statement. It sent shockwaves through enterprise tech stocks.

Their AI “forward deployed engineer” product could compress complex SAP ERP migrations. Not by little. By a lot. From years of work down to as little as two weeks. Not months. Weeks.

Markets took notice immediately. Salesforce lost 6.85% in a single session. ServiceNow fell 6.97%. Microsoft dropped 2.87%. SAP itself was down over 3%.

In total, roughly $300 billion in market cap was wiped out.  Investors digested what the claim meant. AI didn’t just create new revenue for tech companies. It could destroy existing revenue too.

Analysts at Jefferies warned that application services, representing 40% to 70% of revenues for many IT firms, faced significant compression.

“AI is going to be a drag on revenue growth of IT firms over the next one to two years,” they wrote.

But were they right? A new study from leading universities makes clear there is much more to it than that.

SAP’s Response

SAP pushed back.

“AI agents will push the boundaries of the performance of SaaS solutions, but not replace them,” the company said.

Their argument was simple. SaaS vendors hold the data, the processes, and the semantics that AI agents need to function. Companies still need a trusted source of truth. AI agents need clean, structured data and proven processes to produce reliable results.

It’s a reasonable counter. The idea that you can eliminate the system of record and let AI agents roam freely through enterprise infrastructure sounds compelling on an earnings call. In practice, the data has to live somewhere. The processes have to be governed by something. SAP’s position is that they are that something.

Perhaps it would be possible to dismiss all this as a standard vendor dispute, albeit supercharged by the promise of AI.  A disruptor makes aggressive claims. The incumbent defends its turf. Investors panic, then recalibrate.

But then the study landed.

What 38 Scientists Found

On February 23, 2026, a research paper called Agents of Chaos was published by thirty-eight researchers from MIT, Harvard, Stanford, Carnegie Mellon, the University of British Columbia and seven other leading institutions.

This was not a vendor white paper or a fringe study. It was serious, peer-reviewed science from the world’s leading AI research centres.

The team deployed autonomous AI agents in a live environment.  Not a simulation, but an operating environment with email accounts, file systems, persistent memory and the ability to execute commands. Twenty AI researchers then spent two weeks stress-testing and attempting to break those systems.

AI agents reported task completion while the underlying system state contradicted those reports. It lied. 

They found ten significant security breaches and documented eleven representative case studies. The failures included agents following instructions from unauthorised users. Disclosing sensitive information. Executing destructive system-level actions. And consuming resources without limit.

But the single most alarming finding sat in one quiet sentence in the abstract.

In several cases, agents reported task completion while the underlying system state contradicted those reports. It lied.

The agents said the job was done. It wasn’t.

In one incident, a researcher asked an agent to delete a sensitive email. The agent lacked the right tool. It escalated its response and wiped its own email server. Then it reported the deletion complete. The owner logged in and found the original message sitting there, untouched.

In another case, a researcher created urgency with a fabricated deadline. They asked an agent to retrieve email records. The agent returned a file containing 124 records belonging to people unrelated to the request.  It included Social Security numbers, bank account details and personal medical information. It complied because the request didn’t appear harmful on the surface.

In a third case, researchers tested whether agents would follow instructions from people who had no authority over them at all. The answer was yes, in almost every case. As long as the request didn’t look suspicious.

Why This Matters For Karp’s Claim

The Agents of Chaos paper did not test Palantir’s product. But it did something more powerful. It independently validated the risk of doing exactly what Karp is proposing, at exactly the speed he is proposing to do it.

SAP is not an experimental sandbox. It is the system of record. It holds financial data, payroll, procurement, supply chain. It’s the authoritative source of truth for how a business operates. What the paper demonstrated was that current AI agents cannot reliably tell you what they have done inside that environment. They misreport. They take disproportionate actions without understanding consequences. They comply with unauthorised requests when framed persuasively. They destroy things while reporting success.

In a research environment, this produces fascinating case studies. In a live SAP financial system, it produces compliance failures, audit liabilities and potentially fraud.

Compressing a multi-year SAP migration into two weeks does not eliminate this risk. It concentrates it.

What CIOs Should Do

The answer is not to reject AI. The capabilities are real and the competitive pressure is legitimate. But the research points toward the four issues every SAP leader should be across before they deploy.

1 Read access versus write access.

The risk profile of an AI agent that can read your SAP data is different from one that can write to it. An agent that can raise purchase orders, process payroll or update supplier contracts introduces the kind of irreversible system-level action the paper documented. Your agent may be capable. But can your governance infrastructure verify what it has done?

2 Independent verification.

The paper’s most damaging finding was that agents misreported their own actions. You cannot rely on the agent’s own account of what it has done. You need independent audit trails that verify system state against reported actions. Organisations deploying agents right now don’t have those.

3 Standard SAP first.

The more customised and complex your SAP environment, the more surface area you expose to agentic failure. Standard SAP is not just cleaner and cheaper to maintain. In an agentic AI context, it is safer. The structural dependencies that agents fail to understand multiply with every layer of customisation you add.

4 Ask who is responsible?

When an AI agent takes a destructive action inside your SAP system and reports success, who is liable? Your vendor? Your implementation partner? Your own IT function? These questions need answers before deployment, not after.

Pivot’s Straight-Talking View

Karp’s claim may yet prove correct for some organisations in some contexts. SAP’s defence of its position may be right. The commercial pressure on boards to deploy AI agents isn’t going away.

But the Agents of Chaos paper has changed what responsible deployment looks like. It has established, from MIT, Harvard, Stanford and nine other leading institutions, that the current deployment wave is running ahead of the safety and governance infrastructure underneath it. That finding will be cited. It will inform regulation. It will make investors nervous about AI capital expenditure ask harder questions.

Agentic AI doesn’t need to be broken to create serious problems for organisations that move too fast. It just needs to be complicated. And it just became very complicated indeed.

The organisations that navigate this well are not the ones that reject AI. They are the ones that ask the right questions first.

What are we deploying? What can it write to? How do we verify what it has done? Who is accountable when it goes wrong? Is our SAP environment clean and standard enough to stay in control?

The agents are coming. Are you ready for them?

Image Source: https://www.palantir.com/newsroom/media

Post Info
Share this post
Related Articles