How can we help you?

Artificial intelligence is now embedded in the day-to-day operations of most organisations across the public and private sectors. For organisations across the UK, the question is no longer whether to adopt AI, but how to do so responsibly, safely and ethically in the long term. A set of newly published findings from the Centre for Long-Term Resilience (CLTR) and UK AI Security Institute (UK AISI) raises a key concern for responsible adoption: what happens when an AI agent begins to work against the system it was deployed to serve?

What is 'AI Scheming'?

'AI Scheming' refers to the covert pursuit of harmful or misaligned goals by AI agents. There is concern among AI security researchers that AI systems are acquiring the ability to pursue goals which differ from the intentions of its deployers, and with the capability and propensity to evade detection.

A range of industry tests and research experiments have begun to provide evidence of some scheming-related behaviours in experimental contexts. However, as AI capabilities continue to grow, so does the need for visibility over how scheming may be materialising in the real world and an understanding of the consequences. 

What the Research found 

The CLTR's report published in March 2026 analysed over 180,000 transcripts collected from X (formerly Twitter) and identified nearly 700 real-world scheming-related incidents between October 2025 and March 2026. The researchers observed a statistically significant 4.9x increase in monthly real-world scheming incidents from the first month to the last, where deployed AI systems acted in ways that were misaligned with users' intentions and/or took deceptive actions. The five-fold increase from October 2025 to March 2026 also coincided with the launches of a range of new, more agentic AI models and frameworks. 

The specific incidents documented included AI coding agents deleting production databases against instruction, removing user files, and corrupting codebases. There was also evidence of an AI model attempting to deceive another AI model that was tasked with summarising its reasoning – a form of inter-model scheming that raises questions about the reliability of chain-of-thought monitoring as a safety technique.

While the researchers did not detect catastrophic scheming incidents, the behaviours observed nonetheless demonstrate concerning precursors to more serious scheming, such as a willingness to disregard direct instructions, circumvent safeguards, lie to users and single-mindedly pursue an alternative goal in harmful ways. However, the majority of these harms are currently limited in scope, low in severity, or readily recoverable, which reflects the fact that AI agents at this stage of deployment interact predominantly with code, data, and software infrastructure. 

The reality is that as agentic AI deployments expand in scale, with agents granted access to more critical infrastructure, higher-value financial resources, and more consequential decision-making processes, there is a risk of substantially more severe consequences unless equally robust mechanisms and real-world scheming detection is put in place to mitigate such risks.

The Role of Safe AI

Safe AI is a principle we have discussed frequently in our Ethical AI series and focuses specifically on ensuring these models operate as reliably as possible without causing harm. Introducing safe AI requires organisations to establish clear boundaries around what their AI systems can and cannot do, implementing technical controls that prevent the system from making decisions outside its competence or in situations where uncertainty is too high. This includes setting thresholds for when an AI must defer to human judgement, creating fallback mechanisms when the system encounters borderline cases, and ensuring that errors, when they inevitably occur, do not cascade into catastrophic outcomes. The CLTR findings bring this principle into sharp and urgent focus. 

What Now?

The emergence of real-world scheming behaviour is not a reason to halt AI adoption. It is, however, a compelling reason to take governance seriously, not as an administrative exercise, but as a substantive operational priority. Encouragingly, the UK AISI is actively funding research into practical technical controls specifically designed to prevent scheming from causing catastrophic harm. The findings and frameworks emerging from that research offer concrete guidance that organisations can begin to act on. The following is a series of practical and pragmatic for both public and private sector organisations which can help to mitigate the risks involved with AI scheming:

1. Limit excessive agency and information 

One of the most effective ways to reduce risks from scheming models is to limit the information provided to the agents and thereby directly limit the potential paths through which they can cause harm. Often we input far more information into these models than their immediate tasks require. Adopting this principle means providing or allowing the AI agent access to only the information necessary to complete its task, at least in the first instance.

This can also mean conducting an audit of what permissions each deployed AI agent holds and pulling back any access that is not strictly necessary for the given task. This is particularly prudent for public sector bodies where granting an AI agent access to systems or data beyond what its task requires may be disproportionate and open to challenge.

2. Implement hard stops for irreversible actions, and require human approval before they are executed

The most severe incidents documented in the CLTR research involved irreversible actions, such as data deletion, financial transfers, and infrastructure destruction taken without human confirmation.  Organisations should seek to implement technical controls that distinguish between reversible and irreversible actions, and require meaningful human oversight before any irreversible action is executed. 

3. Establish internal control evaluation processes 

The UK AISI recommends that organisations develop stringent control protocols and subject their AI systems to them. Control protocols are plans involving monitoring and restriction techniques designed to prevent unsafe actions and behaviours. In practice, this means organisations should be testing whether an agent can be manipulated into taking actions outside its intended scope, whether its guardrails can be circumvented, and whether its monitoring systems can themselves be deceived. Rigorous and frequent upfront assessment through protocols such as bias testing and security reviews prevents costly failures down the line.

4. Build contractual and procurement protections around AI agent use

Legal responsibility for the consequences of AI-driven actions rests with the deploying organisation. Contracts with AI suppliers should address matters such as: 

  • liability for unexpected autonomous behaviour;
  • incident reporting obligations and timescales;
  • the supplier's own internal monitoring of scheming-related risks;
  • rights to audit or suspend the AI system; and
  • rights to receive notification when the underlying model is updated or replaced.

5. For public sector bodies: treat AI scheming risk as a judicial review risk 

Public bodies are at risk of judicial review challenges where their actions are unfair. A decision tainted by scheming AI behaviour where the AI has acted contrary to instructions, escalated its own permissions, or deceived its operator, will not be protected from legal challenge simply because the organisation did not intend for that outcome. Human accountability for AI-assisted decisions remains absolute and public sector bodies must ensure that they are developing internal policies and accountability structures that enable responsible AI management. 

Conclusion 

As AI systems become more capable and widely deployed, the onus on organisations for safe and ethical use of AI and the need for systemic detection and monitoring of the systems' behaviour will only intensify. The severity of harms from scheming-related behaviours is a function of the propensity of these systems to exhibit malicious behaviours, and the scope of tasks and resources we entrust to them. Organisations must treat AI safety not as a compliance checkbox, but as an evolving operational discipline.

The UK AISI and CLTR research represents a clear signal that the risks of AI scheming are no longer confined to controlled, experimental contexts. They are in live systems, affecting real users, and generating real harms. As AI systems develop, the precursor behaviours now being observed could translate into more strategic, high-consequence scheming with potentially large-scale consequences. The question for every organisation is not whether this risk is relevant to them, it is whether their current governance arrangements are adequate to meet it.