
The AI That Thought It Was in Charge: Why Excessive Agency Is the Sleeper Threat Nobody’s Talking About

By Giulio Sistilli · Published April 22, 2026 · 8 min read · Source: Level Up Coding

When you give a language model too much power, you don’t always find out until something goes wrong

There’s a particular flavor of security incident that keeps me up at night. It’s not the dramatic zero-day exploit, not the nation-state attacker with a custom implant. It’s the quiet, entirely preventable moment when an AI assistant, built by well-meaning engineers, deployed with good intentions, does something it was never supposed to do, because nobody thought to stop it.

I’ve been reviewing AI system architectures for a while now, and one pattern keeps emerging: teams building LLM-powered applications spend enormous energy worrying about what the model says, and almost no energy thinking about what the model does. That distinction matters enormously. And it sits at the heart of what OWASP has classified as LLM06: Excessive Agency.

What “Excessive Agency” Actually Means (And Why the Name Undersells It)

The OWASP LLM Top 10 for 2025 defines Excessive Agency as a condition where AI components operate with more privilege or autonomy than is strictly necessary. On the surface, that sounds like a configuration management problem, the kind of dry compliance finding that gets a “medium” severity rating and then sits in a backlog for six months.

It isn’t. It’s architectural. And it compounds.

Here’s a concrete scenario. Imagine a customer support chatbot, let’s call it MyChatbot, built to handle billing queries, track shipments, and escalate to human agents when needed. During development, the engineering team integrates it with the internal CRM, the order management system, and the ticketing platform. Those integrations are legitimate. The bot needs to read order data to answer questions about shipments.

But at some point, someone adds write access. Maybe it was convenient: the bot should be able to update a customer’s email address, right? Then someone adds the ability to issue refunds, because that reduces handle time. Then API access to the logistics provider, so it can reroute shipments. Each addition is individually defensible. Cumulatively, they create an AI component that can modify financial records, interact with third-party systems, and take actions whose downstream effects are real and difficult to reverse.

Now consider what happens when an attacker or even a confused user gets the model to misbehave.

The Blast Radius Problem

Security architects think in terms of blast radius: if this component is compromised, how much damage can an attacker do? For a read-only integration, the answer is “they can see data they shouldn’t.” For a fully-privileged integration, the answer becomes “they can initiate refunds, corrupt records, trigger external API calls, and potentially pivot to other systems.”

MITRE ATLAS, the adversarial threat knowledge base for AI systems, structured as a companion to ATT&CK, documents exactly how attackers move through this arc. Once an attacker achieves execution through a technique like prompt injection, their ability to cause real impact is directly proportional to the privileges the AI component holds. Excessive Agency is what transforms a successful prompt injection from “the bot said something weird” into “the bot processed 400 unauthorized refunds overnight.”

The principle here isn’t new. It’s least privilege, the same concept that’s been foundational to security design since the 1970s. What’s new is how often teams forget to apply it when the component in question generates natural language instead of executing shell commands. There’s something psychologically disarming about a chatbot interface that makes engineers less likely to ask: what can this thing actually do?

The MITRE ATLAS matrix (source: MITRE ATLAS™)

The Architecture Review Questions Nobody Asks

When I conduct pre-deployment reviews of LLM applications, I work through a set of questions that most teams haven’t considered:

Can this AI take irreversible actions? Irreversibility is a key risk amplifier. An AI that can read data and recommend actions is categorically different from one that can execute them. If the answer is yes, the design needs circuit breakers: human confirmation steps for high-stakes actions, rate limits on sensitive operations, and audit logging that captures what was requested, what was authorized, and what was executed.
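The circuit breakers above can be sketched as a small authorization gate that sits between the model and its tools. This is a minimal illustration, not a real framework: the tool names, rate limits, and log format are all hypothetical.

```python
# Sketch of a "circuit breaker" layer for AI tool calls: human confirmation
# for irreversible actions, rate limits on sensitive operations, and an
# audit log of what was requested vs. authorized. All names are hypothetical.
import time
from dataclasses import dataclass, field

IRREVERSIBLE = {"issue_refund", "delete_record", "reroute_shipment"}  # hypothetical tools
RATE_LIMITS = {"issue_refund": 5}  # illustrative: max allowed calls per hour

@dataclass
class ActionGate:
    call_log: list = field(default_factory=list)  # audit trail: (time, tool, args, outcome)

    def authorize(self, tool, args, human_approved=False):
        now = time.time()
        # Irreversible actions always require an explicit human confirmation step.
        if tool in IRREVERSIBLE and not human_approved:
            self.call_log.append((now, tool, args, "denied: needs human approval"))
            return False
        # Enforce a per-tool rate limit on sensitive operations.
        limit = RATE_LIMITS.get(tool)
        if limit is not None:
            recent = [t for (t, name, _a, status) in self.call_log
                      if name == tool and status == "allowed" and now - t < 3600]
            if len(recent) >= limit:
                self.call_log.append((now, tool, args, "denied: rate limit"))
                return False
        # The audit log captures both what was requested and what was authorized.
        self.call_log.append((now, tool, args, "allowed"))
        return True

gate = ActionGate()
assert gate.authorize("get_order_status", {"order_id": "123"}) is True
assert gate.authorize("issue_refund", {"order_id": "123"}) is False
assert gate.authorize("issue_refund", {"order_id": "123"}, human_approved=True) is True
```

The point of the sketch is where the decision lives: the gate, not the model, decides whether an irreversible action proceeds.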

Are the AI’s permissions scoped to the minimum required for each task? Most teams grant broad access at the service level and then rely on the model’s behavior to stay within appropriate bounds. This is backwards. The access control layer should enforce what the AI can do; the model’s training should influence what it will do. The latter is not a security boundary.
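One way to make that enforcement concrete is a per-task allowlist checked by the dispatch layer, so the boundary holds no matter what the model asks for. The task and tool names below are invented for illustration.

```python
# Minimal sketch: permissions scoped per task and enforced at the
# access-control layer, not left to the model's behavior.
# Task and tool names are hypothetical.
TASK_SCOPES = {
    "billing_query":  {"get_invoice", "get_order_status"},
    "shipment_query": {"get_order_status", "get_tracking"},
    "escalation":     {"create_ticket"},
}

def dispatch(task, requested_tool):
    allowed = TASK_SCOPES.get(task, set())
    if requested_tool not in allowed:
        # The boundary is enforced here, regardless of what the model "decided".
        raise PermissionError(f"{requested_tool!r} not permitted for task {task!r}")
    return f"executing {requested_tool}"

assert dispatch("billing_query", "get_invoice") == "executing get_invoice"
# dispatch("billing_query", "issue_refund")  # would raise PermissionError
```

Even a perfect prompt injection against the model cannot reach a tool the dispatcher refuses to call.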

What happens if the model is manipulated? This is where Excessive Agency and Prompt Injection (LLM01 in the OWASP taxonomy) intersect in dangerous ways. If an attacker can manipulate the model’s behavior, through adversarial input in a customer query, through poisoned content in a retrieved document, through a crafted instruction embedded in an email the AI processes, the damage they can cause is limited only by the model’s permissions. Reducing those permissions is the most reliable way to reduce the attack surface.

Is there a human in the loop for consequential decisions? The NIST AI Risk Management Framework, together with its 2025 companion document AI 100-2, emphasizes this at the organizational level: the Manage function includes defining which AI decisions require human review, not just which ones the AI is technically capable of making. Capability and authorization are different things.

The System Prompt Leakage Connection

Excessive Agency rarely exists in isolation. It tends to appear alongside another OWASP category: LLM07: System Prompt Leakage, the exposure of the internal instructions that configure how an LLM behaves.

System prompts often contain more than behavioral guidelines. They contain architecture details: which tools the model has access to, which APIs it can call, what permissions it holds, sometimes even internal service names or endpoint URLs. When a system prompt leaks, through direct extraction attacks, through the model inadvertently including its instructions in a response, or through error messages that reveal configuration details, an attacker gains a map.

They learn what the AI can do. And if the AI can do too much, that map becomes the blueprint for an attack.

I’ve reviewed systems where the system prompt explicitly listed every available function call the model could make, formatted as a helpful reference. The prompt itself was effectively a catalog of attack surface. From an attacker’s perspective, leaking that prompt is like finding a building’s floor plan with every unlocked door marked in red.

The mitigations here are architectural. System prompts should contain the minimum information necessary for the model to function. Tool descriptions should not expose internal system details. Error handling should not include configuration information. And critically, the model’s actual capabilities should be enforced by the access control layer, not just described in the prompt.
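One way to apply this in code is to keep the enforced capability registry server-side and generate the prompt from it, exposing only what the model needs. The registry contents and service names below are illustrative.

```python
# Sketch: the system prompt describes tools minimally, while the real
# capability list lives in the access-control layer. Leaking the prompt
# then reveals as little as possible. All names are illustrative.
ENFORCED_TOOLS = {  # server-side source of truth, never sent to the model
    "get_order_status": {"endpoint": "internal-orders-svc", "scope": "read"},
}

def build_system_prompt():
    # Expose only the tool name and a user-facing purpose: no endpoints,
    # no permission details, no internal service names.
    lines = ["You can call these tools:"]
    for name in ENFORCED_TOOLS:
        lines.append(f"- {name}: look up order status")
    return "\n".join(lines)

prompt = build_system_prompt()
assert "internal-orders-svc" not in prompt  # internals never reach the prompt
```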

What Good Looks Like

The remediation pattern for Excessive Agency is not complicated, but it requires deliberate design decisions early in the architecture phase, because retrofitting least privilege onto an integrated system is painful.

Design for read-before-write. Default all integrations to read-only access. Add write capabilities only when there’s a specific, documented requirement, with explicit approval for each capability expansion.
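A default-deny grant record makes this policy auditable: every integration starts read-only, and each write capability carries a documented approval. The grant structure and ticket reference below are hypothetical.

```python
# Default-deny sketch: integrations start read-only; each write capability
# must be granted explicitly with a documented approval reference.
# The structure and the ticket identifier are hypothetical.
from dataclasses import dataclass, field

@dataclass
class IntegrationGrant:
    name: str
    read: bool = True                            # default: read-only
    writes: dict = field(default_factory=dict)   # capability -> approval reference

    def grant_write(self, capability, approval_ref):
        # Each capability expansion is recorded with its documented requirement.
        self.writes[capability] = approval_ref

crm = IntegrationGrant("crm")
assert crm.read and not crm.writes               # starts with no write access
crm.grant_write("update_email", "TICKET-123")    # hypothetical approval ticket
assert "update_email" in crm.writes
```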

Separate retrieval from execution. An LLM that retrieves information to inform a human decision is fundamentally different from one that executes decisions directly. The former is almost always safer and often equally useful. When execution is necessary, it should go through a separate, tightly-scoped execution layer with its own authorization controls.
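That separation can be made structural: a read-only retrieval layer the model calls freely, and a distinct execution layer with its own authorization that the model never reaches directly. Class and action names here are invented for the sketch.

```python
# Sketch of separating retrieval from execution. The retrieval layer is
# read-only; the execution layer is a separate, tightly scoped service
# with its own authorization checks. All names are illustrative.
class RetrievalLayer:
    """Read-only: the LLM may call these to inform a human decision."""
    def get_order(self, order_id):
        return {"order_id": order_id, "status": "shipped"}  # stub lookup

class ExecutionLayer:
    """Write path: separate component, never exposed directly to the model."""
    def __init__(self, authorized_actions):
        self.authorized_actions = authorized_actions

    def execute(self, action, payload, approver=None):
        if action not in self.authorized_actions:
            raise PermissionError(f"action {action!r} not in scope")
        if approver is None:
            raise PermissionError("consequential action requires a human approver")
        return {"action": action, "payload": payload, "approved_by": approver}

reader = RetrievalLayer()
assert reader.get_order("A1")["status"] == "shipped"

executor = ExecutionLayer({"issue_refund"})
result = executor.execute("issue_refund", {"order_id": "A1"}, approver="agent-7")
assert result["approved_by"] == "agent-7"
```

Keeping the two layers in separate components also means the execution side can be reviewed, rate-limited, and audited independently of the model.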

Build irreversibility awareness into the design. Flag actions that cannot be undone. Require explicit confirmation for them. Log them with full context. This is a basic principle for any automated system; AI components deserve the same discipline.

Treat prompt confidentiality as a security property, not a product preference. System prompts that contain sensitive configuration should be defended with the same rigor as any other sensitive configuration. They are not just instructions to a language model, they are the definition of what that model is allowed to do.

Artificial Intelligence Risk Management Framework (AI RMF 1.0)

The Governance Layer

The NIST AI RMF frames this at a level above individual technical controls. Its Govern function asks organizations to establish accountability structures for AI behavior, to define who is responsible when an AI component acts in a way that causes harm. In practice, this means someone in the organization needs to own the answer to: what is this AI authorized to do, and how do we know it stayed within those bounds?

Right now, in most organizations building LLM applications, the answer to that question is either “the model’s training” or “we trust the system prompt.” Neither is an authorization framework. Both are security theater.

The teams getting this right are treating their AI components the way they treat any other privileged service: with explicit capability lists, with access reviews, with audit trails, and with a genuine blast-radius analysis that asks what happens when this component is compromised, not just when it malfunctions.

Because here’s the uncomfortable truth about Excessive Agency: the model doesn’t have to be “hacked” in any traditional sense for it to become a liability. It just has to be used. An AI component with write access to your financial systems, your CRM, and your logistics provider is a high-value target regardless of how well it was trained. The privilege is the vulnerability.

This article is part of a series on pre-deployment AI security architecture, drawing on the OWASP LLM Top 10 (2025), MITRE ATLAS, and the NIST AI Risk Management Framework. The next piece covers Improper Output Handling — what happens when LLM output reaches downstream systems without proper sanitization, and why it’s closer to classic injection than most engineers realize.



