Start now →

Stop Your AI Agent From Bleeding Tokens. Start Building Harnesses.

By Chris Perrin · Published May 15, 2026 · 5 min read · Source: Level Up Coding
EthereumAI & Crypto
Stop Your AI Agent From Bleeding Tokens. Start Building Harnesses.

Token management isn’t glamorous until the CFO sees the money you saved.

If you’re running AI agents in production, get ready to have a token problem. At some point, agentic systems face symptoms that look like something else: degraded output quality on long tasks, inconsistent results, ballooning API costs, and agents that seem to forget what they were doing. They seem like model problems that can be overcome with better prompts, but it might actually be token management problems.

Context Rot Is Real

Models advertise context windows up to a million tokens. Those numbers are technically accurate, but practically misleading. There’s a phenomenon called context rot where model performance degrades well before hitting the context window’s technical limit.

For most models right now, the effective context window where reasoning stays sharp is around 60–70% of the advertised max regardless of what the spec sheet says.

Context is just part of the process. An agent halfway through a complex task has accumulated tool outputs, error messages, intermediate results, and conversation history. While this can be important, a lot of times, the useful context gets pushed further from the model’s attention. This isn’t an error so no alerts get fired. It just starts producing worse output. Quietly. Confidently. Expensively.

The Cost You Didn’t Budget For

While quietly confident mistakes are bad, let’s start with perhaps the most obvious consequence of poor context management: the bill. Every token your agent processes and outputs costs money. When your agentic system doesn’t manage context properly, the agent drags along everything it has ever seen in the session. This includes old tool results it will never reference again, error messages from problems it already fixed, the full text of files it only needed three lines from and other artifacts of the conversation. All that baggage gets expensive.

Removing an agent’s available tools can make it faster and cheaper. Not because those tools are bad, but because the model spends tokens reasoning about which tool to use, selecting incorrect tools more frequently (requiring additional tokens to rerun and get the right tool), and generating malformed calls. Vercel removed 80% of the tool calls from their agent and found it increased performance and, as a result, lowered token use. In the end, fewer options lead to less waste.

If not addressed, the costs compound.

A single mismanaged agent session is a rounding error. Dozens of mismanaged agents across an organization every day is a line item that finance will eventually notice. At that point, controlling costs becomes a priority one item if the whole project doesn’t get tossed.

This Calls for a Good Harness

Token management is built into great harnesses. Harnesses are the infrastructure you build around an AI agent to control its behavior, manage its resources, and catch its mistakes. Token management itself is a series of engineering best practices and code that manage tokens the way good engineers manage memory: actively, intentionally, and with clear policies about what stays and what goes. Token management should always include:

Compaction — summarize the context window periodically so the model retains important information without carrying the full history. Anthropic’s managed agents use this alongside context trimming that selectively removes old tool results and reasoning traces.

Dynamic tool scoping — only expose the tools relevant to the current task phase. If the agent is writing tests, it doesn’t need deployment tools. Harnesses should be built on the idea that fewer tools and fewer tokens lead to better decisions.

Context resets — tear the session down entirely and rebuild from a compact handoff file. Anthropic found that compaction alone wasn’t enough for long tasks. Sometimes you need to start fresh with a structured brief.

The common thread: none of this is automatic. The model won’t manage its own context well. That’s the harness’s job, and if nobody builds the harness, nobody’s doing the job.

Why This Is an Engineering Problem

Harness engineering is part of an emerging discipline of building infrastructure that makes agents reliable, not just capable. If you’re an engineering leader and this is new territory, that’s understandable. Harness engineering as a named concept only emerged in early 2026 when Mitchell Hashimoto, co-founder of HashiCorp, coined the term. But the underlying skills (system design, resource management, constraint enforcement) are things experienced engineers were doing back before agents existed. In other words, the tactics are new. The thinking isn’t.

If anything, token management is where experienced engineers should feel most at home in a world gone AI-crazy. It’s resource allocation with constraints. It’s deciding what to keep in memory and what to discard. It’s building systems that degrade gracefully instead of failing silently. Engineers have been solving these problems for decades. The resource just used to be RAM or disk instead of tokens.

Token management isn’t glamorous engineering, but it’s the kind of work that keeps CFOs and CTOs happy. That’s the kind of thing that will keep an AI initiative funded instead of canceled.

If you want a broader look at harness engineering and why it matters for engineering leaders, I’m writing about it regularly on Substack. And if you’re navigating the broader challenge of AI adoption as a senior engineer, 30 Years of Engineering Experience Didn’t Prepare Me for AI covers that in depth. It’s on Gumroad and you can pay what you want.

Vibe-code with passion. Deploy with discipline.


Stop Your AI Agent From Bleeding Tokens. Start Building Harnesses. was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

This article was originally published on Level Up Coding and is republished here under RSS syndication for informational purposes. All rights and intellectual property remain with the original author. If you are the author and wish to have this article removed, please contact us at [email protected].

NexaPay — Accept Card Payments, Receive Crypto

No KYC · Instant Settlement · Visa, Mastercard, Apple Pay, Google Pay

Get Started →