# Six tools for an agent, six hundred methods for a human

Baris Sozen

Why agent-shaped APIs beat SDK-shaped APIs for AI-driven trading — and what the right tool count actually looks like.

The most common pushback we hear about Hashlock Markets is one sentence long.

> “Six MCP tools? That feels thin.”

It’s a fair instinct if your reference frame is the SDK era. A serious trading SDK is a thousand-method surface. CCXT alone wraps a hundred-plus exchanges and exposes a unified interface that has been growing for years. The Bybit Python SDK has ninety-something endpoints. A modern broker SDK reads more like a textbook than a library.

When you put that mental model next to a six-tool MCP server, it feels under-engineered. The instinct is correct under one assumption — that a *human developer* will write the integration code. Under that assumption, more methods means fewer custom helpers, less work, more flexibility.

Here’s the thing. Most trades that run through an AI agent aren’t designed by a human developer. They’re **described in natural language by a user, parsed by the model, and called by a planner that has to keep every method’s behavior in its head as part of the prompt context**. The SDK-era assumption doesn’t hold anymore.

Once that assumption breaks, six tools stops looking thin. It starts looking like the right number.

This piece lays out the argument. Top-of-funnel — no jargon-heavy walkthrough, no schema dumps. Just the case for why the trading surface an agent actually calls should be deliberately small, and why we built ours that way.

## The SDK-shaped API is a human convenience

An SDK is a productivity tool for a developer who is going to write a non-trivial amount of code on top of it. It is deliberately granular: fifty methods, because some user might want to filter by exactly that field, paginate exactly that way, or stitch exactly those two endpoints together.

Granularity is *correct* for a human, because a human can read docs once and remember the layout. The cost of one hundred methods is amortized across years of an integration’s life. The benefit is flexibility for the unknown shape of the integration.

The economics flip when the *consumer of the API is itself a model*.

Now every method is something the model has to either keep in its prompt or look up at runtime. Every version is a tiny regression risk. Every “undocumented edge case in method 87” becomes a silent failure mode that a non-deterministic planner can rediscover in production. The granularity that was free for a human becomes a tax on the agent.

The unit economics of an agent-callable API:

- Every tool is context. Tools that the model never calls still cost prompt tokens to describe, every single turn.
- Every tool is a branch the planner can pick wrong. More tools means more chances for the planner to confuse two semantically-close behaviors.
- Every version pin is a regression risk. A breaking change in the sixteenth-most-used method can silently break trades.
- Every edge case is a non-deterministic failure mode. Models sometimes call APIs in ways no human would. The fewer surfaces, the fewer of those.
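
To make the first bullet concrete, here is a back-of-envelope sketch of what tool descriptions cost in prompt tokens. The per-tool size and the number of turns are illustrative assumptions, not measurements, and the sketch assumes the full tool list is sent on every turn.

```python
# Back-of-envelope: prompt tokens spent describing tools over one session.
# TOKENS_PER_TOOL and TURNS are assumed values chosen for illustration.

def tool_description_overhead(num_tools: int, tokens_per_tool: int, turns: int) -> int:
    """Tokens spent re-sending tool descriptions across a whole session."""
    return num_tools * tokens_per_tool * turns


TOKENS_PER_TOOL = 150  # assumed size of one tool's name, description, and schema
TURNS = 20             # assumed number of model turns in one trading session

small = tool_description_overhead(6, TOKENS_PER_TOOL, TURNS)
large = tool_description_overhead(600, TOKENS_PER_TOOL, TURNS)

print(f"6 tools:   {small:,} tokens")
print(f"600 tools: {large:,} tokens")
print(f"ratio:     {large // small}x")
```

Under these assumptions the overhead scales linearly with tool count, so the ratio between the two surfaces stays the same whatever per-tool size you plug in.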

A human gets bored if they have to read more than ten pages. A model can read ten thousand pages. But it pays for every page in attention, in compute, and — if the API is being called inside a real trading loop — in latency. The SDK-shaped API was never optimized for that cost structure.

## The agent-shaped API is the smallest surface that still completes the job

An agent-shaped API answers a different question: what is the *minimum* set of tools that lets a planning model take a trading intent from “natural language request” to “atomic settlement on chain”?

Walk through what an OTC trade actually requires:

1. The user wants to buy or sell an asset for another asset, with some constraints. That’s an **intent**.
2. The protocol needs to broadcast that intent to a set of market makers privately, so the order doesn’t leak. That’s a **sealed-bid RFQ**.
3. A market maker who likes the price needs to commit a price quote. That’s a **response**.
4. Once a quote is picked, both sides need to lock funds in a way that lets either claim atomically with a shared secret, or refund after a deadline. That’s a **hash-time-locked contract** with four lifecycle operations: lock, claim, refund, inspect.

That’s the trade. Every piece of work an agent does end-to-end is one of those steps.

Six tools is not a limit we picked; it’s what falls out of writing the *minimum* version of the surface above:

- `create_rfq` — taker side: post a sealed-bid intent for a trade.
- `respond_rfq` — maker side: commit a price quote against an intent.
- `create_htlc` — either side: record an on-chain lock for the leg you owe.
- `withdraw_htlc` — either side: claim an HTLC by revealing the preimage.
- `refund_htlc` — either side: reclaim a leg if the counterparty disappears past the deadline.
- `get_htlc` — either side: inspect the live state of any in-flight swap.

That’s it. There is no `list_markets`, no `get_orderbook`, no `cancel_order`, no `set_leverage`, no `transfer_internal`, no twenty other methods that exist in a CEX SDK because exchange UI screens needed them. We don’t need them, because the model the protocol exposes is not a CEX. It’s a private auction plus a four-state settlement primitive.

Critically, the trade either settles atomically or it doesn’t. There is no half-state to inspect, no “fill ratio” to query, no “stuck order” the agent has to reason about. The HTLC’s four-state lifecycle (locked, withdrawn, refunded, expired) is a state machine that fits in any reasonable system prompt without using up the planner’s working memory.
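
The four-state lifecycle is small enough to write out in full. What follows is a minimal in-memory sketch, not the on-chain contract: the class name, method names, and the exact transitions between the four states are assumptions made for illustration, using SHA-256 as the hash function.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    LOCKED = "locked"
    WITHDRAWN = "withdrawn"
    REFUNDED = "refunded"
    EXPIRED = "expired"


@dataclass
class Htlc:
    hashlock: bytes  # SHA-256 digest of the secret preimage
    deadline: int    # timelock as a Unix timestamp, in seconds
    state: State = State.LOCKED

    def inspect(self, now: int) -> State:
        # Assumed rule: an unclaimed lock becomes EXPIRED once the deadline passes.
        if self.state is State.LOCKED and now >= self.deadline:
            self.state = State.EXPIRED
        return self.state

    def withdraw(self, preimage: bytes, now: int) -> None:
        # Claiming requires a live lock and a preimage that hashes to the hashlock.
        if self.inspect(now) is not State.LOCKED:
            raise ValueError(f"cannot withdraw from state {self.state.value}")
        if hashlib.sha256(preimage).digest() != self.hashlock:
            raise ValueError("preimage does not match hashlock")
        self.state = State.WITHDRAWN

    def refund(self, now: int) -> None:
        # Refunding is only possible after the deadline, and only once.
        if self.inspect(now) is not State.EXPIRED:
            raise ValueError(f"cannot refund from state {self.state.value}")
        self.state = State.REFUNDED


secret = b"example-secret"
htlc = Htlc(hashlock=hashlib.sha256(secret).digest(), deadline=1_000)
htlc.withdraw(secret, now=500)
print(htlc.state.value)
```

Every method either moves the lock to exactly one next state or raises, which is the property that lets a planner reason about it without tracking partial progress.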

## What “agent-shaped” actually means in practice

Here’s the same trade in two API shapes, side by side.

**SDK-shaped, on a CEX:** ten calls, six different resources. The planner has to know which resource each method lives on, that order IDs come back as strings on this exchange and integers on that one, that “filled” is a status string here and a boolean over there. Get-account-balance, get-market-info, get-ticker, get-orderbook, create-order, get-order (poll), get-order (poll), maybe cancel-order, get-trade-history, get-account-balance.

**Agent-shaped, on Hashlock Markets:** create_rfq → respond_rfq → create_htlc (taker leg) → create_htlc (maker leg) → withdraw_htlc → withdraw_htlc, with get_htlc available at any point. That is six calls plus optional get_htlc reads: five tools on the happy path, with refund_htlc as the sixth for the failure path. Each tool has a clear lifecycle role, and the state machine is one the planner can reason about deterministically. No “is the order partially filled,” no “did my limit price actually hit,” no “do I need to also cancel the residual.” Either both sides claim with the preimage or both sides refund. Those are the *only* two terminal outcomes.
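
The agent-shaped sequence is short enough to write down as data and check mechanically. The sketch below uses only the six tool names from this article; the checker function is a hypothetical helper, not part of the Hashlock Markets server.

```python
# The happy-path write sequence, in order. get_htlc is a read and may appear anywhere.
HAPPY_PATH = [
    "create_rfq",     # taker posts the sealed-bid intent
    "respond_rfq",    # maker commits a quote
    "create_htlc",    # taker leg locked
    "create_htlc",    # maker leg locked
    "withdraw_htlc",  # first claim reveals the preimage
    "withdraw_htlc",  # second claim uses the revealed preimage
]

ALL_TOOLS = {
    "create_rfq", "respond_rfq", "create_htlc",
    "withdraw_htlc", "refund_htlc", "get_htlc",
}


def is_happy_path(trace: list[str]) -> bool:
    """True if the trace, ignoring get_htlc reads, is exactly the happy path."""
    unknown = set(trace) - ALL_TOOLS
    if unknown:
        raise ValueError(f"unknown tools in trace: {sorted(unknown)}")
    writes = [name for name in trace if name != "get_htlc"]
    return writes == HAPPY_PATH


trace = ["create_rfq", "respond_rfq", "create_htlc", "get_htlc",
         "create_htlc", "withdraw_htlc", "withdraw_htlc"]
print(is_happy_path(trace))
```

A check this small is only possible because the surface is small; the equivalent for the ten-call CEX flow would have to model polling, partial fills, and cancellation.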

The reduction in surface area isn’t a packaging trick. It comes from a different settlement primitive. Sealed-bid RFQ skips order books. HTLC skips custody. Atomic cross-chain settlement skips bridges. Each subtraction removes the tools that would have been needed to reason about *that* set of failure modes.

## The MCP spec is what makes this auditable

The reason the six tools work natively across Claude Desktop, Cursor, Windsurf, OpenAI agent runtimes, and LangChain is that they are exposed via the Model Context Protocol — a tool-surface contract Anthropic published as an open standard.

MCP gives you three things that matter for trading. A typed tool schema, where every tool’s inputs are JSON-schema and every output is structured. A stateless transport, where the same six tools are available over local stdio (`npx -y` the canonical hashlock-tech/mcp scoped package) and over Streamable HTTP at `hashlock.markets/mcp`. And an introspectable contract, where the trace through any production trade is just a sequence of typed tool calls.
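
For readers who have not seen one, here is roughly what a typed tool descriptor looks like. The top-level keys `name`, `description`, and `inputSchema` are how MCP describes a tool; every field inside the schema below is invented for illustration and is not the actual `create_rfq` schema. The validator is a deliberately tiny stand-in for a real JSON Schema library.

```python
# Hypothetical descriptor: the fields inside inputSchema are assumptions.
CREATE_RFQ_TOOL = {
    "name": "create_rfq",
    "description": "Post a sealed-bid intent for a trade (taker side).",
    "inputSchema": {
        "type": "object",
        "properties": {
            "sell_asset": {"type": "string"},
            "buy_asset": {"type": "string"},
            "sell_amount": {"type": "string"},  # decimal string avoids float rounding
            "deadline": {"type": "integer"},    # Unix timestamp, in seconds
        },
        "required": ["sell_asset", "buy_asset", "sell_amount", "deadline"],
    },
}

# Only the two JSON types used above are mapped; a real validator covers them all.
JSON_TYPES = {"string": str, "integer": int}


def validate_args(tool: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is well-formed."""
    schema = tool["inputSchema"]
    problems = [f"missing: {key}" for key in schema["required"] if key not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            # Stricter than JSON Schema's default, which permits extra properties.
            problems.append(f"unexpected: {key}")
        elif not isinstance(value, JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type: {key}")
    return problems


print(validate_args(CREATE_RFQ_TOOL, {"sell_asset": "ETH"}))
```

Because the schema travels with the tool, a malformed call can be rejected before it reaches the protocol, and the rejection shows up in the same trace as everything else.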

This matters less for the planner and more for the operator. When you wire an agent into an SDK and something goes wrong at 3 AM, you debug into someone else’s library. When you wire an agent into an MCP server and something goes wrong, you debug into a six-call trace. The mean time to “I understand what happened” is an order of magnitude lower.

## What you actually lose by going agent-shaped

Worth being honest about the trade-offs.

You lose orderbook-style market structure: no top-of-book quotes, no Level-2 depth, no trailing stops or iceberg orders. If your strategy depends on reading the book, you should not use this; use a CEX-MCP that wraps a CEX’s full trading API, and pay the surface-area tax accordingly.

You lose free, continuous mid-quotes. A sealed-bid RFQ needs a maker to price *your specific* intent. If no maker is online for your size on your pair, you wait. A streaming order book gives you a quote even when nobody’s serious about your size, but that quote is not honored unless you actually fill against it.

You lose order types beyond market-and-limit — no stops, no brackets, no OCO, no conditional triggers. The intent says “I want to buy X for Y by deadline T.” The maker either responds or doesn’t.

In return you get six tools to reason about, sealed-bid pricing instead of public order leakage, atomic cross-chain settlement instead of trust-the-bridge, a fee floor of 1–2 bps versus the 8–10 bps you effectively pay on a CEX through the spread, and a surface that the planner can hold in working memory across an entire trade lifecycle.
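
To put the fee comparison in concrete terms, the sketch below converts both basis-point ranges into currency on a single trade. The notional is an assumed round number; the basis-point figures are the ones quoted above.

```python
from decimal import Decimal


def fee(notional: Decimal, bps: int) -> Decimal:
    """Cost of `bps` basis points on `notional` (1 bp = 0.01%)."""
    return notional * bps / Decimal(10_000)


NOTIONAL = Decimal("1000000")  # assumed trade size, chosen for round numbers

for label, low, high in [("1-2 bps", 1, 2), ("8-10 bps", 8, 10)]:
    print(f"{label}: {fee(NOTIONAL, low)} to {fee(NOTIONAL, high)}")
```

On a 1,000,000 notional that is 100 to 200 at the lower range against 800 to 1,000 at the higher one.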

Most of the strategies an AI agent should run for an end-user — single-asset rebalancing, USD-out at a budgeted slippage, cross-chain hedging, paying a counterparty in a different stablecoin than they invoiced — fit cleanly into this surface. The strategies that don’t fit (HFT, market-making at the venue, complex orderbook scalping) are not strategies an end-user wants their *agent* doing autonomously anyway.

## Six on top, four underneath

The provocation works because the headline is short. The argument that holds it up is longer.

A reasonable follow-up is that “thin tool surface” only works if the protocol is doing the heavy lifting underneath. That’s true. The four filters that quotes pass through before the agent ever sees them — counterparty KYC tier, bonded reputation, price-deviation guard, and ring privacy — are part of why six tools is sufficient. The protocol absorbs the validation that would otherwise have been a dozen extra methods on the SDK side.
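
As a rough picture of what “protocol-side validation” means, here is a sketch of quotes passing through a filter chain before the agent sees them. Everything in it is hypothetical: the field names, the thresholds, and the way each check is computed are assumptions, and ring privacy is left out because it is a property of how quotes are delivered rather than a per-quote predicate that can be sketched here.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    maker_kyc_tier: int  # hypothetical: higher means more verified
    maker_bond: float    # hypothetical: collateral posted as bonded reputation
    price: float


def passes_filters(
    quote: Quote,
    reference_price: float,
    min_tier: int = 1,            # assumed threshold
    min_bond: float = 1_000.0,    # assumed threshold
    max_deviation: float = 0.02,  # assumed 2% price-deviation guard
) -> bool:
    """True if the quote clears the KYC, bond, and price-deviation checks."""
    deviation = abs(quote.price - reference_price) / reference_price
    return (
        quote.maker_kyc_tier >= min_tier
        and quote.maker_bond >= min_bond
        and deviation <= max_deviation
    )


quotes = [
    Quote(maker_kyc_tier=2, maker_bond=5_000.0, price=101.0),
    Quote(maker_kyc_tier=0, maker_bond=5_000.0, price=100.0),  # fails KYC tier
    Quote(maker_kyc_tier=2, maker_bond=5_000.0, price=110.0),  # fails deviation guard
]
visible = [q for q in quotes if passes_filters(q, reference_price=100.0)]
print(len(visible))
```

The point of the sketch is where the logic lives: none of these checks appears as a tool the planner has to call.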

So the full thesis isn’t just “fewer tools is better.” It’s “fewer *agent-callable* tools, plus a heavier *protocol-side* validation layer, is what an MCP-driven trading surface should look like.”

Six on top, four underneath. That’s the shape.

## Closing question

The agent-shaped API thesis is a young one. We’re 18 months into the MCP era and the right number of tools per protocol is still being argued out across DeFi, agent-runtime, and broker categories.

If you’re building or evaluating MCPs in this space, here’s the question I’d ask yourself before adding the next method:

**What is the smallest tool surface you trust an AI agent with end-to-end on real money?**

I’d argue it’s smaller than your current SDK. Possibly much smaller. The trades you want an agent doing *autonomously*, on *your* funds, are the ones that fit in a state machine the planner can hold in working memory.

If you want to compare the surface yourself, the canonical package is `hashlock-tech/mcp` (scoped, on npm; run it with `npx -y` and the package name), or connect remotely at `https://hashlock.markets/mcp`. Six tools. One auth flow. Three chains are live (Ethereum, Bitcoin, Sui), with Base, Arbitrum, Solana, and TON on the roadmap.

Same surface either way.

[hashlock.markets](https://hashlock.markets)