The wider harness

Harness engineering gave us a better name for the work around the model. Building digital workers asks the harness to stretch one layer further.

The phrase harness engineering has done a quiet but useful job in the last eighteen months. It gave us a better name for the work most teams were still calling prompt engineering.

The shorthand carries Harrison Chase’s argument about Deep Agents, Anthropic’s published reflections on Claude Code, and how platforms like Wonderful position themselves around skills and continuous evaluation. What used to be “write a better prompt” has become “engineer the scaffolding around the model”: planning tools, subagent isolation, the filesystem as a scratchpad, context discipline, evaluation running in production. The model is one component. The harness is the system.

The discipline is real, and the architectural promise is being delivered. But the harness that has been formalised so far is the harness for a task agent — an agent given a discrete piece of work, expected to complete it inside one or a few sessions, then move on. The Claude Code harness exists so a model writes code well across a few hours. The Deep Agents harness exists so a research agent doesn’t unravel across thirty tool calls. A customer-service harness exists so a skill stays robust across deploys.

What some teams are starting to build in production is a different unit of work entirely. Not a task agent operating a workflow, but a digital worker operating a function. The difference is not size. It is horizon.

The unit shift

A task agent is contracted for an output. A digital worker is contracted for an outcome that recurs.

You don’t deploy that. You onboard it.

The vocabulary shift is not cosmetic. The system prompt does the work of identity, until two analysts disagree about who owns the worker. The toolset does the work of capability, until an auditor needs to know what the worker is permitted to touch. The memory store does the work of cognition, until a thesis is updated and you need to reconstruct which historical decisions were made under which version. The logs do the work of governance, until something goes wrong and “who do I call?” has no single answer.

None of these are model problems. They are harness problems, but they sit at a layer above the loop.

Six dimensions of the wider harness

The shape the wider harness lands on is not an invention. It is the set of questions any organisation asks before a hire: who are you, what do you need to know, what are you able to do, how do you behave, how do you learn, who owns the result.

Identity answers the question “who is this?” For a task agent, identity collapses into a system prompt. For a digital worker, identity is a contract: a name, a purpose stated in one sentence, an explicit scope, an explicit non-scope, and a single human sponsor who owns the function the worker now operates beside. Tracking identity at this resolution is what makes a worker hireable and, when needed, retirable.
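As a minimal sketch, the identity contract can be written down as a record that is checked before onboarding. The class name, field names, the crude one-sentence check, and the example values below are all assumptions made for illustration, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerIdentity:
    """Hypothetical identity contract for a digital worker."""

    name: str
    purpose: str                # stated in one sentence
    scope: frozenset[str]       # what the worker operates
    non_scope: frozenset[str]   # what it explicitly stays out of
    sponsor: str                # the single human who owns the function

    def problems(self) -> list[str]:
        """Return the gaps that should block onboarding (empty means none)."""
        found = []
        if not self.name.strip():
            found.append("missing name")
        # Crude single-sentence check: at most one sentence terminator.
        terminators = sum(self.purpose.count(mark) for mark in ".!?")
        if not self.purpose.strip() or terminators > 1:
            found.append("purpose must be one sentence")
        if not self.scope:
            found.append("scope is empty")
        if not self.non_scope:
            found.append("non-scope is empty")
        if self.scope & self.non_scope:
            found.append("scope and non-scope overlap")
        if not self.sponsor.strip():
            found.append("missing sponsor")
        return found


# Illustrative values only.
identity = WorkerIdentity(
    name="capacity-worker",
    purpose="Keep staffing proposals aligned with confirmed demand.",
    scope=frozenset({"staffing proposals"}),
    non_scope=frozenset({"hiring decisions"}),
    sponsor="head-of-delivery",
)
print(identity.problems())
```

Making the record frozen is a deliberate choice in this sketch: a change of scope or sponsor becomes a new contract rather than a silent edit.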

Context answers “what does it need to know to operate?” In task-agent harnesses this is essentially the prompt plus a retrieval system. In a digital worker, context is inherited from the function: the active strategy of the area that contracts the worker, the brand voice it speaks in, the named humans it relates to on each side, the environment of systems it touches. Strategy is inherited from the area-client, not invented by the engineering team. That separation protects the function from becoming a software project and the studio from becoming a strategy department.

Capability is the closest the wider harness gets to the existing one. Skills, tools, workflows, deliverables — the layer where most of the harness-engineering literature already lives. The discipline transfers cleanly: composable units, governed independently, validated in isolation, propagating fixes through composition rather than through prompt edits. Where the wider frame adds something is in capability boundaries. A digital worker has named capabilities, each with a defined trigger, a defined output, a measurable acceptance criterion. The capability is not what the worker can do. It is what it is accountable for.
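One way to make “accountable for” concrete is to attach the acceptance criterion to the capability itself and check each deliverable against it. The sketch below assumes a hypothetical `Capability` record and invented capability names, triggers, and thresholds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability record: trigger, output, acceptance criterion."""

    name: str
    trigger: str                      # what starts the capability
    output: str                       # the deliverable it produces
    accept: Callable[[dict], bool]    # measurable acceptance criterion


def unaccepted(capabilities: list[Capability],
               deliverables: dict[str, dict]) -> list[str]:
    """Names of capabilities whose deliverable is missing or fails its criterion."""
    failing = []
    for cap in capabilities:
        result = deliverables.get(cap.name)
        if result is None or not cap.accept(result):
            failing.append(cap.name)
    return failing


# Illustrative capabilities; names, triggers, and thresholds are invented.
capabilities = [
    Capability(
        name="weekly_allocation_proposal",
        trigger="start of the planning week",
        output="allocation proposal",
        accept=lambda r: r["unassigned_people"] == 0,
    ),
    Capability(
        name="utilisation_report",
        trigger="month end",
        output="utilisation report",
        accept=lambda r: r["coverage"] >= 0.98,
    ),
]

deliverables = {
    "weekly_allocation_proposal": {"unassigned_people": 0},
    "utilisation_report": {"coverage": 0.91},
}
print(unaccepted(capabilities, deliverables))
```

A missing deliverable counts as a failure here, which matches the idea that the worker answers for the capability whether or not it ran.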

Conduct answers “how does it act?” This is where autonomy lives, and where the task-agent harness is least equipped. In a task agent, autonomy is a configuration: tools either get called or they don’t, approvals either happen or they don’t. In a digital worker, autonomy is a gradient: granular per action type and progressed on measured maturity. Detecting an announced funding round in European press is reversible and low-risk; it can run autonomously within weeks. Dispatching outreach to a founder is irreversible; it stays in explicit human approval indefinitely. Conduct is also where the worker’s predictability under unexpected scenarios is engineered. A good harness here means a sponsor can guess what the worker will do before something unusual happens.
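The gradient can be sketched as a per-action-type policy with a ceiling for irreversible actions. The level names, the action-type strings, and the fallback for unknown action types are assumptions of this sketch; the two example actions mirror the ones in the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    HUMAN_APPROVES = 0        # every action waits for explicit approval
    APPROVE_BY_EXCEPTION = 1  # runs unless a human objects
    AUTONOMOUS = 2            # runs without review


@dataclass(frozen=True)
class ActionPolicy:
    reversible: bool
    level: Autonomy


def effective_level(policies: dict[str, ActionPolicy], action_type: str) -> Autonomy:
    """Resolve the autonomy level for one action type."""
    policy = policies.get(action_type)
    if policy is None:
        # Assumption: unknown action types fall back to the most cautious level.
        return Autonomy.HUMAN_APPROVES
    if not policy.reversible:
        # Irreversible actions stay under explicit approval whatever is configured.
        return Autonomy.HUMAN_APPROVES
    return policy.level


policies = {
    "detect_funding_round": ActionPolicy(reversible=True, level=Autonomy.AUTONOMOUS),
    "send_founder_outreach": ActionPolicy(reversible=False, level=Autonomy.AUTONOMOUS),
}

for action in ("detect_funding_round", "send_founder_outreach", "something_new"):
    print(action, effective_level(policies, action).name)
```

Falling back to human approval for anything unlisted is one way to keep the worker predictable in unexpected scenarios: the sponsor knows in advance that an unfamiliar action will be escalated, not improvised.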

Cognition answers “how does it learn, remember, and consult?” Memory, in the task-agent harness, is a store. In the wider harness, cognition is a structure: a hierarchy of sources of truth, an explicit conflict-resolution order, a learning loop closed against outcomes the worker can observe over time. A worker that recommends investment decisions needs to know whether its scoring model is calibrated against the passes it has made. A worker that runs capacity allocation needs to remember that last Monday’s divergence between its proposal and the human’s decision is data for next Monday’s tuning. None of that is the model. All of it is the harness.
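An explicit conflict-resolution order can be as small as a ranked list of sources consulted from the top. The source names and their ranking below are assumptions chosen for illustration; a real hierarchy would come from the function the worker serves.

```python
from __future__ import annotations

# Assumed precedence, highest authority first. Source names are illustrative.
PRECEDENCE = ("sponsor_decision", "active_strategy", "worker_memory")


def resolve(claims: dict[str, str],
            precedence: tuple[str, ...] = PRECEDENCE) -> tuple[str, str]:
    """Return (source, value) from the highest-ranked source that has a claim."""
    for source in precedence:
        if source in claims:
            return source, claims[source]
    # Claims from unranked sources are ignored, not silently trusted.
    raise LookupError("no ranked source has a claim")


claims = {
    "worker_memory": "prioritise seed-stage deals",
    "active_strategy": "prioritise series A deals",
}
print(resolve(claims))
```

Returning the winning source alongside the value keeps the decision traceable: the record shows not only what the worker believed but which source of truth it deferred to.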

Governance answers “how is it managed and held to account?” This is the layer the task-agent literature has thought about least, and the layer that distinguishes a digital worker in production from a demo that survives a quarter. A worker without a named owner, a measurable KPI, and a contingency works, until the day it doesn’t. The harness here includes a service account dedicated per worker, audit trails that distinguish worker actions from human actions, a documented retirement path, and a pause mechanism the sponsor can pull without engineering on call.
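A minimal sketch of two of these pieces, the audit trail that separates worker actions from human actions and the sponsor's pause switch, might look like the following. The class, the account and sponsor names, and the rule that only the sponsor may pause are assumptions of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GovernedWorker:
    """Hypothetical governance wrapper around one digital worker."""

    service_account: str            # dedicated account, one per worker
    sponsor: str                    # named human owner
    paused: bool = False
    # Each entry is (actor kind, actor, event).
    audit: list[tuple[str, str, str]] = field(default_factory=list)

    def pause(self, by: str) -> None:
        """Pause switch the sponsor can pull directly."""
        if by != self.sponsor:
            # Assumption: only the named sponsor may pause.
            raise PermissionError("only the sponsor may pause this worker")
        self.paused = True
        self.audit.append(("human", by, "pause"))

    def act(self, action: str) -> bool:
        """Record a worker action; refuse it while the worker is paused."""
        if self.paused:
            self.audit.append(("worker", self.service_account, f"blocked:{action}"))
            return False
        self.audit.append(("worker", self.service_account, action))
        return True


worker = GovernedWorker(service_account="svc-capacity-worker",
                        sponsor="head-of-delivery")
worker.act("propose_allocation")
worker.pause(by="head-of-delivery")
worker.act("propose_allocation")
for entry in worker.audit:
    print(entry)
```

Logging blocked attempts as well as completed ones means the trail shows what the worker tried to do while paused, which is useful when deciding whether to resume it.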

Six layers, six questions, one page. The vocabulary is stable because the questions are old. What is new is treating them as engineering surface.

Field notes

The discipline transfers under load.

In one implementation, a single digital worker owns the capacity-management function of a several-thousand-person services firm. Five capabilities, one human sponsor, three phases of autonomy progressed by metric, not calendar, over twelve weeks. In Phase 0 the worker proposes and the human approves; in Phase 1 the human approves by exception; in Phase 2 the worker executes within a defined envelope. The six-dimension (6D) canvas is the artifact that gates production: a single page, six rows, no gaps.
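Progression by metric, not calendar, can be sketched as a gate that looks only at observed outcomes. The metric used here (how often the worker's proposal matched the human's decision), the 0.9 threshold, and the minimum sample size are assumptions for illustration; the implementation described above does not specify them.

```python
from __future__ import annotations


def agreement_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of cases where the worker's proposal matched the human's decision."""
    return sum(proposal == decision for proposal, decision in pairs) / len(pairs)


def next_phase(phase: int, pairs: list[tuple[str, str]],
               threshold: float = 0.9, min_samples: int = 20) -> int:
    """Advance one phase only when the metric clears the bar; dates play no part."""
    # Stay put at the final phase or when there is too little evidence.
    if phase >= 2 or len(pairs) < max(min_samples, 1):
        return phase
    return phase + 1 if agreement_rate(pairs) >= threshold else phase


# 19 matches and 1 divergence: 95% agreement over 20 samples.
history = [("assign A", "assign A")] * 19 + [("assign A", "assign B")]
print(next_phase(0, history))
```

Nothing in the gate refers to elapsed time, so a worker that has not produced enough comparable decisions stays where it is regardless of how many weeks have passed.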

In another, five digital workers operate the functions of a venture capital firm — investment pipeline, portfolio stewardship, capital raising, reporting, decision intelligence. Each worker owns an entire function, with its own sponsor, its own KPIs, its own progression criteria. The waves of implementation are sequenced so the first capability of each worker reaches Phase 0 before the next worker enters the build queue. The same canvas, filled five times.

Different functions, different firms, different scales. What stays constant is the harness shape.

Closing

Prompt engineering became harness engineering when we stopped pretending the model was the product. Harness engineering becomes something larger when we stop pretending the agent is the product.

The product is the function — the recurring outcome a human used to own and now a digital worker owns alongside a human. Engineering the wider harness is the work that makes that possible.

In some teams, this is already the daily job. The vocabulary is the part still catching up.

Daniel Braz is a CTO and practitioner-researcher focused on agentic software engineering, spec-driven development, and AI-assisted workflows.

The wider harness was originally published in Level Up Coding on Medium.