Real people, plausible settings, and a protocol that refuses to skip the useful steps
If you want an AI to think with you instead of flattering you, give it a room with friction. In The Dialectic Prompt, I argued for a debate among real, named figures whose positions pull against one another in good faith, and anchored that debate in a plausible situation so the model stays in their registers instead of collapsing into generic advice. Not a persona. Not a system prompt with three bullet points. Do not ask it to act as a senior engineer. Use real people with known stances who can challenge your assumptions. A place they could plausibly be standing. A Moderator who refuses to skip to the answer. A synthesis that names what was traded away.

I used it, and the rough edges showed up quickly: places the model skipped steps, maintenance pain from one giant inlined blob, too much protocol copied and pasted by hand. So I refined it, made it composable, and made new rooms authorable without duplicating the whole thing each time. The sequel was never going to be a bigger BASHES. It was going to be a system: a family of casts, each one a different group of real figures in a different plausible setting, plus shared protocol, bounded guest drafting, and packaging that lets those constraints show up where the agent actually decides.
That is Agent Brain Trust: it takes the dialectic prompt and turns it into a reusable system, a standing set of expert collectives for architecture, codebase tactics, design patterns, prompt engineering, product, organisation design, UX, visual communication, technical writing, and science explanation. Each one is a distinct cast. Each one knows what it will and will not do. Guest drafting is bound to a real roster. The rooms ship as skills and plugins instead of living as one heroic prompt. All of them are reachable from your editor by stating the problem in ordinary language.
What it gives you that a single chat does not
Open a chat. Describe a mess. By the time you finish typing, a chair has named who is about to speak and who is going to push back. Each named guest states, in their own voice, in one sentence, what they think you are actually asking. You confirm. The chair proposes how aggressively to dig and what shape of answer would count as success. You confirm. The room partitions itself into smaller groups so that each one carries real tension: the same tradeoff argued from more than one side. The chair names those axes and shows you how the groups are drawn. Then they debate. Then they reconcile. The reconciliation does not just tell you the answer. It tells you what point of view was sacrificed to reach it, and why.
If you have ever sat in a good design review, this should feel familiar. If you have ever sat in a polite one, this should feel like a relief.
That relief is not magic. It comes from a few constraints that make common failure modes harder to hide: skipped steps, invented authority, and consensus that arrives before disagreement has done its work. It is the product of a system that treats friction as a feature rather than a bug.
Try it before reading on
You do not need to understand the protocol to feel the difference, and you really should feel it before reading the rest of this. Pick the path that matches your client.
Cursor or Claude Code: download agent-brain-trust-cursor-plugin.zip or agent-brain-trust-claude-plugin.zip from Releases and install the one that matches your client. From a clone, npm run install:cursor-plugin or npm run install:claude-plugin does the same. The important part is not convenience. It is that once the rooms are installed as skills, they become available in the agent's normal decision surface: they attach organically when your description matches, or they can be called on demand when you already know which room you want.
Then, in a new chat, describe an actual problem. Do not slash command anything yet; the rooms are designed to be attached by description, not summoned by name. Try one of these unedited:
Run a workshop for this idea: Real time whiteboard, ~50 concurrent editors. CRDTs, or OT plus row locks? Nobody will pin down partition behaviour or who repairs after a split brain.
Run a workshop for this idea: Pricing engine greenfield started as plain functions; now there is a Visitor per discount type. Is this pattern stacking, or is it still earning its keep? Our domain expert cannot read the graph.
Critique this graphic: Board pack: funnel graphic trial→paid with no time axis. Is it implying a causal story the data does not support? The CFO narrates it as “levers,” but marketing says awareness is not represented.
Run an editorial pass on this draft: RFC 042: three pages of history before the decision. Why did security still open five threads the “Background” was supposed to preempt? Focus on structure, not wording polish.
The first should attach bt-software-systems-workshop. The second bt-design-patterns-workshop. The third bt-visual-communication-critique. The fourth bt-technical-writing-editorial. You will see the chair convene, draft an Expert Witness and a Designated Challenger, run the Readings round, and ask you to confirm a Grounding Statement before anyone gets to argue. Watch what they push back on, what the room refuses to skip, and what has to be traded away to get to a coherent answer. That is the product.
The rest of this piece is for after you’ve felt that.
The pattern, in four insights
The previous post argued for the dialectic itself; this one is about how to make it stick as a reusable system instead of a one-off prompt. The brain trust has been in real use for a month, not years, and it is unapologetically a first working version of the pattern. Four insights are doing most of the work. Three of these are general. One is a deliberate excess, easier to dial back than to wish I had added it in the first place.
Real people in plausible situations. This is the load-bearing one and the easiest to misread. The room, such as a Strange Loop hallway or a technical periodical conference, is not the point. The room is the plausible setting that licenses the casting. What that setting actually does is steer the model into the right grooves so that placing real, named figures inside it produces responses that sound like those figures rather than a generic chatbot doing accents. The people carry the weight. The setting is what stops the prompt from reading like fan fiction. Get this right and much of the rest follows for free. The outlier on each cast (Escher in software systems, Adams in technical writing, Lanier in organisation design) applies the same idea at the boundary: people in a real room are not all from the same camp.
Turn-taking as protocol. Every named figure is told, explicitly, when they speak and what they are required to do when their turn comes. Readings, Inquiry, Value Constraints, Trajectory, Tension Axes, Cohort Construction, Position, Rebuttal, Refine. This version takes turn-taking to an extreme, and I am happy to admit that. The honest tradeoff is simple: it is easier to dial back a protocol that forces continuation than to wish, after the fact, that it had ever given you the option. Models, especially the polite ones, will skip contestable steps that are not named. So I name them.
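To make that concrete, here is a minimal sketch, in TypeScript rather than the skill format the repo actually uses, of what naming the turns buys you: once the stage order is data, a skipped step is something you can detect rather than something you have to notice mid-conversation. The stage names are the ones above; the session record and the skippedStages helper are hypothetical.

```typescript
// A minimal sketch, not the repo's skill format: the turn order as data,
// so "did the room skip a contestable step?" becomes a checkable question.
type Stage =
  | "Readings" | "Inquiry" | "Value Constraints" | "Trajectory"
  | "Tension Axes" | "Cohort Construction" | "Position" | "Rebuttal" | "Refine";

const PROTOCOL: Stage[] = [
  "Readings", "Inquiry", "Value Constraints", "Trajectory",
  "Tension Axes", "Cohort Construction", "Position", "Rebuttal", "Refine",
];

// Hypothetical helper: given the stages a session actually visited,
// report anything the room tried to pass over quietly.
function skippedStages(visited: Stage[]): Stage[] {
  return PROTOCOL.filter((stage) => !visited.includes(stage));
}
```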
The dialectic. I argued for it properly in the previous post, so I will not relitigate it here. What is new is the concrete machinery: cohorts that straddle a tension axis, so the debate runs inside each cohort, not only across them; a Designated Challenger whose job is to pressure the emerging consensus; and a synthesis whose job is to name what was traded away. The dialectic is the move. This version makes it structural instead of ambient.
Nothing useful is permitted to be skipped. Two roles, Expert Witness and Designated Challenger, are mandatory and must be drafted before anyone speaks substantively. One distinct cohort guest per cohort. No shared guests. No skipped slots. And critically, those drafts cannot be improvised: draft-experts resolves the request through an MCP-served taxonomy of real persona cards, so the agent picks names that exist in the bundle, rather than producing plausible-sounding ones it cannot reason about. The bounded roster is not there for flavour. It is there so the system cannot silently invent expertise and call it judgment. More importantly, the guest does not have to come from the standing room's home domain. The writing room can draft an agent systems witness; the architecture room can draft a language or delivery expert; the product room can draft someone from finance or operations if that is where the problem actually lives.
This is not just procedural fussiness. It is how the rooms escape their own unavoidable baggage. Every standing cast comes with priors baked in. The software systems room will over-index on architecture. The design patterns room will naturally see patterns. The writing room will care about prose even when the underlying problem is actually missing domain knowledge. Guest slots are the correction mechanism. An Expert Witness brings in the niche knowledge that the standing cast does not have, even when that knowledge sits outside the room’s nominal domain. A Designated Challenger pressures the room exactly where its consensus would otherwise get too comfortable. Cohort guests let each side import the missing edge it needs for a fair fight. In each case, the mechanism matters because it closes off a failure mode: missing domain depth, premature consensus, or a debate where one side never gets the expert pressure it needs.
The no-skip rule and the bounded index are halves of the same insight. Enforcement requires a real list to enforce against; without it, “draft a guest before they speak” decays into “invent a name and proceed.”
That is the pattern. Cast real people, set them somewhere they could plausibly be, name the turns, hold the tension of a real debate, and refuse to let any useful step be skipped quietly. Most of the file machinery downstream exists so that each new cast inherits all four for free.
The trusts that ship
The trusts are organised into two distinct profiles. The technical workshops focus on architecture, patterns, and tactics for convergent and defensible decisions. The editorial rooms are critique panels that improve the clarity and integrity of a piece of work without overrunning the writer’s intent.
These panels cover domains ranging from software systems and organisation design to visual communication and technical writing. Each room is a standing cast with specific priors, supplemented by drafted guests to correct for the biases of the room. A full directory of the ten available trusts and their specific constraints is provided in Appendix B.
The architecture, briefly
The repo is a monorepo. That choice is not aesthetic. It is how I stop prompt drift. The interesting half is content/: thin skill files (the rosters and framings you saw above); shared protocol fragments (one readings rule, one synthesis rule, one persona fidelity rule) used by every room; persona cards for around eighty named figures; and a topic-to-expert taxonomy used by draft-experts. The other half is the build that turns all that content into installable artefacts: per-skill zips, full Cursor and Claude Code plugin zips, and a published MCP server.
The point of the layered design is that adding a new room is the size of a YAML stanza and a paragraph of framing, with no copy-pasted protocol body, no forked synthesis rule, and no risk of one room quietly drifting from the others on something as load-bearing as “you must name what you traded away.” One repo, one source of truth, many generated artefacts. That is the whole trick.
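For a feel of the size claim, here is an illustrative sketch of what a cast definition carries, written as TypeScript purely for readability: the real repo expresses this as a YAML stanza plus a paragraph of framing, and the field names, IDs, and framing text below are assumptions rather than the bundle's actual schema. The roster is the design patterns room described in Appendix B.

```typescript
// Illustrative only: the repo's real format is a YAML stanza, and these
// field names are assumptions. The shape is the point: a new room is a thin
// cast definition that references shared protocol fragments, never copies them.
interface CastDefinition {
  id: string;
  profile: "workshop" | "editorial";
  framing: string;              // the plausible setting that licenses the casting
  standingCast: string[];       // persona card IDs that must exist in the bundle
  outlier: string;              // the deliberate boundary voice
  protocolFragments: string[];  // shared rules: readings, synthesis, persona fidelity
}

// Roster as described in Appendix B; IDs and framing text are illustrative.
const designPatternsWorkshop: CastDefinition = {
  id: "bt-design-patterns-workshop",
  profile: "workshop",
  framing: "A working session on the forces and consequences of a proposed pattern.",
  standingCast: ["gamma", "helm", "johnson", "vlissides", "fowler", "hickey"],
  outlier: "hickey",
  protocolFragments: ["readings", "synthesis", "persona-fidelity"],
};
```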
Plugins
Cursor and Claude Code use the same content, packaged differently. Both plugin zips ship a generated .mcp.json that runs the MCP server via npx -y @bahulneel/[email protected] at the same version that built the plugin, so the server cannot drift from the skills. There are three install paths: signed Releases, workflow artefacts, or npm run install:* from a clone. Consuming any of them does not require Node.
This matters because the plugin is not just a wrapper around prompts. It puts the rooms where the agent already makes choices. That means the workshops can fire organically, because the user described a mess that clearly matches one, and they can fire on demand, because the user asked for a specific room by name. The same install gives you both modes: ambient attachment when you want the tool to notice the shape of the problem, explicit invocation when you already know the shape yourself. More importantly, it moves the protocol out of rhetoric and into the agent’s operating surface, where “do not skip this step” can attach to real tool use and bounded retrieval instead of remaining a noble suggestion.
That is also why expert-opinion matters. Not every problem deserves a meeting. Sometimes the right product is a single drafted expert, selected through the same bounded roster, answering in one voice with no panel overhead. The plugin makes it a natural sibling to the rooms rather than a separate trick.
The MCP server on its own is not the product; it is the bounded index that makes the no-skip rule above enforceable. When a panel says “draft a guest before they speak,” or when expert-opinion says “pick the one best expert for this question,” the agent reaches into a real taxonomy of real persona cards rather than improvising. The plugin is what does the asking. The server is what stops the asking from returning fiction. The trust boundary here is not “the model” or “the room.” It is the combination of visible protocol stages, human confirmation checkpoints, and bounded expert selection.
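As a sketch of that boundary, and only a sketch (the server's real API surface is not shown here), the bounded index reduces to one load-bearing move: resolution can only return persona IDs that already exist in the bundle. The types and helper below are assumptions for illustration.

```typescript
// A sketch of the bounded-index idea, not the server's actual API: the
// taxonomy maps topic leaves to experts, and resolution drops anything
// that does not exist in the shipped bundle of persona cards.
type ExpertId = string;

interface Taxonomy {
  topics: Record<string, ExpertId[]>;  // topic leaf -> experts who cover it
  personas: Set<ExpertId>;             // every persona card in the bundle
}

function draftExperts(taxonomy: Taxonomy, requestedTopics: string[]): ExpertId[] {
  const candidates = requestedTopics.flatMap((topic) => taxonomy.topics[topic] ?? []);
  // The load-bearing line: names not in the bundle are discarded, so the
  // agent cannot silently invent expertise and call it judgment.
  return [...new Set(candidates)].filter((id) => taxonomy.personas.has(id));
}
```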
Testing
The point of testing here is not “do my functions return the right values?” It is “do the rooms behave like rooms when a real client talks to them?” The project is a month old, and the test bed is correspondingly young. The most useful piece is npm run test:mcp-drafting: it spins up the MCP server, hands the model a realistic task, and asserts that draft-experts resolves the right topics and returns expert IDs that exist in the bundle. It is a check that the no-skip rule survives contact with a real client. Beneath it, structural tests refuse to ship a build if any expert reference is dangling or any persona has fallen off the taxonomy. The frontier I have not crossed yet, and the one worth doing next, is testing whether the behaviour holds under provider drift: does the cast still produce a synthesis that names a sacrifice, or has the model started skipping the move when no one is watching? That is the test I want and do not yet have.
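For the curious, the shape of that drafting check looks roughly like the sketch below, assuming a vitest-style runner; the declared helpers and the exact tool-call interface are hypothetical, not the project's actual test code.

```typescript
// The shape of the check, not the project's real test: hand the server a
// realistic task and assert that every drafted expert ID exists in the bundle.
import { test, expect } from "vitest";

// Hypothetical helpers: however the server is started and the bundle is
// loaded, the assertion that matters is the one at the bottom.
declare function startMcpServer(): Promise<{
  call(tool: string, args: { task: string }): Promise<{ experts: { id: string }[] }>;
}>;
declare function loadPersonaBundle(): Promise<Set<string>>;

test("draft-experts only returns experts that exist in the bundle", async () => {
  const server = await startMcpServer();
  const bundle = await loadPersonaBundle();

  const result = await server.call("draft-experts", {
    task: "Review an RFC that buries its decision under three pages of history",
  });

  expect(result.experts.length).toBeGreaterThan(0);
  for (const expert of result.experts) {
    expect(bundle.has(expert.id)).toBe(true); // no invented names
  }
});
```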
Why this version is worth shipping
The original article gave you one room and a protocol you could copy. That was the right shape for a manifesto. It is the wrong shape for actual use, because you cannot have one room for everything. UX critique should not run cohorts the way an architecture workshop does. A design patterns debate should not be chaired the way a scientific explanation gets edited. A product strategy session needs people in the room who would not be caught dead at the architecture one.
Agent Brain Trust is the version where each problem can summon the room it deserves, where the rooms share the few moves that make the dialectic worth doing, and where the answer is better not because it sounds more authoritative, but because certain kinds of failure are harder to hide: invented authority, skipped contestable steps, and consensus that arrives before the argument has done its work. It does not guarantee correctness. It gives you a better review surface. That last part is still the product.
The original piece said: “Build the room.” This one says: cast real people into plausible settings, name the turns so nothing useful gets skipped, keep several casts on call, and demand a synthesis that names the sacrifice. The messier the question, the more valuable the room becomes.
A standard system prompt can sound wise, but a real room can tell you what it traded away to reach the truth.
Appendix A: A trust in action on GPT-5.4
Here is a condensed example of the technical writing room reviewing an early draft of this article in Cursor on GPT-5.4. The full transcript was longer and more procedural. This version keeps my prompts verbatim, compresses the room’s prose hard, and preserves the actual progression of the session.
1. Convene the room
Me: /bt-technical-writing-editorial @drafts/2026-04-19-from-the-dialectic-prompt-to-agent-brain-trust.md
Before it said anything substantive about the prose, the room refused to skip setup. It drafted an Expert Witness and a Designated Challenger first:
- Lilian Weng for agent systems rigour: are the claims about bounded personas, protocol, tools, and control actually true as system design?
- Ethan Mollick for accountability and adoption pressure: is the piece overstating what the system buys, and does it say enough about when to trust or verify it?
That alone is the pattern at work. The technical writing room did not solve its uncertainty by drafting more technical writers. It reached into the world of the article itself: agent systems, trust, adoption, and accountability. The room does not begin with line edits. It begins by deciding what kinds of intelligence it is missing before it speaks with confidence.
2. Read the draft before arguing about it
Me: ok
The first round was ten one-sentence readings: eight standing members, plus Weng and Mollick. The room broadly agreed that the draft was already late stage and publishable in spirit, but that it was still carrying too much of its claim through tone and repetition rather than through a small set of explicit distinctions.
The recurring notes were consistent:
- The essay needed to say earlier that Agent Brain Trust was a systemised sequel to the dialectic prompt, not just a bigger prompt
- Several sections were arguing the same promise again, with fresh phrasing
- The trust boundary was still blurry
- The mechanism was present, but some of it was still being asserted rather than demonstrated
3. Set the editorial contract
Me: ok
The first confirmation accepted the value constraint: make this the most publishable version, with documentation utility as a secondary goal.
Me: ok
The second accepted the grounding statement: treat the piece as a late-stage technical essay explaining a bounded system of rooms, protocol, guest drafting, and plugin delivery.
Me: ok
The third accepted the trajectory: explanatory editing first, compression second.
That sequence matters because it stops the session from quietly mutating into a different job. The room was not there to invent a new article. It was there to make this one clearer and more trustworthy.
4. Split the room along live tensions
Me: ok
The room then divided itself into two smaller arguments:
- Cohort A: mechanism, reader load, explanation
- Cohort B: stakes, framing, memorability
It drafted Denny Zhou into the first cohort to test whether the protocol claims reflected real scaffold rather than ritual, and Marty Cagan into the second to pressure the reader contract and product value.
The tension axes were exactly the ones I wanted the article to survive: mechanism versus felt effect, plainness versus vividness, product explanation versus larger idea, and operational distinction versus literary momentum.
5. Ask the witness what would make the claims real
Me: ok
This was the most useful phase. Weng pushed three clarifications that made it directly into the revision:
- Keep prompt rhetoric, orchestration policy, and bounded external resources separate
- Frame the bounded roster as an auditability and authority constraint, not just a craft improvement
- Stop saying the room is “better” in the abstract and say which failure modes it reduces: skipped steps, invented experts, vague consensus, and uninspected synthesis
Her cleanest line was the one I ended up using as a test for the rewrite: architecture lands better when it can be restated as “this mechanism prevents this failure.”
6. Let the room converge
Me: ok continue to the end
Once the debate collapsed into synthesis, the most influential cohort argued for a simpler governing claim:
Agent Brain Trust is not better because it sounds like a room of experts; it is better, when it is better, because it constrains failure in visible ways.
That final synthesis produced the revision checklist I actually used:
- State the dialectic prompt to the reusable system transition much earlier
- Translate major claims into failure prevention language
- Compress repeated advocacy across the middle sections
- Name the trust boundary more plainly
- Add one sentence of humility: this does not guarantee correctness; it gives a better review surface
- Keep the live lines about friction, skipped steps, and naming the sacrifice
The point of including this appendix is not to pretend the transcript itself is magical. It is to show the trust behaving like a trust: the room convened, named its missing expertise, fixed the editorial contract, split along real tensions, pulled on the weak claims, and converged on a sharper draft without flattening the voice that made the piece worth writing in the first place.
Appendix B: Directory of the trusts
Ten panels ship, plus two utility skills. They split into two profiles: technical workshops, working sessions that want to converge on a defensible decision, and editorial rooms, critique panels that improve a piece of work without overrunning the writer’s chosen stopping point. Same protocol skeleton, different temperament. Each room knows what it is not for, which is worth knowing before you reach for one.
The technical workshops
bt-software-systems-workshop: architecture and paradigms. The descendant of BASHES, including Byrd, Alvaro, Sussman, Hickey, Steele, and Escher. Designed for architecture, paradigms, abstractions, DSLs, and distributed systems trade-offs. Focuses on partition behaviour and termination arguments rather than tooling preferences. It will not recommend a specific stack.
bt-codebase-tactical-planning: refactors and migrations. Includes Fowler, Beck, Parnas, Henney, Armstrong, and Hickey. Distinct from the architecture workshop on purpose: this room reads the code you have, not the system you wish you had. For refactors, migrations, subsystem extractions, and the smallest safe sequence of changes. Polyglot and tactical, with language-specific depth supplied through Expert Witnesses rather than the standing cast. It will not redesign your architecture for you.
bt-design-patterns-workshop: forces and consequences. Includes Gamma, Helm, Johnson, Vlissides, Fowler, and Hickey. Four members of the Gang of Four sit alongside Hickey as the outlier. The room exists to surface forces and consequences, not to recommend a pattern. Hickey is there to ask whether you needed the pattern at all.
bt-prompt-engineering-trust: agents in production. An unusually small room of five voices: Weng, Zhou, Karpathy, Ng, and Mollick. Built for the specific question of how instructions and skills shape what agents do in production: interfaces, failure modes, evaluation, and who carries the risk when outputs ship. It refuses to debate clever phrasings in isolation.
bt-product-strategy-workshop: bets and roadmaps. Includes Cagan, Torres, Perri, Christensen, Porter, and Hansson. Pressure tests problem framing, evidence, bets, and roadmaps, and is built specifically to ask what you are not funding. Hansson is the outlier and will, on cue, ask whether the problem the others are seriously analysing is a problem at all.
bt-organisation-design-workshop: teams and coordination. Includes Drucker, Mintzberg, Kim Scott, Lencioni, McConnell, and Lanier. For team shape, authority, coordination, and incentives. Lanier is in the outlier seat for a reason: he is there to ask whether the organisation you are designing is one a person could live inside.
The editorial rooms
bt-frontend-ux-critique: flows and usability. The first editorial room, including Zhuo, Norman, Nielsen, Cooper, Sierra, and Spiekermann. For flows, prototypes, layout, navigation, and usability evidence. The room will not give you “try shorter copy” advice; it will argue about whether the sequence matches the job the user is actually trying to do.
bt-visual-communication-critique: charts and integrity. Includes Tufte, Escher, Spiekermann, Gleick, Sierra, and Feynman. For charts, slides, diagrams, and infographics. Tufte is the chair you would expect; Escher is the outlier. The room is explicitly about integrity rather than taste: does the graphic imply a story the data does not support?
bt-technical-writing-editorial: docs and posts. The largest room, deliberately, including Knuth, Kernighan, Kidder, Gleick, Sierra, Adams, Fowler, and Feynman. For docs, READMEs, RFCs, and posts. Adams is the outlier and will demolish any ceremony. The room respects the writer’s chosen stopping point and will not push for “publish” if you came in for “rev 6.”
bt-science-explanation-editorial: pitch and honesty. Includes Feynman, Gleick, Pinker, Kernighan, Knuth, and Adams. For explaining a hard idea at the right level without dumbing it down. Built around the question “is this draft actually teaching what it claims?”
The utilities
expert-opinion: one voice, no meeting. The one-shot version, for when you do not want a full panel. It routes through draft-experts, picks the single best fit, loads only that persona, and answers in that one voice. It still carries the Brain Trust discipline, but without the ceremony of a chaired room. The right move when you want a sharpened individual take rather than a workshop: the VP of Engineering does not want six people arguing in public; she wants a one-page memo from one strong specialist.
draft-experts: the bounded router. Not a room at all, a router. Given topics or a task, it returns a checkable mapping to taxonomy leaves and the experts who actually exist in the bundle. Most users will never call it directly: the rooms call it on themselves to draft Expert Witnesses, Designated Challengers, and cohort guests against a real index instead of inventing names.
A note on guest slots
Those guest slots matter more than they might appear to at first glance. They are not decorative extensions to a fixed panel. They are how a room compensates for the blind spots that come with being a room at all. A standing cast gives you consistency, tone, and reusable tension. Guest drafting gives you the niche correction that stops consistency from turning into dogma. That is a substantial improvement over the original one-room pattern: the cast no longer has to pretend it contains the whole world of the problem before it begins.
If you are reading these and noticing absences (no security room, no data room, no SRE room), you are reading them correctly. The next ones I plan to add are exactly those.