How I Built a Multi-Agent AI Inventory Pipeline with MCP, and Ontology — VIP Preemption as YAML, Agent Reorders the Stock Itself

Policy-Driven Ontology Series · Case 3 — autonomous reorder and a manager briefing on the exception.

[Business Case 3 in four panels: limited stock vs. many orders, tier decides priority, beyond rules the agent decides, inventory strategy as one YAML]

The whole Case 3 thesis in one frame. A limited stock pool, four orders competing for it, a deterministic rule that lets VIP preempt — and then the part rules can’t reach: when demand still can’t be filled, an Inventory Agent detects the gap, decides how much to reorder, and writes the manager a briefing. Normal flow is determinism; the exception is judgment.

Where I Left Off

In Business Case 2, one “Closed Won” click fanned out to four agents through a single YAML rule. The last thing that case did was create an Odoo Sales Order with storable line items — physical inventory, not just software licenses. The moment a storable SO confirms, Odoo auto-generates a Delivery Order, and that Delivery Order is where Case 3 begins.

Case 1 was the front door (who is this customer). Case 2 was the sales spine (quote to order). Business Case 3 is fulfillment: the order is signed, so who actually gets the stock when there isn’t enough of it?

And that question splits cleanly into two regimes, which is the entire point of this article.

Two Regimes: Rules, Then Judgment

Most inventory contention is boring and deterministic. A VIP order and a Standard order both want the same SKU, stock is short, the company policy says VIP first. That’s a rule. No model needs to “decide” anything — you encode the priority once and the engine enforces it forever. In my system that path makes zero LLM calls.

The interesting regime is the one rules can’t reach. A VIP order comes in, stock is zero, and there is nothing to preempt — no Standard reservation to claw back, no incoming shipment to borrow against. No deterministic rule can fill this order, because the thing the order needs doesn’t exist yet. Something has to judge: how much should we reorder? Just this gap, or a safety buffer? How urgent is it given which customers are blocked? And then someone has to tell a human, in a way they can act on.

That second regime is where I let an agent think. Everywhere else, the YAML decides. That split has a cost side I’ll come back to near the end: the deterministic regime makes no model calls at all, so the token bill ends up tracking judgments, not operations.

Architecture: Who Owns “Is There a Shortage?”

[Architecture: INPUT (Stock Received / Inventory Shortage) → OOSDK Core Engine (3 inventory entities, deterministic rules) → 3-Tier Memory · Execution layer with the Inventory Agent as the new card]

The Business Case 2 engine, extended for inventory. New entity types (stock_received, inventory_shortage_detected), the VIP-preemption and replenish-on-receipt rules, and one new execution card, the Inventory Agent, that owns detection, judgment, and the Odoo (ERP) writes. The core engine still makes zero LLM calls; the only model calls are the two judgment steps: the Inventory Agent’s quantity advisor and the email agent’s briefing.

The design decision I spent the most time on wasn’t a rule. It was a boundary question: where does “is there a shortage, and how big” live?

My first version computed the shortage in the MCP trigger (the entry point), then injected the number into the rule condition — if shortage.unmet_qty > 0. It worked. But it quietly broke the layering the whole series is built on. The ontology is supposed to hold strategy — which agent fires, with what policy. The moment a YAML rule condition depends on a multi-table inventory aggregation, the ontology is no longer pure strategy; it's coupled to a specific pre-computation pipeline. Every other rule in the system gates on a plain event fact (receipt.qty > 0); only this one needed a heavy derived value.

So I moved it. The shortage calculation is a data-access helper, but the decision to gather it now belongs to the Inventory Agent, exactly like the sibling allocation actions already gather their own move queues. The rule gate collapsed back to a pure event:

# ontology/ontology.yaml — the gate is strategy, not math
inventory_replenish_on_shortage:
  priority: 385
  if: "entity == 'inventory_shortage_detected'"   # pure event — no domain aggregation
  then:
    delegate_to:
      - agent: inventory_agent
        action: create_replenishment_po     # detect + judge qty + create receipt
      - agent: email_agent
        action: send_replenishment_alert     # LLM briefing → notify_to
      - agent: crm_agent
        action: log_interaction

The clean split that fell out: ontology = routing strategy, agent = domain reasoning (detect → judge → act), service = raw data access. The agent owns inventory end-to-end, which is exactly the story the narration tells — and now the code tells the same story.
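
To make the "gate is strategy, not math" point concrete, here is a minimal sketch of how a pure-event gate like the one above could be evaluated with no model call. Everything in it is an assumption for illustration: the in-memory `RULES` list, the `Step`, `gate_matches`, and `plan` names, the regex-based gate parser, and the choice to try higher `priority` values first are not taken from OOSDK.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    agent: str
    action: str


# Hypothetical in-memory form of the YAML rule above (not OOSDK's real loader).
RULES = [
    {
        "name": "inventory_replenish_on_shortage",
        "priority": 385,
        "if": "entity == 'inventory_shortage_detected'",
        "then": [
            Step("inventory_agent", "create_replenishment_po"),
            Step("email_agent", "send_replenishment_alert"),
            Step("crm_agent", "log_interaction"),
        ],
    },
]

# Only the pure-event form `entity == '<name>'` is understood by this sketch.
_GATE = re.compile(r"^\s*entity\s*==\s*'([^']+)'\s*$")


def gate_matches(condition: str, entity: str) -> bool:
    match = _GATE.match(condition)
    if match is None:
        raise ValueError(f"unsupported gate: {condition!r}")
    return match.group(1) == entity


def plan(entity: str, rules=RULES) -> list:
    """Return the dispatch plan of the first matching rule, or [] if none match."""
    # Assumption: higher priority numbers are tried first.
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        if gate_matches(rule["if"], entity):
            return list(rule["then"])
    return []


for step in plan("inventory_shortage_detected"):
    print(f"{step.agent}.{step.action}")
```

The gate needs nothing but the event name, so the routing layer never has to know how a shortage is computed. That knowledge stays inside the agent the plan delegates to.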

The Rules

Three rules cover the inventory layer. Two are deterministic; one is the exception handler.

The priority numbers don’t compete here — each rule keys off a different entity — but they make the layering legible. The two deterministic rules are the “normal path.” The third is the only place a model touches an inventory decision.

The Deterministic Part: VIP Soft Preemption

When a VIP delivery is short and Standard customers are holding reservations that haven’t been picked yet, the policy is to take them back. “Soft” matters: I only reclaim moves Odoo has in the assigned state — reserved but not physically picked. Anything mid-pick or done is untouchable.

# mcp_server/agents/inventory_agent.py — allocate_with_preemption (excerpt)
# Only 'assigned' (reserved, not yet picked) Standard moves are reclaimable.
candidates = await asyncio.to_thread(
    odoo_service.list_open_moves_for_product, product_id, ["assigned"])
candidates = [c for c in candidates if not _same_self(c)]   # never reclaim my own SO

covered = 0.0
for cand in candidates:
    if covered >= shortage:
        break
    if await asyncio.to_thread(odoo_service.unreserve_move, cand["id"]):
        covered += float(cand.get("quantity") or 0)

# then reserve the freed stock onto the VIP picking
await asyncio.to_thread(odoo_service.reserve_move, vip_move_id)

There is no model here, and there shouldn’t be. “VIP wins, but never yank stock off a forklift mid-pick” is a sentence, and a sentence belongs in policy.
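
The selection logic in that excerpt can be exercised without Odoo. The sketch below is a stand-alone approximation, not the production code: `select_reclaimable`, the dict shape of a move, the `tier` and `order` keys, and the order names are all hypothetical. It keeps the two properties the policy sentence describes: only `assigned` moves are eligible, and the loop stops as soon as the shortage is covered.

```python
from typing import Iterable


def select_reclaimable(moves: Iterable[dict], shortage: float, own_order: str) -> list:
    """Pick Standard moves to unreserve until the shortage is covered."""
    picked = []
    covered = 0.0
    for move in moves:
        if covered >= shortage:
            break                              # enough stock freed, stop reclaiming
        if move["state"] != "assigned":
            continue                           # mid-pick or done: untouchable
        if move["tier"] != "standard":
            continue                           # only Standard reservations are reclaimed
        if move["order"] == own_order:
            continue                           # never reclaim from the VIP's own order
        picked.append(move)
        covered += float(move.get("quantity") or 0)
    return picked


# Hypothetical move queue for one product.
moves = [
    {"id": 1, "order": "SO-A", "tier": "standard", "state": "assigned", "quantity": 10},
    {"id": 2, "order": "SO-B", "tier": "standard", "state": "done", "quantity": 5},
    {"id": 3, "order": "SO-E", "tier": "standard", "state": "assigned", "quantity": 8},
    {"id": 4, "order": "SO-F", "tier": "standard", "state": "assigned", "quantity": 20},
]

reclaimed = select_reclaimable(moves, shortage=15, own_order="SO-C")
print([m["id"] for m in reclaimed])  # [1, 3]
```

Like the excerpt, this reclaims whole moves, so it can free slightly more than the shortage (18 units for a 15-unit gap here). Whether to split a move instead is a policy choice the sketch does not make.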

The four open orders in Odoo, with two custom fields I added so tier and unit count read at a glance — Customer A/B (Standard, 10 and 5), Customer C/D (VIP, 200 and 300). The “Tier” column is computed from the customer’s category tag; the quantity column sums the USB line. Both are non-stored computed fields, so the list reflects reality without a sync job.

After a 200-unit receipt and one Claude Desktop request, the VIP order (Customer C) is fully assigned 200 of 200; the Standard orders and the second VIP order sit in the waiting queue. VIP was placed later and still won — the tier rule decided, with no model in the loop.
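
The numbers in those two screenshots can be checked with a few lines of arithmetic. This is a worked example under stated assumptions, not the engine’s code: the `allocate` function is hypothetical, and the tie-break inside a tier (listing order, so Customer C before Customer D) is my assumption, chosen because it reproduces the result shown.

```python
# (customer, tier, ordered quantity) from the order list above.
ORDERS = [
    ("Customer A", "standard", 10),
    ("Customer B", "standard", 5),
    ("Customer C", "vip", 200),
    ("Customer D", "vip", 300),
]


def allocate(orders, stock):
    """Hand out stock VIP-first and report what stays unmet."""
    # sorted() is stable, so orders in the same tier keep their listing order.
    ranked = sorted(orders, key=lambda o: 0 if o[1] == "vip" else 1)
    assigned = {}
    for name, _tier, qty in ranked:
        take = min(qty, stock)
        assigned[name] = take
        stock -= take
    total_demand = sum(qty for _name, _tier, qty in orders)
    unmet = total_demand - sum(assigned.values())
    return assigned, unmet


assigned, unmet = allocate(ORDERS, stock=200)
print(assigned["Customer C"], unmet)  # 200 315
```

The 315 unmet units (10 + 5 + 300) are the gap the Inventory Agent reports later in the article, once there is nothing left to preempt.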

The Judgment Part: Sizing a Reorder, Writing a Briefing

When preemption and re-allocation both run dry, create_replenishment_po runs. Because the agent now owns detection, it gathers the shortage itself if the caller didn't supply it:

# mcp_server/agents/inventory_agent.py — create_replenishment_po (excerpt)
shortage = context.get("shortage")
product_id = (shortage or {}).get("product_id") or context.get("product_id")
if not shortage and product_id:                       # the agent owns detection
    shortage = await asyncio.to_thread(
        odoo_service.get_open_demand_for_product, product_id)

unmet = float((shortage or {}).get("unmet_qty") or 0)   # shortage is None if nothing was supplied or found
if unmet <= 0:
    return {"action": "create_replenishment_po", "skipped": True,
            "reason": "no unmet demand"}

advice = await self._replenishment_qty_advisor(policy, shortage)   # ← judgment
po = await asyncio.to_thread(
    odoo_service.create_incoming_picking,
    product_id, advice["recommended_qty"], vendor_name, origin, confirm=True)

The judgment is one constrained LLM call. I feed it the facts — unmet demand, how many blocked orders are VIP, lead time, what’s already inbound — and ask for a quantity and an urgency. It runs at temperature 0, and every failure path falls back to a deterministic baseline (ceil(unmet + safety_buffer)), so a model outage downgrades the feature to a dumb-but-correct reorder rather than breaking the line:

# _replenishment_qty_advisor (excerpt) — judgment with a fail-safe floor
facts = {
    "product_name": shortage.get("product_name"),
    "total_unmet_demand": unmet,
    "vip_blocked_count": len(vip_blocked),
    "incoming_already_on_the_way": shortage.get("incoming"),
    "lead_time_days": policy.get("lead_time_days", 3),
}
try:
    raw = await asyncio.to_thread(
        generate_text_with_system, system_prompt=SYSTEM, user_prompt=json.dumps(facts),
        temperature=0.0, max_tokens=300)
    advice = self._parse_advisor_json(raw)          # {recommended_qty, urgency, rationale}
    if advice["recommended_qty"] < unmet:           # never under-order the known gap
        advice["recommended_qty"] = math.ceil(unmet)
    advice["source"] = "llm"
except Exception:
    advice = {"recommended_qty": math.ceil(unmet + safety_buffer),
              "urgency": "HIGH" if vip_blocked else "MEDIUM", "source": "fallback_rule"}

The second judgment call is the briefing. The email agent doesn’t send a templated “stock low” alert — it gets the blocked orders, the impact, and the recommended quantity, and writes the manager a short decision-oriented note. If the model is unavailable, it falls back to a deterministic template. Either way a human gets something they can act on, not a row in a report nobody reads.
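
Both judgment calls share one shape: ask the model, validate what comes back, and fall back to something deterministic when validation fails. The advisor excerpt calls `_parse_advisor_json` without showing it, so the following is only a guess at what such a validator might check. The function name `parse_advisor_json`, the `ALLOWED_URGENCY` set (the excerpt only shows HIGH and MEDIUM), and the decision to round the quantity up are assumptions.

```python
import json
import math

ALLOWED_URGENCY = {"LOW", "MEDIUM", "HIGH"}  # assumed label set


def parse_advisor_json(raw: str) -> dict:
    """Validate the model's reply; raise ValueError so the caller can fall back."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("advisor reply must be a JSON object")

    qty = data.get("recommended_qty")
    if isinstance(qty, bool) or not isinstance(qty, (int, float)):
        raise ValueError("recommended_qty must be a number")
    if not math.isfinite(qty) or qty <= 0:
        raise ValueError("recommended_qty must be a positive, finite number")

    urgency = str(data.get("urgency", "")).upper()
    if urgency not in ALLOWED_URGENCY:
        raise ValueError(f"unexpected urgency: {urgency!r}")

    return {
        "recommended_qty": math.ceil(qty),          # whole units only
        "urgency": urgency,
        "rationale": str(data.get("rationale", "")),
    }


reply = '{"recommended_qty": 340.2, "urgency": "high", "rationale": "VIP blocked"}'
print(parse_advisor_json(reply))
```

Raising on anything unexpected keeps the policy in one place: the `except` branch in the excerpt already knows how to produce a safe quantity, so the parser never has to guess.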

The same stock replenishment, seen from both sides: in Claude Desktop, and written straight into Odoo (ERP).

The same briefing, seen from both sides: in Claude Desktop, and as the manager’s email landing in Gmail.

One request — “the orders that are still short, please reorder and email the manager.” The Inventory Agent detects 315 unmet units, the advisor recommends a reorder quantity with urgency and a one-line rationale, a receipt is created in Odoo, and the email agent’s briefing goes out to the operations manager. The two model calls are the qty and the prose; everything between them is deterministic.

The Dashboard Is the Audit Trail

Every one of these decisions lands in the same place I built for Cases 1 and 2 — a dashboard that reads the engine’s decision records, not a separate logging system.

The audit surface. Each row is one decision with its matched rule and full dispatch plan. The inventory events read as “Stock Received → VIP-first re-allocate” and “Unfulfillable → Autonomous Replenish.” I had to register the Inventory Agent as a first-class agent here — it’s the newest card on the status panel, and the trigger calls now attribute to it instead of showing up as “system.”

Why This Matters

The standard fear about LLMs in business processes is non-determinism on the critical path — a prompt drift quietly changing an outcome that should be a contract. Policy-driven dispatch answers that fear by drawing a hard line: the YAML decides which step runs, always; the model only runs inside a step, and only where there’s genuinely a judgment to make.

Business Case 3 is the cleanest illustration so far, because the line is visible. VIP preemption and stock re-allocation are rules — they make zero model calls and they’re identical on every run. Reordering against an unfillable gap is a judgment — it’s the one place a model adds something a lookup table can’t, and even there it’s boxed in by a deterministic floor and a fallback. The agent’s autonomy is real, but it’s bounded by exactly the cases where rules ran out.

That’s the version of “agentic” I trust in production: not an agent improvising the whole workflow, but an agent that handles the exception the workflow couldn’t pre-decide — and reports back to a human in the same breath.

The Token Bill — Why Splitting the Work Beats Loading Every Tool

There’s a fair objection to anything built on MCP: doesn’t wiring tools to a model just burn tokens? It isn’t hypothetical. Tool definitions are sent to the model on every request, used or not — so each connected server adds standing overhead to the context window, and a handful of busy servers can take up a meaningful share of it before the first user message is even processed. There’s a second, less obvious cost: as the tool menu grows, the model’s ability to pick the right tool tends to degrade. More tools, worse selection.

The mainstream fixes attack the symptom — too many schemas in the prompt. Prompt caching makes the stable tool block much cheaper to re-read on each call, and Anthropic’s Tool Search defers tool definitions and loads them on demand instead of holding them all upfront. Both are real and worth using.

Policy-driven dispatch attacks it from a different angle, and Case 3 is a clean example. The model on the client side never sees a sprawling tool catalog — it sees a handful of high-level triggers (“receive and allocate”, “check replenishment”). The decision of what to do — preempt, re-allocate, reorder, notify — doesn’t come from a model reasoning over a long menu; it comes from a deterministic engine reading YAML, server-side, with zero model calls. The two deterministic rules in this case make none. The model is invoked only at the two genuine judgments — how much to reorder, how to phrase the briefing — as small, bounded calls.

So token spend tracks the number of real decisions, not the number of operations or the number of tools you happen to have connected. A pipeline that runs many allocations and meets a few unfulfillable shortages pays for those few judgment calls — not a tool-selection round-trip on every step, and not a standing schema tax on every turn. The selection-accuracy problem doesn’t show up either, because the model was never handed a long menu to choose from.
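
That claim is a statement about which quantity the bill scales with, and a back-of-the-envelope model shows the difference. The two functions below are a deliberately crude sketch of my own: they ignore prompt caching and output tokens, and every number in the usage lines is a made-up placeholder, not a measurement from this system.

```python
def tokens_tool_menu(operations, schema_tokens, selection_tokens):
    """Model picks a tool on every step: cost scales with operations."""
    return operations * (schema_tokens + selection_tokens)


def tokens_policy_dispatch(requests, trigger_schema_tokens, judgments, tokens_per_judgment):
    """YAML picks the step: the client pays for a small trigger menu per request,
    and the server pays only for the judgment calls."""
    return requests * trigger_schema_tokens + judgments * tokens_per_judgment


# Placeholder inputs only; substitute your own measurements.
menu = tokens_tool_menu(operations=100, schema_tokens=5000, selection_tokens=300)
policy = tokens_policy_dispatch(requests=100, trigger_schema_tokens=500,
                                judgments=3, tokens_per_judgment=800)
print(menu, policy)
```

The absolute figures mean nothing. The useful part is the shape: the first total grows with every operation, while the second grows with the number of judgments plus a small per-request term for the triggers that are still exposed.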

I want to be precise, because the naive version of this claim is wrong: this does not make MCP “free.” Whatever tools you do expose still cost context (caching is what makes that bearable), and the judgment calls are still real LLM calls. The point isn’t zero tokens — it’s that the expensive habit, a model re-deciding a settled question over a large tool catalog on every step, is exactly what the ontology removes. Determinism for the part that’s policy; tokens spent only on the part that’s judgment.

Where This Goes Next

The order is signed and the stock is allocated. What’s left is getting paid.

[Roadmap: EP.1 Customer Relationship Pipeline (released), EP.2 Tier-Driven Sales (released), EP.3 Inventory Allocation (you are here), EP.4+ Billing & Collections (coming next, fanning out into Invoicing · Collections · AR & Tax)]

The Order-to-Cash lifecycle, one episode further. Acquisition and sales shipped; inventory allocation is this case; billing and collections are next, where the YAML learns net-30 vs net-14 by tier, and the agent handles the exception there too — the invoice no rule can reconcile.

Business Case 4 moves to billing — invoice generation on delivery, payment terms by tier, and an AR layer where, once again, the boring majority is deterministic and the interesting minority is where an agent gets to think. Same engine, same YAML, one more chapter of the company’s operating logic moving out of stored procedures and into a file the CFO can read.

That’s the whole project, really: the company’s operating rules become one file, and the only place a model is allowed to improvise is the exact spot where the file admits it doesn’t have the answer.

The full implementation is demonstrated on SunnyLab TV. Code referenced here is from a production-style MCP system; identifiers, internal emails, and host addresses are masked. The ontology engine and YAML are open-sourceable as part of OOSDK.

On the token-cost point: MCP tool-definition overhead, the tool-selection degradation with large tool sets, and the mitigations (prompt caching, Anthropic’s Tool Search) are well documented — see Anthropic’s “Advanced tool use” and Claude prompt-caching docs, and community write-ups on MCP context bloat.

Tags: Multi-Agent Systems, MCP, Inventory Management, YAML Policy, LLM Orchestration, Odoo, AI Architecture, Claude

This story was originally published in Level Up Coding on Medium.
