
I Tried to Use My Own Product for Everything. I Had to Redesign Around Using It Where It Makes Sense.
A postmortem on three architectural wrong turns building a market intelligence agent, and the principle I should have started with.
I started Pulsar wanting to use the AI runtime I work on for as much as possible. I wanted to point at the project and say: look at what this thing can do. Every line of logic that could conceivably live in a pipeline, I tried to put in a pipeline. Scraping, reasoning, fan-out, orchestration. If an agent could touch it, I let an agent touch it.
That was the mistake. This post is about how I unwound it.
What Pulsar is
Pulsar is a market intelligence agent I built as a weekend dogfood project. It scrapes eight developer-facing sources twice a day, runs the content through pipelines for analysis, and produces a trend report plus content drafts for seven platforms. The repo is here.
The dogfooding goal was to answer four questions for myself. What’s it like building an agentic system using an AI runtime as the AI layer? Is it actually cheaper, faster, and easier to maintain than wiring it up directly? What kind of content can I get out of it? Can I automate enough of my own job to focus more on the parts I care about?
Three of those questions had clean answers. The one about cost and architecture took longer, because I had to walk through a few wrong turns first.
The first wrong turn: one agent, seven drafts, one LLM call
The original content pipeline was simple. A single agent collected context from the day’s articles, reasoned about it, and produced drafts for all seven platforms in one structured output. One reasoning pass, one big JSON envelope at the end, done.
It worked once. Then it stopped working.
The combined output across seven platform drafts exceeded the model's output token limit. The model truncated mid-response. The runner couldn't parse the partial JSON. Retries hit the same wall because the prompt hadn't changed and the model wasn't going to magically produce shorter drafts. I had built a pipeline that was structurally guaranteed to fail as soon as the drafts got long enough to be useful.
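For a concrete picture, here's roughly the shape of that envelope. The names are my reconstruction for illustration, not the actual Pulsar schema; the point is that the whole thing had to come back in a single model response, so its size grew linearly with the number of platforms.

```typescript
// Illustrative reconstruction of the single-response envelope (not the actual Pulsar schema).
interface PlatformDraft {
  platform: string; // "linkedin", "x", "hashnode", and four more
  title: string;
  body: string;     // long-form drafts are where the output tokens go
}

interface ContentEnvelope {
  trendSummary: string;
  drafts: PlatformDraft[]; // all seven drafts had to fit in one structured output
}
```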
Lesson one, easy to learn the hard way: if your output shape grows with the number of items you’re producing, you have a scaling problem hiding inside what looks like a working system.
The second wrong turn: fan-out across seven specialized drafters
The obvious fix was to break the single agent into smaller pieces. So I redesigned around fan-out.
The new architecture had three parts. One context gatherer at the top, with tools to pull articles and metadata. Seven specialized drafters running in parallel, each one prompted for its specific platform (LinkedIn voice, X thread structure, Hashnode long-form, and so on), with no tools, just the LLM and memory. One collector at the end that assembled the seven drafts into the envelope the runner expected.
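Sketched as code, the topology looked something like this. The helper names and the agent interface are invented stand-ins for illustration, not the runtime's actual API.

```typescript
// Illustrative fan-out topology; helpers and types are hypothetical stand-ins.
type Article = { url: string; title: string; content: string };
type Draft = { platform: string; body: string };

declare function gatherContext(articles: Article[]): Promise<string>;                 // agent with tools
declare function runDrafterAgent(platform: string, context: string): Promise<Draft>;  // no tools, just LLM and memory
declare function collectEnvelope(drafts: Draft[]): Promise<unknown>;                  // assembles the runner's envelope

// Only three of the seven platforms are named above; the rest are placeholders.
const PLATFORMS = ["linkedin", "x", "hashnode", "p4", "p5", "p6", "p7"];

async function runFanOut(articles: Article[]) {
  const context = await gatherContext(articles);          // one context gatherer at the top
  const drafts = await Promise.all(
    PLATFORMS.map((p) => runDrafterAgent(p, context))     // seven drafters in parallel
  );
  return collectEnvelope(drafts);                         // one collector at the end
}
```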
This solved the token problem completely. Each drafter was producing output sized to one platform, well within limits. The collector’s job was small enough to be reliable. Parallel execution meant the wall-clock time was not much worse than the single-agent version had been on a good day.
It was elegant. It was the textbook agentic pattern. I was pretty pleased with myself.
Then I looked at the costs.
What the fan-out actually cost

Here’s what I had built without quite seeing it. The context gatherer pulled in the day’s articles, ran tool calls, and assembled a substantial context payload. That payload then got passed into seven separate agent contexts. Each drafter loaded the full context, ran its own reasoning loop over that context, and produced its draft.
I was paying for the same input tokens seven times. I was paying for seven separate reasoning passes that were doing roughly the same work, just steering toward slightly different outputs at the end. The context never shrank inside any of the drafters. They each had to think their way through the entire payload from scratch before they could write anything platform-specific.
Multiply the context and reasoning cost by n, where n is the fan-out factor. The total wasn't seven times the original, because the drafters were smaller than the original monolithic agent, but it was several times higher than it needed to be, and the gap was structural, not something I could prompt-engineer my way out of.
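To make the shape of that gap concrete, here's a back-of-envelope model. The token counts are invented, not measured from Pulsar; only the arithmetic matters.

```typescript
// Back-of-envelope cost shape, with invented numbers.
const contextTokens = 20_000;  // shared context payload (illustrative)
const reasoningTokens = 4_000; // per-agent reasoning overhead (illustrative)
const draftTokens = 1_000;     // output per platform draft (illustrative)
const n = 7;                   // fan-out factor

// Fan-out: every drafter re-reads the full context and re-reasons over it.
const fanOutCost = n * (contextTokens + reasoningTokens) + n * draftTokens;     // 175,000 tokens

// What a single shared reasoning pass over the same context would cost.
const singlePassCost = (contextTokens + reasoningTokens) + n * draftTokens;     // 31,000 tokens
```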
The elegant pattern was burning money on duplicated reasoning over duplicated context.
The third design: single-agent orchestration with sequential drafts
The fix was to step back toward a single agent, but not all the way back. The single-agent version had failed because it tried to produce all seven drafts as one structured output. The fan-out version failed because it duplicated context and reasoning across seven parallel agents.
The version I landed on splits the difference. One agent loads context once, reasons about it once, and then produces the seven drafts as a sequence inside the same reasoning loop. Each draft is its own output step in the agent’s plan, written one at a time. The context is loaded and reasoned over exactly once. The drafts come out one after the other, each sized to its own platform, never colliding with the output token ceiling.
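A minimal approximation of that shape, assuming a generic agent interface. The invented helpers here just make the structure visible; the real pipeline keeps all of this inside one agent run.

```typescript
// Illustrative sketch of the sequential design; the agent interface is a generic stand-in.
type Draft = { platform: string; body: string };

declare function loadSharedContext(): Promise<string>;                                   // articles + metadata, loaded once
declare function nextOutputStep(instruction: string, context: string): Promise<string>;  // one output step of the agent's plan

async function draftSequentially(platforms: string[]): Promise<Draft[]> {
  const context = await loadSharedContext(); // reasoned over exactly once

  const drafts: Draft[] = [];
  for (const platform of platforms) {
    // Each draft is its own output step: one platform-sized response,
    // never the whole envelope in a single completion.
    const body = await nextOutputStep(`Write the ${platform} draft.`, context);
    drafts.push({ platform, body });
  }
  return drafts;
}
```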
This is the pattern I wish I’d started with. One agent, one reasoning pass over the shared context, sequential outputs. It costs roughly what the original single-agent design cost on a successful run, but it doesn’t fail at scale because no single output has to carry all seven drafts.
The lesson buried in the arc is not “fan-out is bad.” Fan-out is the right answer when each branch genuinely needs independent reasoning over distinct inputs. It’s the wrong answer when the branches are doing the same work over the same context with slightly different output instructions. In that case, you don’t need parallelism. You need one agent that produces multiple outputs in sequence.
I had reached for a parallelism pattern because the work looked parallel. It wasn’t. It was a single reasoning task with seven different shaped outputs, and the right design was one agent that knew how to produce all seven.
The fourth thing I got wrong: scraping as a pipeline

While I was unwinding the content fan-out, I went back and looked at the rest of the system with fresh eyes. The other thing that jumped out was scraping.
In the original design, scraping was its own pipeline, orchestrated by an agent with Firecrawl, HTTP, and GitHub tools. The agent would receive a list of sources, decide how to fetch each one, handle retries, parse responses, and write the results. It worked. It was also slow, expensive, and slightly unpredictable from run to run because the agent’s tool-use plan wasn’t identical every time.
None of that work needed to be agentic. Scraping a known list of sources is a deterministic problem. I knew which sources to hit, I knew their shapes, I knew how to handle their failure modes. There was no fuzzy reasoning required. I had given an LLM a job that was screaming to be a TypeScript module.
I pulled scraping out of the pipelines entirely and moved it into the application layer. The pipelines now receive pre-fetched, pre-cleaned article content as input. The runtime stopped paying for an agent to do work that a function could do better, faster, and more predictably.
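For a sense of what "moved into the application layer" means in practice, here's a minimal sketch. The source list, parsing, and error handling are invented for illustration; the point is that all of it is ordinary, testable TypeScript.

```typescript
// Deterministic scraping in the application layer: no agent, no LLM, no tool-use plan.
// Sources, parsing, and error handling are illustrative, not Pulsar's actual code.
type Source = { name: string; url: string };
type Article = { source: string; title: string; content: string };

const SOURCES: Source[] = [
  { name: "example-source", url: "https://example.com/feed" },
  // ...the real system hits eight known sources twice a day
];

async function fetchSource(source: Source): Promise<Article[]> {
  const res = await fetch(source.url);
  if (!res.ok) throw new Error(`${source.name}: HTTP ${res.status}`);
  const raw = await res.text();
  // Per-source parsing is deterministic; this stands in for the real parsers.
  return [{ source: source.name, title: source.name, content: raw }];
}

export async function scrapeAll(): Promise<Article[]> {
  // Known sources, known shapes, known failure modes: handled as plain code.
  const results = await Promise.allSettled(SOURCES.map(fetchSource));
  return results.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}
```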
The trend report pipeline went from roughly twenty minutes per run to about eight. The drop came almost entirely from removing LLM calls and tool invocations that weren’t earning their keep.
The principle that landed

By the time I was done unwinding the wrong turns, the system looked smaller. Fewer agents, less surface area, more code in the application layer doing deterministic work, and a tighter set of pipelines doing the things that actually require non-deterministic reasoning.
Here’s the principle, stated plainly: prepare deterministically, and hand over the fence to the AI layer only the work that actually needs non-determinism.
The AI layer earns its keep on the irreducibly fuzzy parts: synthesizing across documents, generating prose that has to feel different on different platforms, deciding which trend matters when the signal is ambiguous. It does not earn its keep on glue work. Scraping a known URL is not fuzzy. Calling an API is not fuzzy. Parsing a JSON response is not fuzzy. Putting an LLM in those paths adds cost and unpredictability without adding capability.
The corollary, which is the part I missed for too long: every line of code that lives inside a pipeline is also a liability. It has to be maintained over time, it has to keep working as models change, it has to be debugged when an agent does something weird. Code in your application layer can be unit tested, type checked, and reasoned about with normal tools. Code inside a pipeline is harder to inspect, harder to test, and more expensive to run. The default should be application layer. The AI layer is for the parts where the application layer can’t do the job.
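Put together, the fence looks something like this. The pipeline call is a generic stand-in rather than the runtime's actual API; what matters is which side of the line each piece sits on.

```typescript
// The fence, illustratively: deterministic preparation in the application layer,
// one hand-off to the AI layer for the irreducibly fuzzy work.
type Article = { source: string; title: string; content: string };

declare function scrapeAll(): Promise<Article[]>;                      // deterministic: plain code, unit-testable
declare function dedupeAndClean(a: Article[]): Article[];              // deterministic: plain code, unit-testable
declare function runTrendPipeline(input: Article[]): Promise<string>;  // non-deterministic: the AI layer (stand-in)

async function dailyRun(): Promise<string> {
  const articles = dedupeAndClean(await scrapeAll()); // everything that can be prepared, prepared
  return runTrendPipeline(articles);                  // only the synthesis crosses the fence
}
```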
What I’d do differently if I started Pulsar over today
I came at this project as a stakeholder trying to push a product. I wanted to maximize the surface area of the AI runtime in the architecture, because then I could point at the project and say “look at what this can do.” That framing led me into every wrong turn in this post. I overspent on the original content pipeline. I overspent on fan-out. I overspent on scraping. Each one of those mistakes was downstream of the same root error.
The framing I should have started with: this is an engineering problem, the AI runtime is one of the tools available to solve it, and the question is where in the architecture the runtime actually belongs.
Stated that way, the answers fall out naturally. The runtime belongs in the parts where the work is irreducibly fuzzy and the value of getting it right is high. It does not belong in the deterministic plumbing. If I’d started there, I would have built half as much pipeline, paid a fraction of the LLM cost during development, and shipped the project faster.
This is the part I wish someone had told me before I started, so I’m telling you now. Dogfooding does not mean stuffing everything into the layer you’re trying to dogfood. It means using that layer where it earns its place and being honest about where it doesn’t. The first version of that honesty was painful for me because I had to rip out work I’d done. The second version was useful because I had a system that was cheaper, faster, and more maintainable than the one I’d started with.
The AI runtime I’ve been talking about throughout this post is RocketRide. It’s open-source, MIT licensed, donated by Aparavi to the Linux Foundation and the Agentic AI Foundation, and the cloud platform launched this month. I’m a founding team member and one of the engineers on the runtime.
I held the name back because the lesson in this post applies to anyone integrating an AI execution layer into a broader stack, regardless of which runtime you’re using. The trap of trying to put everything into the AI layer is not specific to RocketRide. It’s specific to the moment we’re all in, where the AI layer is new and exciting and we want to use it as much as we can. The discipline of using it only where it earns its place is what separates systems that scale from systems that quietly burn money on LLM calls that should have been functions.
If you want to see the architecture I landed on, Pulsar is open source and the pipeline definitions are in the repo. If you want to look at the runtime itself, RocketRide is too.
I’m writing more about Pulsar in the coming weeks, including a post on the graph database architecture I’m using for trend analysis and a deeper look at how this project ended up reshaping our company’s content, onboarding, finances and roadmap. Follow along if you want the rest of the story.