Writing Documentation for Two Audiences: Developers and the LLMs They Use
--
Most developers don’t read documentation the way they used to.
They open a code assistant (Cursor, Copilot, Claude), describe what they’re trying to do, and get a synthesised answer. That answer came from somewhere. It came from your docs, your README, your Stack Overflow answers, or whoever wrote about your API first. Whether you intended it or not, your documentation is being consumed by LLMs and re-served to developers as generated explanations.
The question is whether your docs are structured to make that answer correct.
I had been thinking about this while building two technical documentation sites: Oracle Trust Models: A Bitcoin Perspective, a conceptual reference on trust and oracle design across Bitcoin and EVM systems, and a Chainlink Price Feed Consumer integration guide, a production reference for safely consuming Chainlink Data Feeds in Solidity. Both involved deliberate structural choices I wouldn’t have made if I were thinking only about a human reader skimming a sidebar. Throughout this article I’ll draw on specific examples from both sites to make the principles concrete. Here’s what I learned.
The core problem: docs are written for search, not retrieval
Traditional technical docs are optimised for search: you type a keyword, Google surfaces a page, you read it and find your answer. LLM retrieval works differently.
When a developer asks their code assistant “why does my Chainlink price feed return stale data,” the LLM doesn’t search. It retrieves the most semantically relevant chunks from its training data or context window and synthesises an answer from them. This process is called Retrieval-Augmented Generation (RAG). In RAG pipelines, documentation is split into chunks (typically 300 to 500 tokens each), stored as vectors in a database, and retrieved by semantic similarity at query time. What this means practically: if four concepts are interleaved on one page, a retrieval query for one of them may surface a chunk containing all four, and the LLM has to figure out which parts are relevant. Sometimes it does. Sometimes it conflates adjacent concepts because they appeared in the same chunk.
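To make the chunking mechanics concrete, here is a deliberately naive Python sketch that splits a markdown page into one chunk per `##` section. Real pipelines use token-window splitters with overlap rather than heading boundaries, and the function name and sample content are mine, but the effect is the same: each concept travels with its own heading as context.

```python
import re

def split_by_heading(markdown: str) -> list[str]:
    """Naive chunker: one chunk per '## ' section, heading kept with its body."""
    sections = re.split(r"(?m)^(?=## )", markdown)
    return [s.strip() for s in sections if s.strip()]

# A two-concept page (hypothetical content) becomes two retrievable chunks:
doc = """## Staleness
Why your feed returns outdated prices.

## Deviation guard
Why an outlier price can slip past validation.
"""
chunks = split_by_heading(doc)
```

If the two concepts lived under one heading, a query about deviation guards would retrieve the staleness prose along with it.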
This changes what “good documentation” means in two concrete ways.
Page atomicity matters. A page that covers staleness, price sign validation, round completeness, and deviation guards in one wall of text is fine for a human reading top to bottom. For a RAG pipeline, it’s four different concepts competing in one chunk. The fix is to give each concept its own section with a heading that matches how developers actually ask about it.
“Staleness” is a noun phrase. “Why does my feed return outdated prices” is how a developer phrases the problem. Headings that are questions or failure descriptions retrieve better than headings that are feature names.
Cause-and-consequence explanations anchor technical facts. An LLM can memorise that `updatedAt == 0` is a guard condition. What it needs to explain the concept correctly is the why: what happens without the guard, and what class of problem it prevents.
Here’s an example from the Chainlink integration guide. The naive way to document this check is:
> *`updatedAt == 0` catches uninitialized rounds.*

That’s accurate but thin. An LLM trained on that sentence can tell you what the check does, but not why it exists. A stronger version:
> *An aggregator that has never successfully completed a round returns zero
> for `updatedAt`. Without this guard, the comparison
> `block.timestamp - updatedAt` evaluates to the full current timestamp, so
> the feed is reported as merely stale when the real problem is a feed that
> has never produced a value at all, such as a misconfigured aggregator
> address.*

Now the LLM has the mechanism, the failure mode, and the consequence. When a developer asks “why do I need to check for `updatedAt == 0`,” the answer it generates is grounded in something real, not just a rephrasing of the check.
Structure for discoverability: one concept, one page
The temptation when writing a conceptual reference is to cover everything in one place: one topic, comprehensive coverage, all the nuance in one page. That’s how most technical references are structured. It’s also the worst structure for RAG retrieval.
When a developer is troubleshooting a TWAP oracle manipulation issue, they don’t need three thousand words covering all oracle failure modes. They need the section on data manipulation specifically. If that section is interleaved with sections on staleness, node operator failures, and Bitcoin attestation mechanics, retrieval surfaces the wrong context alongside the right one.
The alternative is to split by concern. Rather than one “Failure Modes” page, each failure category gets its own section with a heading that names the context explicitly: “Failures in EVM Oracle Systems,” not just “Failure Modes.” A developer or an LLM querying for EVM-specific failure patterns retrieves the right section rather than the whole page.
The same principle applies at the site level. Separating staleness, the deviation guard, and the L2 sequencer uptime check into distinct pages means each page can be retrieved independently. Each page starts with the exploit or failure scenario that motivated the feature, before explaining the implementation. That ordering (problem first, implementation second) is not how API reference docs are typically structured, but integration guides are read differently. A developer integrating a Chainlink feed for the first time doesn’t know what a heartbeat interval is or why the sequencer grace period exists. They need context before code.
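As an illustration of the kind of check such a page documents, here is a minimal Python sketch of the L2 sequencer uptime logic. Chainlink’s sequencer uptime feeds report an answer of 0 when the sequencer is up and 1 when it is down, with `startedAt` marking when the current status began; the function name and grace-period value here are my own assumptions.

```python
GRACE_PERIOD = 3600  # assumed grace period, in seconds

def sequencer_usable(answer: int, started_at: int, now: int) -> bool:
    """Sketch of the L2 sequencer uptime check (0 = up, 1 = down)."""
    if answer != 0:
        return False  # sequencer is down: any fresh-looking price is suspect
    if now - started_at <= GRACE_PERIOD:
        return False  # sequencer just came back; wait out the grace period
    return True
```

A page documenting this check can lead with the failure it prevents (prices that look fresh but were computed while the sequencer was down) before showing the code, exactly as described above.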
Structuring around failure scenarios also improves retrieval through a different mechanism. When an outlier DAI price triggered roughly $89M in liquidations on Compound in late 2020, that event became permanently associated in training data with the concept of “oracle deviation guard.” Leading the deviation guard section with that incident isn’t sensationalism. It’s a semantic anchor. A query for “how do I protect my lending protocol from oracle manipulation” will surface that section because it connects the real-world failure to the technical solution.
What “optimised for AI use” actually looks like in practice
Use failure-first headings at the section level. “Staleness and Latency” is a noun phrase. “Why your feed returns outdated prices during high volatility” is a failure description. The second retrieves better because it matches the shape of a developer’s actual query. A practical compromise: use the noun phrase in the nav sidebar (where humans scan) and the question form as the in-page heading (where RAG retrieves).
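In Docusaurus, for example, the `sidebar_label` frontmatter field supports exactly this split (the heading text here is illustrative):

```yaml
---
title: Why your feed returns outdated prices during high volatility
sidebar_label: Staleness and Latency
---
```

The sidebar shows the scannable noun phrase; the page title, which dominates retrieval, keeps the question form.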
Separate the “what” from the “why” explicitly. Most technical docs conflate these. The what is the API surface: the function signature, the parameters, the return value. The why is the reasoning: why the parameter exists, what goes wrong without it, what tradeoff it represents. LLMs retrieve the what accurately. They generate better explanations when the why is documented alongside it rather than left for the reader to infer.
Name error conditions precisely and consistently. There is a temptation to group related checks under one umbrella concept. Three staleness-related conditions:
- `updatedAt == 0` (uninitialized round)
- `answeredInRound < roundId` (incomplete round)
- `block.timestamp - updatedAt > stalenessThreshold` (stale heartbeat)
These are three different bugs with three different causes. Documenting them as one concept called “staleness” means an LLM cannot distinguish between them when a developer asks why their specific check is failing. Name them separately. Explain each independently.
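The distinction can be sketched in Python as a language-neutral mirror of the Solidity guards; the function name, threshold value, and error strings are illustrative assumptions. What matters is that each condition fails with its own name:

```python
STALENESS_THRESHOLD = 3600  # assumed heartbeat bound, in seconds

def validate_round(answer: int, round_id: int, answered_in_round: int,
                   updated_at: int, now: int) -> int:
    """Mirror of the three guards, each with its own named failure."""
    if updated_at == 0:
        raise ValueError("uninitialized round")  # feed never completed a round
    if answered_in_round < round_id:
        raise ValueError("incomplete round")     # answer carried from an earlier round
    if now - updated_at > STALENESS_THRESHOLD:
        raise ValueError("stale heartbeat")      # no update inside the heartbeat window
    return answer
```

Documentation that names all three distinctly gives an LLM three separate things to retrieve when a developer asks about their specific revert reason.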
Add structured frontmatter. Documentation crawlers and RAG indexers parse page metadata before content. A description: field in your frontmatter that accurately summarises the page’s purpose improves retrieval precision significantly. Docusaurus and Astro Starlight both expose description, title, and custom metadata fields in page frontmatter. Use them:
```yaml
---
title: Price Validation
description: >-
  How PriceFeedConsumer validates staleness, round completeness,
  and price sign. Covers updatedAt == 0, answeredInRound < roundId,
  and the heartbeat threshold check.
---
```

A description that names the specific conditions covered is retrievable in a way that “Price Validation” alone is not.
Keep code examples anchored to prose. A code block with no surrounding explanation is the least retrievable unit in a RAG chunk. The code needs the prose to have meaning. A code block preceded by the problem it solves and followed by what goes wrong without it is self-contained context that an LLM can use correctly even when it retrieves just that section in isolation.
The redundancy tradeoff
There is one genuine tension between optimising for LLM retrieval and writing for a human reading a multi-page guide linearly. RAG retrieval works best when each chunk is contextually self-contained. That means repeating context at the start of each section: restating which system is being discussed, what the parent concept is, what has already been established.
A human reading a ten-page guide finds this repetitive. They remember what they read on the previous page. An LLM retrieving a single chunk has no memory of the surrounding pages.
The practical resolution is to repeat context at the page level, not at the section level. Each page opens with one sentence that establishes where it sits in the larger topic. Sections within that page can assume the page-level context and avoid redundancy. This gives RAG retrieval a grounding sentence at the top of each retrieved chunk without making the guide feel like it restates everything on every page.
The test
Read the page heading and the first paragraph of any page in your docs. Without reading the rest, can you answer: what problem does this page solve, and who has that problem?
If yes, the page is probably structured correctly for both audiences. If it requires reading most of the page to answer, the structure is optimised for comprehensive coverage, not for retrieval.
The second test: ask an LLM with access to your docs the question a developer would most likely ask after landing on that page. If the answer is accurate and references specific content from your page rather than generic background knowledge, the page is working. If the answer drifts toward generalities, add more specific, concrete context: real error values, actual exploit amounts, named variables, specific configuration values. Specificity is what separates your docs from the background noise an LLM already knows.
Developers increasingly read documentation through an intermediary. Writing docs that work for LLMs is not separate from writing docs that work for humans. The same principles apply: be specific, lead with the problem, name things precisely, explain the why alongside the what.
The difference is that with an LLM in the loop, the cost of vague documentation is higher. A human reader can tolerate ambiguity and fill in gaps from context. An LLM retrieves the ambiguity and serves it back as a confident explanation.
Write for the developer reading your docs through a code assistant at 2am trying to figure out why their integration is broken. That’s both the human audience and the AI audience at once.