
Formal Verification of a Staking Oracle — Certora Prover Continued

By Chinhaotsou · Published May 12, 2026 · 19 min read · Source: Ethereum Tag
Tags: Ethereum · Regulation · Security




From trusted price sources to economic security, buckets, rewards, and slashing


From Whitelist Oracle to Staking Oracle

In the previous oracle verification post, I started with the simplest design in the Speedrun Ethereum Oracles challenge: a WhitelistOracle. That contract was mostly about structural correctness. Could the owner add and remove oracle contracts? Did the oracle list stay consistent after swap-and-pop? Did stale prices get filtered before calculating the median?

The next checkpoint is a much richer target: StakingOracle.

Here, correctness is no longer just “does the array contain the right oracle addresses?” The protocol now has economic state: staked ORA held in escrow, per-bucket price reports, reward counters, and slashable offenses.

That means the verification target moved from a mostly trust-based oracle to an incentive-based oracle. The core question became:

Can the staking, reporting, reward, exit, and slashing lifecycle preserve the accounting and state-machine promises the protocol relies on?

This post walks through the Certora suite I built for StakingOracle, what it proves, what it does not prove, and how mutation testing shaped the final result.

The Protocol: StakingOracle

At a high level, the oracle works like this:

struct OracleNode {
    uint256 stakedAmount;
    uint256 lastReportedBucket;
    uint256 reportCount;
    uint256 claimedReportCount;
    uint256 firstBucket;
    bool active;
}

struct BlockBucket {
    mapping(address => bool) slashedOffenses;
    address[] reporters;
    uint256[] prices;
    uint256 medianPrice;
}

Nodes register by staking ORA:

function registerNode(uint256 amount) public;

They report prices into the current block bucket:

function reportPrice(uint256 price) public;

Someone later finalizes a past bucket by recording the median:

function recordBucketMedian(uint256 bucketNumber) public;

If a node’s price is more than 10% away from that median, it can be slashed:

function slashNode(
    address nodeToSlash,
    uint256 bucketNumber,
    uint256 reportIndex,
    uint256 nodeAddressesIndex
) public;
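To make the deviation rule concrete, here is a small Python model of the 10% check, assuming the contract compares deviations in basis points with integer arithmetic (the helper name and the exact rounding are illustrative, not the contract's code):

```python
MAX_DEVIATION_BPS = 1000  # 10%, matching the production constant shown later

def is_outlier(price: int, median: int) -> bool:
    """A report is slashable only if it deviates MORE than 10% from the median."""
    deviation_bps = abs(price - median) * 10_000 // median
    return deviation_bps > MAX_DEVIATION_BPS
```

Under the “more than 10%” reading, a report of 110 against a median of 100 sits exactly on the boundary and is not slashable, while 111 is.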

Nodes can also add stake, claim rewards, and exit:

function addStake(uint256 amount) public;
function claimReward() public;
function exitNode(uint256 index) public;

On paper, this is still a teaching challenge. But the design already has the ingredients that make real DeFi verification interesting: dynamic arrays, mappings, token escrow, delayed finalization, replay prevention, slashable offenses, and view functions whose answers depend on historical state.

What Could Go Wrong?

Before writing CVL, I wrote the failure modes in plain English. This step matters because Certora will happily prove the wrong thing if I ask the wrong question precisely.

For this oracle, the main questions were about stake escrow, report timing and replay, reward math, exit payouts, and slash correctness.

Those questions became five focused spec families: Sanity, ValidState, StateMachine, Accounting, and Views.

I grouped these properties into their own StakingOracle suite so the assumptions, artifacts, mutation history, and modeling debt were easy to read as one campaign.

Why This Needed a Richer Model

The WhitelistOracle suite already needed structural reasoning over a dynamic array. StakingOracle adds more moving parts: ORA token escrow, parallel reporter/price arrays per bucket, reward and report counters, and delayed median finalization.

This is what makes the model richer than the WhitelistOracle model. The hard part is not contract size. The hard part is that correctness depends on relationships between multiple storage objects and multi-step lifecycle flows.

The suite uses a verification harness for read-only observability and a Certora-friendly ORA model so token flows can be checked directly instead of hidden behind broad summaries.

The important harness getters expose proof-only state such as bucket reporter and price entries, recorded bucket medians, and per-node lifecycle fields.

This does not change production behavior. It gives the prover enough visibility to talk about the protocol’s internal state.

Part 1: Sanity

I started with reachability.

The Sanity family does not try to prove the deepest correctness properties. It asks whether important flows can succeed under reasonable setup: registering a node, reporting a price, recording a bucket median, claiming a reward, and exiting.

This is a small but useful first gate. If these rules fail, the issue is usually setup, linking, token modeling, or an impossible precondition. There is no point debugging a beautiful accounting rule if the prover cannot even reach registerNode.

The final sanity run was green:

Sanity v2: verified

I did not run mutation testing for Sanity. That was a deliberate choice. Sanity is mostly a reachability family; mutation testing is more useful on semantic families like Accounting, StateMachine, ValidState, and Views.

Part 2: Valid State

The ValidState family checks the shape of the protocol’s storage.

The obvious invariant I wanted was:

Active nodes in nodeAddresses should agree with nodes[address].active.

But this is where formal verification becomes humbling. A property can be true in the reachable states we care about and still be awkward or non-inductive as a global invariant. The initial approach tried to force too much as standalone invariants. The final version uses a bounded preservation rule instead:

rule successfulCallsPreserveBoundedNodeAddressShape(
    env e,
    method f,
    calldataarg args,
    address trackedNode
)

That rule checks the node-list shape across successful calls within the configured loop bound. It covers the key structural expectations without pretending we have an unbounded proof over every possible array length.

The suite also keeps the bucket arrays aligned:

invariant bucketReportersAndPricesStayAligned(uint256 bucketNumber)

That invariant matters because bucket reports are represented as parallel arrays:

address[] reporters;
uint256[] prices;

If those arrays ever diverge in length, prices[i] no longer necessarily belongs to reporters[i]. That would be a serious slashing and outlier-detection problem.
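The alignment requirement itself is tiny; a Python sketch, using a dictionary to stand in for the BlockBucket struct (the representation is an assumption):

```python
def bucket_arrays_aligned(bucket: dict) -> bool:
    # reporters[i] must correspond to prices[i];
    # unequal lengths break that index pairing
    return len(bucket["reporters"]) == len(bucket["prices"])
```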

The final ValidState run was green:

ValidState v7: verified

Mutation testing found an important lesson here.

The first ValidState mutation run looked disappointing: 10 of 10 mutants survived. If I only looked at that number, I might conclude that the ValidState rules were weak.

But after triage, the result meant something more specific. The random mutants did not mostly mutate the things ValidState was designed to protect. They landed on behavior that belonged to other families, such as reward math, reporting transitions, or token-flow assumptions. A node-list structural rule is not supposed to kill a reward-accounting mutant. So the result was low-signal rather than a clean indictment of the ValidState family.

That changed how I used mutation testing. Instead of asking “did this family kill random mutants anywhere in the contract?”, I asked a sharper question:

If I mutate the storage-shape promises that ValidState is responsible for, does the ValidState family catch it?

For example, one focused structural mutant removed the registration append:

nodes[msg.sender] = OracleNode({
    stakedAmount: amount,
    lastReportedBucket: 0,
    reportCount: 0,
    claimedReportCount: 0,
    firstBucket: getCurrentBucketNumber(),
    active: true
});
// Mutant: nodeAddresses.push(msg.sender) is missing

That mutant leaves the mapping saying “this node is active,” but the public node registry does not include the node. Accounting might still look fine, and a reward rule might not notice it. But ValidState should notice because its job is to keep the mapping and registry consistent. In the final suite, the structural rules kill this kind of mutant.

The improvement was adding a small structural layer to the spec:

definition listedInBoundedPrefix(address node) returns bool = ...;
definition boundedListedEntriesActiveAndNonzero() returns bool = ...;
definition boundedListedEntriesUnique() returns bool = ...;
definition boundedActiveNodeRepresented(address node) returns bool = ...;
rule successfulCallsPreserveBoundedNodeAddressShape(...)

That rule says: before a state-changing call, assume the bounded node list is well shaped; after any successful state-changing call, the bounded node list must still be well shaped. This is a preservation rule, not a free pass. It does not say “pretend the contract is correct.” It says “from a good structural state, no successful function call may break the structure.”

The suite also has a direct registration rule:

rule registerCreatesActiveListedNode(env e, uint256 amount)

That rule checks the base transition: after registerNode(amount) succeeds, the new node must be active and must appear at the newly appended registry index. So if the production contract forgot nodeAddresses.push(msg.sender), registration would not pass this rule.

In plain English, the combined ValidState layer checks that every listed registry entry is active and nonzero, that no address appears twice in the bounded prefix, and that every active node is represented in the bounded registry.

So when a mutant removes nodeAddresses.push(msg.sender), the active node can no longer be found in the bounded registry and the rule fails. The same structural layer also catches removal bugs, such as forgetting to clear active or forgetting to pop the removed node from the array.
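The whole bounded structural promise can be modeled as one Python predicate over the registry array and the node mapping (a simplified model; the dictionary layout is mine, the active field follows the struct above):

```python
def node_list_well_shaped(node_addresses: list, nodes: dict) -> bool:
    """Model of the ValidState structural layer: listed entries are
    active, nonzero, and unique; every active node appears in the list."""
    seen = set()
    for addr in node_addresses:
        node = nodes.get(addr)
        if addr == 0 or node is None or not node["active"]:
            return False      # listed entry must be active and nonzero
        if addr in seen:
            return False      # no duplicates in the registry
        seen.add(addr)
    # every active node in the mapping must be represented in the list
    return all(addr in seen for addr, node in nodes.items() if node["active"])
```

The missing-push mutant produces a mapping entry marked active with no matching registry entry, so the predicate fails, which mirrors how the CVL rule fails.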

After creating focused structural mutants, the suite killed 4 of 4:

ValidState focused structural mutation: killed 4 / 4

Those focused mutants touched exactly the kind of behavior ValidState is supposed to protect: node-list append, node removal, active flags, and bucket reporter/price alignment.

Part 3: State Machine

The StateMachine family is about lifecycle rules.

The main properties are: a registered node reports at most once per bucket, reporting in a later bucket requires the prior reported bucket's median to be recorded, and a valid slash marks the offense and reduces stake.

Two rules capture the heart of bucket progression:

rule registeredNodeReportsAtMostOncePerBucket(...)
rule reportingInLaterBucketRequiresPriorMedian(...)

The second rule is especially important. Without it, a node could keep reporting into later buckets while earlier reported buckets were never finalized. That would make the reward, inactivity, and slashing model much harder to reason about because reports would pile up without the protocol committing to their medians.
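A Python model of the two gates together, with 0 as the “never reported” sentinel (that sentinel choice mirrors lastReportedBucket's default value but is still an assumption about the contract):

```python
def can_report(last_reported_bucket: int, current_bucket: int,
               median_recorded) -> bool:
    """Model of the two StateMachine gates:
    1) at most one report per bucket, and
    2) moving to a later bucket requires the previously reported
       bucket's median to be recorded first."""
    if last_reported_bucket == current_bucket:
        return False          # already reported in this bucket
    if last_reported_bucket != 0 and not median_recorded(last_reported_bucket):
        return False          # earlier reported bucket never finalized
    return True
```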

The final StateMachine run was green:

StateMachine v5: verified

Mutation testing again changed the suite. The first targeted mutation run killed only 3 of 10 mutants.

The killed mutants showed that some important rules were already working.

But five survivors were real StateMachine responsibilities. They were not random “maybe interesting” mutants anymore; they pointed at specific lifecycle promises that needed sharper rules, such as recording the sorted median and guarding the slash path.

The workflow was:

  1. Run mutation on the StateMachine family.
  2. Classify each survivor by responsibility.
  3. Ignore the survivors that belonged to Accounting or token-modeling.
  4. Copy the exact StateMachine-owned survivors into a manual mutation set.
  5. Add rules that express the missing lifecycle promises.
  6. Rerun only those exact survivors.

For example, the deleted-sort mutant was:

uint256[] memory priceArray = bucket.prices;
// Mutant: priceArray.sort() is missing
uint256 medianPrice = priceArray.getMedian();
bucket.medianPrice = medianPrice;

That survivor told me the suite needed to prove not just “recording a median can succeed,” but “recording a median stores the sorted median.” So I added a rule that seeds an intentionally unsorted three-price bucket:

highPrice, lowPrice, medianPrice

Then it calls recordBucketMedian(bucketNumber) and asserts:

assert getBucketMedian(bucketNumber) == medianPrice;

If sorting is removed, that rule fails because getMedian() is now applied to an unsorted array.
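The deleted-sort experiment is easy to replay in Python; median-by-middle-element is only correct on a sorted array (odd-length case shown, helper names are mine):

```python
def get_median(prices: list) -> int:
    """Middle element of the array (odd-length case);
    correct only if the caller sorted the array first."""
    return prices[len(prices) // 2]

def record_bucket_median(prices: list, mutant_skip_sort: bool = False) -> int:
    working = list(prices)
    if not mutant_skip_sort:
        working.sort()        # the line the mutant deletes
    return get_median(working)
```

With the seed order [high, low, median], the mutant returns whatever happens to sit in the middle slot (here the lowest price) instead of the true median, so the equality assertion fails.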

The slash survivors were handled the same way. I added a staged valid slash path and asserted that a valid slash marks the offense for that bucket and reduces the node's stake.

That is what slashMarksOffenseAndReducesStake checks.

After those v5 rules were added, the exact manual survivor rerun killed all five:

StateMachine manual survivor mutation: killed 5 / 5

The remaining initial survivors were not really StateMachine gaps. They were Accounting-owned reward math or honest-token transfer-failure modeling boundaries.

This is one of the practical lessons from the campaign: mutation numbers are not meaningful until you classify the mutants by family and by threat model.

Part 4: Accounting

The Accounting family is the most important part of this suite.

For a staking oracle, accounting is where a harmless-looking bug can become a real exploit: stake that is never actually escrowed, rewards that can be claimed twice, exits that pay out more than the effective stake, or slashes that never reduce stake.

The final Accounting rules cover the core token-flow promises:

rule registerEscrowsExactStake(...)
rule addStakeIncreasesStoredStakeAndEscrow(...)
rule claimRewardMintsUnclaimedReports(...)
rule exitPaysAtMostEffectiveStake(...)
rule exitBeforeWaitingPeriodReverts(...)
rule slashNodeReducesStakeAndPaysReward(...)

The shape of these rules is intentionally direct. For example:

after registerNode(amount) succeeds, stored stake increases by exactly amount, and the oracle's modeled ORA balance increases by exactly amount.

For rewards:

claimReward mints exactly unclaimedReportCount * REWARD_PER_REPORT.
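A minimal model of that claim math (the REWARD_PER_REPORT value here is a placeholder, not the contract's constant):

```python
REWARD_PER_REPORT = 10**18  # placeholder value; the real constant lives in the contract

def claim_reward(report_count: int, claimed_report_count: int):
    """Mint exactly one reward per not-yet-claimed report, then mark them claimed."""
    unclaimed = report_count - claimed_report_count
    minted = unclaimed * REWARD_PER_REPORT
    return minted, report_count  # new claimedReportCount == reportCount
```

A second claim with nothing new reported must mint zero, which is exactly the replay case an exact-delta rule rules out.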

For slashing:

slashNode reduces stake by min(stakedAmount, MISREPORT_PENALTY) and pays the slasher exactly 10% of the penalty.
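And the slash arithmetic, again with a placeholder penalty constant and assuming the 10% slasher cut uses integer division:

```python
MISREPORT_PENALTY = 100 * 10**18  # placeholder; the real constant is in the contract

def slash(staked_amount: int):
    """Penalty is capped by the remaining stake; the slasher gets 10% of it."""
    penalty = min(staked_amount, MISREPORT_PENALTY)
    slasher_reward = penalty // 10   # assumes an exact 10% cut
    return staked_amount - penalty, penalty, slasher_reward
```

The min cap is the interesting part: a node with less stake than the penalty loses everything it has, not more.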

The slash rule is verified in a harness-staged finalized bucket. That is an honest modeling boundary: the suite proves the slash accounting once the slashable setup exists, but it does not yet prove an entire production-only multi-call sequence from registration to report to median recording to slashing.

The final Accounting run was green:

Accounting v7: verified

Mutation testing was strong here. A first entrypoint-focused mutation run killed 6 of 8 mutants. The two survivors both mutated the exitNode waiting-period guard. That led to a new rule:

rule exitBeforeWaitingPeriodReverts(...)

After adding that rule, the rerun killed all targeted Accounting mutants:

Accounting targeted mutation v4: killed 8 / 8

This is exactly what I want from mutation testing. It did not just produce a score. It pointed to a missing property, the suite got stronger, and the new rule killed the survivor.

Part 5: Views

View functions are easy to underestimate. They do not mutate state, so they can feel less dangerous than slashNode or exitNode. But in an oracle, views are part of the product's truth surface.

The Views family covers the effective-stake formula, recorded bucket medians, and outlier detection via getOutlierNodes.

The effective stake formula is especially important:

effectiveStake = stakedAmount - missedBuckets * INACTIVITY_PENALTY

with a floor at zero.

The suite verifies that inactive nodes have zero effective stake, active nodes match the missed-bucket formula, and the edge case where reported buckets exceed the expected bucket count returns the stored stake instead of underflowing.
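Those three cases can be captured in one Python model of the effective-stake view (parameter names and the exact comparison direction are assumptions):

```python
def effective_stake(staked: int, expected_buckets: int,
                    reported_buckets: int, inactivity_penalty: int,
                    active: bool) -> int:
    """Model of the effective-stake view: inactive nodes have zero stake;
    over-reporting (reported >= expected) returns the stored stake to avoid
    underflow; otherwise subtract one penalty per missed bucket, floored at zero."""
    if not active:
        return 0
    if reported_buckets >= expected_buckets:
        return staked                 # edge case: no underflow
    deduction = (expected_buckets - reported_buckets) * inactivity_penalty
    return staked - deduction if deduction < staked else 0
```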

The price-deviation edge is also explicit: the suite pins down how a report sitting exactly 10% from the median is classified.

That matters because the production constant is:

uint256 public constant MAX_DEVIATION_BPS = 1000;

or 10% in basis points. Getting the boundary wrong would either slash honest reporters at the limit or let slightly-too-far reporters escape.

The final Views run was green:

Views v8: verified

The Views mutation story was mixed and useful.

The suite eventually killed the prior effective-stake survivor by adding:

rule effectiveStakeReportsBeyondExpectedReturnsStake(...)

For getOutlierNodes, focused mutation killed the validCount mutants, but two loop-index mutants remained under optimistic_loop. I tried strict-loop baselines at higher bounds, but the original rule itself failed on loop unwinding. That means those survivors are not clean evidence of a production bug. They are documented loop-modeling debt:

Views outlier focused mutation: killed 2 / 4
Residual: two loop-index mutants under optimistic_loop

This is an important distinction. A survivor can mean “the spec is weak,” “the mutant is equivalent,” “the model boundary hides the behavior,” or “the original cannot be cleanly checked under that loop strategy.” In this case, the debt is real, but it is about dynamic-array loop proof engineering, not a confirmed StakingOracle defect.

What The Suite Proves

Here is the honest coverage statement I would use for this campaign:

The StakingOracle Certora suite verifies the core node lifecycle, bounded registry structure, bucket reporter/price alignment, report-state gates, ORA stake escrow, reward minting, exit payout bounds, staged slash accounting, effective-stake formula, recorded-median views, and main outlier filtering semantics under an honest ORA model and bounded loop assumptions.

More concretely: Sanity v2, ValidState v7, StateMachine v5, Accounting v7, and Views v8 all verified, with the focused mutation runs killing 4/4 structural, 5/5 manual StateMachine, and 8/8 targeted Accounting mutants.

What The Suite Does Not Prove

The final artifacts record four main boundaries.

1. Honest ORA token model

The suite models ORA as an honest ERC20-like token. That is appropriate for this challenge because OracleToken.sol is provided and simple.

It does not model:

If this were a production protocol accepting arbitrary ERC20 assets, this would be a much larger soundness issue. Here it is documented modeling debt.

2. Bounded loop depth

Structural properties use configured loop bounds. This is normal in many Certora campaigns over dynamic arrays, but it means those checks are bounded evidence, not mathematical proofs over arbitrary array length.

The current suite uses:

loop_iter = 3

for the relevant bounded structural reasoning.

3. Staged slash setup

The exact slash accounting rule uses a harness-staged finalized outlier bucket. That proves the arithmetic and state transition once the slashable condition exists.

It does not yet prove the full production-only sequence:

register -> report -> advance bucket -> record median -> slash

I considered adding that flow, but for the current blog and challenge context I would keep the suite as-is. The staged slash rule already covers the core accounting risk, and the missing production-only flow is clearly documented as future strengthening work.

So the honest claim is not “the entire slashing lifecycle is fully proven end to end.” The honest claim is that the core slashing accounting and state transition are verified for a finalized outlier bucket, while a production-only multi-call slash flow remains future strengthening work.

4. Outlier loop-index mutation

The suite verifies the semantic behavior of getOutlierNodes for focused cases, including a mixed three-reporter bucket. It also kills the validCount mutants.

Two loop-index mutants remain under optimistic_loop, and strict-loop baselines fail on the original due to unwinding. That is residual proof-engineering debt, not a confirmed production bug.

What I Learned

The WhitelistOracle suite was mostly about list integrity and median behavior. The StakingOracle suite was about lifecycle integrity.

A few lessons stood out:

1. Staking oracle correctness is mostly state-machine correctness

The median calculation matters, but the bigger risk is lifecycle drift: double reports in one bucket, buckets that are never finalized, and slashes that can be replayed.

The suite had to model the oracle as a protocol, not just as a price function.

2. Accounting rules should be exact whenever possible

For stake, rewards, exit, and slashing, vague properties are not enough. “Balance does not decrease too much” is weaker than:

the stored stake delta equals the token-transfer delta.

or:

the slasher receives exactly 10% of the penalty.

Exact-delta rules are harder to write, but much more valuable.

3. Non-inductive invariants should become preservation rules

Some structural truths are difficult to maintain as global invariants because of how dynamic arrays, removals, and bounded loops interact with the prover.

Instead of forcing a brittle invariant, I moved the node-address shape property into a successful-call preservation rule. That gave useful evidence without pretending the proof was stronger than it was.

4. Mutation testing needs targeting

The first ValidState mutation run survived everything because the random sample did not target ValidState’s actual responsibilities. The focused structural run killed 4 of 4.

That taught me not to read mutation results mechanically. A mutation run is only useful if the mutant family matches the spec family.

5. Modeling debt is part of the result

A good formal verification report is not just a list of green checks. It is also a map of the assumptions:

Those boundaries make the positive claims more credible, not less.

Closing Thought

The StakingOracle is the point where this oracle challenge stops being just about “what is the median price?” and starts being about incentives: whether honest reporters are paid exactly, whether outliers actually lose stake, and whether an exiting node can ever take out more than it deserves.

Certora was a good fit because those are not single-test questions. They are protocol-shape questions.

The current suite does not prove everything forever and under every possible model. It proves the core lifecycle and accounting properties under explicit assumptions, and it leaves a clear path for strengthening: production-only slash flow, deeper loop reasoning, and a more adversarial token model if the challenge ever moves beyond the provided ORA token.

That is a much stronger position than “the tests pass.”

It is also a natural bridge to the next oracle design: the OptimisticOracle, where correctness moves again, this time from staking economics to assertions, disputes, bonds, and settlement.

This article was originally published on Ethereum Tag and is republished here under RSS syndication for informational purposes. All rights and intellectual property remain with the original author. If you are the author and wish to have this article removed, please contact us at [email protected].
