
Formal Verification of a Staking Oracle — Certora Prover Continued

By Chinhaotsou · Published May 12, 2026 · 19 min read · Source: Ethereum Tag
Tags: Ethereum · Regulation · Security




From trusted price sources to economic security, buckets, rewards, and slashing


From Whitelist Oracle to Staking Oracle

In the previous oracle verification post, I started with the simplest design in the Speedrun Ethereum Oracles challenge: a WhitelistOracle. That contract was mostly about structural correctness. Could the owner add and remove oracle contracts? Did the oracle list stay consistent after swap-and-pop? Did stale prices get filtered before calculating the median?

The next checkpoint is a much richer target: StakingOracle.

Here, correctness is no longer just “does the array contain the right oracle addresses?” The protocol now has economic state: staked ORA held in escrow, per-bucket price reports, reward counters, and slashable offenses.

That means the verification target moved from a mostly trust-based oracle to an incentive-based oracle. The core question became:

Can the staking, reporting, reward, exit, and slashing lifecycle preserve the accounting and state-machine promises the protocol relies on?

This post walks through the Certora suite I built for StakingOracle, what it proves, what it does not prove, and how mutation testing shaped the final result.

The Protocol: StakingOracle

At a high level, the oracle works like this:

struct OracleNode {
    uint256 stakedAmount;
    uint256 lastReportedBucket;
    uint256 reportCount;
    uint256 claimedReportCount;
    uint256 firstBucket;
    bool active;
}

struct BlockBucket {
    mapping(address => bool) slashedOffenses;
    address[] reporters;
    uint256[] prices;
    uint256 medianPrice;
}

Nodes register by staking ORA:

function registerNode(uint256 amount) public;

They report prices into the current block bucket:

function reportPrice(uint256 price) public;

Someone later finalizes a past bucket by recording the median:

function recordBucketMedian(uint256 bucketNumber) public;

If a node’s price is more than 10% away from that median, it can be slashed:

function slashNode(
    address nodeToSlash,
    uint256 bucketNumber,
    uint256 reportIndex,
    uint256 nodeAddressesIndex
) public;
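To make the deviation rule concrete, here is a small Python model of the 10% check, assuming the contract compares deviations in basis points with integer arithmetic (the helper name and the exact rounding are illustrative, not the contract's code):

```python
MAX_DEVIATION_BPS = 1000  # 10%, matching the production constant shown later

def is_outlier(price: int, median: int) -> bool:
    """A report is slashable only if it deviates MORE than 10% from the median."""
    deviation_bps = abs(price - median) * 10_000 // median
    return deviation_bps > MAX_DEVIATION_BPS
```

Under the “more than 10%” reading, a report of 110 against a median of 100 sits exactly on the boundary and is not slashable, while 111 is.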

Nodes can also add stake, claim rewards, and exit:

function addStake(uint256 amount) public;
function claimReward() public;
function exitNode(uint256 index) public;

On paper, this is still a teaching challenge. But the design already has the ingredients that make real DeFi verification interesting: dynamic arrays, mappings, token escrow, delayed finalization, replay prevention, slashable offenses, and view functions whose answers depend on historical state.

What Could Go Wrong?

Before writing CVL, I wrote the failure modes in plain English. This step matters because Certora will happily prove the wrong thing if I ask the wrong question precisely.

For this oracle, the main questions were about stake escrow, report timing and replay, reward math, exit payouts, and slash correctness.

Those questions became five focused spec families: Sanity, ValidState, StateMachine, Accounting, and Views.

I grouped these properties into their own StakingOracle suite so the assumptions, artifacts, mutation history, and modeling debt were easy to read as one campaign.

Why This Needed a Richer Model

The WhitelistOracle suite already needed structural reasoning over a dynamic array. StakingOracle adds more moving parts: ORA token escrow, parallel reporter/price arrays per bucket, reward and report counters, and delayed median finalization.

This is what makes the model richer than the WhitelistOracle model. The hard part is not contract size. The hard part is that correctness depends on relationships between multiple storage objects and multi-step lifecycle flows.

The suite uses a verification harness for read-only observability and a Certora-friendly ORA model so token flows can be checked directly instead of hidden behind broad summaries.

The important harness getters expose proof-only state such as bucket reporter and price entries, recorded bucket medians, and per-node lifecycle fields.

This does not change production behavior. It gives the prover enough visibility to talk about the protocol’s internal state.

Part 1: Sanity

I started with reachability.

The Sanity family does not try to prove the deepest correctness properties. It asks whether important flows can succeed under reasonable setup: registering a node, reporting a price, recording a bucket median, claiming a reward, and exiting.

This is a small but useful first gate. If these rules fail, the issue is usually setup, linking, token modeling, or an impossible precondition. There is no point debugging a beautiful accounting rule if the prover cannot even reach registerNode.

The final sanity run was green:

Sanity v2: verified

I did not run mutation testing for Sanity. That was a deliberate choice. Sanity is mostly a reachability family; mutation testing is more useful on semantic families like Accounting, StateMachine, ValidState, and Views.

Part 2: Valid State

The ValidState family checks the shape of the protocol’s storage.

The obvious invariant I wanted was:

Active nodes in nodeAddresses should agree with nodes[address].active.

But this is where formal verification becomes humbling. A property can be true in the reachable states we care about and still be awkward or non-inductive as a global invariant. The initial approach tried to force too much as standalone invariants. The final version uses a bounded preservation rule instead:

rule successfulCallsPreserveBoundedNodeAddressShape(
    env e,
    method f,
    calldataarg args,
    address trackedNode
)

That rule checks the node-list shape across successful calls within the configured loop bound. It covers the key structural expectations without pretending we have an unbounded proof over every possible array length.

The suite also keeps the bucket arrays aligned:

invariant bucketReportersAndPricesStayAligned(uint256 bucketNumber)

That invariant matters because bucket reports are represented as parallel arrays:

address[] reporters;
uint256[] prices;

If those arrays ever diverge in length, prices[i] no longer necessarily belongs to reporters[i]. That would be a serious slashing and outlier-detection problem.
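The alignment requirement itself is tiny; a Python sketch, using a dictionary to stand in for the BlockBucket struct (the representation is an assumption):

```python
def bucket_arrays_aligned(bucket: dict) -> bool:
    # reporters[i] must correspond to prices[i];
    # unequal lengths break that index pairing
    return len(bucket["reporters"]) == len(bucket["prices"])
```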

The final ValidState run was green:

ValidState v7: verified

Mutation testing found an important lesson here.

The first ValidState mutation run looked disappointing: 10 of 10 mutants survived. If I only looked at that number, I might conclude that the ValidState rules were weak.

But after triage, the result meant something more specific. The random mutants did not mostly mutate the things ValidState was designed to protect. They landed on behavior that belonged to other families, such as reward math, reporting transitions, or token-flow assumptions. A node-list structural rule is not supposed to kill a reward-accounting mutant. So the result was low-signal rather than a clean indictment of the ValidState family.

That changed how I used mutation testing. Instead of asking “did this family kill random mutants anywhere in the contract?”, I asked a sharper question:

If I mutate the storage-shape promises that ValidState is responsible for, does the ValidState family catch it?

For example, one focused structural mutant removed the registration append:

nodes[msg.sender] = OracleNode({
    stakedAmount: amount,
    lastReportedBucket: 0,
    reportCount: 0,
    claimedReportCount: 0,
    firstBucket: getCurrentBucketNumber(),
    active: true
});
// Mutant: nodeAddresses.push(msg.sender) is missing

That mutant leaves the mapping saying “this node is active,” but the public node registry does not include the node. Accounting might still look fine, and a reward rule might not notice it. But ValidState should notice because its job is to keep the mapping and registry consistent. In the final suite, the structural rules kill this kind of mutant.

The improvement was adding a small structural layer to the spec:

definition listedInBoundedPrefix(address node) returns bool = ...;
definition boundedListedEntriesActiveAndNonzero() returns bool = ...;
definition boundedListedEntriesUnique() returns bool = ...;
definition boundedActiveNodeRepresented(address node) returns bool = ...;
rule successfulCallsPreserveBoundedNodeAddressShape(...)

That rule says: before a state-changing call, assume the bounded node list is well shaped; after any successful state-changing call, the bounded node list must still be well shaped. This is a preservation rule, not a free pass. It does not say “pretend the contract is correct.” It says “from a good structural state, no successful function call may break the structure.”

The suite also has a direct registration rule:

rule registerCreatesActiveListedNode(env e, uint256 amount)

That rule checks the base transition: after registerNode(amount) succeeds, the new node must be active and must appear at the newly appended registry index. So if the production contract forgot nodeAddresses.push(msg.sender), registration would not pass this rule.

In plain English, the combined ValidState layer checks that every listed registry entry is active and nonzero, that no address appears twice in the bounded prefix, and that every active node is represented in the bounded registry.

So when a mutant removes nodeAddresses.push(msg.sender), the active node can no longer be found in the bounded registry and the rule fails. The same structural layer also catches removal bugs, such as forgetting to clear active or forgetting to pop the removed node from the array.
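The whole bounded structural promise can be modeled as one Python predicate over the registry array and the node mapping (a simplified model; the dictionary layout is mine, the active field follows the struct above):

```python
def node_list_well_shaped(node_addresses: list, nodes: dict) -> bool:
    """Model of the ValidState structural layer: listed entries are
    active, nonzero, and unique; every active node appears in the list."""
    seen = set()
    for addr in node_addresses:
        node = nodes.get(addr)
        if addr == 0 or node is None or not node["active"]:
            return False      # listed entry must be active and nonzero
        if addr in seen:
            return False      # no duplicates in the registry
        seen.add(addr)
    # every active node in the mapping must be represented in the list
    return all(addr in seen for addr, node in nodes.items() if node["active"])
```

The missing-push mutant produces a mapping entry marked active with no matching registry entry, so the predicate fails, which mirrors how the CVL rule fails.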

After creating focused structural mutants, the suite killed 4 of 4:

ValidState focused structural mutation: killed 4 / 4

Those focused mutants touched exactly the kind of behavior ValidState is supposed to protect: node-list append, node removal, active flags, and bucket reporter/price alignment.

Part 3: State Machine

The StateMachine family is about lifecycle rules.

The main properties are: a registered node reports at most once per bucket, reporting in a later bucket requires the prior reported bucket's median to be recorded, and a valid slash marks the offense and reduces stake.

Two rules capture the heart of bucket progression:

rule registeredNodeReportsAtMostOncePerBucket(...)
rule reportingInLaterBucketRequiresPriorMedian(...)

The second rule is especially important. Without it, a node could keep reporting into later buckets while earlier reported buckets were never finalized. That would make the reward, inactivity, and slashing model much harder to reason about because reports would pile up without the protocol committing to their medians.
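A Python model of the two gates together, with 0 as the “never reported” sentinel (that sentinel choice mirrors lastReportedBucket's default value but is still an assumption about the contract):

```python
def can_report(last_reported_bucket: int, current_bucket: int,
               median_recorded) -> bool:
    """Model of the two StateMachine gates:
    1) at most one report per bucket, and
    2) moving to a later bucket requires the previously reported
       bucket's median to be recorded first."""
    if last_reported_bucket == current_bucket:
        return False          # already reported in this bucket
    if last_reported_bucket != 0 and not median_recorded(last_reported_bucket):
        return False          # earlier reported bucket never finalized
    return True
```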

The final StateMachine run was green:

StateMachine v5: verified

Mutation testing again changed the suite. The first targeted mutation run killed only 3 of 10 mutants.

The killed mutants showed that some important rules were already working.

But five survivors were real StateMachine responsibilities. They were not random “maybe interesting” mutants anymore; they pointed at specific lifecycle promises that needed sharper rules, such as recording the sorted median and guarding the slash path.

The workflow was:

  1. Run mutation on the StateMachine family.
  2. Classify each survivor by responsibility.
  3. Ignore the survivors that belonged to Accounting or token-modeling.
  4. Copy the exact StateMachine-owned survivors into a manual mutation set.
  5. Add rules that express the missing lifecycle promises.
  6. Rerun only those exact survivors.

For example, the deleted-sort mutant was:

uint256[] memory priceArray = bucket.prices;
// Mutant: priceArray.sort() is missing
uint256 medianPrice = priceArray.getMedian();
bucket.medianPrice = medianPrice;

That survivor told me the suite needed to prove not just “recording a median can succeed,” but “recording a median stores the sorted median.” So I added a rule that seeds an intentionally unsorted three-price bucket:

highPrice, lowPrice, medianPrice

Then it calls recordBucketMedian(bucketNumber) and asserts:

assert getBucketMedian(bucketNumber) == medianPrice;

If sorting is removed, that rule fails because getMedian() is now applied to an unsorted array.
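The deleted-sort experiment is easy to replay in Python; median-by-middle-element is only correct on a sorted array (odd-length case shown, helper names are mine):

```python
def get_median(prices: list) -> int:
    """Middle element of the array (odd-length case);
    correct only if the caller sorted the array first."""
    return prices[len(prices) // 2]

def record_bucket_median(prices: list, mutant_skip_sort: bool = False) -> int:
    working = list(prices)
    if not mutant_skip_sort:
        working.sort()        # the line the mutant deletes
    return get_median(working)
```

With the seed order [high, low, median], the mutant returns whatever happens to sit in the middle slot (here the lowest price) instead of the true median, so the equality assertion fails.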

The slash survivors were handled the same way. I added a staged valid slash path and asserted that a valid slash marks the offense for that bucket and reduces the node's stake.

That is what slashMarksOffenseAndReducesStake checks.

After those v5 rules were added, the exact manual survivor rerun killed all five:

StateMachine manual survivor mutation: killed 5 / 5

The remaining initial survivors were not really StateMachine gaps. They were Accounting-owned reward math or honest-token transfer-failure modeling boundaries.

This is one of the practical lessons from the campaign: mutation numbers are not meaningful until you classify the mutants by family and by threat model.

Part 4: Accounting

The Accounting family is the most important part of this suite.

For a staking oracle, accounting is where a harmless-looking bug can become a real exploit: stake that is never actually escrowed, rewards that can be claimed twice, exits that pay out more than the effective stake, or slashes that never reduce stake.

The final Accounting rules cover the core token-flow promises:

rule registerEscrowsExactStake(...)
rule addStakeIncreasesStoredStakeAndEscrow(...)
rule claimRewardMintsUnclaimedReports(...)
rule exitPaysAtMostEffectiveStake(...)
rule exitBeforeWaitingPeriodReverts(...)
rule slashNodeReducesStakeAndPaysReward(...)

The shape of these rules is intentionally direct. For example:

after registerNode(amount) succeeds, stored stake increases by exactly amount, and the oracle's modeled ORA balance increases by exactly amount.

For rewards:

claimReward mints exactly unclaimedReportCount * REWARD_PER_REPORT.
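A minimal model of that claim math (the REWARD_PER_REPORT value here is a placeholder, not the contract's constant):

```python
REWARD_PER_REPORT = 10**18  # placeholder value; the real constant lives in the contract

def claim_reward(report_count: int, claimed_report_count: int):
    """Mint exactly one reward per not-yet-claimed report, then mark them claimed."""
    unclaimed = report_count - claimed_report_count
    minted = unclaimed * REWARD_PER_REPORT
    return minted, report_count  # new claimedReportCount == reportCount
```

A second claim with nothing new reported must mint zero, which is exactly the replay case an exact-delta rule rules out.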

For slashing:

slashNode reduces stake by min(stakedAmount, MISREPORT_PENALTY) and pays the slasher exactly 10% of the penalty.
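And the slash arithmetic, again with a placeholder penalty constant and assuming the 10% slasher cut uses integer division:

```python
MISREPORT_PENALTY = 100 * 10**18  # placeholder; the real constant is in the contract

def slash(staked_amount: int):
    """Penalty is capped by the remaining stake; the slasher gets 10% of it."""
    penalty = min(staked_amount, MISREPORT_PENALTY)
    slasher_reward = penalty // 10   # assumes an exact 10% cut
    return staked_amount - penalty, penalty, slasher_reward
```

The min cap is the interesting part: a node with less stake than the penalty loses everything it has, not more.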

The slash rule is verified in a harness-staged finalized bucket. That is an honest modeling boundary: the suite proves the slash accounting once the slashable setup exists, but it does not yet prove an entire production-only multi-call sequence from registration to report to median recording to slashing.

The final Accounting run was green:

Accounting v7: verified

Mutation testing was strong here. A first entrypoint-focused mutation run killed 6 of 8 mutants. The two survivors both mutated the exitNode waiting-period guard. That led to a new rule:

rule exitBeforeWaitingPeriodReverts(...)

After adding that rule, the rerun killed all targeted Accounting mutants:

Accounting targeted mutation v4: killed 8 / 8

This is exactly what I want from mutation testing. It did not just produce a score. It pointed to a missing property, the suite got stronger, and the new rule killed the survivor.

Part 5: Views

View functions are easy to underestimate. They do not mutate state, so they can feel less dangerous than slashNode or exitNode. But in an oracle, views are part of the product's truth surface.

The Views family covers the effective-stake formula, recorded bucket medians, and outlier detection via getOutlierNodes.

The effective stake formula is especially important:

effectiveStake = stakedAmount - missedBuckets * INACTIVITY_PENALTY

with a floor at zero.

The suite verifies that inactive nodes have zero effective stake, active nodes match the missed-bucket formula, and the edge case where reported buckets exceed the expected bucket count returns the stored stake instead of underflowing.
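Those three cases can be captured in one Python model of the effective-stake view (parameter names and the exact comparison direction are assumptions):

```python
def effective_stake(staked: int, expected_buckets: int,
                    reported_buckets: int, inactivity_penalty: int,
                    active: bool) -> int:
    """Model of the effective-stake view: inactive nodes have zero stake;
    over-reporting (reported >= expected) returns the stored stake to avoid
    underflow; otherwise subtract one penalty per missed bucket, floored at zero."""
    if not active:
        return 0
    if reported_buckets >= expected_buckets:
        return staked                 # edge case: no underflow
    deduction = (expected_buckets - reported_buckets) * inactivity_penalty
    return staked - deduction if deduction < staked else 0
```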

The price-deviation edge is also explicit: the suite pins down how a report sitting exactly 10% from the median is classified.

That matters because the production constant is:

uint256 public constant MAX_DEVIATION_BPS = 1000;

or 10% in basis points. Getting the boundary wrong would either slash honest reporters at the limit or let slightly-too-far reporters escape.

The final Views run was green:

Views v8: verified

The Views mutation story was mixed and useful.

The suite eventually killed the prior effective-stake survivor by adding:

rule effectiveStakeReportsBeyondExpectedReturnsStake(...)

For getOutlierNodes, focused mutation killed the validCount mutants, but two loop-index mutants remained under optimistic_loop. I tried strict-loop baselines at higher bounds, but the original rule itself failed on loop unwinding. That means those survivors are not clean evidence of a production bug. They are documented loop-modeling debt:

Views outlier focused mutation: killed 2 / 4
Residual: two loop-index mutants under optimistic_loop

This is an important distinction. A survivor can mean “the spec is weak,” “the mutant is equivalent,” “the model boundary hides the behavior,” or “the original cannot be cleanly checked under that loop strategy.” In this case, the debt is real, but it is about dynamic-array loop proof engineering, not a confirmed StakingOracle defect.

What The Suite Proves

Here is the honest coverage statement I would use for this campaign:

The StakingOracle Certora suite verifies the core node lifecycle, bounded registry structure, bucket reporter/price alignment, report-state gates, ORA stake escrow, reward minting, exit payout bounds, staged slash accounting, effective-stake formula, recorded-median views, and main outlier filtering semantics under an honest ORA model and bounded loop assumptions.

More concretely: Sanity v2, ValidState v7, StateMachine v5, Accounting v7, and Views v8 all verified, with the focused mutation runs killing 4/4 structural, 5/5 manual StateMachine, and 8/8 targeted Accounting mutants.

What The Suite Does Not Prove

The final artifacts record four main boundaries.

1. Honest ORA token model

The suite models ORA as an honest ERC20-like token. That is appropriate for this challenge because OracleToken.sol is provided and simple.

It does not model:

If this were a production protocol accepting arbitrary ERC20 assets, this would be a much larger soundness issue. Here it is documented modeling debt.

2. Bounded loop depth

Structural properties use configured loop bounds. This is normal in many Certora campaigns over dynamic arrays, but it means those checks are bounded evidence, not mathematical proofs over arbitrary array length.

The current suite uses:

loop_iter = 3

for the relevant bounded structural reasoning.

3. Staged slash setup

The exact slash accounting rule uses a harness-staged finalized outlier bucket. That proves the arithmetic and state transition once the slashable condition exists.

It does not yet prove the full production-only sequence:

register -> report -> advance bucket -> record median -> slash

I considered adding that flow, but for the current blog and challenge context I would keep the suite as-is. The staged slash rule already covers the core accounting risk, and the missing production-only flow is clearly documented as future strengthening work.

So the honest claim is not “the entire slashing lifecycle is fully proven end to end.” The honest claim is that the core slashing accounting and state transition are verified for a finalized outlier bucket, while a production-only multi-call slash flow remains future strengthening work.

4. Outlier loop-index mutation

The suite verifies the semantic behavior of getOutlierNodes for focused cases, including a mixed three-reporter bucket. It also kills the validCount mutants.

Two loop-index mutants remain under optimistic_loop, and strict-loop baselines fail on the original due to unwinding. That is residual proof-engineering debt, not a confirmed production bug.

What I Learned

The WhitelistOracle suite was mostly about list integrity and median behavior. The StakingOracle suite was about lifecycle integrity.

A few lessons stood out:

1. Staking oracle correctness is mostly state-machine correctness

The median calculation matters, but the bigger risk is lifecycle drift: double reports in one bucket, buckets that are never finalized, and slashes that can be replayed.

The suite had to model the oracle as a protocol, not just as a price function.

2. Accounting rules should be exact whenever possible

For stake, rewards, exit, and slashing, vague properties are not enough. “Balance does not decrease too much” is weaker than:

the stored stake delta equals the token-transfer delta.

or:

the slasher receives exactly 10% of the penalty.

Exact-delta rules are harder to write, but much more valuable.

3. Non-inductive invariants should become preservation rules

Some structural truths are difficult to maintain as global invariants because of how dynamic arrays, removals, and bounded loops interact with the prover.

Instead of forcing a brittle invariant, I moved the node-address shape property into a successful-call preservation rule. That gave useful evidence without pretending the proof was stronger than it was.

4. Mutation testing needs targeting

The first ValidState mutation run survived everything because the random sample did not target ValidState’s actual responsibilities. The focused structural run killed 4 of 4.

That taught me not to read mutation results mechanically. A mutation run is only useful if the mutant family matches the spec family.

5. Modeling debt is part of the result

A good formal verification report is not just a list of green checks. It is also a map of the assumptions:

Those boundaries make the positive claims more credible, not less.

Closing Thought

The StakingOracle is the point where this oracle challenge stops being just about “what is the median price?” and starts being about incentives: whether honest reporters are paid exactly, whether outliers actually lose stake, and whether an exiting node can ever take out more than it deserves.

Certora was a good fit because those are not single-test questions. They are protocol-shape questions.

The current suite does not prove everything forever and under every possible model. It proves the core lifecycle and accounting properties under explicit assumptions, and it leaves a clear path for strengthening: production-only slash flow, deeper loop reasoning, and a more adversarial token model if the challenge ever moves beyond the provided ORA token.

That is a much stronger position than “the tests pass.”

It is also a natural bridge to the next oracle design: the OptimisticOracle, where correctness moves again, this time from staking economics to assertions, disputes, bonds, and settlement.

This article was originally published on Ethereum Tag and is republished here under RSS syndication for informational purposes. All rights and intellectual property remain with the original author. If you are the author and wish to have this article removed, please contact us at [email protected].
