brk_oracle 0.3.2

Pure on-chain BTC/USD price oracle algorithm
Documentation

brk_oracle

Version 3

Pure on-chain BTC/USD price oracle. No exchange feeds, no external APIs. Derives the bitcoin price from transaction data alone. Tracks block by block from height 340,000 (January 2015) onward.

Inspired by UTXOracle by @SteveSimple, which proved the concept. brk_oracle takes the same core insight and redesigns the algorithm for per-block resolution and rolling operation. See comparison below.

The signal

People buy bitcoin in round dollar amounts. Each purchase creates a transaction output whose satoshi value depends on the current price:

  $100 at  $50,000/BTC  →  200,000 sats
  $100 at $100,000/BTC  →  100,000 sats

Thousands of these round-dollar purchases happen every day: $10, $20, $50, $100, $200, $500. Plot every transaction output in a block on a log-scale histogram and clear spikes emerge at each round-dollar amount:

       $5  $10 $20 $50 $100 $200 $500  $1k       $5k $10k
        ↓   ↓   ↓   ↓    ↓    ↓    ↓    ↓         ↓   ↓
   │            ▌        ▌         ▌                  ▌
   │    ▌   █   █   ▌    █    ▌    █    █         ▌   █
   │   ▐█▌ ▐█▌  █▌  █   ▐█▌  ▐█   ▐█▌  ▐█▌  █    ▐█  ▐█▌
   │▄▄████▄███████▄▐█▌▄█████▄██▌▄█████▄███▄▐█▌▄▄▄███▄████▄▄
   └─────────────────────────────────────────────────────────→
                          log₁₀(satoshis)

On a log scale, when the price changes all spikes shift together by the same number of bins. A 2x price move always shifts the pattern by ~60 bins, whether bitcoin moves from $1k to $2k or from $50k to $100k:

  price × 2  →  sats ÷ 2  →  shift left by log₁₀(2) × 200 ≈ 60 bins

  $50k:  ···· █ ···· █ ···· █ ···· █ ····
  $100k: ·· █ ···· █ ···· █ ···· █ ······
              ◄── 60 bins ──►

The spacing between spikes is constant (set by the ratios between dollar amounts). Only the position changes. The oracle detects this pattern and reads the price from where it lands.

How it works

The oracle tracks the price incrementally, block by block, starting from a known seed price. Each new block nudges the estimate. The search window is narrow (about 12 bins, or +15% / -12% in price), so the oracle can only follow gradual movement, not jump to an arbitrary price from scratch. This is by design: it makes the algorithm resistant to noise.

For each new block:

1. Filter outputs

Skip the coinbase transaction, and skip every output of a transaction carrying an OP_RETURN: that transaction is protocol machinery, not a dollar-denominated payment, so its payout amounts are not price signal. Below height 630,000, also skip every output of a transaction with more than 100 outputs: a large fan-out is a batch payout (exchange sweep, mixer), not a round-dollar payment, and the thin early signal needs it removed. At and above height 630,000, the transaction fan-out cap relaxes to 250 outputs so dense-chain payment activity remains visible while very large fan-outs cannot dominate one EMA slot. Then exclude noisy outputs: script types dominated by protocol activity (P2TR by default), dust below 1,000 sats, and round BTC amounts (0.01, 0.1, 1.0 BTC, etc.) that create false spikes unrelated to dollar purchases.

2. Build a log-scale histogram

Each remaining output becomes a bin index in a 2,400-bin histogram spanning 12 decades (1 sat to 10¹² sats):

  bin = round(log₁₀(sats) × 200)       200 bins per decade

3. Smooth over recent blocks

A single block has too few outputs for a clean signal. The oracle keeps a ring buffer of the last 12 block histograms and combines them into an exponential moving average (EMA) that weights recent blocks more heavily:

  EMA[bin] = Σ  weight(age) × histogram[age][bin]
             age=0..11

  weight(age) = α × (1 − α)^age         default α = 2/7 (~6-block span)

The EMA is recomputed from the ring buffer each block. This makes the oracle deterministic: since only the last 12 histograms matter, any oracle started from a known price converges to the exact same state after 12 blocks, regardless of prior history. This is what makes checkpointing and restoring possible.

4. Score with a 19-point stencil

The fixed ratios between round-dollar amounts ($1, $2, $3, $5, ... $10,000) create a fingerprint: a pattern of 19 spikes with known spacing on the log scale. A stencil encodes this spacing as bin offsets from a $100 reference point:

   $1       $5     $10          $50  $100  $200        $1k          $10k
    ↓        ↓      ↓            ↓     ↓     ↓          ↓             ↓
    ·────────·──────·────────────·─────·─────·──────────·─────────────·
  -400     -260   -200          -60    0    +60       +200          +400
                      bin offsets from the $100 reference point
                                 (19 offsets total)

The oracle slides this stencil across the EMA histogram within the search window. At each candidate position:

  1. Read the EMA value at all 19 expected spike locations
  2. Normalize each value by dividing by that offset's peak within the search window: this gives rare amounts like $3 equal voting weight to common amounts like $100
  3. Sum the 19 normalized values into a single score

The position with the highest score is where the fingerprint best matches the histogram.

5. Convert bin to price

A $100 purchase at price P produces $100 / P × 10⁸ sats, which lands in bin:

  bin = log₁₀($100 / P × 10⁸) × 200
      = (2 + 8 − log₁₀(P)) × 200
      = (10 − log₁₀(P)) × 200

So the stencil's winning position, the bin where $100 purchases land, directly encodes the price:

  price = 10^(10 − bin / 200)  dollars

Parabolic interpolation between the best bin and its two neighbors refines the estimate to sub-bin precision.

Pipeline

  block ──→ filter ──→ histogram ──→ ring buffer ──→ EMA ──→ stencil ──→ best bin ──→ $
             outputs     2,400 bins       ×12                  19-point    parabolic
                          log-scale                             scoring   interpolation

Input

The oracle consumes one pre-built histogram per block via process_histogram(&hist), a [u32; 2400] bin-count array, and returns the updated reference bin.

The caller filters as it builds the histogram, applying the step 1 rules. PaymentFilter::for_height(height).histogram(txs) builds a fresh block histogram from non-coinbase transaction outputs. Incremental live callers use PaymentFilter::MODERN.for_each_bin(outputs, emit), which applies the modern fan-out cap without requiring a height. PaymentFilter::eligible_bin(sats, output_type) returns an individual output's bin index, or None if filtered. The transaction-level rules include the OP_RETURN drop, the >100 transaction-output fan-out cap below height 630,000, and the >250 cap from height 630,000 onward.

The initial seed must be close to the real price at the starting height. The crate includes typed pre-oracle helpers for exchange prices at heights 0..340,000. Oracle::from_seed() uses the last baked price, height 339,999 (one below START_HEIGHT_SLOW), and the slow cold-start config to seed the oracle's first on-chain computation at height 340,000.

Configuration

All parameters via Config with sensible defaults:

Parameter Default Purpose
alpha 2/7 EMA decay rate (~6-block span)
window_size 12 Ring buffer depth in blocks
search_below / search_above 12 / 11 Search window around previous estimate (bins)
shape_weight 0 Shape-anchoring restoring-force weight. 0 disables it. Config::slow() sets 8 for the cold-start

The output-filtering rules (1,000-sat dust floor, excluded P2TR, round-BTC exclusion) are not Config parameters: they are constants in the filter module so the indexer, per-request reconstruction, and mempool all bin identically. See Input.

Between heights 340,000 and 508,000 the oracle runs a slower cold-start configuration (Config::slow(): alpha = 0.10, ~19-block span, window_size = 40, shape_weight = 8). In the thin pre-2018 output mix the fast default octave-locks onto the round-dollar half-price pattern, so the slow EMA and the shape-anchoring restoring force resist that drift. At 508,000 Oracle::reconfigure switches to the defaults above (shape_weight back to 0), and Config::for_height returns the right one for any height.

Comparison with UTXOracle

UTXOracle by @SteveSimple proved that BTC/USD can be derived purely from on-chain data. Both projects share the same core insight (round-dollar detection via log-scale histogram) but make different engineering choices:

brk_oracle UTXOracle
Resolution Per-block (~10 min); daily OHLC built downstream Per-run consensus price + per-output intraday scatter
Operation Rolling: EMA over ring buffer, updates each block Batch: processes a full day from scratch, stateless
Algorithm Single-pass stencil scoring with per-offset normalization Multi-step: dual stencil → rough estimate → output-to-USD mapping → iterative convergence
Steps to compute price 7 (filter+bin → ring insert → EMA → per-offset peaks → score → argmax+parabolic → bin→price) 10 (filter+bin → clip → smooth round BTC → sum → normalize → cap extremes → dual-stencil slide → neighbor weight-avg → output-to-USD map → iterative central price)
Stencil 19 round-USD offsets ($1 to $10k), each normalized to its own peak 803-point Gaussian + weighted spike template targeting 17 round-USD amounts
Round BTC handling Excluded from histogram entirely Histogram bins smoothed by averaging neighbors
Output filtering Per-tx OP_RETURN drop, then per-output: script type, dust threshold, round BTC Per-tx: not coinbase, no OP_RETURN, exactly 2 outputs, ≤5 inputs, no same-day inputs, ≤500-byte witness
Validated from Height 340,000 (January 2015) Dec 15, 2023
Language Rust Python
Dependencies None (pure computation, caller provides block data) bitcoin-cli + direct blk file reads
Bins per decade 200 200

Accuracy

Tested over 596,251 exchange-covered blocks after running the oracle from height 340,000 through height 952,314. Error is measured per block as distance from the oracle estimate to the exchange high/low range at that height. If the oracle falls within the range, the error is zero.

Per-block

Metric Value
Median error 0.15%
95th percentile 1.2%
99th percentile 3.4%
99.9th percentile 15.6%
RMSE 0.97%
Max error 33.8%
Bias +0.06 bins (essentially zero)
Blocks > 5% error 3,233 (0.542%)
Blocks > 10% error 1,323
Blocks > 20% error 154

Daily candles

Oracle daily OHLC built from per-block prices vs exchange daily OHLC:

Median RMSE Max
Open 0.24% 1.07% 29.1%
High 0.58% 1.47% 27.3%
Low 0.53% 1.95% 55.1%
Close 0.27% 1.18% 29.2%

By year

Year Blocks Median RMSE Max >5% >10% >20% Price range
2015 51,249 0.26% 1.67% 33.8% 916 449 25 $198–$500
2016 54,753 0.33% 0.80% 16.9% 150 33 0 $351–$989
2017 55,959 0.45% 2.05% 28.6% 1,527 606 67 $0–$19,892
2018 54,531 0.18% 1.31% 31.6% 411 207 62 $3,129–$17,178
2019 54,272 0.16% 0.59% 17.4% 100 16 0 $3,338–$13,868
2020 53,102 0.10% 0.42% 11.6% 61 3 0 $3,858–$29,322
2021 52,733 0.07% 0.47% 14.4% 42 9 0 $27,678–$69,000
2022 53,230 0.07% 0.32% 6.8% 10 0 0 $15,460–$48,240
2023 54,032 0.10% 0.25% 6.6% 5 0 0 $16,490–$44,700
2024 53,367 0.10% 0.28% 6.7% 7 0 0 $38,555–$108,298
2025 53,113 0.11% 0.25% 5.8% 4 0 0 $74,409–$126,198
2026 5,910 0.10% 0.27% 3.2% 0 0 0 $60,000–$97,900

The oracle is only as good as the signal it reads. The largest errors cluster in the early cold-start, where thin 2015 on-chain volume gives a weaker round-dollar pattern: the 33.8% max error sits at height 341,498 (oracle ~$287 vs exchange ~$213) during the first weeks of warm-up. A second cluster sits just below the 508,000 regime switch, where the slow EMA lagged the fast early-2018 rally (~31.6% at height 507,278, oracle ~$6,685 vs exchange ~$8,800) before handing off to the fast default. The thin pre-2018 mix means 2015, 2017, and 2018 carry the bulk of the error (1.67%, 2.05%, and 1.31% RMSE). From 2019 the signal strengthens: by 2020 the oracle reaches 0.1% median accuracy, and since 2022 no block exceeds 10% error.

Why no outlier smoothing?

Post-hoc smoothing, for example correcting any block whose price deviates more than 5% from both its neighbors, would improve the aggregate numbers. This is deliberately not done, for two reasons:

  1. Simplicity: The oracle is a single forward pass with no lookback corrections. Adding smoothing means defining thresholds, neighbor windows, and replacement strategies, all of which add complexity for marginal gain.
  2. Finality: Each block's price is produced once and never revised (unless the block itself is reorged). Downstream consumers can treat the oracle output as append-only. Smoothing would require retroactively changing already-published prices, breaking that property.

Changelog

v4

Changes from v3:

  • Modern fan-out cap: below height 630,000 the oracle keeps the strict >100-output transaction drop introduced in v3. At and above 630,000 the cap now relaxes to 250 outputs instead of being fully lifted. This preserves dense-chain payment signal while preventing very large modern fan-outs from dominating a single EMA slot and creating a transient false round-dollar ladder.

VERSION is bumped to 4 so downstream consumers invalidate prices computed by an earlier algorithm.

v3

Changes from v2:

  • Earlier start with a cold-start regime: on-chain tracking begins at height 340,000 (January 2015) instead of 525,000, adding about 185,000 blocks of history. Below height 508,000 the oracle runs a slower EMA (Config::slow(), ~19-block span, window 40) paired with a shape-anchoring restoring force (shape_weight 8) that pulls candidate scores toward a slowly-adapted profile of the round-dollar arm shape, resisting the half-price octave drift the fast default locks onto in the thinner pre-2018 output mix. At height 508,000 it switches to the fast default via Oracle::reconfigure, which restores shape_weight to 0 and turns the force off.
  • Max-outputs filter: a transaction with more than 100 outputs is dropped from the histogram below height 630,000. Large fan-outs (exchange sweeps, mixer payouts) are batch machinery, not round-dollar payments, and the thin 2018-2020 signal needs them removed to stay locked onto the pattern. Above 630,000 on-chain volume is dense enough that the cap removes more genuine signal than noise, so it is lifted.
  • Wider up-reach: search_below raised from 9 to 12 bins. The sharp 2018 reversal candles need extra room to follow a fast move upward in price.

VERSION is bumped to 3 so downstream consumers invalidate prices computed by an earlier algorithm.

v2

Changes from v1:

  • OP_RETURN filter: every output of a transaction carrying an OP_RETURN is now dropped from the histogram. Such transactions are protocol machinery (cross-chain swaps, anchoring) whose payout amounts can form false round-dollar patterns. This was the trigger for the worst price glitches in v1.
  • P2WSH reactivated: once the OP_RETURN filter removes the protocol noise, P2WSH outputs are usable round-dollar signal again, so they are no longer excluded. P2TR stays excluded.
  • Earlier start: on-chain tracking begins at height 525,000 (May 2018) instead of 550,000, adding about 25,000 blocks of history.

VERSION is exposed as a crate constant so downstream consumers can invalidate prices computed by an earlier algorithm.