# brk_oracle
**Version 3**
Pure on-chain BTC/USD price oracle. No exchange feeds, no external APIs. Derives the bitcoin price from transaction data alone. Tracks block by block from height 340,000 (January 2015) onward.
Inspired by [UTXOracle](https://utxo.live/oracle/) by [@SteveSimple](https://x.com/SteveSimple), which proved the concept. brk_oracle takes the same core insight and redesigns the algorithm for per-block resolution and rolling operation. See [comparison](#comparison-with-utxoracle) below.
## The signal
People buy bitcoin in round dollar amounts. Each purchase creates a transaction output whose satoshi value depends on the current price:
```
$100 at $50,000/BTC → 200,000 sats
$100 at $100,000/BTC → 100,000 sats
```
Thousands of these round-dollar purchases happen every day: $10, $20, $50, $100, $200, $500. Plot every transaction output in a block on a log-scale histogram and clear spikes emerge at each round-dollar amount:
```
$5 $10 $20 $50 $100 $200 $500 $1k $5k $10k
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
│ ▌ ▌ ▌ ▌
│ ▌ █ █ ▌ █ ▌ █ █ ▌ █
│ ▐█▌ ▐█▌ █▌ █ ▐█▌ ▐█ ▐█▌ ▐█▌ █ ▐█ ▐█▌
│▄▄████▄███████▄▐█▌▄█████▄██▌▄█████▄███▄▐█▌▄▄▄███▄████▄▄
└─────────────────────────────────────────────────────────→
log₁₀(satoshis)
```
On a log scale, when the price changes **all spikes shift together** by the same number of bins. A 2x price move always shifts the pattern by ~60 bins, whether bitcoin moves from $1k to $2k or from $50k to $100k:
```
price × 2 → sats ÷ 2 → shift left by log₁₀(2) × 200 ≈ 60 bins
$50k: ···· █ ···· █ ···· █ ···· █ ····
$100k: ·· █ ···· █ ···· █ ···· █ ······
◄── 60 bins ──►
```
The spacing between spikes is constant (set by the ratios between dollar amounts). Only the position changes. The oracle detects this pattern and reads the price from where it lands.
## How it works
The oracle tracks the price incrementally, block by block, starting from a known seed price. Each new block nudges the estimate. The search window is narrow (about 12 bins, or +15% / -12% in price), so the oracle can only follow gradual movement, not jump to an arbitrary price from scratch. This is by design: it makes the algorithm resistant to noise.
For each new block:
### 1. Filter outputs
Skip the coinbase transaction, and skip every output of a transaction carrying an `OP_RETURN`: that transaction is protocol machinery, not a dollar-denominated payment, so its payout amounts are not price signal. Below height 630,000, also skip every output of a transaction with more than 100 outputs: a large fan-out is a batch payout (exchange sweep, mixer), not a round-dollar payment, and the thin early signal needs it removed. At and above height 630,000, the transaction fan-out cap relaxes to 250 outputs so dense-chain payment activity remains visible while very large fan-outs cannot dominate one EMA slot. Then exclude noisy outputs: script types dominated by protocol activity (P2TR by default), dust below 1,000 sats, and round BTC amounts (0.01, 0.1, 1.0 BTC, etc.) that create false spikes unrelated to dollar purchases.
### 2. Build a log-scale histogram
Each remaining output becomes a bin index in a 2,400-bin histogram spanning 12 decades (1 sat to 10¹² sats):
```
bin = round(log₁₀(sats) × 200) 200 bins per decade
```
### 3. Smooth over recent blocks
A single block has too few outputs for a clean signal. The oracle keeps a ring buffer of the last 12 block histograms and combines them into an exponential moving average (EMA) that weights recent blocks more heavily:
```
EMA[bin] = Σ weight(age) × histogram[age][bin]
age=0..11
weight(age) = α × (1 − α)^age default α = 2/7 (~6-block span)
```
The EMA is recomputed from the ring buffer each block. This makes the oracle deterministic: since only the last 12 histograms matter, any oracle started from a known price converges to the exact same state after 12 blocks, regardless of prior history. This is what makes checkpointing and restoring possible.
### 4. Score with a 19-point stencil
The fixed ratios between round-dollar amounts ($1, $2, $3, $5, ... $10,000) create a fingerprint: a pattern of 19 spikes with known spacing on the log scale. A stencil encodes this spacing as bin offsets from a $100 reference point:
```
$1 $5 $10 $50 $100 $200 $1k $10k
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
·────────·──────·────────────·─────·─────·──────────·─────────────·
-400 -260 -200 -60 0 +60 +200 +400
bin offsets from the $100 reference point
(19 offsets total)
```
The oracle slides this stencil across the EMA histogram within the search window. At each candidate position:
1. **Read** the EMA value at all 19 expected spike locations
2. **Normalize** each value by dividing by that offset's peak within the search window: this gives rare amounts like $3 equal voting weight to common amounts like $100
3. **Sum** the 19 normalized values into a single score
The position with the highest score is where the fingerprint best matches the histogram.
### 5. Convert bin to price
A $100 purchase at price P produces `$100 / P × 10⁸` sats, which lands in bin:
```
bin = log₁₀($100 / P × 10⁸) × 200
= (2 + 8 − log₁₀(P)) × 200
= (10 − log₁₀(P)) × 200
```
So the stencil's winning position, the bin where $100 purchases land, directly encodes the price:
```
price = 10^(10 − bin / 200) dollars
```
Parabolic interpolation between the best bin and its two neighbors refines the estimate to sub-bin precision.
## Pipeline
```
block ──→ filter ──→ histogram ──→ ring buffer ──→ EMA ──→ stencil ──→ best bin ──→ $
outputs 2,400 bins ×12 19-point parabolic
log-scale scoring interpolation
```
## Input
The oracle consumes one pre-built histogram per block via `process_histogram(&hist)`, a `[u32; 2400]` bin-count array, and returns the updated reference bin.
The caller filters as it builds the histogram, applying the [step 1](#1-filter-outputs) rules. `PaymentFilter::for_height(height).histogram(txs)` builds a fresh block histogram from non-coinbase transaction outputs. Incremental live callers use `PaymentFilter::MODERN.for_each_bin(outputs, emit)`, which applies the modern fan-out cap without requiring a height. `PaymentFilter::eligible_bin(sats, output_type)` returns an individual output's bin index, or `None` if filtered. The transaction-level rules include the OP_RETURN drop, the >100 transaction-output fan-out cap below height 630,000, and the >250 cap from height 630,000 onward.
The initial seed must be close to the real price at the starting height. The crate includes typed pre-oracle helpers for exchange prices at heights 0..340,000. `Oracle::from_seed()` uses the last baked price, height 339,999 (one below `START_HEIGHT_SLOW`), and the slow cold-start config to seed the oracle's first on-chain computation at height 340,000.
## Configuration
All parameters via `Config` with sensible defaults:
| Parameter | Default | Purpose |
|-----------|---------|---------|
| `alpha` | 2/7 | EMA decay rate (~6-block span) |
| `window_size` | 12 | Ring buffer depth in blocks |
| `search_below` / `search_above` | 12 / 11 | Search window around previous estimate (bins) |
| `shape_weight` | 0 | Shape-anchoring restoring-force weight. 0 disables it. `Config::slow()` sets 8 for the cold-start |
The output-filtering rules (1,000-sat dust floor, excluded P2TR, round-BTC exclusion) are not `Config` parameters: they are constants in the `filter` module so the indexer, per-request reconstruction, and mempool all bin identically. See [Input](#input).
Between heights 340,000 and 508,000 the oracle runs a slower cold-start configuration (`Config::slow()`: `alpha` = 0.10, ~19-block span, `window_size` = 40, `shape_weight` = 8). In the thin pre-2018 output mix the fast default octave-locks onto the round-dollar half-price pattern, so the slow EMA and the shape-anchoring restoring force resist that drift. At 508,000 `Oracle::reconfigure` switches to the defaults above (`shape_weight` back to 0), and `Config::for_height` returns the right one for any height.
## Comparison with UTXOracle
[UTXOracle](https://utxo.live/oracle/) by [@SteveSimple](https://x.com/SteveSimple) proved that BTC/USD can be derived purely from on-chain data. Both projects share the same core insight (round-dollar detection via log-scale histogram) but make different engineering choices:
| | brk_oracle | UTXOracle |
|---|---|---|
| Resolution | Per-block (~10 min); daily OHLC built downstream | Per-run consensus price + per-output intraday scatter |
| Operation | Rolling: EMA over ring buffer, updates each block | Batch: processes a full day from scratch, stateless |
| Algorithm | Single-pass stencil scoring with per-offset normalization | Multi-step: dual stencil → rough estimate → output-to-USD mapping → iterative convergence |
| Steps to compute price | 7 (filter+bin → ring insert → EMA → per-offset peaks → score → argmax+parabolic → bin→price) | 10 (filter+bin → clip → smooth round BTC → sum → normalize → cap extremes → dual-stencil slide → neighbor weight-avg → output-to-USD map → iterative central price) |
| Stencil | 19 round-USD offsets ($1 to $10k), each normalized to its own peak | 803-point Gaussian + weighted spike template targeting 17 round-USD amounts |
| Round BTC handling | Excluded from histogram entirely | Histogram bins smoothed by averaging neighbors |
| Output filtering | Per-tx OP_RETURN drop, then per-output: script type, dust threshold, round BTC | Per-tx: not coinbase, no OP_RETURN, exactly 2 outputs, ≤5 inputs, no same-day inputs, ≤500-byte witness |
| Validated from | Height 340,000 (January 2015) | Dec 15, 2023 |
| Language | Rust | Python |
| Dependencies | None (pure computation, caller provides block data) | bitcoin-cli + direct blk file reads |
| Bins per decade | 200 | 200 |
## Accuracy
Tested over 596,251 exchange-covered blocks after running the oracle from height 340,000 through height 952,314. Error is measured per block as distance from the oracle estimate to the exchange high/low range at that height. If the oracle falls within the range, the error is zero.
### Per-block
| Metric | Value |
|--------|-------|
| Median error | 0.15% |
| 95th percentile | 1.2% |
| 99th percentile | 3.4% |
| 99.9th percentile | 15.6% |
| RMSE | 0.97% |
| Max error | 33.8% |
| Bias | +0.06 bins (essentially zero) |
| Blocks > 5% error | 3,233 (0.542%) |
| Blocks > 10% error | 1,323 |
| Blocks > 20% error | 154 |
### Daily candles
Oracle daily OHLC built from per-block prices vs exchange daily OHLC:
| | Median | RMSE | Max |
|-------|--------|------|-----|
| Open | 0.24% | 1.07% | 29.1% |
| High | 0.58% | 1.47% | 27.3% |
| Low | 0.53% | 1.95% | 55.1% |
| Close | 0.27% | 1.18% | 29.2% |
### By year
| Year | Blocks | Median | RMSE | Max | >5% | >10% | >20% | Price range |
|------|--------|--------|------|-----|-----|------|------|-------------|
| 2015 | 51,249 | 0.26% | 1.67% | 33.8% | 916 | 449 | 25 | $198–$500 |
| 2016 | 54,753 | 0.33% | 0.80% | 16.9% | 150 | 33 | 0 | $351–$989 |
| 2017 | 55,959 | 0.45% | 2.05% | 28.6% | 1,527 | 606 | 67 | $0–$19,892 |
| 2018 | 54,531 | 0.18% | 1.31% | 31.6% | 411 | 207 | 62 | $3,129–$17,178 |
| 2019 | 54,272 | 0.16% | 0.59% | 17.4% | 100 | 16 | 0 | $3,338–$13,868 |
| 2020 | 53,102 | 0.10% | 0.42% | 11.6% | 61 | 3 | 0 | $3,858–$29,322 |
| 2021 | 52,733 | 0.07% | 0.47% | 14.4% | 42 | 9 | 0 | $27,678–$69,000 |
| 2022 | 53,230 | 0.07% | 0.32% | 6.8% | 10 | 0 | 0 | $15,460–$48,240 |
| 2023 | 54,032 | 0.10% | 0.25% | 6.6% | 5 | 0 | 0 | $16,490–$44,700 |
| 2024 | 53,367 | 0.10% | 0.28% | 6.7% | 7 | 0 | 0 | $38,555–$108,298 |
| 2025 | 53,113 | 0.11% | 0.25% | 5.8% | 4 | 0 | 0 | $74,409–$126,198 |
| 2026 | 5,910 | 0.10% | 0.27% | 3.2% | 0 | 0 | 0 | $60,000–$97,900 |
The oracle is only as good as the signal it reads. The largest errors cluster in the early cold-start, where thin 2015 on-chain volume gives a weaker round-dollar pattern: the 33.8% max error sits at height 341,498 (oracle ~$287 vs exchange ~$213) during the first weeks of warm-up. A second cluster sits just below the 508,000 regime switch, where the slow EMA lagged the fast early-2018 rally (~31.6% at height 507,278, oracle ~$6,685 vs exchange ~$8,800) before handing off to the fast default. The thin pre-2018 mix means 2015, 2017, and 2018 carry the bulk of the error (1.67%, 2.05%, and 1.31% RMSE). From 2019 the signal strengthens: by 2020 the oracle reaches 0.1% median accuracy, and since 2022 no block exceeds 10% error.
### Why no outlier smoothing?
Post-hoc smoothing, for example correcting any block whose price deviates more than 5% from both its neighbors, would improve the aggregate numbers. This is deliberately not done, for two reasons:
1. **Simplicity**: The oracle is a single forward pass with no lookback corrections. Adding smoothing means defining thresholds, neighbor windows, and replacement strategies, all of which add complexity for marginal gain.
2. **Finality**: Each block's price is produced once and never revised (unless the block itself is reorged). Downstream consumers can treat the oracle output as append-only. Smoothing would require retroactively changing already-published prices, breaking that property.
## Changelog
### v4
Changes from v3:
- **Modern fan-out cap**: below height 630,000 the oracle keeps the strict >100-output transaction drop introduced in v3. At and above 630,000 the cap now relaxes to 250 outputs instead of being fully lifted. This preserves dense-chain payment signal while preventing very large modern fan-outs from dominating a single EMA slot and creating a transient false round-dollar ladder.
`VERSION` is bumped to 4 so downstream consumers invalidate prices computed by an earlier algorithm.
### v3
Changes from v2:
- **Earlier start with a cold-start regime**: on-chain tracking begins at height 340,000 (January 2015) instead of 525,000, adding about 185,000 blocks of history. Below height 508,000 the oracle runs a slower EMA (`Config::slow()`, ~19-block span, window 40) paired with a shape-anchoring restoring force (`shape_weight` 8) that pulls candidate scores toward a slowly-adapted profile of the round-dollar arm shape, resisting the half-price octave drift the fast default locks onto in the thinner pre-2018 output mix. At height 508,000 it switches to the fast default via `Oracle::reconfigure`, which restores `shape_weight` to 0 and turns the force off.
- **Max-outputs filter**: a transaction with more than 100 outputs is dropped from the histogram below height 630,000. Large fan-outs (exchange sweeps, mixer payouts) are batch machinery, not round-dollar payments, and the thin 2018-2020 signal needs them removed to stay locked onto the pattern. Above 630,000 on-chain volume is dense enough that the cap removes more genuine signal than noise, so it is lifted.
- **Wider up-reach**: `search_below` raised from 9 to 12 bins. The sharp 2018 reversal candles need extra room to follow a fast move upward in price.
`VERSION` is bumped to 3 so downstream consumers invalidate prices computed by an earlier algorithm.
### v2
Changes from v1:
- **OP_RETURN filter**: every output of a transaction carrying an `OP_RETURN` is now dropped from the histogram. Such transactions are protocol machinery (cross-chain swaps, anchoring) whose payout amounts can form false round-dollar patterns. This was the trigger for the worst price glitches in v1.
- **P2WSH reactivated**: once the OP_RETURN filter removes the protocol noise, P2WSH outputs are usable round-dollar signal again, so they are no longer excluded. P2TR stays excluded.
- **Earlier start**: on-chain tracking begins at height 525,000 (May 2018) instead of 550,000, adding about 25,000 blocks of history.
`VERSION` is exposed as a crate constant so downstream consumers can invalidate prices computed by an earlier algorithm.