zaino-state 0.3.0

A mempool and chain-fetching service built on top of zebra's ReadStateService and TrustedChainSync.
Documentation
Zaino Finalised-State Database Changelog
=======================================

Format
------
One entry per database version bump (major / minor / patch). Keep entries concise and factual.

Entry template:

--------------------------------------------------------------------------------
DB VERSION vX.Y.Z (from vA.B.C)
Date: YYYY-MM-DD
--------------------------------------------------------------------------------

Summary
- <1–3 bullets describing intent of the change>

On-disk schema
- Layout:
  - <directory / file layout changes>
- Tables:
  - Added: <...>
  - Removed: <...>
  - Renamed: <old -> new>
- Encoding:
  - Keys: <what changed, if anything>
  - Values: <what changed, if anything>
  - Checksums / validation: <what changed, if anything>
- Invariants:
  - <new or changed integrity constraints>

API / capabilities
- Capability changes:
  - Added: <...>
  - Removed: <...>
  - Changed: <...>
- Public surface changes:
  - Added: <methods / behaviors>
  - Removed: <methods / behaviors>
  - Changed: <semantic changes, error mapping changes>

Migration
- Strategy: <in-place | shadow build | rebuild>
- Backfill: <what gets rebuilt and how broadly>
- Completion criteria: <how we decide migration is done>
- Failure handling: <rollback / retry behavior>

Bug Fixes / Optimisations

--------------------------------------------------------------------------------
DB VERSION v1.0.0 (from v0.0.0)
Date: 2025-08-13
--------------------------------------------------------------------------------

Summary
- Replace legacy v0 schema with versioned v1 schema and expanded indices / query surface.
- Introduce stronger integrity checks and on-demand validation for v1 read paths.
- Keep compact block retrieval available (compatibility surface).

On-disk schema
- Layout:
  - Move to per-network version directory layout: <base>/<network>/v1/
  - VERSION_DIRS begins at ["v1"] new versions append, no gaps.
- Tables:
  - Added (v1): headers, txids, transparent, sapling, orchard, commitment_tree_data, heights (hash->height),
    plus v1 indices for tx locations, spent outpoints, and transparent address history.
  - Removed / superseded (v0): legacy compact-block-streamer oriented storage layout.
- Encoding:
  - v1 values are stored as checksum-protected `StoredEntryVar<T>` / `StoredEntryFixed<T>` entries.
  - Canonical key bytes are used for checksum verification via `verify(key)`.
- Invariants (v1 validation enforces):
  - Per-table checksum verification for all per-block tables.
  - Chain continuity: header parent hash at height h matches stored hash at h-1.
  - Merkle consistency: header merkle root matches computed root from stored txid list.
  - Index consistency:
    - hash->height mapping must match the queried height.
    - spent + addr history records must exist and match for transparent inputs/outputs.

API / capabilities
- Capability changes:
  - v0: READ_CORE | WRITE_CORE | COMPACT_BLOCK_EXT
  - v1: Capability::LATEST (block core/transparent/shielded, indexed block, transparent history, etc.)
- Public surface changes:
  - Added (v1-only; FeatureUnavailable on v0):
    - BlockCoreExt: header/txids/range fetch, txid<->location lookup
    - BlockTransparentExt: per-tx and per-block transparent access + ranges
    - BlockShieldedExt: sapling/orchard per-tx and per-block access + ranges, commitment tree data (+ ranges)
    - IndexedBlockExt: indexed block retrieval
    - TransparentHistExt: addr records, range queries, balance/utxos, outpoint spender(s)
  - Preserved:
    - CompactBlockExt remains available for both v0 and v1.

Migration
- Strategy: shadow build + promotion (no in-place transformation of v0).
- Backfill: rebuild all v1 tables/indices by ingesting chain data.
- Completion criteria:
  - metadata indicates migrated/ready, and required tables exist through the tip.
  - validation succeeds for the contiguous best chain range as built.
- Failure handling:
  - do not promote partially built v1; continue using v0 if present; rebuild v1 on retry.

Bug Fixes / Optimisations
- Complete DB rework
--------------------------------------------------------------------------------
DB VERSION v1.0.0 (RC Bug Fixes)
--------------------------------------------------------------------------------

Summary
- Minor version bump to reflect updated compact block API contract (streaming + pool filtering semantics).
- No schema or encoding changes; metadata-only migration updates persisted DB version marker.

On-disk schema
- Layout:
  - No changes.
- Tables:
  - Added: None.
  - Removed: None.
  - Renamed: None.
- Encoding:
  - Keys: No changes.
  - Values: No changes.
  - Checksums / validation: No changes.
- Invariants:
  - No changes.

API / capabilities
- Capability changes:
  - Added: None.
  - Removed: None.
  - Changed:
    - COMPACT_BLOCK_EXT contract updated for v1 backends:
      - get_compact_block(...) now takes a PoolTypeFilter, which selects which pool data is materialized into the returned compact block.
      - get_compact_block_stream(...) added.

- Public surface changes:
  - Added:
    - CompactBlockExt::get_compact_block_stream(start_height, end_height, pool_types: PoolTypeFilter).
  - Removed: None.
  - Changed:
    - CompactBlockExt::get_compact_block(height, pool_types: PoolTypeFilter) signature updated.
    - Compact block contents are now filtered by PoolTypeFilter, and may include transparent transaction data (vin/vout) when selected.

Bug Fixes / Optimisations
- Added safety check for idempotent DB writes
- Updated 'fix_addr_hist_records_by_addr_and_index_blocking' to take and reuse an lmdb ro transaction, improving initial sync performance.

--------------------------------------------------------------------------------
DB VERSION v1.0.0 (from v1.1.0)
Date: 2026-01-27
--------------------------------------------------------------------------------

Summary
- BlockHeaderData v2 introduced (internally using new BlockIndex::V2 format); because relevant tables (notably `headers` / `BlockHeaderData`) use
   variable-length encodings existing tables are updated in-place: DB values may contain either v1 or v2 `BlockHeaderData` entries.
- Recorded on-disk schema text was clarified; migration refreshes persisted `DbMetadata.schema_hash`
   so the metadata matches the repository's schema contract.

On-disk schema
- Layout:
  - Updated [`BlockHeaderData`] table by introducing [`BlockHeaderData::V2`] and internally [`BlockIndex::V2`], this table may now hold either V1 or V2
     [`BlockHeaderData`] structs, with serde handled internally.
- Tables:
  - Added: None.
  - Removed: None.
  - Renamed: None.
- Encoding:
  - Keys: No changes.
  - Values: Introduced `[BlockHeaderData::V2]`.
  - Checksums / validation: No changes.
- Invariants:
  - No changes.

--------------------------------------------------------------------------------
DB VERSION v1.2.0 (from v1.1.0)
Date: 2026-06-11
--------------------------------------------------------------------------------

Summary
- Promote the `spent` outpoint index to core finalised-state data.
- Add a finalised txout-set accumulator (`tx_out_set_info_accumulator`)
  maintaining the data needed to serve `gettxoutsetinfo` directly from the
  indexer.
- Add a reverse transaction-id index (`txid_location`, `txid -> TxLocation`)
  so previous-output resolution is an O(log n) point lookup instead of a full
  scan of the height-keyed `txids` table. This fixes a near-quadratic slowdown
  in both the migration backfill and clean-sync write path.
- Backfill the new structures from existing per-block transparent transaction
  data via a single in-place, three-stage migration (`txid_location`, then
  `spent`, then a bulk txout-set accumulator rebuild).
- Add resumable in-place migration progress tracking using temporary metadata
  entries (one per backfill stage).
- The random-keyed `spent` / `txid_location` indexes are written in sorted
  batches (sequential B-tree sweep instead of a random fault per insert once the
  DB exceeds RAM), and the txout-set accumulator is rebuilt in bulk from the
  finalised tables rather than maintained per block.

On-disk schema
- Layout:
  - No directory layout changes.
- Tables:
  - Added: `spent` is now a core v1 table rather than an experimental transparent-address-history table.
  - Added: `tx_out_set_info_accumulator` — singleton table holding the
    finalised transparent UTXO-set summary
    (LMDB database name: `tx_out_set_info_accumulator_1_2_0`,
    singleton key: ASCII `"tx_out_set_info_accumulator"`).
  - Added: `txid_location` — reverse transaction-id index mapping each
    transaction id to its on-chain `TxLocation`
    (LMDB database name: `txid_location_1_0_0`).
  - Removed: None.
  - Renamed: None.
- Encoding:
  - Keys: No changes to `Outpoint` encoding. `txid_location` is keyed by the
    32-byte transaction id (internal byte order).
  - Values: `spent` stores `StoredEntryFixed<TxLocation>` values.
    `txid_location` stores `StoredEntryFixed<TxLocation>` values
    (checksum-protected against the txid key).
    `tx_out_set_info_accumulator` stores
    `StoredEntryFixed<FinalisedTxOutSetInfoAccumulator>` whose body is
    `LE(u64) transactions || LE(u64) transaction_outputs ||
     LE(u64) bytes_serialized || [32] hash_serialized ||
     LE(u64) total_zatoshis` (64 bytes).
  - Checksums / validation:
    - `spent` entries are checksum-protected using the encoded `Outpoint` key.
    - `txid_location` entries are checksum-protected using the txid key.
    - The accumulator entry is checksum-protected using its singleton key.
    - The height the accumulator was last fully built to is recorded in the
      metadata DB under `_tx_out_set_accumulator_built_height` as
      `StoredEntryFixed<Height>` — a freshness watermark letting readers detect
      a stale accumulator after a sync was interrupted before its bulk rebuild.
    - Migration progress is temporarily stored as `StoredEntryFixed<Height>` in
      the metadata DB under `_migration_spent_progress_1_2_0_next_height`
      (Stage B) and `_migration_txid_location_progress_1_2_0_next_height`
      (Stage A).
- Invariants:
  - For every non-null transparent input in finalised-state block data, `spent[Outpoint]` must exist and point to the spending transaction’s `TxLocation`.
  - For every transaction in finalised-state block data, `txid_location[txid]`
    must exist and resolve to that transaction’s `TxLocation`.
  - Existing `spent` / `txid_location` entries encountered during migration must decode, verify, and match the expected `TxLocation`.
  - The accumulator excludes provably-unspendable transparent outputs
    (anything whose `ScriptType` is not `P2PKH` or `P2SH` — matches zcashd's
    `IsUnspendable()` view of the UTXO set: OP_RETURN, oversized scripts,
    etc.). `bytes_serialized == transaction_outputs * 65` by construction.
  - `hash_serialized` is the XOR over all currently-unspent transparent
    outputs of `BLAKE2b-256(b"ZcashTxOutSet___" || prev_txid || LE(u32)
    vout || LE(u64) value || script_hash[20] || u8 script_type)`. Order-
    independent and self-inverse under add/remove. Not byte-equal to
    zcashd's `hash_serialized`.

API / capabilities
- Capability changes:
  - Added: core availability of spent-outpoint lookup data.
  - Removed: None.
  - Changed:
    - Spent-outpoint indexing is no longer dependent on transparent address-history support.
    - `BlockTransparentExt::get_previous_output` is now part of the trait
      (formerly available only behind
      `transparent_address_history_experimental`).
    - `TransparentHistExt::get_tx_out_set_info_accumulator` returns the new
      `FinalisedTxOutSetInfoAccumulator` value.
- Public surface changes:
  - Added:
    - `DbReader::get_previous_output(outpoint) -> TxOutCompact` — the
      read-only entry point for finalised previous-output lookups.
    - `DbReader::get_tx_out_set_info_accumulator() -> FinalisedTxOutSetInfoAccumulator`.
  - Removed: None.
  - Changed:
    - Existing spent/outpoint-spender functionality can be backed by the core `spent` table.

Migration
- Strategy: in-place index backfill, run as three sequential stages
  (single migration step, re-entrant).
- Backfill:
  - Stage A (`txid_location`): scans the existing `txids` table from genesis
    through the current finalised DB tip and writes
    `txid -> StoredEntryFixed<TxLocation>`.
  - Stage B (`spent`): iterates existing transparent block data and writes
    `Outpoint -> StoredEntryFixed<TxLocation>` for each non-null transparent
    input. Entries are buffered and inserted in sorted key order in batches
    (bounded by `storage.database.sync_write_batch_bytes`); each batch commits
    its `spent` entries together with the Stage B progress watermark in one
    transaction.
  - Stage C (`tx_out_set_info_accumulator`): rebuilds the accumulator in bulk
    from the finalised `transparent` + `spent` tables via sequential scans
    (skipping NonStandard outputs) and overwrites the singleton. It never trusts
    an existing accumulator — so a partially-run prior (2-stage) migration is
    recomputed correctly rather than corrupted — and is idempotent / re-runnable,
    so it needs no per-height progress key.
- 0.4.0-alpha.1 compatibility (temporary):
  - A cache built by 0.4.0-alpha.1 is recorded at v1.2.0 but has an empty
    `txid_location` index. On open, a non-empty database at version >= 1.2.0
    with an empty `txid_location` table has its `spent` table cleared and its
    recorded version rolled back to 1.1.0 so this migration rebuilds the
    indices in place (rather than forcing a full rebuild from the validator).
    This shim is to be removed once 0.4.0 ships.
- Completion criteria:
  - All heights through the current finalised DB tip have been processed by both stages.
  - Migration status reaches `Complete`.
  - Both temporary migration progress keys are deleted.
  - `DbMetadata.version` is advanced to v1.2.0 and `migration_status` is reset to `Empty`.
- Failure handling:
  - Stages A and B resume from their own temporary metadata progress heights.
  - `spent` entries and the Stage B progress watermark commit together per batch;
    `txid_location` entries and their Stage A progress likewise. The watermark is
    never advanced past committed data, so a crash resumes from the last committed
    batch (re-doing only uncommitted heights).
  - Stage C is a separate idempotent bulk rebuild, so on resume the accumulator
    is recomputed from the (already-built) finalised tables rather than tracked
    per height.
  - Existing matching `spent` / `txid_location` entries are accepted after checksum and `TxLocation` verification.
  - Existing conflicting or corrupt entries fail the migration.

Bug Fixes / Optimisations
- Reverse txid lookups (`find_txid_index_blocking`, and therefore
  `get_tx_location` / `get_previous_output`) are now O(log n) point lookups on
  `txid_location` instead of a full cursor scan of `txids`. This removes a
  near-quadratic cost that made the v1.1.0 -> v1.2.0 migration appear to hang
  on large caches and progressively slowed clean sync.
- The v1.1.0 -> v1.2.0 migration now logs per-stage start/completion and
  periodic progress (height / db tip / elapsed).
- `write_block` no longer issues two redundant `env.sync(true)` calls around
  per-block validation; the durable `txn.commit()` already fsyncs, so crash
  safety is unchanged.
- The txout-set accumulator is no longer maintained per block during bulk sync /
  migration: it is deferred and rebuilt once at the tip, removing an unbounded
  fan-out of random `spent` reads per block that stalled sync around sandblast
  height. Single-block appends still maintain it incrementally.
- Block validation is off the write hot path: writes do cheap in-memory
  parent-hash + merkle-root checks and advance `validated_tip` directly; the full
  read-back validation runs only at startup.
- Startup validation scans in ascending height order (was block-hash order via
  `heights`), so `validated_tip` advances monotonically and startup avoids
  random-access cache thrash.
- The random-keyed `spent` / `txid_location` writes (clean sync and migration)
  are batched and inserted in sorted key order, turning per-insert random B-tree
  faults into a sequential sweep once the DB exceeds RAM. Tunable via
  `storage.database.sync_write_batch_bytes` (default 4 GiB).

Bug Fixes / Optimisations
- Avoids a shadow rebuild by deriving the new core `spent` index from existing transparent transaction data.
- Avoids temporary named LMDB databases by storing migration progress as a temporary metadata entry.

--------------------------------------------------------------------------------
(append new entries below)
--------------------------------------------------------------------------------