tempest-kv 0.0.2

Key-Value storage layer for TempestDB
//! # Migration
//!
//! This module defines how compaction of segments interacts with the [`StorageStrategy`], enabling
//! the strategy implementor to collect metadata about segment contents and to lazily rewrite key
//! suffixes and values during compaction. It also defines the types that make up the migration API
//! surface.
//!
//! ## Motivation
//!
//! The KV layer is deliberately agnostic to the semantics of keys and values. It does not know
//! about HLC timestamps, schema versions, or any other engine-level concern. However, the engine
//! built on top of the KV layer has requirements that depend on understanding what is inside
//! segments:
//!
//! - **Schema version eviction.** TQL types are versioned. When a type's schema changes, old rows
//!   encoded under the previous schema version remain on disk until they are rewritten. A schema
//!   version is only safe to evict from the catalog once no segment contains a row encoded under
//!   it. To determine this, the engine needs to know the oldest HLC timestamp present across all
//!   live segments - the HLC watermark. Schema versions whose validity interval ends before the
//!   watermark can be safely dropped.
//!
//! - **Lazy value migration.** Rewriting all rows eagerly on a schema change would be
//!   prohibitively expensive. Instead, rows are migrated lazily: during compaction, each row is
//!   brought up to the current schema version before being written to the output segment. Over
//!   time, as compaction covers all key ranges, the HLC watermark advances and old schema versions
//!   become evictable. A forced full compaction pass can be triggered by the operator to
//!   deliberately advance the watermark.
//!
//! - **Key suffix rewriting.** Each key consists of a prefix (the logical row identity) and a
//!   suffix (typically an HLC timestamp encoding the row's version). When a row is migrated to a
//!   newer schema version, its suffix should be updated to reflect the HLC at migration time. This
//!   is safe during compaction because by the time `migrate` is called, the merging iterator has
//!   already deduplicated entries: exactly one entry per logical key (prefix) is passed to
//!   `migrate`. Since no other entry in the output shares that prefix, the suffix can be freely
//!   replaced without violating the segment's physical sort order, which is determined by prefix
//!   alone.
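//!
//! As a rough illustration of the eviction rule in the first bullet, the sketch below selects the
//! schema versions whose validity interval ends before a given watermark. `SchemaVersion`,
//! `valid_until`, and `evictable` are hypothetical names, and HLC timestamps are modelled as plain
//! `u64` values; none of this is part of the crate's API.
//!
//! ```rust
//! /// Hypothetical catalog entry: a schema version and the HLC at which it was superseded.
//! struct SchemaVersion {
//!     version: u32,
//!     /// `None` means the version is still current, so its interval has not ended.
//!     valid_until: Option<u64>,
//! }
//!
//! /// Returns the versions whose validity interval ends strictly before `watermark`.
//! fn evictable(versions: &[SchemaVersion], watermark: u64) -> Vec<u32> {
//!     versions
//!         .iter()
//!         .filter(|v| matches!(v.valid_until, Some(end) if end < watermark))
//!         .map(|v| v.version)
//!         .collect()
//! }
//! ```
//!
//! A version that is still current is never returned, whatever the watermark is.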
//!
//! ## API Design
//!
//! The migration API is expressed as extensions to [`StorageStrategy`]:
//!
//! ### Segment summaries
//!
//! Each segment has an associated [`Summary`] that is built incrementally inside `migrate` during
//! flush and compaction. The summary is stored in the manifest alongside the segment and survives
//! restarts, so the engine can reconstruct the HLC watermark on startup without re-scanning
//! segment files.
//!
//! `SegmentId` is an internal manifest concern only - it correlates summaries to files on disk and
//! handles deduplication during compaction. The engine never sees it. Instead, the manifest
//! exposes:
//!
//! ```text
//! fn summaries(&self) -> &[Summary]
//! ```
//!
//! reflecting the current set of live segment summaries. The engine folds over this slice whenever
//! it needs the watermark:
//!
//! ```text
//! watermark = min(summary.min_hlc for all summaries)
//! ```
//!
//! When segments are removed (because they were merged during compaction and no snapshot or
//! pending read still holds them), the manifest updates its live set. The engine recomputes the
//! watermark from `summaries()` on demand; no deletion events or per-segment bookkeeping are
//! required.
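//!
//! The fold over the summaries can be sketched as follows. The `Summary` struct here is an
//! illustrative stand-in with a single `min_hlc: u64` field; the real summary type is whatever the
//! strategy implementor defines. Returning `Option` for the case of no live segments is an
//! assumption of this sketch.
//!
//! ```rust
//! /// Illustrative summary; only the field needed for the watermark is shown.
//! struct Summary {
//!     min_hlc: u64,
//! }
//!
//! /// Oldest HLC across all live segments, or `None` when there are no segments.
//! fn watermark(summaries: &[Summary]) -> Option<u64> {
//!     summaries.iter().map(|s| s.min_hlc).min()
//! }
//! ```
//!
//! Because the result is derived from the current slice each time, dropping a segment's summary
//! from the live set is enough for the watermark to advance on the next call.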
//!
//! ### Migration context
//!
//! Compaction is driven by a [`MigrationContext`] produced by a [`MigrationContextSource`] at the
//! start of each compaction job via its `context()` method. The context holds engine-level state -
//! typically an `Rc` snapshot of the current catalog and the current HLC - and is passed to every
//! `migrate` call for the duration of that compaction. The KV layer never inspects the context's
//! contents; it only passes it through.
//!
//! The context source may be updated between compaction ticks to advance the HLC and pick up
//! catalog changes, allowing long-running compactions to migrate rows to progressively newer
//! schema versions without stalling.
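//!
//! One way to picture the source/context split is sketched below. Both types are modelled as
//! concrete structs for brevity, although the crate may well define them as traits or associated
//! types. `Catalog`, the field names, and the `advance` method are hypothetical.
//!
//! ```rust
//! use std::rc::Rc;
//!
//! /// Hypothetical engine catalog; only a schema version counter is modelled.
//! struct Catalog {
//!     current_schema: u32,
//! }
//!
//! /// Per-job state handed to every `migrate` call. The KV layer never looks inside.
//! struct MigrationContext {
//!     catalog: Rc<Catalog>,
//!     hlc: u64,
//! }
//!
//! /// Engine-owned source that is asked for a fresh context when a compaction job starts.
//! struct MigrationContextSource {
//!     catalog: Rc<Catalog>,
//!     hlc: u64,
//! }
//!
//! impl MigrationContextSource {
//!     /// Snapshots the current state; cloning the `Rc` shares the catalog without copying it.
//!     fn context(&self) -> MigrationContext {
//!         MigrationContext { catalog: Rc::clone(&self.catalog), hlc: self.hlc }
//!     }
//!
//!     /// Called by the engine between compaction ticks (hypothetical method).
//!     fn advance(&mut self, catalog: Rc<Catalog>, hlc: u64) {
//!         self.catalog = catalog;
//!         self.hlc = hlc;
//!     }
//! }
//! ```
//!
//! A context obtained before `advance` keeps its old snapshot, so a job that is already running
//! sees a consistent catalog while later jobs pick up the newer one.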
//!
//! ### The `migrate` function
//!
//! ```text
//! fn migrate(
//!   ctx: &mut MigrationContext, summary: &mut Summary,
//!   prefix: &[u8], suffix: &[u8], value: &[u8]
//! ) -> Migrate
//! ```
//!
//! Called once per logical key (post-deduplication, post-filter) during compaction. The
//! implementor inspects the entry, updates `summary` to reflect the output entry, and returns
//! one of:
//!
//! - `Migrate::Unchanged` - write the entry as-is. The common case; no allocation.
//! - `Migrate::Modified { value, suffix }` - replace the value and/or suffix. Either field may be
//!   `None` to indicate "keep existing." The KV layer splices the new suffix onto the existing
//!   prefix when writing the output entry.
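//!
//! The two outcomes above could be applied to an entry roughly as follows. The shape of `Migrate`
//! (in particular the `Option<Vec<u8>>` field types) is assumed for illustration, and `apply` is a
//! hypothetical helper, not the crate's actual output writer.
//!
//! ```rust
//! /// Assumed shape of the return type; the field types are a guess for illustration.
//! enum Migrate {
//!     Unchanged,
//!     Modified { value: Option<Vec<u8>>, suffix: Option<Vec<u8>> },
//! }
//!
//! /// Hypothetical helper: builds the output `(key, value)` pair for one entry.
//! fn apply(
//!     prefix: &[u8],
//!     suffix: &[u8],
//!     value: &[u8],
//!     outcome: Migrate,
//! ) -> (Vec<u8>, Vec<u8>) {
//!     let (new_value, new_suffix) = match outcome {
//!         Migrate::Unchanged => (None, None),
//!         Migrate::Modified { value, suffix } => (value, suffix),
//!     };
//!     // The prefix is always kept; `None` falls back to the existing suffix or value.
//!     let mut key = prefix.to_vec();
//!     key.extend_from_slice(new_suffix.as_deref().unwrap_or(suffix));
//!     let value = new_value.unwrap_or_else(|| value.to_vec());
//!     (key, value)
//! }
//! ```
//!
//! This sketch copies the entry even for `Migrate::Unchanged`; a real writer would presumably
//! forward the original bytes without allocating.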
//!
//! The summary is updated inside `migrate` rather than in a separate `observe` pass. This keeps
//! the implementor in full control - they know exactly what was output and can update the summary
//! accordingly in one pass. Strategies that only need summaries and no rewriting can return
//! `Migrate::Unchanged` from the default implementation while still maintaining their summary.
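//!
//! A summary-only strategy might look like the sketch below. It assumes, purely for illustration,
//! that the suffix is an 8-byte big-endian HLC and that the summary tracks a single `min_hlc`; the
//! unit-struct context and the one-variant `Migrate` are simplified stand-ins for the real types.
//!
//! ```rust
//! /// Placeholder context; this strategy does not need any engine state.
//! struct MigrationContext;
//!
//! /// Illustrative summary tracking the oldest HLC written to the output segment.
//! #[derive(Default)]
//! struct Summary {
//!     min_hlc: Option<u64>,
//! }
//!
//! /// Assumed return type, reduced to the one variant used here.
//! enum Migrate {
//!     Unchanged,
//! }
//!
//! /// Summary-only `migrate`: records the entry's HLC and leaves the entry untouched.
//! fn migrate(
//!     _ctx: &mut MigrationContext,
//!     summary: &mut Summary,
//!     _prefix: &[u8],
//!     suffix: &[u8],
//!     _value: &[u8],
//! ) -> Migrate {
//!     // Hypothetical encoding: an 8-byte big-endian HLC. Other suffixes are ignored.
//!     if suffix.len() == 8 {
//!         let mut bytes = [0u8; 8];
//!         bytes.copy_from_slice(suffix);
//!         let hlc = u64::from_be_bytes(bytes);
//!         summary.min_hlc = Some(summary.min_hlc.map_or(hlc, |m| m.min(hlc)));
//!     }
//!     Migrate::Unchanged
//! }
//! ```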