# Production Readiness & FASTER Parity Guide
This document tracks the gaps and the implementation plan to move `oxifaster` from a research-grade port to a production-ready system, with an explicit parity mapping against Microsoft FASTER.
Scope notes:
- “FASTER” has multiple implementations and feature sets (C# and C++ variants). This guide focuses on the typical FASTER surface area: `FasterKV` + HybridLog + sessions + `Pending + CompletePending` flow + checkpoint/recovery + compaction, plus the standalone `FasterLog` concept.
- Production readiness here means: crash-consistent durability, well-defined API contracts, safe and documented type constraints, multi-thread correctness under checkpoint/compaction, observability, and CI gates that catch regressions.
## Tooling Gates (Must Stay Green)
- MSRV: Rust 1.92.0 (`Cargo.toml` `rust-version`)
- Local preflight (before merging changes): `./scripts/check-fmt.sh`, `./scripts/check-clippy.sh`, `./scripts/check-test.sh`
- CI gates (GitHub Actions): `./scripts/check-fmt.sh`, `./scripts/check-clippy.sh`, `./scripts/check-build.sh`, `./scripts/check-test.sh`, `./scripts/check-test-all-features.sh`, `cargo package`, and bench compile check
## Current State Summary (Repository Reality)
Implemented building blocks already in-tree:
- Core primitives: 48-bit `Address`, record header (`RecordInfo`), epoch protection (`LightEpoch`), in-memory hash index, hybrid log allocator, device abstraction (file, null, Linux `io_uring` with fallback)
- Store surface: `FasterKv` + `Session`/`AsyncSession`, CRUD + RMW, read cache, compaction, index growth API, incremental checkpoint (delta log) artifacts
- Tests: integration tests for CRUD, checkpoint/recovery, compaction, cold index, read cache, incremental checkpoint, pending I/O, scanning, async session, statistics
- CI: fmt/clippy/test/all-features/MSRV/coverage/package/bench compile check
The remaining gaps relative to FASTER are mostly in correctness semantics (CPR integration), durability guarantees, and type-safety boundaries for persistence.
## FASTER Parity Matrix (High-Level)
Legend:
- Implemented: present with production-grade semantics and tests
- Partial: present but semantics incomplete / limited / not hardened
- Missing: not present in a meaningful way
### FasterKV Core
- CRUD + RMW: Implemented (sync session); tests present
- Session model (per-thread session, serial numbers): Implemented, but CPR/version integration is Partial (see “Critical Gaps”)
- Pending + CompletePending flow: Partial
  - Background disk readback exists (`PendingIoManager`), but the model is limited (see “Type Model” and async integration notes)
- Async API: Partial
  - `AsyncSession` exists, but is currently a cooperative wrapper rather than a fully integrated async completion model (no per-request waker/notification)
### Hybrid Log & Tiering
- In-memory ring with mutable/read-only/on-disk regions: Partial
  - Region boundaries exist, but the automatic background flushing and stable “on-disk” semantics are not fully wired (see “Critical Gaps”)
- Read cache: Implemented (feature-level); tests present
- Compaction: Implemented (manual + auto); concurrency hardening still needs validation under real CPR semantics
### Checkpoint & Recovery (CPR)
- Checkpoint artifacts: Implemented (metadata + index snapshot + log snapshot + incremental delta files)
- CPR protocol (concurrent prefix semantics): Partial
  - The state machine exists, but it is not integrated into the normal operation path and currently behaves more like a single-threaded checkpoint driver
- Incremental checkpoint (delta log): Partial
  - Artifacts exist; correctness under concurrent updates and crash scenarios needs dedicated verification
- Recovery: Partial
  - Loads snapshots/deltas; missing corruption hardening (format versioning/checksums) and rigorous crash-consistency tests
### FasterLog (Standalone Log)
- Append/commit/scan surface: Partial
  - Current `commit()` only advances an in-memory pointer; it does not make data durable on the `StorageDevice`
- Durable commit groups, recovery, file format hardening: Missing
### Devices & Platform
- File-backed device: Implemented
- Linux `io_uring`: Partial (feature-gated); requires correctness/performance hardening and feature-matrix testing
- Non-Linux fallback: Implemented (portable backend), but needs stress testing under load
## Critical Gaps Blocking “Production-Ready”
This is the short list of issues that must be resolved before treating the system as production-grade.
### 1) Persistence Type Model Must Be Explicit (Workstream A)
Status: Implemented.
The persistence boundary is now explicit and safe-by-default:
- The store is parameterized by `K: PersistKey` and `V: PersistValue`, which bind each type to a `KeyCodec`/`ValueCodec`.
- Default mode is “POD-bytes” via `bytemuck::Pod` (`BlittableCodec<T>`).
- VarLen mode is supported via opt-in wrappers:
  - `RawBytes`: raw bytes (no SpanByte envelope)
  - `Utf8`: UTF-8 bytes (no SpanByte envelope)
  - `Bincode<T>`: serde+bincode payload stored in a SpanByte envelope (FASTER-style)
- Stable key hashing is xxHash-based (xxh3 default; xxh64 optional via feature).
Operational changes enabled by this model:
- Disk readback supports both fixed and variable-length records; non-POD types must opt into a codec (no “Pending forever” failure mode).
- A bytes-view read API exists: `Session::read_view(&EpochGuard, &K) -> Result<Option<RecordView>, Status>`, which returns a view over the borrowed encoded bytes.
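
To make the fixed-width versus variable-length split concrete, here is a minimal sketch of a codec boundary. The trait name `ValueCodec` comes from the text above, but its methods and the two implementing types (`FixedU64`, `Utf8Codec`) are assumptions made for illustration; the actual signatures under `src/codec/` may differ.

```rust
/// Hypothetical codec trait: the method names and shapes are assumptions,
/// not the crate's actual `ValueCodec` definition.
trait ValueCodec<T> {
    /// Appends the persisted form of `value` to `out`.
    fn encode(value: &T, out: &mut Vec<u8>);
    /// Rebuilds a value from persisted bytes, or `None` if they are invalid.
    fn decode(bytes: &[u8]) -> Option<T>;
}

/// Fixed-width mode: a `u64` persisted as exactly 8 little-endian bytes.
struct FixedU64;

impl ValueCodec<u64> for FixedU64 {
    fn encode(value: &u64, out: &mut Vec<u8>) {
        out.extend_from_slice(&value.to_le_bytes());
    }

    fn decode(bytes: &[u8]) -> Option<u64> {
        // Reject any buffer that is not exactly the fixed width.
        if bytes.len() != 8 {
            return None;
        }
        let mut raw = [0u8; 8];
        raw.copy_from_slice(bytes);
        Some(u64::from_le_bytes(raw))
    }
}

/// Variable-length mode: UTF-8 bytes stored as-is and validated on decode.
struct Utf8Codec;

impl ValueCodec<String> for Utf8Codec {
    fn encode(value: &String, out: &mut Vec<u8>) {
        out.extend_from_slice(value.as_bytes());
    }

    fn decode(bytes: &[u8]) -> Option<String> {
        // Invalid UTF-8 is reported instead of being trusted from disk.
        std::str::from_utf8(bytes).ok().map(str::to_owned)
    }
}
```

The point of the sketch is that both modes decode from plain bytes and can fail explicitly, so nothing pointer-like ever has to round-trip through the log.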
### 2) CPR Is Not Integrated into the Operational Fast Path
The checkpoint state machine exists, but the “FASTER-style cooperative protocol” is not wired into `Session::refresh()` / operation entry points.
Symptoms in current code structure:
- `ThreadContext.version` exists but is not updated in normal operations.
- Checkpoint phases such as `PrepIndexChkpt`, `WaitPending`, and the “threads help advance state machine” behavior are stubbed/no-op in the store checkpoint driver.
Production implications:
- Checkpoint artifacts can be inconsistent under concurrent operations.
- Incremental checkpoints are especially sensitive: the “prefix” guarantee needs clear boundaries and tests.
### 3) Hybrid Log Durability & On-Disk Semantics Are Not Fully Implemented
Although `flush_until` exists, critical pieces are still incomplete:
- Page transitions (mutable -> read-only -> on-disk) do not automatically trigger real background flush and eviction.
- `flush_until` writes pages but does not clearly define and enforce “durable” vs “buffered” semantics (`StorageDevice::flush()` is not part of the allocator flush path).
Production implications:
- “Cold data on disk” is not a reliable invariant.
- Crash-consistent durability claims cannot be made without explicit fsync/flush semantics.
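
The "buffered" versus "durable" distinction can be sketched with nothing but the standard library. `PageFile` and its fields are hypothetical names, not part of the crate; the sketch only assumes that a write hands bytes to the OS while `File::sync_data` is the barrier after which they can be called durable.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical page writer that tracks two separate boundaries.
struct PageFile {
    file: File,
    /// Bytes handed to the OS; they may still sit in the page cache.
    written_until: u64,
    /// Bytes covered by a successful sync to stable storage.
    durable_until: u64,
}

impl PageFile {
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new()
            .create(true)
            .truncate(false)
            .read(true)
            .write(true)
            .open(path)?;
        Ok(Self { file, written_until: 0, durable_until: 0 })
    }

    /// Buffered write: advances `written_until` only.
    fn write_page(&mut self, offset: u64, page: &[u8]) -> io::Result<()> {
        self.file.seek(SeekFrom::Start(offset))?;
        self.file.write_all(page)?;
        self.written_until = self.written_until.max(offset + page.len() as u64);
        Ok(())
    }

    /// Durability barrier: the durable boundary moves only after
    /// `sync_data` has returned successfully.
    fn flush_durable(&mut self) -> io::Result<u64> {
        self.file.sync_data()?;
        self.durable_until = self.written_until;
        Ok(self.durable_until)
    }
}
```

Keeping the two boundaries as separate fields makes it hard to report an address as persisted when it has only been written.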
### 4) FasterLog Is Not Yet a Persistent Log
The current `FasterLog` implementation is primarily an in-memory ring with metadata pointers. It needs real `StorageDevice` integration for durable commit and recovery.
## Code Quality & Engineering Gaps
### Unsafe & Invariants
- The codebase contains a large amount of `unsafe` usage (currently ~180 occurrences). Many blocks have rationale, but production readiness requires consistent, auditable invariants:
  - Every `unsafe` block should have a localized “Safety:” justification that states required invariants and what enforces them.
  - A small set of core invariants should be documented centrally (epoch ownership, page lifetime, address region rules, record layout rules).
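
As an illustration of the localized justification style, the sketch below reads a header word from a page buffer. The function `read_header` and the "8-byte little-endian header" layout are assumptions for the example, not the crate's record format; the comment form follows the common Rust `// SAFETY:` convention.

```rust
/// Reads a little-endian `u64` header from the start of a page buffer.
/// Hypothetical helper, used only to show the comment style.
fn read_header(page: &[u8]) -> Option<u64> {
    if page.len() < 8 {
        return None;
    }
    // SAFETY:
    // - `page` has at least 8 readable bytes (enforced by the length check above).
    // - `read_unaligned` places no alignment requirement on the pointer.
    // - Every 8-byte pattern is a valid `u64`, so no validity invariant can break.
    let raw = unsafe { std::ptr::read_unaligned(page.as_ptr() as *const u64) };
    // Interpret the stored bytes as little-endian regardless of host endianness.
    Some(u64::from_le(raw))
}
```

Each bullet names one invariant and what enforces it, which is what makes the block reviewable in isolation.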
### Panics and Error Modeling
- Library code should minimize panics under valid API use. Remaining `expect()`/panic sites should be classified:
  - “Programmer error” (documented precondition) vs “runtime error” (must return `Result`/`Status`).
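
One way to express that classification in code is to pair a fallible entry point with a panicking one whose precondition is documented. The `Log` and `ReadError` types below are hypothetical and exist only for this sketch.

```rust
/// Hypothetical error type for the runtime-error path.
#[derive(Debug, PartialEq)]
enum ReadError {
    OutOfRange { address: usize, tail: usize },
}

/// Hypothetical in-memory log used only for illustration.
struct Log {
    bytes: Vec<u8>,
}

impl Log {
    /// Runtime error: the address may come from untrusted input or from disk,
    /// so a bad value is reported to the caller instead of panicking.
    fn try_read(&self, address: usize) -> Result<u8, ReadError> {
        self.bytes
            .get(address)
            .copied()
            .ok_or(ReadError::OutOfRange { address, tail: self.bytes.len() })
    }

    /// Programmer error: callers must pass an address below the tail.
    ///
    /// # Panics
    /// Panics if `address` is not below the tail (documented precondition).
    fn read(&self, address: usize) -> u8 {
        self.try_read(address)
            .expect("precondition violated: address must be below the tail")
    }
}
```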
### Config Coherence
- Configuration fields must reflect actual behavior:
  - `FasterKvConfig.mutable_fraction` is currently not reflected in `HybridLogConfig` (the allocator uses a fixed `memory_pages / 4` mutable region).
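
A possible shape for the fix is to derive the mutable region from the configured fraction instead of a constant divisor. The helper `mutable_pages` is hypothetical, and the clamping policy (at least one mutable page, never more than the buffer) is an assumption rather than documented behavior.

```rust
/// Hypothetical helper: number of mutable pages for a given fraction.
/// Out-of-range fractions are clamped to [0, 1]; a NaN fraction ends up at
/// the minimum because a NaN-to-integer cast saturates to 0 in Rust.
fn mutable_pages(memory_pages: u32, mutable_fraction: f64) -> u32 {
    let fraction = mutable_fraction.clamp(0.0, 1.0);
    let pages = (memory_pages as f64 * fraction).round() as u32;
    // Keep at least one mutable page when the buffer is non-empty,
    // and never exceed the in-memory page count.
    pages.clamp(memory_pages.min(1), memory_pages)
}
```

With a fraction of `0.25` this reproduces the current fixed `memory_pages / 4` behavior, so the change could be introduced without altering defaults.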
### Observability
- `tracing` is a dependency, but end-to-end structured tracing is not yet a first-class observability story.
- Statistics exist; align them with feature flags and ensure output is suitable for production (no direct stdout/stderr in library paths).
## Testing / CI Gaps (What to Add)
Existing integration tests are strong for “happy path” functionality; production readiness needs adversarial and semantic tests:
- CPR semantics tests:
  - Multi-thread “checkpoint while writes/reads ongoing” with invariants checked (prefix guarantee, session serial correctness)
  - “help advance state machine” behavior in the presence of stalled threads
- Crash consistency tests:
  - `kill -9`-style termination during checkpoint / flush / incremental delta write; verify recovery behavior and corruption detection
  - Fault injection device (short write, partial read, delayed flush, I/O error) for deterministic coverage
- Concurrency correctness:
  - Loom tests for key lock-free structures / state transitions (or a minimal set around checkpoint + index growth)
  - Miri runs (where feasible) for UB detection in core record/page primitives
- Fuzzing:
  - Fuzz delta log entry parsing, snapshot parsing, metadata parsing
- Platform matrix:
  - Linux/macOS/Windows compilation and core tests
  - Linux `io_uring` workflow that actually exercises the backend
- Performance regression gates:
  - Baseline YCSB + a minimal “ops/sec + p99 latency” regression guard (even if only as a periodic job)
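
The fault-injection idea can be prototyped against `std::io::Write` before it is ported to the crate's `StorageDevice` trait. `FaultyWriter` and its fields are hypothetical; the sketch covers only short writes and an injected I/O error, not partial reads or delayed flushes.

```rust
use std::io::{self, Write};

/// Hypothetical fault-injecting writer that wraps any `Write` implementation.
struct FaultyWriter<W: Write> {
    inner: W,
    /// Maximum bytes accepted per `write` call (simulates short writes).
    max_chunk: usize,
    /// Total bytes accepted before every further write fails.
    fail_after: usize,
    /// Bytes accepted so far.
    written: usize,
}

impl<W: Write> Write for FaultyWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.written >= self.fail_after {
            return Err(io::Error::new(io::ErrorKind::Other, "injected I/O error"));
        }
        // Accept at most `max_chunk` bytes and never cross the failure point.
        let budget = self.fail_after - self.written;
        let n = buf.len().min(self.max_chunk).min(budget);
        let n = self.inner.write(&buf[..n])?;
        self.written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

Because the failure point is a byte count rather than a timer, a test can place the fault at an exact record or page boundary and replay it deterministically.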
## Documentation Gaps (What to Write/Update)
Before production adoption, the following must be explicit and discoverable:
- Type model: what is safe to persist/recover; what is “blittable-only”; how varlen types are supported (or not yet supported)
- Durability contract: what operations are durable, when, and what `flush`/fsync means across devices
- Checkpoint operational guide: checkpoint lifecycle, incremental chains, retention/cleanup, validation tooling
- On-disk formats: snapshot/delta/index formats with versioning and corruption detection strategy
- Performance tuning guide: sizing knobs, page sizes, cache sizing, compaction tuning
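
For the on-disk format item, a versioned and checksummed envelope could look like the sketch below. The magic bytes, the field order, and the use of CRC-32 are all assumptions chosen for illustration; the repository's actual snapshot, delta, and index formats are not described here.

```rust
// Hypothetical layout: magic(4) | version(2) | payload_len(4) | crc32(4) | payload
const MAGIC: [u8; 4] = *b"OXFK"; // assumed magic, not the crate's
const FORMAT_VERSION: u16 = 1;
const HEADER_LEN: usize = 14;

#[derive(Debug, PartialEq)]
enum FormatError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u16),
    ChecksumMismatch,
}

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Validates magic, version, length, and checksum before exposing the payload.
fn decode(bytes: &[u8]) -> Result<&[u8], FormatError> {
    if bytes.len() < HEADER_LEN {
        return Err(FormatError::Truncated);
    }
    if bytes[0..4] != MAGIC {
        return Err(FormatError::BadMagic);
    }
    let version = u16::from_le_bytes([bytes[4], bytes[5]]);
    if version != FORMAT_VERSION {
        return Err(FormatError::UnsupportedVersion(version));
    }
    let len = u32::from_le_bytes([bytes[6], bytes[7], bytes[8], bytes[9]]) as usize;
    let stored = u32::from_le_bytes([bytes[10], bytes[11], bytes[12], bytes[13]]);
    let end = HEADER_LEN.checked_add(len).ok_or(FormatError::Truncated)?;
    let payload = bytes.get(HEADER_LEN..end).ok_or(FormatError::Truncated)?;
    if crc32(payload) != stored {
        return Err(FormatError::ChecksumMismatch);
    }
    Ok(payload)
}
```

Distinct error variants matter for the documentation as well: an operator needs to tell an old-format file apart from a torn or corrupted one.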
## Implementation Plan (Detailed, Production-Oriented)
This plan is organized into workstreams with clear deliverables and acceptance criteria.
### Workstream A: Define and Enforce a Safe Persistence Type Model
Goal: make persistence semantics correct and explicit before expanding features.
Status: Implemented (type model + record formats + tests + CI gates).
Implementation notes (current codebase):
- Codecs and constraints live under `src/codec/` (`PersistKey`/`PersistValue`, `KeyCodec`/`ValueCodec`).
- Record layouts are unified in `src/store/record_format.rs` (fixed + varlen records, `RecordView<'a>`).
- Stable hashing uses xxHash (`hash-xxh3` default; `hash-xxh64` optional feature).
- Zero-copy bytes-view read: `Session::read_view(&EpochGuard, &K) -> Result<Option<RecordView>, Status>`.
- Coverage:
  - `tests/read_view.rs` (fixed + varlen view semantics)
  - `tests/hash.rs` (hash determinism smoke test)
  - `tests/varlen.rs` (varlen integration via `Utf8` / `RawBytes`)
Deliverables:
1. Define two supported persistence modes:
   - Blittable/POD mode: `K` and `V` must be safe for byte-wise persistence and recovery
   - VarLen/serialized mode: explicit serialization into the log format (SpanByte-style), no raw pointers on disk
2. Enforce the rules at compile-time where possible (preferred), otherwise at runtime with clear errors.
3. Define destruction/reclamation semantics for in-memory records (no silent unbounded leaks for non-POD types).
Acceptance criteria:
- The public docs state the type constraints clearly.
- Non-POD types cannot silently “appear to work” in persistent mode; they should be rejected or routed through serialization.
- Recovery tests cover both success and expected-failure cases.
### Workstream B: Wire CPR into the Operational Path (True Cooperative Checkpointing)
Goal: implement the FASTER-style “threads cooperate to progress checkpoint/recovery” semantics.
Key tasks:
1. Implement a `refresh()`/operation-entry hook that:
   - Observes `system_state`
   - Updates `ThreadContext.version` appropriately
   - Helps advance checkpoint phases when required
2. Implement real behaviors for checkpoint phases:
   - `PrepIndexChkpt`: thread rendezvous / safe points
   - `WaitPending`: ensure pending operations are drained consistently
   - `WaitFlush`: ensure durability boundary is met before finalizing metadata
3. Ensure checkpoint metadata reflects a coherent view:
   - Versioning is consistent across index/log metadata and record headers
   - Session serial numbers are meaningful and used during recovery validation
Acceptance criteria:
- Multi-thread checkpoint-under-load tests pass consistently.
- Checkpoint artifacts are validated on recovery; corrupted/incomplete checkpoints are detected and rejected.
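
The refresh hook from task 1 can be sketched with a single atomic word. The phase names come from the text, but the packing layout, the numeric phase values, and the `try_advance` helper are assumptions; the real state machine has more phases and must also coordinate with epoch protection.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Phase names taken from the text; the numeric values are assumptions.
const REST: u8 = 0;
const PREP_INDEX_CHKPT: u8 = 1;
const WAIT_PENDING: u8 = 2;

/// Packs (phase, version) into one word so both change atomically.
fn pack(phase: u8, version: u32) -> u64 {
    ((phase as u64) << 32) | version as u64
}

fn unpack(word: u64) -> (u8, u32) {
    ((word >> 32) as u8, word as u32)
}

/// Per-thread view of the global state (hypothetical shape).
struct ThreadContext {
    phase: u8,
    version: u32,
}

impl ThreadContext {
    /// Called at operation entry: observes the global state and updates the
    /// local view. Returns true when the thread crossed a phase or version
    /// boundary and may need to do per-phase work.
    fn refresh(&mut self, system_state: &AtomicU64) -> bool {
        let (phase, version) = unpack(system_state.load(Ordering::Acquire));
        let changed = (phase, version) != (self.phase, self.version);
        self.phase = phase;
        self.version = version;
        changed
    }
}

/// "Help advance": any thread may attempt the transition, but the
/// compare-and-swap lets exactly one of them succeed.
fn try_advance(system_state: &AtomicU64, from: (u8, u32), to: (u8, u32)) -> bool {
    system_state
        .compare_exchange(
            pack(from.0, from.1),
            pack(to.0, to.1),
            Ordering::AcqRel,
            Ordering::Acquire,
        )
        .is_ok()
}
```

Packing phase and version together means a thread can never observe a new phase paired with a stale version, which is the property the prefix guarantee depends on.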
### Workstream C: Make Hybrid Log “On-Disk Region” Real
Goal: background flushing and eviction consistent with FASTER hybrid log semantics.
Key tasks:
1. Implement the read-only transition logic to actually schedule flushes.
2. Track page flush status correctly (use `PageInfo` / flush status) and advance `safe_read_only_address` only after flush completes.
3. Define durability:
   - When a page is considered persisted
   - When `StorageDevice::flush()` is required and who calls it
4. Provide a controlled API for forcing flush and shifting head (needed for tests, compaction, ops tooling).
Acceptance criteria:
- “read from disk” path is exercised in tests without manual hacks.
- Crash consistency tests around flush + checkpoint behave as documented.
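
Task 2 hinges on flushes completing out of order. A minimal sketch of the bookkeeping, assuming a flat (non-ring) page array and a hypothetical `FlushTracker` type, advances the boundary only across the contiguous prefix of flushed pages.

```rust
/// Hypothetical per-page flush tracker. A real allocator would reuse page
/// slots in a ring; this sketch uses a flat array for clarity.
struct FlushTracker {
    page_size: u64,
    /// `flushed[i]` is true once the write of page `i` has completed.
    flushed: Vec<bool>,
    /// Every address below this value belongs to a flushed page.
    safe_read_only_address: u64,
}

impl FlushTracker {
    fn new(page_size: u64, pages: usize) -> Self {
        Self { page_size, flushed: vec![false; pages], safe_read_only_address: 0 }
    }

    /// Records a completed flush and advances the boundary across the
    /// contiguous flushed prefix. Out-of-range page indexes are ignored.
    fn on_flush_complete(&mut self, page: usize) -> u64 {
        if let Some(slot) = self.flushed.get_mut(page) {
            *slot = true;
        }
        let mut next = (self.safe_read_only_address / self.page_size) as usize;
        while next < self.flushed.len() && self.flushed[next] {
            next += 1;
        }
        self.safe_read_only_address = next as u64 * self.page_size;
        self.safe_read_only_address
    }
}
```

If page 1 completes before page 0, the boundary stays at zero until page 0 lands, then jumps past both.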
### Workstream D: Implement Durable FasterLog
Goal: bring `FasterLog` to parity with the “persistent log” concept, not just an in-memory ring.
Key tasks:
1. Implement append that writes to device pages (or a buffered staging area that flushes on commit).
2. Implement `commit()` that guarantees durability (including `StorageDevice::flush()` semantics).
3. Implement recovery: load metadata, rebuild tail/commit pointers, and support scanning committed entries after restart.
4. Add corruption detection and format versioning for log files.
Acceptance criteria:
- End-to-end tests: append -> commit -> drop -> reopen -> scan/read -> data matches.
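
The acceptance flow above (append, commit, drop, reopen, scan) can be exercised against a toy log built on the standard library. `MiniLog` is a hypothetical stand-in, not the crate's `FasterLog`: it stages entries in memory, makes them durable with `File::sync_data` on commit, and omits the checksums and format versioning that task 4 calls for.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical toy log: length-prefixed entries in a single file.
struct MiniLog {
    file: File,
    /// Entries appended since the last commit; lost if the log is dropped.
    staged: Vec<u8>,
}

impl MiniLog {
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).read(true).append(true).open(path)?;
        Ok(Self { file, staged: Vec::new() })
    }

    /// Stages one entry as a 4-byte little-endian length followed by the bytes.
    fn append(&mut self, entry: &[u8]) {
        self.staged.extend_from_slice(&(entry.len() as u32).to_le_bytes());
        self.staged.extend_from_slice(entry);
    }

    /// Writes the staged entries and syncs; they count as committed only
    /// after `sync_data` has returned successfully.
    fn commit(&mut self) -> io::Result<()> {
        self.file.write_all(&self.staged)?;
        self.file.sync_data()?;
        self.staged.clear();
        Ok(())
    }

    /// Returns every complete entry in the file, stopping at a torn tail.
    fn scan(&mut self) -> io::Result<Vec<Vec<u8>>> {
        let mut bytes = Vec::new();
        self.file.seek(SeekFrom::Start(0))?;
        self.file.read_to_end(&mut bytes)?;
        let mut entries = Vec::new();
        let mut pos = 0;
        while pos + 4 <= bytes.len() {
            let len = u32::from_le_bytes([bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3]]) as usize;
            pos += 4;
            if pos + len > bytes.len() {
                break; // incomplete entry at the tail: ignore it
            }
            entries.push(bytes[pos..pos + len].to_vec());
            pos += len;
        }
        Ok(entries)
    }
}
```

An end-to-end test in this style appends two entries, commits, appends a third without committing, drops the log, reopens it, and expects the scan to return only the first two.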
### Workstream E: Observability, Configuration, and Operational Tooling
Goal: make the system operable in production environments.
Key tasks:
1. Structured tracing:
   - Add `tracing` spans/events around checkpoint, flush, compaction, growth, recovery, pending I/O
2. Metrics:
   - Decide on an export model (pull/push) and keep the internal collector consistent
3. Configuration:
   - Add a TOML config file loader (and a stable schema) that maps to `FasterKvConfig`/device config/compaction config/cache config
4. Operational tooling:
   - Checkpoint listing/validation CLI (optional but high leverage for production ops)
Acceptance criteria:
- A minimal “ops story” exists: configure, run, checkpoint, validate, recover, observe.
### Workstream F: CI Expansion (Regressions Must Be Caught Automatically)
Goal: raise confidence with adversarial tests and platform coverage.
Key tasks:
- Add workflows for:
  - Cross-platform build/test matrix
  - `io_uring` backend exercise on Linux
  - Miri (best-effort subset), sanitizer (best-effort), loom (targeted)
  - Fuzz jobs (nightly/cron)
  - Performance smoke regression (cron)
Acceptance criteria:
- A PR that breaks durability semantics, checkpoint correctness, or introduces UB has a high chance of being caught pre-merge.
## Suggested Priority Order (Recommended)
1. Workstream A (type model) + Workstream B (CPR integration): establishes correctness boundaries
2. Workstream C (hybrid log on-disk semantics): makes “bigger than memory” real
3. Workstream D (durable FasterLog): completes the second flagship component
4. Workstreams E/F: operability and sustained quality