durability
Crash-consistent persistence primitives: directory abstraction, record logs, generic WAL, checkpoints, and recovery.
Quick start
use MemoryDirectory;
use ;
// MemoryDirectory::arc() returns Arc<dyn Directory> directly
let dir = MemoryDirectory::arc();
// Write
let mut w = new;
w.append.unwrap;
w.append.unwrap;
w.flush.unwrap;
assert_eq!;
// Recover
let records = new.replay.unwrap;
assert_eq!;
assert_eq!; // entry_id assigned by writer
WalWriter<E> and WalReader<E> are generic -- define your own entry type with
#[derive(Serialize, Deserialize)] and use WalWriter::<YourType>::new(dir).
Entry IDs are assigned by the writer and stored in the frame header, not in your payload.
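To make the "IDs live in the frame header, not in your payload" idea concrete, here is a minimal std-only sketch. It is not this crate's on-disk format: the frame layout, the ToyWalWriter type, the decode_frames helper, and the choice to start IDs at 1 are all assumptions made for illustration.

```rust
// Hypothetical frame layout (NOT the crate's real format):
// [entry_id: u64 LE][payload_len: u32 LE][payload bytes]

/// Toy writer that assigns monotonically increasing entry IDs.
pub struct ToyWalWriter {
    next_id: u64,
    buf: Vec<u8>,
}

impl ToyWalWriter {
    pub fn new() -> Self {
        // Starting at 1 is an assumption for this sketch.
        Self { next_id: 1, buf: Vec::new() }
    }

    /// Appends an opaque payload and returns the ID the writer assigned.
    /// The payload itself never contains the ID.
    pub fn append(&mut self, payload: &[u8]) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.buf.extend_from_slice(&id.to_le_bytes());
        self.buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        self.buf.extend_from_slice(payload);
        id
    }

    pub fn bytes(&self) -> &[u8] {
        &self.buf
    }
}

/// Decodes every complete frame as an (entry_id, payload) pair.
pub fn decode_frames(mut data: &[u8]) -> Vec<(u64, Vec<u8>)> {
    let mut out = Vec::new();
    while data.len() >= 12 {
        let mut id = [0u8; 8];
        id.copy_from_slice(&data[0..8]);
        let mut len = [0u8; 4];
        len.copy_from_slice(&data[8..12]);
        let len = u32::from_le_bytes(len) as usize;
        if data.len() - 12 < len {
            break; // incomplete trailing frame
        }
        out.push((u64::from_le_bytes(id), data[12..12 + len].to_vec()));
        data = &data[12 + len..];
    }
    out
}
```

Because the ID sits in the header, the reader can hand back the ID alongside each decoded entry without the entry type needing an ID field of its own.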
Important: WalWriter::new() creates a fresh WAL and errors if segments already
exist. Use WalWriter::resume() to continue an existing WAL (it handles both empty
and non-empty directories).
For batch writes (amortize flush cost across multiple entries):
let ids = w.append_batch.unwrap;
// Single flush for both entries
For large WALs, use streaming replay to avoid collecting into a Vec:
reader.replay_each?;
Not provided (and why)
- Multi-process locking: This crate does not manage flock or IPC locks. Single-writer-per-directory is assumed. Multiple writers silently corrupt data. WalWriter creates an advisory lockfile (wal/.lock) to catch accidental double-instantiation within a process, but this does not guarantee cross-process exclusion.
- Strong consistency by default: write calls are buffered. Use flush_and_sync() when you need a durability barrier.
- fsync failure recovery: A failed fsync on Linux clears dirty pages; retrying reports false success. This crate propagates IO errors as fatal. Callers should treat fsync failure as unrecoverable and restart from WAL.
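As an illustration of why an advisory lockfile only catches accidental double-instantiation, here is a minimal std-only sketch built on create_new. The LockGuard type and its behavior are hypothetical; how WalWriter actually manages wal/.lock is not shown in this README.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: holds `<dir>/.lock` for as long as it is alive.
pub struct LockGuard {
    path: PathBuf,
}

impl LockGuard {
    /// Fails with `ErrorKind::AlreadyExists` if the lockfile is already there.
    pub fn acquire(dir: &Path) -> io::Result<Self> {
        let path = dir.join(".lock");
        // create_new makes "check and create" a single atomic step.
        OpenOptions::new().write(true).create_new(true).open(&path)?;
        Ok(Self { path })
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Best effort: a crash skips this and leaves a stale lockfile behind.
        let _ = fs::remove_file(&self.path);
    }
}
```

A scheme like this is advisory only: nothing stops another process from ignoring or deleting the file, and a crash leaves it stale, which is why real cross-process exclusion needs an OS-level lock managed by the caller.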
What really matters (failure model)
- Crash / torn writes: partial writes at the tail (e.g. process crash mid-record).
- Corruption detection: CRC/magic/version/type mismatches are treated as errors (even in "best-effort" modes).
- Stable storage vs "reported success": unless you add explicit barriers, a successful write may still be only in OS caches.
Contract surface (what you get)
- Prefix property: Best-effort replay returns a prefix of the valid operation stream. No garbage, no reordering.
- Narrow best-effort scope: Best-effort tolerance applies only to the final WAL segment. It covers torn tail records and a torn segment header (a crash during segment creation). Corruption in non-final segments is an error.
- Deterministic checkpoints: Checkpoint payloads are written deterministically (stable ordering).
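The distinction between a torn tail (tolerated in best-effort mode) and corruption (always an error) can be sketched with a toy single-buffer log. This is a std-only illustration under assumed names: the frame layout, encode, replay, and ReplayError are hypothetical and do not reflect the crate's actual framing or error types.

```rust
/// Bitwise CRC-32 (IEEE polynomial, reflected form).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

#[derive(Debug, PartialEq)]
pub enum ReplayError {
    TornTail { offset: usize },
    Corrupt { offset: usize },
}

/// Hypothetical frame: [payload_len: u32 LE][crc32(payload): u32 LE][payload].
pub fn encode(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(8 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(&crc32(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

fn read_u32(bytes: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&bytes[..4]);
    u32::from_le_bytes(a)
}

/// Returns the valid prefix of records. An incomplete final frame is skipped
/// in best-effort mode; a checksum mismatch is an error in both modes.
pub fn replay(data: &[u8], best_effort: bool) -> Result<Vec<Vec<u8>>, ReplayError> {
    let mut records = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let rest = &data[offset..];
        let torn = rest.len() < 8 || rest.len() - 8 < read_u32(rest) as usize;
        if torn {
            return if best_effort {
                Ok(records)
            } else {
                Err(ReplayError::TornTail { offset })
            };
        }
        let len = read_u32(rest) as usize;
        let payload = &rest[8..8 + len];
        if crc32(payload) != read_u32(&rest[4..]) {
            return Err(ReplayError::Corrupt { offset });
        }
        records.push(payload.to_vec());
        offset += 8 + len;
    }
    Ok(records)
}
```

In this toy version every returned record passed its checksum and records come back in append order, which is the prefix property in miniature. A real multi-segment WAL would additionally restrict torn-tail tolerance to the final segment.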
Stable-storage durability (opt-in)
If you need "survives power loss after success", add explicit barriers:
- WalWriter::flush_and_sync() / RecordLogWriter::flush_and_sync()
- durability::storage::sync_file(dir, path) -- fsync the file
- durability::storage::sync_parent_dir(dir, path) -- sync the parent directory (needed for durable create/rename)
- DurableDirectory trait provides atomic_write_durable / atomic_rename_durable
If the backend cannot map paths to the OS filesystem (Directory::file_path() returns None), these operations return NotSupported.
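For readers unfamiliar with what these barriers involve at the OS level, the following std-only sketch shows the usual write, fsync, rename, fsync-parent sequence. It is not the crate's implementation: write_then_rename_durably and fsync_parent are hypothetical names, and opening a directory to sync it is assumed to work as it does on Linux and other Unix systems.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Syncs the directory containing `path` so a create or rename in it
/// survives power loss. Assumes a Unix-like OS where directories can be opened.
pub fn fsync_parent(path: &Path) -> io::Result<()> {
    let parent = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()
}

/// Writes `bytes` to a temporary sibling, fsyncs it, renames it over `path`,
/// then fsyncs the parent directory.
pub fn write_then_rename_durably(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(&tmp)?;
        f.write_all(bytes)?;
        // Barrier 1: file contents reach stable storage before the rename.
        f.sync_all()?;
    }
    fs::rename(&tmp, path)?;
    // Barrier 2: the new directory entry itself reaches stable storage.
    fsync_parent(path)
}
```

Any error from either sync is returned to the caller rather than retried, in line with treating fsync failure as fatal.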
Checkpoint publishing + WAL truncation
To truncate old WAL segments safely:
- Write a durable checkpoint
- Append WalEntry::Checkpoint to the WAL and make it durable
- Only then delete WAL segments covered by the checkpoint
Use CheckpointPublisher for this pattern. After truncation, recovery should
start from the latest checkpoint marker (see RecoveryManager::recover_latest).
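The ordering of the three steps above is what makes truncation safe, and it can be modeled in memory. This sketch is not CheckpointPublisher: ToyWal, Step, and publish_checkpoint are hypothetical, and each "durable" step is merely recorded rather than performed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    CheckpointDurable(u64),
    MarkerDurable(u64),
    SegmentsDeleted(Vec<u64>),
}

/// In-memory stand-in for a segmented WAL.
pub struct ToyWal {
    /// (segment_id, last_entry_id) pairs, oldest first; the last one is active.
    pub segments: Vec<(u64, u64)>,
    /// Record of the order in which publish steps happened.
    pub steps: Vec<Step>,
}

impl ToyWal {
    /// Publishes a checkpoint covering all entries up to `covered_up_to`
    /// and returns the IDs of the segments that were deleted.
    pub fn publish_checkpoint(&mut self, covered_up_to: u64) -> Vec<u64> {
        // 1. The checkpoint must be durable first.
        self.steps.push(Step::CheckpointDurable(covered_up_to));
        // 2. Then the checkpoint marker in the WAL must be durable.
        self.steps.push(Step::MarkerDurable(covered_up_to));
        // 3. Only now is it safe to drop fully covered, non-active segments.
        let active = self.segments.last().map(|s| s.0);
        let mut deleted = Vec::new();
        self.segments.retain(|&(id, last)| {
            let covered = last <= covered_up_to && Some(id) != active;
            if covered {
                deleted.push(id);
            }
            !covered
        });
        self.steps.push(Step::SegmentsDeleted(deleted.clone()));
        deleted
    }
}
```

If a crash happens after step 1 or step 2, the old segments are still present and recovery can replay them; deleting before the marker is durable is the only ordering that can lose data.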
Modules at a glance
- storage: Directory abstraction + FsDirectory / MemoryDirectory + sync helpers.
- recordlog: append-only log with CRC framing + strict/best-effort replay.
- walog: generic multi-segment WAL (WalWriter<E> / WalReader<E>) with strict/best-effort replay and resume repair. Includes WalEntry for segment-index use cases.
- checkpoint: CRC-validated snapshot files (postcard payloads).
- recover: generic recover_with_wal() for any checkpoint + WAL entry types, plus segment-specific RecoveryManager.
- publish: crash-safe checkpoint publish + WAL truncation.
Running
- Tests: cargo test
- Heavier property runs: PROPTEST_CASES=512 cargo test --test prop_wal_resume
- Benches: cargo bench
Fuzzing (opt-in)
Property tests cover semantic invariants; fuzzing covers "never panic on weird bytes".
- Install: cargo install cargo-fuzz
- Run (see fuzz/):
  - cargo fuzz run fuzz_wal_entry_decode
  - cargo fuzz run fuzz_wal_dir_replay
  - cargo fuzz run fuzz_checkpoint_read
  - cargo fuzz run fuzz_recordlog_read