- Append-only durable log of arbitrary byte records
- Lock-free multi-writer append: many threads append at once with no global lock
- Group commit: concurrent sync calls coalesce into one fsync, amortising the durability cost
- Segment rotation: optionally stripe the log across bounded segment files for bounded recovery and archival
- Explicit durability barriers: append is in-memory-fast; sync is the durability point
- Platform-correct flush: fdatasync on Linux, FlushFileBuffers on Windows, fcntl(F_FULLFSYNC) on macOS
- Torn-write detection: a CRC32C checksum per record; recovery stops at the first damaged record
- Self-healing recovery: a torn tail from a crash mid-append is truncated on open, leaving a clean boundary
- Fuzz-hardened recovery: arbitrary bytes never panic or over-allocate; a continuous cargo-fuzz harness checks this
- Recovery policies: stop at the first damaged record, or skip past it for forensic partial recovery
- LSN seeking & truncation: replay from any LSN (iter_from); drop everything after one (truncate_after) for compaction
- Iterator-based replay: walk the log forward to rebuild state
- Typed records (optional): serialise any value via pack-io behind a feature; the byte-record API is unchanged when off
- Pluggable storage backend: file-backed by default; injectable for in-memory testing and custom stores
The durability contract
Two operations, two distinct guarantees. Confusing them is the single most common way to lose data with a WAL, so wal-db keeps them explicit:
- append returns when the record is in the operating system's page cache. A crash after append but before sync may lose that record.
- sync returns only when every record appended before it is on stable storage and will survive a power loss.
That flush is not the same call on every platform, and getting it wrong is silent:
| Platform | Durability call |
|---|---|
| Linux | fdatasync |
| Windows | FlushFileBuffers |
| macOS | fcntl(F_FULLFSYNC) — not plain fsync, which leaves data in the drive's write cache |
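The split between the two guarantees can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not wal-db's implementation: TinyLog is a hypothetical type, and File::sync_data is std's portable "flush file data" call, whose per-platform syscall is worth checking against the table above before relying on it.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical minimal log: one file opened in append mode.
pub struct TinyLog {
    file: File,
}

impl TinyLog {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file })
    }

    /// Hands the bytes to the OS. When this returns they sit in the page
    /// cache and can still be lost on power failure.
    pub fn append(&mut self, record: &[u8]) -> io::Result<()> {
        self.file.write_all(record)
    }

    /// Durability barrier: asks the OS to push the file's data to stable
    /// storage and returns only once that request has completed.
    pub fn sync(&self) -> io::Result<()> {
        self.file.sync_data()
    }
}
```

A caller that acknowledges a write to a client should do so only after sync has returned, never after append alone.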
Installation
[dependencies]
wal-db = "0.4"
Quick Start
use wal_db::Wal;

// Open (or create) the log. "app.wal" is a placeholder path.
let wal = Wal::open("app.wal")?;
// Append returns once the record is in the OS page cache. It does not flush.
let lsn = wal.append(b"hello")?; // placeholder payload
// Sync is the durability barrier: it returns once the record is on stable storage.
wal.sync()?;
// On restart, replay the log from the start to rebuild state.
for entry in wal.iter()? {
    // Apply each entry to rebuild state.
}
Recovery
Every record carries a CRC32C checksum over its own bytes. On open, the log scans forward and stops at the first record that is incomplete or fails its checksum — a torn write left by a crash mid-append — and truncates that tail. The records before it are kept; the next append continues from a clean boundary with no gap in the sequence numbers. A corrupt length prefix can never trigger a wild allocation: lengths are validated against the configured maximum before a single payload byte is read.
use Wal;
#
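The scan described above can be sketched as follows. The frame layout here (a 4-byte little-endian length, a 4-byte CRC32C of the payload, then the payload) is an assumption made for illustration; the crate's real layout is the one documented in docs/ON_DISK_FORMAT.md. The functions crc32c, encode and recover are hypothetical names, and the checksum is a plain bitwise implementation rather than a hardware-accelerated one.

```rust
use std::convert::TryInto;

/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

/// Appends one frame: [len: u32 LE][crc32c(payload): u32 LE][payload].
pub fn encode(payload: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Scans forward and stops at the first incomplete or damaged frame.
/// Returns the intact payloads and the byte length of the valid prefix,
/// which is where a caller would truncate the file.
pub fn recover(buf: &[u8], max_len: usize) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut pos = 0;
    while buf.len() - pos >= 8 {
        let len = u32::from_le_bytes(buf[pos..pos + 4].try_into().unwrap()) as usize;
        let crc = u32::from_le_bytes(buf[pos + 4..pos + 8].try_into().unwrap());
        // Validate the length before reading or allocating any payload bytes.
        if len > max_len || len > buf.len() - pos - 8 {
            break;
        }
        let payload = &buf[pos + 8..pos + 8 + len];
        if crc32c(payload) != crc {
            break;
        }
        records.push(payload.to_vec());
        pos += 8 + len;
    }
    (records, pos)
}
```

Because the length is checked against both max_len and the bytes actually remaining, a corrupt prefix such as 0xFFFFFFFF ends the scan instead of driving an allocation.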
Configuration
Tunables live on WalConfig, a builder passed to Wal::open_with:
use ;
#
Concurrency and group commit
Wal is built for many writers. append is lock-free: each call reserves its byte range with a single atomic step — that range's start offset is the record's LSN — then writes its record without blocking the others. Share one Wal behind an Arc and append from every thread.
Durability is where threads cooperate. When several call sync at once they coalesce into a single fsync — group commit — so the cost of making data durable is amortised across everyone committing together rather than paid N times. append_and_sync does an append and a group-commit-aware sync in one call:
use Arc;
use thread;
use ;
#
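One way to picture the coalescing is a leader/follower scheme: the first thread to arrive performs the flush, and threads that arrive while it is running wait and then find their records already covered. The sketch below is a simplified model under that assumption, not the crate's internals; GroupCommit and its methods are hypothetical, records are counted rather than written, and the flush is passed in as a closure.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};

struct State {
    durable: u64,   // records up to this count are known to be flushed
    flushing: bool, // true while one thread is inside the flush
}

pub struct GroupCommit {
    appended: AtomicU64,
    flushes: AtomicU64,
    state: Mutex<State>,
    cv: Condvar,
}

impl GroupCommit {
    pub fn new() -> Self {
        Self {
            appended: AtomicU64::new(0),
            flushes: AtomicU64::new(0),
            state: Mutex::new(State { durable: 0, flushing: false }),
            cv: Condvar::new(),
        }
    }

    /// Stand-in for an append: only counts the record.
    pub fn append(&self) -> u64 {
        self.appended.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Returns once everything appended before the call is covered by a flush.
    pub fn sync(&self, flush: impl Fn()) {
        let target = self.appended.load(Ordering::SeqCst);
        let mut st = self.state.lock().unwrap();
        while st.durable < target {
            if st.flushing {
                // Another thread is flushing; wait for it and re-check.
                st = self.cv.wait(st).unwrap();
                continue;
            }
            // Become the leader: flush once on behalf of every waiter.
            st.flushing = true;
            drop(st);
            let upto = self.appended.load(Ordering::SeqCst);
            flush();
            self.flushes.fetch_add(1, Ordering::SeqCst);
            st = self.state.lock().unwrap();
            st.durable = st.durable.max(upto);
            st.flushing = false;
            self.cv.notify_all();
        }
    }

    pub fn flushes(&self) -> u64 {
        self.flushes.load(Ordering::SeqCst)
    }

    pub fn durable(&self) -> u64 {
        self.state.lock().unwrap().durable
    }
}
```

The leader samples the append counter before it flushes, so only records that existed before the flush started are marked durable. A follower whose record arrived too late loops and becomes the next leader.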
LSNs are byte offsets. The LSN returned by append is the record's position in the log: monotonic and unique, but not consecutive. The first record is 0; the next sits at its end. This is what lets the append path reserve with a single atomic and never reorder. See docs/ON_DISK_FORMAT.md.
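The "single atomic" reservation can be illustrated with one fetch_add on a shared counter. Reserver is a hypothetical type written for this sketch; how the crate frames records and where it keeps the counter are not shown here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical reservation counter: the next free byte offset in the log.
pub struct Reserver {
    next: AtomicU64,
}

impl Reserver {
    pub const fn new() -> Self {
        Self { next: AtomicU64::new(0) }
    }

    /// Reserves `len` bytes and returns the start offset of the range.
    /// That offset doubles as the record's LSN. fetch_add is a single
    /// atomic read-modify-write, so no two callers receive overlapping
    /// ranges; Relaxed ordering is enough for that uniqueness guarantee.
    pub fn reserve(&self, len: u64) -> u64 {
        self.next.fetch_add(len, Ordering::Relaxed)
    }
}
```

Reserving 10 bytes and then 5 bytes yields LSNs 0 and 10, which is why LSNs are unique and increasing but not consecutive.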
Custom backends
Wal::open uses the file-backed FileStore. Any type implementing the WalStore trait can stand in — an in-memory store for tests, or an alternative storage layer. The crate ships MemStore for the in-memory case:
use ;
#
Segments
By default a log is a single file. For bounded recovery time and archival, stripe it across fixed-size segment files in a directory instead — Wal::open_segmented. The log stays one continuous byte stream; records span segment boundaries freely (the same scheme PostgreSQL uses), so nothing about the API or the records changes:
use Wal;
#
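Treating the log as one continuous byte stream means a segment is found by arithmetic on the LSN. The sketch below assumes fixed-size segments numbered from zero; locate and split are hypothetical helpers, and the crate's actual file naming and segment headers are not modelled.

```rust
/// Maps a global byte offset to (segment index, offset inside that segment).
pub fn locate(lsn: u64, segment_size: u64) -> (u64, u64) {
    (lsn / segment_size, lsn % segment_size)
}

/// Splits the byte range [lsn, lsn + len) into per-segment pieces,
/// each given as (segment index, offset in segment, piece length).
/// A record that crosses a boundary simply yields more than one piece.
pub fn split(lsn: u64, len: u64, segment_size: u64) -> Vec<(u64, u64, u64)> {
    let mut parts = Vec::new();
    let (mut pos, end) = (lsn, lsn + len);
    while pos < end {
        let (segment, offset) = locate(pos, segment_size);
        // Take whatever fits in this segment, or the rest of the record.
        let take = (segment_size - offset).min(end - pos);
        parts.push((segment, offset, take));
        pos += take;
    }
    parts
}
```

With 1024-byte segments, a 100-byte record at LSN 1000 puts 24 bytes at the end of segment 0 and 76 bytes at the start of segment 1.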
Typed records
By default a record is bytes. With the pack-io feature, a record can be any type that derives Serialize/Deserialize — append_typed writes it, Record::decode reads it back. The derives come from the re-exported wal_db::pack_io, so no extra dependency is needed.
[dependencies]
wal-db = { version = "0.4", features = ["pack-io"] }
use ;
use ;
#
Recovery policies
Wal::open always truncates a torn tail so the append boundary is clean. For corruption inside an already-recovered log — bit rot, say — a WalConfig recovery policy controls how iteration reacts:
use ;
#
Seeking and compaction
An LSN is a byte offset, so seeking to a checkpoint is O(1): iter_from starts reading at the given LSN instead of scanning from the beginning. When a consumer has durably applied the log up to some point, truncate_after drops everything after that record, the durable building block of compaction:
use Wal;
#
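Why a byte-offset LSN makes both operations cheap can be seen on an in-memory model. The functions below share the names iter_from and truncate_after with the crate only for readability. They are hypothetical free functions over a Vec<u8>, using an assumed frame of a 4-byte little-endian length followed by the payload, with no checksum.

```rust
use std::convert::TryInto;

/// Appends one frame and returns its LSN (the offset of its length prefix).
pub fn append(log: &mut Vec<u8>, payload: &[u8]) -> u64 {
    let lsn = log.len() as u64;
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(payload);
    lsn
}

/// End offset of the frame starting at `lsn`, if a complete frame is there.
fn frame_end(log: &[u8], lsn: u64) -> Option<usize> {
    let start = lsn as usize;
    let body = start.checked_add(4)?;
    let header = log.get(start..body)?;
    let len = u32::from_le_bytes(header.try_into().ok()?) as usize;
    let end = body.checked_add(len)?;
    if end <= log.len() { Some(end) } else { None }
}

/// Replays from `lsn`: the first read happens at that offset, with no
/// scan over earlier records.
pub fn iter_from<'a>(log: &'a [u8], lsn: u64) -> impl Iterator<Item = (u64, &'a [u8])> + 'a {
    let mut pos = lsn;
    std::iter::from_fn(move || {
        let end = frame_end(log, pos)?;
        let item = (pos, &log[pos as usize + 4..end]);
        pos = end as u64;
        Some(item)
    })
}

/// Keeps the record at `lsn` and drops everything after it.
pub fn truncate_after(log: &mut Vec<u8>, lsn: u64) -> bool {
    match frame_end(log, lsn) {
        Some(end) => {
            log.truncate(end);
            true
        }
        None => false,
    }
}
```

The caller is trusted to pass an LSN that was returned by an earlier append; an offset in the middle of a record would be misread as a length prefix, which is one reason a real format carries a checksum.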
Async
The core is synchronous on purpose — a WAL's calls map to blocking syscalls (write, fsync), and a runtime is the consumer's choice, not the library's. From an async context, offload to a blocking pool:
let wal = wal.clone(); // Arc<Wal>
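// spawn_blocking is the runtime's blocking-pool call (for example tokio::task::spawn_blocking); payload is a placeholder for the record bytes.
let lsn = spawn_blocking(move || wal.append(payload)).await??;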
Performance
Numbers from the criterion suite on the development machine, 256-byte records. They are honest measurements, not marketing — the commit figures are bounded by this machine's fsync latency and scale with faster storage and more writers. Full detail and method in docs/BENCHMARKS.md.
| Benchmark | Result | What it measures |
|---|---|---|
| LSN reservation | ~4 ns | the single atomic that allocates an LSN and reserves a byte range |
| append/single | ~105 ns | the lock-free hot path: reserve, frame, write one record to memory, no syscall |
| append/multi (8, file) | ~160 K/s | file-backed multi-writer append, syscall-bound (one pwrite each) |
| commit/group (8 writers) | ~1.9× a hand-rolled inline WAL | concurrent append-and-sync; group commit coalesces the fsyncs |
| recovery/replay (10k) | ~215 K records/s | reopen and replay a file-backed log |
A file-backed append is syscall-bound, not lock-bound — the pwrite the durability contract requires dominates the negligible commit-watermark lock — so the throughput lever is group commit, which beats the inline WAL an engine hand-rolls before it has batching. Run them yourself:
Examples
| Example | Run | Shows |
|---|---|---|
| basic | cargo run --example basic | the four-call API: open, append, sync, replay |
| recovery | cargo run --example recovery | a simulated torn write and self-healing recovery |
| concurrent | cargo run --example concurrent | many writers, one log, group commit |
| typed | cargo run --example typed --features pack-io | typed records via pack-io |
Testing
RUSTFLAGS="--cfg loom"
The loom run model-checks the lock-free append and the group-commit handshake: it explores every meaningful thread interleaving and asserts no overlapping records, no reordering, and at most one fsync per syncer. The fuzz run feeds arbitrary bytes to the recovery path and checks that it never panics or over-allocates.
Where It Fits
wal-db is the durability substrate. It is consumed by:
- lsm-db: memtable durability
- txn-db: transaction log
- raft-io: Raft log persistence
- Hive DB: primary write-ahead log
It stays foreign-compatible: usable standalone in any project that needs a durable append-only log.
Cross-Platform Support
Tier 1 Support:
- Linux (x86_64, aarch64): fdatasync
- macOS (x86_64, Apple Silicon): fcntl(F_FULLFSYNC) for true durability
- Windows (x86_64): FlushFileBuffers
Durability semantics are equivalent across platforms; the CI matrix runs the full suite — including the cross-process durability test — on each.
Contributing
Before opening a PR, cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean. Any change touching the durability path requires a torn-write recovery test and a benchmark.