datawal
datawal is a local record store: a framed append-only RecordLog plus an
optional last-write-wins DataWal KV projection.
MSRV: Rust 1.75.0
What datawal is
- RecordLog: the canonical append-only list. Every write becomes a framed, CRC-checked record on disk. Recovery is defined as the longest valid prefix: a truncated tail is reported but not fatal; a mid-stream CRC error in a closed segment is a hard error.
- DataWal: a KV projection derived from the log. Keys are bytes; values are bytes. Last-write-wins. Delete leaves a tombstone. Reopen rebuilds the keydir from scratch by replaying the log.
- Bytes-first. The Rust core does not parse JSON, MessagePack, or any semantic encoding. It stores and returns opaque byte slices.
- Clean export. export_jsonl writes the live key/value state to a JSONL file (base64-encoded keys and values) via an atomic write.
- FS plumbing in a sibling crate. Atomic POSIX primitives (write_atomic, write_once, write_append_fsync, rename_atomic, fsync_dir) live in safeatomic-rs.
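To make "framed, CRC-checked" and "longest valid prefix" concrete, here is a minimal std-only sketch. The frame layout (length, CRC, payload) and the names encode_frame, scan_prefix, and Scan are hypothetical, chosen for illustration; they are not datawal's wire format or API (the real byte layout is in docs/canon.md). The sketch also simplifies by treating every CRC mismatch on a complete frame as an error, without distinguishing closed segments from the active one.

```rust
// Hypothetical frame layout, for illustration only (NOT datawal's wire format):
// [payload length: u32 LE][crc32c(payload): u32 LE][payload bytes]

/// Bitwise CRC-32C (Castagnoli). 0x82F63B78 is the bit-reflected form of 0x1EDC6F41.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Append one framed record to an in-memory buffer.
pub fn encode_frame(buf: &mut Vec<u8>, payload: &[u8]) {
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&crc32c(payload).to_le_bytes());
    buf.extend_from_slice(payload);
}

/// Result of scanning a buffer: the valid records, how many bytes they cover,
/// and whether an incomplete frame was found at the end.
pub struct Scan {
    pub records: Vec<Vec<u8>>,
    pub valid_len: usize,
    pub truncated_tail: bool,
}

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Return the longest valid prefix. A truncated tail is reported, not fatal;
/// a CRC mismatch on a complete frame is an error in this sketch.
pub fn scan_prefix(buf: &[u8]) -> Result<Scan, String> {
    let mut records = Vec::new();
    let mut pos = 0usize;
    while pos < buf.len() {
        // Not enough bytes left for a header: truncated tail.
        if buf.len() - pos < 8 {
            return Ok(Scan { records, valid_len: pos, truncated_tail: true });
        }
        let len = read_u32_le(buf, pos) as usize;
        let crc = read_u32_le(buf, pos + 4);
        let start = pos + 8;
        // Header is complete but the payload is cut short: truncated tail.
        if buf.len() - start < len {
            return Ok(Scan { records, valid_len: pos, truncated_tail: true });
        }
        let payload = &buf[start..start + len];
        if crc32c(payload) != crc {
            return Err(format!("CRC mismatch in frame at offset {}", pos));
        }
        records.push(payload.to_vec());
        pos = start + len;
    }
    Ok(Scan { records, valid_len: pos, truncated_tail: false })
}
```

Cutting a buffer partway through its last frame makes scan_prefix return the earlier records with truncated_tail set, while flipping a byte inside a complete payload returns an error.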
When to use
- You are manually appending JSONL and a crash truncating the file mid-record would be a problem.
- You need a tiny local key/value store with last-write-wins semantics and no external process or network.
- You need audit logs, checkpoint logs, or event logs for experiments, agents, crawlers, CLIs, or local daemons.
- You want a file-based log format that is documented down to the byte level, with frozen wire-format fixtures and TLA+ invariants for the recovery protocol.
- You want to be able to open the log, scan it, and understand exactly what is on disk — no opaque internal formats.
When not to use
- SQL, joins, secondary indexes, or range queries.
- A cache with TTL or eviction.
- A FIFO queue.
- Multi-writer or concurrent writers.
- Distributed or network-attached storage.
- Large object / blob / content-addressed storage.
- DataFrame analytics (use Polars, DuckDB, etc.).
- A production database (use SQLite, LMDB, RocksDB, etc.).
Current status
datawal is currently v0.1.0-alpha: functional and model-checked at
the protocol level, but not production-ready.
It is tagged locally (git tag v0.1.0-alpha), has no remote push, and has not
been published to crates.io. It is shelf-ready: correct enough to be shelved and
resumed later without rediscovering the protocol.
What is in:
- 58 tests green (cargo test --workspace).
- 3 TLA+ models model-checked with TLC 2.19.
- Wire-format corpus: 6 binary fixture directories, 11 corpus tests.
- 4 runnable examples.
- Real CRC-32C (Castagnoli, 0x1EDC6F41) per record, pinned by a known-vector test.
- fs2 fd-based advisory lock: held by a file descriptor, not by the existence of the sentinel file. Released on Drop / process exit. A stale .lock from a crashed previous process is not a problem.
- Durability boundary is explicit: append produces a framed, recoverable record but does not guarantee durability across a crash. Call RecordLog::fsync() to durabilise (sync_all on the active segment plus fsync_dir on the containing directory).
- compact_to(out_dir) only: no in-place compact().
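The durability boundary described above can be sketched with the standard library alone. This is an assumption-laden illustration, not datawal's implementation: append_unsynced and sync_file_and_dir are hypothetical names, and syncing a directory by opening it read-only is assumed to work only on Unix-like systems.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append bytes to a file. After this returns, the bytes have been handed to
/// the OS but may still be lost if the machine crashes.
pub fn append_unsynced(path: &Path, bytes: &[u8]) -> io::Result<File> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(bytes)?;
    Ok(file)
}

/// Durability boundary: flush file data and metadata, then flush the
/// containing directory so the directory entry itself survives a crash.
/// Assumption: opening a directory with File::open works (Unix-like systems).
pub fn sync_file_and_dir(file: &File, dir: &Path) -> io::Result<()> {
    file.sync_all()?;
    File::open(dir)?.sync_all()
}
```

Keeping the two steps separate lets a caller batch several appends behind one sync, trading a wider loss window for fewer fsync calls.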
What is not in:
- Python / PyO3 bindings.
- Content-addressed storage / blob / dedup / CAS.
- Compression.
- Server or multi-user access.
- Multi-writer.
- Query / secondary indexes.
- In-place compaction.
- Reader API / concurrent reads.
Quick start
use ;
use Path;
// --- RecordLog ---
let path = new;
let mut log = open?;
log.append?;
log.append?;
log.fsync?; // durability boundary
let records = log.scan?;
assert_eq!;
assert_eq!;
// --- DataWal ---
let path = new;
let mut db = open?;
db.put?;
db.put?; // last-write-wins
assert_eq!;
db.delete?;
assert_eq!;
db.compact_to?;
db.export_jsonl?;
# Ok::
Evidence stack
The protocol has been validated at multiple levels:
| Layer | Evidence |
|---|---|
| Specification | docs/canon.md — 14 binding clauses; byte layout |
| Code | crates/datawal-core/src/ — ~1900 LOC Rust |
| Unit + integration | 58 tests across tests/*.rs and embedded #[test]s |
| Wire-format corpus | 6 binary fixture dirs, 11 corpus tests |
| Formal models | 3 TLA+ models, model-checked with TLC 2.19 |
| Runnable examples | 4 examples under crates/datawal-core/examples/ |
Formal models wording: model-checked under documented assumptions.
Not "formally verified". Models do not check the Rust implementation.
See formal/README.md for invariants and how to run TLC.
Layout
datawal/
├── Cargo.toml                      # workspace
├── crates/
│   └── datawal-core/
│       ├── src/
│       │   ├── lib.rs
│       │   ├── format.rs           # wire format, encode/decode, CRC, limits
│       │   ├── segment.rs          # segment naming and listing
│       │   ├── lock.rs             # fs2 fd-based advisory lock
│       │   ├── record_log.rs       # RecordLog
│       │   └── datawal.rs          # DataWal KV
│       ├── examples/
│       │   ├── record_log_demo.rs
│       │   ├── datawal_kv_demo.rs
│       │   ├── tail_recovery_demo.rs
│       │   └── gen_corpus.rs       # regenerate tests/corpus/* (run-on-demand)
│       └── tests/
│           ├── record_log.rs       # 14 cases
│           ├── datawal.rs          # 9 cases
│           ├── integration.rs      # 3 cases
│           ├── corpus_fixtures.rs  # 11 cases over the frozen corpus
│           └── corpus/             # binary fixtures, one subdir per fixture
├── formal/                         # TLA+ models (checked with TLC)
│   ├── RecordLog.tla
│   ├── KeydirProjection.tla
│   ├── Compaction.tla
│   ├── *.cfg
│   └── reports/                    # most recent TLC output per model
├── docs/                           # canon, technical decisions, roadmap, related work
└── dev/                            # gitignored; internal notes only
safeatomic-rs lives at ../safeatomic-rs/ and is not part of this workspace.
Running
Formal models
Three small TLA+ models live under formal/ and are checked with
TLC 2.19+:
- RecordLog.tla: append / fsync / crash; durable is a monotonic prefix.
- KeydirProjection.tla: last-write-wins keydir from a put/del log.
- Compaction.tla: compact_to preserves the live projection.
The models are model-checked under documented assumptions. They are not
"formally verified" and do not check the Rust implementation. See formal/README.md.
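The two KV properties named above, a last-write-wins keydir and a compaction that preserves the live projection, can be illustrated in a few lines of Rust. This is a sketch under stated assumptions: Op, project, and compact are hypothetical names operating on an in-memory log, not datawal's API, and the live view simply drops deleted keys rather than modelling on-disk tombstones.

```rust
use std::collections::HashMap;

/// One logical entry in a put/delete log (hypothetical type for illustration).
#[derive(Clone, Debug, PartialEq)]
pub enum Op {
    Put(Vec<u8>, Vec<u8>),
    Delete(Vec<u8>),
}

/// Replay the log in order; later writes to a key override earlier ones,
/// and a delete removes the key from the live view.
pub fn project(log: &[Op]) -> HashMap<Vec<u8>, Vec<u8>> {
    let mut keydir = HashMap::new();
    for op in log {
        match op {
            Op::Put(key, value) => {
                keydir.insert(key.clone(), value.clone());
            }
            Op::Delete(key) => {
                keydir.remove(key);
            }
        }
    }
    keydir
}

/// Build a new log holding one Put per live key. The input is left untouched,
/// mirroring the "write to a new location, never in place" approach.
pub fn compact(log: &[Op]) -> Vec<Op> {
    let mut live: Vec<(Vec<u8>, Vec<u8>)> = project(log).into_iter().collect();
    live.sort(); // deterministic output order
    live.into_iter().map(|(k, v)| Op::Put(k, v)).collect()
}
```

The property worth checking is project(&compact(&log)) == project(&log) for any log: compaction may shorten the history but must not change the live state.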
Wire-format corpus
crates/datawal-core/tests/corpus/ contains binary fixtures that freeze the
v0.1 on-disk format. Regenerate only when the format changes intentionally:
See crates/datawal-core/tests/corpus/README.md.
Related projects
- safeatomic-rs: Rust filesystem primitives used by datawal for atomic writes and directory fsyncs.
- safeatomic: Python package for whole-file persistence with explicit guarantees and runtime diagnostics.
safeatomic is for replacing whole files safely.
datawal is for appending recoverable records and deriving local state
from them.
See also
- docs/canon.md: binding decisions and the byte-layout of a record.
- docs/technical-decisions.md: TD-NNN entries documenting choices.
- docs/roadmap.md: v0.1.0-alpha scope; what is frozen; next tracks.
- formal/README.md: the TLA+ models and how to run TLC.
License
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
SPDX-License-Identifier: MIT OR Apache-2.0
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.