holt
A carefully crafted adaptive radix tree for path-shaped metadata.
⚠️ Pre-1.0 (v0.6.0 released). The public API is now narrow and SemVer-stable inside a minor release, but minor releases may still break source compatibility before 1.0. Pin the exact published version in your
Cargo.toml(holt = "=0.6.0") until 1.0 stabilises the surface.
holt is an embedded Rust library for storing hierarchical
keys — file paths, S3 object names, multi-tenant namespaces,
time-bucketed identifiers — with sub-microsecond lookups, per-blob
concurrency, and crash-safe persistence.
It targets workloads where:
- Keys are hierarchical / path-shaped (so prefix compression pays).
- The dominant access is point lookup + prefix range scan.
- Concurrency is high (many readers + writers across disjoint subtrees).
- Latency is micro-critical — no LSM compaction stalls, no single-writer locks.
It is not a general-purpose KV store; if you need full-text
or vector similarity, reach for the right tool. For this shape,
holt's strongest wins are metadata-native operations: delimiter
directory rollup and mixed metadata workloads. In the v0.3 Linux
release run, objstore_list_dir is 151× faster than RocksDB
and fs_list_dir is 268× faster; objstore_metadata_mix is
43× faster than RocksDB and fs_metadata_mix is 66×
faster. Point reads stay ahead through 2 M keys; point writes are
competitive but intentionally not the headline claim — see
benches/RESULTS.md.
Why "holt"?
A holt (Old English holt) is a small grove or copse — a
self-contained collection of trees on a single piece of ground.
That maps directly to the design: each holt::Tree is one ART
made of many 512 KB blob frames, bounded and self-contained,
grown by repeated splitBlob. Short to type, distinct from other
crates, easy to say.
When to reach for holt
| Engine | Data structure | Persistence | Concurrency | Notes |
|---|---|---|---|---|
| LMDB | B+tree | mmap | Single-writer MVCC | Battle-tested; page chasing for short hot keys. |
| RocksDB | LSM | SST + WAL | MVCC | Compaction stalls; large hot dataset is RAM-heavy. |
| SQLite | B-tree | File | Single writer | Convenient, but writer is the bottleneck under load. |
| Sled | Hybrid LSM | Log-structured | Lock-free | Rust-native, largely unmaintained. |
| holt | Adaptive Radix Tree | 512 KB blobs | Per-blob 3-mode latch | Path compression + lookup is O(key.len) |
ART's lookup cost is O(key.len), not O(log N). For short hot keys
(< 64 bytes), that beats any tree-based competitor. The per-blob
HybridLatch lets N readers traverse disjoint subtrees in parallel
without coordinating.
Project status
Pre-1.0, actively maintained. The metadata-engine core is landed:
put / get / delete / rename / range / atomic / compact, multi-blob
crossings, online maintenance gates, persistent store with O_DIRECT
and optional Linux io_uring, logical WAL with group commit, sharded
buffer manager, 3-thread background checkpointer, SIMD CRC32 + node
scans, copy-on-write snapshots, and stateful Tree::range with prefix,
start_after, and S3 delimiter rollup.
0.6 keeps holt focused as an embedded metadata KV engine while
moving the persistent write path to a shared WAL byte ring. Local WAL
durability, whole-DB checkpoint export/install, copy-on-write snapshots,
Tree::put_many_if_absent, and per-scan ScanStats remain the core
surface. Replication, external log replay, and shard ownership live
above holt instead of inside the engine. See the
Durability section below.
See CHANGELOG.md for the release notes and
ROADMAP.md for direction.
cargo bench --manifest-path benches/Cargo.toml --bench main
runs a side-by-side comparison with RocksDB, SQLite, and sled
across three metadata workload shapes — see
benches/README.md for the methodology and
headline numbers.
Usage
Add holt to your Cargo.toml:
[]
= "0.6"
The supported user surface is deliberately small:
DB, DBAtomicBatch, DBView, TreeBuilder, Tree, TreeConfig,
Storage, Durability, RangeBuilder, RangeEntry, RangeIter,
KeyRangeBuilder, KeyRangeEntry, KeyRangeIter, ScanStats, Snapshot,
View, AtomicBatch, Record, RecordVersion, PutOutcome,
KeyPathBuf, KeyPrefixBuf, KeyPathError, CheckpointConfig,
CheckpointImage, TreeStats / DBStats / related stats structs,
Error / Result, and the custom-store surface (BlobStore,
MemoryBlobStore, FileBlobStore, AlignedBlobBuf, BlobGuid). Internal
layout, WAL, walker, and buffer-manager modules are not public API.
Open a tree
Two storage modes; same TreeBuilder, one knob switches between them:
use ;
// File-backed production mode, Unix-only — Linux `O_DIRECT`,
// macOS `F_NOCACHE`. The directory is created if missing.
let tree = new
.buffer_pool_size // optional: 512 blobs = 256 MiB
.durability // async group-commit WAL (default)
.open?;
// In-memory — volatile, dies with the last handle. The path
// argument becomes informational once `.memory()` flips the
// mode. Good for tests, sidecar caches, ephemeral session stores.
let tree = new.memory.open?;
File-backed trees default to 256 resident 512 KB blobs (128 MiB). In-memory trees keep the smaller 64-blob default (32 MiB).
Path-shaped keys
The core API takes byte keys. For object-store and filesystem-like
metadata, KeyPathBuf builds canonical slash-separated keys without
hand-written string formatting. It is optional: callers with opaque
keys can keep passing raw bytes.
use KeyPathBuf;
let mut key = with_namespace?;
key.push?;
key.push?;
key.push?;
key.push?;
tree.put?;
let mut prefix = with_namespace?;
prefix.push?;
prefix.push?;
let prefix = prefix.into_prefix;
for entry in tree.scan_keys.delimiter
# Ok::
Single-key CRUD
Bytes in, bytes out. put and delete are the hot paths: they write
or tombstone without reading the existing leaf. rename is atomic and
errors if dst exists unless force = true.
tree.put?;
let value: = tree.get?;
assert_eq!;
let existed: bool = tree.delete?;
assert!;
tree.put?;
tree.rename?;
Conditional writes
RecordVersion is Holt's lightweight compare-and-set token. It is
not an MVCC timestamp and does not provide historical snapshot
reads; it only says "this live record still has the same leaf seq".
tree.put_if_absent?;
let record = tree.get_record?.unwrap;
let updated: bool = tree.compare_and_put?;
assert!;
let current = tree.get_version?.unwrap;
let deleted: bool = tree.delete_if_version?;
assert!;
// Point-in-time read helper for rmdir-style metadata checks.
assert!;
Atomic batch
Multiple ops under one WAL record. Holt preflights rename and
conditional-write / prefix-emptiness guards before mutating,
applies the batch behind the tree-wide mutation gate, then emits
one Batch WAL record.
Ok(true) means committed, Ok(false) means a conditional guard
failed and nothing was published, and Err reports hard failures
such as a missing rename source or destination collision.
let template = tree.get_record?.unwrap;
let committed = tree.atomic?;
assert!;
Range scans and S3 delimiter rollup
Tree::range yields full records: key, value, and
RecordVersion. Use it when the list response needs metadata
bytes. Tree::range_keys / Tree::scan_keys use the same cursor
and delimiter semantics but skip value materialisation; use them
for name-only directory and object listings. Chain .prefix() to
anchor the scan, .start_after() for paging, and .delimiter()
for S3-style ?delimiter=/ rollup.
use ;
// Full prefix scan — `take(50)` for a paged "first 50" view
// when the caller needs metadata bytes.
let first_50: = tree
.range
.prefix
.into_iter
.take
.map
.?;
// S3-style "list one level" without copying values — leaves under
// `/img/` get emitted as Key; deeper paths roll up to a single
// `CommonPrefix` per distinct subdir.
for entry in tree.scan_keys.delimiter
Snapshot reads
Tree::range and Tree::range_keys are the hot restart-on-conflict
iterators. Use Tree::view(prefix, |view| { ... }) when multiple
reads must observe one stable prefix snapshot. A view copies the
reachable blob frames for prefix, releases the live tree, then
runs point reads and scans against that private frame set.
tree.view?;
Durability
Durability controls how holt acknowledges its local write-ahead log:
Durability::Wal { sync } — holt owns local durability. Each write updates
the BufferManager cache and appends one logical WAL record. sync: false (the
default) returns after the group-commit worker queues the record; sync: true
waits for sync_data, and concurrent writers share one fsync. Disk-truth
advances at:
- Background checkpoint — enabled by default for file-backed trees. It flushes the WAL, writes dirty blobs through to the store, syncs, applies pending deletes, and truncates the WAL once the pipeline is clean.
Tree::checkpoint()— the same protocol, run synchronously.- WAL auto-flush — drains the pending buffer to the OS page cache past 64 KB, bounding in-memory buffering even when checkpoints are rare.
tree.checkpoint?; // flush WAL + write through + truncate
Whole-DB checkpoint images are ordinary holt archive/transfer artifacts. They carry families and key/value bytes, but no external log index or replication metadata:
use ;
let db = DBopen?;
let image = db.export_checkpoint?;
image.validate?;
let restored = DBopen?;
restored.install_checkpoint?;
Validation and observability
Holt has four validation layers:
- Unit and integration tests cover WAL replay, checkpoint recovery, range/view semantics, conditional atomic batches, and checkpoint failpoints.
- Property tests compare random operation streams against an oracle.
fuzz/contains acargo-fuzztarget for the atomic/WAL/range model.verified/contains Verus specs for ART node shape, grow/shrink, prefix split, delimiter rollup bounds, virtual terminators, and leaf alignment.
Long lifecycle campaigns live in the explicit soak tool:
With the metrics feature, holt::metrics::render_prometheus exposes
cache hit/miss, eviction/admission, route-cache, WAL work/debt,
checkpoint debt, dirty/pending-delete counts, and reopen WAL replay
time. The caller owns the HTTP endpoint; Holt only renders the
Prometheus text payload.
The GitHub workflows are split by cost:
- normal CI runs unit/integration/property tests, a bounded fuzz smoke, coverage, and a short soak smoke;
- nightly validation runs checkpoint/WAL fault tests, longer normal and crash soak campaigns, and a time-bounded fuzz campaign;
- Verus is available as a manual
Nightly Validationoption because hosted runners do not ship a Verus binary.
See TESTING.md for the full test matrix and release-gate
commands.
See examples/ for full programs:
basic_kv,
filesystem_meta,
session_store,
s3_metadata.
Architecture
See ARCHITECTURE.md for the deep dive.
The design draws on Leis et al.'s ART paper (ICDE 2013) for the
four-node-size scheme and LeanStore (ICDE 2018) for the HybridLatch
contract.
Not on the roadmap
holt is just the metadata engine — single-node, embed-in- your-process, Unix-only. Out of scope:
- Windows —
compile_error!s the crate (UnixO_DIRECT/F_NOCACHEhas no Windows analog worth carrying). - Object-storage frontend / S3 layer — no RPC server, no multi-tenant bucket registry, no distributed checkpointer.
- SQL / vector / full-text — combine with a domain-appropriate
engine (
+ FAISSfor vectors,+ Tantivyfor full-text). - Replication / consensus — build above; we'll expose hooks (change feed, snapshot transfer) but won't ship Raft.
- Network server — this is a library; wrap it in your own RPC.
License
Licensed under the MIT licence.