holt
A carefully crafted adaptive radix tree for path-shaped metadata.
⚠️ Pre-1.0 (v0.3.1 released). The public API is now narrow and SemVer-stable inside a minor release, but minor releases may still break source compatibility before 1.0. Pin the exact published version in your
Cargo.toml(holt = "=0.3.1") until 1.0 stabilises the surface.
holt is an embedded Rust library for storing hierarchical
keys — file paths, S3 object names, multi-tenant namespaces,
time-bucketed identifiers — with sub-microsecond lookups, per-blob
concurrency, and crash-safe persistence.
It targets workloads where:
- Keys are hierarchical / path-shaped (so prefix compression pays).
- The dominant access is point lookup + prefix range scan.
- Concurrency is high (many readers + writers across disjoint subtrees).
- Latency is micro-critical — no LSM compaction stalls, no single-writer locks.
It is not a general-purpose KV store; if you need full-text
or vector similarity, reach for the right tool. For this shape,
holt's strongest wins are metadata-native operations: delimiter
directory rollup and mixed metadata workloads. In the v0.3 Linux
release run, objstore_list_dir is 151× faster than RocksDB
and fs_list_dir is 268× faster; objstore_metadata_mix is
43× faster than RocksDB and fs_metadata_mix is 66×
faster. Point reads stay ahead through 2 M keys; point writes are
competitive but intentionally not the headline claim — see
benches/RESULTS.md.
Why "holt"?
A holt (Old English holt) is a small grove or copse — a
self-contained collection of trees on a single piece of ground.
That maps directly to the design: each holt::Tree is one ART
made of many 512 KB blob frames, bounded and self-contained,
grown by repeated splitBlob. Short to type, distinct from other
crates, easy to say.
When to reach for holt
| Engine | Data structure | Persistence | Concurrency | Notes |
|---|---|---|---|---|
| LMDB | B+tree | mmap | Single-writer MVCC | Battle-tested; page chasing for short hot keys. |
| RocksDB | LSM | SST + WAL | MVCC | Compaction stalls; large hot dataset is RAM-heavy. |
| SQLite | B-tree | File | Single writer | Convenient, but writer is the bottleneck under load. |
| Sled | Hybrid LSM | Log-structured | Lock-free | Rust-native, largely unmaintained. |
| holt | Adaptive Radix Tree | 512 KB blobs | Per-blob 3-mode latch | Path compression + lookup is O(key.len) |
ART's lookup cost is O(key.len), not O(log N). For short hot keys
(< 64 bytes), that beats any tree-based competitor. The per-blob
HybridLatch lets N readers traverse disjoint subtrees in parallel
without coordinating.
Project status
Pre-1.0, actively maintained. The v0.3 metadata-engine core is
landed: insert / lookup / erase / rename / range / atomic / compact,
multi-blob crossings, online maintenance gates, persistent store
with O_DIRECT and optional Linux io_uring, logical WAL
with group commit, sharded buffer manager, 3-thread background
checkpointer, SIMD CRC32 + node scans, and stateful Tree::range
with prefix, start_after, and S3 delimiter rollup.
See CHANGELOG.md for the v0.3 release notes and
ROADMAP.md for post-0.3 direction.
cargo bench --bench main runs a side-by-side comparison with
RocksDB and SQLite across three metadata workload shapes — see
benches/README.md for the methodology and
headline numbers.
Usage
Add holt to your Cargo.toml:
[]
= "=0.3.1"
The supported user surface is deliberately small:
TreeBuilder, Tree, TreeConfig, Storage, RangeBuilder,
RangeEntry, RangeIter, KeyRangeBuilder, KeyRangeEntry,
KeyRangeIter, AtomicBatch, Record, RecordVersion,
CheckpointConfig,
TreeStats / related stats structs, Error / Result, and the
custom-store surface (BlobStore, MemoryBlobStore,
FileBlobStore, AlignedBlobBuf, BlobGuid). Internal
layout, WAL, walker, and buffer-manager modules are not public API.
Open a tree
Two storage modes; same TreeBuilder, one knob switches between them:
use ;
// File-backed production mode, Unix-only — Linux `O_DIRECT`,
// macOS `F_NOCACHE`. The directory is created if missing.
let tree = new
.buffer_pool_size // pinned 512 KB blobs (default 64)
.wal_commit // default async journal acknowledgement
.open?;
// In-memory — volatile, dies with the last handle. The path
// argument becomes informational once `.memory()` flips the
// mode. Good for tests, sidecar caches, ephemeral session stores.
let tree = new.memory.open?;
Single-key CRUD
Bytes in, bytes out. put and delete are the blind hot paths —
they write or tombstone without reading the existing leaf, and return
() / bool respectively. insert and remove are the explicit
returning variants that pay one extra leaf read to hand you the
prior value. Use put/delete by default; reach for insert/remove
only where you actually consume the previous bytes. rename is
atomic and errors if dst exists unless force = true.
// Blind hot paths — recommended default.
tree.put?;
let value: = tree.get?;
assert_eq!;
let existed: bool = tree.delete?;
assert!;
// Returning variants — pay the read-back cost only when you need it.
let prev: = tree.insert?;
assert!; // we just deleted it above
let dropped: = tree.remove?;
assert_eq!;
tree.put?;
tree.rename?;
Conditional writes
RecordVersion is Holt's lightweight compare-and-set token. It is
not an MVCC timestamp and does not provide historical snapshot
reads; it only says "this live record still has the same leaf seq".
tree.put_if_absent?;
let record = tree.get_record?.unwrap;
let updated: bool = tree.compare_and_put?;
assert!;
let current = tree.get_version?.unwrap;
let deleted: bool = tree.delete_if_version?;
assert!;
// Point-in-time read helper for rmdir-style metadata checks.
assert!;
Atomic batch
Multiple ops under one WAL record. Holt preflights rename and
conditional-write / prefix-emptiness guards before mutating,
applies the batch behind the tree-wide mutation gate, then emits
one Batch WAL record.
Ok(true) means committed, Ok(false) means a conditional guard
failed and nothing was published, and Err reports hard failures
such as a missing rename source or destination collision.
let template = tree.get_record?.unwrap;
let committed = tree.atomic?;
assert!;
Range scans and S3 delimiter rollup
Tree::range yields full records: key, value, and
RecordVersion. Use it when the list response needs metadata
bytes. Tree::range_keys / Tree::scan_keys use the same cursor
and delimiter semantics but skip value materialisation; use them
for name-only directory and object listings. Chain .prefix() to
anchor the scan, .start_after() for paging, and .delimiter()
for S3-style ?delimiter=/ rollup.
use ;
// Full prefix scan — `take(50)` for a paged "first 50" view
// when the caller needs metadata bytes.
let first_50: = tree
.range
.prefix
.into_iter
.take
.map
.?;
// S3-style "list one level" without copying values — leaves under
// `/img/` get emitted as Key; deeper paths roll up to a single
// `CommonPrefix` per distinct subdir.
for entry in tree.scan_keys.delimiter
Durability
Per-op writes land in the journal worker + BufferManager cache. The WAL acknowledgement boundary is explicit:
WalCommit::Enqueue— default. Return after the journal worker queue accepts the encoded record.WalCommit::Write— return after WAL bytes reach the OS page cache, with no per-opsync_data. This is the benchmark profile matching RocksDBWAL on, sync=false.WalCommit::Sync— return aftersync_data; concurrent writers can share one fsync through group commit.
Disk-truth advances at:
Tree::checkpoint()— flush the journal (sync_data), write dirty blobs through to the store,fdatasyncthe store, truncate the WAL. Call this at your own application checkpoint cadence.- WAL auto-flush — once the WAL writer's pending buffer
crosses 64 KB it drains to the OS page cache (no
sync_data). Bounds in-memory buffering even ifcheckpointis rare.
tree.checkpoint?; // flush WAL + write through + truncate
See examples/ for full programs:
basic_kv,
filesystem_meta,
session_store,
s3_metadata.
Architecture
See ARCHITECTURE.md for the deep dive.
The design draws on Leis et al.'s ART paper (ICDE 2013) for the
four-node-size scheme and LeanStore (ICDE 2018) for the HybridLatch
contract.
Not on the roadmap
holt is just the metadata engine — single-node, embed-in- your-process, Unix-only. Out of scope:
- Windows —
compile_error!s the crate (UnixO_DIRECT/F_NOCACHEhas no Windows analog worth carrying). - Object-storage frontend / S3 layer — no RPC server, no multi-tenant bucket registry, no distributed checkpointer.
- SQL / vector / full-text — combine with a domain-appropriate
engine (
+ FAISSfor vectors,+ Tantivyfor full-text). - Replication / consensus — build above; we'll expose hooks (change feed, snapshot transfer) but won't ship Raft.
- Network server — this is a library; wrap it in your own RPC.
License
Licensed under the MIT licence.