holt
A carefully crafted adaptive radix tree for path-shaped metadata.
⚠️ Pre-1.0 (v0.2.1). The public API is now narrow and SemVer-stable inside a minor release, but minor releases may still break source compatibility before 1.0. Pin the exact version in your
Cargo.toml(holt = "=0.2.1") until 1.0 stabilises the surface.
holt is an embedded Rust library for storing hierarchical
keys — file paths, S3 object names, multi-tenant namespaces,
time-bucketed identifiers — with sub-microsecond lookups, per-blob
concurrency, and crash-safe persistence.
It targets workloads where:
- Keys are hierarchical / path-shaped (so prefix compression pays).
- The dominant access is point lookup + prefix range scan.
- Concurrency is high (many readers + writers across disjoint subtrees).
- Latency is micro-critical — no LSM compaction stalls, no single-writer locks.
It is not a general-purpose KV store; if you need full-text
or vector similarity, reach for the right tool. For this shape,
holt beats LMDB / RocksDB / SQLite cleanly on point lookup and
prefix scan at every dataset size we test through 2 M keys. On
random point writes with a working set well past the buffer pool
(~200 MB+), LSM-style write amortization in RocksDB starts to
edge ahead — see benches/RESULTS.md for
the honest scale-curve numbers.
Why "holt"?
A holt (Old English holt) is a small grove or copse — a
self-contained collection of trees on a single piece of ground.
That maps directly to the design: each holt::Tree is one ART
made of many 512 KB blob frames, bounded and self-contained,
grown by repeated splitBlob. Short to type, distinct from other
crates, easy to say.
When to reach for holt
| Engine | Data structure | Persistence | Concurrency | Notes |
|---|---|---|---|---|
| LMDB | B+tree | mmap | Single-writer MVCC | Battle-tested; page chasing for short hot keys. |
| RocksDB | LSM | SST + WAL | MVCC | Compaction stalls; large hot dataset is RAM-heavy. |
| SQLite | B-tree | File | Single writer | Convenient, but writer is the bottleneck under load. |
| Sled | Hybrid LSM | Log-structured | Lock-free | Rust-native, largely unmaintained. |
| holt | Adaptive Radix Tree | 512 KB blobs | Per-blob 3-mode latch | Path compression + lookup is O(key.len) |
ART's lookup cost is O(key.len), not O(log N). For short hot keys
(< 64 bytes), that beats any tree-based competitor. The per-blob
HybridLatch lets N readers traverse disjoint subtrees in parallel
without coordinating.
Project status
Pre-1.0, actively maintained. The algorithm core (insert /
lookup / erase / rename / range / txn / compact + multi-blob
crossings), persistent backend with O_DIRECT + optional
io_uring fast path, physiological WAL with batched
transactions, sharded buffer manager + 3-thread background
checkpointer enforcing WAL-before-data, SIMD CRC32 + node
scans, and the stateful Tree::range iterator (prefix
anchoring + start_after + S3 delimiter) are all landed.
240+ tests (unit + property-based + crash-and-replay +
failpoint-injected) pass on Ubuntu + macOS CI.
See CHANGELOG.md for the per-feature
breakdown and ROADMAP.md for what's queued
(per-node HybridLatch, cross-blob lock-coupling, MVCC
snapshots, online compaction).
cargo bench --bench main runs a side-by-side comparison with
RocksDB and SQLite across three metadata workload shapes — see
benches/README.md for the methodology and
headline numbers.
Usage
Add holt to your Cargo.toml:
[]
= "0.2"
Open a tree
Two storage modes; same TreeBuilder, one knob switches between them:
use TreeBuilder;
// Persistent (production), Unix-only — Linux `O_DIRECT`,
// macOS `F_NOCACHE`. The directory is created if missing.
let tree = new
.buffer_pool_size // pinned 512 KB blobs (default 64)
.wal_sync_on_commit // see "Durability" below
.open?;
// In-memory — volatile, dies with the last handle. The path
// argument becomes informational once `.memory()` flips the
// mode. Good for tests, sidecar caches, ephemeral session stores.
let tree = new.memory.open?;
Single-key CRUD
Bytes in, bytes out. put returns the previous value if any;
delete returns the dropped value; rename is atomic and
errors if dst exists unless force = true.
tree.put?;
let value: = tree.get?;
assert_eq!;
let prev: = tree.delete?;
assert!;
tree.put?;
tree.rename?;
Atomic batched transaction
Multiple ops under one WAL record — either all replayed on
recovery, or none. Returns Err mid-batch on a failing rename
(e.g. src missing); earlier ops in the batch are still applied
to the in-memory cache but the batch WAL record is not emitted,
so a subsequent reopen-from-WAL drops the partial work.
tree.txn?;
Range scan with S3 delimiter rollup
Tree::range is the load-bearing API for metadata workloads —
readdir, ListObjects, AI artifact catalogs. Chain
.prefix() to anchor the scan, .start_after() for paging,
.delimiter() for S3-style ?delimiter=/ rollup.
use RangeEntry;
// Simple prefix scan — `take(50)` for a paged "first 50" view.
let first_50: = tree
.range
.prefix
.into_iter
.take
.map
.?;
// S3-style "list one level" — leaves under `/img/` get emitted
// as Key; deeper paths roll up to a single `CommonPrefix` per
// distinct subdir.
for entry in tree.range.prefix.delimiter
Durability
Per-op writes land in the WAL writer's buffer + the BufferManager cache. Disk-truth advances at:
Tree::checkpoint()— flush WAL (sync_data), write the BM root blob through to the backend,fdatasyncthe backend, truncate the WAL. Call this at your own application checkpoint cadence.- WAL auto-flush — once the WAL writer's pending buffer
crosses 64 KB it drains to the OS page cache (no
sync_data). Bounds in-memory buffering even ifcheckpointis rare. wal_sync_on_commit = true— opt in to a per-opsync_dataon the WAL. Trades ~µs per write for "durable to disk past power loss." Defaultfalsematches RocksDB'ssync=false.
tree.checkpoint?; // flush WAL + write through + truncate
See examples/ for full programs:
basic_kv,
filesystem_meta,
session_store,
s3_metadata.
Architecture
See ARCHITECTURE.md for the deep dive.
The design draws on Leis et al.'s ART paper (ICDE 2013) for the
four-node-size scheme and LeanStore (ICDE 2018) for the HybridLatch
contract.
Not on the roadmap
holt is just the metadata engine — single-node, embed-in- your-process, Unix-only. Out of scope:
- Windows —
compile_error!s the crate (UnixO_DIRECT/F_NOCACHEhas no Windows analog worth carrying). - Object-storage frontend / S3 layer — no RPC server, no multi-tenant bucket registry, no distributed checkpointer.
- SQL / vector / full-text — combine with a domain-appropriate
engine (
+ FAISSfor vectors,+ Tantivyfor full-text). - Replication / consensus — build above; we'll expose hooks (change feed, snapshot transfer) but won't ship Raft.
- Network server — this is a library; wrap it in your own RPC.
License
Licensed under the MIT licence.