holt
A carefully crafted adaptive radix tree for path-shaped metadata.
⚠️ Pre-1.0 (v0.3 released). The public API is now narrow and SemVer-stable inside a minor release, but minor releases may still break source compatibility before 1.0. Pin the exact published version in your
Cargo.toml(holt = "=0.3.0") until 1.0 stabilises the surface.
holt is an embedded Rust library for storing hierarchical
keys — file paths, S3 object names, multi-tenant namespaces,
time-bucketed identifiers — with sub-microsecond lookups, per-blob
concurrency, and crash-safe persistence.
It targets workloads where:
- Keys are hierarchical / path-shaped (so prefix compression pays).
- The dominant access is point lookup + prefix range scan.
- Concurrency is high (many readers + writers across disjoint subtrees).
- Latency is micro-critical — no LSM compaction stalls, no single-writer locks.
It is not a general-purpose KV store; if you need full-text
or vector similarity, reach for the right tool. For this shape,
holt's strongest wins are metadata-native operations: delimiter
directory rollup and mixed metadata workloads. In the v0.3 Linux
release run, objstore_list_dir is 151× faster than RocksDB
and fs_list_dir is 268× faster; objstore_metadata_mix is
43× faster than RocksDB and fs_metadata_mix is 66×
faster. Point reads stay ahead through 2 M keys; point writes are
competitive but intentionally not the headline claim — see
benches/RESULTS.md.
Why "holt"?
A holt (Old English holt) is a small grove or copse — a
self-contained collection of trees on a single piece of ground.
That maps directly to the design: each holt::Tree is one ART
made of many 512 KB blob frames, bounded and self-contained,
grown by repeated splitBlob. Short to type, distinct from other
crates, easy to say.
When to reach for holt
| Engine | Data structure | Persistence | Concurrency | Notes |
|---|---|---|---|---|
| LMDB | B+tree | mmap | Single-writer MVCC | Battle-tested; page chasing for short hot keys. |
| RocksDB | LSM | SST + WAL | MVCC | Compaction stalls; large hot dataset is RAM-heavy. |
| SQLite | B-tree | File | Single writer | Convenient, but writer is the bottleneck under load. |
| Sled | Hybrid LSM | Log-structured | Lock-free | Rust-native, largely unmaintained. |
| holt | Adaptive Radix Tree | 512 KB blobs | Per-blob 3-mode latch | Path compression + lookup is O(key.len) |
ART's lookup cost is O(key.len), not O(log N). For short hot keys
(< 64 bytes), that beats any tree-based competitor. The per-blob
HybridLatch lets N readers traverse disjoint subtrees in parallel
without coordinating.
Project status
Pre-1.0, actively maintained. The v0.3 metadata-engine core is
landed: insert / lookup / erase / rename / range / txn / compact,
multi-blob crossings, online maintenance gates, persistent backend
with O_DIRECT and optional Linux io_uring, physiological WAL
with group commit, sharded buffer manager, 3-thread background
checkpointer, SIMD CRC32 + node scans, and stateful Tree::range
with prefix, start_after, and S3 delimiter rollup.
See CHANGELOG.md for the v0.3 release notes and
ROADMAP.md for post-0.3 direction.
cargo bench --bench main runs a side-by-side comparison with
RocksDB and SQLite across three metadata workload shapes — see
benches/README.md for the methodology and
headline numbers.
Usage
Add holt to your Cargo.toml:
[]
= "=0.3.0"
The supported 0.3 user surface is deliberately small:
TreeBuilder, Tree, TreeConfig, Storage, RangeBuilder,
RangeEntry, RangeIter, TxnBatch, CheckpointConfig,
TreeStats / related stats structs, Error / Result, and the
custom-backend surface (Backend, MemoryBackend,
PersistentBackend, AlignedBlobBuf, BlobGuid). Internal
layout, WAL, walker, and buffer-manager modules are not public API.
Open a tree
Two storage modes; same TreeBuilder, one knob switches between them:
use TreeBuilder;
// Persistent (production), Unix-only — Linux `O_DIRECT`,
// macOS `F_NOCACHE`. The directory is created if missing.
let tree = new
.buffer_pool_size // pinned 512 KB blobs (default 64)
.wal_sync_on_commit // see "Durability" below
.open?;
// In-memory — volatile, dies with the last handle. The path
// argument becomes informational once `.memory()` flips the
// mode. Good for tests, sidecar caches, ephemeral session stores.
let tree = new.memory.open?;
Single-key CRUD
Bytes in, bytes out. put and delete are the blind hot paths —
they write or tombstone without reading the existing leaf, and return
() / bool respectively. insert and remove are the explicit
returning variants that pay one extra leaf read to hand you the
prior value. Use put/delete by default; reach for insert/remove
only where you actually consume the previous bytes. rename is
atomic and errors if dst exists unless force = true.
// Blind hot paths — recommended default.
tree.put?;
let value: = tree.get?;
assert_eq!;
let existed: bool = tree.delete?;
assert!;
// Returning variants — pay the read-back cost only when you need it.
let prev: = tree.insert?;
assert!; // we just deleted it above
let dropped: = tree.remove?;
assert_eq!;
tree.put?;
tree.rename?;
Atomic batched transaction
Multiple ops under one WAL record — either all replayed on
recovery, or none. Returns Err mid-batch on a failing rename
(e.g. src missing); earlier ops in the batch are still applied
to the in-memory cache but the batch WAL record is not emitted,
so a subsequent reopen-from-WAL drops the partial work.
tree.txn?;
Range scan with S3 delimiter rollup
Tree::range is the load-bearing API for metadata workloads —
readdir, ListObjects, AI artifact catalogs. Chain
.prefix() to anchor the scan, .start_after() for paging,
.delimiter() for S3-style ?delimiter=/ rollup.
use RangeEntry;
// Simple prefix scan — `take(50)` for a paged "first 50" view.
let first_50: = tree
.range
.prefix
.into_iter
.take
.map
.?;
// S3-style "list one level" — leaves under `/img/` get emitted
// as Key; deeper paths roll up to a single `CommonPrefix` per
// distinct subdir.
for entry in tree.range.prefix.delimiter
Durability
Per-op writes land in the journal worker + BufferManager cache. Disk-truth advances at:
Tree::checkpoint()— flush the journal (sync_data), write dirty blobs through to the backend,fdatasyncthe backend, truncate the WAL. Call this at your own application checkpoint cadence.- WAL auto-flush — once the WAL writer's pending buffer
crosses 64 KB it drains to the OS page cache (no
sync_data). Bounds in-memory buffering even ifcheckpointis rare. wal_sync_on_commit = true— opt in to a per-op durable journal acknowledgement. Concurrent writers can share onesync_datathrough group commit. Defaultfalsematches RocksDB'ssync=false.
tree.checkpoint?; // flush WAL + write through + truncate
See examples/ for full programs:
basic_kv,
filesystem_meta,
session_store,
s3_metadata.
Architecture
See ARCHITECTURE.md for the deep dive.
The design draws on Leis et al.'s ART paper (ICDE 2013) for the
four-node-size scheme and LeanStore (ICDE 2018) for the HybridLatch
contract.
Not on the roadmap
holt is just the metadata engine — single-node, embed-in- your-process, Unix-only. Out of scope:
- Windows —
compile_error!s the crate (UnixO_DIRECT/F_NOCACHEhas no Windows analog worth carrying). - Object-storage frontend / S3 layer — no RPC server, no multi-tenant bucket registry, no distributed checkpointer.
- SQL / vector / full-text — combine with a domain-appropriate
engine (
+ FAISSfor vectors,+ Tantivyfor full-text). - Replication / consensus — build above; we'll expose hooks (change feed, snapshot transfer) but won't ship Raft.
- Network server — this is a library; wrap it in your own RPC.
License
Licensed under the MIT licence.