# masstree
A high-performance concurrent ordered map for Rust. It stores keys as `&[u8]` and supports variable-length keys by building a trie of B+trees, based on the [Masstree paper](https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf)
**Disclaimer:** This is an independent implementation. It is not endorsed by, affiliated with, or connected to the original Masstree authors or their institutions.
## Features
- Ordered map for byte keys (lexicographic ordering)
- Lock-free reads with version validation
- Concurrent inserts and deletes with fine-grained leaf locking
- Zero-copy range scans with `scan_ref` and `scan_prefix`
- Memory reclamation via hyaline scheme (`seize` crate)
- Lazy leaf coalescing for deleted entries
- Two node widths: `MassTree` (WIDTH=24) and `MassTree15` (WIDTH=15)
## Status
**v0.5.0** — Core feature complete. Heavily tested but concurrent data structures require extensive stress testing beyond what tests (even though still quite comprehensive) provide. The unsafe code passes Miri with strict-provenance flag. It should be noted that the C++ implementation is a research project, not a library specifically developed for production usage. So it is quite possible that there are still various rare edge cases and soundness issues that still haven't been addressed.
| `get`, `get_ref` | Lock-free with version validation |
| `insert` | Fine-grained leaf locking |
| `remove` | Concurrent deletion with memory reclamation |
| `scan`, `scan_ref`, `scan_prefix` | Zero-copy range iteration |
| `DoubleEndedIterator` | Reverse iteration support |
| Leaf coalescing | Lazy queue-based cleanup |
| Memory reclamation | Hyaline scheme via `seize` crate |
**Tests:** 900 tests (unit + 88 ported from C++ reference + proptests + stress tests). Miri strict provenance clean.
**Not yet implemented:** `Entry` API, `Extend`/`FromIterator`.
## Install
```toml
[dependencies]
masstree = { version = "0.5.0", features = ["mimalloc"] }
```
MSRV is Rust 1.92+ (Edition 2024).
The `mimalloc` feature sets the global allocator. If your project already uses a custom allocator, omit this feature.
## Quick Start
```rust
use masstree::MassTree;
let tree: MassTree<u64> = MassTree::new();
let guard = tree.guard();
// Insert
tree.insert_with_guard(b"hello", 123, &guard).unwrap();
tree.insert_with_guard(b"world", 456, &guard).unwrap();
// Point lookup
assert_eq!(tree.get_ref(b"hello", &guard), Some(&123));
// Remove
tree.remove_with_guard(b"hello", &guard).unwrap();
assert_eq!(tree.get_ref(b"hello", &guard), None);
// Range scan (zero-copy)
tree.scan_ref(b"a"..b"z", |key, value| {
println!("{:?} -> {}", key, value);
true // continue scanning
}, &guard);
// Prefix scan
tree.scan_prefix(b"wor", |key, value| {
println!("{:?} -> {}", key, value);
true
}, &guard);
```
## Ergonomic APIs
For simpler use cases, auto-guard versions create guards internally:
```rust
use masstree::MassTree;
let tree: MassTree<u64> = MassTree::new();
// Auto-guard versions (simpler but slightly more overhead per call)
tree.insert(b"key1", 100).unwrap();
tree.insert(b"key2", 200).unwrap();
assert_eq!(tree.get(b"key1"), Some(std::sync::Arc::new(100)));
assert_eq!(tree.len(), 2);
assert!(!tree.is_empty());
tree.remove(b"key1").unwrap();
```
### Range Iteration
```rust
use masstree::{MassTree, RangeBound};
let tree: MassTree<u64> = MassTree::new();
let guard = tree.guard();
// Populate
for i in 0..100u64 {
tree.insert_with_guard(&i.to_be_bytes(), i, &guard).unwrap();
}
// Iterator-based range scan
for entry in tree.range(RangeBound::Included(b""), RangeBound::Unbounded, &guard) {
println!("{:?} -> {:?}", entry.key(), entry.value());
}
// Full iteration
for entry in tree.iter(&guard) {
println!("{:?}", entry.key());
}
```
## When to Use
**May work well for:**
- Long keys with shared prefixes (URLs, file paths, UUIDs)
- Range scans over ordered data
- Mixed read/write workloads
- High-contention scenarios (the trie structure helps here)
**Consider alternatives for:**
- Unordered point lookups → `dashmap`
- Pure insert-only workloads → `scc::TreeIndex`
- Integer keys only → `congee` (ART-based)
- Read-heavy with rare writes → `RwLock<BTreeMap>`
## Variant Selection
Two variants are provided with different performance characteristics:
| `MassTree15` | Range scans, writes, shared-prefix keys, contention |
| `MassTree` (WIDTH=24) | Random-access reads, single-threaded point ops |
`MassTree15` tends to perform better in our benchmarks due to cheaper u64 atomics and better cache utilization. Consider it for most workloads unless you have uniform random-access patterns.
```rust
use masstree::{MassTree, MassTree15, MassTree24Inline, MassTree15Inline};
// Default: WIDTH=24, Arc-based storage
let tree: MassTree<u64> = MassTree::new();
// WIDTH=15, Arc-based storage (recommended for most workloads)
let tree15: MassTree15<u64> = MassTree15::new();
// Inline storage for Copy types (no Arc overhead)
let inline: MassTree24Inline<u64> = MassTree24Inline::new();
let inline15: MassTree15Inline<u64> = MassTree15Inline::new();
```
## How It Works
Masstree splits keys into 8-byte chunks, creating a trie where each node is a B+tree:
```text
Key: "users/alice/profile" (19 bytes)
└─ Layer 0: "users/al" (8 bytes)
└─ Layer 1: "ice/prof" (8 bytes)
└─ Layer 2: "ile" (3 bytes)
```
Keys with shared prefixes share upper layers, making lookups efficient for hierarchical data.
## Examples
The `examples/` directory contains comprehensive usage examples:
```bash
cargo run --example basic_usage --release # Core API walkthrough
cargo run --example rayon_parallel --release # Parallel processing with Rayon
cargo run --example tokio_async --release # Async integration with Tokio
cargo run --example url_cache --release # Real-world URL cache
cargo run --example session_store --release # Concurrent session store
```
### Rayon Integration
MassTree works seamlessly with Rayon for parallel bulk operations:
```rust
use masstree::MassTree15Inline;
use rayon::prelude::*;
use std::sync::Arc;
let tree: Arc<MassTree15Inline<u64>> = Arc::new(MassTree15Inline::new());
// Parallel bulk insert (~10M ops/sec)
(0..1_000_000).into_par_iter().for_each(|i| {
let key = format!("key/{i:08}");
let guard = tree.guard();
let _ = tree.insert_with_guard(key.as_bytes(), i, &guard);
});
// Parallel lookups (~45M ops/sec)
let sum: u64 = (0..1_000_000).into_par_iter()
.map(|i| {
let key = format!("key/{i:08}");
let guard = tree.guard();
tree.get_with_guard(key.as_bytes(), &guard).unwrap_or(0)
})
.sum();
```
### Tokio Integration
MassTree is thread-safe but guards cannot be held across `.await` points:
```rust
use masstree::MassTree15;
use std::sync::Arc;
let tree: Arc<MassTree15<String>> = Arc::new(MassTree15::new());
// Spawn async tasks that share the tree
let handle = tokio::spawn({
let tree = Arc::clone(&tree);
async move {
// Guard must be scoped - cannot be held across await!
{
let guard = tree.guard();
let _ = tree.insert_with_guard(b"key", "value".to_string(), &guard);
} // guard dropped here
tokio::time::sleep(Duration::from_millis(10)).await;
// Create new guard after await
let guard = tree.guard();
tree.get_with_guard(b"key", &guard)
}
});
// For CPU-intensive operations, use spawn_blocking
let tree_clone = Arc::clone(&tree);
tokio::task::spawn_blocking(move || {
let guard = tree_clone.guard();
for entry in tree_clone.iter(&guard) {
// Process entries...
}
}).await;
```
## Crate Features
- `mimalloc` — Use mimalloc as global allocator (recommended)
- `tracing` — Enable structured logging to `logs/masstree.jsonl`
## License
MIT. See `LICENSE`.
## References
- [Masstree Paper (EuroSys 2012)](https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf)
- [C++ Reference Implementation](https://github.com/kohler/masstree-beta)