# masstree

A high-performance concurrent ordered map for Rust. It stores keys as `&[u8]` and supports variable-length keys by building a trie of B+trees, based on the [Masstree paper](https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf).

**Disclaimer:** This is an independent implementation. It is not endorsed by, affiliated with, or connected to the original Masstree authors or their institutions.

## Features

- Ordered map for byte keys (lexicographic ordering)
- Lock-free reads with version validation
- Concurrent inserts and deletes with fine-grained leaf locking
- Zero-copy range scans with `scan_ref` and `scan_prefix`
- High-throughput value-only scans with `scan_values` (skips key materialization)
- Memory reclamation via hyaline scheme (`seize` crate)
- Lazy leaf coalescing for deleted entries
- Two node widths: `MassTree` (WIDTH=24) and `MassTree15` (WIDTH=15)
- `MassTree15Inline`, an extremely high-performance inline variant (usable only with `Copy` types)

## Status

**v0.6.2** — Major performance enhancements; core features complete. Beats C++ Masstree on 5/7 benchmarks and
Rust alternatives on 11/12 workloads. Passes Miri with the strict-provenance flag. Concurrent data structures
require extensive stress testing: the test suite is comprehensive (998 tests total), but edge cases may remain.

| Feature | Status |
|---------|--------|
| `get`, `get_ref` | Lock-free with version validation |
| `insert` | Fine-grained leaf locking |
| `remove` | Concurrent deletion with memory reclamation |
| `scan`, `scan_ref`, `scan_prefix` | Zero-copy range iteration |
| `scan_values`, `scan_values_rev` | High-throughput value-only scans |
| `DoubleEndedIterator` | Reverse iteration support |
| Leaf coalescing | Lazy queue-based cleanup |
| Memory reclamation | Hyaline scheme via `seize` crate |

**Not yet implemented:** `Entry` API, `Extend`/`FromIterator`.

## vs C++ Masstree (12T, 10s)

Results vary between runs and hardware configurations. The `same` benchmark
(10 hot keys, 12 threads) consistently shows the largest improvement over C++,
achieving **1.7x higher throughput** under extreme contention. Possible factors:

- **Hyaline memory reclamation** — Unlike the C++ epoch-based reclamation (EBR),
  Hyaline (via `seize` crate) allows readers to avoid quiescent state registration
- **Lazy coalescing** — Empty leaves are queued for deferred cleanup rather than
  removed inline, avoiding lock-coupling issues during removes
- **Sharded length counter** — 16 cache-line-aligned shards for `len()` tracking
  (C++ doesn't track global count)
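The sharded counter idea can be sketched with std atomics. This is an illustrative sketch matching the description above (16 cache-line-aligned shards), not the crate's actual internals, and every name in it is hypothetical:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const NUM_SHARDS: usize = 16;

// Align each shard to 64 bytes so neighboring shards never share a
// cache line (64-byte lines are an assumption about the target CPU).
#[repr(align(64))]
struct Shard(AtomicU64);

struct ShardedCounter {
    shards: [Shard; NUM_SHARDS],
}

impl ShardedCounter {
    fn new() -> Self {
        Self { shards: std::array::from_fn(|_| Shard(AtomicU64::new(0))) }
    }

    // Writers pick a shard from their thread ID, so concurrent inserts
    // rarely touch the same cache line.
    fn add(&self, thread_id: usize, delta: u64) {
        self.shards[thread_id % NUM_SHARDS].0.fetch_add(delta, Ordering::Relaxed);
    }

    // len() sums all shards; the total is momentarily approximate
    // while writers are in flight.
    fn len(&self) -> u64 {
        self.shards.iter().map(|s| s.0.load(Ordering::Relaxed)).sum()
    }
}
```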

Note: The optimistic read protocol (version-based OCC) is the original Masstree design,
not a divergence. One minor divergence: `has_changed()` uses `> (LOCK_BIT | INSERTING_BIT)`
instead of C++'s `> lock_bit`, ignoring changes confined to bits 0-1. This is safe because the version
*counters* (VINSERT/VSPLIT) are the source of truth: INSERTING_BIT is only set while
modifications are in flight and not yet visible to readers. See `src/nodeversion.rs:643-673`
for the full safety argument.
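The check can be illustrated with plain bit arithmetic. The bit positions below are chosen for illustration only; the actual layout lives in `src/nodeversion.rs`:

```rust
// Illustrative bit layout: bit 0 = lock, bit 1 = inserting; the
// VINSERT/VSPLIT counters occupy higher bits.
const LOCK_BIT: u64 = 1 << 0;
const INSERTING_BIT: u64 = 1 << 1;

// A reader's snapshot counts as "changed" only if bits above the
// lock/inserting pair differ: XOR exposes every differing bit, and a
// result greater than the two-bit mask means some counter bit moved.
fn has_changed(snapshot: u64, current: u64) -> bool {
    (snapshot ^ current) > (LOCK_BIT | INSERTING_BIT)
}
```

A writer toggling only the lock or inserting bits therefore does not invalidate readers; any counter increment does.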

The forward-sequential gap (rw3) has narrowed (Rust throughput rose from 57% to 81% of C++) but remains under investigation.

| Benchmark | Rust | C++ | Ratio | Winner |
|-----------|------|-----|-------|--------|
| **rw4** (reverse-seq) | 59.00 | 48.14 | **123%** | **Rust** |
| **same** (10 hot keys) | 3.56 | 2.09 | **170%** | **Rust** |
| **rw2g98** (98% reads) | 25.81 | 23.04 | **112%** | **Rust** |
| **uscale** (random 140M) | 11.05 | 10.58 | **104%** | **Rust** |
| **wscale** (wide random) | 9.56 | 9.03 | **106%** | **Rust** |
| **rw1** (random insert+read) | 11.01 | 11.23 | 98% | Tie |
| **rw3** (forward-seq) | 40.54 | 50.34 | 81% | C++ |

## vs Rust Concurrent Maps (6T Physical, Rigorous)

> Source: `runs/run136_read_write.txt`
> **Config:** Physical cores only, 200 samples, performance governor.

This can be considered the current baseline.

Note: MassTree's `insert()` has upsert semantics: it updates existing keys and returns the old
value. TreeIndex's `insert()` fails on existing keys, requiring a `remove()`+`insert()` fallback.
Pure-insert benchmarks (13, 14) use fresh keys only, providing a fairer comparison for
insert-heavy workloads, where TreeIndex performs better.

Note: Recent optimizations (batched timeout checks, hybrid spin-yield backoff, countdown-based guard
refresh) improved throughput by +18% on average. Notable gains: zipfian +30%, mixed 50/50 +34%,
pure read +28%. The `insert_only_fair` benchmark flipped from a loss (0.92x) to a win (1.14x).

| Benchmark | masstree15 | tree_index | skipmap | indexset | MT vs Best |
|-----------|-----------|------------|---------|----------|------------|
| 01_uniform | **28.13** | 15.44 | 9.49 | 12.66 | **1.82x** |
| 02_zipfian | **33.98** | 11.73 | 10.58 | 4.90 | **2.90x** |
| 03_shared_prefix | **18.62** | 8.74 | 8.83 | 12.92 | **1.44x** |
| 04_high_contention | **65.24** | 15.82 | 13.07 | 3.48 | **4.12x** |
| 05_large_dataset | **13.02** | 9.33 | 6.87 | 7.88 | **1.40x** |
| 06_single_hot_key | **20.19** | 4.44 | 6.01 | 3.82 | **3.36x** |
| 07_mixed_50_50 | **24.74** | 5.69 | 5.18 | 12.09 | **2.05x** |
| 08_8byte_keys | **44.71** | 23.41 | 11.83 | 15.97 | **1.91x** |
| 09_pure_read | **42.52** | 22.69 | 14.19 | 13.77 | **1.87x** |
| 10_remove_heavy | **21.70** | 11.57 | 5.50 | 3.68 | **1.88x** |
| 13_insert_only_fair | **22.53** | 19.68 | 11.32 | 5.64 | **1.14x** |
| 14_pure_insert | 9.30 | **12.43** | 6.35 | 2.30 | 0.75x |

**Single-thread latency:** masstree15 achieves **864 µs** median read latency vs tree_index 1.31 ms (**1.52x faster**).

**Build time:** masstree15 builds at **8.56 Mitem/s** vs skipmap 6.33, tree_index 4.46, indexset 1.85 (**1.35–4.6x faster**).

## Range Scans (6T Physical, Rigorous)

> Source: `runs/run139_range_scan_optimized.txt` (inline-optimized)
> **Config:** Physical cores only, 100 samples, performance governor

| Benchmark | masstree15_inline | tree_index | MT vs TI | vs run137 |
|-----------|-------------------|------------|----------|-----------|
| 01_sequential_full_scan | **32.29** | 15.37 | **2.10x** | +79% |
| 02_reverse_scan | **23.50** | 15.31 | **1.53x** | +34% |
| 03_clustered_scan | **31.61** | 15.25 | **2.07x** | +76% |
| 04_sparse_scan | **32.19** | 11.60 | **2.77x** | +79% |
| 05_shared_prefix_scan | **26.38** | 12.97 | **2.03x** | +59% |
| 06_suffix_differ_scan | **15.86** | 12.16 | **1.30x** | -30% |
| 07_hierarchical_scan | **17.43** | 11.95 | **1.46x** | +56% |
| 08_adversarial_splits | **18.75** | 6.71 | **2.79x** | +4% |
| 09_interleaved_scan | **16.27** | 9.51 | **1.71x** | -2% |
| 10_blink_stress_scan | **20.75** | 9.81 | **2.11x** | +16% |
| 11_random_keys_scan | **20.85** | 10.80 | **1.93x** | +17% |
| 12_long_keys_64b_scan | **19.21** | 12.46 | **1.54x** | +94% |
| 15_full_scan_aggregate | **1.70 G** | 1.04 G | **1.64x** | — |
| 16_insert_heavy | **22.60** | 15.93 | **1.42x** | new |
| 17_hot_spot | 6.03 | **14.87** | 0.41x | new |

## Install

```toml
[dependencies]
masstree = { version = "0.6.2", features = ["mimalloc"] }
```

MSRV is Rust 1.92+ (Edition 2024).

The `mimalloc` feature sets the global allocator. If your project already uses a custom allocator, omit this feature.

## Quick Start

```rust
use masstree::MassTree;

let tree: MassTree<u64> = MassTree::new();
let guard = tree.guard();

// Insert
tree.insert_with_guard(b"hello", 123, &guard).unwrap();
tree.insert_with_guard(b"world", 456, &guard).unwrap();

// Point lookup
assert_eq!(tree.get_ref(b"hello", &guard), Some(&123));

// Remove
tree.remove_with_guard(b"hello", &guard).unwrap();
assert_eq!(tree.get_ref(b"hello", &guard), None);

// Range scan (zero-copy)
tree.scan_ref(b"a"..b"z", |key, value| {
    println!("{:?} -> {}", key, value);
    true // continue scanning
}, &guard);

// Prefix scan
tree.scan_prefix(b"wor", |key, value| {
    println!("{:?} -> {}", key, value);
    true
}, &guard);
```

## Ergonomic APIs

For simpler use cases, auto-guard versions create guards internally:

```rust
use masstree::MassTree;

let tree: MassTree<u64> = MassTree::new();

// Auto-guard versions (simpler but slightly more overhead per call)
tree.insert(b"key1", 100).unwrap();
tree.insert(b"key2", 200).unwrap();

assert_eq!(tree.get(b"key1"), Some(std::sync::Arc::new(100)));
assert_eq!(tree.len(), 2);
assert!(!tree.is_empty());

tree.remove(b"key1").unwrap();
```

### Range Iteration

```rust
use masstree::{MassTree, RangeBound};

let tree: MassTree<u64> = MassTree::new();
let guard = tree.guard();

// Populate
for i in 0..100u64 {
    tree.insert_with_guard(&i.to_be_bytes(), i, &guard).unwrap();
}

// Iterator-based range scan
for entry in tree.range(RangeBound::Included(b""), RangeBound::Unbounded, &guard) {
    println!("{:?} -> {:?}", entry.key(), entry.value());
}

// Full iteration
for entry in tree.iter(&guard) {
    println!("{:?}", entry.key());
}
```

## When to Use

**May work well for:**

- Long keys with shared prefixes (URLs, file paths, UUIDs)
- Range scans over ordered data
- Mixed read/write workloads
- High-contention scenarios (the trie structure helps here)

**Consider alternatives for:**

- Unordered point lookups → `dashmap`
- Pure insert-only workloads → `scc::TreeIndex`
- Integer keys only → `congee` (ART-based)
- Read-heavy with rare writes → `RwLock<BTreeMap>`

## Variant Selection

Two node widths are provided, each with Arc-based and inline (`Copy`-only) storage variants:

| Variant | Best For |
|---------|----------|
| `MassTree15` | Range scans, writes, shared-prefix keys, contention |
| `MassTree` (WIDTH=24) | Random-access reads, single-threaded point ops |

`MassTree15` tends to perform better in our benchmarks due to cheaper u64 atomics and better cache utilization. Consider it for most workloads unless you have uniform random-access patterns.

```rust
use masstree::{MassTree, MassTree15, MassTree24Inline, MassTree15Inline};

// Default: WIDTH=24, Arc-based storage
let tree: MassTree<u64> = MassTree::new();

// WIDTH=15, Arc-based storage (recommended for most workloads)
let tree15: MassTree15<u64> = MassTree15::new();

// Inline storage for Copy types (no Arc overhead)
let inline: MassTree24Inline<u64> = MassTree24Inline::new();
let inline15: MassTree15Inline<u64> = MassTree15Inline::new();
```

## How It Works

Masstree splits keys into 8-byte chunks, creating a trie where each node is a B+tree:

```text
Key: "users/alice/profile" (19 bytes)
     └─ Layer 0: "users/al" (8 bytes)
        └─ Layer 1: "ice/prof" (8 bytes)
           └─ Layer 2: "ile" (3 bytes)
```

Keys with shared prefixes share upper layers, making lookups efficient for hierarchical data.
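The layer split is plain fixed-width chunking, which can be sketched with `slice::chunks` (the function name here is hypothetical, not the crate's API):

```rust
// Split a key into the 8-byte slices that index successive trie layers;
// the final chunk may be shorter than 8 bytes.
fn key_layers(key: &[u8]) -> Vec<&[u8]> {
    key.chunks(8).collect()
}
```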

## Examples

The `examples/` directory contains comprehensive usage examples:

```bash
cargo run --example basic_usage --release      # Core API walkthrough
cargo run --example rayon_parallel --release   # Parallel processing with Rayon
cargo run --example tokio_async --release      # Async integration with Tokio
cargo run --example url_cache --release        # Real-world URL cache
cargo run --example session_store --release    # Concurrent session store
```

### Rayon Integration

MassTree works seamlessly with Rayon for parallel bulk operations:

```rust
use masstree::MassTree15Inline;
use rayon::prelude::*;
use std::sync::Arc;

let tree: Arc<MassTree15Inline<u64>> = Arc::new(MassTree15Inline::new());

// Parallel bulk insert (~10M ops/sec)
(0..1_000_000).into_par_iter().for_each(|i| {
    let key = format!("key/{i:08}");
    let guard = tree.guard();
    let _ = tree.insert_with_guard(key.as_bytes(), i, &guard);
});

// Parallel lookups (~45M ops/sec)
let sum: u64 = (0..1_000_000).into_par_iter()
    .map(|i| {
        let key = format!("key/{i:08}");
        let guard = tree.guard();
        tree.get_with_guard(key.as_bytes(), &guard).unwrap_or(0)
    })
    .sum();
```

### Tokio Integration

MassTree is thread-safe but guards cannot be held across `.await` points:

```rust
use masstree::MassTree15;
use std::sync::Arc;
use std::time::Duration;

let tree: Arc<MassTree15<String>> = Arc::new(MassTree15::new());

// Spawn async tasks that share the tree
let handle = tokio::spawn({
    let tree = Arc::clone(&tree);
    async move {
        // Guard must be scoped - cannot be held across await!
        {
            let guard = tree.guard();
            let _ = tree.insert_with_guard(b"key", "value".to_string(), &guard);
        } // guard dropped here

        tokio::time::sleep(Duration::from_millis(10)).await;

        // Create new guard after await
        let guard = tree.guard();
        tree.get_with_guard(b"key", &guard)
    }
});

// For CPU-intensive operations, use spawn_blocking
let tree_clone = Arc::clone(&tree);
tokio::task::spawn_blocking(move || {
    let guard = tree_clone.guard();
    for entry in tree_clone.iter(&guard) {
        // Process entries...
    }
}).await;
```

## Crate Features

- `mimalloc` — Use mimalloc as global allocator (recommended)
- `tracing` — Enable structured logging to `logs/masstree.jsonl`

## License

MIT. See `LICENSE`.

## References

- [Masstree Paper (EuroSys 2012)](https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf)
- [C++ Reference Implementation](https://github.com/kohler/masstree-beta)