masstree 0.7.2

A high-performance concurrent ordered map (trie of B+trees)
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
# masstree

A high-performance concurrent ordered map for Rust. It stores keys as `&[u8]` and supports variable-length keys by building a trie of B+trees, based on the [Masstree paper](https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf)

**Disclaimer:** This is an independent implementation. It is not endorsed by, affiliated with, or connected to the original Masstree authors or their institutions.

## Features

- Ordered map for byte keys (lexicographic ordering)
- Lock-free reads with version validation
- Concurrent inserts and deletes with fine-grained leaf locking
- Zero-copy range scans with `scan_ref` and `scan_prefix`
- High-throughput value-only scans with `scan_values` (skips key materialization)
- Memory reclamation via hyaline scheme (`seize` crate)
- Lazy leaf coalescing for deleted entries
- Extremely high-performance inline variant `MassTree15Inline`, this is only usable on
Copy types.

## Status

**v0.7.2** — Major performance enhancements. Core feature complete. Beats C++ Masstree on 7/8 benchmarks and
Rust alternatives on 12/12 workloads (12T SMT). Passes Miri with strict-provenance flag. Concurrent data structures
require extensive stress testing, the test suite is comprehensive (978 tests total) but edge cases may remain.

| Feature | Status |
|---------|--------|
| `get`, `get_ref` | Lock-free with version validation |
| `insert` | Fine-grained leaf locking |
| `remove` | Concurrent deletion with memory reclamation |
| `scan`, `scan_ref`, `scan_prefix` | Zero-copy range iteration |
| `scan_values`, `scan_values_rev` | High-throughput value-only scans |
| `DoubleEndedIterator` | Reverse iteration support |
| Leaf coalescing | Lazy queue-based cleanup |
| Memory reclamation | Hyaline scheme via `seize` crate |

## vs C++ Masstree (12T, 10s)

Results vary between runs and hardware configurations. The `same` benchmark
(10 hot keys, 12 threads) consistently shows the largest improvement over C++,
achieving **1.7x higher throughput** under extreme contention. Possible factors:

- **Hyaline memory reclamation** — Unlike the C++ epoch-based reclamation (EBR),
  Hyaline (via `seize` crate) allows readers to avoid quiescent state registration
- **Lazy coalescing** — Empty leaves are queued for deferred cleanup rather than
  removed inline, avoiding lock-coupling issues during removes
- **Sharded length counter** — 16 cache-line-aligned shards for `len()` tracking
  (C++ doesn't track global count)

Note: The optimistic read protocol (version-based OCC) is the original Masstree design,
not a divergence. One minor divergence: `has_changed()` uses `> (LOCK_BIT | INSERTING_BIT)`
instead of C++'s `> lock_bit`, ignoring both bits 0-1. This is safe because version
*counters* (VINSERT/VSPLIT) are the source of truth, INSERTING_BIT is only set while
modifications are in-flight and not yet visible to readers. See `src/nodeversion.rs:643-673`
for the full safety argument.

The forward-sequential gap (rw3) narrowed from 57% to 81% but remains under investigation.

| Benchmark | Rust | C++ | Ratio |
|-----------|------|-----|-------|
| **rw4** (reverse-seq) | 59.00 | 48.14 | **123%** |
| **same** (10 hot keys) | 3.56 | 2.09 | **170%** |
| **rw2g98** (98% reads) | 25.81 | 23.04 | **112%** |
| **uscale** (random 140M) | 11.05 | 10.58 | **104%** |
| **wscale** (wide random) | 9.56 | 9.03 | **106%** |
| **rw1** (random insert+read) | 11.01 | 11.23 | 98% |
| **rw3** (forward-seq) | 40.54 | 50.34 | 81% |

## vs Rust Concurrent Maps (6T Physical, Rigorous)

> Source: `runs/run150_read_write_correctness.txt`
> **Config:** Physical cores only, 200 samples, performance governor.

This can be considered the current baseline.

Note: MassTree's `insert()` has upsert semantics, it updates existing keys and returns the old
value. TreeIndex's `insert()` fails on existing keys, requiring a `remove()+insert()` fallback.
Pure insert benchmarks (13, 14) use fresh keys only, providing a fairer comparison for
insert-heavy workloads where TreeIndex performs better.

| Benchmark | masstree15 | tree_index | skipmap | indexset | MT vs Best |
|-----------|-----------|------------|---------|----------|------------|
| 01_uniform | **28.03** | 13.93 | 8.78 | 12.23 | **2.01x** |
| 02_zipfian | **30.89** | 11.63 | 9.90 | 4.20 | **2.66x** |
| 03_shared_prefix | **15.57** | 8.48 | 7.66 | 11.80 | **1.32x** |
| 04_high_contention | **59.10** | 14.78 | 12.94 | 3.47 | **4.00x** |
| 05_large_dataset | **13.76** | 8.98 | 6.71 | 7.68 | **1.53x** |
| 06_single_hot_key | **18.02** | 4.50 | 5.94 | 4.04 | **3.03x** |
| 07_mixed_50_50 | **25.99** | 5.67 | 5.13 | 12.12 | **2.14x** |
| 08_8byte_keys | **43.67** | 21.52 | 11.86 | 16.95 | **2.03x** |
| 09_pure_read | **42.10** | 22.88 | 13.70 | 13.31 | **1.84x** |
| 10_remove_heavy | **15.02** | 11.62 | 5.07 | 3.93 | **1.29x** |
| 13_insert_only_fair | **22.49** | 17.77 | 10.37 | 5.42 | **1.27x** |
| 14_pure_insert | 9.93 | **11.42** | 8.13 | 2.17 | 0.87x |

**Single-thread latency:** masstree15 achieves **836 µs** median read latency vs tree_index 1.35 ms (**1.61x faster**).

**Build time:** masstree15 builds at **8.46 Mitem/s** vs skipmap 6.17, tree_index 4.35, indexset 1.86 (**1.37–4.6x faster**).

## vs Rust Concurrent Maps (12T SMT)

> Source: `runs/run158_read_write.txt`
> **Config:** 12 threads on 6 physical cores (SMT/hyperthreading), 200 samples.

| Benchmark | masstree15 | tree_index | skipmap | indexset | MT vs Best |
|-----------|-----------|------------|---------|----------|------------|
| 01_uniform | **49.85** | 20.64 | 14.14 | 17.73 | **2.42x** |
| 02_zipfian | **43.39** | 18.38 | 14.98 | 3.03 | **2.36x** |
| 03_shared_prefix | **24.86** | 15.17 | 12.98 | 16.37 | **1.52x** |
| 04_high_contention | **76.28** | 16.44 | 17.85 | 1.97 | **4.27x** |
| 05_large_dataset | **20.79** | 12.44 | 10.10 | 11.22 | **1.67x** |
| 06_single_hot_key | **10.66** | 4.29 | 6.51 | 2.39 | **1.64x** |
| 07_mixed_50_50 | **36.60** | 9.89 | 7.20 | 16.89 | **2.17x** |
| 08_8byte_keys | **59.86** | 32.17 | 17.39 | 20.90 | **1.86x** |
| 09_pure_read | **56.25** | 29.26 | 20.64 | 19.09 | **1.92x** |
| 10_remove_heavy | **21.52** | 18.01 | 8.22 | 4.54 | **1.19x** |
| 13_insert_only_fair | **37.25** | 24.25 | 16.72 | 6.12 | **1.54x** |
| 14_pure_insert | **14.33** | 13.67 | 10.52 | 2.50 | **1.05x** |

**Wins 12/12.** Deferred suffix retirement and pre-allocated suffix buffers improved insert
throughput by 20–46%, flipping `14_pure_insert` from a loss (0.75x in run151) to a win.

**Single-thread latency (11):** masstree15 achieves **829 µs** median read latency vs tree_index 1.39 ms (**1.67x faster**).

**Build time (12):** masstree15 builds at **8.14 Mitem/s** vs skipmap 6.29, tree_index 4.33, indexset 1.81 (**1.29–4.5x faster**).

SMT scaling highlights: High-contention workloads benefit most from hyperthreading, with
masstree15 reaching **76.28 Mitem/s** (4.27x vs alternatives). Insert-heavy workloads
(`13_insert_only_fair`, `14_pure_insert`) showed the largest gains from moving heap allocation
and EBR retirement outside the leaf lock critical section.

Note: `06_single_hot_key` regressed from 14.33 to 10.66 Mitem/s (-26%) compared to run151.
With only one key contended across 12 threads, the increased leaf size (896→1152 bytes from
inline suffix capacity doubling) likely worsens cache line sharing. masstree15 still leads
all alternatives (next best: skipmap at 6.51).

## High-Impact Workloads (12T SMT)

> Source: `runs/run154_high_impact_twig_optimization.txt`
> **Config:** 12 threads on 6 physical cores (SMT), 200 samples

Benchmarks targeting Masstree's architectural advantages: long keys, variable-length keys,
hot key patterns, mixed operations, prefix queries, and deep trie traversal.

| Benchmark | masstree15 | indexset | tree_index | skipmap | MT vs Best |
|-----------|------------|----------|------------|---------|------------|
| 01_long_keys_128b | **34.95** | 14.58 | 14.98 | 11.15 | **2.33x** |
| 02_multiple_hot_keys | **40.97** | 14.24 | 12.43 | 13.26 | **2.88x** |
| 03_mixed_get_insert_remove | **27.24** | 6.00 | 11.93 | 8.85 | **2.28x** |
| 04_variable_long_keys | **28.17** | 9.30 | 8.29 | 7.51 | **3.03x** |
| 05_prefix_queries (Kitem/s) | **426.3** | n/a | 14.56 | 140.7 | **3.02x** |
| 06_deep_trie_traversal | **18.16** | 13.77 | 11.16 | 8.84 | **1.32x** |
| 07_deep_trie_read_only | **27.90** | 15.05 | 17.35 | 15.28 | **1.61x** |
| 08_variable_keys_arc | **29.56** | 11.13 | 11.55 | 8.46 | **2.56x** |

**Wins 8/8** with margins from 1.32x to 3.03x.

Key insights:

- **Long keys (128B):** Unique prefixes test suffix handling; Masstree stores suffixes inline
- **Variable keys (64-256B):** Masstree takes `&[u8]` slices; others `clone()` `Vec<u8>`
- **Multiple hot keys:** OCC reads excel under localized contention (8 keys, 80% access)
- **Mixed ops (70/20/10):** `seize`-based reclamation handles concurrent deletes well
- **Prefix queries:** Native `scan_prefix()` vs range simulation (29x faster than tree_index)
- **Deep trie:** Shared prefix chunks force multi-layer descent; narrowest margin (1.32x)

## Range Scans (6T Physical)

> Source: `runs/run161_range_scan.txt`
> **Config:** Physical cores only, 100 samples, performance governor

| Benchmark | masstree15_inline | tree_index | MT vs TI |
|-----------|-------------------|------------|----------|
| 01_sequential_full_scan | **28.42** | 13.47 | **2.11x** |
| 02_reverse_scan | **27.09** | 13.59 | **1.99x** |
| 03_clustered_scan | **29.54** | 13.40 | **2.20x** |
| 04_sparse_scan | **29.43** | 13.39 | **2.20x** |
| 05_shared_prefix_scan | **25.49** | 15.17 | **1.68x** |
| 06_suffix_differ_scan | **22.21** | 15.66 | **1.42x** |
| 07_hierarchical_scan | **27.12** | 15.62 | **1.74x** |
| 08_adversarial_splits | **28.71** | 8.40 | **3.42x** |
| 09_interleaved_scan | **25.42** | 13.55 | **1.88x** |
| 10_blink_stress_scan | **29.31** | 13.58 | **2.16x** |
| 11_random_keys_scan | **29.44** | 13.54 | **2.17x** |
| 12_long_keys_64b_scan | **27.68** | 15.63 | **1.77x** |
| 15_full_scan_aggregate | **178.1** | 83.36 | **2.14x** |
| 16_insert_heavy | **26.93** | 19.02 | **1.42x** |
| 17_hot_spot | **9.51** | 2.99 | **3.18x** |

**Wins 15/15** vs TreeIndex with margins from 1.42x to 3.42x.

## Range Scans (12T SMT)

> Source: `runs/run161_range_scan.txt`
> **Config:** 12 threads on 6 physical cores (SMT), 100 samples

| Benchmark | masstree15_inline | tree_index | MT vs TI |
|-----------|-------------------|------------|----------|
| 01_sequential_full_scan | **30.38** | 15.27 | **1.99x** |
| 02_reverse_scan | **28.51** | 15.14 | **1.88x** |
| 03_clustered_scan | **30.50** | 15.18 | **2.01x** |
| 04_sparse_scan | **30.37** | 15.11 | **2.01x** |
| 05_shared_prefix_scan | **26.12** | 21.48 | **1.22x** |
| 06_suffix_differ_scan | **24.00** | 21.08 | **1.14x** |
| 07_hierarchical_scan | **27.98** | 21.17 | **1.32x** |
| 08_adversarial_splits | **29.23** | 10.04 | **2.91x** |
| 09_interleaved_scan | **26.24** | 15.30 | **1.72x** |
| 10_blink_stress_scan | **30.14** | 15.23 | **1.98x** |
| 11_random_keys_scan | **29.70** | 15.17 | **1.96x** |
| 12_long_keys_64b_scan | **28.25** | 21.33 | **1.32x** |
| 15_full_scan_aggregate | **176.8** | 113.5 | **1.56x** |
| 16_insert_heavy | **30.06** | 25.26 | **1.19x** |
| 17_hot_spot | **9.69** | 4.04 | **2.40x** |

**Wins 15/15** vs TreeIndex. TreeIndex scales better on SMT for shared-prefix and suffix-differ
workloads but Masstree maintains leadership with margins from 1.14x to 2.91x.

## Install

```toml
[dependencies]
masstree = { version = "0.7.2", features = ["mimalloc"] }
```

MSRV is Rust 1.92+ (Edition 2024).

The `mimalloc` feature sets the global allocator. If your project already uses a custom allocator, omit this feature.

## Quick Start

```rust
use masstree::MassTree;

let tree: MassTree<u64> = MassTree::new();
let guard = tree.guard();

// Insert
tree.insert_with_guard(b"hello", 123, &guard).unwrap();
tree.insert_with_guard(b"world", 456, &guard).unwrap();

// Point lookup
assert_eq!(tree.get_ref(b"hello", &guard), Some(&123));

// Remove
tree.remove_with_guard(b"hello", &guard).unwrap();
assert_eq!(tree.get_ref(b"hello", &guard), None);

// Range scan (zero-copy)
tree.scan_ref(b"a"..b"z", |key, value| {
    println!("{:?} -> {}", key, value);
    true // continue scanning
}, &guard);

// Prefix scan
tree.scan_prefix(b"wor", |key, value| {
    println!("{:?} -> {}", key, value);
    true
}, &guard);
```

## Ergonomic APIs

For simpler use cases, auto-guard versions create guards internally:

```rust
use masstree::MassTree;

let tree: MassTree<u64> = MassTree::new();

// Auto-guard versions (simpler but slightly more overhead per call)
tree.insert(b"key1", 100).unwrap();
tree.insert(b"key2", 200).unwrap();

assert_eq!(tree.get(b"key1"), Some(std::sync::Arc::new(100)));
assert_eq!(tree.len(), 2);
assert!(!tree.is_empty());

tree.remove(b"key1").unwrap();
```

### Range Iteration

```rust
use masstree::{MassTree, RangeBound};

let tree: MassTree<u64> = MassTree::new();
let guard = tree.guard();

// Populate
for i in 0..100u64 {
    tree.insert_with_guard(&i.to_be_bytes(), i, &guard).unwrap();
}

// Iterator-based range scan
for entry in tree.range(RangeBound::Included(b""), RangeBound::Unbounded, &guard) {
    println!("{:?} -> {:?}", entry.key(), entry.value());
}

// Full iteration
for entry in tree.iter(&guard) {
    println!("{:?}", entry.key());
}
```

## When to Use

**May work well for:**

- Long keys with shared prefixes (URLs, file paths, UUIDs)
- Range scans over ordered data
- Mixed read/write workloads
- High-contention scenarios (the trie structure helps here)

**Consider alternatives for:**

- Unordered point lookups → `dashmap`
- Integer keys only → `congee` (ART-based)
- Read-heavy with rare writes → `RwLock<BTreeMap>`

## Type Aliases

| Type | Storage | Value Requirement |
|------|---------|-------------------|
| `MassTree<V>` | Inline (default) | `V: InlineBits` (Copy + fits in 64 bits) |
| `MassTree15<V>` | Arc-based | `V: Send + Sync + 'static` |
| `MassTree15Inline<V>` | True inline | `V: InlineBits` |

`MassTree<V>` is the recommended default for `Copy` types like `u64`, `i32`, `f64`, pointers, etc.
Use `MassTree15<V>` explicitly when you need to store non-Copy types like `String`.

```rust
use masstree::{MassTree, MassTree15, MassTree15Inline};

// Default: inline storage for Copy types (recommended)
let tree: MassTree<u64> = MassTree::new();

// Arc-based storage for non-Copy types
let tree_arc: MassTree15<String> = MassTree15::new();

// Explicit inline storage (same as MassTree)
let inline: MassTree15Inline<u64> = MassTree15Inline::new();
```

## How It Works

Masstree splits keys into 8-byte chunks, creating a trie where each node is a B+tree:

```text
Key: "users/alice/profile" (19 bytes)
     └─ Layer 0: "users/al" (8 bytes)
        └─ Layer 1: "ice/prof" (8 bytes)
           └─ Layer 2: "ile" (3 bytes)
```

Keys with shared prefixes share upper layers, making lookups efficient for hierarchical data.

## Examples

The `examples/` directory contains comprehensive usage examples:

```bash
cargo run --example basic_usage --release      # Core API walkthrough
cargo run --example rayon_parallel --release   # Parallel processing with Rayon
cargo run --example tokio_async --release      # Async integration with Tokio
cargo run --example url_cache --release        # Real-world URL cache
cargo run --example session_store --release    # Concurrent session store
```

### Rayon Integration

MassTree works seamlessly with Rayon for parallel bulk operations:

```rust
use masstree::MassTree15Inline;
use rayon::prelude::*;
use std::sync::Arc;

let tree: Arc<MassTree15Inline<u64>> = Arc::new(MassTree15Inline::new());

// Parallel bulk insert (~10M ops/sec)
(0..1_000_000).into_par_iter().for_each(|i| {
    let key = format!("key/{i:08}");
    let guard = tree.guard();
    let _ = tree.insert_with_guard(key.as_bytes(), i, &guard);
});

// Parallel lookups (~45M ops/sec)
let sum: u64 = (0..1_000_000).into_par_iter()
    .map(|i| {
        let key = format!("key/{i:08}");
        let guard = tree.guard();
        tree.get_with_guard(key.as_bytes(), &guard).unwrap_or(0)
    })
    .sum();
```

### Tokio Integration

MassTree is thread-safe but guards cannot be held across `.await` points:

```rust
use masstree::MassTree15;
use std::sync::Arc;

let tree: Arc<MassTree15<String>> = Arc::new(MassTree15::new());

// Spawn async tasks that share the tree
let handle = tokio::spawn({
    let tree = Arc::clone(&tree);
    async move {
        // Guard must be scoped - cannot be held across await!
        {
            let guard = tree.guard();
            let _ = tree.insert_with_guard(b"key", "value".to_string(), &guard);
        } // guard dropped here

        tokio::time::sleep(Duration::from_millis(10)).await;

        // Create new guard after await
        let guard = tree.guard();
        tree.get_with_guard(b"key", &guard)
    }
});

// For CPU-intensive operations, use spawn_blocking
let tree_clone = Arc::clone(&tree);
tokio::task::spawn_blocking(move || {
    let guard = tree_clone.guard();
    for entry in tree_clone.iter(&guard) {
        // Process entries...
    }
}).await;
```

## Crate Features

- `mimalloc` — Use mimalloc as global allocator (recommended)
- `tracing` — Enable structured logging to `logs/masstree.jsonl`

## License

MIT. See `LICENSE`.

## References

- [Masstree Paper (EuroSys 2012)]https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf
- [C++ Reference Implementation]https://github.com/kohler/masstree-beta