ferntree 0.5.0

Concurrent in-memory B+ Tree featuring optimistic lock coupling
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
# Phase 2-5 implementation guide

Phase 1 (atomic_slot.rs) is complete. This document captures the design
decisions and step-by-step refactor plan for finishing #7.

## API decision: `OptimisticRead` gains `type Slot`

```rust
pub unsafe trait OptimisticRead: Sized + Send + 'static {
    const EPOCH_DEFERRED_DROP: bool = false;
    type Slot: OptimisticSlot<Value = Self>;
}
```

The existing blanket `unsafe impl<T: Copy> OptimisticRead for T {}` is
**dropped**. Replaced with enumerated stdlib impls and an
`impl_optimistic_read_inline!` / `impl_optimistic_read_boxed!` macro
for user types.

Built-in impls to add in `src/optimistic.rs`:

```rust
macro_rules! impl_inline {
    ($($t:ty),*) => {
        $(
            unsafe impl OptimisticRead for $t {
                type Slot = crate::atomic_slot::InlineSlot<Self>;
            }
        )*
    };
}
impl_inline!(u8, i8, u16, i16, u32, i32, u64, i64, usize, isize);

macro_rules! impl_boxed {
    ($($t:ty),*) => {
        $(
            unsafe impl OptimisticRead for $t {
                const EPOCH_DEFERRED_DROP: bool = true;
                type Slot = crate::atomic_slot::BoxedSlot<Self>;
            }
        )*
    };
}
impl_boxed!(String, Vec<u8>);

unsafe impl<T: Send + Sync + Clone + 'static> OptimisticRead for Vec<T> {
    const EPOCH_DEFERRED_DROP: bool = true;
    type Slot = BoxedSlot<Self>;
}
unsafe impl<T: Send + Sync + 'static + ?Sized> OptimisticRead for std::sync::Arc<T>
where
    Self: Clone,
{
    const EPOCH_DEFERRED_DROP: bool = true;
    type Slot = BoxedSlot<Self>;
}

// Unit type — special-case ZST slot.
pub struct UnitSlot;
impl Default for UnitSlot { fn default() -> Self { Self } }
unsafe impl OptimisticSlot for UnitSlot {
    type Value = ();
    type Displaced = ();
    unsafe fn load(&self) -> () {}
    unsafe fn store_into_empty(&self, _: ()) {}
    unsafe fn swap_init(&self, _: ()) -> () {}
    unsafe fn take_init(&self) -> () {}
    unsafe fn move_init_to_empty(&self, _: &Self) {}
    unsafe fn drop_in_place(&mut self, _: bool) {}
}
unsafe impl OptimisticRead for () { type Slot = UnitSlot; }
```

For `&'static str`: it is `Copy` but not `AtomicLoadable` (16 bytes).
Use `BoxedSlot`:
```rust
unsafe impl OptimisticRead for &'static str {
    const EPOCH_DEFERRED_DROP: bool = false;  // Drop is no-op
    type Slot = BoxedSlot<Self>;
}
```

## LeafNode struct change

```rust
pub(crate) struct LeafNode<K: OptimisticRead, V: OptimisticRead, const LC: usize> {
    pub(crate) len: crate::atomic_slot::AtomicLen,
    pub(crate) keys:   crate::atomic_slot::SlotArray<K::Slot, LC>,
    pub(crate) values: crate::atomic_slot::SlotArray<V::Slot, LC>,
    pub(crate) lower_fence: crate::atomic_slot::OptimisticOption<K::Slot>,
    pub(crate) upper_fence: crate::atomic_slot::OptimisticOption<K::Slot>,
    pub(crate) sample_key:  crate::atomic_slot::OptimisticOption<K::Slot>,
}
```

## InternalNode struct change

```rust
pub(crate) struct InternalNode<K: OptimisticRead, V: OptimisticRead, const IC: usize, const LC: usize> {
    pub(crate) len: AtomicLen,
    pub(crate) keys: SlotArray<K::Slot, IC>,
    pub(crate) edges: InlineVec<Atomic<HybridLatch<Node<K, V, IC, LC>>>, IC>,
    pub(crate) upper_edge: Atomic<HybridLatch<Node<K, V, IC, LC>>>,
    pub(crate) lower_fence: OptimisticOption<K::Slot>,
    pub(crate) upper_fence: OptimisticOption<K::Slot>,
    pub(crate) sample_key:  OptimisticOption<K::Slot>,
}
```

`edges`' inner `Atomic<...>` is already atomic; its `InlineVec` wrapper
needs `len: AtomicU16`. The existing `InlineVec` impl in
[src/inline_vec.rs](src/inline_vec.rs) can be migrated to atomic `len`
with `Release` stores under exclusive lock and `Acquire` loads on
readers — straightforward change.

## Method rewrites per node type

For LeafNode (in src/lib.rs, lines ~3595-4015):

| Method | New impl |
|---|---|
| `new()` | `keys: SlotArray::new(), values: SlotArray::new(), len: AtomicLen::new(0), lower_fence/upper_fence/sample_key: OptimisticOption::default()` |
| `lower_bound<Q>(&self, key)` | Use `keys.load(pos)` instead of `entries.get(pos)`; binary search on owned K snapshots |
| `lower_bound_raw<Q>` (existing) | Use `SlotArray::load_raw` instead of `ptr::read` |
| `value_at(pos)` | `unsafe { self.values.load(pos) }` returns owned V |
| `key_at(pos)` | `unsafe { self.keys.load(pos) }` returns owned K |
| `kv_at(pos)` | Load K and V separately, build the tuple |
| `insert_at(pos, k, v)` | `keys.shift_insert(len, pos, k); values.shift_insert(len, pos, v); len.fetch_add(1)` |
| `remove_at(pos)` | Symmetric with `shift_remove`; returns `Displaced` (Box for boxed storage) |
| `split` | Move entries via `take_init` from self, `store_into_empty` on `right`; or migrate via raw shuffle |
| `merge` | Symmetric |
| `borrow_from_*` | Symmetric |

Caller-side `lookup` closure now receives owned V (via load). To
preserve the existing `F: FnMut(&V) -> R` API, materialise the loaded
V on the stack:
```rust
let v: V = unsafe { self.values.load(pos) };
let result = f(&v);
// For Copy V, drop is no-op. For boxed V, dropping v frees the local
// clone; the underlying Box stored in the slot is untouched.
result
```

For BoxedSlot::load, the load clones through the raw pointer. The
clone is the V we hand to the user; dropping it drops only the clone.
The slot's actual Box stays alive until a swap displaces it (and that
displaced Box is epoch-deferred).

## `find_shared_leaf_and_optimistic_parent` migration (Phase 4)

Today (lines 1041-1147), the descent reborrows `&InternalNode`:
```rust
let (c_swip, pos) = match *target_guard {
    Node::Internal(ref internal) => {
        let (pos, _) = internal.lower_bound(key);
        let swip = internal.edge_at(pos)?;
        (swip, pos)
    }
    ...
};
```

After Part A's atomic-K storage, change to raw projection mirroring
`find_optimistic_leaf` (lines 1190-1217):
```rust
let target_ptr = target_guard.as_ptr();
let kind = unsafe { Node::variant_raw(target_ptr) };
let c_swip_ptr = match kind {
    NodeKindRaw::Internal(internal_ptr) => {
        let (pos, _) = unsafe { InternalNode::lower_bound_raw(internal_ptr, key) };
        unsafe { InternalNode::edge_at_raw(internal_ptr, pos)? }
    }
    NodeKindRaw::Leaf(_) => break /* leaf reached */
};
```

The two functions converge on a shared descent body that returns either
an optimistic or shared leaf guard at the bottom.

## Test fixture additions

Add `unsafe impl OptimisticRead` blocks in these files:

- `tests/integration.rs`: `ComplexValue { ... type Slot = BoxedSlot<Self>; const EPOCH_DEFERRED_DROP: bool = true; }`
- `tests/functional_coverage.rs`, `tests/unsafe_coverage.rs`: `DropCounter`
- `tests/memory_tests.rs`: `Versions`
- `tests/property.rs`: any custom proptest types

## Moving quarantined tests into `--lib`

`tests/concurrency.rs` contains two tests at lines 713-815. After Phase 2,
move them into a new `#[cfg(test)] mod concurrent_optimistic` module
in `src/lib.rs` (or a dedicated `src/tests/concurrent.rs` test module
included via `#[path]`). They must run under `cargo miri test --lib`.

Wrap iteration counts in `cfg!(miri)` checks to keep Miri runtime
bounded:
```rust
let writer_iters = if cfg!(miri) { 10 } else { 50 };
let reader_iters = if cfg!(miri) { 100 } else { 200_000 };
```

## New concurrent splits-during-reads test (Phase 4)

Add a `--lib` test that:
1. Prefills the tree to a height that ensures multiple internal-node
   splits during the test (e.g., 100 keys with `LEAF_CAPACITY = 8`).
2. Spawns a writer thread that inserts / removes keys in a range that
   triggers splits and merges.
3. Spawns 2-3 reader threads using `Tree::lookup` / `Tree::range`
   concurrently.
4. Runs for `cfg!(miri) { 50 } else { 50_000 }` iterations.
5. Joins and validates: no reader observed a corrupted value; final
   tree state matches a reference `BTreeMap`.

This is the test the issue's first acceptance criterion calls for.

## Phased commit plan

1. **Storage trait scaffolding** — DONE (commit c6d97b7, src/atomic_slot.rs,
   15 Miri-clean tests including 6 concurrent ones).
2. **OptimisticRead trait change + built-in impls** — DONE (commit c6d97b7).
   `type Slot` associated type added; Copy blanket dropped; stdlib impls
   provided for integers, String, Vec<u8>, Arc<T>, &'static str, ();
   user test fixtures (RefcountedBlob, RcKey, ComplexValue, DropCounter×2,
   Counted) updated.
3. **AtomicLen for node `len` fields** — DONE (commit 2f79c3a). Both
   LeafNode and InternalNode `len` are now `AtomicLen`; ~40 access sites
   migrated. `len_raw` uses `AtomicLen::load_raw` (Acquire-load via raw
   projection).
4. **Add atomic mirror fields to LeafNode + propagate K/V OptimisticRead
   bounds** — DONE (commit 1b0c873 + 8c68aaa). `LeafNode` has
   `atomic_keys: SlotArray<K::Slot, LC>` and
   `atomic_values: SlotArray<V::Slot, LC>` fields, allocated empty.
   `GenericTree<K, V, IC, LC>` now bounds K, V on `OptimisticRead`.
5. **Write-path mirror maintenance** — ATTEMPTED, NOT VIABLE (stashed).
   Tried updating `insert_at`, `remove_at`, `split`, `merge` to maintain
   both `entries` and the mirror. Three blockers surfaced:
   - `Arc::strong_count` test assertions break because the mirror's
     clone doubles the held refcount per entry.
   - The overwrite path in `iter::insert` does `std::mem::replace` on
     `entries` without touching the mirror — mirror entries become
     stale after overwrite.
   - The displaced mirror entries from `remove_at` require
     `eg: &epoch::Guard` plumbing through 20+ call sites; tractable
     but doubles write cost for no semantic gain.

   **Conclusion:** the dual-storage approach is unworkable. The right
   path is **full migration**: drop `entries` and use `atomic_keys` +
   `atomic_values` as the sole storage.

6. **Full LeafNode storage migration** — TODO. The substantive remaining
   work. Drop the `entries` field. Rewrite every method on LeafNode to
   use the atomic storage.

   **Foundation provided in commit 0de2e92:**
   [`OptimisticSlot::load_into`]src/atomic_slot.rs returns
   `&T` materialised from atomic storage:
   - For [`BoxedSlot`]src/atomic_slot.rs, the borrow points into the
     `Box<T>` directly — no clone, valid for the surrounding lock.
   - For [`InlineSlot`]src/atomic_slot.rs, the value's bits are
     loaded into a caller-provided `MaybeUninit<T>` buffer.

   **Migration mechanics:**
   - Closure-based reads (`Tree::lookup`): supply a stack-local
     `MaybeUninit<V>` buffer; `load_into` returns `&V` for the
     closure call. Zero clone for boxed V, zero clone for inline V
     (just a u64 copy into buf for primitives).
   - Iterators (`RawSharedIter::next`, `Range::next`, etc.): add
     `MaybeUninit<K>` / `MaybeUninit<V>` buffer fields on the iterator
     struct. Each `next()` calls `load_into`, then returns the borrow.
     The iterator's `Option<(&K, &V)>` API stays intact; the
     references' lifetimes are tied to `&mut self` of `next()`.
   - For mut iterators (`RawExclusiveIter` `&mut V` returns), the same
     buffer pattern. Mutations to the buffer are written back to the
     atomic slot via `swap_init`. The displaced value is routed
     through `eg.defer` on the iterator's pinned epoch guard.

   **Key insight (re-read of issue scope):** shared-lock leaf body
   accesses (`Tree::lookup`'s closure call, `iter::next` returning
   `&K, &V`) are already memory-model-synchronised by the leaf's
   shared/exclusive lock — they are NOT the data race documented in
   #7. The data race documented in the issue is at:
   (a) the optimistic-fast-path leaf reads (no lock at all), and
   (b) the internal-node descent (held under optimistic guard, not
       shared lock).

   So the migration's job is to make the underlying storage atomic
   (so loads/stores commute at the memory-model level) without
   changing the shared-lock-reader API surface. `load_into` does
   exactly that.

7. **InternalNode K migration** — TODO. Replace
   `keys: InlineVec<K, IC>` with `keys: SlotArray<K::Slot, IC>`.
   `lower_bound_raw` switches from `ptr::read(K)` to atomic load via
   `SlotArray::load_raw`.

8. **`find_shared_leaf_and_optimistic_parent` raw-projection migration**
   — TODO. Replace the `match *target_guard` (which reborrows
   `&Node` / `&InternalNode`) with `Node::variant_raw` +
   `InternalNode::lower_bound_raw` + `InternalNode::edge_at_raw`.

9. **Move quarantined tests into `--lib`. Add new concurrent splits test.**
   TODO. Validates the fix under Miri.

10. **Update src/optimistic.rs module docs.** TODO. Remove "out of scope"
    carve-outs; document the fix.

## Estimated effort

Honest assessment based on the per-step blast radius:

- Step 2: ~2-3 hours (trait change + impls + test-fixture impls).
- Step 3: ~6-10 hours (LeafNode is the bulk of the work).
- Step 4: ~4-6 hours.
- Step 5: ~2-4 hours.
- Step 6: ~3-5 hours (new test design + Miri tuning).
- Step 7: ~1 hour.

Total: 18-29 hours of focused engineering. This is a multi-session PR.

## Current state (end of working session)

Commits landed on this branch (10 total):

| # | SHA | What |
|---|---|---|
| 1 | c6d97b7 | Phase 1 — `src/atomic_slot.rs` primitives + `OptimisticRead::Slot` |
| 2 | 2f79c3a | `AtomicLen` for both node `len` fields |
| 3 | 1b0c873 | Mirror fields + `K/V: OptimisticRead` bounds propagated |
| 4 | 8c68aaa | Clean up unused-import warnings |
| 5 | d98aa73 | Plan update — dual-storage lessons |
| 6 | 0de2e92 | `OptimisticSlot::load_into` for buffer-based borrows |
| 7 | 4f3a7c7 | Plan refinement: shared-lock leaf paths already synced |
| 8 | 1129e21 | Maintain atomic mirror in LeafNode writers (insert_at, remove_at, split, merge, swap_value_at) |
| 9 | fef306d | Read optimistic-fast-path V and K from atomic mirror; move the two quarantined stress tests into `--lib` |
| 10 | 897ba51 | Wrap SlotArray's inner array in UnsafeCell for Tree-Borrows safety |

### What's verified

- 258 lib tests + ~200 integration tests + 38 doctests pass on every commit.
- 18 `atomic_slot` unit tests pass under
  `MIRIFLAGS='-Zmiri-tree-borrows ...' cargo +nightly miri test --lib`,
  including 6 concurrent tests that exercise reader/writer pairings of
  `InlineSlot`, `BoxedSlot`, `SlotArray::shift_*`, `AtomicLen`, and
  `OptimisticOption`.
- The single-threaded `epoch_deferred_drop_basic` test passes under
  Miri `--lib` — proving the atomic-mirror primitives integrate
  cleanly with `Tree::insert_defer` / `lookup_optimistic` /
  `remove_defer` in non-concurrent use.

### What still fails

The two formerly-quarantined concurrent tests
(`epoch_deferred_drop_optimistic_reader_vs_defer_writer`,
`k_deferred_drop_optimistic_reader_vs_defer_writer`) pass under
`cargo test --lib` but fail under
`cargo +nightly miri test --lib` with a Tree-Borrows-experimental
"foreign tag" violation. The pattern: writer holds `&mut LeafNode`
via the exclusive lock guard; the leaf's `&mut` tag protects the
entire leaf memory tree; the optimistic reader's `&AtomicPtr<T>`
reborrow inside `BoxedSlot::try_load_raw_ptr` (or `&AtomicU16` for
inline storage) is foreign to the writer's tag. Tree Borrows treats
this as UB even though the underlying operations are stdlib atomic
ops on UnsafeCell-typed memory.

**Note:** the data-race-detector aspect — the explicit concern in
the issue text — is satisfied. The reader/writer pairings are
`AtomicPtr::load(Acquire)` vs `AtomicPtr::store(Release)` /
`swap(AcqRel)`, so memory-model commutativity holds. Real
concurrent execution under `cargo test`, ASan, and TSan is clean.
The remaining failure is specifically Miri's experimental Tree
Borrows enforcement of `&` reborrow scoping.

### Path to fully closing #7

Closing the Tree-Borrows residue requires moving the writer's leaf
mutation off the `&mut LeafNode` path:

- `ExclusiveGuard` would need to expose `as_ptr_mut() -> *mut LeafNode`
  rather than (or in addition to) `Deref<Target = LeafNode>`.
- Writer methods (`LeafNode::insert_at`, `remove_at`, `split`,
  `merge`, `swap_value_at`) would take `*mut Self` instead of
  `&mut self`, and project to `atomic_keys` / `atomic_values` via
  `ptr::addr_of_mut!`.
- `SlotArray` write methods (`shift_insert`, `shift_remove`,
  `swap_init`, etc.) would take `*const Self` and use
  `OptimisticSlot::*_raw_ptr` variants that go directly to the
  slot's inner atomic (`AtomicPtr<T>` for boxed,
  `T::Atomic` for inline) without ever materialising `&Slot`.

The `*_raw_ptr` variants on `OptimisticSlot` are partly in place
(`try_load_raw_ptr` is overridden on both `InlineSlot` and
`BoxedSlot`); the corresponding write-side overrides plus the
LeafNode/writer-path refactor are the remaining surgical work. The
trait architecture is ready to receive them.

After this, internal-node K migration follows the same pattern,
and `find_shared_leaf_and_optimistic_parent` migrates to the same
raw-projection descent the optimistic path already uses.

## Risks not yet resolved

- **`&'static str` V:** boxing it adds an allocation. Most existing test
  cases use `&str` only for literals; revisit if benchmarks complain.
- **AtomicLoadable for `bool`:** invalid bit patterns make a torn load
  potentially UB even atomically. Recommendation: omit from inline list;
  if needed by users, box it.
- **`InlineVec` len atomicisation:** simplest path is to add a
  `raw_len_atomic` returning `AtomicU16::load(Acquire)`, deprecate the
  non-atomic `raw_len`, and migrate callers.
- **Generic Tree height (`AtomicUsize`):** already atomic; no change.