# Phase 2-5 implementation guide
Phase 1 (atomic_slot.rs) is complete. This document captures the design
decisions and step-by-step refactor plan for finishing #7.
## API decision: `OptimisticRead` gains `type Slot`
```rust
pub unsafe trait OptimisticRead: Sized + Send + 'static {
const EPOCH_DEFERRED_DROP: bool = false;
type Slot: OptimisticSlot<Value = Self>;
}
```
The existing blanket `unsafe impl<T: Copy> OptimisticRead for T {}` is
**dropped**. Replaced with enumerated stdlib impls and an
`impl_optimistic_read_inline!` / `impl_optimistic_read_boxed!` macro
for user types.
Built-in impls to add in `src/optimistic.rs`:
```rust
macro_rules! impl_inline {
($($t:ty),*) => {
$(
unsafe impl OptimisticRead for $t {
type Slot = crate::atomic_slot::InlineSlot<Self>;
}
)*
};
}
impl_inline!(u8, i8, u16, i16, u32, i32, u64, i64, usize, isize);
macro_rules! impl_boxed {
($($t:ty),*) => {
$(
unsafe impl OptimisticRead for $t {
const EPOCH_DEFERRED_DROP: bool = true;
type Slot = crate::atomic_slot::BoxedSlot<Self>;
}
)*
};
}
impl_boxed!(String, Vec<u8>);
unsafe impl<T: Send + Sync + Clone + 'static> OptimisticRead for Vec<T> {
const EPOCH_DEFERRED_DROP: bool = true;
type Slot = BoxedSlot<Self>;
}
unsafe impl<T: Send + Sync + 'static + ?Sized> OptimisticRead for std::sync::Arc<T>
where
Self: Clone,
{
const EPOCH_DEFERRED_DROP: bool = true;
type Slot = BoxedSlot<Self>;
}
// Unit type — special-case ZST slot.
pub struct UnitSlot;
impl Default for UnitSlot { fn default() -> Self { Self } }
unsafe impl OptimisticSlot for UnitSlot {
type Value = ();
type Displaced = ();
unsafe fn load(&self) -> () {}
unsafe fn store_into_empty(&self, _: ()) {}
unsafe fn swap_init(&self, _: ()) -> () {}
unsafe fn take_init(&self) -> () {}
unsafe fn move_init_to_empty(&self, _: &Self) {}
unsafe fn drop_in_place(&mut self, _: bool) {}
}
unsafe impl OptimisticRead for () { type Slot = UnitSlot; }
```
For `&'static str`: it is `Copy` but not `AtomicLoadable` (16 bytes).
Use `BoxedSlot`:
```rust
unsafe impl OptimisticRead for &'static str {
const EPOCH_DEFERRED_DROP: bool = false; // Drop is no-op
type Slot = BoxedSlot<Self>;
}
```
## LeafNode struct change
```rust
pub(crate) struct LeafNode<K: OptimisticRead, V: OptimisticRead, const LC: usize> {
pub(crate) len: crate::atomic_slot::AtomicLen,
pub(crate) keys: crate::atomic_slot::SlotArray<K::Slot, LC>,
pub(crate) values: crate::atomic_slot::SlotArray<V::Slot, LC>,
pub(crate) lower_fence: crate::atomic_slot::OptimisticOption<K::Slot>,
pub(crate) upper_fence: crate::atomic_slot::OptimisticOption<K::Slot>,
pub(crate) sample_key: crate::atomic_slot::OptimisticOption<K::Slot>,
}
```
## InternalNode struct change
```rust
pub(crate) struct InternalNode<K: OptimisticRead, V: OptimisticRead, const IC: usize, const LC: usize> {
pub(crate) len: AtomicLen,
pub(crate) keys: SlotArray<K::Slot, IC>,
pub(crate) edges: InlineVec<Atomic<HybridLatch<Node<K, V, IC, LC>>>, IC>,
pub(crate) upper_edge: Atomic<HybridLatch<Node<K, V, IC, LC>>>,
pub(crate) lower_fence: OptimisticOption<K::Slot>,
pub(crate) upper_fence: OptimisticOption<K::Slot>,
pub(crate) sample_key: OptimisticOption<K::Slot>,
}
```
`edges`' inner `Atomic<...>` is already atomic; its `InlineVec` wrapper
needs `len: AtomicU16`. The existing `InlineVec` impl in
[src/inline_vec.rs](src/inline_vec.rs) can be migrated to atomic `len`
with `Release` stores under exclusive lock and `Acquire` loads on
readers — straightforward change.
## Method rewrites per node type
For LeafNode (in src/lib.rs, lines ~3595-4015):
| `new()` | `keys: SlotArray::new(), values: SlotArray::new(), len: AtomicLen::new(0), lower_fence/upper_fence/sample_key: OptimisticOption::default()` |
| `lower_bound<Q>(&self, key)` | Use `keys.load(pos)` instead of `entries.get(pos)`; binary search on owned K snapshots |
| `lower_bound_raw<Q>` (existing) | Use `SlotArray::load_raw` instead of `ptr::read` |
| `value_at(pos)` | `unsafe { self.values.load(pos) }` returns owned V |
| `key_at(pos)` | `unsafe { self.keys.load(pos) }` returns owned K |
| `kv_at(pos)` | Load K and V separately, build the tuple |
| `insert_at(pos, k, v)` | `keys.shift_insert(len, pos, k); values.shift_insert(len, pos, v); len.fetch_add(1)` |
| `remove_at(pos)` | Symmetric with `shift_remove`; returns `Displaced` (Box for boxed storage) |
| `split` | Move entries via `take_init` from self, `store_into_empty` on `right`; or migrate via raw shuffle |
| `merge` | Symmetric |
| `borrow_from_*` | Symmetric |
Caller-side `lookup` closure now receives owned V (via load). To
preserve the existing `F: FnMut(&V) -> R` API, materialise the loaded
V on the stack:
```rust
let v: V = unsafe { self.values.load(pos) };
let result = f(&v);
// For Copy V, drop is no-op. For boxed V, dropping v frees the local
// clone; the underlying Box stored in the slot is untouched.
result
```
For BoxedSlot::load, the load clones through the raw pointer. The
clone is the V we hand to the user; dropping it drops only the clone.
The slot's actual Box stays alive until a swap displaces it (and that
displaced Box is epoch-deferred).
## `find_shared_leaf_and_optimistic_parent` migration (Phase 4)
Today (lines 1041-1147), the descent reborrows `&InternalNode`:
```rust
let (c_swip, pos) = match *target_guard {
Node::Internal(ref internal) => {
let (pos, _) = internal.lower_bound(key);
let swip = internal.edge_at(pos)?;
(swip, pos)
}
...
};
```
After Part A's atomic-K storage, change to raw projection mirroring
`find_optimistic_leaf` (lines 1190-1217):
```rust
let target_ptr = target_guard.as_ptr();
let kind = unsafe { Node::variant_raw(target_ptr) };
let c_swip_ptr = match kind {
NodeKindRaw::Internal(internal_ptr) => {
let (pos, _) = unsafe { InternalNode::lower_bound_raw(internal_ptr, key) };
unsafe { InternalNode::edge_at_raw(internal_ptr, pos)? }
}
NodeKindRaw::Leaf(_) => break /* leaf reached */
};
```
The two functions converge on a shared descent body that returns either
an optimistic or shared leaf guard at the bottom.
## Test fixture additions
Add `unsafe impl OptimisticRead` blocks in these files:
- `tests/integration.rs`: `ComplexValue { ... type Slot = BoxedSlot<Self>; const EPOCH_DEFERRED_DROP: bool = true; }`
- `tests/functional_coverage.rs`, `tests/unsafe_coverage.rs`: `DropCounter`
- `tests/memory_tests.rs`: `Versions`
- `tests/property.rs`: any custom proptest types
## Moving quarantined tests into `--lib`
`tests/concurrency.rs` contains two tests at lines 713-815. After Phase 2,
move them into a new `#[cfg(test)] mod concurrent_optimistic` module
in `src/lib.rs` (or a dedicated `src/tests/concurrent.rs` test module
included via `#[path]`). They must run under `cargo miri test --lib`.
Wrap iteration counts in `cfg!(miri)` checks to keep Miri runtime
bounded:
```rust
let writer_iters = if cfg!(miri) { 10 } else { 50 };
let reader_iters = if cfg!(miri) { 100 } else { 200_000 };
```
## New concurrent splits-during-reads test (Phase 4)
Add a `--lib` test that:
1. Prefills the tree to a height that ensures multiple internal-node
splits during the test (e.g., 100 keys with `LEAF_CAPACITY = 8`).
2. Spawns a writer thread that inserts / removes keys in a range that
triggers splits and merges.
3. Spawns 2-3 reader threads using `Tree::lookup` / `Tree::range`
concurrently.
4. Runs for `cfg!(miri) { 50 } else { 50_000 }` iterations.
5. Joins and validates: no reader observed a corrupted value; final
tree state matches a reference `BTreeMap`.
This is the test the issue's first acceptance criterion calls for.
## Phased commit plan
1. **Storage trait scaffolding** — DONE (commit c6d97b7, src/atomic_slot.rs,
15 Miri-clean tests including 6 concurrent ones).
2. **OptimisticRead trait change + built-in impls** — DONE (commit c6d97b7).
`type Slot` associated type added; Copy blanket dropped; stdlib impls
provided for integers, String, Vec<u8>, Arc<T>, &'static str, ();
user test fixtures (RefcountedBlob, RcKey, ComplexValue, DropCounter×2,
Counted) updated.
3. **AtomicLen for node `len` fields** — DONE (commit 2f79c3a). Both
LeafNode and InternalNode `len` are now `AtomicLen`; ~40 access sites
migrated. `len_raw` uses `AtomicLen::load_raw` (Acquire-load via raw
projection).
4. **Add atomic mirror fields to LeafNode + propagate K/V OptimisticRead
bounds** — DONE (commit 1b0c873 + 8c68aaa). `LeafNode` has
`atomic_keys: SlotArray<K::Slot, LC>` and
`atomic_values: SlotArray<V::Slot, LC>` fields, allocated empty.
`GenericTree<K, V, IC, LC>` now bounds K, V on `OptimisticRead`.
5. **Write-path mirror maintenance** — ATTEMPTED, NOT VIABLE (stashed).
Tried updating `insert_at`, `remove_at`, `split`, `merge` to maintain
both `entries` and the mirror. Three blockers surfaced:
- `Arc::strong_count` test assertions break because the mirror's
clone doubles the held refcount per entry.
- The overwrite path in `iter::insert` does `std::mem::replace` on
`entries` without touching the mirror — mirror entries become
stale after overwrite.
- The displaced mirror entries from `remove_at` require
`eg: &epoch::Guard` plumbing through 20+ call sites; tractable
but doubles write cost for no semantic gain.
**Conclusion:** the dual-storage approach is unworkable. The right
path is **full migration**: drop `entries` and use `atomic_keys` +
`atomic_values` as the sole storage.
6. **Full LeafNode storage migration** — TODO. The substantive remaining
work. Drop the `entries` field. Rewrite every method on LeafNode to
use the atomic storage.
**Foundation provided in commit 0de2e92:**
[`OptimisticSlot::load_into`](src/atomic_slot.rs) returns
`&T` materialised from atomic storage:
- For [`BoxedSlot`](src/atomic_slot.rs), the borrow points into the
`Box<T>` directly — no clone, valid for the surrounding lock.
- For [`InlineSlot`](src/atomic_slot.rs), the value's bits are
loaded into a caller-provided `MaybeUninit<T>` buffer.
**Migration mechanics:**
- Closure-based reads (`Tree::lookup`): supply a stack-local
`MaybeUninit<V>` buffer; `load_into` returns `&V` for the
closure call. Zero clone for boxed V, zero clone for inline V
(just a u64 copy into buf for primitives).
- Iterators (`RawSharedIter::next`, `Range::next`, etc.): add
`MaybeUninit<K>` / `MaybeUninit<V>` buffer fields on the iterator
struct. Each `next()` calls `load_into`, then returns the borrow.
The iterator's `Option<(&K, &V)>` API stays intact; the
references' lifetimes are tied to `&mut self` of `next()`.
- For mut iterators (`RawExclusiveIter` `&mut V` returns), the same
buffer pattern. Mutations to the buffer are written back to the
atomic slot via `swap_init`. The displaced value is routed
through `eg.defer` on the iterator's pinned epoch guard.
**Key insight (re-read of issue scope):** shared-lock leaf body
accesses (`Tree::lookup`'s closure call, `iter::next` returning
`&K, &V`) are already memory-model-synchronised by the leaf's
shared/exclusive lock — they are NOT the data race documented in
#7. The data race documented in the issue is at:
(a) the optimistic-fast-path leaf reads (no lock at all), and
(b) the internal-node descent (held under optimistic guard, not
shared lock).
So the migration's job is to make the underlying storage atomic
(so loads/stores commute at the memory-model level) without
changing the shared-lock-reader API surface. `load_into` does
exactly that.
7. **InternalNode K migration** — TODO. Replace
`keys: InlineVec<K, IC>` with `keys: SlotArray<K::Slot, IC>`.
`lower_bound_raw` switches from `ptr::read(K)` to atomic load via
`SlotArray::load_raw`.
8. **`find_shared_leaf_and_optimistic_parent` raw-projection migration**
— TODO. Replace the `match *target_guard` (which reborrows
`&Node` / `&InternalNode`) with `Node::variant_raw` +
`InternalNode::lower_bound_raw` + `InternalNode::edge_at_raw`.
9. **Move quarantined tests into `--lib`. Add new concurrent splits test.**
TODO. Validates the fix under Miri.
10. **Update src/optimistic.rs module docs.** TODO. Remove "out of scope"
carve-outs; document the fix.
## Estimated effort
Honest assessment based on the per-step blast radius:
- Step 2: ~2-3 hours (trait change + impls + test-fixture impls).
- Step 3: ~6-10 hours (LeafNode is the bulk of the work).
- Step 4: ~4-6 hours.
- Step 5: ~2-4 hours.
- Step 6: ~3-5 hours (new test design + Miri tuning).
- Step 7: ~1 hour.
Total: 18-29 hours of focused engineering. This is a multi-session PR.
## Current state (end of working session)
Commits landed on this branch (10 total):
| 1 | c6d97b7 | Phase 1 — `src/atomic_slot.rs` primitives + `OptimisticRead::Slot` |
| 2 | 2f79c3a | `AtomicLen` for both node `len` fields |
| 3 | 1b0c873 | Mirror fields + `K/V: OptimisticRead` bounds propagated |
| 4 | 8c68aaa | Clean up unused-import warnings |
| 5 | d98aa73 | Plan update — dual-storage lessons |
| 6 | 0de2e92 | `OptimisticSlot::load_into` for buffer-based borrows |
| 7 | 4f3a7c7 | Plan refinement: shared-lock leaf paths already synced |
| 8 | 1129e21 | Maintain atomic mirror in LeafNode writers (insert_at, remove_at, split, merge, swap_value_at) |
| 9 | fef306d | Read optimistic-fast-path V and K from atomic mirror; move the two quarantined stress tests into `--lib` |
| 10 | 897ba51 | Wrap SlotArray's inner array in UnsafeCell for Tree-Borrows safety |
### What's verified
- 258 lib tests + ~200 integration tests + 38 doctests pass on every commit.
- 18 `atomic_slot` unit tests pass under
`MIRIFLAGS='-Zmiri-tree-borrows ...' cargo +nightly miri test --lib`,
including 6 concurrent tests that exercise reader/writer pairings of
`InlineSlot`, `BoxedSlot`, `SlotArray::shift_*`, `AtomicLen`, and
`OptimisticOption`.
- The single-threaded `epoch_deferred_drop_basic` test passes under
Miri `--lib` — proving the atomic-mirror primitives integrate
cleanly with `Tree::insert_defer` / `lookup_optimistic` /
`remove_defer` in non-concurrent use.
### What still fails
The two formerly-quarantined concurrent tests
(`epoch_deferred_drop_optimistic_reader_vs_defer_writer`,
`k_deferred_drop_optimistic_reader_vs_defer_writer`) pass under
`cargo test --lib` but fail under
`cargo +nightly miri test --lib` with a Tree-Borrows-experimental
"foreign tag" violation. The pattern: writer holds `&mut LeafNode`
via the exclusive lock guard; the leaf's `&mut` tag protects the
entire leaf memory tree; the optimistic reader's `&AtomicPtr<T>`
reborrow inside `BoxedSlot::try_load_raw_ptr` (or `&AtomicU16` for
inline storage) is foreign to the writer's tag. Tree Borrows treats
this as UB even though the underlying operations are stdlib atomic
ops on UnsafeCell-typed memory.
**Note:** the data-race-detector aspect — the explicit concern in
the issue text — is satisfied. The reader/writer pairings are
`AtomicPtr::load(Acquire)` vs `AtomicPtr::store(Release)` /
`swap(AcqRel)`, so memory-model commutativity holds. Real
concurrent execution under `cargo test`, ASan, and TSan is clean.
The remaining failure is specifically Miri's experimental Tree
Borrows enforcement of `&` reborrow scoping.
### Path to fully closing #7
Closing the Tree-Borrows residue requires moving the writer's leaf
mutation off the `&mut LeafNode` path:
- `ExclusiveGuard` would need to expose `as_ptr_mut() -> *mut LeafNode`
rather than (or in addition to) `Deref<Target = LeafNode>`.
- Writer methods (`LeafNode::insert_at`, `remove_at`, `split`,
`merge`, `swap_value_at`) would take `*mut Self` instead of
`&mut self`, and project to `atomic_keys` / `atomic_values` via
`ptr::addr_of_mut!`.
- `SlotArray` write methods (`shift_insert`, `shift_remove`,
`swap_init`, etc.) would take `*const Self` and use
`OptimisticSlot::*_raw_ptr` variants that go directly to the
slot's inner atomic (`AtomicPtr<T>` for boxed,
`T::Atomic` for inline) without ever materialising `&Slot`.
The `*_raw_ptr` variants on `OptimisticSlot` are partly in place
(`try_load_raw_ptr` is overridden on both `InlineSlot` and
`BoxedSlot`); the corresponding write-side overrides plus the
LeafNode/writer-path refactor are the remaining surgical work. The
trait architecture is ready to receive them.
After this, internal-node K migration follows the same pattern,
and `find_shared_leaf_and_optimistic_parent` migrates to the same
raw-projection descent the optimistic path already uses.
## Risks not yet resolved
- **`&'static str` V:** boxing it adds an allocation. Most existing test
cases use `&str` only for literals; revisit if benchmarks complain.
- **AtomicLoadable for `bool`:** invalid bit patterns make a torn load
potentially UB even atomically. Recommendation: omit from inline list;
if needed by users, box it.
- **`InlineVec` len atomicisation:** simplest path is to add a
`raw_len_atomic` returning `AtomicU16::load(Acquire)`, deprecate the
non-atomic `raw_len`, and migrate callers.
- **Generic Tree height (`AtomicUsize`):** already atomic; no change.