# TODO

- Eliminate vec of chunk mutators. This should be a linked list

- When expanding a vec that's in an oversized chunk, we should return the
previous chunk back to the system allocator instead of holding onto it in the
retired chunk list.

- **Remove the `capacity` field from `LocalChunk` / `SharedChunk` headers**
  (saves 8 bytes per chunk header). The field is redundant with the
  slice-tail length metadata of the DST `data: [UnsafeCell<u8>]`, except on
  the cache-pop path: chunks sit on the provider freelist linked via a
  thin `*mut u8` and `header_to_fat` (`src/internal/local_chunk.rs:185`,
  matching code in `src/internal/shared_chunk.rs`) currently reads the
  stored `capacity` to reconstruct the slice length. The provider already
  pins each freelist to a single class (`local_cache_class` /
  `shared_cache_class` in `src/internal/chunk_provider.rs`), so the size is
  known by context. Thread `SizeClass` through the pop path, reconstruct
  `cap = class.bytes() - header_size()` in `header_to_fat`, do the same
  for `SharedChunk`, and update `destroy` callers (which need the layout)
  to take the class too. Verify with the gungraun benchmarks that the
  header shrink doesn't move alignment / payload-offset and that the
  size-class plumbing is not on a hot path.

- **Batched shared-chunk refcount increments on Arc/Box allocation.** Each
  `Arc`/`Box` allocation currently performs an atomic `fetch_add(1, Relaxed)`
  on the owning `SharedChunk`'s `ref_count` (see `acquire_shared_chunk_ref` in
  `src/arena/alloc_value.rs` calling `SharedChunk::inc_ref` in
  `src/internal/shared_chunk.rs`). The previous generation of the crate
  avoided this by accumulating refcount increments locally in the active
  shared mutator and flushing them to the atomic counter only at
  chunk-transition / mutator-drop time. Bring this back: add a
  `pending_refs: usize` counter to the shared `ChunkMutator`; have
  `acquire_shared_chunk_ref` bump it instead of calling `inc_ref` per
  allocation; flush via a single `fetch_add(pending_refs, Relaxed)` when the
  mutator is uninstalled or dropped. Teardown / `dec_ref` must observe that
  the chunk's effective refcount is `atomic_ref_count + pending_refs` while
  the mutator is installed, and the overflow guard must still trip when the
  combined value would saturate.

## Unsafe-block-reduction opportunities (analysis 2026-06-06)

### Medium confidence (sound, but verify ordering/lifetimes)

- **Hoist `Arc::from_raw` / `Box::from_raw` out of the alloc retry loops.** The
  shared-slice and uninit retry loops construct the smart pointer separately in
  each exit arm (current-chunk fast path, oversized closure, post-refill):
  `src/arena/alloc_slice_arc.rs:170,183,217,228,253`,
  `src/arena/alloc_value.rs:603,625,662,675` (uninit arc/slice-arc), and
  `src/arena/alloc_unsized.rs:237,251` (dst box). Refactor each loop to `break`
  with the raw `NonNull` payload pointer and perform a single
  `Arc::from_raw`/`Box::from_raw` after the loop. CAVEAT: the `ChunkRef::forget()`
  that retains the fresh `+1` refcount, and any `publish_drop_count()` /
  stats recording, must still happen *before* the `break` in every arm so the
  adopted reference survives to the single construction point.
  Estimated: **net -6 blocks** across the three files.

- **Centralize the "initialized `NonNull` -> `&'a mut`" reborrows in
  `uninit.rs`.** `src/internal/uninit.rs:89,146,200,249,339,450` repeat
  `unsafe { ptr.as_mut() }` / `unsafe { &mut *ptr.as_ptr() }` at the end of the
  init paths. Add one private helper `fn initialized_mut<'a, T: ?Sized>(ptr:
  NonNull<T>) -> &'a mut T` that holds the single `unsafe`. CAVEAT: the helper
  forges the `'a` lifetime, so it must stay private to `uninit.rs` and only be
  called after the ticket has been consumed and the value fully initialized
  (same precondition the call sites already satisfy).
  Estimated: **net -5 blocks**.

- **Encapsulate the `DropEntry::placeholder` raw writes in `chunk_mutator.rs`.**
  `src/internal/chunk_mutator.rs:352-354,376-378,424-426,517-519` all do
  `unsafe { core::ptr::write(drop_slot.as_ptr(), DropEntry::placeholder(...)) }`.
  Add a private `fn write_drop_placeholder(drop_slot, value_offset, len)`
  holding the single `unsafe`. CAVEAT: this is a *safe* fn wrapping an unchecked
  write, sound only because every caller passes a freshly reserved, aligned,
  exclusively-owned slot from `try_reserve_drop_entry`; keep it private and
  document that invariant on the helper.
  Estimated: **net -3 blocks**.

- **Drop `chunk_ptr_unchecked` (`unwrap_unchecked`) in favor of an early
  `self.chunk?`.** `src/internal/chunk_mutator.rs:155-157` plus call sites `229`,
  `252` rely on the sentinel "empty mutator" proof to justify
  `unsafe { self.chunk.unwrap_unchecked() }`. Take `let chunk = self.chunk?;`
  up front; the subsequent `try_alloc*` already returns `None` for the empty
  mutator, so behavior is preserved without the unchecked unwrap. CAVEAT: confirm
  the `None`-propagation matches the sentinel behavior exactly for the empty
  mutator before removing the helper.
  Estimated: **net -3 blocks**.

### Lower value

- **`PrefixedUtf16Ptr` newtype** wrapping the length-prefixed `NonNull<u16>` in
  `src/strings/arc_utf16_str.rs:66,76` and `src/strings/box_utf16_str.rs:61,70,98`,
  with `len()` / `as_utf16_str()` / `as_mut_utf16_str()` methods holding the
  `read_prefix_len` + `from_raw_parts` + `from_slice_unchecked` unsafe. Sound only
  if constructed exclusively via the existing unsafe `from_raw` paths.
  Estimated: **net -2 blocks**.

### Deferred (perf-risk)

- **Consolidate try-current/oversized/refill loops.** `impl_alloc_local_with`,
  `impl_alloc_smart_with`, the slice-Arc copy/fill loops, prefixed shared
  loop, UTF-16 transcoding loop, and DST Box/Arc smart loops repeat the
  same "try current reservation; route oversized; refill and retry" shape.
  Each is `#[inline(always)]` on the allocation fast path; structural
  differences (local vs shared, with-drop vs without, ZST/uninit/zeroed
  branches, stats recording, slot-init helpers, different smart-pointer
  constructions) make a single macro/closure abstraction either fragile
  (closure-state capture risks codegen drift) or unwieldy (a macro with
  many positional knobs hurts readability without saving meaningful
  unsafe). Deferred to keep gungraun instruction counts stable. See
  simplification-report item 1.2.

## Simplification opportunities (analysis 2026-06-08)

### High-confidence wins (mechanical, no risk)

- **Dedup the "prefixed slice" arithmetic in `chunk_mutator.rs`.**
  `try_alloc_uninit_slice_prefixed` (`src/internal/chunk_mutator.rs:290-320`)
  and `try_alloc_uninit_slice_with_drop_prefixed`
  (`src/internal/chunk_mutator.rs:381-417`) compute the same
  `prefix_size / payload_offset / payload_bytes / total` and run the same
  unsafe block writing the prefix word and projecting the payload
  `NonNull`. Extract `fn try_alloc_prefixed_payload<T>(&self, len: usize)
  -> Option<(InChunk<…>, /*payload_addr*/ usize)>` that owns the layout
  math + prefix write; the two callers add only the drop-entry plumbing.
  Touches one file, ~25 lines.

- **Make `ChunkMutator::from_owned` reuse the payload-range math.**
  `from_owned` (`src/internal/chunk_mutator.rs:~65-85`) re-derives
  `start_addr / aligned_end_addr / aligned_end_offset` from `payload_ptr`
  / `capacity`; `payload_range`
  (`src/internal/chunk_mutator.rs:122-133`) already encapsulates exactly
  that calculation. Add a `payload_range_for(chunk)` taking a
  `NonNull<C>` (or set `self.chunk` first and call `payload_range()`).
  Touches one file, ~10 lines.

### Medium-confidence (worth doing, slight refactor)

- **Unify `release_local` / `release_shared` cache-bypass branches.**
  `src/internal/chunk_provider.rs:466-487` vs `494-505`. Same
  structure: read total → if uncacheable or below floor, destroy +
  release_bytes → else push to cache. Differences: which floor atomic
  (`Acquire` vs `Relaxed`), which `destroy`, single-threaded
  `local_cache.with` vs `push_shared`. Extract
  `fn should_bypass_cache(&self, total: usize, floor: &AtomicU8, ord:
  Ordering) -> bool` for at least the decision. The `Ordering`
  difference is real and intentional; pass it as a parameter.

- **Unify `acquire_normal_local` / `acquire_normal_shared`.**
  `src/internal/chunk_provider.rs:267-292` vs `377-393`. Same
  "advance floor if needed, pop cache, else allocate fresh" shape.
  Extract the floor-bump/pop control flow; pass flavor-specific pop /
  reinit / allocate-fresh closures. Slight closure overhead — verify
  codegen still inlines on the hot path.

- **Dedup `allocate_fresh_local` / `allocate_fresh_shared`.**
  `src/internal/chunk_provider.rs:336-351` vs `441-456`. Identical
  reserve-bytes / allocate / release-on-error scaffolding. Introduce
  `fn allocate_with_budget<F: FnOnce() -> Result<R, AllocError>>(&self,
  total: usize, build: F) -> Result<R, AllocError>` that handles the
  budget rollback. ~20 lines net reduction.

### Speculative (needs perf validation)

- **Collapse the uninit-slice allocation family.** `try_alloc_bytes`,
  `try_alloc_uninit_slice`, `try_alloc_uninit_slice_prefixed` in
  `src/internal/chunk_mutator.rs:245-320` share "compute size, reserve,
  convert ticket" with different ticket shapes. Could be split into a
  low-level `reserve_bytes_for_slice` + thin wrappers. Risk: hottest
  alloc paths; even minor codegen drift could move the gungraun
  benchmark numbers. Do not land without before/after callgrind data.