
nexus-slab

A high-performance slab allocator for stable memory addresses without heap allocation overhead.

What Is This?

nexus-slab is a custom allocator pattern—not a replacement for Rust's global allocator, but a specialized allocator for specific use cases where you need:

  • Stable memory addresses - pointers remain valid until explicitly freed
  • Box-like semantics without Box - RAII ownership with pre-allocated backing storage
  • Node-based data structures - linked lists, trees, graphs with internal pointers
  • Predictable tail latency - no reallocation spikes during growth

Think of Slot<T> as analogous to Box<T>: an owning handle that provides access to a value and deallocates on drop. The difference is that Box allocates from the heap on every call, while Slot allocates from a pre-allocated slab—making allocation O(1) with no syscalls.

Quick Start

use nexus_slab::create_allocator;

struct Order {
    price: u64,
    quantity: u64,
}

impl Order {
    fn new() -> Self {
        Order { price: 100, quantity: 0 }
    }
}

// Define an allocator for your type
create_allocator!(order_alloc, Order);

fn main() {
    // Initialize at startup (once per thread)
    order_alloc::init().bounded(1024).build();

    // Insert returns an 8-byte RAII Slot; `mut` is needed for get_mut below
    let mut slot = order_alloc::insert(Order::new());
    assert_eq!(slot.price, 100); // Deref gives direct field access

    // Modify through the slot
    slot.get_mut().quantity = 50;

    // Slot auto-deallocates on drop
    drop(slot);
    assert_eq!(order_alloc::len(), 0);
}

Performance

All measurements in CPU cycles. See BENCHMARKS.md for methodology.

Macro API vs slab crate (p50)

| Operation | Slot API | Key-based | slab crate | Notes |
|-----------|----------|-----------|------------|-------|
| GET       | 2        | 3         | 3          | Direct pointer, no lookup |
| GET (hot) | 1        | -         | 2          | ILP - CPU pipelines loads |
| GET_MUT   | 2        | 2         | 3          | Direct pointer |
| INSERT    | 8        | -         | 4          | +4 cycles TLS overhead |
| REMOVE    | 4        | -         | 3          | TLS overhead |
| REPLACE   | 2        | -         | 4          | Direct pointer, no lookup |
| CONTAINS  | 2        | 3         | 2          | slot.is_valid() fastest |

Key insight: The TLS lookup adds ~4 cycles to INSERT/REMOVE, but access operations (GET/REPLACE) have zero overhead because Slot caches the pointer. For access-heavy workloads, this is a net win.
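
For example (a minimal sketch reusing the Order type from the Quick Start), an access-heavy loop pays the TLS cost once at insert and never again:

// Assumes order_alloc::init() has already run on this thread.
let mut slot = order_alloc::insert(Order::new());  // one TLS lookup (~4 cycles)

// Every access below goes through the pointer cached inside the
// 8-byte Slot; the loop never touches TLS.
for _ in 0..1_000_000 {
    slot.get_mut().quantity += 1;
}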

Full Lifecycle Cost

| Operation | Direct API | Macro API | Delta |
|-----------|------------|-----------|-------|
| INSERT    | 7          | 11        | +4    |
| GET       | 2          | 2         | 0     |
| REMOVE    | 8          | 5         | -3    |
| Total     | 17         | 18        | +1    |

One cycle per object lifecycle for the ergonomics of a global allocator pattern.

vs Box (Isolation Advantage)

The killer feature: slab is isolated from the global allocator. In production, Box::new() shares malloc with everything else. Your slab is yours alone.

Hot Cache (realistic steady-state)

| Size  | Box p50 | Slab p50 | Box p99 | Slab p99 | Box p99.9 | Slab p99.9 |
|-------|---------|----------|---------|----------|-----------|------------|
| 64B   | 12      | 9        | 20      | 12       | 48        | 19         |
| 256B  | 15      | 16       | 48      | 25       | 80        | 50         |
| 4096B | 105     | 62       | 149     | 71       | 209       | 129        |

Cold Cache (single-op, true first-access latency)

| Size | Box p50 | Slab p50 | Box p99 | Slab p99 |
|------|---------|----------|---------|----------|
| 64B  | 132     | 126      | 334     | 218      |
| 256B | 166     | 122      | 336     | 230      |

Cold Cache (batched burst pattern)

| Size  | Box p50 | Slab p50 | Box p99 | Slab p99 |
|-------|---------|----------|---------|----------|
| 64B   | 51      | 52       | 71      | 65       |
| 256B  | 58      | 62       | 94      | 80       |
| 4096B | 181     | 120      | 270     | 166      |

Key findings:

  • Hot cache: Slab 1.7x faster at p50 for 4096B (62 vs 105 cycles)
  • Cold single-op: Slab takes 27% fewer cycles at p50 for 256B (122 vs 166)
  • Cold batched 4096B: Slab 1.5x faster at p50 (120 vs 181 cycles)
  • Tail latency: Slab roughly 1.5-2x better at p99 in the hot-cache and cold single-op runs

See BENCHMARKS.md for full methodology and stress test results.

Use Cases

Node-Based Data Structures

use nexus_slab::{create_allocator, Key};

struct Node {
    value: i32,
    next: Key,  // 4 bytes, not 8 for Option<Box<Node>>
    prev: Key,
}

create_allocator!(node_alloc, Node);

fn build_list() {
    node_alloc::init().bounded(1000).build();

    let mut head = node_alloc::insert(Node {
        value: 1,
        next: Key::NONE,
        prev: Key::NONE,
    });

    // Keys are stable - safe to store in other nodes
    let head_key = head.key();

    let tail = node_alloc::insert(Node {
        value: 2,
        next: Key::NONE,
        prev: head_key,
    });

    head.get_mut().next = tail.key();
}

Stable Memory Addresses

create_allocator!(buffer_alloc, [u8; 4096]);

fn get_stable_buffer() -> *const u8 {
    buffer_alloc::init().bounded(100).build();

    let slot = buffer_alloc::insert([0u8; 4096]);
    let ptr = slot.as_ptr() as *const u8;

    // Pointer remains valid as long as slot exists
    // No reallocation, no movement

    let _key = slot.leak();  // keep the value alive; bind the key if you need to free it later
    ptr
}

Key Serialization

Keys can be converted to/from u32 for external storage:

let slot = my_alloc::insert(value);
let key = slot.leak();

// Serialize
let raw: u32 = key.into_raw();
store_somewhere(raw);

// Deserialize
let raw = load_from_somewhere();
let key = Key::from_raw(raw);

// Access (caller must ensure validity)
let value = unsafe { my_alloc::get_unchecked(key) };

Warning: Keys are simple indices with no generation counter. If you store keys externally (databases, wire protocols), you must ensure the key is still valid before use. For wire protocols, prefer authoritative external identifiers (exchange order IDs, database primary keys) and use the slab key only for internal indexing.
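
A hedged sketch of that pattern (the HashMap and exchange_order_id are illustrative, not part of the crate; order_alloc and Order come from the Quick Start):

use nexus_slab::Key;
use std::collections::HashMap;

// Authoritative map: exchange order ID -> raw slab key (illustrative).
let mut orders_by_id: HashMap<u64, u32> = HashMap::new();

let slot = order_alloc::insert(Order::new());
orders_by_id.insert(exchange_order_id, slot.leak().into_raw());

// Later: resolve through the authoritative ID, then re-check the key
// before the unsafe access.
if let Some(&raw) = orders_by_id.get(&exchange_order_id) {
    let key = Key::from_raw(raw);
    if order_alloc::contains_key(key) {
        let order = unsafe { order_alloc::get_unchecked(key) };
        // ... read order ...
    }
}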

API

Allocator Module (generated by create_allocator!)

| Function | Returns | Description |
|----------|---------|-------------|
| init() | Builder | Start configuring the allocator |
| insert(value) | Slot | Insert, panics if full |
| try_insert(value) | Option<Slot> | Insert, returns None if full |
| contains_key(key) | bool | Check if key is valid |
| get_unchecked(key) | &'static T | Get by key (unsafe) |
| get_unchecked_mut(key) | &'static mut T | Get mut by key (unsafe) |
| len() / capacity() | usize | Slot counts |
| is_empty() | bool | Check if empty |
| is_initialized() | bool | Check if init() was called |
| shutdown() | Result<(), SlotsRemaining> | Shut down (must be empty) |
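
As a sketch of the full-slab and shutdown paths (reusing the order_alloc module from the Quick Start):

// try_insert is the non-panicking variant of insert
if let Some(slot) = order_alloc::try_insert(Order::new()) {
    // ... use the slot ...
    drop(slot);  // returns the slot to the freelist
}

// shutdown() only succeeds once every slot has been dropped
assert!(order_alloc::is_empty());
order_alloc::shutdown().expect("slots still alive");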

Slot (8 bytes)

| Method | Returns | Description |
|--------|---------|-------------|
| get() | &T | Borrow the value |
| get_mut() | &mut T | Mutably borrow the value |
| replace(value) | T | Swap value, return old |
| into_inner() | T | Remove and return value |
| key() | Key | Get the key for this slot |
| leak() | Key | Keep alive, return key |
| is_valid() | bool | Check if slot is still valid |
| as_ptr() | *const T | Raw pointer to value |
| as_mut_ptr() | *mut T | Mutable raw pointer |

Slot implements Deref and DerefMut for ergonomic access.
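
In practice that means field access reads like a plain reference (short sketch reusing the Order type from the Quick Start):

let mut slot = order_alloc::insert(Order::new());
assert_eq!(slot.price, 100);  // Deref: same as slot.get().price
slot.quantity = 50;           // DerefMut: same as slot.get_mut().quantity = 50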

Key (4 bytes)

| Method | Returns | Description |
|--------|---------|-------------|
| index() | u32 | The slot index |
| into_raw() | u32 | For serialization |
| from_raw(u32) | Key | From serialized value |
| is_none() | bool | Check if sentinel |
| is_some() | bool | Check if valid |
| Key::NONE | Key | Sentinel value |

Builder Pattern

// Bounded: fixed capacity, returns None when full
my_alloc::init()
    .bounded(1024)
    .build();

// Unbounded: grows by adding chunks (no copying)
my_alloc::init()
    .unbounded()
    .chunk_capacity(4096)  // slots per chunk
    .capacity(10_000)      // pre-allocate
    .build();

Bounded vs Unbounded

|               | Bounded | Unbounded |
|---------------|---------|-----------|
| Growth        | Fixed capacity | Adds chunks |
| Full behavior | try_insert returns None | Always succeeds |
| Tail latency  | Deterministic | +2-4 cycles chunk lookup |
| Use case      | Known capacity | Unknown/variable load |

Use bounded when capacity is known—it's faster and fully deterministic.

Use unbounded when you need overflow headroom without Vec reallocation spikes.
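
A common bounded-mode pattern treats a full slab as backpressure (sketch; process and handle_overflow are hypothetical helpers):

my_alloc::init().bounded(1024).build();

match my_alloc::try_insert(value) {
    Some(slot) => process(slot),
    None => handle_overflow(value),  // slab full: shed load, queue, or log
}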

Architecture

Slot Design

Each Slot is 8 bytes (single pointer). The VTable for slab operations is stored in thread-local storage:

Slot (8 bytes):
┌─────────────────────────┐
│ *mut SlotCell<T>        │  ← Direct pointer to value
└─────────────────────────┘

TLS (per allocator):
┌─────────────────────────┐
│ *const VTable<T>        │  ← Cached for fast access
└─────────────────────────┘

This design gives:

  • 8-byte handles (vs 16+ for pointer+vtable designs)
  • Zero-cost access (GET/REPLACE don't touch TLS)
  • RAII semantics (drop returns slot to freelist)

Memory Layout

Slab (contiguous allocation):
┌──────────────────────────────────────────┐
│ SlotCell 0: [stamp: u64][value: T]       │
│ SlotCell 1: [stamp: u64][value: T]       │
│ ...                                      │
│ SlotCell N: [stamp: u64][value: T]       │
└──────────────────────────────────────────┘

No Generational Indices

Keys are simple indices. This is intentional—see the Key documentation for rationale.

TL;DR: Your data has authoritative external identifiers (exchange order IDs, database keys). You validate against those anyway. Generational indices add ~8 cycles to catch bugs that domain validation already catches.

Thread Safety

Each thread has its own allocator instance. The allocator is !Send and !Sync.

Do not store Slot in thread_local!. Stack variables drop before thread-local storage is torn down, so slots held on the stack are destroyed while the slab still exists. But if both the Slot and the slab live in TLS, their relative drop order is unspecified.
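
For example, each worker thread initializes and uses its own instance (sketch, assuming the order_alloc module and Order type from the Quick Start):

use std::thread;

let workers: Vec<_> = (0..4)
    .map(|_| {
        thread::spawn(|| {
            // Every thread owns an independent allocator instance.
            order_alloc::init().bounded(1024).build();

            let slot = order_alloc::insert(Order::new());
            // ... thread-local work; the stack-held slot drops
            // before thread-local storage is torn down ...
            drop(slot);
        })
    })
    .collect();

for worker in workers {
    worker.join().unwrap();
}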

Direct API (Advanced)

For cases where the macro API doesn't fit (multiple slabs, dynamic creation), use the direct API:

use nexus_slab::bounded::BoundedSlab;

let slab = BoundedSlab::with_capacity(1024);
let slot = slab.insert(42).unwrap();
assert_eq!(*slot.get(), 42);

See the bounded and unbounded modules.
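
For instance, the direct API lets two independent slabs coexist in one scope (sketch; assumes insert returns Option as shown above):

use nexus_slab::bounded::BoundedSlab;

let orders: BoundedSlab<u64> = BoundedSlab::with_capacity(1024);
let prices: BoundedSlab<f64> = BoundedSlab::with_capacity(256);

let o = orders.insert(42).unwrap();
let p = prices.insert(99.5).unwrap();
assert_eq!(*o.get(), 42);
assert_eq!(*p.get(), 99.5);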

License

MIT OR Apache-2.0