nexus-slab
A high-performance slab allocator providing stable memory addresses without per-allocation heap overhead.
What Is This?
nexus-slab is a custom allocator pattern—not a replacement for Rust's global allocator, but a specialized allocator for specific use cases where you need:
- Stable memory addresses - pointers remain valid until explicitly freed
- Box-like semantics without Box - RAII ownership with pre-allocated backing storage
- Node-based data structures - linked lists, trees, graphs with internal pointers
- Predictable tail latency - no reallocation spikes during growth
Think of Slot<T> as analogous to Box<T>: an owning handle that provides access to a value and deallocates on drop. The difference is that every Box::new goes through the global heap allocator, while a Slot comes from a pre-allocated slab, making allocation O(1) with no syscalls.
Quick Start
```rust
use nexus_slab::create_allocator;

// Define an allocator for your type
// (the module name, value type, and exact macro arguments are illustrative)
create_allocator!(orders, Order);

// Initialize at startup (once per thread); the capacity shown is illustrative
orders::init().bounded(1_024).build();

// Insert returns an 8-byte RAII Slot
let mut slot = orders::insert(Order { quantity: 10 });
assert_eq!(slot.get().quantity, 10);

// Modify through the slot
slot.get_mut().quantity = 50;

// Slot auto-deallocates on drop
drop(slot);
assert_eq!(orders::len(), 0);
```
Performance
All measurements in CPU cycles. See BENCHMARKS.md for methodology.
Macro API vs slab crate (p50)
| Operation | Slot API | Key-based | slab crate | Notes |
|---|---|---|---|---|
| GET | 2 | 3 | 3 | Direct pointer, no lookup |
| GET (hot) | 1 | - | 2 | ILP - CPU pipelines loads |
| GET_MUT | 2 | 2 | 3 | Direct pointer |
| INSERT | 8 | - | 4 | +4 cycles TLS overhead |
| REMOVE | 4 | - | 3 | TLS overhead |
| REPLACE | 2 | - | 4 | Direct pointer, no lookup |
| CONTAINS | 2 | 3 | 2 | slot.is_valid() fastest |
Key insight: The TLS lookup adds ~4 cycles to INSERT/REMOVE, but access operations (GET/REPLACE) have zero overhead because Slot caches the pointer. For access-heavy workloads, this is a net win.
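As a sketch of that trade-off, continuing the illustrative `orders` module from Quick Start: the TLS cost is paid once at insert, and the hot path only touches the pointer cached inside the Slot:

```rust
// Pays the ~4-cycle TLS lookup once, at insert time.
let mut slot = orders::insert(Order { quantity: 0 });

// Hot path: get()/get_mut() go through the pointer cached in the
// 8-byte Slot, so no TLS lookup happens on these calls.
for _ in 0..1_000 {
    slot.get_mut().quantity += 1;
}
assert_eq!(slot.get().quantity, 1_000);
```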
Full Lifecycle Cost
| Operation | Direct API | Macro API | Delta |
|---|---|---|---|
| INSERT | 7 | 11 | +4 |
| GET | 2 | 2 | 0 |
| REMOVE | 8 | 5 | -3 |
| Total | 17 | 18 | +1 |
One cycle per object lifecycle for the ergonomics of a global allocator pattern.
vs Box (Isolation Advantage)
The killer feature: slab is isolated from the global allocator. In production, Box::new() shares malloc with everything else. Your slab is yours alone.
Hot Cache (realistic steady-state)
| Size | Box p50 | Slab p50 | Box p99 | Slab p99 | Box p99.9 | Slab p99.9 |
|---|---|---|---|---|---|---|
| 64B | 12 | 9 | 20 | 12 | 48 | 19 |
| 256B | 15 | 16 | 48 | 25 | 80 | 50 |
| 4096B | 105 | 62 | 149 | 71 | 209 | 129 |
Cold Cache (single-op, true first-access latency)
| Size | Box p50 | Slab p50 | Box p99 | Slab p99 |
|---|---|---|---|---|
| 64B | 132 | 126 | 334 | 218 |
| 256B | 166 | 122 | 336 | 230 |
Cold Cache (batched burst pattern)
| Size | Box p50 | Slab p50 | Box p99 | Slab p99 |
|---|---|---|---|---|
| 64B | 51 | 52 | 71 | 65 |
| 256B | 58 | 62 | 94 | 80 |
| 4096B | 181 | 120 | 270 | 166 |
Key findings:
- Hot cache: Slab 1.7x faster at p50 for 4096B (62 vs 105 cycles)
- Cold single-op: Slab 27% faster at p50 for 256B (122 vs 166 cycles)
- Cold batched 4096B: Slab 1.5x faster at p50 (120 vs 181 cycles)
- Tail latency: Slab consistently 1.5-2x better at p99
See BENCHMARKS.md for full methodology and stress test results.
Use Cases
Node-Based Data Structures
Link nodes by Key while the slab owns the storage: linked lists, trees, and graphs get stable node addresses and O(1) node allocation. A hedged sketch follows.
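A minimal sketch of a singly linked list, assuming an illustrative `nodes` module generated by `create_allocator!`, with links stored as `Key` values and `Key::NONE` as the end-of-list sentinel (the names, import paths, and macro arguments are assumptions, not the crate's exact API):

```rust
use nexus_slab::{create_allocator, Key}; // import path is illustrative

struct Node {
    value: u64,
    next: Key, // Key::NONE marks the end of the list
}

create_allocator!(nodes, Node); // macro arguments are illustrative

/// Prepend a node and return the new head key.
/// Assumes `nodes::init()...build()` already ran on this thread.
fn push_front(head: Key, value: u64) -> Key {
    // Leak the Slot so the node stays alive and is addressed only by its Key.
    nodes::insert(Node { value, next: head }).leak()
}

/// Walk the list by following `next` keys.
fn sum(mut head: Key) -> u64 {
    let mut total = 0;
    while head.is_some() {
        // Every key in the list came from leak() above, so it is still valid.
        let node = unsafe { nodes::get_unchecked(head) };
        total += node.value;
        head = node.next;
    }
    total
}
```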
Stable Memory Addresses
Values never move once inserted, so raw pointers into the slab stay valid until the slot is freed; see the sketch below.
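A short sketch, assuming an illustrative `sensors` module and capacity (not the crate's exact API), showing that a pointer taken from a Slot survives later inserts:

```rust
use nexus_slab::create_allocator; // import path is illustrative

create_allocator!(sensors, f64); // macro arguments are illustrative

sensors::init().bounded(4_096).build();

let reading = sensors::insert(1.5);
let ptr = reading.as_ptr(); // raw pointer into the slab

// Unlike a growing Vec, later inserts never move existing values,
// so `ptr` still points at `reading`'s value.
let _others: Vec<_> = (0..1_000).map(|i| sensors::insert(i as f64)).collect();

assert_eq!(unsafe { *ptr }, 1.5);
assert_eq!(reading.as_ptr(), ptr);
```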
Key Serialization
Keys can be converted to/from u32 for external storage:
```rust
// `orders` is the illustrative allocator module from Quick Start.
let slot = orders::insert(Order { quantity: 10 });
let key = slot.leak();

// Serialize
let raw: u32 = key.into_raw();
store_somewhere(raw);

// Deserialize
let raw = load_from_somewhere();
let key = Key::from_raw(raw);

// Access (caller must ensure validity)
let value = unsafe { orders::get_unchecked(key) };
```
Warning: Keys are simple indices with no generation counter. If you store keys externally (databases, wire protocols), you must ensure the key is still valid before use. For wire protocols, prefer authoritative external identifiers (exchange order IDs, database primary keys) and use the slab key only for internal indexing.
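One hedged pattern for possibly stale keys, reusing the illustrative `orders` module from Quick Start: check `contains_key` before the unchecked access. This catches freed slots, though not reuse of the same index, which is why the external identifier stays authoritative:

```rust
// `raw` came back from external storage; the slab has no generation counter,
// so confirm the index is still occupied before dereferencing it.
let key = Key::from_raw(raw);
if orders::contains_key(key) {
    // Still occupied, though possibly by a reused slot: cross-check against
    // the authoritative external identifier before trusting the value.
    let order = unsafe { orders::get_unchecked(key) };
    println!("quantity = {}", order.quantity);
} else {
    // The slot was freed since the key was stored; fall back to a lookup
    // by the external identifier instead.
}
```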
API
Allocator Module (generated by create_allocator!)
| Function | Returns | Description |
|---|---|---|
| `init()` | `Builder` | Start configuring the allocator |
| `insert(value)` | `Slot` | Insert, panics if full |
| `try_insert(value)` | `Option<Slot>` | Insert, returns None if full |
| `contains_key(key)` | `bool` | Check if key is valid |
| `get_unchecked(key)` | `&'static T` | Get by key (unsafe) |
| `get_unchecked_mut(key)` | `&'static mut T` | Get mut by key (unsafe) |
| `len()` / `capacity()` | `usize` | Slot counts |
| `is_empty()` | `bool` | Check if empty |
| `is_initialized()` | `bool` | Check if `init()` was called |
| `shutdown()` | `Result<(), SlotsRemaining>` | Shutdown (must be empty) |
Slot (8 bytes)
| Method | Returns | Description |
|---|---|---|
| `get()` | `&T` | Borrow the value |
| `get_mut()` | `&mut T` | Mutably borrow the value |
| `replace(value)` | `T` | Swap value, return old |
| `into_inner()` | `T` | Remove and return value |
| `key()` | `Key` | Get the key for this slot |
| `leak()` | `Key` | Keep alive, return key |
| `is_valid()` | `bool` | Check if slot is still valid |
| `as_ptr()` | `*const T` | Raw pointer to value |
| `as_mut_ptr()` | `*mut T` | Mutable raw pointer |
Slot implements Deref and DerefMut for ergonomic access.
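For example (continuing the illustrative `orders` module), field access can go straight through the slot:

```rust
let mut slot = orders::insert(Order { quantity: 10 });

// Deref / DerefMut make the Slot usable like a mutable reference.
assert_eq!(slot.quantity, 10); // instead of slot.get().quantity
slot.quantity = 25;            // instead of slot.get_mut().quantity = 25
```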
Key (4 bytes)
| Method | Returns | Description |
|---|---|---|
| `index()` | `u32` | The slot index |
| `into_raw()` | `u32` | For serialization |
| `from_raw(u32)` | `Key` | From serialized value |
| `is_none()` | `bool` | Check if sentinel |
| `is_some()` | `bool` | Check if valid |
| `Key::NONE` | `Key` | Sentinel value |
Builder Pattern
```rust
// Bounded: fixed capacity, returns None when full
// (the module name and capacities are illustrative)
orders::init()
    .bounded(1_024)
    .build();

// Unbounded: grows by adding chunks (no copying)
orders::init()
    .unbounded()
    .chunk_capacity(256) // slots per chunk
    .capacity(1_024)     // pre-allocate
    .build();
```
Bounded vs Unbounded
| | Bounded | Unbounded |
|---|---|---|
| Growth | Fixed capacity | Adds chunks |
| Full behavior | Returns None | Always succeeds |
| Tail latency | Deterministic | +2-4 cycles chunk lookup |
| Use case | Known capacity | Unknown/variable load |
Use bounded when capacity is known—it's faster and fully deterministic.
Use unbounded when you need overflow headroom without Vec reallocation spikes.
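A hedged sketch of the difference in full behavior, using two illustrative allocators, capacities, and macro arguments (not the crate's exact builder signatures):

```rust
use nexus_slab::create_allocator; // import path is illustrative

create_allocator!(limits, i32);
create_allocator!(events, i32);

// Bounded: fixed capacity of 2, so the third try_insert reports exhaustion.
limits::init().bounded(2).build();
let _a = limits::try_insert(1).unwrap();
let _b = limits::try_insert(2).unwrap();
assert!(limits::try_insert(3).is_none());

// Unbounded: filling a chunk just adds another one (no copying, values
// never move), so inserts keep succeeding.
events::init().unbounded().chunk_capacity(2).build();
let slots: Vec<_> = (0..10).map(|i| events::insert(i)).collect();
assert_eq!(events::len(), 10);
```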
Architecture
Slot Design
Each Slot is 8 bytes (single pointer). The VTable for slab operations is stored in thread-local storage:
Slot (8 bytes):
┌─────────────────────────┐
│ *mut SlotCell<T> │ ← Direct pointer to value
└─────────────────────────┘
TLS (per allocator):
┌─────────────────────────┐
│ *const VTable<T> │ ← Cached for fast access
└─────────────────────────┘
This design gives:
- 8-byte handles (vs 16+ for pointer+vtable designs)
- Zero-cost access (GET/REPLACE don't touch TLS)
- RAII semantics (drop returns slot to freelist)
Memory Layout
Slab (contiguous allocation):
┌──────────────────────────────────────────┐
│ SlotCell 0: [stamp: u64][value: T] │
│ SlotCell 1: [stamp: u64][value: T] │
│ ... │
│ SlotCell N: [stamp: u64][value: T] │
└──────────────────────────────────────────┘
No Generational Indices
Keys are simple indices. This is intentional—see the Key documentation for rationale.
TL;DR: Your data has authoritative external identifiers (exchange order IDs, database keys). You validate against those anyway. Generational indices add ~8 cycles to catch bugs that domain validation already catches.
Thread Safety
Each thread has its own allocator instance. The allocator is !Send and !Sync.
Do not store Slot in thread_local!. Rust drops stack variables before TLS, so stack slots drop correctly. But if both Slot and the slab are in TLS, drop order is unspecified.
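A sketch of that distinction, with illustrative names and capacities: per-thread initialization plus stack-held slots is fine, while a Slot parked in thread_local! may outlive the slab's own TLS state:

```rust
use std::thread;

// Each thread sets up and uses its own allocator instance.
let worker = thread::spawn(|| {
    orders::init().bounded(1_024).build();

    // Fine: `slot` is a stack variable, so it is dropped (returning its
    // slot to the freelist) before this thread's TLS is torn down.
    let slot = orders::insert(Order { quantity: 1 });
    assert_eq!(slot.get().quantity, 1);
});
worker.join().unwrap();

// Not fine: a Slot stored in thread_local! may be dropped *after* the
// slab's TLS state, because TLS destructor order is unspecified.
// thread_local! {
//     static CACHED: std::cell::RefCell<Option<Slot<Order>>> =
//         std::cell::RefCell::new(None);
// }
```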
Direct API (Advanced)
For cases where the macro API doesn't fit (multiple slabs, dynamic creation), use the direct API:
```rust
use nexus_slab::bounded::BoundedSlab; // module path is illustrative

let slab = BoundedSlab::with_capacity(1_024);
let slot = slab.insert(42u64).unwrap();
assert_eq!(*slot.get(), 42);
```
See the bounded and unbounded modules.
License
MIT OR Apache-2.0