nexus-slab

A high-performance slab allocator optimized for predictable tail latency.

Why nexus-slab?

Traditional slab allocators using Vec exhibit bimodal p999 latency — most operations are fast, but occasional reallocations cause multi-thousand cycle spikes. For latency-critical systems like trading engines, a single slow operation can mean a missed fill.

nexus-slab eliminates reallocation copying by using independently-allocated slabs. When growth occurs, only the new slab is allocated — existing data never moves.

Performance

Benchmarked on Intel Core Ultra 7 155H, pinned to a physical core. Growth phase (where realloc/new allocation happens):

Metric	nexus-slab	slab crate
p50	24 cycles	22 cycles
p99	26 cycles	24 cycles
p999	38-46 cycles	32-3700 cycles (bimodal)
max	500-800K cycles	1.5-2M cycles

Trade 2-4 cycles on median for 60x better worst-case p999.

Architecture

Two-Level Freelist

slabs_head ─► Slab 2 ─► Slab 0 ─► NONE      Slab 1 (full)
                │         │                     │
                ▼         ▼                     ▼
             [slots]   [slots]              [no space]

Slab freelist: O(1) access to a slab with available space
Slot freelist: Per-slab LIFO chain of freed slots

Memory Layout

Slab 0 (independent allocation)    Slab 1 (independent allocation)
┌─────────────────────────────┐    ┌─────────────────────────────┐
│ Slot 0: [tag|value]         │    │ Slot 0: [tag|value]         │
│ Slot 1: [tag|value]         │    │ Slot 1: [tag|value]         │
│ ...                         │    │ ...                         │
└─────────────────────────────┘    └─────────────────────────────┘

SlabMeta[] (small Vec)
┌─────────────────────────────┐
│ freelist_head: u32          │  ← Per-slab slot freelist
│ bump_cursor: u32            │  ← Sequential allocation pointer
│ occupied: u32               │  ← Slot count
│ next_free_slab: u32         │  ← Slab freelist chain
└─────────────────────────────┘

Each slab is allocated independently — no contiguous backing array that needs reallocation.

Allocation Strategy

Slab freelist head: O(1) access to a slab with space
Slot freelist (LIFO): Reuse recently-freed slots for cache locality
Bump allocation: Sequential allocation when slot freelist is empty
Pop exhausted slabs: Remove from slab freelist when full
Growth (dynamic mode): Allocate new slab when slab freelist is empty

Remove: LIFO Behavior

When removing from a full slab, that slab is pushed to the front of the slab freelist. This means:

The next insert uses cache-hot memory (just touched by the remove)
High-churn workloads stay concentrated in fewer slabs
Better cache utilization overall

Per-Slab Freelists

Unlike global freelists, each slab maintains its own chain:

Cache locality: Freed slots reused LIFO within the same slab
Independent draining: Slabs can be fully emptied without affecting others
No cross-slab pointers: Simpler invariants, no fragmentation across slabs

Why Direct mmap? (Unix)

On Unix systems, nexus-slab bypasses std::alloc and calls mmap directly:

	mmap	std::alloc
p999	38-46 cycles	50-60 cycles
max	500-800K cycles	1.5-2M cycles
Huge pages	✓	✗
mlock	✓	✗

The system allocator adds bookkeeping overhead. For the last 10-20% of tail latency, direct mmap wins. Non-Unix platforms fall back to std::alloc with the same API (minus huge_pages and mlock).

Usage

Dynamic Slab (grows as needed)

use nexus_slab::DynamicSlab;

// Pre-allocate for expected capacity
let mut slab: DynamicSlab<u64> = DynamicSlab::with_capacity(10_000)?;

// O(1) operations
let key = slab.insert(42);
assert_eq!(slab[key], 42);

let value = slab.remove(key);
assert_eq!(value, 42);

// Grows automatically when needed (allocates new slab, never copies)
for i in 0..100_000 {
    slab.insert(i);
}

Fixed Slab (pre-allocated, no growth)

use nexus_slab::{SlabBuilder, FixedSlab, Full};

// All memory allocated upfront
let mut slab: FixedSlab<Order> = SlabBuilder::new()
    .fixed()
    .capacity(100_000)
    .build()?;

// Returns Err(Full) when exhausted — no surprise allocations
match slab.try_insert(order) {
    Ok(key) => { /* ... */ }
    Err(Full(value)) => { /* handle backpressure */ }
}

Builder API

use nexus_slab::SlabBuilder;

let slab: DynamicSlab<Order> = SlabBuilder::new()
    .capacity(100_000)           // Initial capacity hint
    .slab_bytes(2 * 1024 * 1024) // 2MB slabs (matches huge page size)
    .huge_pages(true)            // Unix: use MAP_HUGETLB
    .mlock(true)                 // Unix: lock pages in RAM
    .build()?;

Memory Control (Unix)

// Huge pages reduce TLB pressure for large slabs
let slab: DynamicSlab<Order> = SlabBuilder::new()
    .capacity(1_000_000)
    .huge_pages(true)  // Requires: echo 512 > /proc/sys/vm/nr_hugepages
    .build()?;

// mlock prevents swap-induced latency spikes
let slab: DynamicSlab<Order> = SlabBuilder::new()
    .capacity(100_000)
    .mlock(true)  // Requires: CAP_IPC_LOCK or sufficient RLIMIT_MEMLOCK
    .build()?;

API

Core Operations

Method	DynamicSlab	FixedSlab	Description
`insert(value)`	`Key`	—	Insert, grow if needed (panics on alloc failure)
`try_insert(value)`	—	`Result<Key, Full>`	Insert, returns `Err` if full
`slab[key]`	`&T` / `&mut T`	`&T` / `&mut T`	Index access (panics if invalid)
`get(key)`	`Option<&T>`	`Option<&T>`	Get reference, `None` if invalid
`get_mut(key)`	`Option<&mut T>`	`Option<&mut T>`	Get mutable reference, `None` if invalid
`get_unchecked(key)`	`&T`	`&T`	Get reference without bounds check
`get_unchecked_mut(key)`	`&mut T`	`&mut T`	Get mutable reference without bounds check
`remove(key)`	`T`	`T`	Remove and return value (panics if invalid)
`contains(key)`	`bool`	`bool`	Check if key is occupied
`len()`	`usize`	`usize`	Number of occupied slots
`capacity()`	`usize`	`usize`	Total slots available
`clear()`	`()`	`()`	Remove all elements

Vacant Entry Pattern

For self-referential structures:

// DynamicSlab
let entry = slab.vacant_entry();
let key = entry.key();
entry.insert(Node { self_key: key, data: 42 });

// FixedSlab
if let Some(entry) = slab.try_vacant_entry() {
    let key = entry.key();
    entry.insert(Node { self_key: key, data: 42 });
}

Key Design

Keys encode (slab_index, slot_index) in a u64. Keys are valid until removed:

let key = slab.insert(42);
assert!(slab.contains(key));

slab.remove(key);
assert!(!slab.contains(key));  // Key no longer valid

Note: There is no generation counter — reusing a key after removal is undefined behavior. The caller is responsible for tracking key validity.

When to Use This

Good fit:

Systems requiring large slabs with low latency operations
Trading systems, matching engines
Real-time audio/video processing
Game servers with strict latency budgets
Any system where p999 matters more than a few cycles on p50

Use slab crate instead when:

Median performance is the priority
You don't need huge pages or mlock
Simpler dependency is preferred

Platform Support

Platform	Allocator	Huge Pages	mlock
Linux	mmap	✓	✓
macOS	mmap	✗	✓
Other	std::alloc	✗	✗

License

MIT OR Apache-2.0

nexus-slab 0.5.0