# nexus-slab

A high-performance slab allocator optimized for predictable tail latency.

## Why nexus-slab?
Traditional slab allocators using Vec exhibit bimodal p999 latency — most operations are fast, but occasional reallocations cause multi-thousand cycle spikes. For latency-critical systems like trading engines, a single slow operation can mean a missed fill.
nexus-slab eliminates reallocation copying by using independently-allocated slabs. When growth occurs, only the new slab is allocated — existing data never moves.
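The practical consequence is address stability: a reference obtained before growth still points at the same memory afterwards. A minimal sketch of that property, assuming the `DynamicSlab` API shown under Usage below:

```rust
use nexus_slab::DynamicSlab;

let mut slab: DynamicSlab<u64> = DynamicSlab::with_capacity(16)?;
let key = slab.insert(7)?;
let addr = slab.get(key).unwrap() as *const u64;

// Force many growths: new slabs are mapped, but existing slot memory never moves.
for i in 0..100_000 {
    slab.insert(i)?;
}
assert_eq!(slab.get(key).unwrap() as *const u64, addr);
```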
## Performance
Benchmarked on an Intel Core Ultra 7 155H, pinned to a physical core. Numbers below are from the growth phase, where reallocation (slab crate) or new-slab allocation (nexus-slab) happens:
| Metric | nexus-slab | slab crate |
|---|---|---|
| p50 | 24 cycles | 22 cycles |
| p99 | 28 cycles | 24 cycles |
| p999 | 40-48 cycles | 30-2400 cycles (bimodal) |
| max | 500-800K cycles | 1.5-2M cycles |
The trade-off: 2-4 extra cycles at the median for roughly 60x better worst-case p999.
## Architecture

### Memory Layout
```text
Slab 0 (independent allocation)    Slab 1 (independent allocation)
┌─────────────────────────────┐    ┌─────────────────────────────┐
│ Slot 0: [tag|value]         │    │ Slot 0: [tag|value]         │
│ Slot 1: [tag|value]         │    │ Slot 1: [tag|value]         │
│ ...                         │    │ ...                         │
└─────────────────────────────┘    └─────────────────────────────┘

SlabMeta[] (small Vec)
┌─────────────────────────────┐
│ freelist_head: u32          │ ← Per-slab freelist for draining
│ bump_cursor: u32            │ ← Sequential allocation pointer
│ occupied: u32               │ ← Count for slab selection
└─────────────────────────────┘
```
Each slab is allocated independently — no contiguous backing array that needs reallocation.
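The bookkeeping side of this layout is small enough to sketch. Field names below match the diagram; `Slot`, `NO_SLOT`, and the tag encoding are assumptions for illustration, not nexus-slab's actual definitions:

```rust
use std::mem::MaybeUninit;

/// Sentinel for an empty freelist (assumed encoding).
const NO_SLOT: u32 = u32::MAX;

/// Per-slab bookkeeping, kept in a small Vec separate from slab memory,
/// so growing the slab list never touches or moves slot data.
struct SlabMeta {
    freelist_head: u32, // head of this slab's LIFO freelist, or NO_SLOT
    bump_cursor: u32,   // next never-used slot in this slab
    occupied: u32,      // occupied count, used for slab selection
}

/// One slot: a tag word next to the value.
struct Slot<T> {
    tag: u32,              // generation counter; stale keys fail this check
    value: MaybeUninit<T>, // payload, valid only while occupied
}
```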
### Allocation Strategy
- Freelist first (LIFO): Check active slab's freelist for recently-freed slots
- Bump allocation: Sequential allocation when freelist is empty (fresh slab)
- Cross-slab fallback: Search other slabs when active slab exhausted
- Growth (dynamic mode): Allocate new slab when all are full
LIFO freelist reuse means inserts write to cache-hot memory, reducing cache misses on high-churn workloads.
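As a sketch of that priority order, building on the `SlabMeta` sketch above (occupancy updates, slot writes, and error handling are elided; this is an illustration of the strategy, not nexus-slab's actual code):

```rust
/// Returns (slab_index, slot_index) for the next insert.
fn allocate(metas: &mut Vec<SlabMeta>, active: &mut usize, slots_per_slab: u32) -> (usize, u32) {
    // 1. Freelist first (LIFO): reuse a recently-freed, cache-hot slot.
    //    (Real code would pop the chain by reading the next-free index
    //    stored inside the freed slot.)
    if metas[*active].freelist_head != NO_SLOT {
        return (*active, metas[*active].freelist_head);
    }
    // 2. Bump allocation: hand out the next untouched slot.
    if metas[*active].bump_cursor < slots_per_slab {
        let slot = metas[*active].bump_cursor;
        metas[*active].bump_cursor += 1;
        return (*active, slot);
    }
    // 3. Cross-slab fallback: switch to any slab that still has room.
    if let Some(i) = metas
        .iter()
        .position(|m| m.freelist_head != NO_SLOT || m.bump_cursor < slots_per_slab)
    {
        *active = i;
        return allocate(metas, active, slots_per_slab);
    }
    // 4. Growth (dynamic mode): map a fresh slab; existing slabs never move.
    metas.push(SlabMeta { freelist_head: NO_SLOT, bump_cursor: 1, occupied: 0 });
    *active = metas.len() - 1;
    (*active, 0) // slot 0 of the new slab; bump_cursor already advanced past it
}
```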
### Per-Slab Freelists
Unlike global freelists, each slab maintains its own chain. This enables:
- Cache locality: Freed slots are reused LIFO, so the next insert writes to cache-hot memory
- Slab affinity: After a remove, the active slab switches to where the free slot is, concentrating activity
- Independent growth: New slabs don't touch existing slab structures
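A sketch of the corresponding remove path, with the same assumed structures as above:

```rust
/// Free a slot: push it onto its slab's own freelist and move the
/// active slab there, so the next insert reuses cache-hot memory.
fn on_remove(metas: &mut [SlabMeta], active: &mut usize, slab: usize, slot: u32) {
    // (Real code would write the old freelist_head into the freed slot,
    // forming an intrusive LIFO chain, and bump the slot's generation.)
    metas[slab].freelist_head = slot;
    metas[slab].occupied -= 1;
    *active = slab; // slab affinity: concentrate activity where the hole is
}
```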
### Why Direct mmap? (Unix)

On Unix systems, nexus-slab bypasses `std::alloc` and calls `mmap` directly:
| Metric | mmap | std::alloc |
|---|---|---|
| p999 | 40-48 cycles | 50-60 cycles |
| max | 500-800K cycles | 1.5-2M cycles |
| Huge pages | ✓ | ✗ |
| mlock | ✓ | ✗ |
The system allocator adds bookkeeping overhead; for the last 10-20% of tail latency, direct mmap wins. Non-Unix platforms fall back to `std::alloc` with the same API (minus `huge_pages` and `mlock`).
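For reference, a minimal Linux-only sketch of what such a direct mapping looks like, using the `libc` crate. `map_slab` is a hypothetical helper for illustration, not nexus-slab's internals:

```rust
use std::ptr;

/// Map one slab's worth of anonymous memory directly from the kernel.
unsafe fn map_slab(bytes: usize, huge_pages: bool, lock: bool) -> *mut u8 {
    let mut flags = libc::MAP_PRIVATE | libc::MAP_ANONYMOUS;
    if huge_pages {
        flags |= libc::MAP_HUGETLB; // needs /proc/sys/vm/nr_hugepages > 0
    }
    // Anonymous mapping: page-aligned, zeroed, and invisible to the
    // system allocator's bookkeeping.
    let ptr = libc::mmap(
        ptr::null_mut(),
        bytes,
        libc::PROT_READ | libc::PROT_WRITE,
        flags,
        -1,
        0,
    );
    assert_ne!(ptr, libc::MAP_FAILED, "mmap failed");
    if lock {
        // Pin pages in RAM so a page fault never lands on the hot path.
        // Needs CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK.
        libc::mlock(ptr, bytes);
    }
    ptr as *mut u8
}
```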
## Usage

### Dynamic Slab (grows as needed)

```rust
use nexus_slab::DynamicSlab;

// Pre-allocate for expected capacity
let mut slab: DynamicSlab<u64> = DynamicSlab::with_capacity(1024)?;

// O(1) operations
let key = slab.insert(42)?;
assert_eq!(slab.get(key), Some(&42));

let value = slab.remove(key);
assert_eq!(value, Some(42));

// Grows automatically when needed (allocates new slab, never copies)
for i in 0..100_000 {
    slab.insert(i)?;
}
```
### Fixed Slab (pre-allocated, no growth)

```rust
use nexus_slab::FixedSlab;

// All memory allocated upfront
let mut slab: FixedSlab<u64> = FixedSlab::with_capacity(1024)?;

// Returns Err(Full) when exhausted — no surprise allocations
match slab.insert(42) {
    Ok(key) => { /* value stored at `key` */ }
    Err(_full) => { /* slab is full — handle backpressure */ }
}
```
### Builder API

```rust
use nexus_slab::{DynamicSlab, SlabBuilder};

let slab: DynamicSlab<u64> = SlabBuilder::new()
    .capacity(1 << 20)           // Initial capacity hint
    .slab_bytes(2 * 1024 * 1024) // 2MB slabs (matches huge page size)
    .huge_pages(true)            // Unix: use MAP_HUGETLB
    .mlock(true)                 // Unix: lock pages in RAM
    .build()?;
```
### Memory Control (Unix)

```rust
// Huge pages reduce TLB pressure for large slabs
let slab: DynamicSlab<u64> = SlabBuilder::new()
    .capacity(1 << 20)
    .huge_pages(true) // Requires: echo 512 > /proc/sys/vm/nr_hugepages
    .build()?;

// mlock prevents swap-induced latency spikes
let slab: DynamicSlab<u64> = SlabBuilder::new()
    .capacity(1 << 20)
    .mlock(true) // Requires: CAP_IPC_LOCK or sufficient RLIMIT_MEMLOCK
    .build()?;
```
## API

### Core Operations

| Method | Complexity | Description |
|---|---|---|
| `insert(value)` | O(1) | Insert value, returns key |
| `get(key)` | O(1) | Get reference by key |
| `get_mut(key)` | O(1) | Get mutable reference |
| `remove(key)` | O(1) | Remove and return value |
| `contains(key)` | O(1) | Check if key is occupied |
| `len()` | O(1) | Number of occupied slots |
| `capacity()` | O(1) | Total slots available |
### Vacant Entry Pattern

For self-referential structures that need to know their own key before insertion:

```rust
let entry = slab.vacant_entry()?;
let key = entry.key();
let node = Node { key, /* ... */ }; // value can embed its own key
entry.insert(node);
```
### Key Design

Keys encode `(slab_index, slot_index, generation)` in a `u64`. Generation counters detect use-after-remove:

```rust
let key = slab.insert(1)?;
slab.remove(key);
slab.insert(2)?; // Reuses same slot with new generation

assert!(slab.get(key).is_none()); // Old key returns None, not new value
```
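For intuition, a sketch of one possible packing; the 16/32/16 bit split below is an assumption for illustration, not nexus-slab's documented layout:

```rust
// Hypothetical 16/32/16 split: the real crate's field widths may differ.
fn pack(slab: u16, slot: u32, generation: u16) -> u64 {
    ((slab as u64) << 48) | ((slot as u64) << 16) | generation as u64
}

fn unpack(key: u64) -> (u16, u32, u16) {
    (
        (key >> 48) as u16, // slab_index
        (key >> 16) as u32, // slot_index (truncating cast keeps 32 bits)
        key as u16,         // generation
    )
}
```

A stale key carries the old generation, so the occupancy check against the slot's tag fails and `get` returns `None`.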
## When to Use This
Good fit:
- Trading systems, matching engines
- Real-time audio/video processing
- Game servers with strict latency budgets
- Any system where p999 matters more than p50
Use the `slab` crate instead when:
- Median performance is the priority
- You don't need huge pages or mlock
- Simpler dependency is preferred
## Platform Support
| Platform | Allocator | Huge Pages | mlock |
|---|---|---|---|
| Linux | mmap | ✓ | ✓ |
| macOS | mmap | ✗ | ✓ |
| Other | std::alloc | ✗ | ✗ |
## License
MIT OR Apache-2.0