nexus-slab
A high-performance slab allocator optimized for predictable tail latency.
Why nexus-slab?
Traditional slab allocators built on a contiguous Vec exhibit bimodal p999 latency — most operations are fast, but an occasional reallocation causes a multi-thousand-cycle spike. For latency-critical systems like trading engines, a single slow operation can mean a missed fill.
nexus-slab eliminates reallocation copying by using independently-allocated slabs. When growth occurs, only the new slab is allocated — existing data never moves.
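As a concrete illustration of this property (a minimal sketch using the DynamicSlab API shown under Usage below; the capacity and values are placeholders), an element's address stays the same even after the slab grows:

```rust
use nexus_slab::DynamicSlab;

let mut slab: DynamicSlab<u64> = DynamicSlab::with_capacity(8)?;
let key = slab.insert(123);
let before = &slab[key] as *const u64;

// Force growth well past the initial capacity: new slabs are allocated,
// but existing slots stay exactly where they are.
for i in 0..100_000 {
    slab.insert(i);
}

let after = &slab[key] as *const u64;
assert_eq!(before, after); // existing data never moved
```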
Performance
Benchmarked on Intel Core Ultra 7 155H, pinned to a physical core. Growth phase (where realloc/new allocation happens):
| Metric | nexus-slab | slab crate |
|---|---|---|
| p50 | 24 cycles | 22 cycles |
| p99 | 26 cycles | 24 cycles |
| p999 | 38-46 cycles | 32-3700 cycles (bimodal) |
| max | 500-800K cycles | 1.5-2M cycles |
The trade-off: 2-4 extra cycles at the median for 60x better worst-case p999.
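For context on how numbers like these are typically gathered, here is a rough, hedged sketch of per-operation cycle sampling on x86_64 (illustrative only; this is not the harness that produced the table above, and the constructor arguments are placeholders):

```rust
#[cfg(target_arch = "x86_64")]
fn insert_cycle_samples(iters: usize) -> Vec<u64> {
    use core::arch::x86_64::{_mm_lfence, _rdtsc};
    use nexus_slab::DynamicSlab;

    // Start small so the measured loop includes growth (new-slab allocation).
    let mut slab: DynamicSlab<u64> = DynamicSlab::with_capacity(16).unwrap();
    let mut samples = Vec::with_capacity(iters);

    for i in 0..iters as u64 {
        // lfence gives a rough serialization point around each rdtsc read.
        let start = unsafe { _mm_lfence(); _rdtsc() };
        let _key = slab.insert(i);
        let end = unsafe { _mm_lfence(); _rdtsc() };
        samples.push(end.saturating_sub(start));
    }

    samples.sort_unstable();
    samples // p50 = samples[len / 2], p999 = samples[len * 999 / 1000]
}
```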
Architecture
Two-Level Freelist
```
slabs_head ─► Slab 2 ─► Slab 0 ─► NONE      Slab 1 (full)
                │          │                     │
                ▼          ▼                     ▼
             [slots]    [slots]             [no space]
```
- Slab freelist: O(1) access to a slab with available space
- Slot freelist: Per-slab LIFO chain of freed slots
Memory Layout
```
Slab 0 (independent allocation)   Slab 1 (independent allocation)
┌─────────────────────────────┐   ┌─────────────────────────────┐
│ Slot 0: [tag|value]         │   │ Slot 0: [tag|value]         │
│ Slot 1: [tag|value]         │   │ Slot 1: [tag|value]         │
│ ...                         │   │ ...                         │
└─────────────────────────────┘   └─────────────────────────────┘

SlabMeta[] (small Vec)
┌─────────────────────────────┐
│ freelist_head: u32          │ ← Per-slab slot freelist
│ bump_cursor: u32            │ ← Sequential allocation pointer
│ occupied: u32               │ ← Slot count
│ next_free_slab: u32         │ ← Slab freelist chain
└─────────────────────────────┘
```
Each slab is allocated independently — no contiguous backing array that needs reallocation.
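The same layout, sketched as Rust types (a hedged illustration: the field names follow the diagram above, while the NONE sentinel, the exact widths, and the slot representation are assumptions):

```rust
use std::mem::MaybeUninit;
use std::ptr::NonNull;

/// Assumed sentinel for "no next entry" in the index-linked freelists.
const NONE: u32 = u32::MAX;

/// One slot: a tag word plus (possibly uninitialized) storage for the value.
struct Slot<T> {
    tag: u32,
    value: MaybeUninit<T>,
}

/// One independently allocated slab: a block of slots that is never reallocated.
struct Slab<T> {
    slots: NonNull<Slot<T>>, // points into its own independently mapped block
    len: u32,
}

/// Per-slab bookkeeping, kept in a small Vec separate from the slab storage.
struct SlabMeta {
    freelist_head: u32,  // head of this slab's slot freelist (NONE if empty)
    bump_cursor: u32,    // next never-used slot for sequential allocation
    occupied: u32,       // number of live slots
    next_free_slab: u32, // link in the slab freelist (slabs with space)
}
```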
Allocation Strategy
- Slab freelist head: O(1) access to a slab with space
- Slot freelist (LIFO): Reuse recently-freed slots for cache locality
- Bump allocation: Sequential allocation when slot freelist is empty
- Pop exhausted slabs: Remove from slab freelist when full
- Growth (dynamic mode): Allocate new slab when slab freelist is empty
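To tie the steps above together, here is a deliberately simplified safe-Rust toy of the insert path. It mirrors the two-level freelist but uses Vec and an enum instead of the raw [tag|value] slots and mapped storage described earlier, so treat it as an illustration of the algorithm, not of nexus-slab's actual internals:

```rust
const NONE: u32 = u32::MAX;
const SLOTS_PER_SLAB: u32 = 4; // tiny on purpose, so growth is easy to see

enum ToySlot<T> {
    Vacant { next_free: u32 }, // link in the per-slab slot freelist
    Occupied(T),
}

struct ToySlab<T> {
    slots: Vec<ToySlot<T>>, // capacity reserved once; pushes never reallocate
    freelist_head: u32,     // slot freelist (LIFO), NONE if empty
    bump_cursor: u32,       // next never-used slot
    occupied: u32,
    next_free_slab: u32,    // link in the slab freelist
}

struct ToyAllocator<T> {
    slabs: Vec<ToySlab<T>>,
    slabs_head: u32, // slab freelist: index of a slab with space, NONE if none
}

impl<T> ToyAllocator<T> {
    fn insert(&mut self, value: T) -> (u32, u32) {
        // Growth (dynamic mode): the slab freelist is empty, so allocate a
        // brand-new slab. Nothing that already exists is touched or copied.
        if self.slabs_head == NONE {
            self.slabs.push(ToySlab {
                slots: Vec::with_capacity(SLOTS_PER_SLAB as usize),
                freelist_head: NONE,
                bump_cursor: 0,
                occupied: 0,
                next_free_slab: NONE,
            });
            self.slabs_head = (self.slabs.len() - 1) as u32;
        }

        let slab_idx = self.slabs_head;
        let slab = &mut self.slabs[slab_idx as usize];

        // Slot freelist (LIFO) first; fall back to bump allocation.
        let slot_idx = if slab.freelist_head != NONE {
            let idx = slab.freelist_head;
            slab.freelist_head = match &slab.slots[idx as usize] {
                ToySlot::Vacant { next_free } => *next_free,
                ToySlot::Occupied(_) => unreachable!("freelist points at a live slot"),
            };
            slab.slots[idx as usize] = ToySlot::Occupied(value);
            idx
        } else {
            let idx = slab.bump_cursor;
            slab.bump_cursor += 1;
            slab.slots.push(ToySlot::Occupied(value));
            idx
        };
        slab.occupied += 1;

        // Pop the slab off the slab freelist once it has no space left.
        if slab.freelist_head == NONE && slab.bump_cursor == SLOTS_PER_SLAB {
            self.slabs_head = slab.next_free_slab;
            slab.next_free_slab = NONE;
        }

        (slab_idx, slot_idx) // a real Key packs these two indices into a u64
    }
}
```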
Remove: LIFO Behavior
When removing from a full slab, that slab is pushed to the front of the slab freelist. This means:
- The next insert uses cache-hot memory (just touched by the remove)
- High-churn workloads stay concentrated in fewer slabs
- Better cache utilization overall
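Continuing the toy sketch from the previous section (same caveats), the remove path shows both LIFO pushes: the freed slot goes to the front of its slab's slot freelist, and a slab that just regained space goes to the front of the slab freelist:

```rust
impl<T> ToyAllocator<T> {
    fn remove(&mut self, slab_idx: u32, slot_idx: u32) -> T {
        let slab = &mut self.slabs[slab_idx as usize];

        // Was this slab full, i.e. currently off the slab freelist?
        let was_full = slab.freelist_head == NONE && slab.bump_cursor == SLOTS_PER_SLAB;

        // Take the value out and push the slot onto the per-slab LIFO freelist.
        let old = std::mem::replace(
            &mut slab.slots[slot_idx as usize],
            ToySlot::Vacant { next_free: slab.freelist_head },
        );
        slab.freelist_head = slot_idx;
        slab.occupied -= 1;

        // LIFO: a slab that just regained space goes to the FRONT of the slab
        // freelist, so the very next insert lands in cache-hot memory.
        if was_full {
            slab.next_free_slab = self.slabs_head;
            self.slabs_head = slab_idx;
        }

        match old {
            ToySlot::Occupied(value) => value,
            ToySlot::Vacant { .. } => panic!("remove called on a vacant slot"),
        }
    }
}
```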
Per-Slab Freelists
Unlike global freelists, each slab maintains its own chain:
- Cache locality: Freed slots reused LIFO within the same slab
- Independent draining: Slabs can be fully emptied without affecting others
- No cross-slab pointers: Simpler invariants, no fragmentation across slabs
Why Direct mmap? (Unix)
On Unix systems, nexus-slab bypasses std::alloc and calls mmap directly:
| | mmap | std::alloc |
|---|---|---|
| p999 | 38-46 cycles | 50-60 cycles |
| max | 500-800K cycles | 1.5-2M cycles |
| Huge pages | ✓ | ✗ |
| mlock | ✓ | ✗ |
The system allocator adds bookkeeping overhead. For the last 10-20% of tail latency, direct mmap wins. Non-Unix platforms fall back to std::alloc with the same API (minus huge_pages and mlock).
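A hedged sketch of what mapping one slab directly can look like on Linux, using the libc crate; the flag choices and error handling here are illustrative, not nexus-slab's actual allocation code:

```rust
use std::io;

/// Map one slab-sized block directly from the kernel, optionally backed by
/// huge pages and locked into RAM.
#[cfg(target_os = "linux")]
fn map_slab(len: usize, huge_pages: bool, lock: bool) -> io::Result<*mut u8> {
    let mut flags = libc::MAP_PRIVATE | libc::MAP_ANONYMOUS;
    if huge_pages {
        flags |= libc::MAP_HUGETLB; // requires configured nr_hugepages
    }

    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            flags,
            -1, // anonymous mapping: no backing file
            0,
        )
    };
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }

    if lock && unsafe { libc::mlock(ptr, len) } != 0 {
        let err = io::Error::last_os_error(); // e.g. RLIMIT_MEMLOCK too low
        unsafe { libc::munmap(ptr, len) };
        return Err(err);
    }

    Ok(ptr.cast::<u8>())
}
```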
Usage
Dynamic Slab (grows as needed)
```rust
use nexus_slab::DynamicSlab;

// Pre-allocate for expected capacity
let mut slab: DynamicSlab<u64> = DynamicSlab::with_capacity(100_000)?;

// O(1) operations
let key = slab.insert(42);
assert_eq!(slab[key], 42);

let value = slab.remove(key);
assert_eq!(value, 42);

// Grows automatically when needed (allocates new slab, never copies)
for i in 0..100_000 {
    slab.insert(i);
}
```
Fixed Slab (pre-allocated, no growth)
```rust
use nexus_slab::{FixedSlab, SlabBuilder};

// All memory allocated upfront
let mut slab: FixedSlab<u64> = SlabBuilder::new()
    .fixed()
    .capacity(4096)
    .build()?;

// Returns Err(Full) when exhausted — no surprise allocations
match slab.try_insert(42) {
    Ok(_key) => { /* stored */ }
    Err(_full) => { /* handle backpressure */ }
}
```
Builder API
```rust
use nexus_slab::{DynamicSlab, SlabBuilder};

let slab: DynamicSlab<u64> = SlabBuilder::new()
    .capacity(100_000)           // Initial capacity hint
    .slab_bytes(2 * 1024 * 1024) // 2MB slabs (matches huge page size)
    .huge_pages()                // Unix: use MAP_HUGETLB
    .mlock()                     // Unix: lock pages in RAM
    .build()?;
```
Memory Control (Unix)
```rust
// Huge pages reduce TLB pressure for large slabs
let slab: DynamicSlab<u64> = SlabBuilder::new()
    .capacity(100_000)
    .huge_pages() // Requires: echo 512 > /proc/sys/vm/nr_hugepages
    .build()?;

// mlock prevents swap-induced latency spikes
let slab: DynamicSlab<u64> = SlabBuilder::new()
    .capacity(100_000)
    .mlock() // Requires: CAP_IPC_LOCK or sufficient RLIMIT_MEMLOCK
    .build()?;
```
API
Core Operations
| Method | DynamicSlab | FixedSlab | Description |
|---|---|---|---|
| insert(value) | Key | — | Insert, grow if needed (panics on alloc failure) |
| try_insert(value) | — | Result<Key, Full> | Insert, returns Err if full |
| slab[key] | &T / &mut T | &T / &mut T | Index access (panics if invalid) |
| get(key) | Option<&T> | Option<&T> | Get reference, None if invalid |
| get_mut(key) | Option<&mut T> | Option<&mut T> | Get mutable reference, None if invalid |
| get_unchecked(key) | &T | &T | Get reference without bounds check |
| get_unchecked_mut(key) | &mut T | &mut T | Get mutable reference without bounds check |
| remove(key) | T | T | Remove and return value (panics if invalid) |
| contains(key) | bool | bool | Check if key is occupied |
| len() | usize | usize | Number of occupied slots |
| capacity() | usize | usize | Total slots available |
| clear() | () | () | Remove all elements |
Vacant Entry Pattern
For self-referential structures:
```rust
// DynamicSlab
let entry = slab.vacant_entry();
let key = entry.key();
entry.insert((key, "value")); // the value can embed its own key

// FixedSlab
if let Some(entry) = slab.try_vacant_entry() {
    let key = entry.key();
    entry.insert((key, "value"));
}
```
Key Design
Keys encode (slab_index, slot_index) in a u64. Keys are valid until removed:
```rust
let key = slab.insert(7);
assert!(slab.contains(key));

slab.remove(key);
assert!(!slab.contains(key)); // Key no longer valid
```
Note: There is no generation counter — reusing a key after removal is undefined behavior. The caller is responsible for tracking key validity.
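For illustration, a 32/32 split is one way two indices can share a single u64; the actual bit layout and accessor names inside nexus-slab are not documented here, so treat this sketch as an assumption:

```rust
/// Hypothetical packed key: slab index in the high 32 bits, slot index in the
/// low 32 bits. The real crate's Key may use a different layout.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PackedKey(u64);

impl PackedKey {
    fn new(slab_index: u32, slot_index: u32) -> Self {
        PackedKey(((slab_index as u64) << 32) | slot_index as u64)
    }

    fn slab_index(self) -> u32 {
        (self.0 >> 32) as u32
    }

    fn slot_index(self) -> u32 {
        self.0 as u32
    }
}
```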
When to Use This
Good fit:
- Systems requiring large slabs with low-latency operations
- Trading systems, matching engines
- Real-time audio/video processing
- Game servers with strict latency budgets
- Any system where p999 matters more than a few cycles on p50
Use slab crate instead when:
- Median performance is the priority
- You don't need huge pages or mlock
- Simpler dependency is preferred
Platform Support
| Platform | Allocator | Huge Pages | mlock |
|---|---|---|---|
| Linux | mmap | ✓ | ✓ |
| macOS | mmap | ✗ | ✓ |
| Other | std::alloc | ✗ | ✗ |
License
MIT OR Apache-2.0