ultraslayer 0.2.3

Ultra‑low latency DRAM refresh‑stall prevention using hardware‑level hedging.

⚡️ UltraSlayer – DRAM Refresh‑Stall Killer

UltraSlayer is a lock‑free, hardware‑aware memory slab that targets the DRAM refresh stall: the tail latency added when a read collides with one of the refreshes the memory controller issues once per refresh interval (tREFI). That tail destroys nanosecond‑level determinism in high‑frequency trading (HFT) and other ultra‑low‑latency workloads.

It mirrors every hot‑path object across a configurable number of physical DRAM channels and lets a dedicated Slayer Core race the reads in parallel, so that with high probability at least one channel answers without waiting on a refresh.

⚠️ WARNING – UltraSlayer uses unsafe, volatile loads/stores and a core that spins 100 % of the time. Use it only for the critical hot‑path of a latency‑sensitive application.



🎯 The Problem – DRAM “Tail”

| Situation | Latency |
| --- | --- |
| Normal DRAM read | ≈ 60 ns |
| Read that collides with a refresh (issued once per tREFI) | ≈ 200 ns+ (spike) |

A single 200 ns jitter can be the difference between a profitable trade and a missed opportunity.


🚀 The Solution – Hardware Hedging

| Step | What UltraSlayer does |
| --- | --- |
| Mirroring | Stores each hot object on N distinct DRAM channels (different DIMMs / banks). |
| Slayer Core | A dedicated thread, pinned to a physical core, issues N parallel reads at the pipeline level. |
| Race‑to‑first | The first response that arrives is returned; the other reads are discarded. |
| Deterministic latency | If a single channel is mid‑refresh with probability p, and channels refresh independently, all N mirrors stall at once with probability p^N → the tail is dramatically reduced. |
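
The hedging argument can be checked with a little arithmetic. The sketch below assumes that a single channel is busy refreshing for a fraction p = tRFC / tREFI of the time and that channels refresh independently of each other; real memory controllers may stagger or align refreshes, so treat the result as an estimate. The function name and the DDR4‑class timings (tRFC ≈ 350 ns, tREFI ≈ 7.8 µs) are illustrative assumptions, not values taken from UltraSlayer.

```rust
/// Probability that a read finds all `n` mirrors busy refreshing at once,
/// assuming each channel refreshes independently of the others.
fn all_channels_stalled(t_rfc_ns: f64, t_refi_ns: f64, n: u32) -> f64 {
    // Fraction of time a single channel spends refreshing.
    let p = t_rfc_ns / t_refi_ns;
    p.powi(n as i32)
}

fn main() {
    // Illustrative DDR4-class timings (assumed, not measured).
    let (t_rfc, t_refi) = (350.0, 7800.0);
    for n in [1, 2, 4, 8] {
        println!(
            "N = {n}: P(all stalled) = {:.3e}",
            all_channels_stalled(t_rfc, t_refi, n)
        );
    }
}
```

Under these assumptions a single channel stalls a read roughly 4.5 % of the time, while four independent mirrors all stall together only a few times per million reads.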

The Slayer Core spins continuously so that its CPU never drops into a low‑power C‑state, whose exit latency would re‑introduce jitter.
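
As a rough illustration of the mirroring idea, the sketch below keeps N full copies of the data, writes every copy with a volatile store, and issues one volatile load per copy on a read. It is a simplified stand‑in, not the crate's implementation: `MirroredSlab` is a hypothetical type, a plain `Vec` gives no control over which DRAM channel a copy lands on, and the code returns the first copy instead of detecting which load completed first.

```rust
use std::ptr;

/// Hypothetical stand-in for the crate's slab: `channels` full copies of the data.
/// Real channel placement needs physical-address control that is not shown here.
struct MirroredSlab {
    mirrors: Vec<Vec<u64>>,
}

impl MirroredSlab {
    fn new(channels: usize, len: usize) -> Self {
        assert!(channels >= 1, "need at least one mirror");
        Self { mirrors: vec![vec![0u64; len]; channels] }
    }

    /// Store `value` into every mirror so any copy can serve a later read.
    fn write(&mut self, idx: usize, value: u64) {
        for m in &mut self.mirrors {
            // Volatile so the compiler cannot elide or merge the stores.
            unsafe { ptr::write_volatile(&mut m[idx], value) };
        }
    }

    /// Issue one volatile load per mirror. The loads are independent, so an
    /// out-of-order CPU may overlap them; this sketch returns the first copy
    /// rather than detecting which load actually finished first.
    fn read(&self, idx: usize) -> u64 {
        let mut first = None;
        for m in &self.mirrors {
            let v = unsafe { ptr::read_volatile(&m[idx]) };
            if first.is_none() {
                first = Some(v);
            }
        }
        first.expect("at least one mirror")
    }
}

fn main() {
    let mut slab = MirroredSlab::new(4, 1024);
    slab.write(7, 42);
    println!("slot 7 = {}", slab.read(7));
}
```

The cost of this design is visible even in the sketch: every write is multiplied by the number of mirrors, and memory use grows linearly with the channel count.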


❤️ Inspiration – Laurie Wired’s TailSlayer

UltraSlayer is a Rust port of the original TailSlayer implementation created by Laurie Wired.

Laurie Wired’s work introduced the concept of hardware‑level hedging to eliminate DRAM refresh‑stall tail latency. UltraSlayer adapts that concept to safe‑ish Rust while preserving the same deterministic guarantees.


🛠️ New Features (v0.2)

| Feature | Description |
| --- | --- |
| Configurable channel count | --channels N (2‑8 mirrors). |
| Cross-Platform Memory | Native mmap (Linux) and VirtualAlloc (Windows) support. |
| Huge‑Page support | Uses MAP_HUGETLB on Linux → far fewer TLB misses. |
| Spin policies | busy (full spin), hybrid (spin → yield), sleep (periodic pause). |
| Side‑car (sidecar feature) | Builds a cdylib with a tiny C‑FFI (ul_init, ul_read_u64, …). |
| POSIX Shared‑Memory wrapper (src/shm.rs) | ShmSlab<T> lets multiple processes map the same slab via /dev/shm (Linux only). |
| Criterion benchmark harness (benchmark feature) | benches/read_latency.rs measures nanosecond read latency for 2/4/8 channels. |
| CLI demo binary (cli feature) | examples/ultraslayer_cli.rs parses flags, creates the slab, starts the core, and idles. |
| Zero‑copy slice view (slice feature) | src/slice.rs exposes a raw‑pointer slice for bulk reads without copying. |
| Full LTO + thin‑LTO options | Optimised release builds for the smallest, fastest binary. |
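
The three spin policies trade CPU usage against wake‑up latency. The sketch below shows one way such policies could behave in a polling loop, using only the standard library. `SpinSketch` and its fields are hypothetical names chosen for illustration; the crate's own `SpinPolicy` type may be defined quite differently.

```rust
use std::hint;
use std::thread;
use std::time::Duration;

/// Hypothetical mirror of the three policies named above; the crate's own
/// `SpinPolicy` type may be defined differently.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SpinSketch {
    Busy,
    Hybrid { spins_before_yield: u32 },
    Sleep { pause: Duration },
}

impl SpinSketch {
    /// One idle step of a polling loop; `idle_iters` counts consecutive
    /// iterations that found no work.
    fn idle(&self, idle_iters: u32) {
        match *self {
            // Never leave the core: lowest wake-up latency, 100 % CPU.
            SpinSketch::Busy => hint::spin_loop(),
            // Spin briefly, then hand the core back to the scheduler.
            SpinSketch::Hybrid { spins_before_yield } => {
                if idle_iters < spins_before_yield {
                    hint::spin_loop();
                } else {
                    thread::yield_now();
                }
            }
            // Periodic pause: cheapest on CPU, highest wake-up latency.
            SpinSketch::Sleep { pause } => thread::sleep(pause),
        }
    }
}

fn main() {
    let policy = SpinSketch::Hybrid { spins_before_yield: 1_000 };
    for i in 0..2_000 {
        policy.idle(i);
    }
    println!("finished idling with {policy:?}");
}
```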

📋 System Requirements

| Requirement | Linux | Windows |
| --- | --- | --- |
| Kernel/OS | Kernel ≥ 5.10 | Windows 10 / 11 |
| Huge Pages | sudo sysctl -w vm.nr_hugepages=2048 | Standard Virtual Memory (Automatic) |
| DRAM Channels | ≥ 2 physical channels | ≥ 2 physical channels |
| CPU Affinity | taskset / chrt | SetThreadAffinityMask (via core_affinity) |
| Permissions | Root/Sudo for Huge Pages/RT priority | Administrator for certain memory flags |
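
Before starting a huge‑page‑backed slab on Linux it can help to confirm that the reservation actually succeeded. The sketch below reads the standard `HugePages_Total` and `HugePages_Free` counters from `/proc/meminfo`; the helper name `meminfo_field` is an illustrative assumption and is not part of UltraSlayer. Assuming the common x86‑64 default of 2 MiB huge pages, 2048 pages reserve 4 GiB, so a larger slab (for example 4 channels × 2 GiB) would need a proportionally larger `vm.nr_hugepages`.

```rust
use std::fs;

/// Extract a numeric field such as `HugePages_Free` from `/proc/meminfo` text.
fn meminfo_field(meminfo: &str, key: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let (name, rest) = line.split_once(':')?;
        if name.trim() != key {
            return None;
        }
        // The value is the first whitespace-separated token after the colon.
        rest.split_whitespace().next()?.parse().ok()
    })
}

fn main() {
    // Linux only: on other platforms the file is absent and we just report that.
    match fs::read_to_string("/proc/meminfo") {
        Ok(text) => {
            let total = meminfo_field(&text, "HugePages_Total");
            let free = meminfo_field(&text, "HugePages_Free");
            println!("huge pages: total = {total:?}, free = {free:?}");
        }
        Err(e) => println!("cannot read /proc/meminfo: {e}"),
    }
}
```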

📦 Getting Started – Build & Install

1️⃣ Clone the repository

git clone https://github.com/absalomedia/ultraslayer.git

cd ultraslayer

2️⃣ Build the core library (default)

cargo build --release

3️⃣ Optional builds

| Goal | Cargo command | What you get |
| --- | --- | --- |
| CLI demo (examples/ultraslayer_cli.rs) | cargo build --release --features cli --example ultraslayer_cli | target/release/examples/ultraslayer_cli |
| C‑FFI side‑car (libultraslayer.so) | cargo build --release --features sidecar | target/release/libultraslayer.so |
| Zero‑copy slices (src/slice.rs) | cargo build --release --features slice | Enables the .slice() API in UltraSlayer |
| Benchmark harness (Criterion) | cargo bench --features benchmark | Runs benches/read_latency.rs and prints latency tables |
| All features | cargo build --release --features "cli sidecar slice benchmark" | Everything compiled together |

Note that a plain cargo build compiles only the library and binary targets; examples are built when selected with --example <name> or --examples.

The release profile already uses full LTO, opt-level = 3, panic = "abort" and a single codegen unit for maximum inlining.


▶️ Running UltraSlayer (demo binary)

Linux (with Real‑Time priority)

# Use the binary located in the examples directory
sudo chrt -f 99 taskset -c 2 target/release/examples/ultraslayer_cli \
    --channels 4 \
    --size 2GiB \
    --spin busy

Windows

# Use the binary located in the examples directory
.\target\release\examples\ultraslayer_cli.exe --channels 4 --size 2GiB --spin busy

Alternatively, run directly via Cargo:

cargo run --release --features cli --example ultraslayer_cli -- --channels 4 --size 2GiB --spin busy

Flags

| Flag | Meaning |
| --- | --- |
| --channels N | Number of DRAM mirrors (default 2). |
| --size <bytes> | Total slab size per channel (e.g. 2GiB, 512MiB). |
| --spin <policy> | busy, hybrid, or sleep (default busy). |
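
Size arguments such as 2GiB and 512MiB use binary (power‑of‑two) units. The sketch below shows how such strings could be converted to a byte count. `parse_size` is a hypothetical helper written for this illustration; the CLI's real parser may accept other spellings or reject some of these.

```rust
/// Parse strings such as "2GiB", "512MiB" or "4096" into a byte count.
/// Hypothetical helper; the CLI's real parser may accept other spellings.
fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split at the first non-digit character: digits, then an optional unit.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let shift = match unit {
        "" | "B" => 0,
        "KiB" => 10,
        "MiB" => 20,
        "GiB" => 30,
        _ => return None,
    };
    // checked_mul guards against overflow on absurdly large inputs.
    value.checked_mul(1u64 << shift)
}

fn main() {
    for arg in ["2GiB", "512MiB", "4096", "2GB"] {
        println!("{arg:>7} -> {:?}", parse_size(arg));
    }
}
```

Returning `None` for unknown units such as "2GB" keeps decimal and binary units from being silently confused.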

📊 Benchmarking

UltraSlayer ships with two ways to benchmark latency.

1️⃣ Criterion read‑latency benchmark

cargo bench --features benchmark

The benchmark creates slabs with 2, 4, and 8 channels, fills them with deterministic data, then performs 1 000 000 random reads per configuration while measuring nanosecond‑resolution latency.
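
The measurement loop described above can be approximated with the standard library alone. The sketch below is not the crate's Criterion harness: it reads from an ordinary `Vec` instead of a mirrored slab, uses a small xorshift generator for repeatable random indices, and reports nearest‑rank percentiles. All names are illustrative assumptions. Note that the `Instant::now()` calls themselves cost tens of nanoseconds, so the absolute numbers mainly show the shape of the tail.

```rust
use std::ptr;
use std::time::Instant;

/// xorshift64: tiny deterministic PRNG so runs are repeatable.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Nearest-rank percentile of an ascending-sorted, non-empty slice.
fn percentile(sorted: &[u64], pct: f64) -> u64 {
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

fn main() {
    const LEN: usize = 1 << 20; // 8 MiB of u64; a real run would exceed the cache
    const OPS: usize = 1_000_000;

    // Deterministic fill: slot i holds i.
    let data: Vec<u64> = (0..LEN as u64).collect();

    let mut rng = 0x9E37_79B9_7F4A_7C15u64;
    let mut samples = Vec::with_capacity(OPS);
    let mut checksum = 0u64;
    for _ in 0..OPS {
        let idx = (xorshift64(&mut rng) as usize) % LEN;
        let start = Instant::now();
        // Volatile so the load is not optimised away.
        checksum = checksum.wrapping_add(unsafe { ptr::read_volatile(&data[idx]) });
        samples.push(start.elapsed().as_nanos() as u64);
    }

    samples.sort_unstable();
    println!(
        "p50 = {} ns, p99 = {} ns, p99.9 = {} ns (checksum {checksum})",
        percentile(&samples, 50.0),
        percentile(&samples, 99.0),
        percentile(&samples, 99.9)
    );
}
```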

2️⃣ Stand‑alone micro‑benchmark binary

Via Cargo:

cargo run --release --features benchmark --example benchmark -- --channels 4 --size 2GiB --ops 1_000_000 --spin busy

Via binary (for Real-Time priority on Linux):

# First, build the example (a plain cargo build does not compile examples)
cargo build --release --features benchmark --example benchmark

# Then run the resulting binary from the examples folder
sudo chrt -f 99 taskset -c 2 target/release/examples/benchmark \
    --channels 4 --size 2GiB --ops 1_000_000 --spin busy

Windows execution:

.\target\release\examples\benchmark.exe --channels 4 --size 2GiB --ops 1_000_000 --spin busy

🔌 Integration Guide

A. Pure Rust Engine

use std::sync::Arc;
use ultraslayer::{SpinPolicy, UltraSlayer};

// Hypothetical slot index for this example; substitute your own slab layout.
const PRICE_IDX: usize = 0;

fn main() {
    // 2 GiB slab, 4 mirrored channels, busy‑spin policy
    let slayer = Arc::new(
        UltraSlayer::<u64>::new(4, 2 << 30) // Direct allocation
    );
    slayer.set_spin_policy(SpinPolicy::Busy);
    // The Slayer Core spins at 100 %, so give it a CPU of its own and pin
    // the calling thread to a different one.
    slayer.spawn_slayer_core(2);      // start the Slayer Core on CPU 2
    slayer.pin_to_core(3);            // pin the current thread to CPU 3

    // Hot‑path usage (example: reading a price)
    let price = slayer.read(PRICE_IDX);

    // Optional: Bulk read via slice (requires --features slice)
    // let view = unsafe { slayer.slice() };
    // let first_val = view[0];
}

All public methods (read, write, slice, stats, set_spin_policy, pin_to_core) are exposed through the crate root (ultraslayer::UltraSlayer).

B. Non‑Rust Languages (C / Node / Python) – Side‑car

cargo build --release --features sidecar

The exported C API (in src/ffi.rs) provides ul_init, ul_start_core, ul_read_u64, ul_write_u64, and ul_destroy.
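
The real signatures live in src/ffi.rs and are not reproduced here. As a general illustration of how a side‑car of this kind is usually shaped, the sketch below shows the opaque‑handle pattern: an init function hands out a heap pointer, read and write functions validate it, and a destroy function frees it. All `sketch_*` names and signatures are hypothetical and deliberately do not reuse the `ul_*` names.

```rust
/// Hypothetical opaque handle; stands in for whatever an init call returns.
pub struct SketchSlab {
    data: Vec<u64>,
}

// A real cdylib would also mark each function `#[no_mangle]`
// (`#[unsafe(no_mangle)]` in edition 2024) so the symbol name is stable.

/// Allocate a slab of `len` zeroed slots and hand ownership to the caller.
pub extern "C" fn sketch_init(len: usize) -> *mut SketchSlab {
    Box::into_raw(Box::new(SketchSlab { data: vec![0; len] }))
}

/// Write `value` at `idx`; returns false on a null handle or bad index.
/// Safety: `h` must be null or a live pointer from `sketch_init`.
pub unsafe extern "C" fn sketch_write_u64(h: *mut SketchSlab, idx: usize, value: u64) -> bool {
    match unsafe { h.as_mut() }.and_then(|s| s.data.get_mut(idx)) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}

/// Read slot `idx` into `*out`; returns false on null pointers or bad index.
/// Safety: `h` and `out` must each be null or valid for the access.
pub unsafe extern "C" fn sketch_read_u64(h: *const SketchSlab, idx: usize, out: *mut u64) -> bool {
    let slot = unsafe { h.as_ref() }.and_then(|s| s.data.get(idx));
    match (slot, unsafe { out.as_mut() }) {
        (Some(v), Some(out)) => {
            *out = *v;
            true
        }
        _ => false,
    }
}

/// Free a handle previously returned by `sketch_init`; null is ignored.
/// Safety: `h` must not be used again after this call.
pub unsafe extern "C" fn sketch_destroy(h: *mut SketchSlab) {
    if !h.is_null() {
        drop(unsafe { Box::from_raw(h) });
    }
}

fn main() {
    let h = sketch_init(8);
    let mut out = 0u64;
    unsafe {
        assert!(sketch_write_u64(h, 3, 42));
        assert!(sketch_read_u64(h, 3, &mut out));
        sketch_destroy(h);
    }
    println!("read back {out}");
}
```

Returning a status flag and writing the value through an out‑pointer keeps error handling explicit for callers in C, Node, or Python, which cannot see Rust's `Option` or `Result`.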

C. Multiple Processes – POSIX Shared‑Memory (Linux Only)

use ultraslayer::ShmSlab;

// Process A – creates the slab
let shm = ShmSlab::<u64>::create("ultra_slab", 4, 2 << 30)?;
let slayer = shm.into_ultraslayer();

📁 Project Layout

ultraslayer/
├─ src/
│   ├─ lib.rs            ← public UltraSlayer API
│   ├─ slab.rs           ← low‑level mirroring & volatile ops
│   ├─ arch.rs           ← CPU‑affinity helpers
│   ├─ reader.rs         ← internal read‑path logic
│   ├─ main.rs           ← optional entry point
│   ├─ shm.rs            ← POSIX shared‑memory wrapper (Linux)
│   ├─ ffi.rs            ← C‑FFI side‑car (feature = "sidecar")
│   └─ slice.rs          ← zero‑copy slice view
├─ benches/
│   └─ read_latency.rs   ← Criterion read‑latency benchmark
├─ examples/
│   ├─ ultraslayer_cli.rs    ← CLI demo binary (feature = "cli")
│   └─ benchmark.rs      ← micro‑benchmark binary (feature = "benchmark")
├─ Cargo.toml
└─ README.md            ← this file

📜 License

UltraSlayer is released under the Apache License, Version 2.0.


TL;DR – Quick start for a typical HFT node

# 1️⃣ Reserve huge pages (Linux only)
sudo sysctl -w vm.nr_hugepages=2048

# 2️⃣ Build the library and the CLI demo with the side‑car and slice view enabled
cargo build --release --features "cli sidecar slice" --lib --example ultraslayer_cli

# 3️⃣ Run the demo (core 2, 4 channels, 2 GiB per channel)
sudo chrt -f 99 taskset -c 2 target/release/examples/ultraslayer_cli \
    --channels 4 --size 2GiB --spin busy

UltraSlayer – the practical, Rust‑native answer to Laurie Wired’s TailSlayer concept. 🚀