BitWheel

A high-performance hierarchical timer wheel for Rust, optimized for consistent low latency in event-driven applications.

Features

  • O(1) operations — insert, cancel, and poll are constant-time regardless of timer count
  • Predictable tail latency — no cascading between gears eliminates latency spikes
  • Zero allocation on hot path — all memory pre-allocated; trades memory for determinism
  • Bounded capacity by design — insert failure signals backpressure, not a bug
  • Failover option — BitWheelWithFailover gracefully degrades to BTreeMap when full
  • Precision tradeoff — coarser gears sacrifice millisecond precision (does a 1-hour timer need it?)
  • Compile-time configuration — tune resolution, capacity, and range via const generics

Quick Start

use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

struct OrderTimeout {
    order_id: u64,
}

impl Timer for OrderTimeout {
    type Context = Vec<u64>;

    fn fire(&mut self, _now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        ctx.push(self.order_id);
        None // one-shot
    }
}

fn main() {
    let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();
    
    // Insert a timer
    let when = Instant::now() + Duration::from_millis(100);
    let handle = wheel.insert(when, OrderTimeout { order_id: 12345 }).unwrap();
    
    // Cancel if order fills
    if let Some(timer) = wheel.cancel(handle) {
        println!("Cancelled order {}", timer.order_id);
    }
    
    // Poll in event loop
    let mut fired = Vec::new();
    wheel.poll(Instant::now(), &mut fired).unwrap();
}

Preconfigured Wheels

BitWheel provides type aliases for common use cases:

Type                 Resolution   Range        Slot Cap   Use Case
BalancedWheel        5ms          ~23 hours    16         General purpose
BalancedFastWheel    5ms          ~23 hours    64         Minimal probing
BalancedLightWheel   5ms          ~23 hours    8          Memory constrained
PreciseWheel         1ms          ~4.7 hours   16         Sub-10ms timeouts
BurstWheel           5ms          ~23 hours    32         High timer density
ExtendedWheel        16ms         ~3 days      16         Long-running timers

Each has a WithFailover variant (e.g., BalancedWheelWithFailover) that falls back to BTreeMap on overflow.

use bitwheel::{PreciseWheel, BurstWheelWithFailover};

// 1ms resolution for tight timeouts
let mut precise: Box<PreciseWheel<MyTimer>> = PreciseWheel::boxed();

// High capacity with safety net
let mut burst: Box<BurstWheelWithFailover<MyTimer>> = BurstWheelWithFailover::boxed();

Design Philosophy

Insert Can Fail (And That's OK)

Unlike most timer implementations, BitWheel::insert returns Result. When slots overflow:

match wheel.insert(when, timer) {
    Ok(handle) => { /* timer scheduled */ }
    Err(InsertError(timer)) => {
        // Capacity exceeded — apply backpressure
        // This is a feature: your config doesn't match your workload
    }
}

For systems where dropping timers is unacceptable, use BitWheelWithFailover:

use bitwheel::BalancedWheelWithFailover;

let mut wheel: Box<BalancedWheelWithFailover<MyTimer>> = 
    BalancedWheelWithFailover::boxed();

// Never fails — overflows go to BTreeMap
let handle = wheel.insert(when, timer);

Precision vs. Predictability

Traditional hierarchical wheels cascade timers between levels as time advances. This causes latency spikes when many timers move simultaneously.

BitWheel trades precision for predictability:

  • Timers in higher gears (longer delays) have coarser resolution
  • A 1-hour timer might fire within a ~64ms window — acceptable for most use cases
  • No cascading means no surprise latency spikes

Memory Tradeoff

BitWheel pre-allocates all slot storage at construction:

Configuration       Approximate Memory
BalancedWheel       ~200 KB
BalancedFastWheel   ~800 KB
BurstWheel          ~400 KB

This trades memory for determinism — no allocator calls during insert/poll.
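
As a rough sizing model (an assumption for illustration, not a documented formula): slot storage dominates, with gears × 64 slots × SLOT_CAP pre-allocated entries. Assuming the preconfigured wheels use 4 gears (which matches the ~23-hour range at 5ms) and roughly 48 bytes per entry, the numbers line up with the table:

// 4 gears × 64 slots × SLOT_CAP entries, at an assumed ~48 B each
const BALANCED: usize = 4 * 64 * 16 * 48; // ≈ 196 KB, near ~200 KB
const FAST: usize = 4 * 64 * 64 * 48;     // ≈ 786 KB, near ~800 KB
const BURST: usize = 4 * 64 * 32 * 48;    // ≈ 393 KB, near ~400 KB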

How It's Fast

Hierarchical gear design: 64 slots per gear, each gear 64× coarser than the previous. 4 gears cover ~23 hours at 5ms resolution.
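
The range arithmetic behind that claim, as a small sketch (the function is illustrative, not part of the API):

fn wheel_range_ms(resolution_ms: u64, num_gears: u32) -> u64 {
    // 64 slots per gear, each gear 64× coarser: N gears span resolution × 64^N.
    resolution_ms * 64u64.pow(num_gears)
}

// wheel_range_ms(5, 4)  == 83_886_080 ms ≈ 23 hours (BalancedWheel)
// wheel_range_ms(1, 4)  ≈ 4.7 hours (PreciseWheel)
// wheel_range_ms(16, 4) ≈ 3 days (ExtendedWheel)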

Skip-ahead optimization: Tracks next_fire_tick globally. Empty polls jump directly to the next timer instead of iterating idle slots.
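
A minimal sketch of the idea, assuming a cached next_fire_tick (the field name comes from the text above; the actual internals may differ):

struct WheelState {
    cursor_tick: u64,    // last tick scanned
    next_fire_tick: u64, // earliest tick holding any timer
}

impl WheelState {
    fn advance(&mut self, now_tick: u64) -> Option<u64> {
        if now_tick < self.next_fire_tick {
            // Nothing can be due yet: one comparison, zero slot iteration.
            self.cursor_tick = now_tick;
            return None;
        }
        // Jump straight to the first tick that can hold a due timer.
        self.cursor_tick = self.next_fire_tick;
        Some(self.cursor_tick)
    }
}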

Bitmap-accelerated lookup: Each gear maintains a 64-bit occupied bitmap. Finding the next non-empty slot is rotate_right + trailing_zeros — a few CPU cycles.
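
Sketched out (illustrative, not the crate's code), with bit i marking slot i as occupied:

fn next_occupied_slot(occupied: u64, current_slot: u32) -> Option<u32> {
    if occupied == 0 {
        return None;
    }
    // Rotate so `current_slot` becomes bit 0; trailing_zeros then gives the
    // wrapping distance to the nearest occupied slot.
    let distance = occupied.rotate_right(current_slot % 64).trailing_zeros();
    Some((current_slot + distance) % 64)
}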

Dirty tracking: Gear metadata only recomputes when timers are removed, not on every poll.
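
The pattern, under the same illustrative caveat:

struct GearMeta {
    dirty: bool,            // set by removals, never by polls
    next_tick: Option<u64>, // cached earliest timer in this gear
}

impl GearMeta {
    fn next_tick(&mut self, recompute: impl FnOnce() -> Option<u64>) -> Option<u64> {
        if self.dirty {
            self.next_tick = recompute();
            self.dirty = false;
        }
        self.next_tick
    }
}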

Fixed-size slabs with intrusive lists: Each slot is a pre-allocated slab with O(1) insert/remove via intrusive linked lists. No pointer chasing through heap allocations.
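
A minimal fixed-capacity slab in that spirit, with the free list threaded intrusively through the storage itself (the layout is illustrative, not BitWheel's):

const CAP: usize = 16;

struct Slab<T> {
    entries: [Option<T>; CAP],
    next_free: [usize; CAP], // free list threaded through the storage
    free_head: usize,        // == CAP when full
}

impl<T> Slab<T> {
    fn new() -> Self {
        let mut next_free = [0; CAP];
        for (i, n) in next_free.iter_mut().enumerate() {
            *n = i + 1; // each free entry points at the next free one
        }
        Slab { entries: std::array::from_fn(|_| None), next_free, free_head: 0 }
    }

    fn insert(&mut self, value: T) -> Result<usize, T> {
        if self.free_head == CAP {
            return Err(value); // slot full: surfaces as insert backpressure
        }
        let idx = self.free_head;
        self.free_head = self.next_free[idx];
        self.entries[idx] = Some(value);
        Ok(idx) // O(1), no allocator call
    }

    fn remove(&mut self, idx: usize) -> Option<T> {
        let value = self.entries[idx].take()?;
        self.next_free[idx] = self.free_head;
        self.free_head = idx;
        Some(value) // O(1) by handle index
    }
}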

No cascading: Timers stay in their original slot until they fire. Higher gears have lower precision, but zero overhead from timer migration.

Benchmarks

Measured with HDR histograms (100k warmup, 1M iterations, --release).

Interleaved Insert Latency

Timers inserted between existing entries — stresses tree rebalancing and cascading.

Percentile   BitWheel   BitWheel+Failover   BTreeMap   HHWT
p50          26 ns      27 ns               34 ns      39 ns
p90          28 ns      31 ns               49 ns      54 ns
p99          34 ns      50 ns               100 ns     278 ns
p99.9        48 ns      75 ns               131 ns     990 ns
p99.99       209 ns     209 ns              365 ns     2757 ns

Realistic Trading Simulation

Order timeout management: 80% inserts, 5% cancels, continuous polling.

Operation      BitWheel   BitWheel+Failover   BTreeMap   HHWT
Insert p50     26 ns      27 ns               42 ns      28 ns
Insert p99.9   61 ns      66 ns               79 ns      87 ns
Poll p50       15 ns      17 ns               17 ns      19 ns
Poll p99.9     61 ns      65 ns               149 ns     1515 ns
Cancel p50     14 ns      14 ns               54 ns      20 ns
Cancel p99.9   33 ns      36 ns               156 ns     45 ns

Summary

Metric                    vs BTreeMap       vs HHWT
Insert                    1.4-1.7× faster   ~same p50, 2× better p99.9
Cancel                    3.6× faster       1.4× faster p50, similar tail
Poll tail (p99.9)         1.6× better       14× better
Predictability (stddev)   2× better         9× better


Advanced Configuration

use bitwheel::BitWheel;

// BitWheel<Timer, NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES>
type CustomWheel<T> = BitWheel<T, 4, 1, 64, 8>;

Parameter       Default   Description
NUM_GEARS       5         Hierarchy levels (more = longer max delay)
RESOLUTION_MS   5         Base tick resolution in milliseconds
SLOT_CAP        32        Timers per slot before probing
MAX_PROBES      3         Linear probe distance on collision

Capacity Planning

Each gear has 64 slots. When a slot fills to SLOT_CAP, inserts probe linearly to adjacent slots.

Total capacity: 64 × NUM_GEARS × SLOT_CAP timers (theoretical max)

Hotspot capacity: SLOT_CAP × MAX_PROBES timers at the same target time
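
Plugging in the documented defaults (NUM_GEARS = 5, SLOT_CAP = 32, MAX_PROBES = 3):

// Theoretical max: 64 slots × 5 gears × 32 timers per slot
const TOTAL_CAPACITY: usize = 64 * 5 * 32; // 10_240 timers
// Timers that can share one target tick: one slot plus linear probes
const HOTSPOT_CAPACITY: usize = 32 * 3; // 96 timers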

The Probing Tradeoff

Config                             Latency          Memory   Best For
Large SLOT_CAP, small MAX_PROBES   Tight tails      Higher   Latency-critical systems
Small SLOT_CAP, large MAX_PROBES   Variable tails   Lower    Memory-constrained systems

Why probing hurts tails: Each probe is a potential cache miss and adds branches. With MAX_PROBES=2, insert is almost always single-slot. With MAX_PROBES=16, worst-case insert touches 16 slots.
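
What a probe sequence looks like, as an illustrative sketch (not the crate's code):

// Try the home slot, then up to MAX_PROBES - 1 neighbors, wrapping around
// the 64-slot gear; None surfaces as an insert error.
fn find_slot(occupancy: &[u8; 64], home: usize, slot_cap: u8, max_probes: usize) -> Option<usize> {
    (0..max_probes)
        .map(|p| (home + p) % 64)
        .find(|&slot| occupancy[slot] < slot_cap)
}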

Sizing for Your Workload

  1. Estimate hotspot density: How many timers fire within one tick? (e.g., 1,000 orders/sec with a 100ms timeout keeps ~100 timers in flight, roughly 1 firing per 1ms tick when arrivals are uniform; bursts concentrate many into a single tick)

  2. Set SLOT_CAP to handle typical density without probing

  3. Set MAX_PROBES for burst headroom (2-4× typical density)

  4. Use failover if drops are unacceptable: BitWheelWithFailover catches overflow in a BTreeMap
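
Putting the steps together, a hypothetical configuration (the numbers are illustrative, not a recommendation):

// Suppose bursts put ~25 timers into one 1ms tick: SLOT_CAP = 32 absorbs a
// typical burst without probing (step 2), and MAX_PROBES = 3 leaves ~3×
// hotspot headroom (step 3). Order: NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES.
type OrderWheel<T> = bitwheel::BitWheel<T, 4, 1, 32, 3>;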

Event Loop Integration

use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

fn event_loop<T: Timer>(wheel: &mut BalancedWheel<T>, ctx: &mut T::Context) {
    loop {
        // Poll timers
        if let Err(e) = wheel.poll(Instant::now(), ctx) {
            eprintln!("{} timers failed to reschedule", e.0);
        }
        
        // Sleep until next timer or max interval
        match wheel.duration_until_next() {
            Some(d) => std::thread::sleep(d.min(Duration::from_millis(100))),
            None => std::thread::sleep(Duration::from_millis(100)),
        }
    }
}

License

MIT OR Apache-2.0