BitWheel

A high-performance hierarchical timer wheel for Rust, optimized for consistent low latency in event-driven applications.

Features

  • O(1) operations — insert, cancel, and poll are constant-time regardless of timer count
  • Predictable tail latency — no cascading between gears eliminates latency spikes
  • Zero allocation on hot path — all memory pre-allocated; trades memory for determinism
  • Bounded capacity by design — insert failure signals backpressure, not a bug
  • Failover option — BitWheelWithFailover gracefully degrades to BTreeMap when full
  • Precision tradeoff — coarser gears sacrifice millisecond precision (does a 1-hour timer need it?)
  • Compile-time configuration — tune resolution, capacity, and range via const generics

Quick Start

use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

struct OrderTimeout {
    order_id: u64,
}

impl Timer for OrderTimeout {
    type Context = Vec<u64>;

    fn fire(&mut self, _now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        ctx.push(self.order_id);
        None // one-shot
    }
}

fn main() {
    let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();
    
    // Insert a timer
    let when = Instant::now() + Duration::from_millis(100);
    let handle = wheel.insert(when, OrderTimeout { order_id: 12345 }).unwrap();
    
    // Cancel if order fills
    if let Some(timer) = wheel.cancel(handle) {
        println!("Cancelled order {}", timer.order_id);
    }
    
    // Poll in event loop
    let mut fired = Vec::new();
    wheel.poll(Instant::now(), &mut fired).unwrap();
}

Preconfigured Wheels

BitWheel provides type aliases for common use cases:

Type                 Resolution   Range        Slot Cap   Use Case
BalancedWheel        5ms          ~23 hours    16         General purpose
BalancedFastWheel    5ms          ~23 hours    64         Minimal probing
BalancedLightWheel   5ms          ~23 hours    8          Memory constrained
PreciseWheel         1ms          ~4.7 hours   16         Sub-10ms timeouts
BurstWheel           5ms          ~23 hours    32         High timer density
ExtendedWheel        16ms         ~3 days      16         Long-running timers

Each has a WithFailover variant (e.g., BalancedWheelWithFailover) that falls back to BTreeMap on overflow.

use bitwheel::{PreciseWheel, BurstWheelWithFailover};

// 1ms resolution for tight timeouts
let mut precise: Box<PreciseWheel<MyTimer>> = PreciseWheel::boxed();

// High capacity with safety net
let mut burst: Box<BurstWheelWithFailover<MyTimer>> = BurstWheelWithFailover::boxed();

Design Philosophy

Insert Can Fail (And That's OK)

Unlike most timer implementations, BitWheel::insert returns Result. When slots overflow:

match wheel.insert(when, timer) {
    Ok(handle) => { /* timer scheduled */ }
    Err(InsertError(timer)) => {
        // Capacity exceeded — apply backpressure
        // This is a feature: your config doesn't match your workload
    }
}

For systems where dropping timers is unacceptable, use BitWheelWithFailover:

use bitwheel::BalancedWheelWithFailover;

let mut wheel: Box<BalancedWheelWithFailover<MyTimer>> = 
    BalancedWheelWithFailover::boxed();

// Never fails — overflows go to BTreeMap
let handle = wheel.insert(when, timer);

Precision vs. Predictability

Traditional hierarchical wheels cascade timers between levels as time advances. This causes latency spikes when many timers move simultaneously.

BitWheel trades precision for predictability:

  • Timers in higher gears (longer delays) have coarser resolution
  • A 1-hour timer might fire within a ~64ms window — acceptable for most use cases
  • No cascading means no surprise latency spikes

Memory Tradeoff

BitWheel pre-allocates all slot storage at construction:

Configuration       Approximate Memory
BalancedWheel       ~200 KB
BalancedFastWheel   ~800 KB
BurstWheel          ~400 KB

This trades memory for determinism — no allocator calls during insert/poll.
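
As a rough sizing model (an assumption for illustration, not a documented formula): slot storage dominates, with gears × 64 slots × SLOT_CAP pre-allocated entries. Assuming the preconfigured wheels use 4 gears (which matches the ~23-hour range at 5ms) and roughly 48 bytes per entry, the numbers line up with the table:

// 4 gears × 64 slots × SLOT_CAP entries, at an assumed ~48 B each
const BALANCED: usize = 4 * 64 * 16 * 48; // ≈ 196 KB, near ~200 KB
const FAST: usize = 4 * 64 * 64 * 48;     // ≈ 786 KB, near ~800 KB
const BURST: usize = 4 * 64 * 32 * 48;    // ≈ 393 KB, near ~400 KB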

How It's Fast

Hierarchical gear design: 64 slots per gear, each gear 64× coarser than the previous. 4 gears cover ~23 hours at 5ms resolution.
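
The range arithmetic behind that claim, as a small sketch (the function is illustrative, not part of the API):

fn wheel_range_ms(resolution_ms: u64, num_gears: u32) -> u64 {
    // 64 slots per gear, each gear 64× coarser: N gears span resolution × 64^N.
    resolution_ms * 64u64.pow(num_gears)
}

// wheel_range_ms(5, 4)  == 83_886_080 ms ≈ 23 hours (BalancedWheel)
// wheel_range_ms(1, 4)  ≈ 4.7 hours (PreciseWheel)
// wheel_range_ms(16, 4) ≈ 3 days (ExtendedWheel)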

Skip-ahead optimization: Tracks next_fire_tick globally. Empty polls jump directly to the next timer instead of iterating idle slots.
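
A minimal sketch of the idea, assuming a cached next_fire_tick (the field name comes from the text above; the actual internals may differ):

struct WheelState {
    cursor_tick: u64,    // last tick scanned
    next_fire_tick: u64, // earliest tick holding any timer
}

impl WheelState {
    fn advance(&mut self, now_tick: u64) -> Option<u64> {
        if now_tick < self.next_fire_tick {
            // Nothing can be due yet: one comparison, zero slot iteration.
            self.cursor_tick = now_tick;
            return None;
        }
        // Jump straight to the first tick that can hold a due timer.
        self.cursor_tick = self.next_fire_tick;
        Some(self.cursor_tick)
    }
}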

Bitmap-accelerated lookup: Each gear maintains a 64-bit occupied bitmap. Finding the next non-empty slot is rotate_right + trailing_zeros — a few CPU cycles.
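
Sketched out (illustrative, not the crate's code), with bit i marking slot i as occupied:

fn next_occupied_slot(occupied: u64, current_slot: u32) -> Option<u32> {
    if occupied == 0 {
        return None;
    }
    // Rotate so `current_slot` becomes bit 0; trailing_zeros then gives the
    // wrapping distance to the nearest occupied slot.
    let distance = occupied.rotate_right(current_slot % 64).trailing_zeros();
    Some((current_slot + distance) % 64)
}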

Dirty tracking: Gear metadata only recomputes when timers are removed, not on every poll.
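
The pattern, under the same illustrative caveat:

struct GearMeta {
    dirty: bool,            // set by removals, never by polls
    next_tick: Option<u64>, // cached earliest timer in this gear
}

impl GearMeta {
    fn next_tick(&mut self, recompute: impl FnOnce() -> Option<u64>) -> Option<u64> {
        if self.dirty {
            self.next_tick = recompute();
            self.dirty = false;
        }
        self.next_tick
    }
}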

Fixed-size slabs with intrusive lists: Each slot is a pre-allocated slab with O(1) insert/remove via intrusive linked lists. No pointer chasing through heap allocations.
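
A minimal fixed-capacity slab in that spirit, with the free list threaded intrusively through the storage itself (the layout is illustrative, not BitWheel's):

const CAP: usize = 16;

struct Slab<T> {
    entries: [Option<T>; CAP],
    next_free: [usize; CAP], // free list threaded through the storage
    free_head: usize,        // == CAP when full
}

impl<T> Slab<T> {
    fn new() -> Self {
        let mut next_free = [0; CAP];
        for (i, n) in next_free.iter_mut().enumerate() {
            *n = i + 1; // each free entry points at the next free one
        }
        Slab { entries: std::array::from_fn(|_| None), next_free, free_head: 0 }
    }

    fn insert(&mut self, value: T) -> Result<usize, T> {
        if self.free_head == CAP {
            return Err(value); // slot full: surfaces as insert backpressure
        }
        let idx = self.free_head;
        self.free_head = self.next_free[idx];
        self.entries[idx] = Some(value);
        Ok(idx) // O(1), no allocator call
    }

    fn remove(&mut self, idx: usize) -> Option<T> {
        let value = self.entries[idx].take()?;
        self.next_free[idx] = self.free_head;
        self.free_head = idx;
        Some(value) // O(1) by handle index
    }
}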

No cascading: Timers stay in their original slot until they fire. Higher gears have lower precision, but zero overhead from timer migration.

Benchmarks

Measured with HDR histograms (100k warmup, 1M iterations, --release).

Interleaved Insert Latency

Timers inserted between existing entries — stresses tree rebalancing and cascading.

Percentile   BitWheel   BitWheel+Failover   BTreeMap   HHWT
p50          26 ns      27 ns               34 ns      39 ns
p90          28 ns      31 ns               49 ns      54 ns
p99          34 ns      50 ns               100 ns     278 ns
p99.9        48 ns      75 ns               131 ns     990 ns
p99.99       209 ns     209 ns              365 ns     2757 ns

Realistic Trading Simulation

Order timeout management: 80% inserts, 5% cancels, continuous polling.

Operation      BitWheel   BitWheel+Failover   BTreeMap   HHWT
Insert p50     26 ns      27 ns               42 ns      28 ns
Insert p99.9   61 ns      66 ns               79 ns      87 ns
Poll p50       15 ns      17 ns               17 ns      19 ns
Poll p99.9     61 ns      65 ns               149 ns     1515 ns
Cancel p50     14 ns      14 ns               54 ns      20 ns
Cancel p99.9   33 ns      36 ns               156 ns     45 ns

Summary

Metric                    vs BTreeMap       vs HHWT
Insert                    1.4-1.7× faster   ~same p50, 2× better p99.9
Cancel                    3.6× faster       1.4× faster p50, similar tail
Poll tail (p99.9)         1.6× better       14× better
Predictability (stddev)   2× better         9× better


Advanced Configuration

use bitwheel::BitWheel;

// BitWheel<Timer, NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES>
type CustomWheel<T> = BitWheel<T, 4, 1, 64, 8>;

Parameter       Default   Description
NUM_GEARS       5         Hierarchy levels (more = longer max delay)
RESOLUTION_MS   5         Base tick resolution in milliseconds
SLOT_CAP        32        Timers per slot before probing
MAX_PROBES      3         Linear probe distance on collision

Capacity Planning

Each gear has 64 slots. When a slot fills to SLOT_CAP, inserts probe linearly to adjacent slots.

Total capacity: 64 × NUM_GEARS × SLOT_CAP timers (theoretical max)

Hotspot capacity: SLOT_CAP × MAX_PROBES timers at the same target time
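
Plugging in the documented defaults (NUM_GEARS = 5, SLOT_CAP = 32, MAX_PROBES = 3):

// Theoretical max: 64 slots × 5 gears × 32 timers per slot
const TOTAL_CAPACITY: usize = 64 * 5 * 32; // 10_240 timers
// Timers that can share one target tick: one slot plus linear probes
const HOTSPOT_CAPACITY: usize = 32 * 3; // 96 timers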

The Probing Tradeoff

Config                             Latency          Memory   Best For
Large SLOT_CAP, small MAX_PROBES   Tight tails      Higher   Latency-critical systems
Small SLOT_CAP, large MAX_PROBES   Variable tails   Lower    Memory-constrained systems

Why probing hurts tails: Each probe is a potential cache miss and adds branches. With MAX_PROBES=2, insert is almost always single-slot. With MAX_PROBES=16, worst-case insert touches 16 slots.
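
What a probe sequence looks like, as an illustrative sketch (not the crate's code):

// Try the home slot, then up to MAX_PROBES - 1 neighbors, wrapping around
// the 64-slot gear; None surfaces as an insert error.
fn find_slot(occupancy: &[u8; 64], home: usize, slot_cap: u8, max_probes: usize) -> Option<usize> {
    (0..max_probes)
        .map(|p| (home + p) % 64)
        .find(|&slot| occupancy[slot] < slot_cap)
}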

Sizing for Your Workload

  1. Estimate hotspot density: How many timers fire within one tick? (e.g., 1,000 orders/sec with a 100ms timeout keeps ~100 timers in flight, roughly 1 firing per 1ms tick when arrivals are uniform; bursts concentrate many into a single tick)

  2. Set SLOT_CAP to handle typical density without probing

  3. Set MAX_PROBES for burst headroom (2-4× typical density)

  4. Use failover if drops are unacceptable: BitWheelWithFailover catches overflow in a BTreeMap
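
Putting the steps together, a hypothetical configuration (the numbers are illustrative, not a recommendation):

// Suppose bursts put ~25 timers into one 1ms tick: SLOT_CAP = 32 absorbs a
// typical burst without probing (step 2), and MAX_PROBES = 3 leaves ~3×
// hotspot headroom (step 3). Order: NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES.
type OrderWheel<T> = bitwheel::BitWheel<T, 4, 1, 32, 3>;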

Event Loop Integration

use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

fn event_loop<T: Timer>(wheel: &mut BalancedWheel<T>, ctx: &mut T::Context) {
    loop {
        // Poll timers
        if let Err(e) = wheel.poll(Instant::now(), ctx) {
            eprintln!("{} timers failed to reschedule", e.0);
        }
        
        // Sleep until next timer or max interval
        match wheel.duration_until_next() {
            Some(d) => std::thread::sleep(d.min(Duration::from_millis(100))),
            None => std::thread::sleep(Duration::from_millis(100)),
        }
    }
}

License

MIT OR Apache-2.0