# BitWheel

A high-performance hierarchical timer wheel for Rust, optimized for consistent low latency in event-driven applications.
## Features
- O(1) operations — insert, cancel, and poll are constant-time regardless of timer count
- Predictable tail latency — no cascading between gears eliminates latency spikes
- Zero allocation on hot path — all memory pre-allocated; trades memory for determinism
- Bounded capacity by design — insert failure signals backpressure, not a bug
- Failover option — `BitWheelWithFailover` gracefully degrades to BTreeMap when full
- Dual-lane option — `DualBitWheel` separates periodic and oneshot timers with independent configurations
- Precision tradeoff — coarser gears sacrifice millisecond precision (does a 1-hour timer need it?)
- Compile-time configuration — tune resolution, capacity, and range via const generics
## Quick Start
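A minimal sketch of the core API. The `bitwheel` module path, the `boxed()` constructor, and the exact `insert`/`cancel`/`poll` signatures are assumptions inferred from the examples later in this README:

```rust
use bitwheel::BalancedWheel; // module path assumed

fn main() {
    // Pre-allocates all slot storage up front; boxing keeps the large wheel off the stack.
    let mut wheel: Box<BalancedWheel<u64>> = BalancedWheel::boxed();

    // Schedule payload 42 to fire in 250ms. Insert returns Result because capacity is bounded.
    let handle = wheel.insert(250, 42).expect("wheel has capacity");

    // Cancel in O(1) via the returned handle.
    wheel.cancel(handle);

    // Drive the wheel from your event loop; expired payloads come back from poll
    // (a poll() -> Option<T> shape is assumed here).
    while let Some(expired) = wheel.poll() {
        println!("timer fired: {expired}");
    }
}
```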
## Preconfigured Wheels
BitWheel provides type aliases for common use cases:
| Type | Resolution | Range | Slot Cap | Use Case |
|---|---|---|---|---|
| `BalancedWheel` | 5ms | ~23 hours | 16 | General purpose |
| `BalancedFastWheel` | 5ms | ~23 hours | 64 | Minimal probing |
| `BalancedLightWheel` | 5ms | ~23 hours | 8 | Memory constrained |
| `PreciseWheel` | 1ms | ~4.7 hours | 16 | Sub-10ms timeouts |
| `BurstWheel` | 5ms | ~23 hours | 32 | High timer density |
| `ExtendedWheel` | 16ms | ~3 days | 16 | Long-running timers |
Each has a WithFailover variant (e.g., BalancedWheelWithFailover) that falls back to BTreeMap on overflow.
```rust
use bitwheel::{PreciseWheel, BurstWheelWithFailover}; // module path assumed

// The payload type (u64) and the boxed() constructor are assumptions.
// 1ms resolution for tight timeouts
let mut precise: Box<PreciseWheel<u64>> = PreciseWheel::boxed();

// High capacity with safety net
let mut burst: Box<BurstWheelWithFailover<u64>> = BurstWheelWithFailover::boxed();
```
## Dual-Lane Wheels
DualBitWheel runs two independent wheels — one for periodic timers (heartbeats, monitoring) and one for oneshot timers (order timeouts). Each lane has its own resolution and capacity tuning.
| Type | Periodic Lane | Oneshot Lane | Use Case |
|---|---|---|---|
| `StandardBalancedDualWheel` | 100ms / 8 cap | 5ms / 16 cap | General purpose |
| `FastPreciseDualWheel` | 100ms / 16 cap | 1ms / 64 cap | HFT order timeouts |
| `StandardBurstDualWheel` | 100ms / 8 cap | 5ms / 32 cap | Bursty workloads |
```rust
use bitwheel::FastPreciseDualWheel; // module path assumed

// The payload type, boxed() constructor, argument order (millis, payload),
// and Result-returning inserts are assumptions.
let mut wheel: Box<FastPreciseDualWheel<u64>> = FastPreciseDualWheel::boxed();

// Periodic timers (heartbeats) — coarse resolution, high probe tolerance
let heartbeat = wheel.insert_periodic(1_000, 1).expect("periodic lane has capacity");

// Oneshot timers (order timeouts) — fine resolution, tight latency
let timeout = wheel.insert_oneshot(250, 2).expect("oneshot lane has capacity");
```
## Design Philosophy

### Insert Can Fail (And That's OK)
Unlike most timer implementations, BitWheel::insert returns Result. When slots overflow:
```rust
// Sketch: the exact signature and error payload are assumptions.
match wheel.insert(250, 42) {
    Ok(_handle) => {
        // Scheduled: keep the handle if you may need to cancel later.
    }
    Err(_rejected) => {
        // Slot (and its probe window) is full. This is backpressure, not a bug:
        // shed load, retry later, or route to a failover wheel.
    }
}
```
For systems where dropping timers is unacceptable, use BitWheelWithFailover:
```rust
use bitwheel::BalancedWheelWithFailover; // module path assumed

// The payload type and boxed() constructor are assumptions.
let mut wheel: Box<BalancedWheelWithFailover<u64>> = BalancedWheelWithFailover::boxed();

// Never fails — overflows go to BTreeMap
let handle = wheel.insert(250, 42);
```
### Precision vs. Predictability
Traditional hierarchical wheels cascade timers between levels as time advances. This causes latency spikes when many timers move simultaneously.
BitWheel trades precision for predictability:
- Timers in higher gears (longer delays) have coarser resolution
- A 1-hour timer might fire within a ~64ms window — acceptable for most use cases
- No cascading means no surprise latency spikes
### Memory Tradeoff
BitWheel pre-allocates all slot storage at construction:
| Configuration | Approximate Memory |
|---|---|
| `BalancedWheel` | ~200 KB |
| `BalancedFastWheel` | ~800 KB |
| `BurstWheel` | ~400 KB |
This trades memory for determinism — no allocator calls during insert/poll.
## How It's Fast

Hierarchical gear design: 64 slots per gear, each gear 64× coarser than the previous. Four gears at 5ms resolution cover ~23 hours (~4.7 hours at 1ms), matching the preconfigured wheels above.
Skip-ahead optimization: Tracks next_fire_tick globally. Empty polls jump directly to the next timer instead of iterating idle slots.
Bitmap-accelerated lookup: Each gear maintains a 64-bit occupied bitmap. Finding the next non-empty slot is rotate_right + trailing_zeros — a few CPU cycles.
Dirty tracking: Gear metadata only recomputes when timers are removed, not on every poll.
Fixed-size slabs with intrusive lists: Each slot is a pre-allocated slab with O(1) insert/remove via intrusive linked lists. No pointer chasing through heap allocations.
No cascading: Timers stay in their original slot until they fire. Higher gears have lower precision, but zero overhead from timer migration.
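As a rough illustration of the bitmap-accelerated lookup described above (a sketch, not the crate's implementation; the function shape and naming are assumed):

```rust
/// Distance (in slots) from `current_slot` to the next occupied slot in a gear,
/// given the gear's 64-bit occupancy bitmap. Returns None if the gear is empty.
fn next_occupied_offset(occupied: u64, current_slot: u32) -> Option<u32> {
    if occupied == 0 {
        return None;
    }
    // Rotate so `current_slot` lands at bit 0, then count trailing zeros:
    // the result is how far ahead the next timer-bearing slot sits.
    Some(occupied.rotate_right(current_slot % 64).trailing_zeros())
}
```

Because the bitmap answers "how far to the next occupied slot" in a couple of instructions, empty polls can jump straight to the next timer instead of scanning idle slots.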
## Benchmarks

Measured with HDR histograms (100k warmup, 1M iterations, `--release`).

### Interleaved Insert Latency
Timers inserted between existing entries — stresses tree rebalancing and cascading.
| Percentile | BitWheel | BitWheel+Failover | DualBitWheel | BTreeMap | HHWT |
|---|---|---|---|---|---|
| p50 | 26 ns | 27 ns | 26 ns | 34 ns | 39 ns |
| p90 | 28 ns | 31 ns | 29 ns | 49 ns | 54 ns |
| p99 | 34 ns | 50 ns | 39 ns | 100 ns | 278 ns |
| p99.9 | 48 ns | 75 ns | 67 ns | 131 ns | 990 ns |
| p99.99 | 131 ns | 149 ns | 145 ns | 201 ns | 2415 ns |
### Realistic Trading Simulation
Order timeout management: 80% inserts, 5% cancels, continuous polling.
| Operation | BitWheel | BitWheel+Failover | DualBitWheel | BTreeMap | HHWT |
|---|---|---|---|---|---|
| Insert p50 | 26 ns | 27 ns | 26 ns | 42 ns | 28 ns |
| Insert p99.9 | 61 ns | 66 ns | 66 ns | 79 ns | 87 ns |
| Poll p50 | 15 ns | 17 ns | 19 ns | 17 ns | 19 ns |
| Poll p99.9 | 61 ns | 65 ns | 67 ns | 149 ns | 1515 ns |
| Cancel p50 | 14 ns | 14 ns | 14 ns | 54 ns | 20 ns |
| Cancel p99.9 | 33 ns | 36 ns | 37 ns | 156 ns | 45 ns |
### Summary
| Metric | vs BTreeMap | vs HHWT |
|---|---|---|
| Insert | 1.4-1.7× faster | ~same p50, 2× better p99.9 |
| Cancel | 3.6× faster | 1.4× slower p50, similar tail |
| Poll tail (p99.9) | 1.6× better | 14× better |
| Predictability (stddev) | 2× better | 9× better |
Comparisons:
- BTreeMap: `BTreeMap<(u64, u32), T>` — similar to tokio's internal timer
- HHWT: `hierarchical_hash_wheel_timer` v1.3
## Advanced Configuration
```rust
use bitwheel::BitWheel; // module path assumed

// BitWheel<Timer, NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES>
// The parameter values below are illustrative, not crate defaults:
// 4 gears, 1ms ticks, 64 timers per slot, probe up to 4 slots.
type CustomWheel<T> = BitWheel<T, 4, 1, 64, 4>;
```
| Parameter | Default | Description |
|---|---|---|
| `NUM_GEARS` | 5 | Hierarchy levels (more = longer max delay) |
| `RESOLUTION_MS` | 5 | Base tick resolution in milliseconds |
| `SLOT_CAP` | 32 | Timers per slot before probing |
| `MAX_PROBES` | 3 | Linear probe distance on collision |
### Capacity Planning
Each gear has 64 slots. When a slot fills to SLOT_CAP, inserts probe linearly to adjacent slots.
Total capacity: 64 × NUM_GEARS × SLOT_CAP timers (theoretical max)
Hotspot capacity: SLOT_CAP × MAX_PROBES timers at the same target time
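For example, plugging the default parameters from the table above into these formulas gives a back-of-envelope estimate (theoretical limits, not measured behavior):

```rust
// Defaults from the Advanced Configuration table.
const NUM_GEARS: usize = 5;
const SLOT_CAP: usize = 32;
const MAX_PROBES: usize = 3;

// 64 slots per gear × gears × timers per slot = theoretical max.
const TOTAL_CAPACITY: usize = 64 * NUM_GEARS * SLOT_CAP; // 10_240 timers

// How many timers can target the exact same tick before insert fails.
const HOTSPOT_CAPACITY: usize = SLOT_CAP * MAX_PROBES; // 96 timers
```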
### The Probing Tradeoff

| Config | Latency | Memory | Best For |
|---|---|---|---|
| Large `SLOT_CAP`, small `MAX_PROBES` | Tight tails | Higher | Latency-critical systems |
| Small `SLOT_CAP`, large `MAX_PROBES` | Variable tails | Lower | Memory-constrained systems |
Why probing hurts tails: Each probe is a potential cache miss and adds branches. With MAX_PROBES=2, insert is almost always single-slot. With MAX_PROBES=16, worst-case insert touches 16 slots.
### Sizing for Your Workload

- Estimate hotspot density: How many timers fire within one tick? (e.g., 1000 orders/sec × 100ms timeout = 100 timers per 1ms tick)
- Set `SLOT_CAP` to handle typical density without probing
- Set `MAX_PROBES` for burst headroom (2-4× typical density)
- Use failover if drops are unacceptable: `BitWheelWithFailover` catches overflow in a BTreeMap
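Putting the steps above together, a wheel sized for the worked example (roughly 100 timers per 1ms tick) might look like the sketch below; the const-generic parameter order follows the Advanced Configuration section, and the concrete values are illustrative assumptions, not crate recommendations:

```rust
use bitwheel::BitWheel; // module path assumed

// BitWheel<Timer, NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES>
// 1ms ticks for order timeouts, SLOT_CAP sized above the typical ~100 timers
// per tick, and a short probe distance (~3x burst headroom) to keep tails tight.
type OrderTimeoutWheel<T> = BitWheel<T, 4, 1, 128, 3>;
```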
## Event Loop Integration
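A minimal polling loop, under the same assumptions as the Quick Start above (module path, `boxed()`, and a `poll()` that returns the next expired payload); a real event loop would typically block on I/O as well as the timer tick:

```rust
use std::time::Duration;

use bitwheel::BalancedWheel; // module path assumed

fn run(mut wheel: Box<BalancedWheel<u64>>) {
    loop {
        // Drain everything that is due on this tick (poll() -> Option<T> is assumed).
        while let Some(order_id) = wheel.poll() {
            handle_timeout(order_id);
        }
        // Sleep one tick; no allocator calls happen inside insert/poll.
        std::thread::sleep(Duration::from_millis(1));
    }
}

fn handle_timeout(order_id: u64) {
    println!("order {order_id} timed out");
}
```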
## License
MIT OR Apache-2.0