# BitWheel

A high-performance hierarchical timer wheel for Rust, optimized for consistent low latency in event-driven applications.

## Features

- **O(1) operations** — insert, cancel, and poll are constant-time regardless of timer count
- **Predictable tail latency** — no cascading between gears eliminates latency spikes
- **Zero allocation on hot path** — all memory pre-allocated; trades memory for determinism
- **Bounded capacity by design** — insert failure signals backpressure, not a bug
- **Failover option** — `BitWheelWithFailover` gracefully degrades to BTreeMap when full
- **Dual-lane option** — `DualBitWheel` separates periodic and oneshot timers with independent configurations
- **Precision tradeoff** — coarser gears sacrifice millisecond precision (does a 1-hour timer need it?)
- **Compile-time configuration** — tune resolution, capacity, and range via const generics

## Quick Start
```rust
use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

struct OrderTimeout {
    order_id: u64,
}

impl Timer for OrderTimeout {
    type Context = Vec<u64>;

    fn fire(&mut self, _now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        ctx.push(self.order_id);
        None // one-shot; return Some(next) to reschedule
    }
}

fn main() {
    let mut wheel: Box<BalancedWheel<OrderTimeout>> = BalancedWheel::boxed();
    
    // Insert a timer
    let when = Instant::now() + Duration::from_millis(100);
    let handle = wheel.insert(when, OrderTimeout { order_id: 12345 }).unwrap();
    
    // Cancel if order fills
    if let Some(timer) = wheel.cancel(handle) {
        println!("Cancelled order {}", timer.order_id);
    }
    
    // Poll in event loop
    let mut fired = Vec::new();
    wheel.poll(Instant::now(), &mut fired).unwrap();
}
```
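
Returning `Some(next)` from `fire` reschedules the timer instead of dropping it. A minimal sketch of a periodic timer (the `Heartbeat` type is invented for illustration):
```rust
use bitwheel::Timer;
use std::time::{Duration, Instant};

// Hypothetical periodic timer: returning Some(next) from fire() reschedules it.
struct Heartbeat {
    interval: Duration,
}

impl Timer for Heartbeat {
    type Context = u64; // number of heartbeats sent

    fn fire(&mut self, now: Instant, ctx: &mut Self::Context) -> Option<Instant> {
        *ctx += 1;
        Some(now + self.interval) // schedule the next beat relative to fire time
    }
}
```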

## Preconfigured Wheels

BitWheel provides type aliases for common use cases:

| Type | Resolution | Range | Slot Cap | Use Case |
|------|------------|-------|----------|----------|
| `BalancedWheel` | 5ms | ~23 hours | 16 | General purpose |
| `BalancedFastWheel` | 5ms | ~23 hours | 64 | Minimal probing |
| `BalancedLightWheel` | 5ms | ~23 hours | 8 | Memory constrained |
| `PreciseWheel` | 1ms | ~4.7 hours | 16 | Sub-10ms timeouts |
| `BurstWheel` | 5ms | ~23 hours | 32 | High timer density |
| `ExtendedWheel` | 16ms | ~3 days | 16 | Long-running timers |

Each has a `WithFailover` variant (e.g., `BalancedWheelWithFailover`) that falls back to BTreeMap on overflow.
```rust
use bitwheel::{PreciseWheel, BurstWheelWithFailover};

// 1ms resolution for tight timeouts
let mut precise: Box<PreciseWheel<MyTimer>> = PreciseWheel::boxed();

// High capacity with safety net
let mut burst: Box<BurstWheelWithFailover<MyTimer>> = BurstWheelWithFailover::boxed();
```

### Dual-Lane Wheels

`DualBitWheel` runs two independent wheels — one for periodic timers (heartbeats, monitoring) and one for oneshot timers (order timeouts). Each lane has its own resolution and capacity tuning.

| Type | Periodic | Oneshot | Use Case |
|------|----------|---------|----------|
| `StandardBalancedDualWheel` | 100ms / 8 cap | 5ms / 16 cap | General purpose |
| `FastPreciseDualWheel` | 100ms / 16 cap | 1ms / 64 cap | HFT order timeouts |
| `StandardBurstDualWheel` | 100ms / 8 cap | 5ms / 32 cap | Bursty workloads |
```rust
use bitwheel::StandardBalancedDualWheel;

let mut wheel: Box<StandardBalancedDualWheel<MyTimer>> = StandardBalancedDualWheel::boxed();

// Periodic timers (heartbeats) — coarse resolution, high probe tolerance
let handle = wheel.insert_periodic(when, timer);

// Oneshot timers (order timeouts) — fine resolution, tight latency
let handle = wheel.insert_oneshot(when, timer);
```

## Design Philosophy

### Insert Can Fail (And That's OK)

Unlike most timer implementations, `BitWheel::insert` returns `Result`. When slots overflow:
```rust
match wheel.insert(when, timer) {
    Ok(handle) => { /* timer scheduled */ }
    Err(InsertError(timer)) => {
        // Capacity exceeded — apply backpressure.
        // This is a feature: it signals that the config doesn't match the workload.
    }
}
```

For systems where dropping timers is unacceptable, use `BitWheelWithFailover`:
```rust
use bitwheel::BalancedWheelWithFailover;

let mut wheel: Box<BalancedWheelWithFailover<MyTimer>> =
    BalancedWheelWithFailover::boxed();

// Never fails — overflows go to BTreeMap
let handle = wheel.insert(when, timer);
```

### Precision vs. Predictability

Traditional hierarchical wheels cascade timers between levels as time advances. This causes latency spikes when many timers move simultaneously.

BitWheel trades precision for predictability:
- Timers in higher gears (longer delays) have coarser resolution
- A 1-hour timer might fire within a ~64ms window — acceptable for most use cases
- No cascading means no surprise latency spikes

### Memory Tradeoff

BitWheel pre-allocates all slot storage at construction:

| Configuration | Approximate Memory |
|---------------|-------------------|
| `BalancedWheel` | ~200 KB |
| `BalancedFastWheel` | ~800 KB |
| `BurstWheel` | ~400 KB |

This trades memory for determinism — no allocator calls during insert/poll.

## How It's Fast

**Hierarchical gear design**: 64 slots per gear, each gear 64× coarser than the previous. Four gears at 5ms resolution cover 5ms × 64⁴ ≈ 23 hours.

**Skip-ahead optimization**: Tracks `next_fire_tick` globally. Empty polls jump directly to the next timer instead of iterating idle slots.
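
A hypothetical sketch of the idea (`WheelState` and its fields are invented for illustration, not the crate's internals):
```rust
// Names invented for illustration.
struct WheelState {
    current_tick: u64,
    next_fire_tick: Option<u64>, // cached earliest deadline across all gears
}

impl WheelState {
    /// Advance the wheel to `now_tick`, jumping over idle ticks in one step.
    fn advance_to(&mut self, now_tick: u64) -> Option<u64> {
        match self.next_fire_tick {
            // The earliest deadline is due: jump straight to it.
            Some(t) if t <= now_tick => {
                self.current_tick = t;
                Some(t)
            }
            // Nothing due: catch up to `now` without visiting empty slots.
            _ => {
                self.current_tick = now_tick;
                None
            }
        }
    }
}
```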

**Bitmap-accelerated lookup**: Each gear maintains a 64-bit occupied bitmap. Finding the next non-empty slot is `rotate_right` + `trailing_zeros` — a few CPU cycles.
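
For illustration, the scan can be written as a standalone function, assuming bit `i` of the bitmap marks slot `i` as occupied:
```rust
/// Find the next occupied slot at or after `current`, wrapping around the gear.
/// (Standalone sketch of the technique, not the crate's internal code.)
fn next_occupied_slot(occupied: u64, current: u32) -> Option<u32> {
    if occupied == 0 {
        return None; // gear is empty
    }
    // Rotate so `current` sits at bit 0; the trailing-zero count is then the
    // wrapping distance to the nearest occupied slot.
    let distance = occupied.rotate_right(current).trailing_zeros();
    Some((current + distance) % 64)
}
```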

**Dirty tracking**: Gear metadata only recomputes when timers are removed, not on every poll.

**Fixed-size slabs with intrusive lists**: Each slot is a pre-allocated slab with O(1) insert/remove via intrusive linked lists. No pointer chasing through heap allocations.
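
The shape of such a slot might look like this (layout and field names hypothetical):
```rust
// Hypothetical layout: entries live in a fixed array and link by index,
// so insert/remove are pointer-free list operations on pre-allocated memory.
struct Entry<T> {
    timer: Option<T>,
    next: Option<u8>, // intrusive link: index of the next entry in this slot
}

struct Slot<T, const CAP: usize> {
    entries: [Entry<T>; CAP],
    live_head: Option<u8>, // first scheduled timer in this slot
    free_head: Option<u8>, // head of the free list, also threaded by index
}
```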

**No cascading**: Timers stay in their original slot until they fire. Higher gears have lower precision, but zero overhead from timer migration.

## Benchmarks

Measured with HDR histograms (100k warmup, 1M iterations, `--release`).

### Interleaved Insert Latency

Timers inserted between existing entries — a pattern that stresses tree rebalancing in BTreeMap and timer cascading in HHWT.

| Percentile | BitWheel | BitWheel+Failover | DualBitWheel | BTreeMap | HHWT |
|------------|----------|-------------------|--------------|----------|------|
| p50 | **26 ns** | 27 ns | 26 ns | 34 ns | 39 ns |
| p90 | **28 ns** | 31 ns | 29 ns | 49 ns | 54 ns |
| p99 | **34 ns** | 50 ns | 39 ns | 100 ns | 278 ns |
| p99.9 | **48 ns** | 75 ns | 67 ns | 131 ns | 990 ns |
| p99.99 | **131 ns** | 149 ns | 145 ns | 201 ns | 2415 ns |

### Realistic Trading Simulation

Order timeout management: 80% inserts, 5% cancels, continuous polling.

| Operation | BitWheel | BitWheel+Failover | DualBitWheel | BTreeMap | HHWT |
|-----------|----------|-------------------|--------------|----------|------|
| Insert p50 | **26 ns** | 27 ns | 26 ns | 42 ns | 28 ns |
| Insert p99.9 | **61 ns** | 66 ns | 66 ns | 79 ns | 87 ns |
| Poll p50 | **15 ns** | 17 ns | 19 ns | 17 ns | 19 ns |
| Poll p99.9 | **61 ns** | 65 ns | 67 ns | 149 ns | 1515 ns |
| Cancel p50 | **14 ns** | 14 ns | 14 ns | 54 ns | 20 ns |
| Cancel p99.9 | **33 ns** | 36 ns | 37 ns | 156 ns | 45 ns |

### Summary

| Metric | vs BTreeMap | vs HHWT |
|--------|-------------|---------|
| Insert | 1.4-1.7× faster | ~same p50, 2× better p99.9 |
| Cancel | **3.6× faster** | 1.4× slower p50, similar tail |
| Poll tail (p99.9) | **1.6× better** | **14× better** |
| Predictability (stddev) | 2× better | 9× better |

Comparisons:
- **BTreeMap**: `BTreeMap<(u64, u32), T>` — similar to tokio's internal timer
- **HHWT**: [hierarchical_hash_wheel_timer](https://crates.io/crates/hierarchical_hash_wheel_timer) v1.3

## Advanced Configuration
```rust
use bitwheel::BitWheel;

// BitWheel<Timer, NUM_GEARS, RESOLUTION_MS, SLOT_CAP, MAX_PROBES>
type CustomWheel<T> = BitWheel<T, 4, 1, 64, 8>;
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `NUM_GEARS` | 5 | Hierarchy levels (more = longer max delay) |
| `RESOLUTION_MS` | 5 | Base tick resolution in milliseconds |
| `SLOT_CAP` | 32 | Timers per slot before probing |
| `MAX_PROBES` | 3 | Linear probe distance on collision |
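
Assuming each gear has 64 slots and is 64× coarser than the previous (per "How It's Fast"), the `CustomWheel` above covers roughly 1ms × 64⁴ ≈ 4.7 hours, with a theoretical capacity of 64 × 4 × 64 = 16,384 timers and a hotspot capacity of 64 × 8 = 512 timers per target tick (see "Capacity Planning" below).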

### Capacity Planning

Each gear has 64 slots. When a slot fills to `SLOT_CAP`, inserts probe linearly to adjacent slots.

**Total capacity**: `64 × NUM_GEARS × SLOT_CAP` timers (theoretical max)

**Hotspot capacity**: `SLOT_CAP × MAX_PROBES` timers at the same target time
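
For example, with the defaults above (`NUM_GEARS = 5`, `SLOT_CAP = 32`, `MAX_PROBES = 3`), that is 64 × 5 × 32 = 10,240 timers total and 32 × 3 = 96 timers sharing a single target tick.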

#### The Probing Tradeoff

| Config | Latency | Memory | Best For |
|--------|---------|--------|----------|
| Large `SLOT_CAP`, small `MAX_PROBES` | Tight tails | Higher | Latency-critical systems |
| Small `SLOT_CAP`, large `MAX_PROBES` | Variable tails | Lower | Memory-constrained systems |

**Why probing hurts tails**: Each probe is a potential cache miss and adds branches. With `MAX_PROBES=2`, insert is almost always single-slot. With `MAX_PROBES=16`, worst-case insert touches 16 slots.

#### Sizing for Your Workload

1. **Estimate hotspot density**: How many timers target the same tick? (e.g., 1000 orders/sec with a 100ms timeout keeps ~100 timers in flight — about one expiry per 1ms tick on average, so burst behavior is what matters; see the worked example after this list)

2. **Set `SLOT_CAP`** to handle typical density without probing

3. **Set `MAX_PROBES`** for burst headroom (2-4× typical density)

4. **Use failover** if drops are unacceptable: `BitWheelWithFailover` catches overflow in a BTreeMap
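
As a worked example of these steps (illustrative numbers only): at 1000 orders/sec with a 100ms timeout and 1ms resolution, expiries average ~1 per tick, but a burst of 50 orders landing in the same millisecond puts ~50 timers on one tick. `SLOT_CAP = 64` absorbs that burst without probing, and `MAX_PROBES = 2` leaves headroom for up to 128 timers on a hotspot tick (`SLOT_CAP × MAX_PROBES`).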

## Event Loop Integration
```rust
use bitwheel::{BalancedWheel, Timer};
use std::time::{Duration, Instant};

fn event_loop<T: Timer>(wheel: &mut BalancedWheel<T>, ctx: &mut T::Context) {
    loop {
        // Poll timers
        if let Err(e) = wheel.poll(Instant::now(), ctx) {
            eprintln!("{} timers failed to reschedule", e.0);
        }
        
        // Sleep until next timer or max interval
        match wheel.duration_until_next() {
            Some(d) => std::thread::sleep(d.min(Duration::from_millis(100))),
            None => std::thread::sleep(Duration::from_millis(100)),
        }
    }
}
```

## License

MIT OR Apache-2.0