Crate latency_buckets

§latency-buckets

Streaming histogram for LLM call latencies with constant-memory percentile estimation.

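Buckets are log-scale (base 2): 30 power-of-two buckets cover 1 µs to 2^30 µs (about 17.9 minutes). Recording a sample is O(1). Percentiles are estimated by linear interpolation inside the bucket that contains the target rank; the expected error is roughly half a bucket width. Because a base-2 bucket is as wide as its lower bound, that is at most about 50% of the bucket's lower bound.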
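To make the bucketing and interpolation concrete, here is a minimal sketch of how such a scheme could work. It is not the crate's actual implementation: the names `BUCKETS`, `bucket_index`, and `estimate_percentile` are hypothetical, and the sketch assumes bucket `i` covers the half-open range [2^i, 2^(i+1)) µs, with out-of-range samples clamped into the first or last bucket.

```rust
use std::time::Duration;

/// Assumed layout: bucket `i` covers [2^i, 2^(i+1)) microseconds.
const BUCKETS: usize = 30;

/// Hypothetical helper: map a duration to its bucket index in O(1).
fn bucket_index(d: Duration) -> usize {
    // Clamp to at least 1 µs so the logarithm is defined.
    let micros = d.as_micros().max(1);
    // floor(log2(micros)), then clamp overly long samples into the last bucket.
    let idx = micros.ilog2() as usize;
    idx.min(BUCKETS - 1)
}

/// Hypothetical helper: estimate quantile `q` (0.0..=1.0) from bucket counts
/// by interpolating linearly inside the bucket that holds the target rank.
fn estimate_percentile(counts: &[u64; BUCKETS], q: f64) -> Duration {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return Duration::ZERO;
    }
    let target = q.clamp(0.0, 1.0) * total as f64;
    let mut seen = 0u64;
    for (i, &c) in counts.iter().enumerate() {
        if c == 0 {
            continue;
        }
        if (seen + c) as f64 >= target {
            let lower = (1u64 << i) as f64;
            let width = lower; // upper - lower = 2^(i+1) - 2^i = 2^i
            // Fraction of this bucket's samples that lie below the target rank.
            let frac = (target - seen as f64) / c as f64;
            return Duration::from_micros((lower + frac * width) as u64);
        }
        seen += c;
    }
    // Only reachable through floating-point rounding; return the top of the range.
    Duration::from_micros(1u64 << BUCKETS)
}
```

Under these assumptions, a 1 ms sample (1000 µs) lands in bucket 9, which covers 512 µs to 1024 µs. The estimate for a rank that falls in that bucket can only be somewhere in that range, which is where the half-bucket expected error comes from.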

§Example

use latency_buckets::Histogram;
use std::time::Duration;

let mut h = Histogram::new();
for ms in [10, 50, 200, 800, 1500, 3000] {
    h.record(Duration::from_millis(ms));
}
let p50 = h.percentile(0.50);
// Coarse log-scale buckets; expect a value in the same order of magnitude.
assert!(p50.as_millis() >= 50 && p50.as_millis() <= 2000);

Structs§

Histogram
Streaming latency histogram.