§latency-buckets
Streaming histogram for LLM call latencies with constant-memory percentile estimation.
Buckets are log-scale (base 2): 30 buckets cover 1 µs to roughly
18 minutes (2^30 µs ≈ 1,074 s). Recording a sample is O(1). Percentiles
are estimated by linear interpolation inside the selected bucket; the
expected error is roughly half a bucket width, which is at most 50% of the
bucket's lower bound because each base-2 bucket is as wide as its lower bound.
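The bucketing and interpolation described above can be sketched roughly as follows. This is an illustration only, not the crate's actual implementation: `SketchHistogram`, `bucket_index`, and the constant `BUCKETS` are hypothetical names, and the layout (bucket `k` covering `[2^k, 2^(k+1))` microseconds, with out-of-range samples clamped into the first or last bucket) is an assumption.

```rust
use std::time::Duration;

/// Assumed bucket count, taken from the description above.
const BUCKETS: usize = 30;

/// Hypothetical stand-in for illustration; not the crate's `Histogram`.
struct SketchHistogram {
    counts: [u64; BUCKETS],
    total: u64,
}

impl SketchHistogram {
    fn new() -> Self {
        Self { counts: [0; BUCKETS], total: 0 }
    }

    /// Assumption: bucket k covers [2^k, 2^(k+1)) microseconds.
    fn bucket_index(d: Duration) -> usize {
        // Treat sub-microsecond samples as 1 µs so log2 is defined.
        let micros = d.as_micros().max(1);
        // floor(log2(micros)) for a u128 value.
        let k = (127 - micros.leading_zeros()) as usize;
        // Clamp samples beyond the covered range into the last bucket.
        k.min(BUCKETS - 1)
    }

    /// O(1): one index computation and two increments.
    fn record(&mut self, d: Duration) {
        self.counts[Self::bucket_index(d)] += 1;
        self.total += 1;
    }

    /// Estimate the q-th quantile (q in [0, 1]) by walking the cumulative
    /// counts and interpolating linearly inside the bucket that contains it.
    fn percentile(&self, q: f64) -> Duration {
        if self.total == 0 {
            return Duration::ZERO;
        }
        let target = q.clamp(0.0, 1.0) * self.total as f64;
        let mut seen = 0.0;
        for (k, &count) in self.counts.iter().enumerate() {
            if count == 0 {
                continue;
            }
            let count = count as f64;
            if seen + count >= target {
                // A base-2 bucket is as wide as its lower bound.
                let lower = (1u64 << k) as f64;
                let frac = (target - seen) / count;
                return Duration::from_micros((lower + frac * lower) as u64);
            }
            seen += count;
        }
        // Only reachable through floating-point rounding; return the top edge.
        Duration::from_micros(1u64 << BUCKETS)
    }
}
```

Memory stays constant because the struct holds only the fixed array of counters and a running total, regardless of how many samples are recorded.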
§Example
use latency_buckets::Histogram;
use std::time::Duration;
let mut h = Histogram::new();
for ms in [10, 50, 200, 800, 1500, 3000] {
h.record(Duration::from_millis(ms));
}
let p50 = h.percentile(0.50);
// Coarse log-scale buckets; expect a value in the same order of magnitude.
assert!(p50.as_millis() >= 50 && p50.as_millis() <= 2000);
Structs§
- Histogram: Streaming latency histogram.