# base2histogram

`base2histogram` is a fixed-size histogram that tracks any `u64` distribution
with 252 bucket counters by default (~2.0 KiB of counters per slot), and
answers percentile queries (P50, P99, P99.9, …) with under 2% error for
typical latency workloads.
- **Near-zero error for API latency tracking** — log-normal distributions
(typical for API/service latency) achieve **< 0.3% error at P50/P95/P99**
with 252 bucket counters by default (~2.0 KiB of counters per slot):
```text
LN-API P50 0.000% P95 0.160% P99 0.228%
```
- **`O(1)` recording**, fixed memory — no sorting, no resizing, suitable for
hot paths.
- **Full `u64` range** — from nanoseconds to hours in a single histogram.
- **Sliding-window** aggregation via multi-slot mode.
## Usage
```rust
use base2histogram::Histogram;
let mut hist = Histogram::<()>::new();
hist.record(5);
hist.record(8);
hist.record(13);
hist.record_n(21, 3);
assert_eq!(hist.total(), 6);
assert_eq!(hist.percentile(0.50), 13);
assert_eq!(hist.percentile(0.99), 24);
```
## Sliding Window
Use `with_slots()` plus `advance()` when metrics should be aggregated over a
bounded set of recent windows:
```rust
use base2histogram::Histogram;
let mut hist = Histogram::<&'static str>::with_slots(2);
hist.record_n(10, 2);
hist.advance("warm");
hist.record_n(100, 3);
assert_eq!(hist.total(), 5);
hist.advance("steady");
// The oldest slot is evicted once the slot limit is reached.
assert_eq!(hist.total(), 3);
```
## Common Stats
```rust
use base2histogram::Histogram;
let mut hist = Histogram::<()>::new();
hist.record_n(20, 80);
hist.record_n(80, 20);
let stats = hist.percentile_stats();
assert_eq!(stats.samples, 100);
assert_eq!(stats.p50, 21);
assert_eq!(stats.p90, 87);
```
## Percentile Accuracy
**Trapezoidal interpolation** vs returning bucket midpoint (log-normal API latency, WIDTH=3, 1M samples):
| midpoint | 5.018% | 7.732% | 4.861% |
| trapezoidal | 0.000% | 0.080% | 0.086% |
The interpolation estimates a density gradient from neighbor buckets and solves the inverse CDF — no extra storage needed.
The `WIDTH` parameter controls bucket granularity: each bucket group uses
`WIDTH` bits, giving `2^(WIDTH-1)` buckets per group. Higher `WIDTH` means
finer resolution but more memory. The default is `WIDTH=3` (252 buckets).
Measured with 1,000,000 samples per distribution (`cargo run --bin accuracy`).
Memory below counts bucket counters only; each slot also keeps a small cached
region summary, total count, and optional metadata.
Error = `|exact - estimated| / exact × 100%`, shown at P50 / P95 / P99:
```text
W=1 W=2 W=3 W=4 W=5 W=6
--------------------------------------------------------------------------------
Uniform P50 0.105% 0.028% 0.012% 0.018% 0.019% 0.002%
P95 3.058% 2.551% 1.078% 0.292% 0.005% 0.005%
P99 4.473% 4.315% 3.744% 0.991% 0.306% 0.089%
LN-API P50 2.007% 0.274% 0.000% 0.000% 0.000% 0.000%
P95 8.927% 0.480% 0.160% 0.080% 0.040% 0.000%
P99 2.795% 0.998% 0.228% 0.000% 0.029% 0.029%
Bimodal P50 2.170% 0.592% 0.197% 0.197% 0.000% 0.000%
P95 1.181% 0.378% 0.032% 0.032% 0.038% 0.008%
P99 1.965% 1.733% 0.434% 0.045% 0.014% 0.013%
Expon P50 0.723% 0.000% 0.000% 0.000% 0.000% 0.000%
P95 0.932% 0.233% 0.067% 0.000% 0.033% 0.000%
P99 9.430% 0.672% 0.108% 0.022% 0.022% 0.022%
LN-DB P50 1.413% 0.034% 0.034% 0.000% 0.000% 0.000%
P95 2.976% 0.329% 0.006% 0.000% 0.019% 0.019%
P99 4.282% 0.502% 0.066% 0.007% 0.003% 0.062%
Sequent P50 0.092% 0.000% 0.000% 0.000% 0.000% 0.000%
P95 3.013% 2.539% 1.054% 0.316% 0.000% 0.000%
P99 4.455% 4.305% 3.732% 1.026% 0.315% 0.093%
Pareto P50 10.127% 0.633% 0.000% 0.000% 0.000% 0.000%
P95 2.174% 0.815% 0.136% 0.000% 0.000% 0.000%
P99 0.602% 0.185% 0.093% 0.093% 0.000% 0.000%
--------------------------------------------------------------------------------
Buckets 65 128 252 496 976 1920
Memory 520 B 1.0 KB 2.0 KB 3.9 KB 7.6 KB 15.0 KB
--------------------------------------------------------------------------------
```
Distributions: Uniform [0, 1M], Log-normal API latency (σ=0.5),
Bimodal cache hit/miss (90/10), Exponential IO wait, Log-normal DB query
(σ=1.0), Sequential [1..N], Pareto heavy tail (α=1.5).
## Algorithm
See [Algorithm](docs/algorithm.md) for a detailed description of the float-like bucket encoding, trapezoidal interpolation, and error bound analysis.
## Comparison Notes
- [base2histogram vs Heistogram](docs/base2histogram-vs-heistogram.md)
- [base2histogram vs H2 Histogram](docs/base2histogram-vs-h2histogram.md)
## License
Licensed under Apache-2.0. See [LICENSE](./LICENSE).