Hstats: Online Statistics and Histograms for Data Streams
A Rust library for computing histograms and statistics from data streams without loading entire datasets into memory. Designed for parallel workloads where independent histograms can be merged into a single result.
Features
- Online computation - processes values one at a time, constant memory usage
- Parallel-friendly - build histograms per-thread, then merge() them
- Underflow/overflow tracking - values outside [start, end) are counted separately
- Statistics - min, max, mean, and standard deviation via Welford's algorithm (rolling-stats)
- Display trait - configurable text-based histogram output with custom precision and bar characters
- no_std compatible - works in no_std environments that support alloc
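To make the online-computation and underflow/overflow ideas concrete, here is a minimal, self-contained sketch of a fixed-range histogram that also maintains Welford-style running statistics. It is not the hstats implementation: the type SketchHist, its fields, and its methods are hypothetical names chosen for illustration, and the standard deviation shown is the population form.

```rust
/// Hypothetical illustration type; NOT part of the hstats API.
struct SketchHist {
    start: f64,
    end: f64,
    bins: Vec<u64>,
    underflow: u64, // values below `start`
    overflow: u64,  // values at or above `end`
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl SketchHist {
    fn new(start: f64, end: f64, bin_count: usize) -> Self {
        assert!(end > start && bin_count > 0);
        SketchHist {
            start,
            end,
            bins: vec![0; bin_count],
            underflow: 0,
            overflow: 0,
            count: 0,
            mean: 0.0,
            m2: 0.0,
        }
    }

    /// Processes one value; memory use does not grow with the stream.
    fn add(&mut self, value: f64) {
        // Binning over the half-open range [start, end).
        if value < self.start {
            self.underflow += 1;
        } else if value >= self.end {
            self.overflow += 1;
        } else {
            let width = (self.end - self.start) / self.bins.len() as f64;
            let idx = ((value - self.start) / width) as usize;
            // Guard against floating-point rounding at the upper edge.
            let idx = idx.min(self.bins.len() - 1);
            self.bins[idx] += 1;
        }

        // Welford's update for the running mean and variance.
        self.count += 1;
        let delta = value - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (value - self.mean);
    }

    /// Population standard deviation (assumption: divide by n, not n - 1).
    fn std_dev(&self) -> f64 {
        if self.count == 0 {
            0.0
        } else {
            (self.m2 / self.count as f64).sqrt()
        }
    }
}

fn main() {
    let mut h = SketchHist::new(0.0, 100.0, 10);
    for v in [-5.0, 5.0, 15.0, 15.5, 99.9, 100.0] {
        h.add(v);
    }
    println!("bins: {:?}", h.bins);
    println!("underflow: {}, overflow: {}", h.underflow, h.overflow);
    println!("mean: {:.2}, std dev: {:.2}", h.mean, h.std_dev());
}
```

In this sketch the statistics include out-of-range values while the bins do not, so the bin counts plus the underflow and overflow counters always sum to the total count.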
Getting Started
Add the following to your Cargo.toml:
[dependencies]
hstats = "0.2.0"
Usage
use Hstats;
// Create a histogram with 10 bins over the range [0.0, 100.0)
let mut hist = new;
// Add values
for value in &
// Query statistics
println!;
println!;
// Print the histogram
println!;
Parallel usage
Build histograms independently on each thread, then merge:
// On each thread:
let mut local = new;
for value in chunk
// After all threads finish, merge results:
let combined = histograms.into_iter
.reduce
.unwrap;
See examples/single-thread.rs and examples/multi-thread.rs for complete runnable examples.
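The build-then-merge pattern can also be sketched without the crate. The example below uses std::thread::scope to fill one accumulator per chunk and then folds them with Iterator::reduce. The Partial type and the histogram_parallel function are hypothetical stand-ins, not the hstats API, and the statistics are combined with the standard pairwise formula for merging running means and variances, which is an assumption about how such a merge can be done rather than a description of the crate's internals.

```rust
use std::thread;

/// Hypothetical per-thread accumulator; NOT the hstats API.
#[derive(Clone, Debug)]
struct Partial {
    bins: Vec<u64>,
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl Partial {
    fn new(bin_count: usize) -> Self {
        Partial { bins: vec![0; bin_count], count: 0, mean: 0.0, m2: 0.0 }
    }

    /// Adds one value; for brevity, values are assumed to lie in [0, 1).
    fn add(&mut self, value: f64) {
        let n = self.bins.len();
        let idx = ((value * n as f64) as usize).min(n - 1);
        self.bins[idx] += 1;
        self.count += 1;
        let delta = value - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (value - self.mean);
    }

    /// Combines two accumulators built over disjoint chunks.
    fn merge(mut self, other: Partial) -> Partial {
        assert_eq!(self.bins.len(), other.bins.len());
        for (a, b) in self.bins.iter_mut().zip(&other.bins) {
            *a += b;
        }
        let total = self.count + other.count;
        if total > 0 {
            let (na, nb) = (self.count as f64, other.count as f64);
            let delta = other.mean - self.mean;
            self.mean += delta * nb / total as f64;
            self.m2 += other.m2 + delta * delta * na * nb / total as f64;
        }
        self.count = total;
        self
    }
}

/// Splits `data` into roughly `threads` chunks, one accumulator per chunk.
fn histogram_parallel(data: &[f64], threads: usize, bin_count: usize) -> Partial {
    assert!(threads > 0 && bin_count > 0);
    let chunk_size = ((data.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local = Partial::new(bin_count);
                    for &v in chunk {
                        local.add(v);
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .reduce(Partial::merge)
            .unwrap_or_else(|| Partial::new(bin_count))
    })
}

fn main() {
    let data: Vec<f64> = (0..1000).map(|i| i as f64 / 1000.0).collect();
    let merged = histogram_parallel(&data, 4, 10);
    println!("count: {}, mean: {:.4}", merged.count, merged.mean);
    println!("bins: {:?}", merged.bins);
}
```

Because each worker owns its accumulator, no locking is needed while values are added; the only coordination is the final merge.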
Examples
Run the examples with:
cargo run --example single-thread --release
cargo run --example multi-thread --release
Sample output from the multi-thread example:
Number of random samples: 50000000
Number of bins: 30
Start: -8
End: 10
Thread count: 20
Chunk size: 2500000
Number of hstats to merge: 20
Start | End
------|-------
-inf | -8.00 | 21553 (0.04%)
-8.00 | -7.40 | 21752 (0.04%)
-7.40 | -6.80 | 40523 (0.08%)
-6.80 | -6.20 | ░ 73078 (0.15%)
-6.20 | -5.60 | ░ 125206 (0.25%)
-5.60 | -5.00 | ░░░ 207593 (0.42%)
-5.00 | -4.40 | ░░░░ 331470 (0.66%)
-4.40 | -3.80 | ░░░░░░░ 508330 (1.02%)
-3.80 | -3.20 | ░░░░░░░░░░░ 745836 (1.49%)
-3.20 | -2.60 | ░░░░░░░░░░░░░░░ 1054228 (2.11%)
-2.60 | -2.00 | ░░░░░░░░░░░░░░░░░░░░░ 1433304 (2.87%)
-2.00 | -1.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1868758 (3.74%)
-1.40 | -0.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2339425 (4.68%)
-0.80 | -0.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2819448 (5.64%)
-0.20 | 0.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3261165 (6.52%)
0.40 | 1.00 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3623683 (7.25%)
1.00 | 1.60 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3875841 (7.75%)
1.60 | 2.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3980760 (7.96%)
2.20 | 2.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3928130 (7.86%)
2.80 | 3.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3725868 (7.45%)
3.40 | 4.00 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3393196 (6.79%)
4.00 | 4.60 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2971132 (5.94%)
4.60 | 5.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2497847 (5.00%)
5.20 | 5.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2022084 (4.04%)
5.80 | 6.40 | ░░░░░░░░░░░░░░░░░░░░░░░ 1569143 (3.14%)
6.40 | 7.00 | ░░░░░░░░░░░░░░░░░ 1171523 (2.34%)
7.00 | 7.60 | ░░░░░░░░░░░░ 841536 (1.68%)
7.60 | 8.20 | ░░░░░░░░ 579675 (1.16%)
8.20 | 8.80 | ░░░░░ 383658 (0.77%)
8.80 | 9.40 | ░░░ 243884 (0.49%)
9.40 | 10.00 | ░░ 148977 (0.30%)
10.00 | inf | ░░ 191394 (0.38%)
Total Count: 50000000 Min: -14.19 Max: 18.04 Mean: 2.00 Std Dev: 3.00
real 0m1.905s
user 0m9.727s
sys 0m0.127s
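The bin edges and chunk size in this output follow directly from the parameters printed at the top: a range of [-8, 10) split into 30 bins gives a bin width of (10 - (-8)) / 30 = 0.6, and 50,000,000 samples across 20 threads gives 2,500,000 samples per chunk. The short sketch below reproduces that arithmetic; bin_edges is a hypothetical helper written for this illustration, not a function from the crate.

```rust
/// Hypothetical helper: lower and upper edge of bin `i` for a
/// range [start, end) split into `bin_count` equal-width bins.
fn bin_edges(start: f64, end: f64, bin_count: usize, i: usize) -> (f64, f64) {
    let width = (end - start) / bin_count as f64;
    (start + width * i as f64, start + width * (i + 1) as f64)
}

fn main() {
    let (start, end, bins) = (-8.0, 10.0, 30);
    // First, second, and last bin of the sample run.
    for i in [0, 1, 29] {
        let (lo, hi) = bin_edges(start, end, bins, i);
        println!("bin {i}: [{lo:.2}, {hi:.2})");
    }

    let samples: u64 = 50_000_000;
    let threads: u64 = 20;
    println!("chunk size: {}", samples / threads);
}
```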
License
hstats is licensed under your choice of either the Apache License, Version 2.0, or the MIT
license.