This crate provides approximate quantiles over data streams in a moderate amount of memory.
Order statistics is a rough business. Exact solutions are expensive in terms of memory and computation. Recent literature has advanced approximations but each have fundamental tradeoffs. This crate is intended to be a collection of approximate algorithms that provide guarantees around space consumption.
This is an implementation of the algorithm presented in Cormode, Korn, Muthukrishnan, Srivastava's paper "Effective Computation of Biased Quantiles over Data Streams". The ambition here is to approximate quantiles on a stream of data without having a boatload of information kept in memory.
Greenwald Khanna calculates epsilon-approximate quantiles.
If the desired quantile is phi, the epsilon-approximate
quantile is any element in the range of elements that rank
'histogram' approximates a distribution calculation by counting the number
of times samples fall into pre-configured bins. This implementation does not
require bins to be equally sized. The user must specify upper bounds on bins
Misra-Gries calculates an ε-approximate frequency count for a stream of N elements. The output is the k most frequent elements.