This library contains tools for simulating space-efficient histograms. For the purposes of this library, a histogram is a map from labels to frequencies, both of which must be numerical.
We provide a standard histogram implementation (StandardHistogram) that
stores label-frequency pairs at face value using a HashMap.
Furthermore, we provide three additional histogram implementations that use either strictly less memory than a StandardHistogram or a fixed amount of memory. The optimizations are made along two dimensions: the label storage (referred to as "compaction") and the frequency storage (referred to as "compression"). Hence the name "C-Squared Histograms". The three implementations are as follows:
CompressedHistogram - This implementation uses strictly less space than a StandardHistogram by approximating the frequencies.
CompactHistogram - This implementation consumes a fixed amount of space depending on a few precision parameters. It saves space by approximating labels.
C2Histogram - This implementation also uses a fixed amount of space, usually much less than a CompactHistogram. It approximates both labels and frequencies.
Each of these implementations is parameterized such that the trade-off between precision and memory performance is directly controlled by the user.
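To make the two dimensions concrete, the following is a minimal, self-contained sketch of what "compression" and "compaction" could look like on top of a plain HashMap histogram. It is an illustration only and not this library's API or its actual algorithms: the names `build_histogram`, `compress_frequencies`, `compact_labels`, and `Compacted`, the power-of-two frequency encoding, and the equal-width label buckets are all assumptions chosen for simplicity.

```rust
use std::collections::HashMap;

/// A plain histogram: each numeric label maps to its exact frequency.
/// (Hypothetical alias for illustration; not the library's StandardHistogram.)
type Histogram = HashMap<u64, u64>;

/// Count how often each label occurs.
fn build_histogram(labels: &[u64]) -> Histogram {
    let mut hist = Histogram::new();
    for &label in labels {
        *hist.entry(label).or_insert(0) += 1;
    }
    hist
}

/// "Compression" sketch: keep every label, but store only floor(log2(frequency))
/// in a single byte. The frequency is later approximated as 2^exponent.
/// This encoding is an assumption made for the example.
fn compress_frequencies(hist: &Histogram) -> HashMap<u64, u8> {
    hist.iter()
        .filter(|&(_, &freq)| freq > 0) // log2 is undefined for zero
        .map(|(&label, &freq)| (label, (63 - freq.leading_zeros()) as u8))
        .collect()
}

/// Recover the approximate frequency from a stored exponent.
fn decode_frequency(exponent: u8) -> u64 {
    1u64 << exponent
}

/// "Compaction" sketch: a fixed number of equal-width buckets over [lo, hi].
/// Memory use depends only on `buckets`, not on how many distinct labels exist.
struct Compacted {
    lo: u64,
    width: u64,
    counts: Vec<u64>,
}

impl Compacted {
    /// Smallest label that falls into bucket `idx`.
    fn bucket_start(&self, idx: usize) -> u64 {
        self.lo + idx as u64 * self.width
    }
}

/// Fold exact label counts into the fixed buckets. Labels outside [lo, hi]
/// are clamped into the first or last bucket. Extremely wide ranges (close to
/// the full u64 range) would overflow the arithmetic and are not handled here.
fn compact_labels(hist: &Histogram, lo: u64, hi: u64, buckets: usize) -> Compacted {
    assert!(buckets > 0 && hi >= lo);
    let span = hi - lo + 1;
    // Ceiling division, so that `hi` always lands in the last bucket.
    let width = (span + buckets as u64 - 1) / buckets as u64;
    let mut counts = vec![0u64; buckets];
    for (&label, &freq) in hist {
        let idx = ((label.clamp(lo, hi) - lo) / width) as usize;
        counts[idx] += freq;
    }
    Compacted { lo, width, counts }
}
```

In this sketch, the precision parameters are the bucket count (fewer buckets means coarser labels and less memory) and the width of the stored exponent (a single byte here). Applying both steps together, bucketing the labels and then storing only an exponent per bucket, gives the flavor of approximating labels and frequencies at the same time.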
To construct a
StandardHistogram, use the function
The other histograms can be derived from that standard histogram using the conversion functions
Space saving histogram implementations
General trait definitions and