Crate cumulfreqtable

source ·
Expand description

A cumulative frequency table stores and/or compute the absolute frequency and the cumulative frequency for ever position in the table.

Formal definition from Wikipedia:

In statistics, the frequency (or absolute frequency) of an event i is the number nᵢ of times the observation has occurred/recorded in an experiment or study. The cumulative frequency is the total of the absolute frequencies of all events at or below a certain point in an ordered list of events.

Example

use cumulfreqtable::CumulFreqTable; // Import the trait in scope.

let mut table = cumulfreqtable::BinaryIndexedTree::new(16);
table.inc(0);
table.inc(3);
table.add(5, 3);

assert_eq!(table.freq(0), 1);
assert_eq!(table.freq(3), 1);
assert_eq!(table.freq(5), 3);
assert_eq!(table.freq(6), 0);

assert_eq!(table.sum(0), 1);
assert_eq!(table.sum(3), 2);
assert_eq!(table.sum(5), 5);
assert_eq!(table.sum(6), 5);

assert_eq!(table.total(), 5);

Implementations

This crate offers two implementations of the CumulFreqTable trait:

  • FreqTable: stores the absolute frequency of every positions in an array O(1) but computes the cumulative frequency in O(len). Best for small tables.
  • BinaryIndexedTree: stores the cumulative frequency of every positions in a binary indexed tree. The runtime complexity is O(㏒₂ len) for all operations.

There is also cumulfreq_array::CumulFreqTable that computes and stores the cumulative frequency of every positions on update. It’s only purpose is to validate the benchmark results.

Benchmarks

For small tables, FreqTable is slightly faster than BinaryIndexedTree, presumably because it fits within the CPU cache. The cross-over point depends on the computer.

You can run the benchmarks yourself with cargo criterion.

Notice that the graphs below have logarithmic scales.

Intel i7-4790K (2014)

On a 2014 Intel i7-4790K with dual-channel DDR3 1600 Mhz, the binary_indexed_tree becomes faster somewhere above 256×64 bits elements.

Gnuplot Produced by GNUPLOT 5.4 patchlevel 2 0.01 0.1 1 10 10 100 1000 10000 Average time (µs) Input Size (Elements) freq_array freq_array gnuplot_plot_2 cumulfreq_array cumulfreq_array gnuplot_plot_4 binary_indexed_tree binary_indexed_tree gnuplot_plot_6 inc+cumul+total: Comparison
AMD Ryzen 7 PRO 5850U (2021)

On a 2021 AMD Ryzen 7 PRO 5850U with dual-channel DDR4 3200 Mhz, the binary_indexed_tree becomes faster somewhere above 512×64 bits elements.

Gnuplot Produced by GNUPLOT 5.4 patchlevel 2 0.01 0.1 1 10 10 100 1000 10000 Average time (µs) Input Size (Elements) freq_array freq_array gnuplot_plot_2 cumulfreq_array cumulfreq_array gnuplot_plot_4 binary_indexed_tree binary_indexed_tree gnuplot_plot_6 inc+cumul+total: Comparison

Re-exports

Modules

Traits

  • A cumulative frequency table maintains the cumulative frequency for every positions in the table. Different implementations offer different performance characteristics.