Module quantiles::misra_gries [] [src]

Misra-Gries calculates an ε-approximate frequency count for a stream of N elements. The output is the k most frequent elements.

  1. the approximate count f'[e] is smaller than the true frequency f[e] of e, but by at most εN, i.e., (f[e] - εN) ≤ f'[e] ≤ f[e]
  2. any element e with a frequency f[e] ≥ εN appears in the result set

The error bound ε = 1/(k+1) where k is the number of counters used in the algorithm. When k = 1 i.e. a single counter, the algorithm is equivalent to the Boyer-Moore Majority algorithm.

If you want to check for elements that appear at least εN times, you will want to perform a second pass to calculate the exact frequencies of the values in the result set which can be done in constant space.

@article{MISRA1982143, title = "Finding repeated elements", journal = "Science of Computer Programming", volume = "2", number = "2", pages = "143 - 152", year = "1982", issn = "0167-6423", doi = "", url = "", author = "J. Misra and David Gries", }


use quantiles::misra_gries::*;

let k: usize = 3;
let numbers: Vec<u32> = vec![1,3,2,1,3,4,3,1,2,1];
let counts = misra_gries(numbers.iter(), k);
let bound = numbers.len() / (k+1);
let in_range = |f_expected: usize, f_approx: usize| {
  f_approx <= f_expected &&
  (bound >= f_expected || f_approx >= (f_expected - bound))
assert!(in_range(4usize, *counts.get(&1).unwrap()));
assert!(in_range(2usize, *counts.get(&2).unwrap()));
assert!(in_range(3usize, *counts.get(&3).unwrap()));



Calculates the k most frequent elements in the iterable stream of elements stream using an ε-approximate frequency count where ε = 1/(k+1)