Module quantiles::greenwald_khanna [] [src]

Greenwald Khanna calculates epsilon-approximate quantiles. If the desired quantile is phi, the epsilon-approximate quantile is any element in the range of elements that rank between lbound((phi-epsilon) x N) and lbound((phi+epsilon) x N)

terminology from the paper:

  • S: set of observations
  • n: number of observations in S
  • v[i]: observation i in S
  • r: rank of observation in S from 1 to n.
  • r_min(v[i]): lower bound on rank r of v[i]
  • r_max(v[i]): upper bound on rank r of v[i]
  • g[i] = r_min(v[i]) - r_min(v[i - 1])
  • delta[i] = r_max(v[i]) - r_min(v[i])
  • t[i] = tuple(v[i], g[i], delta[i])
  • phi: quantile as a real number in the range [0,1]
  • r: ubound(phi * n)

identities:

  • r_min(v[i]) = forall j<=i sum of g[j]
  • r_max(v[i]) = ( forall j<=i sum of g[j] ) + delta[i]
  • g[i] + delta[i] - 1 is an upper bound on the total number of observations
  • between v[i] and v[i-1]
  • sum of g[i] = n

results:

  • max_i(g[i] + delta[i]) <= 2 * epsilon * n
  • a tuple is full if g[i] + delta[i] = floor(2 * epsilon * n)

@inproceedings{Greenwald:2001:SOC:375663.375670, author = {Greenwald, Michael and Khanna, Sanjeev}, title = {Space-efficient Online Computation of Quantile Summaries}, booktitle = {Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data}, series = {SIGMOD '01}, year = {2001}, isbn = {1-58113-332-4}, location = {Santa Barbara, California, USA}, pages = {58--66}, numpages = {9}, url = {http://doi.acm.org/10.1145/375663.375670}, doi = {10.1145/375663.375670}, acmid = {375670}, publisher = {ACM}, address = {New York, NY, USA}, }

Examples

use quantiles::greenwald_khanna::*;

let epsilon = 0.01;

let mut stream = Stream::new(epsilon);

let n = 1001;
for i in 1..n {
    stream.insert(i);
}
let in_range = |phi: f64, value: u32| {
  let lower = ((phi - epsilon) * (n as f64)) as u32;
  let upper = ((phi + epsilon) * (n as f64)) as u32;
  (epsilon > phi || lower <= value) && value <= upper
};
assert!(in_range(0f64, *stream.quantile(0f64)));
assert!(in_range(0.1f64, *stream.quantile(0.1f64)));
assert!(in_range(0.2f64, *stream.quantile(0.2f64)));
assert!(in_range(0.3f64, *stream.quantile(0.3f64)));
assert!(in_range(0.4f64, *stream.quantile(0.4f64)));
assert!(in_range(1f64, *stream.quantile(1f64)));

Structs

Stream

The summary S of the observations seen so far.

Tuple

3-tuple of a value v[i], g[i] and delta[i].

Functions

find_insert_pos

Locates the proper position of v in a vector vs such that when v is inserted at position i, it is less then the element at i+1 if any, and greater than or equal to the element at i-1 if any.

find_insert_pos_linear

Locates the proper position of v in a vector vs such that when v is inserted at position i, it is less then the element at i+1 if any, and greater than or equal to the element at i-1 if any. Works by scanning the slice from start to end.