Module quantiles::greenwald_khanna
[−]
[src]
Greenwald Khanna calculates epsilon-approximate quantiles.
If the desired quantile is phi, the epsilon-approximate
quantile is any element in the range of elements that rank
between lbound((phi-epsilon) x N)
and lbound((phi+epsilon) x N)
terminology from the paper:
- S: set of observations
- n: number of observations in S
- v[i]: observation i in S
- r: rank of observation in S from 1 to n.
r_min(v[i])
: lower bound on rank r of v[i]r_max(v[i])
: upper bound on rank r of v[i]g[i] = r_min(v[i]) - r_min(v[i - 1])
delta[i] = r_max(v[i]) - r_min(v[i])
t[i] = tuple(v[i], g[i], delta[i])
- phi: quantile as a real number in the range [0,1]
- r: ubound(phi * n)
identities:
- r_min(v[i]) = forall j<=i sum of g[j]
- r_max(v[i]) = ( forall j<=i sum of g[j] ) + delta[i]
- g[i] + delta[i] - 1 is an upper bound on the total number of observations
- between v[i] and v[i-1]
- sum of g[i] = n
results:
- max_i(g[i] + delta[i]) <= 2 * epsilon * n
- a tuple is full if g[i] + delta[i] = floor(2 * epsilon * n)
@inproceedings{Greenwald:2001:SOC:375663.375670, author = {Greenwald, Michael and Khanna, Sanjeev}, title = {Space-efficient Online Computation of Quantile Summaries}, booktitle = {Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data}, series = {SIGMOD '01}, year = {2001}, isbn = {1-58113-332-4}, location = {Santa Barbara, California, USA}, pages = {58--66}, numpages = {9}, url = {http://doi.acm.org/10.1145/375663.375670}, doi = {10.1145/375663.375670}, acmid = {375670}, publisher = {ACM}, address = {New York, NY, USA}, }
Examples
use quantiles::greenwald_khanna::*; let epsilon = 0.01; let mut stream = Stream::new(epsilon); let n = 1001; for i in 1..n { stream.insert(i); } let in_range = |phi: f64, value: u32| { let lower = ((phi - epsilon) * (n as f64)) as u32; let upper = ((phi + epsilon) * (n as f64)) as u32; (epsilon > phi || lower <= value) && value <= upper }; assert!(in_range(0f64, *stream.quantile(0f64))); assert!(in_range(0.1f64, *stream.quantile(0.1f64))); assert!(in_range(0.2f64, *stream.quantile(0.2f64))); assert!(in_range(0.3f64, *stream.quantile(0.3f64))); assert!(in_range(0.4f64, *stream.quantile(0.4f64))); assert!(in_range(1f64, *stream.quantile(1f64)));
Structs
Stream |
The summary S of the observations seen so far. |
Tuple |
3-tuple of a value v[i], g[i] and delta[i]. |
Functions
find_insert_pos |
Locates the proper position of v in a vector vs such that when v is inserted at position i, it is less then the element at i+1 if any, and greater than or equal to the element at i-1 if any. |
find_insert_pos_linear |
Locates the proper position of v in a vector vs such that when v is inserted at position i, it is less then the element at i+1 if any, and greater than or equal to the element at i-1 if any. Works by scanning the slice from start to end. |