Struct quantogram::Quantogram


pub struct Quantogram { /* private fields */ }


Provides a weighted histogram of f64 values for computing approximate quantiles. It guarantees a configurable maximum absolute relative error and uses sparse storage to reduce memory usage.

Worst-case accuracy defaults to one percent (0.01) absolute relative error. The error bound is unbiased and uniform across the entire range of numbers. The error for quantiles 0 and 1 (the minimum and maximum, respectively) is guaranteed to be zero, unless either of those values has been removed.

If all inserted values are given a weight of one, this behaves as an unweighted (normal) histogram.

Samples may be added or removed. However, removing a sample that equals the minimum or maximum will cause those values to be replaced by the center value of the appropriate extreme histogram bucket.

The full valid range of floats is divided into two levels of buckets. For the default case with 1% error, here is what that means:

Top. The top level divides the number range into one bucket per power of two, with positive and negative numbers kept apart. It holds a maximum of 508 buckets in two sparse lists. The histogram covers magnitudes from 2^-126 to 2^127 (the exponent range of a normal f32; a normal f64 has exponents from -1022 to 1023), so there are a maximum of 254 buckets for positive numbers and 254 buckets for negative numbers.

Bottom. The second level of buckets is a single sparse list spanning the full range from the smallest negative number to the largest positive number. Each power-of-two range is broken into 35 buckets whose size varies exponentially: each bucket is larger than the previous one by a factor of 1.02 (or smaller by 1.02 across the range of negative values). The value 1.02 was chosen because 1.02^35 = 1.999889553, which differs from 2 by only 0.00011. That means the 35th bucket for each power of two is slightly larger than the rest, but not by enough to wreck the error guarantee. This list holds a maximum of 1 + 508*35 = 17,781 buckets. (The “1” is for the zero bucket.)
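To make the two-level scheme concrete, here is a minimal sketch of how a value might be mapped to its top-level power of two and its bottom-level sub-bucket. It assumes the 1% defaults (growth 1.02, 35 buckets per power of two); `top_bucket` and `bottom_bucket` are hypothetical names, and this is not the crate's internal code. Reading the exponent bits directly is one way to get the fast power-of-two lookup mentioned in the notes.

```rust
// Hypothetical sketch of two-level bucket indexing; not the quantogram crate's code.
// Constants mirror the default 1% configuration described above.
const GROWTH: f64 = 1.02;
const BINS_PER_POWER: usize = 35;

/// Top level: floor(log2(|x|)), read directly from the IEEE 754 exponent field.
/// Returns None for zero, subnormals, infinities and NaN.
fn top_bucket(x: f64) -> Option<i32> {
    if !x.is_normal() {
        return None;
    }
    // Bits 52..62 hold the biased exponent; the mask drops the sign bit.
    let biased = ((x.to_bits() >> 52) & 0x7ff) as i32;
    Some(biased - 1023)
}

/// Bottom level: index k in 0..35 such that |x| / 2^p lies in [GROWTH^k, GROWTH^(k+1)),
/// except that the last bucket also absorbs the sliver up to 2 (since 1.02^35 < 2).
/// The sign of x would select the positive or negative list; it is ignored here.
fn bottom_bucket(x: f64) -> Option<(i32, usize)> {
    let p = top_bucket(x)?;
    // Keep the 52 mantissa bits and force the exponent to zero: this is |x| / 2^p in [1, 2).
    let ratio = f64::from_bits((x.to_bits() & ((1u64 << 52) - 1)) | (1023u64 << 52));
    let k = (ratio.ln() / GROWTH.ln()).floor() as usize;
    Some((p, k.min(BINS_PER_POWER - 1)))
}
```

For example, 2.06 falls in power of two 1 and, since 2.06 / 2 = 1.03 lies between 1.02 and 1.02^2, in sub-bucket 1. Values just below a power of two, such as 1.9999, land in the enlarged 35th bucket (index 34).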

A bucket is not created until at least one value is added to it. Removing the last item in a bucket will not cause its memory to be freed; its weight will be set to zero.
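The lazy-allocation rule can be illustrated with an ordered map keyed by bucket index. This is a hypothetical sketch (`SparseBuckets` is not part of the crate) that assumes each bucket stores only an accumulated weight:

```rust
use std::collections::BTreeMap;

/// Hypothetical sparse bucket store: keys are bucket indices, values are weights.
#[derive(Default)]
struct SparseBuckets {
    weights: BTreeMap<i64, f64>,
}

impl SparseBuckets {
    /// Creates the bucket lazily, the first time a sample lands in it.
    fn add(&mut self, bucket: i64, weight: f64) {
        *self.weights.entry(bucket).or_insert(0.0) += weight;
    }

    /// Subtracts weight but never deletes the entry: an emptied bucket
    /// stays allocated with a weight of zero.
    fn remove(&mut self, bucket: i64, weight: f64) {
        if let Some(w) = self.weights.get_mut(&bucket) {
            *w = (*w - weight).max(0.0);
        }
    }

    /// Number of buckets that have ever received a sample.
    fn allocated(&self) -> usize {
        self.weights.len()
    }

    /// Current weight of a bucket; buckets never created report zero.
    fn weight(&self, bucket: i64) -> f64 {
        self.weights.get(&bucket).copied().unwrap_or(0.0)
    }
}
```

Keeping emptied entries avoids reallocating when values return to the same range, at the cost of never shrinking the map.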

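(If you are familiar with the Rule of 70, which estimates the number of periods needed to double at a given percentage growth rate as 70 divided by that rate, this property of 1.02 is probably why 70 was chosen: at 2% growth, doubling takes about 70 / 2 = 35 periods.)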

The error rate of 0.01 and the bin scale factor of 1.02 are related in this way:

 
            (1 + error)     (1 + 1/101)      102
   scale = ------------- = ------------- = ------- = 1.02
            (1 - error)     (1 - 1/101)      100

So, strictly speaking, the worst-case error is 1/101, or about 0.99%, not 1%. The math is easier with 1.02 than with 1.020202020202 (the exact scale for an error of 0.01), and because 1.02^35 is so close to 2, the oversized last bucket in each power of two costs almost nothing. (A smaller error in the last bucket is available for error = 1/176, but the memory requirements are much higher.)
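The relationship between error and scale can be checked numerically. The helpers below are hypothetical, not crate functions. They assume, as the formula implies, that error is measured relative to the value reported for a bucket (its midpoint), so a bucket [lo, hi] with hi = scale * lo has a worst-case error of (hi - lo) / (hi + lo) = (scale - 1) / (scale + 1).

```rust
/// Bin scale factor for a target worst-case relative error: (1 + e) / (1 - e).
fn scale_from_error(error: f64) -> f64 {
    (1.0 + error) / (1.0 - error)
}

/// Inverse of the above: worst-case relative error for a given scale factor.
fn error_from_scale(scale: f64) -> f64 {
    (scale - 1.0) / (scale + 1.0)
}

/// Number of whole bins of this scale that fit in one power of two.
fn bins_per_power_of_two(scale: f64) -> usize {
    (2f64.ln() / scale.ln()).floor() as usize
}

/// Error of reporting the midpoint of [lo, lo * scale], measured relative to
/// that midpoint; the maximum is reached at either edge of the bucket.
fn midpoint_error(lo: f64, scale: f64) -> f64 {
    let hi = lo * scale;
    let mid = (lo + hi) / 2.0;
    (hi - mid) / mid
}
```

With a scale of 1.02, `error_from_scale` and `midpoint_error` both give 1/101, and `bins_per_power_of_two` gives 35 because ln 2 / ln 1.02 is about 35.003.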

Usage:

Typical usage (unweighted samples with the default accuracy of 1%):

 use quantogram::Quantogram;
 let mut q = Quantogram::new();
 q.add(10.0);
 q.add(40.0);
 q.add(20.0);
 q.add(30.0);
 q.add(50.0);

 assert_eq!(q.min().unwrap(), 10.0);
 assert_eq!(q.max().unwrap(), 50.0);
 assert_eq!(q.mean().unwrap(), 30.0);
 assert_eq!(q.median().unwrap(), 30.0);
 assert_eq!(q.quantile(0.75).unwrap(), 40.0);

 q.remove(10.0);
 q.remove(20.0);
 assert_eq!(q.mean().unwrap(), 40.0);

Notes:

Coarse bins are powers of two rather than some other base because finding the largest power of two less than or equal to a number takes a single fast intrinsic function call. This makes assigning a number to a bin very fast.

When a quantile is requested, a value is returned as long as at least one number in range has been added. NaNs and infinities are disregarded, but they are available as separate counts.

Unbounded errors are possible in the edge case of a large gap in the middle of the data. Take the median: if the data contain an even number of items with a large gap exactly in the middle, the correct median is the mean of the last value below the gap and the first value above it. To correct for this, use the fussy_quantile method. It probes for quantiles at Φ + ε and Φ - ε for a small value of ε and averages the pair that best spans the gap. If the histogram is unweighted (all weights are one), ε should be 1/(2N), where N is the number of items already added to the histogram. For weighted samples, the right choice of ε is still an open question.

If a bin holds exactly one sample, the bin's value is set to that exact sample rather than to the midpoint of the bin's range. Once a second sample is added, the bin value becomes the midpoint. Consequently, bins holding a single sample have no error.
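The ±ε probe can be sketched on exact, sorted data for the median case described in the notes. This illustrates the idea only; it is not the crate's `fussy_quantile` (whose `threshold_ratio` parameter is not explained here). `value_at_rank` and `probed_median` are hypothetical helpers, and the probe is applied only to an even number of items, matching the scenario in the note.

```rust
/// Nearest-rank lookup on sorted data: fractional 1-based ranks round up,
/// and out-of-range ranks are clamped to the first or last item.
fn value_at_rank(sorted: &[f64], rank: f64) -> f64 {
    let n = sorted.len();
    let idx = (rank.ceil() as usize).clamp(1, n) - 1;
    sorted[idx]
}

/// Median using the ±ε probe with ε = 1/(2N). For an even count the probes
/// land on ranks N/2 - 0.5 and N/2 + 0.5, i.e. the items on either side of
/// the middle, and their mean bridges any gap there.
fn probed_median(sorted: &[f64]) -> Option<f64> {
    let n = sorted.len();
    if n == 0 {
        return None;
    }
    if n % 2 == 1 {
        // Odd count: the middle item is the median, so no probe is needed.
        return Some(sorted[n / 2]);
    }
    let phi = 0.5;
    let eps = 1.0 / (2.0 * n as f64);
    let below = value_at_rank(sorted, (phi - eps) * n as f64);
    let above = value_at_rank(sorted, (phi + eps) * n as f64);
    Some((below + above) / 2.0)
}
```

With data [1, 2, 3, 100, 101, 102], a plain nearest-rank median returns 3, while the probes at ranks 2.5 and 3.5 pick 3 and 100 and average them to 51.5.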


Implementations

impl Quantogram

pub fn new() -> Self

pub fn with_configuration( growth: f64, bins: usize, smallest_power: isize, largest_power: isize) -> Self

pub fn replace_hsm_cache(&mut self, new_cache: HalfSampleModeCache)

pub fn add(&mut self, sample: f64)

pub fn remove(&mut self, sample: f64)

pub fn add_weighted(&mut self, sample: f64, weight: f64) -> f64

pub fn add_unweighted_samples<'a, S>( &mut self, samples: impl Iterator<Item = &'a S>) where S: 'a + Into<f64> + Copy,

pub fn mean(&self) -> Option<f64>

pub fn min(&self) -> Option<f64>

pub fn max(&self) -> Option<f64>

pub fn count(&self) -> usize

pub fn finite(&self) -> f64

pub fn zero(&self) -> f64

pub fn nan(&self) -> f64

pub fn median(&self) -> Option<f64>

pub fn variance(&self) -> f64

pub fn stddev(&self) -> Option<f64>

pub fn mode(&self) -> Vec<f64>

pub fn hsm(&self) -> Option<f64>

pub fn quantile(&self, phi: f64) -> Option<f64>

pub fn fussy_quantile(&self, phi: f64, threshold_ratio: f64) -> Option<f64>

pub fn quantile_at(&self, value: f64) -> Option<(f64, f64)>

pub fn range(&self) -> Option<f64>

pub fn q1(&self) -> Option<f64>

pub fn q3(&self) -> Option<f64>

pub fn iqr(&self) -> Option<f64>

pub fn quartile_deviation(&self) -> Option<f64>

pub fn coeff_of_range(&self) -> Option<f64>

pub fn coeff_of_quartile_dev(&self) -> Option<f64>

pub fn coeff_of_stddev(&self) -> Option<f64>

pub fn coeff_of_variation(&self) -> Option<f64>

pub fn size(&self) -> usize

pub fn power_of_two(sample: f64) -> Option<isize>

Trait Implementations

impl Debug for Quantogram

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl !RefUnwindSafe for Quantogram

impl Send for Quantogram

impl !Sync for Quantogram

impl Unpin for Quantogram

impl UnwindSafe for Quantogram

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V
