[][src]Struct pdatastructs::reservoirsampling::ReservoirSampling

pub struct ReservoirSampling<T, R> where
    R: Rng
{ /* fields omitted */ }

A ReservoirSampling is a data-structure that samples a fixed number of elements from a data stream, without the need to memorize any additional elements.

Examples

use pdatastructs::reservoirsampling::ReservoirSampling;
use pdatastructs::rand::{ChaChaRng, SeedableRng};

// set up sampler for 10 elements
let rng = ChaChaRng::from_seed([0; 32]);
let mut sampler = ReservoirSampling::new(10, rng);

// add some data
for i in 0..1000 {
    let x = i % 2;
    sampler.add(x);
}

// later
assert_eq!(sampler.reservoir().iter().sum::<i64>(), 5);

Applications

  • Getting a set of representative samples of a large data set.
  • Collecting distribution information of a stream of data, which can then recover approximative histograms or CDFs (Cumulative Distribution Function).

How It Works

Setup

The sampler is initialized with an empty reservoir vector and a potential capacity k.

Insertion

If the sampler did see less than k elements, the element is appended to the reservoir vector, so no sampling takes place (full take).

After the reservoir is filled up with k elements, the "normal reservoir sampling" phase starts until t = k * 4 elements were processed. In this phase, every incoming element replaces may replace a random, existing element in the reservoir, iff a random number x in [1, i] (i being the number of seen elements) is smaller or equal to k.

In the final phase, not every new incoming element may processed. Instead, some elements are skipped. The number of skipped elements is chosen randomly but with respect to the number of seen elements. Every non-skipped element replaces a random, existing element of the reservoir.

Read-Out

The reservoir vector is just returned, no further calculation is required.

See Also

  • std::vec::Vec: can be used to get an exact sampling but needs to store all elements of the incoming data stream

References

Methods

impl<T, R> ReservoirSampling<T, R> where
    R: Rng
[src]

pub fn new(k: usize, rng: R) -> Self[src]

Create new reservoir sampler that keeps k samples and uses rng for its random decisions.

pub fn k(&self) -> usize[src]

Number of samples that should be kept.

pub fn reservoir(&self) -> &Vec<T>[src]

Read-only copy of the reservoir. Contains at most k entries.

pub fn i(&self) -> usize[src]

Number of data points seen.

pub fn add(&mut self, obj: T)[src]

Observe new data point.

pub fn is_empty(&self) -> bool[src]

Checks if reservoir is empty (i.e. no data points where observed)

pub fn clear(&mut self)[src]

Clear sampler state. It now behaves like a fresh one (i.e. no data points seen so far).

Trait Implementations

impl<T: Clone, R: Clone> Clone for ReservoirSampling<T, R> where
    R: Rng
[src]

fn clone_from(&mut self, source: &Self)
1.0.0
[src]

Performs copy-assignment from source. Read more

impl<T, R> Extend<T> for ReservoirSampling<T, R> where
    R: Rng
[src]

impl<T, R> Debug for ReservoirSampling<T, R> where
    R: Rng
[src]

Auto Trait Implementations

impl<T, R> Send for ReservoirSampling<T, R> where
    R: Send,
    T: Send

impl<T, R> Sync for ReservoirSampling<T, R> where
    R: Sync,
    T: Sync

Blanket Implementations

impl<T, U> Into for T where
    U: From<T>, 
[src]

impl<T> ToOwned for T where
    T: Clone
[src]

type Owned = T

The resulting type after obtaining ownership.

impl<T> From for T[src]

impl<T, U> TryFrom for T where
    U: Into<T>, 
[src]

type Error = Infallible

The type returned in the event of a conversion error.

impl<T> Borrow for T where
    T: ?Sized
[src]

impl<T> Any for T where
    T: 'static + ?Sized
[src]

impl<T> BorrowMut for T where
    T: ?Sized
[src]

impl<T, U> TryInto for T where
    U: TryFrom<T>, 
[src]

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.