[−][src]Struct pdatastructs::reservoirsampling::ReservoirSampling
A ReservoirSampling is a data-structure that samples a fixed number of elements from a data stream, without the need to memorize any additional elements.
Examples
use pdatastructs::reservoirsampling::ReservoirSampling; use pdatastructs::rand::{ChaChaRng, SeedableRng}; // set up sampler for 10 elements let rng = ChaChaRng::from_seed([0; 32]); let mut sampler = ReservoirSampling::new(10, rng); // add some data for i in 0..1000 { let x = i % 2; sampler.add(x); } // later assert_eq!(sampler.reservoir().iter().sum::<i64>(), 5);
Applications
- Getting a set of representative samples of a large data set.
- Collecting distribution information of a stream of data, which can then recover approximative histograms or CDFs (Cumulative Distribution Function).
How It Works
Setup
The sampler is initialized with an empty reservoir vector and a potential capacity k
.
Insertion
If the sampler did see less than k
elements, the element is appended to the reservoir vector,
so no sampling takes place (full take).
After the reservoir is filled up with k
elements, the "normal reservoir sampling" phase
starts until t = k * 4
elements were processed. In this phase, every incoming element
replaces may replace a random, existing element in the reservoir, iff a random number x in [1, i]
(i being the number of seen elements) is smaller or equal to k
.
In the final phase, not every new incoming element may processed. Instead, some elements are skipped. The number of skipped elements is chosen randomly but with respect to the number of seen elements. Every non-skipped element replaces a random, existing element of the reservoir.
Read-Out
The reservoir vector is just returned, no further calculation is required.
See Also
std::vec::Vec
: can be used to get an exact sampling but needs to store all elements of the incoming data stream
References
Methods
impl<T, R> ReservoirSampling<T, R> where
R: Rng,
[src]
R: Rng,
pub fn new(k: usize, rng: R) -> Self
[src]
Create new reservoir sampler that keeps k
samples and uses rng
for its random
decisions.
pub fn k(&self) -> usize
[src]
Number of samples that should be kept.
pub fn reservoir(&self) -> &Vec<T>
[src]
Read-only copy of the reservoir. Contains at most k
entries.
pub fn i(&self) -> usize
[src]
Number of data points seen.
pub fn add(&mut self, obj: T)
[src]
Observe new data point.
pub fn is_empty(&self) -> bool
[src]
Checks if reservoir is empty (i.e. no data points where observed)
pub fn clear(&mut self)
[src]
Clear sampler state. It now behaves like a fresh one (i.e. no data points seen so far).
Trait Implementations
impl<T: Clone, R: Clone> Clone for ReservoirSampling<T, R> where
R: Rng,
[src]
R: Rng,
fn clone(&self) -> ReservoirSampling<T, R>
[src]
fn clone_from(&mut self, source: &Self)
1.0.0[src]
Performs copy-assignment from source
. Read more
impl<T, R> Extend<T> for ReservoirSampling<T, R> where
R: Rng,
[src]
R: Rng,
fn extend<S: IntoIterator<Item = T>>(&mut self, iter: S)
[src]
impl<T, R> Debug for ReservoirSampling<T, R> where
R: Rng,
[src]
R: Rng,
Auto Trait Implementations
impl<T, R> Send for ReservoirSampling<T, R> where
R: Send,
T: Send,
R: Send,
T: Send,
impl<T, R> Sync for ReservoirSampling<T, R> where
R: Sync,
T: Sync,
R: Sync,
T: Sync,
Blanket Implementations
impl<T, U> Into for T where
U: From<T>,
[src]
U: From<T>,
impl<T> ToOwned for T where
T: Clone,
[src]
T: Clone,
type Owned = T
The resulting type after obtaining ownership.
fn to_owned(&self) -> T
[src]
fn clone_into(&self, target: &mut T)
[src]
impl<T> From for T
[src]
impl<T, U> TryFrom for T where
U: Into<T>,
[src]
U: Into<T>,
type Error = Infallible
The type returned in the event of a conversion error.
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
[src]
impl<T> Borrow for T where
T: ?Sized,
[src]
T: ?Sized,
impl<T> Any for T where
T: 'static + ?Sized,
[src]
T: 'static + ?Sized,
impl<T> BorrowMut for T where
T: ?Sized,
[src]
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
[src]
impl<T, U> TryInto for T where
U: TryFrom<T>,
[src]
U: TryFrom<T>,