infinity_sampler/math.rs
1//! # Algorithm explainer
2//!
3//! The insertion indexes are from _0_ to _N-1_ for a buffer of size _N_ such that:
4//! * The first _N_ indexes are _0..=N-1_
5//! * The rest of the indexes are generated in a loop of size _2N_
6//! * The older the value, the exponentially less likely it is to be kept
7//!
8//! The algorithm is as follows:
9//!
10//! * Consider a buffer of size _N_ and a stream of values _V<sub>i</sub>_.
11//! * Store the value _V<sub>0</sub>_ at index _0_.
12//! * For _V<sub>i</sub>_, first make a sampling decision:
13//! * Consider the significant bits of the binary representation of _i_.
14//! * Remove _log<sub>2</sub>N_ most significant bits.
15//! * If the remainder is not zero, discard _V<sub>i</sub>_.
16//! * Store _V<sub>i</sub>_ at index _(i - 1) mod (N - 1) + 1_.
17//!
18//! This has following properties:
19//!
20//! * The sampling rate is halved every time another _N/2_ values have been selected.
21//! * At all times, if the reservoir has observed _M_ values, the buffer will contain an even spread of samples with the distance between samples being exactly either 2 <sup>⌊log<sub>2</sub>(M/N)⌋</sup> or 2<sup>⌈log<sub>2</sub>(M/N)⌉</sup> i.e. nearest powers of 2 to _M/N_.
22//! * Each time exactly _N * 2<sup>M</sup>_ values observed by the Reservoir (i.e. _N + MN/2_ values have been positively sampled), the buffer will contain perfectly evenly spead values with indexes _2<sup>M</sup>x_.
23//!
24//! ## Example for N=16
25//!
26//! Consider the following chart showing the insertion indexes for a 16-element buffer. The numbers
27//! denote value indexes as seen by the Reservoir and only the values selected by the sampler are shown. The time moves from left to right, top to bottom.
28//!
29//! ```ignore
30//! |---------------- buffer -----------------|
31//! 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 // pattern 0 (initial loop)
32//! - 16 18 20 22 24 26 28 30 // pattern 1 / group 0
33//! | 32 36 40 44 // pattern 2 / group 0
34//! | 48 52 56 60 / group 1
35//! | 64 72 // pattern 3 / group 0
36//! r| 80 88 / group 1
37//! e| 96 104 / group 2
38//! p| 112 120 / group 3
39//! e| 128 // pattern 4 / group 0
40//! a| 144 / group 1
41//! t| 160 / group 2
42//! s| 176 / group 3
43//! | 192 / group 4
44//! | 208 / group 5
45//! | 224 / group 6
46//! - 240 / group 7
47//! ```
48//!
49//! ### Item insertion indices
50//!
51//! ```ignore
52//! Pattern 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
53//! Pattern 1: 1 3 5 7 9 11 13 15
54//! Pattern 2: 2 6 10 14
55//! 3 7 11 15
56//! Pattern 3: 4 12
57//! 5 13
58//! 6 14
59//! 7 15
60//! Pattenr 4: 8
61//! 9
62//! 10
63//! 11
64//! 12
65//! 13
66//! 14
67//! 15
68//! ```
69//!
70//! ### Buffer contents (sorted)
71//!
72//! * After 16 items observed: `0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15`
73//! * After 32 items observed: `0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30`
74//! * After 64 items observed: `0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60`
75//!