infinity_sampler/
math.rs

1//! # Algorithm explainer
2//!
3//! The insertion indexes are from _0_ to _N-1_ for a buffer of size _N_ such that:
4//! * The first _N_ indexes are _0..=N-1_
5//! * The rest of the indexes are generated in a loop of size _2N_
6//! * The older the value, the exponentially less likely it is to be kept
7//!
8//! The algorithm is as follows:
9//!
10//! * Consider a buffer of size _N_ and a stream of values _V<sub>i</sub>_.
11//! * Store the value _V<sub>0</sub>_ at index _0_.
12//! * For _V<sub>i</sub>_, first make a sampling decision:
13//!     * Consider the significant bits of the binary representation of _i_.
14//!     * Remove _log<sub>2</sub>N_ most significant bits.
15//!     * If the remainder is not zero, discard _V<sub>i</sub>_.
16//! * Store _V<sub>i</sub>_ at index _(i - 1) mod (N - 1) + 1_.
17//!
18//! This has following properties:
19//!
20//! * The sampling rate is halved every time another _N/2_ values have been selected.
21//! * At all times, if the reservoir has observed _M_ values, the buffer will contain an even spread of samples with the distance between samples being exactly either 2 <sup>⌊log<sub>2</sub>(M/N)⌋</sup> or 2<sup>⌈log<sub>2</sub>(M/N)⌉</sup> i.e. nearest powers of 2 to _M/N_.
22//! * Each time exactly _N * 2<sup>M</sup>_ values observed by the Reservoir (i.e. _N + MN/2_ values have been positively sampled), the buffer will contain perfectly evenly spead values with indexes _2<sup>M</sup>x_.
23//!
24//! ## Example for N=16
25//!
26//! Consider the following chart showing the insertion indexes for a 16-element buffer. The numbers
27//! denote value indexes as seen by the Reservoir and only the values selected by the sampler are shown. The time moves from left to right, top to bottom.
28//!
29//! ```ignore
30//!     |----------------   buffer   -----------------|
31//!     0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15     // pattern 0 (initial loop)
32//!  -     16    18    20    22    24    26    28    30     // pattern 1 / group 0
33//!  |        32          36          40          44        // pattern 2 / group 0
34//!  |           48          52          56          60                  / group 1
35//!  |              64                      72              // pattern 3 / group 0
36//! r|                 80                      88                        / group 1
37//! e|                    96                      104                    / group 2
38//! p|                       112                     120                 / group 3
39//! e|                          128                         // pattern 4 / group 0
40//! a|                             144                                   / group 1
41//! t|                                160                                / group 2
42//! s|                                   176                             / group 3
43//!  |                                      192                          / group 4
44//!  |                                         208                       / group 5
45//!  |                                            224                    / group 6
46//!  -                                               240                 / group 7
47//! ```
48//!
49//! ### Item insertion indices
50//!
51//! ```ignore
52//! Pattern 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
53//! Pattern 1: 1 3 5 7 9 11 13 15
54//! Pattern 2: 2 6 10 14
55//!            3 7 11 15
56//! Pattern 3: 4 12
57//!            5 13
58//!            6 14
59//!            7 15
60//! Pattenr 4: 8
61//!            9
62//!            10
63//!            11
64//!            12
65//!            13
66//!            14
67//!            15
68//! ```
69//!
70//! ### Buffer contents (sorted)
71//!
72//! * After 16 items observed: `0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15`
73//! * After 32 items observed: `0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30`
74//! * After 64 items observed: `0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60`
75//!