Expand description
§Algorithm explainer
The insertion indexes are from 0 to N-1 for a buffer of size N such that:
- The first N indexes are 0..=N-1
- The rest of the indexes are generated in a loop of size 2N
- The older the value, the exponentially less likely it is to be kept
The algorithm is as follows:
- Consider a buffer of size N and a stream of values Vi.
- Store the value V0 at index 0.
- For Vi, first make a sampling decision:
- Consider the significant bits of the binary representation of i.
- Remove log2N most significant bits.
- If the remainder is not zero, discard Vi.
- Store Vi at index (i - 1) mod (N - 1) + 1.
This has following properties:
- The sampling rate is halved every time another N/2 values have been selected.
- At all times, if the reservoir has observed M values, the buffer will contain an even spread of samples with the distance between samples being exactly either 2 ⌊log2(M/N)⌋ or 2⌈log2(M/N)⌉ i.e. nearest powers of 2 to M/N.
- Each time exactly N * 2M values observed by the Reservoir (i.e. N + MN/2 values have been positively sampled), the buffer will contain perfectly evenly spead values with indexes 2Mx.
§Example for N=16
Consider the following chart showing the insertion indexes for a 16-element buffer. The numbers denote value indexes as seen by the Reservoir and only the values selected by the sampler are shown. The time moves from left to right, top to bottom.
ⓘ
|---------------- buffer -----------------|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 // pattern 0 (initial loop)
- 16 18 20 22 24 26 28 30 // pattern 1 / group 0
| 32 36 40 44 // pattern 2 / group 0
| 48 52 56 60 / group 1
| 64 72 // pattern 3 / group 0
r| 80 88 / group 1
e| 96 104 / group 2
p| 112 120 / group 3
e| 128 // pattern 4 / group 0
a| 144 / group 1
t| 160 / group 2
s| 176 / group 3
| 192 / group 4
| 208 / group 5
| 224 / group 6
- 240 / group 7§Item insertion indices
ⓘ
Pattern 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Pattern 1: 1 3 5 7 9 11 13 15
Pattern 2: 2 6 10 14
3 7 11 15
Pattern 3: 4 12
5 13
6 14
7 15
Pattenr 4: 8
9
10
11
12
13
14
15§Buffer contents (sorted)
- After 16 items observed:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - After 32 items observed:
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 - After 64 items observed:
0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60