optimal_combined_sampling

Function optimal_combined_sampling 

Source
pub const fn optimal_combined_sampling(
    n: usize,
    len: usize,
    max_result: u8,
) -> u8
Expand description

Calculates such a sampling density of select values that provides an approximately constant space overhead (independent of set/unset bits ratio in the vector) of CombinedSampling.

The parameters describe the bit vector and the desired result range:

  • n – numbers of ones (for select) or zeros (for select0) in the vector,
  • len – length of the vector in bits,
  • max_result – the largest possible result, returned only and always for n >= 75%len (must be in range [7, 31]).

The result is the base 2 logarithm of the recommended sampling, and it is always in range [7, max_result]. The actual sampling is 2 times denser, but every second sample sometimes cannot be recorded accurately. The minimum value is 7, as we never sample two bits in the same 64-bit word.

A good value for max_result is 13, which leads to about 0.39% space overhead, and, for n = 50%u, results in sampling positions of every 2^12(/2)=4096(/2) ones (or zeros for select0). As max_result decreases, the speed of select queries increases at the cost of higher space overhead (which doubles with each decrease by 1).