Struct hyperloglog_rs::hyperloglog::HyperLogLog
source · pub struct HyperLogLog<const PRECISION: usize, const BITS: usize> { /* private fields */ }
Expand description
A probabilistic algorithm for estimating the number of distinct elements in a set.
HyperLogLog is a probabilistic algorithm designed to estimate the number of distinct elements in a set. It does so by taking advantage of the fact that the representation of an element can be transformed into a uniform distribution in a space with a fixed range.
HyperLogLog works by maintaining a fixed-sized register array, where each register holds a counter. The algorithm splits the input set into subsets, applies a hash function to each element in the subset, and then updates the corresponding counter in the register array.
HyperLogLog uses a trick called “probabilistic counting” to estimate the number of distinct elements in the set. Each register counter is converted to a binary string, and the algorithm counts the number of leading zeros in each binary string. The maximum number of leading zeros over all counters is used to estimate the number of distinct elements in the set.
HyperLogLog has a tunable parameter called precision that determines the accuracy of the algorithm. Higher precision leads to better accuracy, but requires more memory. The error rate of the algorithm is guaranteed to be within a certain bound, depending on the chosen precision.
Examples
use hyperloglog_rs::HyperLogLog;
let mut hll = HyperLogLog::<10, 6>::new();
hll.insert(&"apple");
hll.insert(&"banana");
hll.insert(&"cherry");
let estimated_cardinality = hll.estimate_cardinality();
assert!(estimated_cardinality >= 3.0_f32 * 0.9 &&
estimated_cardinality <= 3.0_f32 * 1.1);
Citations
This implementation is based on the following papers:
- Flajolet, Philippe, et al. “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm.” DMTCS Proceedings 1 (2007): 127-146.
- Heule, Stefan, Marc Nunkesser, and Alexander Hall. “HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm.” Proceedings of the 16th International Conference on Extending Database Technology. 2013.
Implementations§
source§impl<const PRECISION: usize, const BITS: usize> HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize> HyperLogLog<PRECISION, BITS>
sourcepub const NUMBER_OF_REGISTERS: usize = _
pub const NUMBER_OF_REGISTERS: usize = _
The number of registers used by the HyperLogLog algorithm, which depends on its precision.
sourcepub const SMALL_RANGE_CORRECTION_THRESHOLD: f32 = _
pub const SMALL_RANGE_CORRECTION_THRESHOLD: f32 = _
The threshold value used in the small range correction of the HyperLogLog algorithm.
sourcepub const TWO_32: f32 = 4.2949673E+9f32
pub const TWO_32: f32 = 4.2949673E+9f32
The float value of 2^32, used in the intermediate range correction of the HyperLogLog algorithm.
sourcepub const INTERMEDIATE_RANGE_CORRECTION_THRESHOLD: f32 = 143165584f32
pub const INTERMEDIATE_RANGE_CORRECTION_THRESHOLD: f32 = 143165584f32
The threshold value used in the intermediate range correction of the HyperLogLog algorithm.
sourcepub const LOWER_REGISTER_MASK: u32 = _
pub const LOWER_REGISTER_MASK: u32 = _
The mask used to obtain the lower register bits in the HyperLogLog algorithm.
sourcepub const NUMBER_OF_REGISTERS_IN_WORD: usize = _
pub const NUMBER_OF_REGISTERS_IN_WORD: usize = _
The number of registers that can fit in a single 32-bit word in the HyperLogLog algorithm.
sourcepub const PRECOMPUTED_RECIPROCALS: [f32; { _ }] = _
pub const PRECOMPUTED_RECIPROCALS: [f32; { _ }] = _
The precomputed reciprocals used in the HyperLogLog algorithm for better performance.
sourcepub const SMALL_CORRECTIONS: [f32; { _ }] = _
pub const SMALL_CORRECTIONS: [f32; { _ }] = _
The precomputed small corrections used in the HyperLogLog algorithm for better performance.
sourcepub fn from_registers(registers: [u32; { _ }]) -> Self
pub fn from_registers(registers: [u32; { _ }]) -> Self
Create a new HyperLogLog counter from an array of registers.
Arguments
registers
- An array of u32 registers to use for the HyperLogLog counter.
Returns
A new HyperLogLog counter initialized with the given registers.
Examples
use hyperloglog_rs::HyperLogLog;
let registers = [0_u32; 1 << 4];
let hll = HyperLogLog::<4, 6>::from_registers(registers);
assert_eq!(hll.len(), 1 << 4);
sourcepub fn estimate_cardinality(&self) -> f32
pub fn estimate_cardinality(&self) -> f32
Estimates the cardinality of the set based on the HLL counter data.
Example
const PRECISION: usize = 8;
const BITS: usize = 5;
let mut hll = HyperLogLog::<PRECISION, BITS>::new();
let elements = vec![1, 2, 3, 4, 5];
for element in &elements {
hll.insert(element);
}
let estimated_cardinality = hll.estimate_cardinality();
assert!(estimated_cardinality >= elements.len() as f32 * 0.9 &&
estimated_cardinality <= elements.len() as f32 * 1.1);
Returns
f32
- The estimated cardinality of the set.
sourcepub fn iter(&self) -> impl Iterator<Item = u32> + '_
pub fn iter(&self) -> impl Iterator<Item = u32> + '_
Returns an iterator over the register values of the HyperLogLog instance.
The register values are extracted from the words array, where each word contains multiple register values. This method first checks that the size of the words array matches the expected number of registers per word, which is determined by the number of bits per register. It then iterates over each word in the array and extracts the register values using bit shifting and masking operations. Finally, it takes only the expected number of register values and returns an iterator over them.
Returns
An iterator over the register values of the HyperLogLog instance.
Examples
use hyperloglog_rs::HyperLogLog;
const PRECISION: usize = 8;
const BITS: usize = 5;
const HYPERLOGLOG_SIZE: usize = 1 << PRECISION;
let mut hll = HyperLogLog::<PRECISION, BITS>::new();
assert_eq!(hll.iter().count(), HYPERLOGLOG_SIZE);
hll.insert(&"foo");
hll.insert(&"bar");
let mut hll2 = HyperLogLog::<PRECISION, BITS>::new();
hll2|= hll;
assert_eq!(hll2.iter().count(), HYPERLOGLOG_SIZE);
sourcepub fn len(&self) -> usize
pub fn len(&self) -> usize
Returns the number of registers in the HLL counter.
Example
// Create a new HLL counter with 128 registers
let mut hll = HyperLogLog::<12, 8>::new();
assert_eq!(hll.len(), 4096);
// Insert some elements into the HLL counter
hll.insert(&1);
hll.insert(&2);
hll.insert(&3);
assert_eq!(hll.len(), 1 << 12);
// Merge another HLL counter with 128 registers
let mut hll2 = HyperLogLog::<12, 8>::new();
hll2.insert(&4);
hll2.insert(&5);
hll |= hll2;
assert_eq!(hll.len(), 1 << 12);
sourcepub fn is_empty(&self) -> bool
pub fn is_empty(&self) -> bool
Returns whether no element was yet added to the HLL counter.
Examples
use hyperloglog_rs::HyperLogLog;
let mut hll: HyperLogLog<8, 8> = HyperLogLog::new();
assert!(hll.is_empty());
hll.insert(&1);
assert!(!hll.is_empty());
sourcepub fn get_number_of_bits(&self) -> usize
pub fn get_number_of_bits(&self) -> usize
Returns the number of bits used to represent each register in the HyperLogLog counter.
Returns
An unsigned integer value representing the number of bits used to represent each register in the HyperLogLog counter.
Example
use hyperloglog_rs::HyperLogLog;
let hll = HyperLogLog::<13, 6>::new();
assert_eq!(hll.get_number_of_bits(), 6);
sourcepub fn get_number_of_padding_registers(&self) -> usize
pub fn get_number_of_padding_registers(&self) -> usize
Returns the number of extra registers that are not actually used.
Examples
// Create a HyperLogLog counter with precision 10 and 6-bit registers
let mut hll = HyperLogLog::<10, 6>::new();
// Since the number of registers is not a multiple of the number of registers in a word,
// there are padding registers that are not actually used.
assert_eq!(hll.get_number_of_padding_registers(), 1);
// Insert some elements into the counter
hll.insert(&1);
hll.insert(&2);
// The number of padding registers is still the same
assert_eq!(hll.get_number_of_padding_registers(), 1);
sourcepub fn get_number_of_zero_registers(&self) -> usize
pub fn get_number_of_zero_registers(&self) -> usize
Returns the number of registers with zero values. This value is used for computing a small correction when estimating the cardinality of a small set.
Examples
// Create a new HyperLogLog counter with precision 14 and 5 bits per register.
let mut hll = HyperLogLog::<14, 5>::new();
// Add some elements to the counter.
hll.insert(&1);
hll.insert(&2);
hll.insert(&3);
// Get the number of zero registers.
let number_of_zero_registers = hll.get_number_of_zero_registers();
assert_eq!(number_of_zero_registers, 16381);
pub fn get_number_of_non_zero_registers(&self) -> usize
sourcepub fn get_registers(&self) -> [u32; { _ }]
pub fn get_registers(&self) -> [u32; { _ }]
Returns an array of registers of the HyperLogLog counter.
Examples
let mut hll = HyperLogLog::<10, 6>::new();
hll.insert(&4);
hll.insert(&5);
hll.insert(&6);
let registers = hll.get_registers();
assert_eq!(registers.len(), 1024);
assert!(registers.iter().any(|&x| x > 0));
We can also create an HLL from registers, and then check whether the registers are what we expect:
let expected = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 11, 0];
let mut hll = HyperLogLog::<4, 6>::from_registers(expected);
assert_eq!(hll.get_registers(), expected, "Expected {:?}, got {:?}", expected, hll.get_registers());
sourcepub fn get_hash_and_index<T: Hash>(&self, value: &T) -> (u32, usize)
pub fn get_hash_and_index<T: Hash>(&self, value: &T) -> (u32, usize)
Returns the hash value and the corresponding register’s index for a given value.
Arguments
value
- A reference to the value to be hashed.
Examples
use hyperloglog_rs::HyperLogLog;
let mut hll: HyperLogLog<8, 6> = HyperLogLog::new();
let value = 42;
let (hash, index) = hll.get_hash_and_index(&value);
assert_eq!(index, 54, "Expected index {}, got {}.", 54, index);
assert_eq!(hash, 3623031424, "Expected hash {}, got {}.", 3623031424, hash);
sourcepub fn may_contain<T: Hash>(&self, rhs: &T) -> bool
pub fn may_contain<T: Hash>(&self, rhs: &T) -> bool
sourcepub fn insert<T: Hash>(&mut self, rhs: T)
pub fn insert<T: Hash>(&mut self, rhs: T)
Adds an element to the HyperLogLog counter.
Arguments
rhs
- The element to add.
Examples
use hyperloglog_rs::HyperLogLog;
const PRECISION: usize = 10;
let mut hll = HyperLogLog::<PRECISION, 6>::new();
hll.insert("Hello");
hll.insert("World");
assert!(hll.estimate_cardinality() >= 2.0);
Performance
The performance of this function depends on the size of the HyperLogLog counter (N
), the number
of distinct elements in the input, and the hash function used to hash elements. For a given value of N
,
the function has an average time complexity of O(1) and a worst-case time complexity of O(log N).
However, the actual time complexity may vary depending on the distribution of the hashed elements.
Errors
This function does not return any errors.
Trait Implementations§
source§impl<const PRECISION: usize, const BITS: usize> BitOr<HyperLogLog<PRECISION, BITS>> for HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize> BitOr<HyperLogLog<PRECISION, BITS>> for HyperLogLog<PRECISION, BITS>
source§fn bitor(self, rhs: Self) -> Self
fn bitor(self, rhs: Self) -> Self
Computes the union between two HyperLogLog counters of the same precision and number of bits per register.
Example
let mut hll1 = HyperLogLog::<14, 5>::new();
hll1.insert(&1);
hll1.insert(&2);
let mut hll2 = HyperLogLog::<14, 5>::new();
hll2.insert(&2);
hll2.insert(&3);
let hll_union = hll1 | hll2;
assert!(hll_union.estimate_cardinality() >= 3.0_f32 * 0.9 &&
hll_union.estimate_cardinality() <= 3.0_f32 * 1.1);
§type Output = HyperLogLog<PRECISION, BITS>
type Output = HyperLogLog<PRECISION, BITS>
|
operator.source§impl<const PRECISION: usize, const BITS: usize> BitOrAssign<HyperLogLog<PRECISION, BITS>> for HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize> BitOrAssign<HyperLogLog<PRECISION, BITS>> for HyperLogLog<PRECISION, BITS>
source§fn bitor_assign(&mut self, rhs: Self)
fn bitor_assign(&mut self, rhs: Self)
Computes union between HLL counters.
let mut hll = HyperLogLog::<8, 6>::new();
hll.insert(1u8);
let mut hll2 = HyperLogLog::<8, 6>::new();
hll2.insert(2u8);
hll.bitor_assign(hll2);
assert!(hll.estimate_cardinality() > 2.0 - 0.1, "The cardinality is {}, we were expecting 2.", hll.estimate_cardinality());
assert!(hll.estimate_cardinality() < 2.0 + 0.1, "The cardinality is {}, we were expecting 2.", hll.estimate_cardinality());
let mut hll = HyperLogLog::<8, 6>::new();
hll.insert(1u8);
let mut hll2 = HyperLogLog::<8, 6>::new();
hll2.insert(1u8);
hll.bitor_assign(hll2);
assert!(hll.estimate_cardinality() > 1.0 - 0.1, "The cardinality is {}, we were expecting 1.", hll.estimate_cardinality());
assert!(hll.estimate_cardinality() < 1.0 + 0.1, "The cardinality is {}, we were expecting 1.", hll.estimate_cardinality());
let mut hll3 = HyperLogLog::<16, 6>::new();
hll3.insert(3u8);
hll3.insert(4u8);
let mut hll4 = HyperLogLog::<16, 6>::new();
hll4.insert(5u8);
hll4.insert(6u8);
hll3.bitor_assign(hll4);
assert!(hll3.estimate_cardinality() > 4.0 - 0.1, "Expected a value equal to around 4, got {}", hll3.estimate_cardinality());
assert!(hll3.estimate_cardinality() < 4.0 + 0.1, "Expected a value equal to around 4, got {}", hll3.estimate_cardinality());
source§impl<const PRECISION: usize, const BITS: usize> Clone for HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize> Clone for HyperLogLog<PRECISION, BITS>
source§fn clone(&self) -> HyperLogLog<PRECISION, BITS>
fn clone(&self) -> HyperLogLog<PRECISION, BITS>
1.0.0 · source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source
. Read moresource§impl<const PRECISION: usize, const BITS: usize> Default for HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize> Default for HyperLogLog<PRECISION, BITS>
Implements the Default trait for HyperLogLog.
HyperLogLog is a probabilistic cardinality estimator that uses a fixed amount of memory to estimate the number of distinct elements in a set.
Examples
let hll: HyperLogLog<10, 6> = Default::default();
assert_eq!(hll.len(), 1024);
assert_eq!(hll.get_number_of_bits(), 6);
source§impl<const PRECISION: usize, const BITS: usize, T: Hash> From<T> for HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize, T: Hash> From<T> for HyperLogLog<PRECISION, BITS>
source§fn from(value: T) -> Self
fn from(value: T) -> Self
Create a new HyperLogLog counter from a value.
This method creates a new empty HyperLogLog counter and inserts the hash
of the given value into it. The value can be any type that implements
the Hash
trait.
Examples
let hll = HyperLogLog::<14, 5>::from("test");
assert!(hll.estimate_cardinality() >= 1.0_f32);
assert!(!hll.is_empty());
assert!(hll.may_contain(&"test"));
source§impl<const PRECISION: usize, const BITS: usize> PartialEq<HyperLogLog<PRECISION, BITS>> for HyperLogLog<PRECISION, BITS>
impl<const PRECISION: usize, const BITS: usize> PartialEq<HyperLogLog<PRECISION, BITS>> for HyperLogLog<PRECISION, BITS>
source§fn eq(&self, other: &HyperLogLog<PRECISION, BITS>) -> bool
fn eq(&self, other: &HyperLogLog<PRECISION, BITS>) -> bool
self
and other
values to be equal, and is used
by ==
.