Struct HyperLogLogWithMultiplicities

Source
pub struct HyperLogLogWithMultiplicities<P: Precision + WordType<BITS>, const BITS: usize> { /* private fields */ }
Expand description

A HyperLogLog counter with multiplicities.

§Implementation details

This struct differs from the traditional HyperLogLog counter in that it stores the multiplicities of the registers. This allows us to speed up significantly the computation of the cardinality of the counter, as we do not need to compute the harmonic mean of the registers but we can instead use the multiplities instead, reducing by a large amount the sums we need to compute.

For instance, for a counter with 2^14 registers, we need to compute the harmonic mean of 2^14 registers, i.e. 16384 registers. With the multiplicities, we only need to compute the sum of the multiplicities, which is much smaller, and at most equal to 52 when you use 6 bits per register.

That being said, when memory is an extreme concern, you may want to use the traditional HyperLogLog as this struct contains the multiplicities vector, which in the example case we considered above would be adding u16 * 52 = 104 bytes to the size of the counter.

Additionally, note that while one may expect to obtain better accuracy by executing less sums, we do not observe any statistically significant difference in the accuracy of the counter when using the multiplicities instead of the registers in our tests.

Note that this struct DOES NOT provide any other faster operation other than the estimation of the cardinality of the counter. All other operations, such as the union of two counters, are fast as they are implemented using the traditional HyperLogLog counter.

Implementations§

Source§

impl<P: Precision + WordType<BITS>, const BITS: usize> HyperLogLogWithMultiplicities<P, BITS>

Source

pub fn from_words(words: &P::Words) -> Self

Create a new HyperLogLog counter from an array of words.

§Arguments
  • words - An array of u64 words to use for the HyperLogLog counter.
§Returns

A new HyperLogLog counter initialized with the given words.

§Examples
use hyperloglog_rs::prelude::*;

let words = [0_u32; 4];
let hll = HyperLogLogWithMultiplicities::<Precision4, 6>::from_words(&words);
assert_eq!(hll.len(), 16);
Source

pub fn from_registers(registers: &[u32]) -> Self

Create a new HyperLogLog counter from an array of registers.

§Arguments
  • registers - An array of u32 registers to use for the HyperLogLog counter.
§Returns

A new HyperLogLog counter initialized with the given registers.

§Examples
use hyperloglog_rs::prelude::*;

let registers = [0_u32; 1 << 4];
let hll = HyperLogLogWithMultiplicities::<Precision4, 6>::from_registers(&registers);
assert_eq!(hll.len(), 1 << 4);
Source

pub fn insert<T: Hash>(&mut self, rhs: T) -> bool

Adds an element to the HyperLogLog counter , and returns whether the counter has changed.

§Arguments
  • rhs - The element to add.
§Examples
use hyperloglog_rs::prelude::*;

let mut hll = HyperLogLogWithMultiplicities::<Precision10, 6>::default();

hll.insert("Hello");
hll.insert("World");

assert!(hll.estimate_cardinality() >= 2.0);
§Performance

The performance of this function depends on the size of the HyperLogLog counter (N), the number of distinct elements in the input, and the hash function used to hash elements. For a given value of N, the function has an average time complexity of O(1) and a worst-case time complexity of O(log N). However, the actual time complexity may vary depending on the distribution of the hashed elements.

§Errors

This function does not return any errors.

Trait Implementations§

Source§

impl<P: Precision + WordType<BITS>, const BITS: usize> Default for HyperLogLogWithMultiplicities<P, BITS>

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl<P: Precision + WordType<BITS>, const BITS: usize> From<HyperLogLog<P, BITS>> for HyperLogLogWithMultiplicities<P, BITS>

Source§

fn from(hll: HyperLogLog<P, BITS>) -> Self

Converts to this type from the input type.
Source§

impl<P: Precision + WordType<BITS>, const BITS: usize> From<HyperLogLogWithMultiplicities<P, BITS>> for HyperLogLog<P, BITS>

Source§

fn from(hll: HyperLogLogWithMultiplicities<P, BITS>) -> Self

Converts to this type from the input type.
Source§

impl<P: Precision + WordType<BITS>, const BITS: usize, A: Hash> FromIterator<A> for HyperLogLogWithMultiplicities<P, BITS>

Source§

fn from_iter<T: IntoIterator<Item = A>>(iter: T) -> Self

Creates a new HyperLogLogWithMultiplicities counter and adds all elements from an iterator to it.

§Examples
use hyperloglog_rs::prelude::*;

let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
let hll: HyperLogLogWithMultiplicities<Precision12, 5> = data.iter().collect();
assert!(
    hll.estimate_cardinality() > 0.9 * data.len() as f32,
    concat!(
        "The estimate is too low, expected ",
        "at least {}, got {}",
    ),
    0.9 * data.len() as f32,
    hll.estimate_cardinality()
);
assert!(
    hll.estimate_cardinality() < 1.1 * data.len() as f32,
    concat!(
    "The estimate is too high, expected ",
    "at most {}, got {}",
   ),
    1.1 * data.len() as f32,
    hll.estimate_cardinality()
);
Source§

impl<P: Precision + WordType<BITS>, const BITS: usize> HyperLogLogTrait<P, BITS> for HyperLogLogWithMultiplicities<P, BITS>

Source§

fn estimate_cardinality(&self) -> f32

Returns the number of registers in the counter.

§Implementation details

This function is overriding the estimate_cardinality function of the HyperLogLogTrait trait as we can compute the cardinality of the counter using the multiplicities instead of the registers. This is much faster as we do not need to compute the harmonic mean of the registers.

Source§

fn get_words(&self) -> &P::Words

Returns a reference to the words vector.

Source§

fn get_number_of_zero_registers(&self) -> usize

Returns the number of registers with zero values. This value is used for computing a small correction when estimating the cardinality of a small set.

§Examples

// Create a new HyperLogLog counter with precision 14 and 5 bits per register.
let mut hll = HyperLogLogWithMultiplicities::<Precision14, 5>::default();

// Add some elements to the counter.
hll.insert(&1);
hll.insert(&2);
hll.insert(&3);

// Get the number of zero registers.
let number_of_zero_registers = hll.get_number_of_zero_registers();

assert_eq!(number_of_zero_registers, 16381);
Source§

const INTERMEDIATE_RANGE_CORRECTION_THRESHOLD: f32 = _

The threshold value used in the small range correction of the HyperLogLog algorithm.
Source§

const LINEAR_COUNT_THRESHOLD: f32 = _

Source§

const LOWER_REGISTER_MASK: u32 = _

The mask used to obtain the lower register bits in the HyperLogLog algorithm.
Source§

const LOWER_PRECISION_MASK: usize = _

The mask used to obtain the lower precision bits in the HyperLogLog algorithm.
Source§

const NOT_LOWER_PRECISION_MASK: u64 = _

Source§

const NUMBER_OF_PADDING_BITS: usize = _

Source§

const PADDING_BITS_MASK: u32 = _

The mask representing the bits that are never used in the u32 word in the cases where the number of bits is not a divisor of 32, such as 5 or 6. We set the LEADING bits as the padding bits, the unused one, so the leftmost bits.
Source§

const NUMBER_OF_PADDING_REGISTERS: usize = _

Source§

const LAST_WORD_PADDING_BITS_MASK: u32 = _

The mask representing the bits that are never used in the last u32 word in the cases where the number of registers is not a multiple of the number of registers in a word.
Source§

const UPPER_PRECISION_MASK: usize = _

The mask used to obtain the upper precision bits in the HyperLogLog algorithm.
Source§

const NUMBER_OF_REGISTERS_IN_WORD: usize = _

The number of registers that can fit in a single 32-bit word in the HyperLogLog algorithm.
Source§

fn adjust_estimate(raw_estimate: f32) -> f32

Source§

fn adjust_estimate_with_zeros(raw_estimate: f32, number_of_zeros: usize) -> f32

Source§

fn use_small_range_correction(&self) -> bool

Returns whether the cardinality of this HLL will be computed using the small-range correction. Read more
Source§

fn estimate_union_cardinality(&self, other: &Self) -> f32

Returns an estimate of the cardinality of the union of two HyperLogLog counters. Read more
Source§

fn estimate_union_and_sets_cardinality<F: Primitive<f32> + MaxMin>( &self, other: &Self, ) -> EstimatedUnionCardinalities<F>

Returns an estimate of the cardinality of the two HLL counters union.
Source§

fn estimate_intersection_cardinality<F: Primitive<f32>>( &self, other: &Self, ) -> F

Returns an estimate of the cardinality of the intersection of two HyperLogLog counters. Read more
Source§

fn estimate_jaccard_index(&self, other: &Self) -> f32

Returns an estimate of the Jaccard index between two HyperLogLog counters. Read more
Source§

fn estimate_difference_cardinality<F: Primitive<f32> + One>( &self, other: &Self, ) -> F

Returns an estimate of the cardinality of the current HyperLogLog counter minus the provided one. Read more
Source§

fn len(&self) -> usize

Returns the number of registers in the HLL counter. Read more
Source§

fn is_empty(&self) -> bool

Returns whether no element was yet added to the HLL counter. Read more
Source§

fn get_number_of_padding_registers() -> usize

Returns the number of extra registers that are not actually used. Read more
Source§

fn get_number_of_non_zero_registers(&self) -> usize

Source§

fn get_registers(&self) -> P::Registers

Returns an array of registers of the HyperLogLog counter. Read more
Source§

fn may_contain<T: Hash>(&self, rhs: &T) -> bool

Returns true if the HyperLogLog counter may contain the given element. Read more
Source§

fn may_contain_all(&self, rhs: &Self) -> bool

Returns whether the provided HyperLogLog counter may be fully contained in the current HyperLogLog counter. Read more
Source§

fn get_hash_and_index<T: Hash>(&self, value: &T) -> (u64, usize)

Returns the hash value and the corresponding register’s index for a given value. Read more
Source§

fn estimate_cardinality_from_multiplicities( multiplicities: &P::RegisterMultiplicities, ) -> f32

Returns the number of registers in the counter. Read more

Auto Trait Implementations§

§

impl<P, const BITS: usize> Freeze for HyperLogLogWithMultiplicities<P, BITS>
where <P as WordType<BITS>>::Words: Freeze, <P as WordType<BITS>>::RegisterMultiplicities: Freeze,

§

impl<P, const BITS: usize> RefUnwindSafe for HyperLogLogWithMultiplicities<P, BITS>

§

impl<P, const BITS: usize> Send for HyperLogLogWithMultiplicities<P, BITS>
where <P as WordType<BITS>>::RegisterMultiplicities: Send,

§

impl<P, const BITS: usize> Sync for HyperLogLogWithMultiplicities<P, BITS>
where <P as WordType<BITS>>::RegisterMultiplicities: Sync,

§

impl<P, const BITS: usize> Unpin for HyperLogLogWithMultiplicities<P, BITS>
where <P as WordType<BITS>>::Words: Unpin, <P as WordType<BITS>>::RegisterMultiplicities: Unpin,

§

impl<P, const BITS: usize> UnwindSafe for HyperLogLogWithMultiplicities<P, BITS>
where <P as WordType<BITS>>::Words: UnwindSafe, <P as WordType<BITS>>::RegisterMultiplicities: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.