Crate hyperloglog_rs

§HyperLogLog

This crate provides an implementation of the HyperLogLog algorithm, which is a probabilistic algorithm used to estimate the number of distinct elements in a set. The algorithm uses a fixed amount of memory and is able to estimate the number of distinct elements with a small relative error.
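To make the idea concrete, here is a minimal sketch of a textbook dense HyperLogLog. It is not this crate's implementation: `ToyHll`, its one-byte registers, and the use of std's `DefaultHasher` are all assumptions chosen for illustration, and the bias constant used is the standard approximation that holds for 128 or more registers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical toy sketch with 2^P registers, one byte per register.
/// Intended for P between 7 and 16.
struct ToyHll<const P: u32> {
    registers: Vec<u8>,
}

impl<const P: u32> ToyHll<P> {
    fn new() -> Self {
        Self { registers: vec![0; 1 << P] }
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let hash = hasher.finish();
        // The top P bits of the hash select a register.
        let index = (hash >> (64 - P)) as usize;
        // The remaining bits are moved to the top; a sentinel bit bounds the count.
        let rest = (hash << P) | (1u64 << (P - 1));
        // Rank = position of the first set bit, counted from 1.
        let rank = rest.leading_zeros() as u8 + 1;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        // Bias-correction constant, valid for m >= 128.
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self
            .registers
            .iter()
            .map(|&r| 2f64.powi(-(r as i32)))
            .sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Memory stays fixed at 2^P registers no matter how many elements are inserted, and re-inserting an element never changes the estimate, because a register only ever grows to the maximum rank it has seen.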

The HyperLogLog struct provided by this crate is parametrized by two constants: PRECISION and BITS. PRECISION determines the number of bits used to index a register, so a precision of p yields 2^p registers. BITS determines the number of bits each register uses to store the value derived from an element's hash. The optimal values of these constants depend on the expected number of distinct elements and the available memory.
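The trade-off between these two constants can be estimated with some quick arithmetic. The helpers below are hypothetical and not part of this crate. They assume the textbook standard error of 1.04 / sqrt(m) for m registers, and they ignore any padding a real implementation may add when packing registers into machine words.

```rust
/// Number of registers for a given precision: m = 2^precision.
fn num_registers(precision: u32) -> usize {
    1usize << precision
}

/// Approximate dense storage in bytes for 2^precision registers of
/// `bits` bits each, rounded up to whole bytes (packing padding ignored).
fn dense_bytes(precision: u32, bits: u32) -> usize {
    (num_registers(precision) * bits as usize + 7) / 8
}

/// Textbook HyperLogLog standard error: 1.04 / sqrt(m).
fn standard_error(precision: u32) -> f64 {
    1.04 / (num_registers(precision) as f64).sqrt()
}
```

Under these assumptions, a precision of 14 with 5-bit registers gives 16384 registers, about 10 KiB of register storage, and a standard error of roughly 0.8%. Each extra bit of precision doubles the memory and reduces the error by a factor of about 1.41.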

This implementation already provides almost all the benefits of HyperLogLog++. We do not intend to integrate the sparse registers feature, as this library focuses on use cases where the number of registers is relatively small and the registers are stored densely. Apart from that, all other improvements described in the HLL++ paper are already implemented.

§No STD

This crate is designed to be as lightweight as possible and does not depend on the Rust standard library (std). As a result, it can be used in a bare-metal or embedded context, where std may not be available.

All functionality of this crate can be used without std, and the prelude module provides easy access to all the relevant types and traits. If you encounter any issues using this crate in a no_std environment, please don’t hesitate to open an issue or submit a pull request on GitHub.

§Usage

Add this to your Cargo.toml:

[dependencies]
hyperloglog-rs = "0.1"

and this to your crate root:

use hyperloglog_rs::prelude::*;

§Examples

use hyperloglog_rs::prelude::*;

let mut hll = HyperLogLog::<Precision14, 5>::default();
hll.insert(&1);
hll.insert(&2);

let mut hll2 = HyperLogLog::<Precision14, 5>::default();
hll2.insert(&2);
hll2.insert(&3);

let union = hll | hll2;

let estimated_cardinality = union.estimate_cardinality();
assert!(estimated_cardinality >= 3.0_f32 * 0.9 &&
        estimated_cardinality <= 3.0_f32 * 1.1);
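For intuition about what a union of two sketches means, the textbook HyperLogLog merge takes the register-wise maximum of the two register arrays. The sketch below uses a hypothetical `union_registers` helper over plain byte slices; it is an assumption that the crate's `|` operator performs the equivalent operation on its own packed register representation.

```rust
/// Merge two dense register arrays of equal length by taking the
/// register-wise maximum (the textbook HyperLogLog union).
fn union_registers(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len(), "sketches must share the same precision");
    a.iter().zip(b).map(|(&x, &y)| x.max(y)).collect()
}
```

Because the maximum is commutative and idempotent, merging in either order, or merging a sketch with itself, gives the same registers. This is why elements present in both sketches are not double-counted in the union estimate.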

§Fuzzing

Fuzzing finds bugs and security vulnerabilities by feeding randomized input to the code, and it often uncovers obscure issues that other testing methods miss. We take fuzzing seriously and use the cargo fuzz tool, which automates generating and running randomized test inputs, to keep this library robust and secure. We keep our fuzz targets up to date and run them against the latest version of the library so that any bugs or vulnerabilities are identified and addressed quickly.

Learn more about how we fuzz here

§References

Flajolet, Philippe, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. “Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm.” In Proceedings of the 2007 conference on analysis of algorithms, pp. 127-146. 2007.

§Re-exports

pub use crate::estimated_union_cardinalities::EstimatedUnionCardinalities;
pub use crate::hyperloglog::HyperLogLog;

§Modules

bitand
bitor
estimated_union_cardinalities
hyper_spheres_sketch: Exact sketching algorithms.
hyperloglog
hyperloglog_array
iter: This module defines a trait and an implementation for estimating the cardinality of an iterator using a HyperLogLog data structure.
log
prelude
serde
utils: Utils