This crate provides an implementation of the HyperLogLog algorithm, a probabilistic algorithm for estimating the number of distinct elements (the cardinality) of a set. The algorithm uses a fixed amount of memory and estimates the number of distinct elements with a small relative error.
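To make the idea concrete, here is a minimal, illustrative sketch of a dense HyperLogLog in plain Rust. It is not this crate's implementation: the `ToyHll` type, the one-byte registers, and the use of `std`'s `DefaultHasher` are assumptions chosen for brevity (so, unlike the crate, this sketch needs `std`). It only shows the usual steps: hash the element, use the top bits to pick a register, keep the largest "rank" seen, and combine the registers with a harmonic mean.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy dense HyperLogLog with 2^P one-byte registers.
/// Hypothetical type for illustration only; assumes 7 <= P <= 16.
struct ToyHll<const P: u32> {
    registers: Vec<u8>,
}

impl<const P: u32> ToyHll<P> {
    fn new() -> Self {
        Self { registers: vec![0u8; 1usize << P] }
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let hash = hasher.finish();
        // The top P bits of the hash select the register.
        let index = (hash >> (64 - P)) as usize;
        // The rank is the position of the first 1-bit in the remaining bits,
        // capped so that an all-zero remainder still yields a valid rank.
        let rest = hash << P;
        let rank = (rest.leading_zeros().min(64 - P) + 1) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        // Bias-correction constant from the original paper, valid for m >= 128.
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting
        // while some registers are still empty.
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Inserting the same element twice leaves the registers unchanged, which is why the estimate depends only on the distinct elements seen and memory stays fixed at `2^P` registers.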
The HyperLogLog struct provided by this crate is parametrized by two constants: the precision, which determines the number of bits used to index a register, and BITS, which determines the number of bits used to represent the hashed value of an element. The optimal values of these constants depend on the expected number of distinct elements and the available memory.
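As a rough guide to that trade-off, the helpers below compute the register count, an approximate dense memory footprint, and the standard relative error of roughly 1.04 / sqrt(m) reported by Flajolet et al. for m registers. The function names are hypothetical, and the memory figure assumes registers are packed tightly with no padding, which may not match how this crate lays them out.

```rust
/// Number of registers for a given precision: m = 2^precision.
fn register_count(precision: u32) -> usize {
    1usize << precision
}

/// Approximate dense storage in bytes, assuming `bits` bits per register
/// packed back to back with no padding (an assumption, not the crate's layout).
fn dense_bytes(precision: u32, bits: u32) -> usize {
    (register_count(precision) * bits as usize + 7) / 8
}

/// Standard relative error of HyperLogLog, about 1.04 / sqrt(m).
fn relative_error(precision: u32) -> f64 {
    1.04 / (register_count(precision) as f64).sqrt()
}
```

For example, a precision of 14 with 5 bits per register gives 16384 registers, about 10 KiB under the packing assumption above, and a standard error of roughly 0.8%. Each extra bit of precision doubles the memory and reduces the error by a factor of about 1.41.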
This implementation already provides almost all the benefits available from HyperLogLog++. We do not intend to integrate the sparse registers feature, as the use cases for this library focus on cases where the number of registers is relatively small and the registers are densely populated. Apart from that, all other observations provided in the HLL++ paper are already implemented.
This crate is designed to be as lightweight as possible and does not depend on the Rust standard library (std). As a result, it can be used in a bare metal or embedded context, where std may not be available.
All functionality of this crate can be used without std, and the prelude module provides easy access to all the relevant types and traits. If you encounter any issues using this crate in a no_std environment, please don’t hesitate to open an issue or submit a pull request on GitHub.
Add this to your Cargo.toml:

```toml
[dependencies]
hyperloglog-rs = "0.1"
```
and this to your crate root:

```rust
use hyperloglog_rs::prelude::*;

let mut hll = HyperLogLog::<Precision14, 5>::default();
hll.insert(&1);
hll.insert(&2);

let mut hll2 = HyperLogLog::<Precision14, 5>::default();
hll2.insert(&2);
hll2.insert(&3);

let union = hll | hll2;

let estimated_cardinality = union.estimate_cardinality();
assert!(estimated_cardinality >= 3.0_f32 * 0.9 && estimated_cardinality <= 3.0_f32 * 1.1);
```
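The union used above relies on a standard property of HyperLogLog: two sketches with the same precision can be merged by taking the register-wise maximum, and the result is the sketch of the union of the two inputs. The snippet below is a minimal sketch of that idea on plain byte slices. `merge_registers` is a hypothetical helper for illustration, and this crate's `|` operator is only assumed to follow the same principle.

```rust
/// Merge two register arrays by taking the element-wise maximum.
/// Both sketches must have the same number of registers.
fn merge_registers(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len(), "sketches must share the same precision");
    a.iter().zip(b).map(|(&x, &y)| x.max(y)).collect()
}
```

Because the maximum is commutative and idempotent, merging is order-independent and merging a sketch with itself changes nothing, mirroring set union.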
Fuzzing is a technique for finding security vulnerabilities and bugs in software by feeding randomized input to the code. It can uncover obscure issues that would be difficult to detect through other testing methods. We take fuzzing seriously in this library and use the cargo fuzz tool, which automates generating and running randomized test inputs, to help keep our code robust and secure. We keep our fuzz targets up to date and run them against the latest versions of the library so that any vulnerabilities or bugs are quickly identified and addressed.
- Flajolet, Philippe, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm.” In Proceedings of the 2007 conference on analysis of algorithms, pp. 127-146. 2007.
pub use crate::estimated_union_cardinalities::EstimatedUnionCardinalities;
pub use crate::hyperloglog::HyperLogLog;
- Exact sketching algorithms.
- This module defines a trait and an implementation for estimating the cardinality of an iterator using a HyperLogLog data structure.