cardinality_estimator_safe/
lib.rs

1//! `cardinality-estimator` is a Rust crate designed to estimate the number of distinct elements in a stream or dataset in an efficient manner.
2//!
3//! This library uses HyperLogLog++ with an optimized low memory footprint and high accuracy approach, suitable for large-scale data analysis tasks.
4//!
5//! Cardinality estimator allows to estimate number of distinct elements in the stream or dataset and is defined with const `P` and `W` parameters:
6//! - `P`: precision parameter in [4..18] range, which defines
7//!   number of bits to use for HyperLogLog register indices.
8//! - `W`: width parameter in [4..6] range, which defines
9//!   number of bits to use for HyperLogLog register width.
10//!
11//! # Data-structure design rationale
12//!
13//! ## Low memory footprint
14//!
15//! For parameters P = 12, W = 6:
16//! - Cardinality in [0..2] range - 8 bytes (small representation)
17//! - Cardinality in [3..4] range - 24 bytes (array representation)
18//! - Cardinality in [5..8] range - 40 bytes (array representation)
19//! - Cardinality in [9..16] range - 72 bytes (array representation)
20//! - ...
21//! - Cardinality in [65..128] range - 520 bytes (array representation)
22//! - Cardinality in [129..] range - 3092 bytes (hyperloglog representation)
23//!
24//! ## Low latency
25//! - Auto-vectorization for slice operations via compiler hints
26//!   to use SIMD instructions when using `chunks_exact`.
27//! - Number of zero registers and registers' harmonic sum are
28//!   stored and updated dynamically as more data being inserted,
29//!   allowing to have truly constant `estimate` operations.
30//! - Efficient polynomial computation using Horner's method.
31//!
32//! ## High accuracy
33//! - For small cardinality range (<= 128 for P = 12, W = 6)
34//!   cardinality counted very accurately (within hash collisions chance)
35//! - For large cardinality range HyperLogLog++ is used with LogLog-Beta bias correction.
36//!   - Expected error (1.04 / sqrt(2^P)):
37//!     - P = 10, W = 5: 0.0325
38//!     - P = 12, W = 6: 0.0162
39//!     - P = 14, W = 6: 0.0081
40//!     - P = 18, W = 6: 0.0020
41//!
42//! # Data storage format
43//! Cardinality estimator stores data in one of the three representations:
44//! - `Small` representation - see `small` module for more details.
45//! - `Array` representation - see `array` module for more details.
46//! - `HyperLogLog` representation - see `hyperloglog` module for more details
47//!
48//! # Data Storage Format
49//! The cardinality estimator stores data in one of three formats: `Small`, `Array`, and `HyperLogLog`.
50//! See corresponding modules (`small`, `array`, `hyperloglog`) for more details.
51mod array;
52mod element;
53mod hyperloglog;
54#[cfg(feature = "with_serde")]
55mod serde;
56pub mod sketch;
57mod small;
58
59pub use element::Element;
60pub use sketch::Sketch;