UltraLogLog
Rust implementation of the UltraLogLog algorithm. UltraLogLog is more space-efficient than the widely used HyperLogLog, but can be slower. Either the FGRA estimator or the MLE estimator can be used.
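To make the space claim concrete, here is a rough sketch of the arithmetic; it is not code from this crate. It assumes 2^p one-byte registers for UltraLogLog, 2^p six-bit registers for HyperLogLog, and the space savings at equal estimation error reported in the referenced paper (about 24% with the FGRA estimator and about 28% with the MLE estimator). All function names are hypothetical.

```rust
/// Bytes used by an UltraLogLog sketch with precision `p`,
/// assuming 2^p registers of one byte each.
fn ull_state_bytes(p: u32) -> usize {
    1usize << p
}

/// Bytes used by a HyperLogLog sketch with 2^p six-bit registers, rounded up.
fn hll6_state_bytes(p: u32) -> usize {
    ((1usize << p) * 6 + 7) / 8
}

/// Approximate UltraLogLog size that matches the estimation error of a
/// HyperLogLog sketch of `hll_bytes`, given a fractional space saving
/// (assumed here: 0.24 for FGRA, 0.28 for MLE, as reported in the paper).
fn ull_bytes_for_same_error(hll_bytes: usize, saving: f64) -> f64 {
    hll_bytes as f64 * (1.0 - saving)
}
```

At the same precision an UltraLogLog sketch is larger than a six-bit HyperLogLog sketch (4096 versus 3072 bytes at p = 12); the saving is measured at equal estimation error, not at equal register count.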
Usage
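use ultraloglog::UltraLogLog;

// The precision and the input values below are illustrative.
let mut ull = UltraLogLog::new(6).unwrap();
ull.add_value("apple")
    .add_value("banana")
    .add_value("cherry")
    .add_value("date");
let est = ull.get_distinct_count_estimate();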
The serde feature can be activated so that the sketch can be saved to disk and then loaded.
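use std::fs::File;
use std::io::{BufReader, BufWriter};
use ultraloglog::UltraLogLog;

let file_path = "test_ultraloglog.bin";

// Create UltraLogLog and add data (precision and values are illustrative)
let mut ull = UltraLogLog::new(5).expect("Failed to create UltraLogLog");
ull.add(123456789);
ull.add(987654321);
let original_estimate = ull.get_distinct_count_estimate();

// Save to file using writer
let file = File::create(file_path).expect("Failed to create file");
let writer = BufWriter::new(file);
ull.save(writer).expect("Failed to save UltraLogLog");

// Load from file using reader
let file = File::open(file_path).expect("Failed to open file");
let reader = BufReader::new(file);
let loaded_ull = UltraLogLog::load(reader).expect("Failed to load UltraLogLog");
let loaded_estimate = loaded_ull.get_distinct_count_estimate();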
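The same writer/reader pattern can be tried with the standard library alone. The sketch below is an assumption-laden stand-in: it round-trips a plain byte vector (a hypothetical substitute for a sketch's register array) through a buffered writer and reader, and it does not reproduce the format this crate writes with the serde feature. The function names are hypothetical.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Write raw register bytes to `path` through a buffered writer.
fn save_registers(path: &Path, registers: &[u8]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(registers)?;
    // Flush explicitly so that write errors surface here instead of on drop.
    writer.flush()
}

/// Read all bytes back from `path` through a buffered reader.
fn load_registers(path: &Path) -> io::Result<Vec<u8>> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut registers = Vec::new();
    reader.read_to_end(&mut registers)?;
    Ok(registers)
}
```

A round trip is expected to return the bytes unchanged, which is also why a loaded sketch should report the same estimate as the original.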
Python Bindings
This crate also provides Python bindings for the UltraLogLog algorithm using PyO3. See example.py for usage.
The example creates a new UltraLogLog sketch with a precision parameter, adds values, and gets the estimated count.
Installation
Using pip
This package is available on PyPI as ultraloglog. You can install it using:
pip install ultraloglog
From Source
uv is recommended to manage virtual environments.
- Install Rust and maturin:
pip install maturin
- Build and install:
maturin develop --release
64-bit hash function
As mentioned in the paper, a high-quality 64-bit hash function is key to the UltraLogLog algorithm. We tested several modern 64-bit hash libraries and found that xxhash-rust (the default) and wyhash-rs work well. Users can also replace the default xxhash-rust with other hash libraries such as polymurhash, komihash, ahash, or t1ha. See the testing section for details.
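To illustrate why the hash quality matters, here is a minimal sketch of how HyperLogLog-style sketches typically consume a 64-bit hash: the top p bits select a register and the leading zeros of the remaining bits feed the register update. This is a generic illustration, not this crate's internal code. It uses the standard library's DefaultHasher purely as a stand-in for xxhash-rust, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash any `Hash` value to 64 bits with the standard library's default hasher.
/// This is only a stand-in for illustration; it is not xxhash-rust.
fn hash64<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Split a 64-bit hash into a register index (top `p` bits) and the number of
/// leading zeros in the remaining `64 - p` bits, capped at `64 - p`.
fn split_hash(hash: u64, p: u32) -> (usize, u32) {
    assert!((1..=32).contains(&p), "precision out of range for this sketch");
    let index = (hash >> (64 - p)) as usize;
    // Shifting left by `p` drops the index bits; an all-zero remainder
    // reports 64 leading zeros, so cap the count at `64 - p`.
    let leading_zeros = (hash << p).leading_zeros().min(64 - p);
    (index, leading_zeros)
}
```

For example, `split_hash(hash64("apple"), 12)` yields a register index below 4096 and a leading-zero count of at most 52. If the hash bits are poorly mixed, both the register choice and the leading-zero counts become skewed, which biases the estimate.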
Reference
Ertl, O., 2024. UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting. Proceedings of the VLDB Endowment, 17(7), pp.1655-1668.