nano-hyperloglog
A high-performance HyperLogLog implementation in Rust for cardinality estimation with pluggable storage backends.
What is HyperLogLog?
HyperLogLog is a probabilistic data structure that estimates the cardinality (number of unique elements) of a dataset. It's incredibly space-efficient: count billions of unique items using just kilobytes of memory, with typical accuracy within 2% of the true count.
Use cases:
- Unique visitor counting for web analytics
- Distinct user tracking across distributed systems
- Database query optimization (cardinality estimation)
- Network traffic analysis
- A/B testing metrics
Features
- 🚀 Fixed memory usage - Count billions of items with ~16KB (configurable)
- 🎯 High accuracy - Typically within 0.8-2% of true count
- 🔀 Mergeable - Combine counts from multiple sources effortlessly
- 💾 Pluggable storage - File-based or Elasticsearch backends
- 🌐 HTTP server - Optional Redis-compatible REST API (PFADD/PFCOUNT/PFMERGE)
- ⚡ Zero-copy operations - Efficient serialization/deserialization
- 🦀 Type-safe - Leverage Rust's type system for compile-time guarantees
- ✅ Well-tested - Comprehensive test suite with edge cases
Quick Start
Add to your Cargo.toml:
[]
= "0.1"
Basic Usage
use HyperLogLog;
Merging Counts
Perfect for distributed systems:
use HyperLogLog;
Persistent Storage
use ;
async
Precision and Memory Tradeoffs
| Precision | Memory | Standard Error | Use Case |
|---|---|---|---|
| 10 | 1 KB | ±1.625% | Quick estimates, tight memory |
| 12 | 4 KB | ±0.813% | Good balance |
| 14 | 16 KB | ±0.406% | Default - recommended |
| 16 | 64 KB | ±0.203% | High accuracy needed |
Feature Flags
Control what gets compiled:
[]
= { = "0.1", = ["elasticsearch-storage", "server"] }
Available features:
file-storage(default) - File-based persistenceelasticsearch-storage- Elasticsearch backendserver- HTTP server with Redis-compatible APIfull- Everything
HTTP Server
Run a Redis-compatible HTTP service:
API Endpoints
# Add elements (PFADD)
# Get count (PFCOUNT)
# {"count": 3}
# Merge multiple HLLs (PFMERGE)
# Check existence
# true
# List all keys
# ["daily_visitors", "all_visitors"]
Configuration
Via environment variables:
# Storage backend
STORAGE_BACKEND=file # or "elasticsearch"
FILE_STORAGE_PATH=./data # for file backend
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_INDEX=hyperloglog
# Server
BIND_ADDRESS=0.0.0.0:3000
Examples
Run examples to see it in action:
# Basic counting
# Merging from multiple sources
# File storage
# Compare precision values
# HTTP server
How It Works
HyperLogLog uses a clever trick based on probability theory:
- Hash each element to get a uniform bit pattern
- Count leading zeros in the hash (rare for few items, common for many)
- Partition into multiple "registers" to reduce variance
- Estimate cardinality from the average leading zero count
For technical details, see the algorithm explanation.
Performance
Benchmarks on M1 MacBook Pro:
| Operation | Time | Throughput |
|---|---|---|
| Add element | ~15 ns | 66M ops/sec |
| Count (precision 14) | ~8 μs | 125K ops/sec |
| Merge (16KB each) | ~12 μs | 83K ops/sec |
| Serialize to JSON | ~50 μs | 20K ops/sec |
Memory usage is fixed: 2^precision bytes (16KB for precision 14).
Why Rust?
- Zero-cost abstractions - Fast as C, safe as high-level languages
- Fearless concurrency - Share HyperLogLogs across threads safely
- No GC pauses - Predictable latency for real-time systems
- Small binaries - Server example compiles to ~8MB
Comparison with Redis
| Feature | nano-hyperloglog | Redis PFCOUNT |
|---|---|---|
| Precision configurable | ✅ 4-16 bits | ✅ 14 bits |
| Persistent storage | ✅ File/ES | ✅ RDB/AOF |
| HTTP API | ✅ Optional | ❌ |
| Type-safe API | ✅ | ❌ |
| Startup time | < 10ms | ~100ms |
| Memory footprint | ~8MB binary | ~10MB process |
When to use each:
- Redis: Already using Redis, need sub-millisecond latency, want battle-tested stability
- nano-hyperloglog: Want dedicated service, need custom storage, building in Rust, serverless deployments
Contributing
Contributions welcome! Please:
- Add tests for new features
- Run
cargo fmtandcargo clippy - Update documentation
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
References
- Original HyperLogLog paper by Flajolet et al. (2007)
- HyperLogLog++ improvements by Google (2013)
- Redis HyperLogLog implementation
Acknowledgments
Inspired by Redis's PFCOUNT and the excellent work of Philippe Flajolet on probabilistic counting algorithms.