aleph-filter 0.1.0

The Aleph Filter — an infinitely expandable probabilistic data structure with O(1) insert, query, and delete
Documentation

What is the Aleph Filter?

The Aleph Filter is a probabilistic data structure for approximate membership queries, introduced in the VLDB 2024 paper by Dayan, Bercea, and Pagh. It answers a deceptively simple question:

"Is this element in the set?"

Like Bloom and Cuckoo filters, it may produce false positives but never false negatives. Unlike them, it can expand indefinitely without rebuilding, sacrificing one fingerprint bit per expansion to double its capacity, all while maintaining O(1) operations.

Why does this matter?

Problem Traditional Filters Aleph Filter
Dataset grows beyond capacity Rebuild from scratch Expands automatically
Need to delete items ❌ (Bloom) or limited (Cuckoo) ✅ Full support
FPR degrades as load increases Yes Stable by design
Query cost grows with expansions O(log N) (InfiniFilter) O(1) always

Quick Start

use aleph_filter::AlephFilter;

// Create a filter for ~1000 items with 1% false positive rate
let mut filter = AlephFilter::new(1000, 0.01);

// Insert
filter.insert(b"hello");
filter.insert(b"world");

// Query
assert!(filter.contains(b"hello"));   // true  — definitely in the set
assert!(!filter.contains(b"nope"));   // false — definitely NOT in the set

// Delete
filter.delete(b"hello");
assert!(!filter.contains(b"hello"));  // gone

// The filter expands automatically as you insert more items.
// No need to pre-size or rebuild!
for i in 0..100_000 {
    filter.insert(format!("key_{}", i).as_bytes());
}

Performance

Benchmarked on Apple M2 (ARM64), Rust 1.93.0, --release mode. All filters targeting 1% FPR.

Throughput (ns/op, lower is better)

INSERT (N = 10,000)
  Aleph    ██████████████████████████████████████████ 71.9 ns    ← fastest dynamic
  Bloom    █████████████████████████████████████████████████████████████████████████████████████████████████████  193.7 ns
  Cuckoo   ████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████  303.7 ns

NEGATIVE LOOKUP (N = 10,000)
  Aleph    ██████████████████████████  44.8 ns    ← fastest dynamic
  Bloom    ████████████████████████████████████████████████████████████████████████████████████████  150.4 ns
  Cuckoo   ██████████████████████████████████████████████████████████████████████████████████████████████████████  170.6 ns

DELETE (N = 10,000)
  Aleph    ██████████████████████████████████████████████████████  101.6 ns    ← 3× faster
  Cuckoo   █████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████  287.6 ns
  Bloom    — (not supported)

False Positive Rate (target = 1.00%)

N Aleph Bloom Cuckoo Xor8
1K 0.39% 1.04% 2.98% ⚠️ 0.42%
10K 0.43% 0.99% 1.98% ⚠️ 0.39%
100K 0.62% 1.00% 2.47% ⚠️ 0.41%

Full benchmark results & methodology: docs/chapters/comparison-and-conclusion.tex

Run Benchmarks Yourself

# Quick report (~30s) — console output + JSON
cargo run --release --example benchmark_report

# Full Criterion statistical benchmarks (~15 min) — HTML reports
cargo bench --bench filter_comparison
open target/criterion/report/index.html

How It Works

The Aleph Filter is built on a quotient filter foundation with three key innovations:

┌─────────────────────────────────────────────────────────────────────┐
│                        Hash Function (xxHash3)                      │
│                              64-bit hash                            │
│                    ┌──────────┬────────────────┐                    │
│                    │ quotient │  fingerprint   │                    │
│                    │ (q bits) │   (r bits)     │                    │
│                    └────┬─────┴───────┬────────┘                    │
│                         │             │                             │
│                         ▼             ▼                             │
│                   ┌─────────┐  ┌──────────────┐                     │
│                   │  slot   │  │  stored in   │                     │
│                   │  index  │  │  slot data   │                     │
│                   └─────────┘  └──────────────┘                     │
└─────────────────────────────────────────────────────────────────────┘

1. Pivot-Bit Expansion

When the filter exceeds 80% load, it doubles its slot count by:

  • Incrementing quotient width by 1 bit (more address bits)
  • Sacrificing 1 fingerprint bit per entry (slightly higher FPR)
  • Using the sacrificed bit as a "pivot" to distribute entries to new slots

2. Void Entry Duplication

When all fingerprint bits are exhausted (after many expansions), entries become void markers. These are duplicated to both possible canonical slots during expansion, ensuring they're always findable in O(1).

3. Cluster-Aware Deletion

Deletion uses a full cluster-aware compaction algorithm that shifts entries backward to fill gaps, maintaining the quotient filter's structural invariants.

Architecture

src/
├── lib.rs            # Public API & module declarations
├── hash.rs           # xxHash3 → (quotient, fingerprint) splitting
├── metadata.rs       # Per-slot flags: occupied, continuation, shifted
├── slot.rs           # 64-bit packed slots: fingerprint + length + markers
└── aleph_filter.rs   # Core filter: insert, query, delete, expand
Module Responsibility
hash Single xxHash3 call → split into quotient (slot address) and remainder (fingerprint)
slot Pack fingerprint + length into 64 bits, with special void/tombstone markers
metadata 3-bit flags per slot tracking cluster/run structure for O(1) lookups
aleph_filter Quotient filter operations, run/cluster management, expansion logic

Feature Matrix

Feature Bloom Cuckoo Xor8 Aleph
Dynamic insert
Delete
Expand without rebuild
Stable FPR ✅*
O(1) lookup O(k)
Streaming/unbounded data

* Xor8 is immutable — FPR is fixed at construction time

Documentation

This project includes a comprehensive LaTeX technical report covering:

  • Background on probabilistic filters (Bloom, Quotient, Cuckoo)
  • Quotient filter expansion and fingerprint splitting
  • The Aleph Filter algorithm in detail
  • Our Rust implementation design decisions
  • Comparative benchmarks with analysis
# Build the PDF (requires LaTeX installation)
cd docs && latexmk -pdf aleph_filter_explained.tex

Pre-built PDF: docs/aleph_filter_explained.pdf

Testing

# Run all tests
cargo test

# Run with output (see FPR measurements, expansion traces)
cargo test -- --nocapture

# Run a specific test
cargo test test_no_false_negatives_10000

The test suite covers:

  • ✅ No false negatives (critical invariant)
  • ✅ FPR within target bounds
  • ✅ Deletion correctness
  • ✅ Expansion survival (all keys findable post-expand)
  • ✅ Extreme expansion (50K inserts from 4 slots)
  • ✅ Interleaved insert/delete/query
  • ✅ Edge cases (empty keys, long keys, binary keys, duplicates)

References

Resource Link
Paper Aleph Filter: To Infinity in Constant Time (VLDB 2024)
Authors Niv Dayan, Ioana Bercea, Rasmus Pagh
Java Reference github.com/nivdayan/AlephFilter
This Implementation github.com/NPX2218/aleph-filter

License

MIT © 2026 Neel Bansal