probabilistic-rs 0.6.3

Probabilistic data structures in Rust with Python bindings and HTTP API.

Features

  • Bloom Filter — fast membership testing with bulk ops and optional Fjall persistence
  • Expiring Bloom Filter — auto-expires elements via sliding time windows
  • HTTP API — REST server with Swagger UI, managing multiple named filters
  • Python Bindings — native wheels via PyO3/maturin
  • CLI + TUI — interactive terminal interface

Installation

Rust:

cargo add probabilistic-rs

Python:

pip install probabilistic-rs

Quick Start

Rust — Bloom Filter

use probabilistic_rs::bloom::{BloomFilter, BloomFilterConfigBuilder, BloomFilterOps};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
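    // Size the filter for 10,000 items at a 1% target false-positive rate.
    let config = BloomFilterConfigBuilder::default()
        .capacity(10000)
        .false_positive_rate(0.01)
        .build()?;

    // Construction is awaited; insert and contains are called synchronously here.
    let filter = BloomFilter::create(config).await?;
    filter.insert(b"item1")?;
    // An inserted item is always reported as present (no false negatives).
    assert!(filter.contains(b"item1")?);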
    Ok(())
}
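
For readers new to the data structure, the membership test above can be illustrated with a toy in-memory Bloom filter built only on the standard library. This is a minimal sketch of the general technique, not the crate's implementation: `ToyBloom`, its fields, and the double-hashing scheme based on `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical toy filter for illustration; not a type from probabilistic-rs.
struct ToyBloom {
    bits: Vec<u64>,  // bit array packed into 64-bit words
    num_bits: u64,   // total number of addressable bits
    num_hashes: u64, // how many bit positions each item sets
}

impl ToyBloom {
    fn new(num_bits: u64, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    /// Derive `num_hashes` bit positions from two base hashes
    /// (double hashing: h1 + i * h2, reduced modulo the bit count).
    fn positions(&self, item: &[u8]) -> Vec<u64> {
        let mut first = DefaultHasher::new();
        item.hash(&mut first);
        let h1 = first.finish();

        // Salt the second hasher so it differs from the first.
        let mut second = DefaultHasher::new();
        0xA5u8.hash(&mut second);
        item.hash(&mut second);
        let h2 = second.finish() | 1; // force odd so the step is never zero

        (0..self.num_hashes)
            .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits)
            .collect()
    }

    /// Set every bit position derived from the item.
    fn insert(&mut self, item: &[u8]) {
        for p in self.positions(item) {
            self.bits[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }

    /// True only if every derived bit is set; may be a false positive,
    /// but never a false negative for an inserted item.
    fn contains(&self, item: &[u8]) -> bool {
        self.positions(item)
            .into_iter()
            .all(|p| (self.bits[(p / 64) as usize] & (1u64 << (p % 64))) != 0)
    }
}
```

An empty `ToyBloom` reports every item as absent, and any item passed to `insert` is reported as present afterwards. Items that were never inserted can still occasionally be reported as present, which is the false-positive rate the configuration targets.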

Rust — Expiring Bloom Filter

use probabilistic_rs::ebloom::{
    ExpiringBloomFilter, ExpiringFilterConfigBuilder, ExpiringBloomFilterOps,
};
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Up to 3 levels, each covering a 60-second window.
    let config = ExpiringFilterConfigBuilder::default()
        .capacity(1000)
        .false_positive_rate(0.01)
        .level_duration(Duration::from_secs(60))
        .max_levels(3)
        .build()?;

    // Insert, then query while the item's level is still live.
    let mut filter = ExpiringBloomFilter::new(config)?;
    filter.insert(b"test_item")?;
    assert!(filter.query(b"test_item")?);
    Ok(())
}
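
The interaction between `level_duration` and `max_levels` can be pictured as a ring of levels that is rotated once per level duration. The sketch below is an assumption about how such a sliding window might work, not the crate's actual code: `LevelRing` is a hypothetical type, each level is an exact `HashSet` standing in for a Bloom filter, and rotation is triggered by an explicit call rather than a timer.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for a multi-level expiring filter.
/// Each level is an exact set here; a real filter would use one Bloom filter per level.
struct LevelRing {
    levels: VecDeque<HashSet<Vec<u8>>>, // front = newest level, back = oldest
    max_levels: usize,
}

impl LevelRing {
    fn new(max_levels: usize) -> Self {
        assert!(max_levels > 0);
        let mut levels = VecDeque::with_capacity(max_levels + 1);
        levels.push_front(HashSet::new());
        Self { levels, max_levels }
    }

    /// New items go into the newest level only.
    fn insert(&mut self, item: &[u8]) {
        self.levels
            .front_mut()
            .expect("ring always holds at least one level")
            .insert(item.to_vec());
    }

    /// An item is visible if any live level contains it.
    fn query(&self, item: &[u8]) -> bool {
        self.levels.iter().any(|level| level.contains(item))
    }

    /// Intended to run once per level duration: open a fresh level
    /// and drop the oldest one when the ring exceeds `max_levels`.
    fn rotate(&mut self) {
        self.levels.push_front(HashSet::new());
        if self.levels.len() > self.max_levels {
            self.levels.pop_back();
        }
    }
}
```

In this sketch with `max_levels = 3`, an item survives two calls to `rotate` and disappears on the third, so it stays visible for between two and three level durations depending on when in the window it was inserted.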

Python

from probabilistic_rs import BloomFilter, ExpiringBloomFilter

bf = BloomFilter(capacity=10000, false_positive_rate=0.01)
bf.insert(b"item1")
assert bf.contains(b"item1")

ebf = ExpiringBloomFilter(capacity=1000, false_positive_rate=0.01, ttl_seconds=60)
ebf.insert(b"temp_item")
assert ebf.query(b"temp_item")

HTTP API

# Start server (default: localhost:3000)
probabilistic-server

# Create a filter, insert, query
curl -X POST http://localhost:3000/api/v1/bloom/create \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","capacity":10000,"false_positive_rate":0.01}'

curl -X POST http://localhost:3000/api/v1/bloom/insert \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","item":"hello"}'

curl -X POST http://localhost:3000/api/v1/bloom/contains \
  -H "Content-Type: application/json" \
  -d '{"name":"my-filter","item":"hello"}'

Endpoints: create, delete, insert, contains, bulk_insert, bulk_contains, clear, stats, list — available for both /api/v1/bloom and /api/v1/ebloom. Swagger UI at /swagger-ui.

CLI

# Create filter
expblf create --db-path myfilter.fjall --capacity 10000 --fpr 0.01

# Operations
expblf load --db-path myfilter.fjall insert --element "key"
expblf load --db-path myfilter.fjall check --element "key"

# Interactive TUI
expblf tui --db-path myfilter.fjall

Benchmarks

Measured on Apple M-series via cargo bench (criterion, 100 samples). Times are total for N operations.

Bloom Filter (in-memory)

| Operation | 1K elements | 100K elements | 1M elements |
|-----------|-------------|---------------|-------------|
| Insert    | 60.2 µs     | 6.15 ms       | 64.1 ms     |
| Query     | 61.3 µs     | 6.17 ms       | 63.1 ms     |

Expiring Bloom Filter (in-memory, 3 levels — 5 levels nearly identical)

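| Operation   | 1K elements | 100K elements | 1M elements |
|-------------|-------------|---------------|-------------|
| Insert      | 63.8 µs     | 6.61 ms       | 68.0 ms     |
| Query       | 63.4 µs     | 6.52 ms       | 67.0 ms     |
| Bulk insert | 59.7 µs     | 6.16 ms       | 63.8 ms     |
| Bulk query  | 62.3 µs     | 6.41 ms       | 65.8 ms     |

Level rotate: 224 µs, 255 µs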

Both filters sustain roughly 15–17M ops/s (about 60–68 ns/op) across all dataset sizes. In these measurements the expiring filter adds about 3–8% overhead over the plain Bloom filter for single-item inserts and queries, due to multi-level bookkeeping. Bulk operations match or slightly outperform single-item operations. Level rotation (TTL expiry) takes ~250 µs regardless of filter size.
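
The per-operation figures follow from the table totals by simple division. The helpers below are a small sketch of that arithmetic; the function names are hypothetical and not part of the crate or its benchmark suite.

```rust
/// Average nanoseconds per operation, given the total time for `ops` operations.
fn ns_per_op(total_ns: f64, ops: f64) -> f64 {
    total_ns / ops
}

/// Throughput in operations per second implied by a per-operation latency.
fn ops_per_sec(ns_per_op: f64) -> f64 {
    1e9 / ns_per_op
}

/// Relative overhead of `other` over `base`, in percent.
fn overhead_pct(base: f64, other: f64) -> f64 {
    (other / base - 1.0) * 100.0
}
```

For example, 1M plain inserts in 64.1 ms gives `ns_per_op(64.1e6, 1e6)` = 64.1 ns, which corresponds to about 15.6M ops/s, and `overhead_pct(64.1, 68.0)` puts the expiring filter's 1M-element insert at roughly 6% slower than the plain filter.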

Configuration

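| Parameter           | Description              | Default   |
|---------------------|--------------------------|-----------|
| capacity            | Max elements             | 1,000,000 |
| false_positive_rate | Desired FPR              | 0.01      |
| level_duration      | TTL per level (expiring) | 60s       |
| max_levels          | Filter levels (expiring) | 3         |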
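
To get a feel for what `capacity` and `false_positive_rate` imply for memory use, the textbook Bloom filter sizing formulas can be applied: m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This README does not state how the crate sizes its filters, so the sketch below is an estimate under that assumption, and both function names are hypothetical.

```rust
/// Textbook optimal bit count: m = ceil(-n * ln(p) / (ln 2)^2).
/// Assumption: the crate may size its filters differently.
fn optimal_num_bits(capacity: u64, false_positive_rate: f64) -> u64 {
    assert!(capacity > 0);
    assert!(false_positive_rate > 0.0 && false_positive_rate < 1.0);
    let ln2 = std::f64::consts::LN_2;
    (-(capacity as f64) * false_positive_rate.ln() / (ln2 * ln2)).ceil() as u64
}

/// Textbook optimal hash count: k = round((m / n) * ln 2), never less than 1.
fn optimal_num_hashes(num_bits: u64, capacity: u64) -> u32 {
    assert!(capacity > 0);
    let k = (num_bits as f64 / capacity as f64) * std::f64::consts::LN_2;
    (k.round() as u32).max(1)
}
```

With the defaults above (1,000,000 elements at a 0.01 false-positive rate), these formulas give about 9.6 million bits, or roughly 1.2 MB, and 7 hash functions per level.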

License

MIT