Skip to main content

probabilistic_rs/
lib.rs

1//! Time-Decaying Bloom Filter implementation with Fjall or InMemory storage backends.
2//!
3//! This crate provides a Bloom filter implementation that automatically expires
4//! elements after a configurable time period using a sliding window approach.
5//!
6//! HowTo:
7//!    * Sub-Filters: The main Bloom filter is divided into N sub-filters:  BF_1, BF_2, …, BF_N .
8//!    * Time Windows: Each sub-filter corresponds to a fixed time window  T  (e.g., 1 minute).
9//!    * Rotation Mechanism: Sub-filters are rotated in a circular manner to represent sliding
10//!      time intervals.
11//!
12//! Insertion:
13//!     * When an element is added at time  t , it is inserted into the current sub-filter  BF_{current}.
14//!     * Hash the element using the standard Bloom filter hash functions and set the bits in  BF_{current} .
15//! Query:
16//!     * To check if an element is in the filter, perform the query against all active sub-filters.
17//!     * If all the required bits are set in any sub-filter, the element is considered present.
18//! Expiration:
19//!     * At each time interval  T , the oldest sub-filter  BF_{oldest}  is cleared.
20//!     * The cleared sub-filter becomes the new  BF_{current}  for incoming elements.
21//!     * This effectively removes elements that were only in  BF_{oldest} , thus expiring them.
22//!
23//! Obvious problems:
24//!     * False Positives: As elements may exist in multiple sub-filters,
25//!       the probability of false positives can increase.
26//!     * Synchronization: In concurrent environments, care must be taken to synchronize
27//!       access during sub-filter rotation.
28//!     * Since 32 bit hashes used, max capacity would be 2**32-1 (Not sure)
29
30pub mod bloom;
31pub mod common;
32pub mod ebloom;
33mod hash;
34
35#[cfg(feature = "python")]
36mod python;
37
38#[cfg(feature = "server")]
39pub mod server;
40
41pub use bloom::error::{BloomError, BloomResult};
42pub use ebloom::error::{EbloomError, EbloomResult};
43pub use hash::{
44    HashFunction, default_hash_function, optimal_bit_vector_size,
45    optimal_num_hashes,
46};