pub struct CuckooFilter<H = DefaultHasher>{ /* private fields */ }Expand description
A highly concurrent lock-free probabilistic data structure for set membership testing.
§What Makes It “Cuckoo”
Named after the cuckoo bird’s behavior of displacing other birds’ eggs, this filter uses cuckoo hashing where each item can be stored in one of two possible locations. When both locations are full, existing items are “evicted” (like cuckoo eggs) and relocated to their alternate position, creating eviction chains.
§Algorithm Overview
-
Fingerprints: Items are reduced to small fingerprints (4-32 bits) instead of storing full keys, providing excellent space efficiency.
-
Dual Hashing: Each item has two possible bucket locations computed from its hash. This provides better space efficiency and flexibility when inserting and removing items.
-
Eviction Chains: When both buckets are full, a random item is evicted from one bucket and moved to its alternate location, potentially triggering a chain of evictions.
-
Lock-Free Concurrency: All operations use atomic compare-exchange loops instead of traditional locks, with optimistic concurrency control for read operations. The only exception is when inserting with evictions, where a FullyExclusive lock is used to ensure consistency.
§Key Advantages Over Bloom Filters
- Deletions supported: Items can be removed without false negatives
- Better space efficiency: ~20-30% less memory for same false positive rate
- Bounded lookup time: Always at most 2 bucket checks, never more
- High concurrency: Lock-free design enables excellent parallel performance
§Concurrency Model
- Reads: Optimistic, can proceed concurrently with most operations
- Simple writes: Use atomic compare-exchange loops without blocking other operations
- WriterExclusive locks: Used for removing items, and for unique insertions
- Complex evictions: Use FullyExclusive locks to ensure consistency
§Time Complexity
- Lookup: O(1)
- Deletion: O(1)
- Insertion: Amortized O(1) due to eviction chains, but the number of evictions is bounded
Implementations§
Source§impl<H: Hasher + Default> CuckooFilter<H>
impl<H: Hasher + Default> CuckooFilter<H>
Sourcepub fn insert<T: ?Sized + Hash>(&self, item: &T) -> Result<(), Error>
pub fn insert<T: ?Sized + Hash>(&self, item: &T) -> Result<(), Error>
Insert an item into the filter
This operation first attempts a direct insertion without acquiring a lock. If that fails due to bucket collisions, it falls back to the eviction-based insertion algorithm which may require a write lock.
Concurrent operations are safely handled through atomic operations.
Returns Ok(()) if the item was inserted, or Error::NotEnoughSpace if the filter is full
Sourcepub fn insert_unique<T: ?Sized + Hash>(&self, item: &T) -> Result<bool, Error>
pub fn insert_unique<T: ?Sized + Hash>(&self, item: &T) -> Result<bool, Error>
Check if an item is in the filter and insert it if is not present (atomically)
This method combines lookup and insert into a single atomic operation, ensuring thread safety and consistency even with concurrent operations.
Returns Ok(true) if the item was inserted, Ok(false) if it was already present, or Error::NotEnoughSpace if the filter is full
Sourcepub fn count<T: ?Sized + Hash>(&self, item: &T) -> usize
pub fn count<T: ?Sized + Hash>(&self, item: &T) -> usize
Counts the number of occurrences of an item in the filter.
§Notes
- This is not a counting filter; it simply counts matching fingerprints in both candidate buckets.
- Useful for detecting duplicates or hash collisions, not for precise multiset membership.
- The count is limited by the filter’s structure: at most
bucket_size * 2per item. - This method may count false positives due to hash collisions.
Sourcepub fn remove<T: ?Sized + Hash>(&self, item: &T) -> bool
pub fn remove<T: ?Sized + Hash>(&self, item: &T) -> bool
Attempts to remove an item from the filter.
Returns true if the item was successfully removed, or false if it was not found.
Note:
- An item should only be removed if it was previously added. Removing a non-existent item may inadvertently remove a different item due to hash collisions inherent to cuckoo filters.
Sourcepub fn contains<T: ?Sized + Hash>(&self, item: &T) -> bool
pub fn contains<T: ?Sized + Hash>(&self, item: &T) -> bool
Check if an item is in the filter
Returns true if the item is possibly in the filter (may have false positives),
false if it is definitely not in the filter
Sourcepub fn to_bytes(&self) -> Vec<u8> ⓘ
pub fn to_bytes(&self) -> Vec<u8> ⓘ
Serialize the filter into a byte vector.
The serialized format is versioned and includes the filter configuration, element count, and all bucket contents. A FullyExclusive lock is acquired (when supported) to guarantee a consistent snapshot.
Sourcepub fn from_bytes(bytes: &[u8]) -> Result<Self, DeserializeError>
pub fn from_bytes(bytes: &[u8]) -> Result<Self, DeserializeError>
Deserialize a filter from the provided byte slice.
Sourcepub fn lock(&self, kind: LockKind) -> Option<Lock<'_>>
pub fn lock(&self, kind: LockKind) -> Option<Lock<'_>>
Acquires a lock on the filter, if necessary.
A lock is only required when evictions are enabled (i.e., max_evictions > 0).
If max_evictions is set to 0, no lock is acquired.
Returns Some(Lock) if a lock is needed, or None if no locking is required.
Source§impl CuckooFilter<DefaultHasher>
impl CuckooFilter<DefaultHasher>
Sourcepub fn builder() -> CuckooFilterBuilder<DefaultHasher>
pub fn builder() -> CuckooFilterBuilder<DefaultHasher>
Create a new CuckooFilterBuilder with default settings
Sourcepub fn new() -> CuckooFilter<DefaultHasher>
pub fn new() -> CuckooFilter<DefaultHasher>
Create a new CuckooFilter with default settings
Sourcepub fn with_capacity(capacity: usize) -> CuckooFilter<DefaultHasher>
pub fn with_capacity(capacity: usize) -> CuckooFilter<DefaultHasher>
Create a new CuckooFilter with the specified capacity