Skip to main content

CuckooFilter

Struct CuckooFilter 

Source
pub struct CuckooFilter<H = DefaultHasher>
where H: Hasher + Default,
{ /* private fields */ }
Expand description

A highly concurrent lock-free probabilistic data structure for set membership testing.

§What Makes It “Cuckoo”

Named after the cuckoo bird’s behavior of displacing other birds’ eggs, this filter uses cuckoo hashing where each item can be stored in one of two possible locations. When both locations are full, existing items are “evicted” (like cuckoo eggs) and relocated to their alternate position, creating eviction chains.

§Algorithm Overview

  1. Fingerprints: Items are reduced to small fingerprints (4-32 bits) instead of storing full keys, providing excellent space efficiency.

  2. Dual Hashing: Each item has two possible bucket locations computed from its hash. This provides better space efficiency and flexibility when inserting and removing items.

  3. Eviction Chains: When both buckets are full, a random item is evicted from one bucket and moved to its alternate location, potentially triggering a chain of evictions.

  4. Lock-Free Concurrency: All operations use atomic compare-exchange loops instead of traditional locks, with optimistic concurrency control for read operations. The only exception is when inserting with evictions, where a FullyExclusive lock is used to ensure consistency.

§Key Advantages Over Bloom Filters

  • Deletions supported: Items can be removed without false negatives
  • Better space efficiency: ~20-30% less memory for same false positive rate
  • Bounded lookup time: Always at most 2 bucket checks, never more
  • High concurrency: Lock-free design enables excellent parallel performance

§Concurrency Model

  • Reads: Optimistic, can proceed concurrently with most operations
  • Simple writes: Use atomic compare-exchange loops without blocking other operations
  • WriterExclusive locks: Used for removing items, and for unique insertions
  • Complex evictions: Use FullyExclusive locks to ensure consistency

§Time Complexity

  • Lookup: O(1)
  • Deletion: O(1)
  • Insertion: Amortized O(1) due to eviction chains, but the number of evictions is bounded

Implementations§

Source§

impl<H: Hasher + Default> CuckooFilter<H>

Source

pub fn insert<T: ?Sized + Hash>(&self, item: &T) -> Result<(), Error>

Insert an item into the filter

This operation first attempts a direct insertion without acquiring a lock. If that fails due to bucket collisions, it falls back to the eviction-based insertion algorithm which may require a write lock.

Concurrent operations are safely handled through atomic operations.

Returns Ok(()) if the item was inserted, or Error::NotEnoughSpace if the filter is full

Source

pub fn insert_unique<T: ?Sized + Hash>(&self, item: &T) -> Result<bool, Error>

Check if an item is in the filter and insert it if is not present (atomically)

This method combines lookup and insert into a single atomic operation, ensuring thread safety and consistency even with concurrent operations.

Returns Ok(true) if the item was inserted, Ok(false) if it was already present, or Error::NotEnoughSpace if the filter is full

Source

pub fn count<T: ?Sized + Hash>(&self, item: &T) -> usize

Counts the number of occurrences of an item in the filter.

§Notes
  • This is not a counting filter; it simply counts matching fingerprints in both candidate buckets.
  • Useful for detecting duplicates or hash collisions, not for precise multiset membership.
  • The count is limited by the filter’s structure: at most bucket_size * 2 per item.
  • This method may count false positives due to hash collisions.
Source

pub fn remove<T: ?Sized + Hash>(&self, item: &T) -> bool

Attempts to remove an item from the filter.

Returns true if the item was successfully removed, or false if it was not found.

Note:

  • An item should only be removed if it was previously added. Removing a non-existent item may inadvertently remove a different item due to hash collisions inherent to cuckoo filters.
Source

pub fn contains<T: ?Sized + Hash>(&self, item: &T) -> bool

Check if an item is in the filter

Returns true if the item is possibly in the filter (may have false positives), false if it is definitely not in the filter

Source

pub fn len(&self) -> usize

Get the number of elements in the filter

Source

pub fn is_empty(&self) -> bool

Check if the filter is empty

Source

pub fn capacity(&self) -> usize

Get the capacity of the filter

Source

pub fn to_bytes(&self) -> Vec<u8>

Serialize the filter into a byte vector.

The serialized format is versioned and includes the filter configuration, element count, and all bucket contents. A FullyExclusive lock is acquired (when supported) to guarantee a consistent snapshot.

Source

pub fn from_bytes(bytes: &[u8]) -> Result<Self, DeserializeError>

Deserialize a filter from the provided byte slice.

Source

pub fn clear(&self)

Clear the filter, removing all elements

Source

pub fn lock(&self, kind: LockKind) -> Option<Lock<'_>>

Acquires a lock on the filter, if necessary.

A lock is only required when evictions are enabled (i.e., max_evictions > 0). If max_evictions is set to 0, no lock is acquired.

Returns Some(Lock) if a lock is needed, or None if no locking is required.

Source§

impl CuckooFilter<DefaultHasher>

Source

pub fn builder() -> CuckooFilterBuilder<DefaultHasher>

Create a new CuckooFilterBuilder with default settings

Source

pub fn new() -> CuckooFilter<DefaultHasher>

Create a new CuckooFilter with default settings

Source

pub fn with_capacity(capacity: usize) -> CuckooFilter<DefaultHasher>

Create a new CuckooFilter with the specified capacity

Trait Implementations§

Source§

impl<H> Debug for CuckooFilter<H>
where H: Hasher + Default + Debug,

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for CuckooFilter<DefaultHasher>

Source§

fn default() -> Self

Create a new CuckooFilter with default settings

Source§

impl<'de, H> Deserialize<'de> for CuckooFilter<H>
where H: Hasher + Default,

Source§

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl<H: Hasher + Default> Serialize for CuckooFilter<H>

Source§

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer,

Serialize this value into the given Serde serializer. Read more

Auto Trait Implementations§

§

impl<H = DefaultHasher> !Freeze for CuckooFilter<H>

§

impl<H> RefUnwindSafe for CuckooFilter<H>
where H: RefUnwindSafe,

§

impl<H> Send for CuckooFilter<H>
where H: Send,

§

impl<H> Sync for CuckooFilter<H>
where H: Sync,

§

impl<H> Unpin for CuckooFilter<H>
where H: Unpin,

§

impl<H> UnwindSafe for CuckooFilter<H>
where H: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,