Crate probabilistic_collections

§probabilistic-collections-rs

probabilistic-collections contains implementations of collections that use approximations to reduce running time or memory usage, at the cost of introducing a bounded amount of error. The error can be kept below a chosen threshold, which makes these data structures well suited to big data and streaming applications.
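To make the idea of a controllable error threshold concrete, here is a minimal, standard-library-only sketch of a Bloom filter sized from a target false positive probability. It uses the textbook sizing formulas and is not this crate's implementation; the names ToyBloomFilter, optimal_bit_count, and optimal_hasher_count are hypothetical and chosen only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Textbook bit count for `item_count` items at false positive
/// probability `fpp`: m = ceil(-n * ln(p) / (ln 2)^2).
fn optimal_bit_count(item_count: usize, fpp: f64) -> usize {
    let ln2 = std::f64::consts::LN_2;
    (-(item_count as f64) * fpp.ln() / (ln2 * ln2)).ceil() as usize
}

/// Textbook hash function count: k = ceil((m / n) * ln 2).
fn optimal_hasher_count(bit_count: usize, item_count: usize) -> usize {
    ((bit_count as f64 / item_count as f64) * std::f64::consts::LN_2).ceil() as usize
}

/// Hypothetical toy filter, for illustration only.
struct ToyBloomFilter {
    bits: Vec<bool>,
    hasher_count: usize,
}

impl ToyBloomFilter {
    fn new(item_count: usize, fpp: f64) -> Self {
        let bit_count = optimal_bit_count(item_count, fpp);
        ToyBloomFilter {
            bits: vec![false; bit_count],
            hasher_count: optimal_hasher_count(bit_count, item_count),
        }
    }

    /// Derives `hasher_count` bit positions from two base hashes
    /// (double hashing): index_i = (a + i * b) mod m.
    fn indices<T: Hash + ?Sized>(&self, item: &T) -> Vec<usize> {
        let mut first = DefaultHasher::new();
        item.hash(&mut first);
        let a = first.finish();

        // Salt the second hasher so it differs from the first.
        let mut second = DefaultHasher::new();
        1u8.hash(&mut second);
        item.hash(&mut second);
        let b = second.finish() | 1;

        let m = self.bits.len() as u64;
        (0..self.hasher_count as u64)
            .map(|i| (a.wrapping_add(i.wrapping_mul(b)) % m) as usize)
            .collect()
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for index in self.indices(item) {
            self.bits[index] = true;
        }
    }

    /// May return a false positive, but never a false negative.
    fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        self.indices(item).into_iter().all(|index| self.bits[index])
    }
}
```

Under these formulas, 10 expected items at a 1% false positive probability need 96 bits and 7 hash functions. Lowering the threshold increases both numbers, which is the time and memory trade-off described above.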

The following types of collections are implemented:

  • Approximate Membership in Set: BloomFilter, PartitionedBloomFilter, CuckooFilter, QuotientFilter
  • Scalable Approximate Membership in Set: ScalableBloomFilter, ScalableCuckooFilter
  • Approximate Membership in Stream: BSBloomFilter, BSSDBloomFilter, RLBSBloomFilter
  • Approximate Item Count: CountMinSketch
  • Approximate Distinct Item Count: HyperLogLog
  • Set Similarity: MinHash, SimHash
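As an illustration of the approximate item count category, the following is a minimal, standard-library-only sketch of the count-min technique. It is not the crate's CountMinSketch; the type ToyCountMinSketch and its methods are hypothetical, and the fixed row and column sizes are assumptions rather than values derived from an error bound.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical toy sketch: `rows` independent hash rows of `cols` counters.
struct ToyCountMinSketch {
    rows: usize,
    cols: usize,
    counters: Vec<u64>,
}

impl ToyCountMinSketch {
    fn new(rows: usize, cols: usize) -> Self {
        ToyCountMinSketch { rows, cols, counters: vec![0; rows * cols] }
    }

    /// Position of `item` in the flat counter array for a given row.
    /// The row index is hashed first so each row acts as a different hash.
    fn slot<T: Hash + ?Sized>(&self, row: usize, item: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        item.hash(&mut hasher);
        row * self.cols + (hasher.finish() % self.cols as u64) as usize
    }

    /// Adds `count` occurrences of `item` to one counter per row.
    fn insert<T: Hash + ?Sized>(&mut self, item: &T, count: u64) {
        for row in 0..self.rows {
            let slot = self.slot(row, item);
            self.counters[slot] += count;
        }
    }

    /// Takes the minimum over all rows. Collisions can only inflate a
    /// counter, so the estimate never falls below the true count.
    fn count<T: Hash + ?Sized>(&self, item: &T) -> u64 {
        (0..self.rows)
            .map(|row| self.counters[self.slot(row, item)])
            .min()
            .unwrap_or(0)
    }
}
```

The estimate is one-sided: it can exceed the true count when items collide in every row, but it never underestimates. Adding columns makes collisions less likely, and adding rows makes it less likely that every row collides at once.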

§Usage

Add this to your Cargo.toml:

[dependencies]
probabilistic-collections = "*"

For serde support, include the serde feature:

[dependencies]
probabilistic-collections = { version = "*", features = ["serde"] }

Add this to your crate root if you are using Rust 2015:

extern crate probabilistic_collections;

§Changelog

See CHANGELOG for more details.

§License

probabilistic-collections-rs is dual-licensed under the terms of either the MIT License or the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for more details.

Modules§

bit_array_vec
Growable list of bit arrays.
bit_vec
Growable list of bits.
bloom
Space-efficient probabilistic data structure for approximate membership queries in a set.
count_min_sketch
Space-efficient probabilistic data structure for estimating the number of item occurrences.
cuckoo
Space-efficient probabilistic data structure for approximate membership queries in a set with the ability to remove items.
hyperloglog
Space-efficient probabilistic data structure for estimating the number of distinct items in a multiset.
quotient
Space-efficient probabilistic data structure for approximate membership queries in a set.
similarity
Locality-sensitive hashing schemes for measuring similarities between sets.

Structs§

SipHasherBuilder
The default hash builder for all collections.