Crate probabilistic_collections

probabilistic-collections-rs

probabilistic-collections contains implementations of collections that use approximation to reduce running time or memory usage, at the cost of a certain amount of error. Because that error can be kept below a chosen threshold, these data structures are well suited to big data and streaming applications.

The following types of collections are implemented:

  • Approximate Membership in Set: BloomFilter, PartitionedBloomFilter, CuckooFilter, QuotientFilter
  • Scalable Approximate Membership in Set: ScalableBloomFilter, ScalableCuckooFilter
  • Approximate Membership in Stream: BSBloomFilter, BSSDBloomFilter, RLBSBloomFilter
  • Approximate Item Count: CountMinSketch
  • Approximate Distinct Item Count: HyperLogLog
  • Set similarity: MinHash, SimHash

Usage

Add this to your Cargo.toml:

[dependencies]
probabilistic-collections = "*"

For serde support, include the serde feature:

[dependencies]
probabilistic-collections = { version = "*", features = ["serde"] }

Add this to your crate root if you are using Rust 2015:

extern crate probabilistic_collections;

Changelog

See CHANGELOG for more details.

License

probabilistic-collections-rs is dual-licensed under the terms of either the MIT License or the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for more details.

Modules

bit_array_vec

Growable list of bit arrays.

bit_vec

Growable list of bits.
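A growable list of bits is typically packed into machine words rather than stored as one `bool` per byte. The sketch below shows that idea using only the standard library; `SimpleBitVec` and its methods are hypothetical names for illustration and are not the crate's `bit_vec` API.

```rust
/// Hypothetical growable bit vector packed into `u64` words (illustrative only).
pub struct SimpleBitVec {
    words: Vec<u64>,
    len: usize,
}

impl SimpleBitVec {
    pub fn new() -> Self {
        SimpleBitVec { words: Vec::new(), len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Appends one bit, allocating a fresh word every 64 bits.
    pub fn push(&mut self, bit: bool) {
        if self.len % 64 == 0 {
            self.words.push(0);
        }
        let index = self.len;
        self.len += 1;
        self.set(index, bit);
    }

    /// Returns the bit at `index`, or `None` if it is out of bounds.
    pub fn get(&self, index: usize) -> Option<bool> {
        if index >= self.len {
            return None;
        }
        Some(((self.words[index / 64] >> (index % 64)) & 1) == 1)
    }

    /// Overwrites the bit at `index`; panics if `index` is out of bounds.
    pub fn set(&mut self, index: usize, bit: bool) {
        assert!(index < self.len, "index out of bounds");
        let mask = 1u64 << (index % 64);
        if bit {
            self.words[index / 64] |= mask;
        } else {
            self.words[index / 64] &= !mask;
        }
    }
}
```

Storing 64 bits per word keeps memory at roughly one bit per element, which is what makes bit-level structures such as Bloom filters compact.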

bloom

Space-efficient probabilistic data structure for approximate membership queries in a set.
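To make the membership idea concrete, here is a minimal std-only Bloom filter sketch. It assumes the textbook sizing formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, and derives the k indices by double hashing two seeded `DefaultHasher` values. `SimpleBloomFilter` is a hypothetical name; the crate's `BloomFilter` and its `SipHasherBuilder`-based hashing may work differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter sketch (not the crate's API).
pub struct SimpleBloomFilter {
    bits: Vec<bool>,
    hash_count: usize,
}

impl SimpleBloomFilter {
    /// Sizes the filter for `expected_items` at false positive rate `fpp` (0 < fpp < 1).
    pub fn new(expected_items: usize, fpp: f64) -> Self {
        let n = expected_items.max(1) as f64;
        let ln2 = std::f64::consts::LN_2;
        let m = (-n * fpp.ln() / (ln2 * ln2)).ceil().max(1.0) as usize;
        let k = ((m as f64 / n) * ln2).round().max(1.0) as usize;
        SimpleBloomFilter { bits: vec![false; m], hash_count: k }
    }

    pub fn num_bits(&self) -> usize {
        self.bits.len()
    }

    pub fn num_hashes(&self) -> usize {
        self.hash_count
    }

    /// Two base hashes from differently seeded hashers.
    fn base_hashes<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        0u8.hash(&mut h1);
        item.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        1u8.hash(&mut h2);
        item.hash(&mut h2);
        (h1.finish(), h2.finish())
    }

    /// Double hashing: index_i = (h1 + i * h2) mod m.
    fn indices<T: Hash + ?Sized>(&self, item: &T) -> Vec<usize> {
        let (h1, h2) = Self::base_hashes(item);
        let m = self.bits.len() as u64;
        (0..self.hash_count as u64)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
            .collect()
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for i in self.indices(item) {
            self.bits[i] = true;
        }
    }

    /// `false` means definitely absent; `true` means probably present.
    pub fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        self.indices(item).iter().all(|&i| self.bits[i])
    }
}
```

For 1,000 expected items at a 1% false positive rate, these formulas give 9,586 bits and 7 hash functions. Inserted items are never reported absent; only false positives are possible.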

count_min_sketch

Space-efficient probabilistic data structure for estimating the number of item occurrences.
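A Count-Min sketch keeps a small grid of counters: each row hashes an item to one column, and the estimate is the minimum across rows. The sketch below assumes the usual sizing of width = ⌈e / ε⌉ and depth = ⌈ln(1 / δ)⌉ and seeds std's `DefaultHasher` with the row number. `SimpleCountMinSketch` is a hypothetical name, not the crate's `CountMinSketch` API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Count-Min sketch (not the crate's API).
pub struct SimpleCountMinSketch {
    width: usize,
    counts: Vec<Vec<u64>>, // `depth` rows of `width` counters
}

impl SimpleCountMinSketch {
    /// Error at most `epsilon` * total count with probability at least 1 - `delta`.
    pub fn new(epsilon: f64, delta: f64) -> Self {
        let width = (std::f64::consts::E / epsilon).ceil() as usize;
        let depth = (1.0 / delta).ln().ceil().max(1.0) as usize;
        SimpleCountMinSketch { width, counts: vec![vec![0; width]; depth] }
    }

    pub fn width(&self) -> usize {
        self.width
    }

    pub fn depth(&self) -> usize {
        self.counts.len()
    }

    /// Column for `item` in `row`; each row uses a differently seeded hash.
    fn column<T: Hash + ?Sized>(&self, row: usize, item: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.width as u64) as usize
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T, count: u64) {
        for row in 0..self.counts.len() {
            let col = self.column(row, item);
            self.counts[row][col] += count;
        }
    }

    /// Never underestimates; collisions can only inflate a counter.
    pub fn estimate<T: Hash + ?Sized>(&self, item: &T) -> u64 {
        (0..self.counts.len())
            .map(|row| self.counts[row][self.column(row, item)])
            .min()
            .unwrap_or(0)
    }
}
```

Taking the minimum over rows is the design choice that bounds the error: an item is overcounted only if it collides with heavy items in every row.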

cuckoo

Space-efficient probabilistic data structure for approximate membership queries in a set with the ability to remove items.
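Removal works in a cuckoo filter because it stores small fingerprints in buckets instead of setting shared bits. The following std-only sketch uses partial-key cuckoo hashing, where the alternate bucket is `i1 XOR hash(fingerprint)`, so either bucket can be computed from the other. It assumes 8-bit fingerprints, 4 slots per bucket, a power-of-two bucket count, and a deterministic eviction choice in place of a random one; `SimpleCuckooFilter` is a hypothetical name, not the crate's `CuckooFilter` API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BUCKET_SIZE: usize = 4;
const MAX_KICKS: usize = 500;

fn hash_of<T: Hash + ?Sized>(item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    h.finish()
}

/// Hypothetical cuckoo filter sketch (not the crate's API).
pub struct SimpleCuckooFilter {
    buckets: Vec<Vec<u8>>, // each bucket holds up to BUCKET_SIZE fingerprints
    mask: usize,           // bucket count is a power of two
}

impl SimpleCuckooFilter {
    pub fn new(capacity: usize) -> Self {
        let count = (capacity / BUCKET_SIZE + 1).next_power_of_two();
        SimpleCuckooFilter { buckets: vec![Vec::new(); count], mask: count - 1 }
    }

    pub fn len(&self) -> usize {
        self.buckets.iter().map(Vec::len).sum()
    }

    /// Alternate bucket; applying it twice returns the original index.
    fn alt_index(&self, index: usize, fp: u8) -> usize {
        (index ^ hash_of(&fp) as usize) & self.mask
    }

    /// Returns (fingerprint, first bucket, second bucket).
    fn locate<T: Hash + ?Sized>(&self, item: &T) -> (u8, usize, usize) {
        let h = hash_of(item);
        let fp = (h >> 56) as u8;
        let i1 = (h as usize) & self.mask;
        (fp, i1, self.alt_index(i1, fp))
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) -> bool {
        let (mut fp, i1, i2) = self.locate(item);
        for &i in &[i1, i2] {
            if self.buckets[i].len() < BUCKET_SIZE {
                self.buckets[i].push(fp);
                return true;
            }
        }
        // Both buckets are full: evict a fingerprint and move it to its alternate bucket.
        let mut i = i1;
        for kick in 0..MAX_KICKS {
            std::mem::swap(&mut fp, &mut self.buckets[i][kick % BUCKET_SIZE]);
            i = self.alt_index(i, fp);
            if self.buckets[i].len() < BUCKET_SIZE {
                self.buckets[i].push(fp);
                return true;
            }
        }
        false // sketch gives up here; the last evicted fingerprint is lost
    }

    pub fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let (fp, i1, i2) = self.locate(item);
        self.buckets[i1].contains(&fp) || self.buckets[i2].contains(&fp)
    }

    /// Only call for items that were actually inserted.
    pub fn remove<T: Hash + ?Sized>(&mut self, item: &T) -> bool {
        let (fp, i1, i2) = self.locate(item);
        for &i in &[i1, i2] {
            if let Some(pos) = self.buckets[i].iter().position(|&f| f == fp) {
                self.buckets[i].swap_remove(pos);
                return true;
            }
        }
        false
    }
}
```

Deleting an item that was never inserted can remove another item's matching fingerprint, so removal is only safe for known members.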

hyperloglog

Space-efficient probabilistic data structure for estimating the number of distinct items in a multiset.
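The distinct-count estimate comes from tracking, per register, the longest run of leading zeros seen in hashed values. This std-only sketch assumes the standard HyperLogLog estimator E = α·m² / Σ 2^(−M[j]) with the linear-counting correction m·ln(m / V) for small cardinalities, and uses `DefaultHasher` as its 64-bit hash. `SimpleHyperLogLog` is a hypothetical name, not the crate's `HyperLogLog` API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical HyperLogLog sketch (not the crate's API).
pub struct SimpleHyperLogLog {
    precision: u32,     // p: the top p hash bits select a register
    registers: Vec<u8>, // m = 2^p registers
}

impl SimpleHyperLogLog {
    pub fn new(precision: u32) -> Self {
        assert!((4..=16).contains(&precision), "precision must be in 4..=16");
        SimpleHyperLogLog { precision, registers: vec![0; 1 << precision] }
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let hash = hasher.finish();
        let index = (hash >> (64 - self.precision)) as usize;
        // Rank = position of the first 1-bit in the remaining 64 - p bits.
        let rest = hash << self.precision;
        let rank = (rest.leading_zeros().min(64 - self.precision) + 1) as u8;
        let register = &mut self.registers[index];
        if rank > *register {
            *register = rank;
        }
    }

    pub fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        let alpha = match self.registers.len() {
            16 => 0.673,
            32 => 0.697,
            64 => 0.709,
            _ => 0.7213 / (1.0 + 1.079 / m),
        };
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln() // linear counting for small cardinalities
        } else {
            raw
        }
    }
}
```

Re-inserting an item cannot change any register, so duplicates do not affect the estimate; the standard error is about 1.04 / √m.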

quotient

Space-efficient probabilistic data structure for approximate membership queries in a set.
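The core idea of a quotient filter is "quotienting": a (q + r)-bit fingerprint is split so that the high q bits choose a slot and only the low r bits are stored. The sketch below shows that split with the standard library only. To stay short, it deliberately replaces the real structure's linear probing and its three metadata bits per slot with a plain list of remainders per slot, so it illustrates the quotient/remainder decomposition but not the compact layout. `SimplifiedQuotientFilter` and `quotient_and_remainder` are hypothetical names, not the crate's `quotient` API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Splits a fingerprint into (quotient, remainder) with `r_bits` remainder bits.
pub fn quotient_and_remainder(fingerprint: u64, r_bits: u32) -> (u64, u64) {
    (fingerprint >> r_bits, fingerprint & ((1u64 << r_bits) - 1))
}

/// Hypothetical, simplified quotient filter (no probing or metadata bits).
pub struct SimplifiedQuotientFilter {
    q_bits: u32,
    r_bits: u32,
    slots: Vec<Vec<u64>>, // slot `quotient` holds the remainders that map to it
}

impl SimplifiedQuotientFilter {
    pub fn new(q_bits: u32, r_bits: u32) -> Self {
        assert!((1..=24).contains(&q_bits) && (1..=63).contains(&r_bits));
        assert!(q_bits + r_bits <= 64);
        SimplifiedQuotientFilter { q_bits, r_bits, slots: vec![Vec::new(); 1usize << q_bits] }
    }

    /// Keeps the top q + r bits of a 64-bit hash, then quotients it.
    fn split<T: Hash + ?Sized>(&self, item: &T) -> (usize, u64) {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        let fingerprint = h.finish() >> (64 - self.q_bits - self.r_bits);
        let (q, r) = quotient_and_remainder(fingerprint, self.r_bits);
        (q as usize, r)
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let (q, r) = self.split(item);
        if !self.slots[q].contains(&r) {
            self.slots[q].push(r);
        }
    }

    /// False positives occur when two items share the whole fingerprint.
    pub fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let (q, r) = self.split(item);
        self.slots[q].contains(&r)
    }
}
```

Because the quotient is implied by the slot position, only r bits per item need storing, and the false positive rate is roughly n / 2^(q + r).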

similarity

Locality-sensitive hashing schemes for measuring similarities between sets.
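As an example of one locality-sensitive scheme, MinHash estimates the Jaccard similarity |A ∩ B| / |A ∪ B|: for each of k hash functions, two sets share the same minimum hash with probability equal to their Jaccard similarity. This std-only sketch assumes k seeded `DefaultHasher` instances as the hash family; `minhash_signature` and `estimate_jaccard` are hypothetical names, not the crate's `MinHash` API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One member of the hash family, selected by `seed`.
fn seeded_hash<T: Hash + ?Sized>(seed: u64, item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// Signature: the minimum hash of the set under each of `num_hashes` functions.
pub fn minhash_signature<T: Hash>(items: &[T], num_hashes: u64) -> Vec<u64> {
    (0..num_hashes)
        .map(|seed| {
            items
                .iter()
                .map(|item| seeded_hash(seed, item))
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Fraction of positions where two signatures agree.
pub fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    assert!(!a.is_empty() && a.len() == b.len(), "signatures must be non-empty and equal length");
    let equal = a.iter().zip(b).filter(|(x, y)| x == y).count();
    equal as f64 / a.len() as f64
}
```

For example, the sets 0..100 and 50..150 have a true Jaccard similarity of 50 / 150 ≈ 0.33, and the estimate's standard error shrinks as 1 / √k with the number of hash functions.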

Structs

SipHasherBuilder

The default hash builder for all collections.