Crate probabilistic_collections[][src]

probabilistic-collections-rs

probabilistic-collections Documentation License: MIT Build Status codecov

probabilistic-collections contains various implementations of collections that use randomization to improve on running time or memory, but introduce a certain amount of error. The error can be controlled under a certain threshold which makes these data structures extremely useful for big data and streaming applications.

Usage

Add this to your Cargo.toml:

[dependencies]
probabilistic-collections = "*"

and this to your crate root:

extern crate probabilistic_collections;

References

Almeida, Paulo Sérgio, Carlos Baquero, Nuno Preguiça, and David Hutchison. 2007. “Scalable Bloom Filters.” Inf. Process. Lett. 101 (6). Amsterdam, The Netherlands, The Netherlands: Elsevier North-Holland, Inc.: 255–61. doi:10.1016/j.ipl.2006.10.007.

Bera, Suman K., Sourav Dutta, Ankur Narang, and Souvik Bhattacherjee. 2012. “Advanced Bloom Filter Based Algorithms for Efficient Approximate Data de-Duplication in Streams.” CoRR abs/1212.3964. http://arxiv.org/abs/1212.3964.

Fan, Bin, Dave G. Andersen, Michael Kaminsky, and Michael D. Mitzenmacher. 2014. “Cuckoo Filter: Practically Better Than Bloom.” In Proceedings of the 10th Acm International on Conference on Emerging Networking Experiments and Technologies, 75–88. CoNEXT ’14. New York, NY, USA: ACM. doi:10.1145/2674005.2674994.

Flajolet, Philippe, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. 2007. “Hyperloglog: The Analysis of a Near-Optimal Cardinality Estimation Algorithm.” In IN Aofa ’07: PROCEEDINGS of the 2007 International Conference on Analysis of Algorithms.

Heule, Stefan, Marc Nunkesser, and Alexander Hall. 2013. “HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm.” In Proceedings of the 16th International Conference on Extending Database Technology, 683–92. EDBT ’13. New York, NY, USA: ACM. doi:10.1145/2452376.2452456.

Kirsch, Adam, and Michael Mitzenmacher. 2008. “Less Hashing, Same Performance: Building a Better Bloom Filter.” Random Struct. Algorithms 33 (2). New York, NY, USA: John Wiley & Sons, Inc.: 187–218. doi:10.1002/rsa.v33:2.

Modules

bit_array_vec

Growable list of bit arrays.

bit_vec

Growable list of bits.

bloom

Space-efficient probabilistic data structure to test for membership in a set.

count_min_sketch

Space-efficient probabilistic data structure for estimating the number of item occurrences.

cuckoo

Space-efficient probabilistic data structure to test for membership in a set with the ability to remove items.

hyperloglog

Space-efficient probabilistic data structure to count the number of distinct items in a multiset.

similarity

Module for measuring similarities between sets.