Crate pdatastructs
source · [−]Expand description
pdatastructs
A collection of data structures that are based probability theory and therefore only provide the correct answer if a certain probability. In exchange they have a better runtime and memory complexity compared to traditional data structures.
The following data structures are implemented:
- CountMinSketch
- Filters:
- BloomFilter
- CuckooFilter
- QuotientFilter
- HyperLogLog
- ReservoirSampling
- T-Digest
- Top-K:
- CMSHeap
- LossyCounter
License
Licensed under either of these:
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or https://opensource.org/licenses/MIT)
Contributing
Unless you explicitly state otherwise, any contribution you intentionally submit for inclusion in the work, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.
Changelog
0.7 — Misc Improvements & Fixes
All
- support
?Sized
where possible - implement
Send
where possible
Filters
- proper union support
HyperLogLog
- fix potential out-of-bounds access (#74)
- improve performance
T-Digest
- more scale functions
Dependencies
- replace
Void
withInfallible
- update dependencies
- make all dependencies optional
0.6 — T-Digest, Lossy Counter
T-Digest
- new and shiny
Filters
- improve documentation of BloomFilter
Top-K
- rename former Top-K to CMS Heap
- add Lossy Counter implementation
CountMinSketch
add
andadd_n
now return count after addition operation
Dependencies
- update bytecount to 0.5
0.5 — Performance, Filter Trait, Rust 2018
Global
- benchmarking system
- improved hash performance
- various tiny documentation improvements
- misc performance improvements
- use stdlib
BuildHasherDefault
instead of own version - make all containers typed and implement
AnyHash
for dynamically typed containers - enforce formatting in CI
- Rust 2018
Dependencies
- update bytecount to 0.4
- update rand to 0.6
HyperLogLog
- implement
relative_error
- enhanced bias correction
- extend value range for
b
Filters
- unified trait-based filter interface
- add QuotientFilter
0.4 — CuckooFilter, API Improvements, Docs
- CountMinSketch: use
num-traits
as counters,add_n
takes element by reference - CuckooFilter: new!
- HyperLogLog: faster byte-count
- TopK:
.values()
->.iter()
- all: tests for asserts, way nicer docs
- license: change to MIT+Apache2
0.3 — Reservoir Sampling, Top-K, useful traits
- ReservoirSampling: first implementation
- TopK: first implementation
- all: implement Extend trait
0.2 — CountMinSketch, rusty getters
- Counter: a safe counter trait
- CountMinSketch: first implementation
- BloomFilter, HyperLogLog: remove
get_
prefix from getters
0.1 — BloomFilter, HyperLogLog
This is the very first release with the following data structures:
- BloomFilter
- HyperLogLog
Re-exports
pub use num_traits;
pub use rand;
Modules
CountMinSketch implementation.
Filters, Approximate Membership Queries (AMQs).
Hash-related utils.
HyperLogLog implementation.
ReservoirSampling implementation.
TDigest implementation.
Top-K, store k
most frequent data points in stream.