Crate streaming_algorithms[−][src]
Performant implementations of various streaming algorithms.
This library is a work in progress. PRs are very welcome! Currently implemented algorithms include:
- Count–min sketch
- Top k (Count–min sketch plus a doubly linked hashmap to track heavy hitters / top k keys when ordered by aggregated value)
- HyperLogLog
- Reservoir sampling
A goal of this library is to enable composition of these algorithms; for example Top k + HyperLogLog to enable roughly SELECT key FROM table GROUP BY key ORDER BY COUNT(DISTINCT value) DESC LIMIT k
.
See this gist for a good list of further algorithms to be implemented. Other resources are Probabilistic data structures – Wikipedia, DataSketches – A similar Java library originating at Yahoo, and Algebird – A similar Java library originating at Twitter.
As these implementations are often in hot code paths, unsafe is used, albeit only when necessary to a) achieve the asymptotically optimal algorithm or b) mitigate an observed bottleneck.
Structs
CountMinSketch |
An implementation of a count-min sketch data structure with conservative updating for increased accuracy. |
HyperLogLog |
An implementation of the HyperLogLog data structure with bias correction. |
SampleTotal |
Given population and sample sizes, returns true if this element is in the sample. Without replacement. |
SampleUnstable |
Reservoir sampling. Without replacement, and the returned order is unstable. |
Top |
This probabilistic data structure tracks the |
TopIter |
An iterator over the entries and counts in a |
Traits
Intersect |
Intersect zero or more |
New |
New instances are instantiable given a specified input of |
UnionAssign |
Union |