streaming_algorithms 0.1.0

Performant implementations of various streaming algorithms, including Count–min sketch; Top k; HyperLogLog; Reservoir sampling.
Documentation

streaming_algorithms

Crates.io Apache-2.0 licensed Build Status Build Status Build Status

Docs

Performant implementations of various streaming algorithms.

This library is a work in progress. PRs are very welcome! Currently implemented algorithms include:

  • Count–min sketch
  • Top k (Count–min sketch plus a doubly linked hashmap to track heavy hitters / top k keys when ordered by aggregated value)
  • HyperLogLog
  • Reservoir sampling

A goal of this library is to enable composition of these algorithms; for example Top k + HyperLogLog to enable roughly SELECT key FROM table GROUP BY key ORDER BY COUNT(DISTINCT value) DESC LIMIT k.

See this gist for a good list of further algorithms to be implemented. Other resources are Probabilistic data structures – Wikipedia, DataSketches – A similar Java library originating at Yahoo, and Algebird – A similar Java library originating at Twitter.

As these implementations are often in hot code paths, unsafe is used, albeit only when necessary to a) achieve the asymptotically optimal algorithm or b) mitigate an observed bottleneck.

License

Licensed under Apache License, Version 2.0, (LICENSE.txt or http://www.apache.org/licenses/LICENSE-2.0).

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.