Crate dsrs

dsrs contains bindings for a subset of Apache DataSketches.


Stateful reducers which maintain distinct count and heavy hitters sketches, aimed at servicing the dsrs command-line tool for deduplicating byte lines of input.

A small abstraction for reducing over byte lines from a stream, used by the dsrs command-line tool.
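To make the idea of reducing over byte lines concrete, here is a minimal standard-library sketch. The names `LineReducer`, `ExactDistinct`, and `reduce_lines` are hypothetical and are not part of the dsrs API; the exact `HashSet` counter only stands in for an approximate sketch.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Hypothetical trait (not the dsrs API): a stateful reducer that is fed
/// one byte line at a time.
pub trait LineReducer {
    fn consume(&mut self, line: &[u8]);
}

/// Exact distinct counter, standing in for an approximate sketch.
#[derive(Default)]
pub struct ExactDistinct {
    seen: HashSet<Vec<u8>>,
}

impl ExactDistinct {
    /// Number of distinct lines seen so far.
    pub fn count(&self) -> usize {
        self.seen.len()
    }
}

impl LineReducer for ExactDistinct {
    fn consume(&mut self, line: &[u8]) {
        // Only allocate a copy for lines that have not been seen before.
        if !self.seen.contains(line) {
            self.seen.insert(line.to_vec());
        }
    }
}

/// Feeds every newline-delimited byte line from `reader` into `reducer`.
/// Lines are raw bytes, so invalid UTF-8 is passed through untouched.
pub fn reduce_lines<R: BufRead, T: LineReducer>(
    mut reader: R,
    reducer: &mut T,
) -> io::Result<()> {
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // read_until returns 0 bytes only at end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            return Ok(());
        }
        // Strip the trailing newline, if present (the last line may lack one).
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        reducer.consume(&buf);
    }
}
```

Working on `&[u8]` rather than `String` means input that is not valid UTF-8 is still counted, and a single reused buffer avoids one allocation per line.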


The Compressed Probability Counting (CPC) sketch is a dynamically resizing (but still bounded-size) distinct count sketch. Some differences between CPC and the more typical HLL++ are:

The Heavy Hitter (HH) sketch computes an approximate set of the heavy hitters, the items in a data stream which appear most often. Along with each proposed approximate heavy hitter, the sketch can provide an estimate of the number of its appearances.
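As an illustration of how a bounded-size summary can track frequent items, the following is a sketch of the classic Misra-Gries algorithm using only the standard library. It is a textbook relative of heavy hitter sketches and is not claimed to be the algorithm or API behind `HH` in this crate; the type `MisraGries` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Misra-Gries summary holding at most `k - 1` counters.
/// Hypothetical illustration; not the dsrs heavy hitter sketch.
pub struct MisraGries {
    k: usize,
    counters: HashMap<Vec<u8>, u64>,
}

impl MisraGries {
    pub fn new(k: usize) -> Self {
        assert!(k >= 2, "k must be at least 2");
        MisraGries { k, counters: HashMap::new() }
    }

    /// Records one appearance of `item`.
    pub fn update(&mut self, item: &[u8]) {
        if let Some(count) = self.counters.get_mut(item) {
            *count += 1;
            return;
        }
        if self.counters.len() < self.k - 1 {
            // Room for a new counter.
            self.counters.insert(item.to_vec(), 1);
        } else {
            // Summary is full: decrement every counter and drop those
            // that reach zero. The new item itself is not stored.
            self.counters.retain(|_, count| {
                *count -= 1;
                *count > 0
            });
        }
    }

    /// Lower bound on the true count of `item`. For a stream of `n`
    /// updates it undercounts by at most `n / k`.
    pub fn estimate(&self, item: &[u8]) -> u64 {
        self.counters.get(item).copied().unwrap_or(0)
    }
}
```

Any item appearing more than `n / k` times in a stream of `n` updates is guaranteed to keep a counter, which is what makes the retained keys a candidate set of heavy hitters.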

The Theta sketch is, essentially, an adaptive random sample of a stream. As a result, it can be used to estimate distinct counts, and sketches can be combined to estimate the distinct counts of unions, intersections, and differences of streams.
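The sampling idea can be sketched with the standard library alone: keep the `k` smallest item hashes below a threshold `theta`, and scale the retained count by the fraction of hash space that `theta` covers. The type `ToyTheta` below is a hypothetical toy, not the dsrs `ThetaSketch` API, and it assumes `DefaultHasher` is a good enough hash for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Toy theta-style sketch: retains up to `k` hashes, all below `theta`.
pub struct ToyTheta {
    k: usize,
    theta: u64,
    hashes: BTreeSet<u64>,
}

impl ToyTheta {
    pub fn new(k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        ToyTheta { k, theta: u64::MAX, hashes: BTreeSet::new() }
    }

    pub fn update(&mut self, item: &[u8]) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        self.insert_hash(hasher.finish());
    }

    fn insert_hash(&mut self, hash: u64) {
        // Hashes at or above theta fall outside the sample.
        if hash >= self.theta {
            return;
        }
        self.hashes.insert(hash);
        if self.hashes.len() > self.k {
            // Over capacity: evict the largest hash and lower theta to it.
            let largest = *self.hashes.iter().next_back().unwrap();
            self.hashes.remove(&largest);
            self.theta = largest;
        }
    }

    /// Fraction of the hash space that lies below theta.
    fn fraction(&self) -> f64 {
        self.theta as f64 / u64::MAX as f64
    }

    /// Estimated distinct count: retained hashes scaled by sampling rate.
    pub fn estimate(&self) -> f64 {
        self.hashes.len() as f64 / self.fraction()
    }

    pub fn union(&self, other: &Self) -> Self {
        let mut out = ToyTheta::new(self.k.min(other.k));
        out.theta = self.theta.min(other.theta);
        for &h in self.hashes.iter().chain(other.hashes.iter()) {
            out.insert_hash(h);
        }
        out
    }

    pub fn intersection(&self, other: &Self) -> Self {
        let theta = self.theta.min(other.theta);
        let hashes = self.hashes.intersection(&other.hashes)
            .copied().filter(|&h| h < theta).collect();
        ToyTheta { k: self.k.min(other.k), theta, hashes }
    }

    /// Items in `self` that are not in `other`.
    pub fn difference(&self, other: &Self) -> Self {
        let theta = self.theta.min(other.theta);
        let hashes = self.hashes.difference(&other.hashes)
            .copied().filter(|&h| h < theta).collect();
        ToyTheta { k: self.k.min(other.k), theta, hashes }
    }
}
```

Set operations work because both sketches sample with the same hash function: below the smaller of the two thresholds, each sketch holds every hash from its own stream, so ordinary set algebra on the retained hashes yields a valid sample of the combined stream.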