dsrs contains bindings for a subset of Apache DataSketches.
Modules§
- counters
- Stateful reducers which maintain distinct count and heavy hitters sketches, aimed at servicing the dsrs command-line tool for deduplicating byte lines of input.
- stream_reducer
- A small abstraction for reducing over byte lines from a stream, used for the command line tool dsrs.
Structs§
- CpcSketch
- The Compressed Probability Counting (CPC) sketch is a dynamically resizing (but still bounded-size) distinct count sketch. Some differences between CPC and the more typical HLL++ are:
- CpcUnion
- HhSketch
- The Heavy Hitter (HH) sketch computes an approximate set of the heavy hitters, the items in a data stream which appear most often. Along with each proposed approximate heavy hitter, the sketch can provide an estimate of the number of its appearances.
- StaticThetaSketch
- ThetaIntersection
- ThetaSketch
- The Theta sketch is, essentially, an adaptive random sample of a stream. As a result, it can be used to estimate distinct counts, and sketches can be combined to estimate distinct counts of unions, intersections, and differences of streams.
- ThetaUnion
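
To make the "adaptive random sample" idea behind the Theta sketch concrete, here is a minimal, standard-library-only illustration using the classic k-minimum-values (KMV) technique. It is not the dsrs API and not the DataSketches implementation: the type `KmvSketch` and its methods are hypothetical names chosen for this sketch, and `DefaultHasher` stands in for whatever hash function a real sketch would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Toy k-minimum-values sketch (hypothetical type, not part of dsrs).
struct KmvSketch {
    k: usize,
    /// The (at most) k smallest distinct hash values seen so far.
    hashes: BTreeSet<u64>,
}

impl KmvSketch {
    fn new(k: usize) -> Self {
        assert!(k >= 2, "k must be at least 2");
        KmvSketch { k, hashes: BTreeSet::new() }
    }

    /// Retain a hash only if it is among the k smallest seen so far.
    fn insert_hash(&mut self, h: u64) {
        self.hashes.insert(h);
        if self.hashes.len() > self.k {
            // Evicting the largest hash lowers the sampling threshold.
            let largest = *self.hashes.iter().next_back().unwrap();
            self.hashes.remove(&largest);
        }
    }

    /// Feed one byte line into the sketch; duplicates hash identically.
    fn update(&mut self, item: &[u8]) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        self.insert_hash(hasher.finish());
    }

    /// Estimate the number of distinct items seen.
    fn estimate(&self) -> f64 {
        if self.hashes.len() < self.k {
            // Fewer than k distinct hashes: the count is exact.
            return self.hashes.len() as f64;
        }
        // The k-th smallest hash, scaled to (0, 1], acts as the sampling rate.
        let kth = *self.hashes.iter().next_back().unwrap();
        let theta = kth as f64 / u64::MAX as f64;
        (self.k as f64 - 1.0) / theta
    }

    /// Combine two sketches into a sketch of the union of their streams.
    fn union(&self, other: &KmvSketch) -> KmvSketch {
        let mut out = KmvSketch::new(self.k.min(other.k));
        for &h in self.hashes.iter().chain(other.hashes.iter()) {
            out.insert_hash(h);
        }
        out
    }
}
```

Under these assumptions, feeding each input line to `update` and calling `estimate` gives an approximate distinct count whose relative error shrinks roughly as 1/sqrt(k). Calling `union` on two sketches estimates the distinct count of the combined streams without revisiting the original data.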