Crate line_cardinality

line_cardinality provides utilities to count or estimate the number of unique lines in input data. It can read from any BufRead (such as stdin), or from a file using optimized file-reading functions.

Note: line_cardinality supports only newline (\n) delimited input and performs no UTF-8 validation; lines are compared by byte value alone.
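To make the byte-wise comparison concrete, here is a minimal sketch that uses only the standard library. It is not the crate's implementation, and `count_unique_lines` is a hypothetical helper name. It splits input on `\n` only and never validates UTF-8, so `a\r` and `a` count as different lines.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Hypothetical helper (not part of line_cardinality): counts distinct
/// `\n`-delimited lines, comparing raw bytes only.
fn count_unique_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // Reads up to and including the next `\n`; returns 0 at end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        // Drop the delimiter so a final unterminated line compares equal
        // to the same line followed by `\n`.
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        // Only allocate a new key when the line has not been seen before.
        if !seen.contains(buf.as_slice()) {
            seen.insert(buf.clone());
        }
    }
    Ok(seen.len())
}
```

Any `BufRead` can be passed in, for example `std::io::stdin().lock()` or a byte slice such as `&b"a\nb\na\n"[..]`.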

Examples of counting total distinct lines can be found in CountUnique.

Examples of reporting occurrences of each distinct line can be found in ReportUniqueLineHash.

Structs

Error
Errors returned by line_cardinality.
HashingLineCounterIntoIter
An owned iter over report entries.
HashingLineCounterIter
A borrowing iter over report entries.
HyperLogLog
Estimates the unique count and holds necessary state.
LosslessHashingLineCounter
Calculates the unique count and holds necessary state.
LossyHashingLineCounter
Calculates the unique count and holds necessary state.
LossySortingLineCounter
Calculates the unique count and holds necessary state.
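The difference between calculating and estimating a unique count can be illustrated with a small HyperLogLog-style sketch. This is an assumption-laden illustration, not the crate's `HyperLogLog`: the type `SketchHll`, the register count, the use of the standard library's `DefaultHasher`, and the small-range correction are all choices made here for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Number of index bits; 2^12 = 4096 registers (an illustrative choice).
const P: u32 = 12;
const M: usize = 1 << P;

/// Hypothetical estimator for illustration; not the crate's `HyperLogLog`.
struct SketchHll {
    registers: Vec<u8>,
}

impl SketchHll {
    fn new() -> Self {
        SketchHll { registers: vec![0; M] }
    }

    /// Hashes the raw bytes of `line` and updates a single register.
    fn add_line(&mut self, line: &[u8]) {
        let mut hasher = DefaultHasher::new();
        hasher.write(line);
        let hash = hasher.finish();
        // The top P bits select the register.
        let index = (hash >> (64 - P)) as usize;
        // Rank is the 1-based position of the first set bit in the rest.
        let rest = hash << P;
        let rank = (rest.leading_zeros().min(64 - P) + 1) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    /// Merges another sketch by taking the per-register maximum.
    fn merge(&mut self, other: &SketchHll) {
        for (mine, theirs) in self.registers.iter_mut().zip(&other.registers) {
            *mine = (*mine).max(*theirs);
        }
    }

    /// Returns the estimated number of distinct lines added so far.
    fn estimate(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self
            .registers
            .iter()
            .map(|&r| 2f64.powi(-(r as i32)))
            .sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Memory use is fixed at 4096 bytes of registers regardless of input size, and adding the same line twice never changes the state. The price is that `estimate` returns an approximation rather than an exact count.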

Enums

ErrorCause
Contains the cause of an Error.

Traits

CountUnique
Functionality to count total unique lines.
CountUniqueHash
A CountUnique that stores only hash information, not the lines themselves. This enables algorithms that trade precision for memory efficiency.
CountUniqueLineHash
A CountUnique that stores line and hash information. This enables lossless handling of hash collisions and reporting of counts per-line, but incurs an extra memory cost.
EmitLines
Functionality to emit lines from a CountUnique.
Increment
A type that can count occurrences of a line.
Merge
A CountUnique that can be cheaply merged with another CountUnique of the same type. Notably, this allows simple parallel implementations as the states can be merged at the end of the counting phase.
ReportUniqueLineHash
Functionality to count occurrences of each line. T is the counter type used.
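The ideas of per-line occurrence reporting and merging partial states can be sketched with the standard library alone. The following does not use the crate's traits; `LineReport`, `add_line`, `merge`, `count_of`, and `count_in_two_parts` are hypothetical names chosen for illustration. Two reports are filled independently (as two worker threads might do) and then merged by summing per-line counts.

```rust
use std::collections::HashMap;

/// Hypothetical per-line occurrence report; not a type from line_cardinality.
#[derive(Default)]
struct LineReport {
    counts: HashMap<Vec<u8>, u64>,
}

impl LineReport {
    /// Records one occurrence of `line`, compared by raw bytes.
    fn add_line(&mut self, line: &[u8]) {
        if let Some(n) = self.counts.get_mut(line) {
            *n += 1;
        } else {
            self.counts.insert(line.to_vec(), 1);
        }
    }

    /// Folds another report into this one by summing per-line counts.
    fn merge(&mut self, other: LineReport) {
        for (line, n) in other.counts {
            *self.counts.entry(line).or_insert(0) += n;
        }
    }

    /// Number of distinct lines seen so far.
    fn unique(&self) -> usize {
        self.counts.len()
    }

    /// Occurrences recorded for `line`, or 0 if it was never seen.
    fn count_of(&self, line: &[u8]) -> u64 {
        self.counts.get(line).copied().unwrap_or(0)
    }
}

/// Counts each half of `lines` separately, then merges the two reports.
fn count_in_two_parts(lines: &[&[u8]]) -> LineReport {
    let (left, right) = lines.split_at(lines.len() / 2);
    let mut a = LineReport::default();
    let mut b = LineReport::default();
    for &line in left {
        a.add_line(line);
    }
    for &line in right {
        b.add_line(line);
    }
    a.merge(b);
    a
}
```

Because the merged result does not depend on how the input was split, each part can be counted on its own thread and combined once at the end of the counting phase.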