`line_cardinality` provides utilities to count or estimate the number of unique lines in input data. It can read from a `BufRead` (such as stdin) or from a file using optimized file-reading functions.

Note that `line_cardinality` only supports newline (`\n`) delimited input and performs no UTF-8 validation: all lines are compared by byte value alone.

Examples of counting total distinct lines can be found in `CountUnique`. Examples of reporting occurrences of each distinct line can be found in `ReportUniqueLineHash`.
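The crate's input model, counting distinct newline-delimited lines by raw byte value, can be sketched with the standard library alone. This is an illustration of the idea, not the crate's API; `count_distinct` is a hypothetical helper.

```rust
use std::collections::HashSet;
use std::io::{BufRead, Cursor};

// Count distinct newline-delimited lines, comparing raw bytes only
// (no UTF-8 validation), mirroring line_cardinality's input model.
fn count_distinct<R: BufRead>(mut reader: R) -> std::io::Result<u64> {
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // read_until splits on b'\n' without validating UTF-8.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // EOF
        }
        if buf.last() == Some(&b'\n') {
            buf.pop(); // strip the delimiter so "a\n" and a trailing "a" match
        }
        seen.insert(buf.clone());
    }
    Ok(seen.len() as u64)
}

fn main() {
    let input = Cursor::new(b"a\nb\na\nc\n".to_vec());
    println!("{}", count_distinct(input).unwrap()); // prints 3
}
```

A real implementation would avoid cloning every line; the crate's hashing counters exist precisely to reduce that per-line cost.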
Structs

- `Error`: Errors returned by `line_cardinality`.
- `HashingLineCounterIntoIter`: An owned iterator over report entries.
- `HashingLineCounterIter`: A borrowing iterator over report entries.
- `HyperLogLog`: Estimates the unique count and holds the necessary state.
- `LosslessHashingLineCounter`: Calculates the unique count and holds the necessary state.
- `LossyHashingLineCounter`: Calculates the unique count and holds the necessary state.
- `LossySortingLineCounter`: Calculates the unique count and holds the necessary state.
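The lossless/lossy distinction above can be illustrated with std alone (a hypothetical `lossy_distinct` helper, not the crate's API): a lossy counter stores only a fixed-size hash per line, so memory use is constant per entry, but two distinct lines whose hashes collide are counted once.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Lossy counting: store only the 64-bit hash of each line.
// A hash collision between two different lines undercounts by one,
// which is the precision-for-memory trade described above.
fn lossy_distinct(lines: &[&[u8]]) -> u64 {
    let mut seen: HashSet<u64> = HashSet::new();
    for line in lines {
        let mut h = DefaultHasher::new();
        line.hash(&mut h);
        seen.insert(h.finish());
    }
    seen.len() as u64
}

fn main() {
    let lines: Vec<&[u8]> = vec![b"a", b"b", b"a"];
    println!("{}", lossy_distinct(&lines)); // prints 2
}
```

A lossless counter would additionally keep the line bytes (or a reference to them) so colliding hashes can be disambiguated by comparing the lines themselves.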
Enums

- `ErrorCause`: Contains the cause of an `Error`.
Traits

- `CountUnique`: Functionality to count total unique lines.
- `CountUniqueHash`: A `CountUnique` that stores only hashes, not line contents. This enables algorithms that gain memory efficiency in exchange for reduced precision.
- `CountUniqueLineHash`: A `CountUnique` that stores both line and hash information. This enables lossless handling of hash collisions and reporting of counts per line, but incurs an extra memory cost.
- `EmitLines`: Functionality to emit lines from a `CountUnique`.
- `Increment`: A type that can count occurrences of a line.
- `Merge`: A `CountUnique` that can be cheaply merged with another `CountUnique` of the same type. Notably, this allows simple parallel implementations, as the states can be merged at the end of the counting phase.
- `ReportUniqueLineHash`: Functionality to count occurrences of each line. `T` is the counter type used.
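The merge-based parallelism described for `Merge` can be sketched with std threads and plain `HashSet` state (illustrative only; `parallel_distinct` and its chunked input are assumptions, not the crate's types): each worker counts its own chunk independently, and the per-worker states are unioned at the end of the counting phase.

```rust
use std::collections::HashSet;
use std::thread;

// Each worker builds its own set of distinct lines; merging is a set
// union, so the final count is exact regardless of how input is split.
fn parallel_distinct(chunks: Vec<Vec<Vec<u8>>>) -> u64 {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| thread::spawn(move || chunk.into_iter().collect::<HashSet<Vec<u8>>>()))
        .collect();
    let mut merged: HashSet<Vec<u8>> = HashSet::new();
    for h in handles {
        // Merge phase: union this worker's state into the total.
        merged.extend(h.join().unwrap());
    }
    merged.len() as u64
}

fn main() {
    let chunks = vec![
        vec![b"a".to_vec(), b"b".to_vec()],
        vec![b"b".to_vec(), b"c".to_vec()],
    ];
    println!("{}", parallel_distinct(chunks)); // prints 3
}
```

The same shape works for hash-only or sketch-based states (e.g. HyperLogLog registers merge by taking the element-wise maximum), which is what makes the merge step cheap.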