Module aggfunc

The aggfunc module is the central module for computing statistics from a stream of records.

The central component is a trait called Accumulate, which requires three functions: new, which initializes an accumulator; update, which adds a new record; and compute, which produces the final value of the aggregation. The trait takes two types: an input type (used by the new and update functions) and an output type.
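As a rough illustration of the shape described above, here is a minimal sketch of such a trait. The exact signatures are assumptions (the source only names the three functions and the two types), and `Largest` is a hypothetical toy accumulator, not one of the structs in this module.

```rust
/// Assumed shape of the trait: `I` is the input type, `O` the output type.
/// The real signatures in this module may differ.
pub trait Accumulate<I, O> {
    /// Start an accumulator from the first record.
    fn new(item: I) -> Self;
    /// Fold one more record into the running state.
    fn update(&mut self, item: I);
    /// Produce the final aggregate, if one exists.
    fn compute(&self) -> Option<O>;
}

/// Hypothetical accumulator that keeps only the largest value seen so far.
pub struct Largest<T> {
    current: T,
}

impl<T: PartialOrd + Clone> Accumulate<T, T> for Largest<T> {
    fn new(item: T) -> Self {
        Largest { current: item }
    }

    fn update(&mut self, item: T) {
        // Replace the stored value only when the new record is strictly larger.
        if item > self.current {
            self.current = item;
        }
    }

    fn compute(&self) -> Option<T> {
        Some(self.current.clone())
    }
}
```

Because the state is a single value, memory use stays constant no matter how many records pass through update.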

Internally, the main aggregation module uses every struct implementing this trait with the input type bounded by FromStr, so the tool can convert string records into the internal data types that these aggregation types manipulate. The output type is bounded by Display, so the tool can write the outputs to standard output.
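The role of the two bounds can be sketched with a small generic driver. Everything here is illustrative: the trait signatures are assumed, `ToySum` and `aggregate` are hypothetical names, and silently skipping records that fail to parse is a simplification made for this example rather than the tool's documented behavior.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Assumed shape of the trait; the real signatures may differ.
pub trait Accumulate<I, O> {
    fn new(item: I) -> Self;
    fn update(&mut self, item: I);
    fn compute(&self) -> Option<O>;
}

/// Hypothetical integer sum, used only to exercise the driver below.
pub struct ToySum {
    total: i64,
}

impl Accumulate<i64, i64> for ToySum {
    fn new(item: i64) -> Self {
        ToySum { total: item }
    }
    fn update(&mut self, item: i64) {
        self.total += item;
    }
    fn compute(&self) -> Option<i64> {
        Some(self.total)
    }
}

/// Parse each string record (needs `I: FromStr`), feed it to the accumulator,
/// and render the result as text (needs `O: Display`).
pub fn aggregate<A, I, O>(records: &[&str]) -> Option<String>
where
    A: Accumulate<I, O>,
    I: FromStr,
    O: Display,
{
    // Assumption for this sketch: unparseable records are dropped.
    let mut parsed = records.iter().filter_map(|r| r.trim().parse::<I>().ok());
    // The first parsed record seeds the accumulator; no records means no result.
    let mut acc = <A as Accumulate<I, O>>::new(parsed.next()?);
    for item in parsed {
        acc.update(item);
    }
    acc.compute().map(|out| out.to_string())
}
```

A call such as `aggregate::<ToySum, i64, i64>(&["1", "2", "4"])` picks the accumulator through the type parameters, so the same driver works for any accumulator whose input parses from a string and whose output can be displayed.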

Structs§

Count
The total number of records added to the accumulator.
CountUnique
The total number of unique records.
Maximum
The largest value (or the value that would appear last in a sorted array).
Mean
The mean. This is only implemented for DecimalWrapper, though it could probably be extended for floating point types.
Median
The median value. I’ve stored values in a BTreeMap in order to minimize memory usage. As a result, this is the least performant of all the functions, running in O(N log m) time rather than the O(N) of all the other algorithms, where N is the number of records and m is the number of unique values in the accumulator.
MinMax
A combination of the minimum and maximum values, producing a string concatenating the minimum value and the maximum value together, separated by a hyphen.
Minimum
The minimum value.
Mode
The most commonly appearing item.
Range
The range, or the difference between the minimum and maximum values (where the minimum value is subtracted from the maximum value).
StdDev
Computes the sample variance in a single pass, using Welford’s algorithm. The attributes in this method refer to the same ones described in Accuracy and Stability of Numerical Algorithms by Higham (2nd Edition, page 11).
Sum
The running sum of a stream of values.
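
Two of the descriptions above carry algorithmic detail: the BTreeMap-backed median and the single-pass variance via Welford's algorithm. The sketch below illustrates both under stated assumptions. `MedianSketch` and `WelfordSketch` are hypothetical names, the field names are chosen for this example and do not follow Higham's notation, and the sketch uses `i64` and `f64` for brevity rather than the types this module actually uses.

```rust
use std::collections::BTreeMap;

/// Median over a map from value to occurrence count: memory grows with the
/// number of unique values (m), and each update costs O(log m).
pub struct MedianSketch {
    counts: BTreeMap<i64, usize>,
    n: usize,
}

impl MedianSketch {
    pub fn new(item: i64) -> Self {
        let mut counts = BTreeMap::new();
        counts.insert(item, 1);
        MedianSketch { counts, n: 1 }
    }

    pub fn update(&mut self, item: i64) {
        *self.counts.entry(item).or_insert(0) += 1;
        self.n += 1;
    }

    pub fn compute(&self) -> Option<f64> {
        // Zero-based ranks of the two middle elements (equal when n is odd).
        let (lo, hi) = ((self.n - 1) / 2, self.n / 2);
        let (mut seen, mut lo_val) = (0usize, None);
        // BTreeMap iterates in ascending key order, so `seen` is a running rank.
        for (&value, &count) in &self.counts {
            seen += count;
            if lo_val.is_none() && seen > lo {
                lo_val = Some(value);
            }
            if seen > hi {
                return lo_val.map(|l| (l as f64 + value as f64) / 2.0);
            }
        }
        None
    }
}

/// Welford's single-pass update: keeps a count, a running mean, and the
/// running sum of squared deviations from that mean.
pub struct WelfordSketch {
    n: u64,
    mean: f64,
    m2: f64,
}

impl WelfordSketch {
    pub fn new(item: f64) -> Self {
        WelfordSketch { n: 1, mean: item, m2: 0.0 }
    }

    pub fn update(&mut self, item: f64) {
        self.n += 1;
        let delta = item - self.mean;
        self.mean += delta / self.n as f64;
        // Uses the deviation from the old mean times the deviation from the new mean.
        self.m2 += delta * (item - self.mean);
    }

    /// Sample variance (divides by n - 1); undefined for a single record.
    pub fn sample_variance(&self) -> Option<f64> {
        if self.n < 2 {
            None
        } else {
            Some(self.m2 / (self.n - 1) as f64)
        }
    }

    /// Sample standard deviation, the square root of the sample variance.
    pub fn std_dev(&self) -> Option<f64> {
        self.sample_variance().map(f64::sqrt)
    }
}
```

Storing counts per unique value is what trades speed for memory in the median: a stream with many repeated values occupies only one map entry per distinct value. The Welford sketch, by contrast, needs three numbers regardless of stream length.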

Traits§

Accumulate
Accumulates records from a stream, in order to allow functions to be optimized for minimal memory usage.