Based on Spark’s CheckOverflow expression. Spark’s CheckOverflow rounds decimals
to the given scale and checks whether they fit in the given precision. Because the cast
kernel already rounds decimals, Comet’s CheckOverflow only checks whether the decimals
fit in the precision.
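A minimal sketch of such a precision check, assuming the decimal is stored as an unscaled i128 (as in Arrow's Decimal128) and that precision is between 1 and 38. The names fits_precision and check_overflow are hypothetical, and returning None on overflow is an assumption; this section does not say whether the real expression yields null or an error.

```rust
/// Returns true when an unscaled decimal value has at most `precision` digits.
/// Assumes `precision` is in 1..=38, so 10^precision fits in an i128.
fn fits_precision(unscaled: i128, precision: u32) -> bool {
    // 10^precision is the smallest magnitude that needs precision + 1 digits.
    let bound = 10_i128.pow(precision);
    unscaled > -bound && unscaled < bound
}

/// Hypothetical overflow check: `None` stands in for a SQL NULL result.
/// No rounding happens here; the value is assumed to be rounded already.
fn check_overflow(unscaled: i128, precision: u32) -> Option<i128> {
    if fits_precision(unscaled, precision) {
        Some(unscaled)
    } else {
        None
    }
}
```

For example, with precision 5 the unscaled value 99999 passes the check, while 100000 does not.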
CORR aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double. We also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
COVAR_SAMP and COVAR_POP aggregate expressions
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double.
STDDEV and STDDEV_SAMP (standard deviation) aggregate expressions
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double. We also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
Similar to UnKnownColumn in DataFusion, but it carries a data type.
It is used only when a column is not bound to a schema, for example the
inputs to aggregation functions in a final aggregation. In that case, we cannot
bind the aggregation functions to the input schema, which in Spark consists of
grouping columns and aggregate buffer attributes (DataFusion has a different design).
However, creating certain aggregation functions requires knowing their input
data types. Because UnKnownColumn has no data type, we implement
UnboundColumn to carry it.
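The idea can be sketched as a column reference that stores its type next to its name, so the type is available without any schema lookup. This is an illustration only: the DataType enum is a small stand-in for Arrow's type, and the field names, constructor, and is_numeric helper are all hypothetical rather than the crate's actual definitions.

```rust
/// Stand-in for Arrow's DataType, reduced to a few variants for illustration.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// A column reference that is not bound to a schema but still knows its type.
#[derive(Debug, Clone, PartialEq)]
struct UnboundColumn {
    name: String,
    datatype: DataType,
}

impl UnboundColumn {
    fn new(name: &str, datatype: DataType) -> Self {
        Self { name: name.to_string(), datatype }
    }

    fn name(&self) -> &str {
        &self.name
    }

    /// The type is read from the column itself; no schema is consulted.
    fn data_type(&self) -> &DataType {
        &self.datatype
    }
}

/// Hypothetical helper: a check that an aggregate constructor might perform
/// on its input using only the carried data type.
fn is_numeric(col: &UnboundColumn) -> bool {
    matches!(col.data_type(), DataType::Int64 | DataType::Float64)
}
```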
VAR_SAMP and VAR_POP aggregate expressions
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double. We also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
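A rough sketch of how a Double count and a null_on_divide_by_zero flag could interact when finalizing a sample variance. It assumes a Welford-style running update, and that a single input row yields null when the flag is set and NaN otherwise. The struct, field, and method names are hypothetical and do not reproduce the crate's accumulator.

```rust
/// Hypothetical accumulator state. The count is an f64, mirroring Spark's
/// Double count rather than an unsigned integer.
struct VarianceState {
    count: f64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl VarianceState {
    fn new() -> Self {
        Self { count: 0.0, mean: 0.0, m2: 0.0 }
    }

    /// Welford's online update for one input value.
    fn update(&mut self, x: f64) {
        self.count += 1.0;
        let delta = x - self.mean;
        self.mean += delta / self.count;
        self.m2 += delta * (x - self.mean);
    }

    /// Sample variance; `None` stands in for SQL NULL.
    fn sample_variance(&self, null_on_divide_by_zero: bool) -> Option<f64> {
        if self.count == 0.0 {
            return None; // no input rows
        }
        if self.count == 1.0 {
            // The divisor (count - 1) is zero: the flag selects NULL or NaN.
            return if null_on_divide_by_zero { None } else { Some(f64::NAN) };
        }
        Some(self.m2 / (self.count - 1.0))
    }
}
```

With the inputs 2.0, 4.0, and 6.0, the sketch accumulates m2 = 8.0 and returns a sample variance of 4.0.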
Spark supports three evaluation modes when evaluating expressions. The mode affects
the behavior when processing input values that are invalid or would result in an
error, such as division by zero, and also affects the behavior when converting
between types.
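Spark's three modes are legacy, ANSI, and try. As a hedged illustration of how a mode changes the outcome of the same operation, the sketch below adds two 32-bit integers: on overflow, legacy wraps around, ANSI reports an error, and try yields null. The add_i32 function and its error message are hypothetical, not part of the crate.

```rust
/// The three Spark evaluation modes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

/// Hypothetical mode-aware addition. `Ok(None)` stands in for SQL NULL.
fn add_i32(a: i32, b: i32, mode: EvalMode) -> Result<Option<i32>, String> {
    match (a.checked_add(b), mode) {
        // No overflow: every mode returns the same sum.
        (Some(sum), _) => Ok(Some(sum)),
        // Legacy: overflow silently wraps around.
        (None, EvalMode::Legacy) => Ok(Some(a.wrapping_add(b))),
        // ANSI: overflow is reported as an error.
        (None, EvalMode::Ansi) => Err(format!("integer overflow: {} + {}", a, b)),
        // Try: overflow produces NULL instead of an error.
        (None, EvalMode::Try) => Ok(None),
    }
}
```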
Spark-compatible cast implementation. Defers to DataFusion’s cast where that is known
to be compatible, and returns an error when the requested cast is neither supported
nor DataFusion-compatible.