AVG aggregate expression
AVG aggregate expression
BitwiseNot expression
This mirrors Spark’s CheckOverflow expression. Spark’s CheckOverflow rounds decimals
to the given scale and checks whether they fit in the given precision. Because the cast
kernel already rounds decimals, Comet’s CheckOverflow expression only checks whether the
decimals fit in the precision.
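The precision check described above can be sketched as follows. This is a minimal illustration and not Comet's actual code: the function name `fits_in_precision` and the use of an unscaled `i128` value are assumptions made for the example.

```rust
/// Returns true when an unscaled decimal value has at most `precision` digits.
/// Assumes the value has already been rounded to the target scale
/// (for example by a cast), so only the digit count is checked here.
/// `fits_in_precision` is a hypothetical helper, not a Comet or DataFusion API.
fn fits_in_precision(unscaled: i128, precision: u32) -> bool {
    // 10^38 is the largest power of ten that fits in an i128.
    assert!(precision <= 38, "precision out of range for i128");
    // 10^precision is the smallest magnitude that needs precision + 1 digits.
    let bound = 10_i128.pow(precision);
    unscaled > -bound && unscaled < bound
}
```

For instance, the unscaled value 12345 fits in precision 5, while 123456 does not.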
CORR aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double. We have also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
COVAR_SAMP and COVAR_POP aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double.
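To illustrate what keeping the count as a Double in the aggregation state can look like, here is a minimal sketch of a covariance accumulator. It uses the standard streaming update and pairwise merge formulas and is not taken from Comet or DataFusion. The struct `CovarState` and its field names are hypothetical.

```rust
/// Hypothetical covariance state whose count is stored as f64 (Spark's Double)
/// rather than u64.
#[derive(Debug, Clone, Copy, Default)]
struct CovarState {
    count: f64,
    mean_x: f64,
    mean_y: f64,
    /// Running co-moment: sum of (x - mean_x) * (y - mean_y).
    c: f64,
}

impl CovarState {
    /// Streaming update with one (x, y) pair.
    fn update(&mut self, x: f64, y: f64) {
        self.count += 1.0;
        let dx = x - self.mean_x; // delta against the old mean of x
        self.mean_x += dx / self.count;
        self.mean_y += (y - self.mean_y) / self.count;
        self.c += dx * (y - self.mean_y); // uses the updated mean of y
    }

    /// Merge a partial state, as a final aggregation step would.
    fn merge(&mut self, other: &CovarState) {
        if other.count == 0.0 {
            return;
        }
        let n = self.count + other.count;
        let dx = other.mean_x - self.mean_x;
        let dy = other.mean_y - self.mean_y;
        self.c += other.c + dx * dy * self.count * other.count / n;
        self.mean_x += dx * other.count / n;
        self.mean_y += dy * other.count / n;
        self.count = n;
    }

    /// Population covariance; None models SQL NULL for empty input.
    fn covar_pop(&self) -> Option<f64> {
        if self.count == 0.0 {
            None
        } else {
            Some(self.c / self.count)
        }
    }
}
```

Because every field is an f64, the whole state can be serialized as a set of Double columns, which is the shape the description above attributes to Spark.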
IfExpr is a wrapper around CaseExpr, because IF(a, b, c) is semantically equivalent to
CASE WHEN a THEN b ELSE c END.
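The equivalence can be sketched with a toy expression tree. The `Expr` enum, `if_expr`, and `eval` below are hypothetical names for illustration only. They are not the DataFusion `CaseExpr` API.

```rust
/// Toy expression tree; not the DataFusion physical expression types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Bool(bool),
    Int(i64),
    Case {
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// Builds IF(cond, t, f) as CASE WHEN cond THEN t ELSE f END.
fn if_expr(cond: Expr, t: Expr, f: Expr) -> Expr {
    Expr::Case {
        when_then: vec![(cond, t)],
        else_expr: Some(Box::new(f)),
    }
}

/// Evaluates an expression; None models SQL NULL.
fn eval(e: &Expr) -> Option<Expr> {
    match e {
        Expr::Case { when_then, else_expr } => {
            // The first branch whose condition is true wins.
            for (when, then) in when_then {
                if eval(when) == Some(Expr::Bool(true)) {
                    return eval(then);
                }
            }
            // No branch matched: fall back to ELSE, or NULL if absent.
            else_expr.as_ref().and_then(|e| eval(e))
        }
        literal => Some(literal.clone()),
    }
}
```

Since the wrapper only constructs a single-branch CASE, it needs no evaluation logic of its own.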
Negative expression
Implementation of the RLIKE operator.
Spark cast options
An implementation of DataFusion’s SchemaAdapterFactory that uses a Spark-compatible
cast implementation.
STDDEV and STDDEV_SAMP (standard deviation) aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double. We have also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
to_json function
This is similar to UnKnownColumn in DataFusion, but it carries a data type.
It is only used when the column is not bound to a schema, for example, for the
inputs to aggregation functions in final aggregation. In that case, we cannot
bind the aggregation functions to the input schema, which in Spark consists of
grouping columns and aggregate buffer attributes (DataFusion has a different design).
But when creating certain aggregation functions, we need to know their input
data types. As UnKnownColumn doesn’t have a data type, we implement this
UnboundColumn to carry the data type.
VAR_SAMP and VAR_POP aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double. We have also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
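As a rough illustration of how a null_on_divide_by_zero flag might affect the final result, the sketch below evaluates a sample variance from a Double count and a sum of squared deviations `m2`. It assumes that the flag selects between NULL and NaN when the divisor would be zero, following Spark's handling of statistical aggregates. The function name `var_samp` and its signature are hypothetical, not Comet's API.

```rust
/// Sample variance from an accumulated state; None models SQL NULL.
/// `count` is an f64 to match Spark's Double count.
/// Assumption: with a single row the divisor (count - 1) is zero, and the
/// flag decides whether the result is NULL or NaN.
fn var_samp(count: f64, m2: f64, null_on_divide_by_zero: bool) -> Option<f64> {
    if count == 0.0 {
        // No input rows: the result is NULL regardless of the flag.
        None
    } else if count == 1.0 {
        if null_on_divide_by_zero {
            None
        } else {
            Some(f64::NAN)
        }
    } else {
        Some(m2 / (count - 1.0))
    }
}
```

With three rows and `m2 = 2.0`, the result is `Some(1.0)` whichever way the flag is set. The flag only matters in the single-row case.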