AVG aggregate expression
AVG aggregate expression
BitwiseNot expression
This is derived from Spark’s CheckOverflow expression. Spark’s CheckOverflow
expression rounds decimals to the given scale and checks whether the decimals
fit in the given precision. As the cast kernel already rounds decimals, Comet’s
CheckOverflow expression only checks whether the decimals fit in the precision.
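The precision check can be illustrated with a minimal sketch. It assumes the decimal is stored as an unscaled i128 value and that an overflow is reported as a missing value; the names fits_precision and check_overflow are hypothetical and not part of Comet’s API.

```rust
/// Hypothetical helper: true when `unscaled` has at most `precision` decimal digits.
fn fits_precision(unscaled: i128, precision: u32) -> bool {
    // 10^precision is the smallest magnitude that needs precision + 1 digits.
    match 10i128.checked_pow(precision) {
        Some(bound) => unscaled > -bound && unscaled < bound,
        // The bound itself does not fit in i128, so every i128 value fits.
        None => true,
    }
}

/// Hypothetical check: keeps the value when it fits, otherwise yields None
/// (an assumption standing in for a null result on overflow).
fn check_overflow(unscaled: i128, precision: u32) -> Option<i128> {
    if fits_precision(unscaled, precision) {
        Some(unscaled)
    } else {
        None
    }
}
```

No rounding happens here: the sketch assumes the value was already rounded to the target scale by an earlier step.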
CORR aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double for count. We have also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
COVAR_SAMP and COVAR_POP aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double for count.
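As a rough illustration of an aggregation state whose count is a Double, the sketch below keeps a running covariance with an f64 count. The struct and method names are hypothetical, and returning None for too few rows is a simplifying assumption rather than a statement about Comet’s behavior.

```rust
/// Hypothetical running state for covariance; `count` is f64, mirroring
/// the Double count described above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CovarState {
    count: f64,
    mean_x: f64,
    mean_y: f64,
    co_moment: f64,
}

impl CovarState {
    fn new() -> Self {
        CovarState { count: 0.0, mean_x: 0.0, mean_y: 0.0, co_moment: 0.0 }
    }

    /// Online update: uses the delta against the old x mean and the new y mean.
    fn update(&mut self, x: f64, y: f64) {
        self.count += 1.0;
        let dx = x - self.mean_x;
        self.mean_x += dx / self.count;
        self.mean_y += (y - self.mean_y) / self.count;
        self.co_moment += dx * (y - self.mean_y);
    }

    /// Population covariance; None for an empty input (assumption).
    fn covar_pop(&self) -> Option<f64> {
        if self.count == 0.0 { None } else { Some(self.co_moment / self.count) }
    }

    /// Sample covariance; None when fewer than two rows were seen (assumption).
    fn covar_samp(&self) -> Option<f64> {
        if self.count < 2.0 { None } else { Some(self.co_moment / (self.count - 1.0)) }
    }
}
```

Because the count is already an f64, the final divisions need no integer-to-float conversion.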
IfExpr is a wrapper around CaseExpr, because IF(a, b, c) is semantically
equivalent to CASE WHEN a THEN b ELSE c END.
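The equivalence can be sketched on scalar values. This is a simplified model, not the actual IfExpr or CaseExpr types: the functions case_when and if_expr are hypothetical, and a NULL condition is modeled as None.

```rust
/// Hypothetical scalar model of CASE WHEN ... THEN ... ELSE ... END.
/// The first branch whose condition is true wins; NULL (None) and false are skipped.
fn case_when<T: Clone>(branches: &[(Option<bool>, T)], else_value: T) -> T {
    for (cond, value) in branches {
        if *cond == Some(true) {
            return value.clone();
        }
    }
    else_value
}

/// IF(a, b, c) expressed as a single-branch CASE.
fn if_expr<T: Clone>(cond: Option<bool>, then_value: T, else_value: T) -> T {
    case_when(&[(cond, then_value)], else_value)
}
```

Delegating to the CASE logic means the conditional semantics, including NULL conditions falling through to the ELSE value, live in one place.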
Negative expression
Implementation of RLIKE operator.
Spark cast options
An implementation of DataFusion’s SchemaAdapterFactory that uses a
Spark-compatible cast implementation.
STDDEV and STDDEV_SAMP (standard deviation) aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double for count. We have also added null_on_divide_by_zero
to be consistent with Spark’s implementation.
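One way a null_on_divide_by_zero flag could affect the final result is sketched below. The function stddev_samp and its arguments are hypothetical, and the choice between a missing value and NaN for a single row is an assumption modeled on Spark’s legacy statistical-aggregate behavior, not taken from Comet’s source.

```rust
/// Hypothetical finalization step for a sample standard deviation.
/// `count` is the number of rows as f64; `m2` is the sum of squared
/// deviations from the mean.
fn stddev_samp(count: f64, m2: f64, null_on_divide_by_zero: bool) -> Option<f64> {
    if count == 0.0 {
        // No input rows: no result.
        None
    } else if count == 1.0 {
        // The divisor (count - 1) is zero; the flag picks null or NaN (assumption).
        if null_on_divide_by_zero { None } else { Some(f64::NAN) }
    } else {
        Some((m2 / (count - 1.0)).sqrt())
    }
}
```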
to_json function
This is similar to UnKnownColumn in DataFusion, but it carries a data type.
It is only used when the column is not bound to a schema, for example, the
inputs to aggregation functions in final aggregation. In that case, we cannot
bind the aggregation functions to the input schema, which in Spark consists of
the grouping columns and aggregate buffer attributes (DataFusion has a
different design). However, when creating certain aggregation functions, we
need to know their input data types. As UnKnownColumn doesn’t have a data type,
we implement UnboundColumn to carry it.
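The idea can be sketched with a stand-in type. The DataType enum and the UnboundColumn struct below are simplified, hypothetical models for illustration; the real types come from Arrow and Comet and have different definitions.

```rust
/// Stand-in for an Arrow data type (hypothetical, for illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Simplified model of a column that is not bound to any schema
/// but still knows its own data type.
#[derive(Debug, Clone, PartialEq)]
struct UnboundColumn {
    name: String,
    data_type: DataType,
}

impl UnboundColumn {
    fn new(name: &str, data_type: DataType) -> Self {
        UnboundColumn { name: name.to_string(), data_type }
    }

    /// The data type is available without looking the column up in a schema.
    fn data_type(&self) -> &DataType {
        &self.data_type
    }
}
```

Code that builds an aggregation function can then read the input type directly from the column instead of resolving it against an input schema.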
VAR_SAMP and VAR_POP aggregate expression
The implementation is mostly the same as DataFusion’s. We maintain our own
implementation because DataFusion uses UInt64 for the state_field count,
while Spark uses Double for count. We have also added null_on_divide_by_zero
to be consistent with Spark’s implementation.