Crate datafusion_comet_spark_expr

Re-exports§

Modules§

Macros§

Structs§

  • AVG aggregate expression
  • AVG aggregate expression
  • BitwiseNot expression
  • Based on Spark's CheckOverflow expression, which rounds decimals to a given scale and checks whether they fit in the given precision. Because the cast kernel already rounds decimals, Comet's CheckOverflow expression only checks whether the decimals fit in the precision.
  • CORR aggregate expression. The implementation is mostly the same as DataFusion's. Comet has its own implementation because DataFusion uses UInt64 for the state_field count, while Spark uses Double. Comet also adds null_on_divide_by_zero to be consistent with Spark's implementation.
  • COVAR_SAMP and COVAR_POP aggregate expression. The implementation is mostly the same as DataFusion's. Comet has its own implementation because DataFusion uses UInt64 for the state_field count, while Spark uses Double.
  • IfExpr is a wrapper around CaseExpr, because IF(a, b, c) is semantically equivalent to CASE WHEN a THEN b ELSE c END.
  • Negative expression
  • Implementation of RLIKE operator.
  • Spark cast options
  • An implementation of DataFusion’s SchemaAdapterFactory that uses a Spark-compatible cast implementation.
  • STDDEV and STDDEV_SAMP (standard deviation) aggregate expression. The implementation is mostly the same as DataFusion's. Comet has its own implementation because DataFusion uses UInt64 for the state_field count, while Spark uses Double. Comet also adds null_on_divide_by_zero to be consistent with Spark's implementation.
  • to_json function
  • Similar to DataFusion's UnKnownColumn, but carries a data type. It is used only when a column is not bound to a schema, for example for the inputs to aggregation functions in a final aggregation. In that case, the aggregation functions cannot be bound to the input schema, which in Spark consists of grouping columns and aggregate buffer attributes (DataFusion's design differs). Creating certain aggregation functions still requires knowing their input data types. Because UnKnownColumn has no data type, UnboundColumn is implemented to carry it.
  • VAR_SAMP and VAR_POP aggregate expression. The implementation is mostly the same as DataFusion's. Comet has its own implementation because DataFusion uses UInt64 for the state_field count, while Spark uses Double. Comet also adds null_on_divide_by_zero to be consistent with Spark's implementation.
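
Several of the aggregates above mention a Double count and a null_on_divide_by_zero flag. The following is a minimal, standalone sketch of how such a flag could affect the final step of a sample variance, assuming the accumulated state is a count and a sum of squared deviations (m2). The function name var_samp_from_state and its parameters are hypothetical and are not part of this crate's API; the sketch assumes that a single-row input yields either null or NaN depending on the flag.

```rust
/// Hypothetical helper (not part of this crate): final step of a sample
/// variance computed from an accumulated state.
///
/// `count` is a Double (f64), mirroring the Spark-style state described above.
/// `m2` is the running sum of squared deviations from the mean.
/// `None` stands in for a SQL NULL result.
fn var_samp_from_state(count: f64, m2: f64, null_on_divide_by_zero: bool) -> Option<f64> {
    if count == 0.0 {
        // No input rows: the result is NULL.
        None
    } else if count == 1.0 {
        // The divisor (count - 1) is zero. Assumption: the flag selects
        // between NULL and NaN for this case.
        if null_on_divide_by_zero {
            None
        } else {
            Some(f64::NAN)
        }
    } else {
        Some(m2 / (count - 1.0))
    }
}

fn main() {
    // Three values 1.0, 2.0, 3.0 have mean 2.0 and m2 = 1 + 0 + 1 = 2.
    println!("{:?}", var_samp_from_state(3.0, 2.0, true));
    // A single row with the flag set yields NULL (None).
    println!("{:?}", var_samp_from_state(1.0, 0.0, true));
}
```

Keeping the count as f64 means the division needs no integer-to-float conversion, which is one plausible reason for matching Spark's Double state rather than UInt64.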

Enums§

  • Spark supports three evaluation modes for expressions. The mode determines the behavior when input values are invalid or would result in an error, such as division by zero, and also affects behavior when converting between types.
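
To illustrate how an evaluation mode can change the outcome of the same operation, here is a small standalone sketch using 32-bit integer addition. The enum Mode and the function add_i32 are hypothetical names defined only for this example and are not this crate's types. The sketch assumes the three modes correspond to Spark's legacy, ANSI, and try behaviors: wrap on overflow, raise an error, and return null, respectively.

```rust
/// Hypothetical stand-in for an evaluation-mode enum (not this crate's type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Legacy,
    Ansi,
    Try,
}

/// Adds two i32 values under the given mode.
/// `Ok(None)` stands in for a SQL NULL result; `Err` stands in for a query error.
fn add_i32(a: i32, b: i32, mode: Mode) -> Result<Option<i32>, String> {
    match mode {
        // Assumed legacy behavior: overflow wraps around silently.
        Mode::Legacy => Ok(Some(a.wrapping_add(b))),
        // Assumed ANSI behavior: overflow is reported as an error.
        Mode::Ansi => a
            .checked_add(b)
            .map(Some)
            .ok_or_else(|| "integer overflow".to_string()),
        // Assumed try behavior: overflow produces NULL instead of an error.
        Mode::Try => Ok(a.checked_add(b)),
    }
}

fn main() {
    for mode in [Mode::Legacy, Mode::Ansi, Mode::Try] {
        println!("{:?}: {:?}", mode, add_i32(i32::MAX, 1, mode));
    }
}
```

For inputs that do not overflow, all three modes return the same value; they differ only in how the invalid case is surfaced.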

Functions§

Type Aliases§