Crate datafusion_comet_spark_expr

Re-exports

- pub use hash_funcs::*;

Modules

Macros
- create_hashes_internal - Creates hash values for every row, based on the values in the columns.
- downcast_compute_op
- hash_array
- hash_array_boolean
- hash_array_decimal
- hash_array_primitive
- hash_array_primitive_float
- hash_array_small_decimal
- test_hashes_internal
- test_hashes_with_nulls
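The per-row, per-column update that create_hashes_internal performs can be sketched as below. The mix combiner and the plain u32 columns here are simplified stand-ins (names are illustrative, not the crate's API), and the combiner is not the Spark-compatible hash Comet actually implements; the sketch only shows the update order: every column's value is folded into the corresponding row's running hash.

```rust
// Simplified sketch: fold each column's value into that row's running hash.
// `mix` is a stand-in combiner, NOT the Spark-compatible hash Comet uses.
fn mix(hash: u32, value: u32) -> u32 {
    hash.rotate_left(5)
        .wrapping_add(value)
        .wrapping_mul(0x9e37_79b9)
}

fn create_hashes(columns: &[Vec<u32>], hashes: &mut [u32]) {
    for col in columns {
        for (row, &v) in col.iter().enumerate() {
            hashes[row] = mix(hashes[row], v);
        }
    }
}

fn main() {
    let cols = vec![vec![1u32, 2, 3], vec![10, 20, 30]];
    let mut hashes = vec![42u32; 3]; // one seed-initialized hash per row
    create_hashes(&cols, &mut hashes);
    assert_ne!(hashes[0], hashes[1]); // different rows hash differently
    println!("{hashes:?}");
}
```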
Structs
- ArrayInsert
- Avg - AVG aggregate expression
- AvgDecimal - AVG aggregate expression
- Cast
- CheckOverflow - This is from Spark's CheckOverflow expression. Spark's CheckOverflow expression rounds decimals to a given scale and checks whether the decimals can fit in a given precision. As the cast kernel already rounds decimals, Comet's CheckOverflow expression only checks whether the decimals can fit in the precision.
- Contains
- Correlation - CORR aggregate expression. The implementation is mostly the same as DataFusion's; the reason we have our own is that DataFusion uses UInt64 for the count state field, while Spark uses Double. We have also added null_on_divide_by_zero to be consistent with Spark's implementation.
- Covariance - COVAR_SAMP and COVAR_POP aggregate expression. The implementation is mostly the same as DataFusion's; the reason we have our own is that DataFusion uses UInt64 for the count state field, while Spark uses Double.
- CreateNamedStruct
- EndsWith
- GetArrayStructFields
- GetStructField
- IfExpr - IfExpr is a wrapper around CaseExpr, because IF(a, b, c) is semantically equivalent to CASE WHEN a THEN b ELSE c END.
- Like
- ListExtract
- NegativeExpr - Negative expression
- NormalizeNaNAndZero
- RLike - Implementation of the RLIKE operator.
- RandExpr
- SparkBitwiseCount
- SparkBitwiseGet
- SparkBitwiseNot
- SparkCastOptions - Spark cast options
- SparkChrFunc - Spark-compatible chr expression
- SparkDateTrunc
- SparkHour
- SparkMinute
- SparkSecond
- StartsWith
- Stddev - STDDEV and STDDEV_SAMP (standard deviation) aggregate expression. The implementation is mostly the same as DataFusion's; the reason we have our own is that DataFusion uses UInt64 for the count state field, while Spark uses Double. We have also added null_on_divide_by_zero to be consistent with Spark's implementation.
- StringSpaceExpr
- SubstringExpr
- SumDecimal
- TimestampTruncExpr
- ToJson - to_json function
- UnboundColumn - This is similar to UnKnownColumn in DataFusion, but it has a data type. It is only used when the column is not bound to a schema, for example, the inputs to aggregation functions in a final aggregation. In that case, we cannot bind the aggregation functions to the input schema, which in Spark consists of grouping columns and aggregate buffer attributes (DataFusion has a different design). But when creating certain aggregation functions, we need to know their input data types. As UnKnownColumn doesn't carry a data type, we implement UnboundColumn to carry it.
- Variance - VAR_SAMP and VAR_POP aggregate expression. The implementation is mostly the same as DataFusion's; the reason we have our own is that DataFusion uses UInt64 for the count state field, while Spark uses Double. We have also added null_on_divide_by_zero to be consistent with Spark's implementation.
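Several of the aggregate expressions above (Correlation, Covariance, Stddev, Variance) share the null_on_divide_by_zero flag and the Double-typed count. A minimal sketch of what that flag means, using sample variance and illustrative names rather than the crate's actual API:

```rust
// Illustrative sketch, not the crate's API: sample variance with Spark's
// `null_on_divide_by_zero` behavior. The count is kept as f64 (Double),
// matching the Spark semantics described above.
fn sample_variance(values: &[f64], null_on_divide_by_zero: bool) -> Option<f64> {
    let count = values.len() as f64; // Spark tracks count as Double, not UInt64
    if count <= 1.0 {
        // The divisor (count - 1) would be zero: return NULL or NaN per the flag.
        return if null_on_divide_by_zero { None } else { Some(f64::NAN) };
    }
    let mean = values.iter().sum::<f64>() / count;
    let m2: f64 = values.iter().map(|v| (v - mean) * (v - mean)).sum();
    Some(m2 / (count - 1.0))
}

fn main() {
    assert_eq!(sample_variance(&[1.0, 2.0, 3.0], true), Some(1.0));
    assert_eq!(sample_variance(&[5.0], true), None); // NULL on divide-by-zero
    assert!(sample_variance(&[5.0], false).unwrap().is_nan());
    println!("ok");
}
```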
Enums

- EvalMode - Spark supports three evaluation modes when evaluating expressions. They affect the behavior when processing input values that are invalid or would result in an error, such as divide-by-zero errors, and also affect behavior when converting between types.
- SparkError
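The three evaluation modes can be illustrated with integer division by zero. This is a sketch of Spark's documented semantics (LEGACY and TRY yield NULL, ANSI raises an error), not the crate's actual EvalMode definition:

```rust
// Illustrative sketch of the three modes applied to divide-by-zero;
// this is not the crate's actual EvalMode implementation.
#[derive(Clone, Copy, Debug)]
enum EvalMode {
    Legacy, // pre-ANSI Spark behavior: invalid input yields NULL
    Ansi,   // ANSI SQL behavior: raise an error
    Try,    // like Ansi, but errors become NULL instead of failing the query
}

fn divide(a: i64, b: i64, mode: EvalMode) -> Result<Option<i64>, String> {
    if b == 0 {
        return match mode {
            EvalMode::Legacy | EvalMode::Try => Ok(None), // NULL result
            EvalMode::Ansi => Err("DIVIDE_BY_ZERO".to_string()),
        };
    }
    Ok(Some(a / b))
}

fn main() {
    assert_eq!(divide(6, 3, EvalMode::Legacy), Ok(Some(2)));
    assert_eq!(divide(1, 0, EvalMode::Legacy), Ok(None));
    assert!(divide(1, 0, EvalMode::Ansi).is_err());
    assert_eq!(divide(1, 0, EvalMode::Try), Ok(None));
    println!("ok");
}
```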
Functions

- create_comet_physical_fun - Create a physical scalar function.
- create_negate_expr
- register_all_comet_functions - Registers all custom UDFs
- spark_array_repeat
- spark_cast - Spark-compatible cast implementation. Defers to DataFusion's cast where that is known to be compatible, and returns an error when an unsupported, non-DataFusion-compatible cast is requested.
- spark_ceil - ceil function that simulates the Spark ceil expression
- spark_date_add
- spark_date_sub
- spark_decimal_div
- spark_decimal_integral_div
- spark_floor - floor function that simulates the Spark floor expression
- spark_hex - Spark-compatible hex function
- spark_isnan - Spark-compatible isnan expression
- spark_make_decimal - Spark-compatible MakeDecimal expression (internal to the Spark optimizer)
- spark_read_side_padding - Similar to DataFusion's rpad, but does not truncate when the string is already longer than length
- spark_round - round function that simulates the Spark round expression
- spark_rpad - Custom rpad, because DataFusion's rpad has differences in Unicode handling
- spark_unhex - Spark-compatible unhex expression
- spark_unscaled_value - Spark-compatible UnscaledValue expression (internal to the Spark optimizer)
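To make the padding-related differences concrete, here is a sketch of the read-side padding behavior described for spark_read_side_padding: pad with spaces up to a target length counted in characters rather than bytes (the kind of Unicode distinction mentioned for spark_rpad), and never truncate an already-long input. The function name and signature are illustrative, not the crate's actual API.

```rust
// Illustrative sketch only: pads to `len` characters, never truncates.
fn read_side_pad(s: &str, len: usize) -> String {
    let n = s.chars().count(); // count Unicode scalar values, not bytes
    if n >= len {
        s.to_string() // already long enough: returned unchanged (no truncation)
    } else {
        let mut out = String::from(s);
        out.extend(std::iter::repeat(' ').take(len - n));
        out
    }
}

fn main() {
    assert_eq!(read_side_pad("ab", 4), "ab  ");
    assert_eq!(read_side_pad("hello", 3), "hello"); // a plain rpad would truncate to "hel"
    assert_eq!(read_side_pad("héé", 5), "héé  "); // 3 chars (5 bytes): pads by chars
    println!("ok");
}
```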