Spark expression packages for DataFusion.
This crate contains a collection of Spark-compatible function packages for DataFusion, implemented using the extension API.
§Available Function Packages
See the list of modules in this crate for available packages.
§Example: using all function packages
You can register all the functions in all packages using the register_all
function as shown below. Any existing functions will be overwritten, with these
Spark functions taking priority.
```rust
use datafusion::prelude::*;

// Create a new session context
let mut ctx = SessionContext::new();
// Register all Spark functions with the context
datafusion_spark::register_all(&mut ctx)?;
// Run a query using the `sha2` function, which is now available with Spark semantics
let df = ctx.sql("SELECT sha2('The input String', 256)").await?;
```
§Example: calling a specific function in Rust
Each package also exports an expr_fn submodule that creates Exprs for
invoking functions from Rust using a fluent style. For example, to invoke the
sha2 function, you can use the following code:
```rust
use datafusion::prelude::{col, lit};
use datafusion_spark::expr_fn::sha2;

// Create the expression `sha2(my_data, 256)`
let expr = sha2(col("my_data"), lit(256));
```
§Example: using the Spark expression planner
The planner::SparkFunctionPlanner provides Spark-compatible expression
planning, such as mapping SQL EXTRACT expressions to Spark’s date_part
function. To use it, register it with your session context:
```rust
use std::sync::Arc;
use datafusion::prelude::SessionContext;
use datafusion_spark::planner::SparkFunctionPlanner;

let mut ctx = SessionContext::new();
// Register the Spark expression planner
ctx.register_expr_planner(Arc::new(SparkFunctionPlanner))?;
// Now EXTRACT expressions will use Spark semantics
let df = ctx.sql("SELECT EXTRACT(YEAR FROM timestamp_col) FROM my_table").await?;
```
§Example: enabling Apache Spark features with SessionStateBuilder
The recommended way to enable Apache Spark compatibility is to use the
SessionStateBuilderSpark extension trait. This registers all
Apache Spark functions (scalar, aggregate, window, and table) as well as the Apache Spark
expression planner.
Enable the core feature in your Cargo.toml:
```toml
datafusion-spark = { version = "X", features = ["core"] }
```
Then use the extension trait - see SessionStateBuilderSpark::with_spark_features
for an example.
Modules§
Traits§
- SessionStateBuilderSpark (`core`) - Extension trait for adding Apache Spark features to
SessionStateBuilder.
Functions§
- all_default_aggregate_functions - Returns all default aggregate functions
- all_default_scalar_functions - Returns all default scalar functions
- all_default_table_functions - Returns all default table functions
- all_default_window_functions - Returns all default window functions
- register_all - Registers all enabled packages with a
FunctionRegistry, overriding any existing functions if there is a name clash.