pub trait HigherOrderUDFImpl:
Debug
+ DynEq
+ DynHash
+ Send
+ Sync
+ Any {
Show 13 methods
// Required methods
fn name(&self) -> &str;
fn signature(&self) -> &HigherOrderSignature;
fn lambda_parameters(
&self,
step: usize,
fields: &[ValueOrLambda<FieldRef, Option<FieldRef>>],
) -> Result<LambdaParametersProgress>;
fn return_field_from_args(
&self,
args: HigherOrderReturnFieldArgs<'_>,
) -> Result<FieldRef>;
fn invoke_with_args(
&self,
args: HigherOrderFunctionArgs,
) -> Result<ColumnarValue>;
// Provided methods
fn aliases(&self) -> &[String] { ... }
fn schema_name(&self, args: &[Expr]) -> Result<String> { ... }
fn coerce_values_for_lambdas(
&self,
_fields: &[ValueOrLambda<DataType, DataType>],
) -> Result<Option<Vec<DataType>>> { ... }
fn clear_null_values(&self) -> bool { ... }
fn short_circuits(&self) -> bool { ... }
fn conditional_arguments<'a>(
&self,
args: &'a [Expr],
) -> Option<(Vec<&'a Expr>, Vec<&'a Expr>)> { ... }
fn coerce_value_types(
&self,
_arg_types: &[DataType],
) -> Result<Vec<DataType>> { ... }
fn documentation(&self) -> Option<&Documentation> { ... }
}Expand description
Trait for implementing user defined higher order functions.
This trait exposes the full API for implementing user defined functions and can be used to implement any function.
New higher order functions typically implement this trait and are then
wrapped in a HigherOrderUDF for registration with DataFusion.
See array_transform.rs for a commented complete implementation
Required Methods§
Sourcefn signature(&self) -> &HigherOrderSignature
fn signature(&self) -> &HigherOrderSignature
Returns a HigherOrderSignature describing the argument types for which this
function has an implementation, and the function’s Volatility.
See HigherOrderSignature for more details on argument type handling
and Self::return_field_from_args for computing the return type.
Sourcefn lambda_parameters(
&self,
step: usize,
fields: &[ValueOrLambda<FieldRef, Option<FieldRef>>],
) -> Result<LambdaParametersProgress>
fn lambda_parameters( &self, step: usize, fields: &[ValueOrLambda<FieldRef, Option<FieldRef>>], ) -> Result<LambdaParametersProgress>
Return the field of all the parameters supported by the lambdas in fields.
If a lambda support multiple parameters, all should be returned, regardless of
whether they are used or not on a particular invocation
Tip: If you have a HigherOrderFunction invocation, you can call the helper
HigherOrderFunction::lambda_parameters instead of this method directly
The name of the returned fields are ignored.
This function is repeatedelly called until LambdaParametersProgress::Complete is returned, with
step increased by one at each invocation, starting at 0.
For functions which all lambda parameters depend only on the field of it’s value arguments,
this can return LambdaParametersProgress::Complete at step 0. Taking as an example a strict
array_reduce with the signature (arr: [V], initial_value: I, (I, V) -> I, (I) -> O) -> O, which
requires it’s initial value to be the exact same type of it’s merge output, which is also the
parameter of it’s finish lambda, the expression
array_reduce([1.2, 2.1], 0.0, (acc, v) -> acc + v + 1.5, v -> v > 5.1)
would result in this function being called as the following:
let lambda_parameters = array_reduce.lambda_parameters(
0,
&[
// the Field of the literal `[1.2, 2.1]`, the array being reduced
ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
// the Field of the literal `0.0`, the initial value
ValueOrLambda::Value(Arc::new(Field::new("", DataType::Float32, true))),
// the Field of the output of the merge lambda, which is unknown at this point because it depends
// on the return of this call
ValueOrLambda::Lambda(None),
// the Field of the output of the finish lambda, unknown for the same reason as above
ValueOrLambda::Lambda(None),
])?;
assert_eq!(
lambda_parameters,
LambdaParametersProgress::Complete(vec![
// the finish lambda supported parameters, regardless of how many are actually used
vec![
// the accumulator which is the field of the initial value
Arc::new(Field::new("ignored_name", DataType::Float32, true)),
// the array values being reduced
Arc::new(Field::new("", DataType::Float32, true)),
],
// the merge lambda supported parameters
vec![
// the reduced value which is the field of the initial value
Arc::new(Field::new("ignored_name", DataType::Float32, true)),
],
])
);For functions which lambda parameters depends on the output of other lambdas, or on their own lambda,
this can return LambdaParametersProgress::Partial until all dependencies are met. Note that for
lambda with cyclic dependencies, you likely want to use HigherOrderUDFImpl::coerce_values_for_lambdas too.
Take as an example a flexible array_reduce with the signature (arr: [V], initial_value: I, (ACC, V) -> ACC, (ACC) -> O) -> O.
It has a cyclic dependency in the merge lambda, and a dependency of the finish lambda in the merge lambda,
and only requires the initial value to be coercible to the output of the merge lambda, which is defined by
it’s HigherOrderUDFImpl::coerce_values_for_lambdas implementation. The expression
array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 5.1)
would result in this function being called as the following:
let lambda_parameters = array_reduce.lambda_parameters(
0,
&[
// the Field of the literal `[1.2, 2.1]`, the array being reduced
ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
// the Field of the literal `0`, the initial value
ValueOrLambda::Value(Arc::new(Field::new("", DataType::Int32, true))),
// the Field of the output of the merge lambda, which is unknown at this point because it depends on
// the return this call
ValueOrLambda::Lambda(None),
// the Field of the output of the finish lambda, unknown for the same reason as above
ValueOrLambda::Lambda(None),
])?;
assert_eq!(
lambda_parameters,
LambdaParametersProgress::Partial(vec![
// the finish lambda supported parameters, regardless of how many are actually used
Some(vec![
// at step 0, use the field of the initial value
Arc::new(Field::new("ignored_name", DataType::Int32, true)),
// the array values being reduced
Arc::new(Field::new("", DataType::Float32, true)),
]),
// the merge lambda supported parameters, unknown at this point due to dependency on the merge output
None,
])
);
let lambda_parameters = array_reduce.lambda_parameters(
1,
&[
// the Field of the literal `[1.2, 2.1]`, the array being reduced
ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
// the Field of the literal `0`, the initial value
ValueOrLambda::Value(Arc::new(Field::new("", DataType::Int32, true))),
// the Field of the output of the merge lambda, which could be inferred to be a Float32 based on the
// returned values of the previous step
ValueOrLambda::Value(Arc::new(Field::new("", DataType::Float32, true))),
// the Field of the output of the finish lambda, which is unknown at this point because it depends
// on the return of this call
ValueOrLambda::Lambda(None),
])?;
assert_eq!(
lambda_parameters,
LambdaParametersProgress::Complete(vec![
// the finish lambda supported parameters, regardless of how many are actually used
vec![
// the finish lambda own output now used as it's accumulator
Arc::new(Field::new("ignored_name", DataType::Float32, true)),
// the array values being reduced
Arc::new(Field::new("", DataType::Float32, true)),
],
// the merge lambda supported parameters, which is the output of the merge lambda,
vec![
// the output of the merge lambda
Arc::new(Field::new("", DataType::Float32, true)),
],
])
);
let coerce_to = array_reduce.coerce_values_for_lambdas(&[
// the literal `[1.2, 2.1]` data type, the array being reduced
ValueOrLambda::Value(DataType::new_list(DataType::Float32, true)),
// the literal `0` data type, the initial value
ValueOrLambda::Value(DataType::Int32),
// the output data type of the merge lambda
ValueOrLambda::Lambda(DataType::Float32),
// the output data type of the finish lambda
ValueOrLambda::Lambda(DataType::Boolean),
])?;
assert_eq!(
coerce_to,
Some(vec![
// return the same type for the array being reduced
DataType::new_list(DataType::Float32, true),
// coerce the initial value to the output of the merge lambda
DataType::Float32,
])
);
Note this may also be called at step 0 with all lambda outputs already set, and in that case, LambdaParametersProgress::Complete must be returned
The implementation can assume that some other part of the code has coerced
the actual argument types to match Self::signature, except the coercion defined by
Self::coerce_values_for_lambdas.
Sourcefn return_field_from_args(
&self,
args: HigherOrderReturnFieldArgs<'_>,
) -> Result<FieldRef>
fn return_field_from_args( &self, args: HigherOrderReturnFieldArgs<'_>, ) -> Result<FieldRef>
What type will be returned by this function, given the arguments?
The implementation can assume that some other part of the code has coerced
the actual argument types to match Self::signature, including the coercion
defined by Self::coerce_values_for_lambdas.
§Example creating Field
Note the name of the Field is ignored, except for structured types such as
DataType::Struct.
fn return_field_from_args(&self, args: HigherOrderReturnFieldArgs) -> Result<FieldRef> {
let field = Arc::new(Field::new("ignored_name", DataType::Int32, true));
Ok(field)
}Sourcefn invoke_with_args(
&self,
args: HigherOrderFunctionArgs,
) -> Result<ColumnarValue>
fn invoke_with_args( &self, args: HigherOrderFunctionArgs, ) -> Result<ColumnarValue>
Invoke the function returning the appropriate result.
§Performance
For the best performance, the implementations should handle the common case
when one or more of their arguments are constant values (aka
ColumnarValue::Scalar).
ColumnarValue::values_to_arrays can be used to convert the arguments
to arrays, which will likely be simpler code, but be slower.
Provided Methods§
Sourcefn aliases(&self) -> &[String]
fn aliases(&self) -> &[String]
Returns any aliases (alternate names) for this function.
Aliases can be used to invoke the same function using different names.
For example in some databases now() and current_timestamp() are
aliases for the same function. This behavior can be obtained by
returning current_timestamp as an alias for the now function.
Note: aliases should only include names other than Self::name.
Defaults to [] (no aliases)
Sourcefn schema_name(&self, args: &[Expr]) -> Result<String>
fn schema_name(&self, args: &[Expr]) -> Result<String>
Returns the name of the column this expression would create
See Expr::schema_name for details
Sourcefn coerce_values_for_lambdas(
&self,
_fields: &[ValueOrLambda<DataType, DataType>],
) -> Result<Option<Vec<DataType>>>
fn coerce_values_for_lambdas( &self, _fields: &[ValueOrLambda<DataType, DataType>], ) -> Result<Option<Vec<DataType>>>
Coerce value arguments of a function call to types that the function can evaluate also taking into account the output type of it’s lambdas. This differs from HigherOrderUDFImpl::coerce_value_types that only has access to the type of it’s value arguments because it’s called before the output type of lambdas are known.
See the type coercion module documentation for more details on type coercion
§Parameters
fields: The argument types of the value arguments of this function, or the output type of lambdas
§Return value
If Some, contains a Vec with the same number of ValueOrLambda::Value in fields.
DataFusion will CAST the function call arguments to these specific types. If None, no
coercion will be applied beyond the one defined by the function signature.
For example, a flexible array_reduce implementation (see Self::lambda_parameters docs), when working
with the expression below, may want to coerce it’s initial value argument, the integer 0,
to match the output of it’s merge function, which is a float:
array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 2.0)
Sourcefn clear_null_values(&self) -> bool
fn clear_null_values(&self) -> bool
Whether List or LargeList arguments should have it’s non-empty null sublists cleaned with remove_list_null_values before invoking this function
The default implementation always returns true and should only be implemented if you want to handle non-empty null sublists yourself
Sourcefn short_circuits(&self) -> bool
fn short_circuits(&self) -> bool
Returns true if some of this exprs subexpressions may not be evaluated
and thus any side effects (like divide by zero) may not be encountered.
Setting this to true prevents certain optimizations such as common subexpression elimination
When overriding this function to return true, HigherOrderUDFImpl::conditional_arguments can also be
overridden to report more accurately which arguments are eagerly evaluated and which ones
lazily.
Sourcefn conditional_arguments<'a>(
&self,
args: &'a [Expr],
) -> Option<(Vec<&'a Expr>, Vec<&'a Expr>)>
fn conditional_arguments<'a>( &self, args: &'a [Expr], ) -> Option<(Vec<&'a Expr>, Vec<&'a Expr>)>
Determines which of the arguments passed to this higher-order function are evaluated eagerly and which may be evaluated lazily. Note that this does not applies to the arguments that lambda functions pass to it’s body expression
If this function returns None, all arguments are eagerly evaluated.
Returning None is a micro optimization that saves a needless Vec
allocation.
If the function returns Some, returns (eager, lazy) where eager
are the arguments that are always evaluated, and lazy are the
arguments that may be evaluated lazily (i.e. may not be evaluated at all
in some cases).
Implementations must ensure that the two returned Vecs are disjunct,
and that each argument from args is present in one the two Vecs.
When overriding this function, HigherOrderUDFImpl::short_circuits must
be overridden to return true.
Sourcefn coerce_value_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>
fn coerce_value_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>
Coerce value arguments of a function call to types that the function can evaluate. Note that if you need to coerce values based on the output type of lambdas, you must use HigherOrderUDFImpl::coerce_values_for_lambdas, as this function is used before the output type of lambdas are known
See the type coercion module documentation for more details on type coercion
For example, if your function requires a contiguous list argument, but the user calls
it like my_func(c, v -> v+2) (i.e. with c as a ListView), coerce_types can return [DataType::List(..)]
to ensure the argument is converted to a List
§Parameters
arg_types: The argument types of the value arguments of this function, excluding lambdas
§Return value
A Vec the same length as arg_types. DataFusion will CAST the function call
arguments to these specific types.
Sourcefn documentation(&self) -> Option<&Documentation>
fn documentation(&self) -> Option<&Documentation>
Returns the documentation for this function.
Documentation can be accessed programmatically as well as generating publicly facing documentation.
Trait Implementations§
impl Eq for dyn HigherOrderUDFImpl
Source§impl Hash for dyn HigherOrderUDFImpl
impl Hash for dyn HigherOrderUDFImpl
Source§impl PartialEq for dyn HigherOrderUDFImpl
impl PartialEq for dyn HigherOrderUDFImpl
Source§impl PartialOrd for dyn HigherOrderUDFImpl
impl PartialOrd for dyn HigherOrderUDFImpl
Dyn Compatibility§
This trait is dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".