Trait HigherOrderUDFImpl 

Source
pub trait HigherOrderUDFImpl:
    Debug
    + DynEq
    + DynHash
    + Send
    + Sync
    + Any {
    // Required methods
    fn name(&self) -> &str;
    fn signature(&self) -> &HigherOrderSignature;
    fn lambda_parameters(
        &self,
        step: usize,
        fields: &[ValueOrLambda<FieldRef, Option<FieldRef>>],
    ) -> Result<LambdaParametersProgress>;
    fn return_field_from_args(
        &self,
        args: HigherOrderReturnFieldArgs<'_>,
    ) -> Result<FieldRef>;
    fn invoke_with_args(
        &self,
        args: HigherOrderFunctionArgs,
    ) -> Result<ColumnarValue>;

    // Provided methods
    fn aliases(&self) -> &[String] { ... }
    fn schema_name(&self, args: &[Expr]) -> Result<String> { ... }
    fn coerce_values_for_lambdas(
        &self,
        _fields: &[ValueOrLambda<DataType, DataType>],
    ) -> Result<Option<Vec<DataType>>> { ... }
    fn clear_null_values(&self) -> bool { ... }
    fn short_circuits(&self) -> bool { ... }
    fn conditional_arguments<'a>(
        &self,
        args: &'a [Expr],
    ) -> Option<(Vec<&'a Expr>, Vec<&'a Expr>)> { ... }
    fn coerce_value_types(
        &self,
        _arg_types: &[DataType],
    ) -> Result<Vec<DataType>> { ... }
    fn documentation(&self) -> Option<&Documentation> { ... }
}

Trait for implementing user defined higher order functions.

This trait exposes the full API for implementing user defined higher order functions and can be used to implement any such function.

New higher order functions typically implement this trait and are then wrapped in a HigherOrderUDF for registration with DataFusion.

See array_transform.rs for a complete, commented implementation.

Required Methods§

Source

fn name(&self) -> &str

Returns this function’s name

Source

fn signature(&self) -> &HigherOrderSignature

Returns a HigherOrderSignature describing the argument types for which this function has an implementation, and the function’s Volatility.

See HigherOrderSignature for more details on argument type handling and Self::return_field_from_args for computing the return type.

Source

fn lambda_parameters( &self, step: usize, fields: &[ValueOrLambda<FieldRef, Option<FieldRef>>], ) -> Result<LambdaParametersProgress>

Returns the fields of all the parameters supported by the lambdas in fields. If a lambda supports multiple parameters, all of them should be returned, regardless of whether they are used in a particular invocation.

Tip: If you have a HigherOrderFunction invocation, you can call the helper HigherOrderFunction::lambda_parameters instead of calling this method directly.

The names of the returned fields are ignored.

This function is called repeatedly until LambdaParametersProgress::Complete is returned, with step starting at 0 and increasing by one at each invocation.
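The repeated-call protocol described above can be sketched with simplified stand-in types. Everything in this block is an assumption made for illustration: `Ty`, `Progress`, the local `ValueOrLambda`, `resolve`, and `infer_sum_type` are hypothetical and are not the DataFusion types or its planner logic. The sketch models a flexible array_reduce whose merge lambda parameters depend on the merge lambda's own output.

```rust
// Simplified stand-ins for illustration only; these are NOT the DataFusion types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Float32,
    Boolean,
}

#[derive(Debug, Clone, PartialEq)]
enum ValueOrLambda<V, L> {
    Value(V),
    Lambda(L),
}

#[derive(Debug, Clone, PartialEq)]
enum Progress {
    Partial(Vec<Option<Vec<Ty>>>),
    Complete(Vec<Vec<Ty>>),
}

// Hypothetical flexible `array_reduce`. `fields` is laid out as
// [array element type, initial value, merge lambda output, finish lambda output].
fn lambda_parameters(_step: usize, fields: &[ValueOrLambda<Ty, Option<Ty>>]) -> Progress {
    let (elem, initial, merge_out) = match fields {
        [ValueOrLambda::Value(e), ValueOrLambda::Value(i), ValueOrLambda::Lambda(m), ValueOrLambda::Lambda(_)] => {
            (e, i, m)
        }
        _ => panic!("expected two values followed by two lambdas"),
    };
    match merge_out {
        // Merge output known: both lambdas can be fully described.
        Some(acc) => Progress::Complete(vec![vec![acc.clone(), elem.clone()], vec![acc.clone()]]),
        // Merge output unknown: guess the accumulator from the initial value,
        // and leave the finish lambda unresolved.
        None => Progress::Partial(vec![Some(vec![initial.clone(), elem.clone()]), None]),
    }
}

// Hypothetical inference for a merge body like `acc + v + 1.5`:
// the sum is a float as soon as any operand is a float.
fn infer_sum_type(params: &[Ty]) -> Ty {
    if params.contains(&Ty::Float32) { Ty::Float32 } else { Ty::Int32 }
}

// Hypothetical driver: calls `lambda_parameters` with step 0, 1, 2, ... until
// `Complete` is returned, feeding back the merge output inferred at each step.
fn resolve(
    elem: Ty,
    initial: Ty,
    infer_merge_output: impl Fn(&[Ty]) -> Ty,
) -> Option<(usize, Vec<Vec<Ty>>)> {
    let mut merge_out: Option<Ty> = None;
    for step in 0..8 {
        let fields = [
            ValueOrLambda::Value(elem.clone()),
            ValueOrLambda::Value(initial.clone()),
            ValueOrLambda::Lambda(merge_out.clone()),
            ValueOrLambda::Lambda(None),
        ];
        match lambda_parameters(step, &fields) {
            Progress::Complete(params) => return Some((step, params)),
            Progress::Partial(params) => {
                if let Some(Some(merge_params)) = params.first() {
                    merge_out = Some(infer_merge_output(merge_params.as_slice()));
                }
            }
        }
    }
    None // gave up: the implementation never reported `Complete`
}
```

With a Float32 array and an Int32 initial value, `resolve` in this sketch finishes at step 1; if the merge output is already known when step 0 is called, `lambda_parameters` returns `Complete` immediately.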

For functions whose lambda parameters all depend only on the fields of their value arguments, this can return LambdaParametersProgress::Complete at step 0. Take as an example a strict array_reduce with the signature (arr: [V], initial_value: I, (I, V) -> I, (I) -> O) -> O, which requires its initial value to have exactly the same type as its merge output, which is also the parameter of its finish lambda. The expression

array_reduce([1.2, 2.1], 0.0, (acc, v) -> acc + v + 1.5, v -> v > 5.1)

would result in this function being called as the following:

let lambda_parameters = array_reduce.lambda_parameters(
    0,
    &[
        // the Field of the literal `[1.2, 2.1]`, the array being reduced
        ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
        // the Field of the literal `0.0`, the initial value
        ValueOrLambda::Value(Arc::new(Field::new("", DataType::Float32, true))),
        // the Field of the output of the merge lambda, which is unknown at this point because it depends
        // on the return of this call
        ValueOrLambda::Lambda(None),
        // the Field of the output of the finish lambda, unknown for the same reason as above
        ValueOrLambda::Lambda(None),
])?;

assert_eq!(
     lambda_parameters,
     LambdaParametersProgress::Complete(vec![
        // the merge lambda supported parameters, regardless of how many are actually used
        vec![
            // the accumulator which is the field of the initial value
            Arc::new(Field::new("ignored_name", DataType::Float32, true)),
            // the array values being reduced
            Arc::new(Field::new("", DataType::Float32, true)),
        ],
        // the finish lambda supported parameters
        vec![
            // the reduced value which is the field of the initial value
            Arc::new(Field::new("ignored_name", DataType::Float32, true)),
        ],
     ])
);

For functions whose lambda parameters depend on the output of other lambdas, or on the output of their own lambda, this can return LambdaParametersProgress::Partial until all dependencies are met. Note that for lambdas with cyclic dependencies, you likely want to implement HigherOrderUDFImpl::coerce_values_for_lambdas too. Take as an example a flexible array_reduce with the signature (arr: [V], initial_value: I, (ACC, V) -> ACC, (ACC) -> O) -> O. Its merge lambda has a cyclic dependency on its own output, and its finish lambda depends on the merge lambda's output. It only requires the initial value to be coercible to the output of the merge lambda, a coercion defined by its HigherOrderUDFImpl::coerce_values_for_lambdas implementation. The expression

array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 5.1)

would result in this function being called as the following:

let lambda_parameters = array_reduce.lambda_parameters(
    0,
    &[
        // the Field of the literal `[1.2, 2.1]`, the array being reduced
        ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
        // the Field of the literal `0`, the initial value
        ValueOrLambda::Value(Arc::new(Field::new("", DataType::Int32, true))),
        // the Field of the output of the merge lambda, which is unknown at this point because it depends on
        // the return of this call
        ValueOrLambda::Lambda(None),
        // the Field of the output of the finish lambda, unknown for the same reason as above
        ValueOrLambda::Lambda(None),
])?;

assert_eq!(
     lambda_parameters,
     LambdaParametersProgress::Partial(vec![
        // the merge lambda supported parameters, regardless of how many are actually used
        Some(vec![
            // at step 0, use the field of the initial value
            Arc::new(Field::new("ignored_name", DataType::Int32, true)),
            // the array values being reduced
            Arc::new(Field::new("", DataType::Float32, true)),
        ]),
        // the finish lambda supported parameters, unknown at this point due to dependency on the merge output
        None,
     ])
);

let lambda_parameters = array_reduce.lambda_parameters(
    1,
    &[
        // the Field of the literal `[1.2, 2.1]`, the array being reduced
        ValueOrLambda::Value(Arc::new(Field::new("", DataType::new_list(DataType::Float32, true), true))),
        // the Field of the literal `0`, the initial value
        ValueOrLambda::Value(Arc::new(Field::new("", DataType::Int32, true))),
        // the Field of the output of the merge lambda, which could be inferred to be a Float32 based on the
        // returned values of the previous step
        ValueOrLambda::Lambda(Some(Arc::new(Field::new("", DataType::Float32, true)))),
        // the Field of the output of the finish lambda, which is unknown at this point because it depends
        // on the return of this call
        ValueOrLambda::Lambda(None),
])?;

assert_eq!(
     lambda_parameters,
     LambdaParametersProgress::Complete(vec![
        // the merge lambda supported parameters, regardless of how many are actually used
        vec![
            // the merge lambda's own output, now used as its accumulator
            Arc::new(Field::new("ignored_name", DataType::Float32, true)),
            // the array values being reduced
            Arc::new(Field::new("", DataType::Float32, true)),
        ],
        // the finish lambda supported parameters
        vec![
            // the output of the merge lambda
            Arc::new(Field::new("", DataType::Float32, true)),
        ],
     ])
);

let coerce_to = array_reduce.coerce_values_for_lambdas(&[
    // the literal `[1.2, 2.1]` data type, the array being reduced
    ValueOrLambda::Value(DataType::new_list(DataType::Float32, true)),
    // the literal `0` data type, the initial value
    ValueOrLambda::Value(DataType::Int32),
    // the output data type of the merge lambda
    ValueOrLambda::Lambda(DataType::Float32),
    // the output data type of the finish lambda
    ValueOrLambda::Lambda(DataType::Boolean),
])?;

assert_eq!(
    coerce_to,
    Some(vec![
        // return the same type for the array being reduced
        DataType::new_list(DataType::Float32, true),
        // coerce the initial value to the output of the merge lambda
        DataType::Float32,
    ])
);

Note that this may also be called at step 0 with all lambda outputs already set. In that case, LambdaParametersProgress::Complete must be returned.

The implementation can assume that some other part of the code has coerced the actual argument types to match Self::signature, except the coercion defined by Self::coerce_values_for_lambdas.

Source

fn return_field_from_args( &self, args: HigherOrderReturnFieldArgs<'_>, ) -> Result<FieldRef>

What type will be returned by this function, given the arguments?

The implementation can assume that some other part of the code has coerced the actual argument types to match Self::signature, including the coercion defined by Self::coerce_values_for_lambdas.

§Example creating Field

Note the name of the Field is ignored, except for structured types such as DataType::Struct.

fn return_field_from_args(&self, args: HigherOrderReturnFieldArgs) -> Result<FieldRef> {
    let field = Arc::new(Field::new("ignored_name", DataType::Int32, true));
    Ok(field)
}
Source

fn invoke_with_args( &self, args: HigherOrderFunctionArgs, ) -> Result<ColumnarValue>

Invoke the function returning the appropriate result.

§Performance

For the best performance, the implementations should handle the common case when one or more of their arguments are constant values (aka ColumnarValue::Scalar).

ColumnarValue::values_to_arrays can be used to convert the arguments to arrays, which will likely result in simpler but slower code.
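The trade-off above can be sketched with a toy two-argument addition. The `ColumnarValue` enum below is a hypothetical stand-in that only holds `i64` data; it is not the DataFusion type, and `add` is an invented example function. The point is only the shape of the match: scalar inputs are handled without materializing a full array for them.

```rust
// Simplified stand-in for illustration only; NOT the DataFusion `ColumnarValue`.
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Scalar(i64),
    Array(Vec<i64>),
}

// Hypothetical kernel that special-cases constant arguments.
fn add(lhs: &ColumnarValue, rhs: &ColumnarValue) -> ColumnarValue {
    match (lhs, rhs) {
        // Both constant: compute once and keep the result a scalar.
        (ColumnarValue::Scalar(a), ColumnarValue::Scalar(b)) => ColumnarValue::Scalar(a + b),
        // One constant: no need to expand the scalar into an array first.
        (ColumnarValue::Array(a), ColumnarValue::Scalar(b))
        | (ColumnarValue::Scalar(b), ColumnarValue::Array(a)) => {
            ColumnarValue::Array(a.iter().map(|x| x + b).collect())
        }
        // General case: element-wise over two arrays of equal length.
        (ColumnarValue::Array(a), ColumnarValue::Array(b)) => {
            ColumnarValue::Array(a.iter().zip(b).map(|(x, y)| x + y).collect())
        }
    }
}
```

Converting every argument to an array up front would leave only the last arm, which is simpler to write but allocates and iterates over data that is constant across rows.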

Provided Methods§

Source

fn aliases(&self) -> &[String]

Returns any aliases (alternate names) for this function.

Aliases can be used to invoke the same function using different names. For example in some databases now() and current_timestamp() are aliases for the same function. This behavior can be obtained by returning current_timestamp as an alias for the now function.

Note: aliases should only include names other than Self::name. Defaults to [] (no aliases)

Source

fn schema_name(&self, args: &[Expr]) -> Result<String>

Returns the name of the column this expression would create

See Expr::schema_name for details

Source

fn coerce_values_for_lambdas( &self, _fields: &[ValueOrLambda<DataType, DataType>], ) -> Result<Option<Vec<DataType>>>

Coerce value arguments of a function call to types that the function can evaluate, also taking into account the output types of its lambdas. This differs from HigherOrderUDFImpl::coerce_value_types, which only has access to the types of its value arguments because it is called before the output types of the lambdas are known.

See the type coercion module documentation for more details on type coercion

§Parameters
  • fields: The argument types of the value arguments of this function, or the output type of lambdas
§Return value

If Some, contains a Vec with as many entries as there are ValueOrLambda::Value in fields. DataFusion will CAST the function call arguments to these specific types. If None, no coercion will be applied beyond the one defined by the function signature.

For example, a flexible array_reduce implementation (see Self::lambda_parameters docs), when working with the expression below, may want to coerce its initial value argument, the integer 0, to match the output of its merge function, which is a float:

array_reduce([1.2, 2.1], 0, (acc, v) -> acc + v + 1.5, v -> v > 2.0)
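A minimal sketch of that coercion rule, using simplified stand-in types. `Ty` and the local `ValueOrLambda` are hypothetical and are not the DataFusion types; the free function below also returns a plain `Option` and omits the `Result` wrapper and error handling of the real method. It assumes the argument layout (array, initial value, merge lambda, finish lambda) from the array_reduce example.

```rust
// Simplified stand-ins for illustration only; these are NOT the DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Float32,
    Boolean,
    List(Box<Ty>),
}

#[derive(Debug, Clone, PartialEq)]
enum ValueOrLambda<V, L> {
    Value(V),
    Lambda(L),
}

// Hypothetical coercion for a flexible `array_reduce`: keep the array type and
// cast the initial value to the merge lambda's output type.
fn coerce_values_for_lambdas(fields: &[ValueOrLambda<Ty, Ty>]) -> Option<Vec<Ty>> {
    match fields {
        [ValueOrLambda::Value(array), ValueOrLambda::Value(initial), ValueOrLambda::Lambda(merge_out), ValueOrLambda::Lambda(_)] => {
            if initial == merge_out {
                // Types already agree: no extra coercion requested.
                None
            } else {
                // One entry per value argument; lambdas get no entry.
                Some(vec![array.clone(), merge_out.clone()])
            }
        }
        // Unexpected argument shape: this sketch just requests no coercion.
        _ => None,
    }
}
```

The returned Vec has two entries because the call has two value arguments; the two lambdas contribute their output types as input to the decision but are not themselves coerced.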

Source

fn clear_null_values(&self) -> bool

Whether List or LargeList arguments should have their non-empty null sublists cleaned with remove_list_null_values before invoking this function.

The default implementation always returns true. Override it only if you want to handle non-empty null sublists yourself.
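To illustrate what a non-empty null sublist is, the sketch below models a list column the way Arrow lays it out: an offsets buffer, a validity mask, and a flat values buffer. The function `remove_null_sublist_values` is a hypothetical stand-in written for this example; it is an assumption about the kind of cleanup meant here and not the actual remove_list_null_values implementation.

```rust
// Hypothetical model of a list column: sublist `i` spans
// `values[offsets[i]..offsets[i + 1]]` and is null when `validity[i]` is false.
// A null sublist is "non-empty" when its offsets still cover some values.
fn remove_null_sublist_values(
    offsets: &[usize],
    validity: &[bool],
    values: &[i32],
) -> (Vec<usize>, Vec<i32>) {
    let mut new_offsets = vec![0];
    let mut new_values = Vec::new();
    for (i, valid) in validity.iter().enumerate() {
        if *valid {
            // Keep the values of valid sublists.
            new_values.extend_from_slice(&values[offsets[i]..offsets[i + 1]]);
        }
        // Null sublists contribute nothing, so they become zero-length.
        new_offsets.push(new_values.len());
    }
    (new_offsets, new_values)
}
```

Under this model, a lambda applied to the flat values buffer after the cleanup never sees elements that are hidden behind a null sublist.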

Source

fn short_circuits(&self) -> bool

Returns true if some of this expression's subexpressions may not be evaluated, in which case any side effects (like divide by zero) may not be encountered.

Setting this to true prevents certain optimizations such as common subexpression elimination

When overriding this function to return true, HigherOrderUDFImpl::conditional_arguments can also be overridden to report more accurately which arguments are eagerly evaluated and which ones lazily.

Source

fn conditional_arguments<'a>( &self, args: &'a [Expr], ) -> Option<(Vec<&'a Expr>, Vec<&'a Expr>)>

Determines which of the arguments passed to this higher-order function are evaluated eagerly and which may be evaluated lazily. Note that this does not apply to the arguments that lambda functions pass to their body expressions.

If this function returns None, all arguments are eagerly evaluated. Returning None is a micro optimization that saves a needless Vec allocation.

If the function returns Some, returns (eager, lazy) where eager are the arguments that are always evaluated, and lazy are the arguments that may be evaluated lazily (i.e. may not be evaluated at all in some cases).

Implementations must ensure that the two returned Vecs are disjoint, and that each argument from args is present in one of the two Vecs.

When overriding this function, HigherOrderUDFImpl::short_circuits must be overridden to return true.
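The partition contract can be sketched as follows. `Expr` is aliased to `String` purely as a stand-in, and the rule "first argument eager, the rest lazy" belongs to an invented function; neither reflects a real DataFusion function. The helper `is_valid_partition` is also hypothetical and checks the two requirements stated above: the Vecs are disjoint and together cover every argument.

```rust
// Stand-in for illustration only; NOT the DataFusion `Expr`.
type Expr = String;

// Hypothetical function whose first argument is always evaluated and whose
// remaining arguments may be skipped.
fn conditional_arguments<'a>(args: &'a [Expr]) -> Option<(Vec<&'a Expr>, Vec<&'a Expr>)> {
    // With no arguments there is nothing to split, so report "all eager".
    let (first, rest) = args.split_first()?;
    Some((vec![first], rest.iter().collect()))
}

// Checks that every argument appears exactly once across both Vecs,
// comparing by identity rather than by value.
fn is_valid_partition(args: &[Expr], eager: &[&Expr], lazy: &[&Expr]) -> bool {
    let covers_each_once = args.iter().all(|arg| {
        let in_eager = eager.iter().filter(|e| std::ptr::eq(**e, arg)).count();
        let in_lazy = lazy.iter().filter(|e| std::ptr::eq(**e, arg)).count();
        in_eager + in_lazy == 1
    });
    covers_each_once && eager.len() + lazy.len() == args.len()
}
```

A function implemented this way would also need short_circuits to return true, as noted above.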

Source

fn coerce_value_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>>

Coerce value arguments of a function call to types that the function can evaluate. Note that if you need to coerce values based on the output types of lambdas, you must use HigherOrderUDFImpl::coerce_values_for_lambdas, as this function is called before the output types of the lambdas are known.

See the type coercion module documentation for more details on type coercion

For example, if your function requires a contiguous list argument, but the user calls it like my_func(c, v -> v+2) (i.e. with c as a ListView), coerce_value_types can return [DataType::List(..)] to ensure the argument is converted to a List.

§Parameters
  • arg_types: The argument types of the value arguments of this function, excluding lambdas
§Return value

A Vec the same length as arg_types. DataFusion will CAST the function call arguments to these specific types.

Source

fn documentation(&self) -> Option<&Documentation>

Returns the documentation for this function.

Documentation can be accessed programmatically and is also used to generate public-facing documentation.

Trait Implementations§

Source§

impl Eq for dyn HigherOrderUDFImpl

Source§

impl Hash for dyn HigherOrderUDFImpl

Source§

fn hash<H: Hasher>(&self, state: &mut H)

Feeds this value into the given Hasher. Read more
Source§

impl PartialEq for dyn HigherOrderUDFImpl

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 (const: unstable) · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl PartialOrd for dyn HigherOrderUDFImpl

Source§

fn partial_cmp(&self, other: &Self) -> Option<Ordering>

This method returns an ordering between self and other values if one exists. Read more
1.0.0 (const: unstable) · Source§

fn lt(&self, other: &Rhs) -> bool

Tests less than (for self and other) and is used by the < operator. Read more
1.0.0 (const: unstable) · Source§

fn le(&self, other: &Rhs) -> bool

Tests less than or equal to (for self and other) and is used by the <= operator. Read more
1.0.0 (const: unstable) · Source§

fn gt(&self, other: &Rhs) -> bool

Tests greater than (for self and other) and is used by the > operator. Read more
1.0.0 (const: unstable) · Source§

fn ge(&self, other: &Rhs) -> bool

Tests greater than or equal to (for self and other) and is used by the >= operator. Read more

Dyn Compatibility§

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".

Implementors§