Trait datafusion::logical_expr::AggregateUDFImpl

source ·
pub trait AggregateUDFImpl: Debug + Send + Sync {
    // Required methods
    fn as_any(&self) -> &(dyn Any + 'static);
    fn name(&self) -> &str;
    fn signature(&self) -> &Signature;
    fn return_type(
        &self,
        arg_types: &[DataType]
    ) -> Result<DataType, DataFusionError>;
    fn accumulator(
        &self,
        acc_args: AccumulatorArgs<'_>
    ) -> Result<Box<dyn Accumulator>, DataFusionError>;

    // Provided methods
    fn state_fields(
        &self,
        name: &str,
        value_type: DataType,
        ordering_fields: Vec<Field>
    ) -> Result<Vec<Field>, DataFusionError> { ... }
    fn groups_accumulator_supported(&self) -> bool { ... }
    fn create_groups_accumulator(
        &self
    ) -> Result<Box<dyn GroupsAccumulator>, DataFusionError> { ... }
    fn aliases(&self) -> &[String] { ... }
}
Expand description

Trait for implementing AggregateUDF.

This trait exposes the full API for implementing user defined aggregate functions and can be used to implement any function.

See advanced_udaf.rs for a full example with complete implementation and AggregateUDF for other available options.

§Basic Example

#[derive(Debug, Clone)]
struct GeoMeanUdf {
  signature: Signature
};

impl GeoMeanUdf {
  fn new() -> Self {
    Self {
      signature: Signature::uniform(1, vec![DataType::Float64], Volatility::Immutable)
     }
  }
}

/// Implement the AggregateUDFImpl trait for GeoMeanUdf
impl AggregateUDFImpl for GeoMeanUdf {
   fn as_any(&self) -> &dyn Any { self }
   fn name(&self) -> &str { "geo_mean" }
   fn signature(&self) -> &Signature { &self.signature }
   fn return_type(&self, args: &[DataType]) -> Result<DataType> {
     if !matches!(args.get(0), Some(&DataType::Float64)) {
       return plan_err!("add_one only accepts Float64 arguments");
     }
     Ok(DataType::Float64)
   }
   // This is the accumulator factory; DataFusion uses it to create new accumulators.
   fn accumulator(&self, _acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> { unimplemented!() }
   fn state_fields(&self, _name: &str, value_type: DataType, _ordering_fields: Vec<Field>) -> Result<Vec<Field>> {
       Ok(vec![
            Field::new("value", value_type, true),
            Field::new("ordering", DataType::UInt32, true)
       ])
   }
}

// Create a new AggregateUDF from the implementation
let geometric_mean = AggregateUDF::from(GeoMeanUdf::new());

// Call the function `geo_mean(col)`
let expr = geometric_mean.call(vec![col("a")]);

Required Methods§

source

fn as_any(&self) -> &(dyn Any + 'static)

Returns this object as an Any trait object

source

fn name(&self) -> &str

Returns this function’s name

source

fn signature(&self) -> &Signature

Returns the function’s Signature for information about what input types are accepted and the function’s Volatility.

source

fn return_type( &self, arg_types: &[DataType] ) -> Result<DataType, DataFusionError>

What DataType will be returned by this function, given the types of the arguments

source

fn accumulator( &self, acc_args: AccumulatorArgs<'_> ) -> Result<Box<dyn Accumulator>, DataFusionError>

Return a new Accumulator that aggregates values for a specific group during query execution.

acc_args: AccumulatorArgs contains information about how the aggregate function was called.

Provided Methods§

source

fn state_fields( &self, name: &str, value_type: DataType, ordering_fields: Vec<Field> ) -> Result<Vec<Field>, DataFusionError>

Return the fields used to store the intermediate state of this accumulator.

§Arguments:
  1. name: the name of the expression (e.g. AVG, SUM, etc)
  2. value_type: Aggregate’s aggregate’s output (returned by Self::return_type)
  3. ordering_fields: the fields used to order the input arguments, if any. Empty if no ordering expression is provided.
§Notes:

The default implementation returns a single state field named name with the same type as value_type. This is suitable for aggregates such as SUM or MIN where partial state can be combined by applying the same aggregate.

For aggregates such as AVG where the partial state is more complex (e.g. a COUNT and a SUM), this method is used to define the additional fields.

The name of the fields must be unique within the query and thus should be derived from name. See format_state_name for a utility function to generate a unique name.

source

fn groups_accumulator_supported(&self) -> bool

If the aggregate expression has a specialized GroupsAccumulator implementation. If this returns true, [Self::create_groups_accumulator] will be called.

§Notes

Even if this function returns true, DataFusion will still use Self::accumulator for certain queries, such as when this aggregate is used as a window function or when there no GROUP BY columns in the query.

source

fn create_groups_accumulator( &self ) -> Result<Box<dyn GroupsAccumulator>, DataFusionError>

Return a specialized GroupsAccumulator that manages state for all groups.

For maximum performance, a GroupsAccumulator should be implemented in addition to Accumulator.

source

fn aliases(&self) -> &[String]

Returns any aliases (alternate names) for this function.

Note: aliases should only include names other than Self::name. Defaults to [] (no aliases)

Implementors§