Trait datafusion::logical_expr::AggregateUDFImpl
pub trait AggregateUDFImpl: Debug + Send + Sync {
    // Required methods
    fn as_any(&self) -> &(dyn Any + 'static);
    fn name(&self) -> &str;
    fn signature(&self) -> &Signature;
    fn return_type(
        &self,
        arg_types: &[DataType]
    ) -> Result<DataType, DataFusionError>;
    fn accumulator(
        &self,
        acc_args: AccumulatorArgs<'_>
    ) -> Result<Box<dyn Accumulator>, DataFusionError>;

    // Provided methods
    fn state_fields(
        &self,
        name: &str,
        value_type: DataType,
        ordering_fields: Vec<Field>
    ) -> Result<Vec<Field>, DataFusionError> { ... }
    fn groups_accumulator_supported(&self) -> bool { ... }
    fn create_groups_accumulator(
        &self
    ) -> Result<Box<dyn GroupsAccumulator>, DataFusionError> { ... }
    fn aliases(&self) -> &[String] { ... }
}
Trait for implementing AggregateUDF.
This trait exposes the full API for implementing user defined aggregate functions and can be used to implement any function.
See advanced_udaf.rs for a full example with complete implementation and AggregateUDF for other available options.
§Basic Example
// Import paths assume the `datafusion` umbrella crate; adjust them if you
// depend on the sub-crates directly.
use std::any::Any;
use datafusion::arrow::datatypes::{DataType, Field};
use datafusion::common::{plan_err, DataFusionError, Result};
use datafusion::logical_expr::function::AccumulatorArgs;
use datafusion::logical_expr::{
    col, Accumulator, AggregateUDF, AggregateUDFImpl, Signature, Volatility,
};

#[derive(Debug, Clone)]
struct GeoMeanUdf {
    signature: Signature,
}

impl GeoMeanUdf {
    fn new() -> Self {
        Self {
            // Accept exactly one Float64 argument.
            signature: Signature::uniform(1, vec![DataType::Float64], Volatility::Immutable),
        }
    }
}

/// Implement the AggregateUDFImpl trait for GeoMeanUdf
impl AggregateUDFImpl for GeoMeanUdf {
    fn as_any(&self) -> &dyn Any { self }
    fn name(&self) -> &str { "geo_mean" }
    fn signature(&self) -> &Signature { &self.signature }
    fn return_type(&self, args: &[DataType]) -> Result<DataType> {
        if !matches!(args.first(), Some(&DataType::Float64)) {
            return plan_err!("geo_mean only accepts Float64 arguments");
        }
        Ok(DataType::Float64)
    }
    // This is the accumulator factory; DataFusion uses it to create new accumulators.
    // It is left unimplemented in this basic example.
    fn accumulator(&self, _acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> { unimplemented!() }
    fn state_fields(&self, _name: &str, value_type: DataType, _ordering_fields: Vec<Field>) -> Result<Vec<Field>> {
        Ok(vec![
            Field::new("value", value_type, true),
            Field::new("ordering", DataType::UInt32, true),
        ])
    }
}

// Create a new AggregateUDF from the implementation
let geometric_mean = AggregateUDF::from(GeoMeanUdf::new());

// Call the function `geo_mean(col)`
let expr = geometric_mean.call(vec![col("a")]);
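The basic example leaves `accumulator` unimplemented. As a rough idea of the state such an accumulator might carry, here is a minimal std-only sketch. `GeoMeanState` and its methods are hypothetical names invented for illustration; it does not implement DataFusion's `Accumulator` trait, and tracking a running product plus a count is an assumption about how a geometric mean could be computed.

```rust
/// Hypothetical accumulator state for a geometric mean: a running product
/// and the number of values seen. Not a DataFusion type.
#[derive(Debug)]
struct GeoMeanState {
    prod: f64,
    n: u32,
}

impl GeoMeanState {
    fn new() -> Self {
        // The product starts at the multiplicative identity.
        Self { prod: 1.0, n: 0 }
    }

    /// Fold a batch of input values into the running product.
    fn update(&mut self, values: &[f64]) {
        for v in values {
            self.prod *= v;
            self.n += 1;
        }
    }

    /// The n-th root of the product, or None if no values were seen.
    fn evaluate(&self) -> Option<f64> {
        if self.n == 0 {
            None
        } else {
            Some(self.prod.powf(1.0 / f64::from(self.n)))
        }
    }
}

fn main() {
    let mut state = GeoMeanState::new();
    state.update(&[2.0, 8.0]);
    // The geometric mean of 2 and 8 is the square root of 16.
    println!("{:?}", state.evaluate());
}
```

A real implementation would wrap this kind of state in a type implementing `Accumulator` and return it, boxed, from `accumulator`.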
Required Methods§
fn signature(&self) -> &Signature
Returns the function’s Signature for information about what input types are accepted and the function’s Volatility.
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType, DataFusionError>
Returns the DataType this function produces, given the types of its arguments.
fn accumulator(&self, acc_args: AccumulatorArgs<'_>) -> Result<Box<dyn Accumulator>, DataFusionError>
Return a new Accumulator that aggregates values for a specific group during query execution.
acc_args: AccumulatorArgs contains information about how the aggregate function was called.
Provided Methods§
fn state_fields(&self, name: &str, value_type: DataType, ordering_fields: Vec<Field>) -> Result<Vec<Field>, DataFusionError>
Return the fields used to store the intermediate state of this accumulator.
§Arguments:
name: the name of the expression (e.g. AVG, SUM, etc.)
value_type: the aggregate’s output type (returned by Self::return_type)
ordering_fields: the fields used to order the input arguments, if any. Empty if no ordering expression is provided.
§Notes:
The default implementation returns a single state field named name with the same type as value_type. This is suitable for aggregates such as SUM or MIN where partial state can be combined by applying the same aggregate.
For aggregates such as AVG where the partial state is more complex (e.g. a COUNT and a SUM), this method is used to define the additional fields.
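To see why AVG needs more than one state field, consider this std-only sketch. `AvgState` is a hypothetical type made up for illustration, not part of DataFusion; it only shows that partial results combine correctly when both the sum and the count are carried along.

```rust
/// Partial state for an AVG-style aggregate: a running sum and a row count.
/// Hypothetical type for illustration; not part of DataFusion.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct AvgState {
    sum: f64,
    count: u64,
}

impl AvgState {
    /// Fold one batch of raw input values into this state.
    fn update(&mut self, values: &[f64]) {
        for v in values {
            self.sum += v;
            self.count += 1;
        }
    }

    /// Combine with a partial state produced elsewhere (e.g. another partition).
    fn merge(&mut self, other: &AvgState) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Produce the final average, or None if no rows were seen.
    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

fn main() {
    let mut left = AvgState::default();
    left.update(&[1.0, 2.0, 3.0]);
    let mut right = AvgState::default();
    right.update(&[10.0]);

    // Averaging the two partial averages (2.0 and 10.0) would give 6.0,
    // which is wrong; merging the (sum, count) pairs gives the true mean.
    left.merge(&right);
    println!("{:?}", left.evaluate()); // prints Some(4.0)
}
```

In this picture, each member of the partial state (here `sum` and `count`) would correspond to one field returned from `state_fields`.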
The names of the fields must be unique within the query and thus should be derived from name. See format_state_name for a utility function to generate a unique name.
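The idea of deriving state field names from the expression name can be sketched as follows. `state_field_name` is a hypothetical stand-in written for this example, and the `name[suffix]` layout is an assumption of the sketch rather than a statement about what format_state_name produces.

```rust
/// Hypothetical helper that derives a state field name from the aggregate
/// expression's name. The `name[suffix]` layout is an assumption made for
/// this sketch.
fn state_field_name(expr_name: &str, suffix: &str) -> String {
    format!("{expr_name}[{suffix}]")
}

fn main() {
    // Two AVG expressions in the same query get distinct state field names
    // because each name is derived from its own expression name.
    let names = [
        state_field_name("AVG(a)", "sum"),
        state_field_name("AVG(a)", "count"),
        state_field_name("AVG(b)", "sum"),
        state_field_name("AVG(b)", "count"),
    ];
    for name in &names {
        println!("{name}");
    }
}
```

Hard-coded names such as "sum" alone would collide as soon as the same aggregate appears twice in one query, which is why the expression name is part of each field name.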
fn groups_accumulator_supported(&self) -> bool
Returns true if the aggregate expression has a specialized GroupsAccumulator implementation. If this returns true, Self::create_groups_accumulator will be called.
§Notes
Even if this function returns true, DataFusion will still use Self::accumulator for certain queries, such as when this aggregate is used as a window function or when there are no GROUP BY columns in the query.
fn create_groups_accumulator(&self) -> Result<Box<dyn GroupsAccumulator>, DataFusionError>
Return a specialized GroupsAccumulator that manages state for all groups.
For maximum performance, a GroupsAccumulator should be implemented in addition to Accumulator.
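The difference between the two styles can be illustrated with a std-only sketch. Both types below are hypothetical and do not implement DataFusion's `Accumulator` or `GroupsAccumulator` traits; the `update_batch` parameters are simplified assumptions chosen to show the idea of one object holding the state of every group.

```rust
/// Hypothetical per-group accumulator: one instance holds one group's state.
#[derive(Default)]
struct SumAccumulator {
    sum: f64,
}

impl SumAccumulator {
    fn update(&mut self, value: f64) {
        self.sum += value;
    }
}

/// Hypothetical groups accumulator: one instance holds every group's state
/// in a flat vector indexed by group index.
#[derive(Default)]
struct SumGroupsAccumulator {
    sums: Vec<f64>,
}

impl SumGroupsAccumulator {
    /// `group_indices[i]` is the group that `values[i]` belongs to.
    fn update_batch(&mut self, values: &[f64], group_indices: &[usize], total_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state vector so every known group has a slot.
        if self.sums.len() < total_groups {
            self.sums.resize(total_groups, 0.0);
        }
        for (v, &g) in values.iter().zip(group_indices) {
            self.sums[g] += v;
        }
    }

    fn evaluate(&self) -> &[f64] {
        &self.sums
    }
}

fn main() {
    let values = [1.0, 2.0, 3.0, 4.0];
    let group_indices: [usize; 4] = [0, 1, 0, 1];

    // Per-group style: one accumulator object per group, one call per row.
    let mut per_group: Vec<SumAccumulator> =
        (0..2).map(|_| SumAccumulator::default()).collect();
    for (v, &g) in values.iter().zip(&group_indices) {
        per_group[g].update(*v);
    }

    // Groups style: a single object and a single call for the whole batch.
    let mut grouped = SumGroupsAccumulator::default();
    grouped.update_batch(&values, &group_indices, 2);

    println!("{:?}", grouped.evaluate()); // prints [4.0, 6.0]
    println!("{} {}", per_group[0].sum, per_group[1].sum); // prints 4 6
}
```

Both styles compute the same sums; the groups style avoids allocating and dispatching to a separate object for each group, which is where the performance benefit is expected to come from when there are many groups.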
fn aliases(&self) -> &[String]
Returns any aliases (alternate names) for this function.
Note: aliases should only include names other than Self::name.
Defaults to [] (no aliases).