pub trait Accumulator: Send + Sync + Debug {
    // Required methods
    fn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>;
    fn update_batch(
        &mut self,
        values: &[Arc<dyn Array + 'static>]
    ) -> Result<(), DataFusionError>;
    fn merge_batch(
        &mut self,
        states: &[Arc<dyn Array + 'static>]
    ) -> Result<(), DataFusionError>;
    fn evaluate(&self) -> Result<ScalarValue, DataFusionError>;
    fn size(&self) -> usize;

    // Provided method
    fn retract_batch(
        &mut self,
        _values: &[Arc<dyn Array + 'static>]
    ) -> Result<(), DataFusionError> { ... }
}
Expand description

An accumulator represents a stateful object that lives throughout the evaluation of multiple rows and generically accumulates values.

An accumulator knows how to:

  • update its state from inputs via update_batch
  • retract an update to its state from given inputs via retract_batch
  • convert its internal state to a vector of aggregate values
  • update its state from multiple accumulators’ states via merge_batch
  • compute the final value from its internal state via evaluate

Required Methods§

source

fn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>

Returns the partial intermediate state of the accumulator. This partial state is serialied as Arrays and then combined with other partial states from different instances of this accumulator (that ran on different partitions, for example).

The state can be and often is a different type than the output type of the Accumulator.

See Self::merge_batch for more details on the merging process.

Some accumulators can return multiple values for their intermediate states. For example average, tracks sum and n, and this function should return a vector of two values, sum and n.

ScalarValue::List can also be used to pass multiple values if the number of intermediate values is not known at planning time (e.g. median)

source

fn update_batch( &mut self, values: &[Arc<dyn Array + 'static>] ) -> Result<(), DataFusionError>

Updates the accumulator’s state from a vector of arrays.

source

fn merge_batch( &mut self, states: &[Arc<dyn Array + 'static>] ) -> Result<(), DataFusionError>

Updates the accumulator’s state from an Array containing one or more intermediate values.

The states array passed was formed by concatenating the results of calling [state] on zero or more other accumulator instances.

states is an array of the same types as returned by Self::state

source

fn evaluate(&self) -> Result<ScalarValue, DataFusionError>

Returns the final aggregate value based on its current state.

source

fn size(&self) -> usize

Allocated size required for this accumulator, in bytes, including Self. Allocated means that for internal containers such as Vec, the capacity should be used not the len

Provided Methods§

source

fn retract_batch( &mut self, _values: &[Arc<dyn Array + 'static>] ) -> Result<(), DataFusionError>

Retracts an update (caused by the given inputs) to accumulator’s state.

This is the inverse operation of Self::update_batch and is used to incrementally calculate window aggregates where the OVER clause defines a bounded window.

Implementors§