Trait datafusion::physical_plan::Accumulator
source · pub trait Accumulator: Send + Sync + Debug {
// Required methods
fn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>;
fn update_batch(
&mut self,
values: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>;
fn merge_batch(
&mut self,
states: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>;
fn evaluate(&self) -> Result<ScalarValue, DataFusionError>;
fn size(&self) -> usize;
// Provided method
fn retract_batch(
&mut self,
_values: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError> { ... }
}
Expand description
An accumulator represents a stateful object that lives throughout the evaluation of multiple rows and generically accumulates values.
An accumulator knows how to:
- update its state from inputs via
update_batch
- retract an update to its state from given inputs via
retract_batch
- convert its internal state to a vector of aggregate values
- update its state from multiple accumulators’ states via
merge_batch
- compute the final value from its internal state via
evaluate
Required Methods§
sourcefn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>
fn state(&self) -> Result<Vec<ScalarValue, Global>, DataFusionError>
Returns the partial intermediate state of the accumulator. This
partial state is serialied as Arrays
and then combined with
other partial states from different instances of this
accumulator (that ran on different partitions, for
example).
The state can be and often is a different type than the output
type of the Accumulator
.
See Self::merge_batch
for more details on the merging process.
Some accumulators can return multiple values for their
intermediate states. For example average, tracks sum
and
n
, and this function should return
a vector of two values, sum and n.
ScalarValue::List
can also be used to pass multiple values
if the number of intermediate values is not known at planning
time (e.g. median)
sourcefn update_batch(
&mut self,
values: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>
fn update_batch( &mut self, values: &[Arc<dyn Array + 'static>] ) -> Result<(), DataFusionError>
Updates the accumulator’s state from a vector of arrays.
sourcefn merge_batch(
&mut self,
states: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>
fn merge_batch( &mut self, states: &[Arc<dyn Array + 'static>] ) -> Result<(), DataFusionError>
Updates the accumulator’s state from an Array
containing one
or more intermediate values.
The states
array passed was formed by concatenating the
results of calling [state]
on zero or more other accumulator
instances.
states
is an array of the same types as returned by Self::state
sourcefn evaluate(&self) -> Result<ScalarValue, DataFusionError>
fn evaluate(&self) -> Result<ScalarValue, DataFusionError>
Returns the final aggregate value based on its current state.
Provided Methods§
sourcefn retract_batch(
&mut self,
_values: &[Arc<dyn Array + 'static>]
) -> Result<(), DataFusionError>
fn retract_batch( &mut self, _values: &[Arc<dyn Array + 'static>] ) -> Result<(), DataFusionError>
Retracts an update (caused by the given inputs) to accumulator’s state.
This is the inverse operation of Self::update_batch
and is used
to incrementally calculate window aggregates where the OVER
clause defines a bounded window.