pub struct RepartitionExec { /* private fields */ }
Expand description

Maps N input partitions to M output partitions based on a Partitioning scheme.

§Background

DataFusion, like most other commercial systems, with the notable exception of DuckDB, uses the “Exchange Operator” based approach to parallelism which works well in practice given sufficient care in implementation.

DataFusion’s planner picks the target number of partitions and then RepartionExec redistributes RecordBatches to that number of output partitions.

For example, given target_partitions=3 (trying to use 3 cores) but scanning an input with 2 partitions, RepartitionExec can be used to get 3 even streams of RecordBatches

        ▲                  ▲                  ▲
        │                  │                  │
        │                  │                  │
        │                  │                  │
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    GroupBy    │  │    GroupBy    │  │    GroupBy    │
│   (Partial)   │  │   (Partial)   │  │   (Partial)   │
└───────────────┘  └───────────────┘  └───────────────┘
        ▲                  ▲                  ▲
        └──────────────────┼──────────────────┘
                           │
              ┌─────────────────────────┐
              │     RepartitionExec     │
              │   (hash/round robin)    │
              └─────────────────────────┘
                         ▲   ▲
             ┌───────────┘   └───────────┐
             │                           │
             │                           │
        .─────────.                 .─────────.
     ,─'           '─.           ,─'           '─.
    ;      Input      :         ;      Input      :
    :   Partition 0   ;         :   Partition 1   ;
     ╲               ╱           ╲               ╱
      '─.         ,─'             '─.         ,─'
         `───────'                   `───────'

§Output Ordering

If more than one stream is being repartitioned, the output will be some arbitrary interleaving (and thus unordered) unless Self::with_preserve_order specifies otherwise.

§Footnote

The “Exchange Operator” was first described in the 1989 paper Encapsulation of parallelism in the Volcano query processing system Paper which uses the term “Exchange” for the concept of repartitioning data across threads.

Implementations§

source§

impl RepartitionExec

source

pub fn input(&self) -> &Arc<dyn ExecutionPlan>

Input execution plan

source

pub fn partitioning(&self) -> &Partitioning

Partitioning scheme to use

source

pub fn preserve_order(&self) -> bool

Get preserve_order flag of the RepartitionExecutor true means SortPreservingRepartitionExec, false means RepartitionExec

source

pub fn name(&self) -> &str

Get name used to display this Exec

source§

impl RepartitionExec

source

pub fn try_new( input: Arc<dyn ExecutionPlan>, partitioning: Partitioning ) -> Result<RepartitionExec, DataFusionError>

Create a new RepartitionExec, that produces output partitioning, and does not preserve the order of the input (see Self::with_preserve_order for more details)

source

pub fn with_preserve_order(self) -> RepartitionExec

Specify if this reparititoning operation should preserve the order of rows from its input when producing output. Preserving order is more expensive at runtime, so should only be set if the output of this operator can take advantage of it.

If the input is not ordered, or has only one partition, this is a no op, and the node remains a RepartitionExec.

Trait Implementations§

source§

impl Debug for RepartitionExec

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter. Read more
source§

impl DisplayAs for RepartitionExec

source§

fn fmt_as( &self, t: DisplayFormatType, f: &mut Formatter<'_> ) -> Result<(), Error>

Format according to DisplayFormatType, used when verbose representation looks different from the default one Read more
source§

impl ExecutionPlan for RepartitionExec

source§

fn as_any(&self) -> &(dyn Any + 'static)

Return a reference to Any that can be used for downcasting

source§

fn name(&self) -> &'static str

Short name for the ExecutionPlan, such as ‘ParquetExec’.
source§

fn properties(&self) -> &PlanProperties

Return properties of the output of the ExecutionPlan, such as output ordering(s), partitioning information etc. Read more
source§

fn children(&self) -> Vec<Arc<dyn ExecutionPlan>>

Get a list of children ExecutionPlans that act as inputs to this plan. The returned list will be empty for leaf nodes such as scans, will contain a single value for unary nodes, or two values for binary nodes (such as joins).
source§

fn with_new_children( self: Arc<RepartitionExec>, children: Vec<Arc<dyn ExecutionPlan>> ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError>

Returns a new ExecutionPlan where all existing children were replaced by the children, in order
source§

fn benefits_from_input_partitioning(&self) -> Vec<bool>

Specifies whether the ExecutionPlan benefits from increased parallelization at its input for each child. Read more
source§

fn maintains_input_order(&self) -> Vec<bool>

Returns false if this ExecutionPlan’s implementation may reorder rows within or between partitions. Read more
source§

fn execute( &self, partition: usize, context: Arc<TaskContext> ) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>

Begin execution of partition, returning a Stream of RecordBatches. Read more
source§

fn metrics(&self) -> Option<MetricsSet>

Return a snapshot of the set of Metrics for this ExecutionPlan. If no Metrics are available, return None. Read more
source§

fn statistics(&self) -> Result<Statistics, DataFusionError>

Returns statistics for this ExecutionPlan node. If statistics are not available, should return Statistics::new_unknown (the default), not an error.
source§

fn static_name() -> &'static str
where Self: Sized,

Short name for the ExecutionPlan, such as ‘ParquetExec’. Like name but can be called without an instance.
source§

fn schema(&self) -> Arc<Schema>

Get the schema for this execution plan
source§

fn required_input_distribution(&self) -> Vec<Distribution>

Specifies the data distribution requirements for all the children for this ExecutionPlan, By default it’s [Distribution::UnspecifiedDistribution] for each child,
source§

fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>

Specifies the ordering required for all of the children of this ExecutionPlan. Read more
source§

fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions ) -> Result<Option<Arc<dyn ExecutionPlan>>, DataFusionError>

If supported, attempt to increase the partitioning of this ExecutionPlan to produce target_partitions partitions. Read more

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoEither for T

source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

impl<T> Same for T

§

type Output = T

Should always be Self
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

source§

fn vzip(self) -> V