Struct RepartitionExec

pub struct RepartitionExec { /* private fields */ }

Maps N input partitions to M output partitions based on a Partitioning scheme.

§Background

Like most commercial systems, with the notable exception of DuckDB, DataFusion uses the “Exchange Operator” approach to parallelism, which works well in practice given a sufficiently careful implementation.

DataFusion’s planner picks the target number of partitions and then RepartitionExec redistributes RecordBatches to that number of output partitions.

For example, given target_partitions=3 (trying to use 3 cores) and a scan whose input has only 2 partitions, RepartitionExec can be used to produce 3 even streams of RecordBatches:

        ▲                  ▲                  ▲
        │                  │                  │
        │                  │                  │
        │                  │                  │
 ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
 │    GroupBy    │  │    GroupBy    │  │    GroupBy    │
 │   (Partial)   │  │   (Partial)   │  │   (Partial)   │
 └───────────────┘  └───────────────┘  └───────────────┘
        ▲                  ▲                  ▲
        └──────────────────┼──────────────────┘
                           │
              ┌─────────────────────────┐
              │     RepartitionExec     │
              │   (hash/round robin)    │
              └─────────────────────────┘
                         ▲   ▲
             ┌───────────┘   └───────────┐
             │                           │
             │                           │
        .─────────.                 .─────────.
     ,─'           '─.           ,─'           '─.
    ;      Input      :         ;      Input      :
    :   Partition 0   ;         :   Partition 1   ;
     ╲               ╱           ╲               ╱
      '─.         ,─'             '─.         ,─'
         `───────'                   `───────'
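
To make the redistribution in the diagram concrete, here is a minimal std-only sketch of the two schemes it names. It is a model under stated assumptions, not DataFusion's implementation: `Batch`, `round_robin`, and `hash_partition` are hypothetical names, a batch is just a vector of row keys, and the inputs are drained sequentially rather than concurrently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a RecordBatch: a list of row keys.
type Batch = Vec<u64>;

/// Round robin: whole batches are dealt out to the outputs in turn.
fn round_robin(inputs: Vec<Vec<Batch>>, num_outputs: usize) -> Vec<Vec<Batch>> {
    let mut outputs = vec![Vec::new(); num_outputs];
    let mut next = 0;
    for batch in inputs.into_iter().flatten() {
        outputs[next].push(batch);
        next = (next + 1) % num_outputs;
    }
    outputs
}

/// Hash: each row goes to the output chosen by hashing its key,
/// so equal keys always land in the same output partition.
fn hash_partition(inputs: Vec<Vec<Batch>>, num_outputs: usize) -> Vec<Vec<u64>> {
    let mut outputs = vec![Vec::new(); num_outputs];
    for key in inputs.into_iter().flatten().flatten() {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let target = (hasher.finish() % num_outputs as u64) as usize;
        outputs[target].push(key);
    }
    outputs
}
```

With 2 inputs of 2 batches each and 3 outputs, the round-robin sketch hands 2, 1, and 1 batches to the outputs. The hash sketch instead guarantees that all rows with the same key share an output, which matters for operators that need every row of a key in one partition.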

§Error Handling

If any input partition returns an error, the error is propagated to all output partitions and the inputs are not polled again.
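
The behavior described above can be modeled with standard-library channels. This is a sketch under assumptions, not the operator's actual code: `repartition_with_errors` and `BatchResult` are hypothetical names, errors are plain strings, and a single input is distributed round robin.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical stand-ins for a RecordBatch and a fallible batch.
type Batch = Vec<i32>;
type BatchResult = Result<Batch, String>;

/// Pulls items from one input and deals them round robin to `num_outputs`
/// channels. On the first error, a copy of the error is sent to every output
/// and polling stops. Returns the receivers and the number of items pulled.
fn repartition_with_errors(
    input: impl IntoIterator<Item = BatchResult>,
    num_outputs: usize,
) -> (Vec<Receiver<BatchResult>>, usize) {
    let (senders, receivers): (Vec<_>, Vec<_>) =
        (0..num_outputs).map(|_| channel::<BatchResult>()).unzip();
    let mut polled = 0;
    for (i, item) in input.into_iter().enumerate() {
        polled += 1;
        match item {
            Ok(batch) => {
                // A failed send only means that output was dropped; ignore it.
                let _ = senders[i % num_outputs].send(Ok(batch));
            }
            Err(e) => {
                for tx in &senders {
                    let _ = tx.send(Err(e.clone()));
                }
                break; // the input is not polled again
            }
        }
    }
    // Senders are dropped here, which closes every output channel.
    (receivers, polled)
}
```

In this model, an input of two good batches, one error, and one more batch is polled only three times, and both outputs end with the same error.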

§Output Ordering

If more than one stream is being repartitioned, the output will be some arbitrary interleaving (and thus unordered) unless Self::with_preserve_order specifies otherwise.
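
The difference between the two modes can be illustrated with plain vectors. The sketch below assumes each input stream is already sorted ascending; `interleave_arbitrarily` and `merge_preserving_order` are hypothetical names, and the k-way merge is one common way to keep a sort order across streams, not a statement of how RepartitionExec does it internally.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Without order preservation, rows arrive in whatever order the inputs
/// happen to produce them; strict alternation stands in for that here.
fn interleave_arbitrarily(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    let mut out = Vec::new();
    loop {
        let before = out.len();
        for it in iters.iter_mut() {
            out.extend(it.next());
        }
        if out.len() == before {
            return out; // every input is exhausted
        }
    }
}

/// Merges inputs that are each sorted ascending into one sorted output by
/// always emitting the smallest value currently at the head of any input.
fn merge_preserving_order(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, idx)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, idx))) = heap.pop() {
        out.push(v);
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```

For the sorted inputs [1, 4, 7] and [2, 3, 9], alternation produces an unsorted stream while the merge produces a sorted one, at the cost of comparing the heads of all inputs for every row.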

§Spilling Architecture

RepartitionExec uses SpillPool channels to handle memory pressure during repartitioning. Each (input partition, output partition) pair gets its own SpillPool channel for FIFO ordering.

Input Partitions (N)          Output Partitions (M)
────────────────────          ─────────────────────

   Input 0 ──┐                      ┌──▶ Output 0
             │  ┌──────────────┐    │
             ├─▶│ SpillPool    │────┤
             │  │ [In0→Out0]   │    │
   Input 1 ──┤  └──────────────┘    ├──▶ Output 1
             │                       │
             │  ┌──────────────┐    │
             ├─▶│ SpillPool    │────┤
             │  │ [In1→Out0]   │    │
   Input 2 ──┤  └──────────────┘    ├──▶ Output 2
             │                      │
             │       ... (N×M SpillPools total)
             │                      │
             │  ┌──────────────┐    │
             └─▶│ SpillPool    │────┘
                │ [InN→OutM]   │
                └──────────────┘

Each SpillPool maintains FIFO order for its (input, output) pair.
See `RepartitionBatch` for details on the memory/spill decision logic.
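
The N×M layout and the per-pair FIFO guarantee can be sketched with in-process queues. Everything here is hypothetical and simplified: `PairChannel`, `Slot`, and `build_channels` are illustrative names, the byte-count budget is an assumed stand-in for the real memory/spill decision, and "spilling" only tags a batch instead of writing it to disk.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a RecordBatch: raw bytes.
type Batch = Vec<u8>;

/// A queued batch is either held in memory or marked as (simulated) spilled.
enum Slot {
    Memory(Batch),
    Spilled(Batch),
}

/// One FIFO channel for a single (input, output) pair.
struct PairChannel {
    queue: VecDeque<Slot>,
    in_memory_bytes: usize,
    memory_limit: usize,
}

impl PairChannel {
    fn new(memory_limit: usize) -> Self {
        Self { queue: VecDeque::new(), in_memory_bytes: 0, memory_limit }
    }

    /// Keeps the batch in memory if it fits the budget, otherwise marks it
    /// as spilled. Either way it is appended, so FIFO order is unchanged.
    fn push(&mut self, batch: Batch) {
        if self.in_memory_bytes + batch.len() <= self.memory_limit {
            self.in_memory_bytes += batch.len();
            self.queue.push_back(Slot::Memory(batch));
        } else {
            self.queue.push_back(Slot::Spilled(batch));
        }
    }

    /// Returns batches in arrival order, releasing memory as they leave.
    fn pop(&mut self) -> Option<Batch> {
        match self.queue.pop_front()? {
            Slot::Memory(b) => {
                self.in_memory_bytes -= b.len();
                Some(b)
            }
            Slot::Spilled(b) => Some(b),
        }
    }

    fn spilled_count(&self) -> usize {
        self.queue.iter().filter(|s| matches!(s, Slot::Spilled(_))).count()
    }
}

/// Builds one channel per (input, output) pair: N x M in total.
fn build_channels(n_inputs: usize, m_outputs: usize, memory_limit: usize) -> Vec<Vec<PairChannel>> {
    (0..n_inputs)
        .map(|_| (0..m_outputs).map(|_| PairChannel::new(memory_limit)).collect::<Vec<_>>())
        .collect()
}
```

Keeping a separate queue per pair means a spilled batch can never overtake, or be overtaken by, an in-memory batch from the same input headed to the same output.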

§Footnote

The “Exchange Operator” was first described in the 1989 paper “Encapsulation of parallelism in the Volcano query processing system”, which uses the term “Exchange” for the concept of repartitioning data across threads.

Implementations§

impl RepartitionExec

pub fn input(&self) -> &Arc<dyn ExecutionPlan>

pub fn partitioning(&self) -> &Partitioning

pub fn preserve_order(&self) -> bool

pub fn name(&self) -> &str

impl RepartitionExec

pub fn try_new( input: Arc<dyn ExecutionPlan>, partitioning: Partitioning, ) -> Result<Self>

pub fn with_preserve_order(self) -> Self

Trait Implementations§

impl Clone for RepartitionExec

fn clone(&self) -> RepartitionExec

fn clone_from(&mut self, source: &Self)

impl Debug for RepartitionExec

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl DisplayAs for RepartitionExec

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result

impl ExecutionPlan for RepartitionExec

fn as_any(&self) -> &dyn Any

fn name(&self) -> &'static str

fn properties(&self) -> &PlanProperties

fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>

fn benefits_from_input_partitioning(&self) -> Vec<bool>

fn maintains_input_order(&self) -> Vec<bool>

fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>

fn metrics(&self) -> Option<MetricsSet>

fn statistics(&self) -> Result<Statistics>

fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics>

fn cardinality_effect(&self) -> CardinalityEffect

fn try_swapping_with_projection( &self, projection: &ProjectionExec, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

fn gather_filters_for_pushdown( &self, _phase: FilterPushdownPhase, parent_filters: Vec<Arc<dyn PhysicalExpr>>, _config: &ConfigOptions, ) -> Result<FilterDescription>

fn handle_child_pushdown_result( &self, _phase: FilterPushdownPhase, child_pushdown_result: ChildPushdownResult, _config: &ConfigOptions, ) -> Result<FilterPushdownPropagation<Arc<dyn ExecutionPlan>>>

fn repartitioned( &self, target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

fn static_name() -> &'static str where Self: Sized,

fn schema(&self) -> SchemaRef

fn check_invariants(&self, check: InvariantLevel) -> Result<()>

fn required_input_distribution(&self) -> Vec<Distribution>

fn required_input_ordering(&self) -> Vec<Option<OrderingRequirements>>

fn reset_state(self: Arc<Self>) -> Result<Arc<dyn ExecutionPlan>>

fn supports_limit_pushdown(&self) -> bool

fn with_fetch(&self, _limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>

fn fetch(&self) -> Option<usize>

fn with_new_state( &self, _state: Arc<dyn Any + Send + Sync>, ) -> Option<Arc<dyn ExecutionPlan>>

Auto Trait Implementations§

impl Freeze for RepartitionExec

impl !RefUnwindSafe for RepartitionExec

impl Send for RepartitionExec

impl Sync for RepartitionExec

impl Unpin for RepartitionExec

impl !UnwindSafe for RepartitionExec

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

impl<V, T> VZip<V> for T where V: MultiLane<T>,

impl<T> ErasedDestructor for T where T: 'static,