pub struct EnforceDistribution {}
Expand description

The EnforceDistribution rule ensures that distribution requirements are met. In doing so, this rule will increase the parallelism in the plan by introducing repartitioning operators to the physical plan.

For example, given an input such as:

┌─────────────────────────────────┐
│                                 │
│          ExecutionPlan          │
│                                 │
└─────────────────────────────────┘
            ▲         ▲
            │         │
      ┌─────┘         └─────┐
      │                     │
      │                     │
      │                     │
┌───────────┐         ┌───────────┐
│           │         │           │
│ batch A1  │         │ batch B1  │
│           │         │           │
├───────────┤         ├───────────┤
│           │         │           │
│ batch A2  │         │ batch B2  │
│           │         │           │
├───────────┤         ├───────────┤
│           │         │           │
│ batch A3  │         │ batch B3  │
│           │         │           │
└───────────┘         └───────────┘

     Input                 Input
       A                     B

This rule will attempt to add a RepartitionExec to increase parallelism (to 3, in this case) and create the following arrangement:

    ┌─────────────────────────────────┐
    │                                 │
    │          ExecutionPlan          │
    │                                 │
    └─────────────────────────────────┘
              ▲      ▲       ▲            Input now has 3
              │      │       │             partitions
      ┌───────┘      │       └───────┐
      │              │               │
      │              │               │
┌───────────┐  ┌───────────┐   ┌───────────┐
│           │  │           │   │           │
│ batch A1  │  │ batch A3  │   │ batch B3  │
│           │  │           │   │           │
├───────────┤  ├───────────┤   ├───────────┤
│           │  │           │   │           │
│ batch B2  │  │ batch B1  │   │ batch A2  │
│           │  │           │   │           │
└───────────┘  └───────────┘   └───────────┘
      ▲              ▲               ▲
      │              │               │
      └─────────┐    │    ┌──────────┘
                │    │    │
                │    │    │
    ┌─────────────────────────────────┐   batches are
    │       RepartitionExec(3)        │   repartitioned
    │           RoundRobin            │
    │                                 │
    └─────────────────────────────────┘
                ▲         ▲
                │         │
          ┌─────┘         └─────┐
          │                     │
          │                     │
          │                     │
    ┌───────────┐         ┌───────────┐
    │           │         │           │
    │ batch A1  │         │ batch B1  │
    │           │         │           │
    ├───────────┤         ├───────────┤
    │           │         │           │
    │ batch A2  │         │ batch B2  │
    │           │         │           │
    ├───────────┤         ├───────────┤
    │           │         │           │
    │ batch A3  │         │ batch B3  │
    │           │         │           │
    └───────────┘         └───────────┘


     Input                 Input
       A                     B

The EnforceDistribution rule

  • is idempotent; i.e. it can be applied multiple times, each time producing the same result.
  • always produces a valid plan in terms of distribution requirements. Its input plan can be valid or invalid with respect to distribution requirements, but the output plan will always be valid.
  • produces a valid plan in terms of ordering requirements, if its input is a valid plan in terms of ordering requirements. If the input plan is invalid, this rule does not attempt to fix it as doing so is the responsibility of the EnforceSorting rule.

Note that distribution requirements are met in the strictest way. This may result in more than strictly necessary RepartitionExecs in the plan, but meeting the requirements in the strictest way may help avoid possible data skew in joins.

For example for a hash join with keys (a, b, c), the required Distribution(a, b, c) can be satisfied by several alternative partitioning ways: (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c) and ( ).

This rule only chooses the exact match and satisfies the Distribution(a, b, c) by a HashPartition(a, b, c).

Implementations§

Trait Implementations§

source§

impl Default for EnforceDistribution

source§

fn default() -> EnforceDistribution

Returns the “default value” for a type. Read more
source§

impl PhysicalOptimizerRule for EnforceDistribution

source§

fn optimize( &self, plan: Arc<dyn ExecutionPlan>, config: &ConfigOptions ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError>

Rewrite plan to an optimized form
source§

fn name(&self) -> &str

A human readable name for this optimizer rule
source§

fn schema_check(&self) -> bool

A flag to indicate whether the physical planner should valid the rule will not change the schema of the plan after the rewriting. Some of the optimization rules might change the nullable properties of the schema and should disable the schema check.

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T> Instrument for T

source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoEither for T

source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

impl<Unshared, Shared> IntoShared<Shared> for Unshared
where Shared: FromUnshared<Unshared>,

source§

fn into_shared(self) -> Shared

Creates a shared type from an unshared type.
source§

impl<T> Same for T

§

type Output = T

Should always be Self
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

source§

fn vzip(self) -> V

source§

impl<T> WithSubscriber for T

source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

source§

impl<T> Ungil for T
where T: Send,