Struct datafusion::physical_plan::joins::NestedLoopJoinExec

pub struct NestedLoopJoinExec { /* private fields */ }

NestedLoopJoinExec is a build-probe join operator whose main task is to perform joins that have no equijoin conditions in the ON clause.

Execution consists of the following phases:

§1. Build phase

The operator collects build-side data in memory by polling all available data from the build-side input. Because there are no equijoin conditions, build-side data cannot be partitioned across the operator’s threads, so the build side is always collected into a single batch shared by all threads. The operator always treats the LEFT input as the build side, so it is crucial that the smaller input be the LEFT one. Normally the physical optimizer handles this selection.
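The "collect once, share across all threads" idea can be sketched with the standard library alone. This is a minimal illustration, not the operator's actual code: `SharedBuildSide`, `get_or_collect`, and `collect_across_threads` are hypothetical names, a `Vec<i32>` stands in for the collected batch, and `std::sync::OnceLock` is an assumed stand-in for whatever synchronization the operator really uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical holder for build-side rows; `i32` values stand in for a record batch.
struct SharedBuildSide {
    rows: OnceLock<Arc<Vec<i32>>>,
}

impl SharedBuildSide {
    fn new() -> Self {
        Self { rows: OnceLock::new() }
    }

    /// Returns the shared build side, running `collect` only if it has not
    /// been collected yet.
    fn get_or_collect<F>(&self, collect: F) -> Arc<Vec<i32>>
    where
        F: FnOnce() -> Vec<i32>,
    {
        Arc::clone(self.rows.get_or_init(|| Arc::new(collect())))
    }
}

/// Spawns `threads` workers that all request the build side. Returns how many
/// times the input was actually collected and each worker's handle to the data.
fn collect_across_threads(threads: usize) -> (usize, Vec<Arc<Vec<i32>>>) {
    let shared = Arc::new(SharedBuildSide::new());
    let collections = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            let collections = Arc::clone(&collections);
            thread::spawn(move || {
                shared.get_or_collect(|| {
                    // Count how often the (assumed expensive) collection runs.
                    collections.fetch_add(1, Ordering::SeqCst);
                    vec![1, 5, 10]
                })
            })
        })
        .collect();
    let batches = handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect();
    (collections.load(Ordering::SeqCst), batches)
}
```

Calling `collect_across_threads(4)` in this sketch runs the collection closure once, and every worker receives a handle to the same allocation.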

§2. Probe phase

Sequentially polling batches from the probe-side input and processing them according to the following logic:

  • apply the join filter (ON clause) to the Cartesian product of the probe batch and the build-side data – the filter is evaluated once per build-side row
  • update the shared bitmap of joined (“visited”) build-side row indices, if required – this makes it possible to produce unmatched build-side data (e.g. for LEFT/FULL JOIN) after the probe phase completes
  • perform join index alignment, if required – depending on JoinType
  • produce the output join batch
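The probe steps above can be sketched over plain slices. This is a simplified illustration under stated assumptions, not the operator's implementation: `probe_batch` and `unmatched_build_rows` are hypothetical helpers, `i32` values stand in for rows, a `Vec<bool>` stands in for the “visited” bitmap, and join index alignment is omitted.

```rust
/// Joins one probe batch against the whole build side.
/// `visited[i]` is set once build row `i` has matched at least one probe row.
fn probe_batch<F>(
    build: &[i32],
    probe: &[i32],
    filter: F,
    visited: &mut [bool],
) -> Vec<(i32, i32)>
where
    F: Fn(i32, i32) -> bool,
{
    assert_eq!(build.len(), visited.len());
    let mut output = Vec::new();
    // Outer loop over build rows: the filter is applied to the probe batch
    // once per build-side row.
    for (build_idx, &b) in build.iter().enumerate() {
        for &p in probe {
            if filter(b, p) {
                visited[build_idx] = true;
                output.push((b, p));
            }
        }
    }
    output
}

/// Build rows that no probe batch has matched so far (needed for LEFT/FULL joins).
fn unmatched_build_rows(build: &[i32], visited: &[bool]) -> Vec<i32> {
    build
        .iter()
        .zip(visited)
        .filter_map(|(&row, &seen)| if seen { None } else { Some(row) })
        .collect()
}
```

For example, with build rows `[1, 5, 10]`, a probe batch `[3, 7]`, and the non-equi filter `build > probe`, the sketch yields the pairs `(5, 3)`, `(10, 3)`, `(10, 7)` and leaves build row `1` unvisited.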

The probe phase is executed in parallel according to the probe-side input partitioning, one thread per partition. After its probe input is exhausted, each thread ATTEMPTS to produce unmatched build-side data.

§3. Producing unmatched build-side data

Unmatched build-side data is produced as an output batch after the probe input is exhausted. This step is also executed in parallel (once per probe input partition). To avoid emitting the unmatched data more than once (the build-side data is shared), each thread “reports” that its probe phase has completed (which means it will no longer update the “visited” bitmap), and only the last thread to report completion returns output.
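The “last thread to report returns output” rule can be sketched with an atomic counter. This is only an illustration of the pattern: `ProbeCompletion`, `report_done`, and `run_partitions` are hypothetical names, and the counter is an assumed mechanism rather than the operator's confirmed internals.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared state: counts probe partitions that are still running.
struct ProbeCompletion {
    remaining: AtomicUsize,
}

impl ProbeCompletion {
    fn new(partitions: usize) -> Self {
        Self { remaining: AtomicUsize::new(partitions) }
    }

    /// Reports that one partition finished probing.
    /// Returns true only for the final reporter.
    fn report_done(&self) -> bool {
        // fetch_sub returns the previous value; 1 means this was the last one.
        self.remaining.fetch_sub(1, Ordering::AcqRel) == 1
    }
}

/// Runs one thread per partition and returns how many of them were told to
/// emit the unmatched build-side rows.
fn run_partitions(partitions: usize) -> usize {
    let state = Arc::new(ProbeCompletion::new(partitions));
    let handles: Vec<_> = (0..partitions)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || state.report_done())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("probe thread panicked"))
        .filter(|&emitted| emitted)
        .count()
}
```

However the threads interleave, exactly one call to `report_done` observes the counter dropping from 1 to 0, so `run_partitions(n)` returns 1 for any `n >= 1`.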

Implementations§

source§

impl NestedLoopJoinExec

source

pub fn try_new( left: Arc<dyn ExecutionPlan>, right: Arc<dyn ExecutionPlan>, filter: Option<JoinFilter>, join_type: &JoinType, ) -> Result<NestedLoopJoinExec, DataFusionError>

Try to create a new NestedLoopJoinExec

source

pub fn left(&self) -> &Arc<dyn ExecutionPlan>

The left (build-side) input

source

pub fn right(&self) -> &Arc<dyn ExecutionPlan>

The right (probe-side) input

source

pub fn filter(&self) -> Option<&JoinFilter>

Filters applied before join output

source

pub fn join_type(&self) -> &JoinType

How the join is performed

Trait Implementations§

source§

impl Debug for NestedLoopJoinExec

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.
source§

impl DisplayAs for NestedLoopJoinExec

source§

fn fmt_as( &self, t: DisplayFormatType, f: &mut Formatter<'_>, ) -> Result<(), Error>

Format according to DisplayFormatType, used when the verbose representation looks different from the default one.
source§

impl ExecutionPlan for NestedLoopJoinExec

source§

fn name(&self) -> &'static str

Short name for the ExecutionPlan, such as ‘ParquetExec’.
source§

fn as_any(&self) -> &(dyn Any + 'static)

Returns the execution plan as Any so that it can be downcast to a specific implementation.
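Downcasting through `as_any` can be sketched with a toy trait. This is a minimal sketch: the `Plan` trait and the `NestedLoopJoin` and `Scan` structs below are hypothetical stand-ins, not DataFusion types; only `std::any::Any::downcast_ref` is a real API here.

```rust
use std::any::Any;

/// Hypothetical stand-in for a plan trait that exposes `as_any`.
trait Plan {
    fn name(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

/// Toy join node; `join_type` is a plain string for illustration.
struct NestedLoopJoin {
    join_type: &'static str,
}

/// Toy leaf node.
struct Scan;

impl Plan for NestedLoopJoin {
    fn name(&self) -> &'static str {
        "NestedLoopJoin"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Plan for Scan {
    fn name(&self) -> &'static str {
        "Scan"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the join type if `plan` is a `NestedLoopJoin`, otherwise `None`.
fn join_type_of(plan: &dyn Plan) -> Option<&'static str> {
    plan.as_any()
        .downcast_ref::<NestedLoopJoin>()
        .map(|join| join.join_type)
}
```

Code that walks a tree of trait objects can use the same pattern to recover a concrete node and read fields that the trait itself does not expose.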
source§

fn properties(&self) -> &PlanProperties

Return properties of the output of the ExecutionPlan, such as output ordering(s), partitioning information etc.
source§

fn required_input_distribution(&self) -> Vec<Distribution>

Specifies the data distribution requirements for all the children of this ExecutionPlan. By default it’s [Distribution::UnspecifiedDistribution] for each child.
source§

fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>

Get a list of children ExecutionPlans that act as inputs to this plan. The returned list will be empty for leaf nodes such as scans, will contain a single value for unary nodes, or two values for binary nodes (such as joins).
source§

fn with_new_children( self: Arc<NestedLoopJoinExec>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError>

Returns a new ExecutionPlan in which all existing children are replaced by the given children, in order
source§

fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>

Begin execution of partition, returning a Stream of RecordBatches.
source§

fn metrics(&self) -> Option<MetricsSet>

Return a snapshot of the set of Metrics for this ExecutionPlan. If no Metrics are available, return None.
source§

fn statistics(&self) -> Result<Statistics, DataFusionError>

Returns statistics for this ExecutionPlan node. If statistics are not available, should return Statistics::new_unknown (the default), not an error.
source§

fn static_name() -> &'static str
where Self: Sized,

Short name for the ExecutionPlan, such as ‘ParquetExec’. Like name but can be called without an instance.
source§

fn schema(&self) -> Arc<Schema>

Get the schema for this execution plan
source§

fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirement>>>

Specifies the ordering required for all of the children of this ExecutionPlan.
source§

fn maintains_input_order(&self) -> Vec<bool>

Returns false if this ExecutionPlan’s implementation may reorder rows within or between partitions.
source§

fn benefits_from_input_partitioning(&self) -> Vec<bool>

Specifies whether the ExecutionPlan benefits from increased parallelization at its input for each child.
source§

fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>, DataFusionError>

If supported, attempt to increase the partitioning of this ExecutionPlan to produce target_partitions partitions.

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoEither for T

source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
source§

impl<T> Same for T

§

type Output = T

Should always be Self
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

source§

fn vzip(self) -> V