Struct NestedLoopJoinExec

pub struct NestedLoopJoinExec { /* private fields */ }

NestedLoopJoinExec is a build-probe join operator designed for joins that do not have equijoin keys in their ON clause.

§Execution Flow

                                               Incoming right batch
               Left Side Buffered Batches
                      ┌───────────┐              ┌───────────────┐
                      │ ┌───────┐ │              │               │
                      │ │       │ │              │               │
 Current Left Row ───▶│ ├───────├─┤──────────┐   │               │
                      │ │       │ │          │   └───────────────┘
                      │ │       │ │          │           │
                      │ │       │ │          │           │
                      │ └───────┘ │          │           │
                      │ ┌───────┐ │          │           │
                      │ │       │ │          │     ┌─────┘
                      │ │       │ │          │     │
                      │ │       │ │          │     │
                      │ │       │ │          │     │
                      │ │       │ │          │     │
                      │ └───────┘ │          ▼     ▼
                      │   ......  │  ┌──────────────────────┐
                      │           │  │X (Cartesian Product) │
                      │           │  └──────────┬───────────┘
                      └───────────┘             │
                                                │
                                                ▼
                                     ┌───────┬───────────────┐
                                     │       │               │
                                     │       │               │
                                     │       │               │
                                     └───────┴───────────────┘
                                       Intermediate Batch
                                 (For join predicate evaluation)

The execution follows a two-phase design:

§1. Buffering Left Input

The operator eagerly buffers all left-side input batches in memory until a memory limit is reached. Currently, an out-of-memory error is returned if the left-side input batches cannot all fit in memory at once. In the future, it may be possible to let this case finish execution (see the ‘Memory-limited Execution’ section).
The rationale for buffering the left side is that scanning the right side can be expensive (e.g., decoding Parquet files), so buffering more left rows reduces the number of right-side scan passes required.
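
The buffering phase can be pictured with a minimal sketch, assuming a plain `Vec<i64>` stands in for a record batch and that the memory limit is a simple byte budget. `Batch`, `buffer_left`, and `budget_bytes` are hypothetical names for illustration, not part of the DataFusion API.

```rust
/// Hypothetical stand-in for a record batch: a vector of i64 rows.
type Batch = Vec<i64>;

/// Buffer every left-side batch, returning an error as soon as the
/// (assumed) byte budget is exceeded, mirroring the current behavior
/// of failing when the left side does not fit in memory.
fn buffer_left(
    batches: impl IntoIterator<Item = Batch>,
    budget_bytes: usize,
) -> Result<Vec<Batch>, String> {
    let mut buffered = Vec::new();
    let mut used = 0usize;
    for batch in batches {
        // Approximate the batch footprint by its row payload only.
        used += batch.len() * std::mem::size_of::<i64>();
        if used > budget_bytes {
            return Err(format!(
                "left side needs at least {} bytes, budget is {}",
                used, budget_bytes
            ));
        }
        buffered.push(batch);
    }
    Ok(buffered)
}
```

With three 8-byte rows spread over two batches, a 64-byte budget succeeds and a 16-byte budget fails on the second batch.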

§2. Probing Right Input

Right-side input is streamed batch by batch.
For each right-side batch:
- The operator evaluates the join filter against the full buffered left input. Each inner-loop iteration forms the Cartesian product of the current left row and the right batch, then applies the join predicate/filter to it.
- Matched results are accumulated into an output buffer (see the ‘Output Buffering Strategy’ section).
This process continues until all right-side input is consumed.
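
The probe loop described above can be sketched as follows. This is a row-at-a-time illustration over plain slices, assuming generic row types `L` and `R`. The real operator works on columnar Arrow batches, and `nested_loop_join` is a hypothetical helper name.

```rust
/// Join a fully buffered left side against right-side batches that
/// arrive one at a time. For every right batch, each left row is paired
/// with the whole batch and the filter decides which pairs survive.
fn nested_loop_join<L: Clone, R: Clone>(
    left_buffered: &[L],
    right_batches: &[Vec<R>],
    filter: impl Fn(&L, &R) -> bool,
) -> Vec<(L, R)> {
    let mut output = Vec::new();
    // Outer loop: stream the right side batch by batch.
    for right_batch in right_batches {
        // Inner loop: one left row at a time against the current batch.
        for left_row in left_buffered {
            for right_row in right_batch {
                if filter(left_row, right_row) {
                    output.push((left_row.clone(), right_row.clone()));
                }
            }
        }
    }
    output
}
```

A non-equijoin predicate such as `l < r` is the kind of condition this operator targets, since no equality key is available for hashing.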

§Producing unmatched build-side data

For join types such as left and full joins, unmatched rows must also be output. During execution, bitmaps are kept for both the left and right sides of the input; they are handled by dedicated states in NLJStream.
The final output of unmatched left-side rows is handled by a single partition for simplicity, since it accounts for only a small portion of the execution time (e.g. if the probe side has 10k rows, emitting the unmatched build-side rows takes roughly 1/10k of the total time).
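
A minimal sketch of the bitmap idea for a left join, assuming `i64` rows and a `Vec<bool>` in place of a real bitmap. `left_join_with_bitmap` is a hypothetical name, and the sketch runs in a single thread rather than across partitions.

```rust
/// Left join that remembers which buffered left rows matched at least
/// once, then emits the unmatched ones with `None` on the right.
fn left_join_with_bitmap(
    left_buffered: &[i64],
    right_batches: &[Vec<i64>],
    filter: impl Fn(i64, i64) -> bool,
) -> Vec<(i64, Option<i64>)> {
    // One flag per buffered left row; stands in for the left-side bitmap.
    let mut left_matched = vec![false; left_buffered.len()];
    let mut output = Vec::new();
    for right_batch in right_batches {
        for (i, &l) in left_buffered.iter().enumerate() {
            for &r in right_batch {
                if filter(l, r) {
                    left_matched[i] = true;
                    output.push((l, Some(r)));
                }
            }
        }
    }
    // Final phase, after the right side is exhausted: emit left rows
    // that never matched, padded with a null (None) right side.
    for (i, &l) in left_buffered.iter().enumerate() {
        if !left_matched[i] {
            output.push((l, None));
        }
    }
    output
}
```

The final loop touches each left row once, which is why its cost is small next to the probe phase.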

§Output Buffering Strategy

The operator uses an intermediate output buffer to accumulate results. Once the output threshold is reached (currently set to the same value as batch_size in the configuration), the buffered results are emitted immediately.
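
One way to picture this strategy is a small buffer that hands back a batch as soon as it reaches the threshold. `OutputBuffer`, `push`, and `finish` are hypothetical names, and rows are modeled as `(i64, i64)` pairs for brevity.

```rust
/// Accumulates joined rows and releases them in batches of `batch_size`.
struct OutputBuffer {
    rows: Vec<(i64, i64)>,
    batch_size: usize,
}

impl OutputBuffer {
    fn new(batch_size: usize) -> Self {
        Self { rows: Vec::with_capacity(batch_size), batch_size }
    }

    /// Add one row; returns a full batch once the threshold is reached.
    fn push(&mut self, row: (i64, i64)) -> Option<Vec<(i64, i64)>> {
        self.rows.push(row);
        if self.rows.len() >= self.batch_size {
            // Hand the rows to the caller and leave an empty buffer behind.
            Some(std::mem::take(&mut self.rows))
        } else {
            None
        }
    }

    /// Flush whatever is left once the input is exhausted.
    fn finish(&mut self) -> Option<Vec<(i64, i64)>> {
        if self.rows.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.rows))
        }
    }
}
```

Emitting as soon as the threshold is hit keeps the amount of buffered output close to one batch.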

§Extra Notes

The operator always considers the left side as the build (buffered) side. Therefore, the physical optimizer should assign the smaller input to the left.
The design tries to keep the intermediate data size to approximately one batch, for better cache locality and memory efficiency.

§TODO: Memory-limited Execution

If the memory budget is exceeded during left-side buffering, fallback strategies such as streaming left batches and re-scanning the right side may be implemented in the future.

Tracking issue: https://github.com/apache/datafusion/issues/15760
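
As a rough illustration of the fallback idea only (it is not implemented, and the eventual design may differ), the left side could be processed in chunks that fit the budget, with the right side re-scanned once per chunk. `chunked_nested_loop_join`, `chunk_rows`, and `scan_right` are hypothetical names.

```rust
/// Hypothetical fallback: join the left side chunk by chunk, re-scanning
/// the right side for every chunk. Returns the joined rows and the
/// number of right-side scan passes that were needed.
fn chunked_nested_loop_join(
    left: &[i64],
    chunk_rows: usize,
    scan_right: impl Fn() -> Vec<Vec<i64>>,
    filter: impl Fn(i64, i64) -> bool,
) -> (Vec<(i64, i64)>, usize) {
    assert!(chunk_rows > 0, "chunk_rows must be positive");
    let mut output = Vec::new();
    let mut right_passes = 0;
    for left_chunk in left.chunks(chunk_rows) {
        // Each left chunk costs one full pass over the right input.
        right_passes += 1;
        for right_batch in scan_right() {
            for &l in left_chunk {
                for &r in &right_batch {
                    if filter(l, r) {
                        output.push((l, r));
                    }
                }
            }
        }
    }
    (output, right_passes)
}
```

The pass count grows as the chunk size shrinks, which is the trade-off behind buffering as many left rows as possible.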

§Clone / Shared State

Note that this structure includes a [OnceAsync] that is used to coordinate the loading of the left side with the processing in each output stream. Therefore it cannot be Clone.
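
The coordination pattern can be approximated with the standard library. The sketch below uses `std::sync::OnceLock` as a synchronous stand-in for `OnceAsync` (an assumption made for illustration; the real type is asynchronous), with threads playing the role of output partitions. `SharedLeft` and `run_partitions` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Left-side data that is loaded at most once and shared by all partitions.
struct SharedLeft {
    data: OnceLock<Vec<i64>>,
    loads: AtomicUsize,
}

impl SharedLeft {
    /// The first caller performs the load; later callers reuse the result.
    fn get(&self) -> &Vec<i64> {
        self.data.get_or_init(|| {
            self.loads.fetch_add(1, Ordering::SeqCst);
            vec![1, 2, 3] // stand-in for buffering the left input
        })
    }
}

/// Run `n` partitions that all need the left side; returns how many
/// loads happened and the left-side length seen by each partition.
fn run_partitions(n: usize) -> (usize, Vec<usize>) {
    let shared = Arc::new(SharedLeft {
        data: OnceLock::new(),
        loads: AtomicUsize::new(0),
    });
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.get().len())
        })
        .collect();
    let lens = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (shared.loads.load(Ordering::SeqCst), lens)
}
```

All partitions hold an `Arc` to the same instance, so the load runs once regardless of how many partitions request it.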

Implementations§

impl NestedLoopJoinExec

pub fn try_new( left: Arc<dyn ExecutionPlan>, right: Arc<dyn ExecutionPlan>, filter: Option<JoinFilter>, join_type: &JoinType, projection: Option<Vec<usize>>, ) -> Result<Self>

pub fn left(&self) -> &Arc<dyn ExecutionPlan>

pub fn right(&self) -> &Arc<dyn ExecutionPlan>

pub fn filter(&self) -> Option<&JoinFilter>

pub fn join_type(&self) -> &JoinType

pub fn projection(&self) -> Option<&Vec<usize>>

pub fn contains_projection(&self) -> bool

pub fn with_projection(&self, projection: Option<Vec<usize>>) -> Result<Self>

pub fn swap_inputs(&self) -> Result<Arc<dyn ExecutionPlan>>

§Notes:

Trait Implementations§

impl Debug for NestedLoopJoinExec

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl DisplayAs for NestedLoopJoinExec

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result

impl EmbeddedProjection for NestedLoopJoinExec

fn with_projection(&self, projection: Option<Vec<usize>>) -> Result<Self>

impl ExecutionPlan for NestedLoopJoinExec

fn try_swapping_with_projection( &self, projection: &ProjectionExec, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

fn name(&self) -> &'static str

fn as_any(&self) -> &dyn Any

fn properties(&self) -> &PlanProperties

fn required_input_distribution(&self) -> Vec<Distribution>

fn maintains_input_order(&self) -> Vec<bool>

fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>>

fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<SendableRecordBatchStream>

fn metrics(&self) -> Option<MetricsSet>

fn statistics(&self) -> Result<Statistics>

fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics>

fn static_name() -> &'static str where Self: Sized,

fn schema(&self) -> SchemaRef

fn check_invariants(&self, check: InvariantLevel) -> Result<()>

fn required_input_ordering(&self) -> Vec<Option<OrderingRequirements>>

fn benefits_from_input_partitioning(&self) -> Vec<bool>

fn reset_state(self: Arc<Self>) -> Result<Arc<dyn ExecutionPlan>>

fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

fn supports_limit_pushdown(&self) -> bool

fn with_fetch(&self, _limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>

fn fetch(&self) -> Option<usize>

fn cardinality_effect(&self) -> CardinalityEffect

fn gather_filters_for_pushdown( &self, _phase: FilterPushdownPhase, parent_filters: Vec<Arc<dyn PhysicalExpr>>, _config: &ConfigOptions, ) -> Result<FilterDescription>

fn handle_child_pushdown_result( &self, _phase: FilterPushdownPhase, child_pushdown_result: ChildPushdownResult, _config: &ConfigOptions, ) -> Result<FilterPushdownPropagation<Arc<dyn ExecutionPlan>>>

fn with_new_state( &self, _state: Arc<dyn Any + Send + Sync>, ) -> Option<Arc<dyn ExecutionPlan>>

Auto Trait Implementations§

impl !Freeze for NestedLoopJoinExec

impl !RefUnwindSafe for NestedLoopJoinExec

impl Send for NestedLoopJoinExec

impl Sync for NestedLoopJoinExec

impl Unpin for NestedLoopJoinExec

impl !UnwindSafe for NestedLoopJoinExec

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

impl<T> ErasedDestructor for T
where T: 'static,