This module contains a query optimizer that operates on a physical plan by applying rules to it, such as “Repartition”.
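
To make the rule-based design concrete, the following is a minimal sketch of an optimizer that rewrites a plan by running a list of rules in order. Every name in it (`Plan`, `Rule`, `ToyRepartition`, `run_rules`) is a hypothetical, simplified stand-in chosen for illustration. None of them is this crate's API, and the real `PhysicalOptimizerRule` trait has a different signature.

```rust
// Hypothetical toy plan tree; NOT this crate's real plan type.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { partitions: usize },
    Filter { input: Box<Plan> },
    Repartition { input: Box<Plan>, partitions: usize },
}

// Hypothetical rule interface: take a plan, return a (possibly rewritten) plan.
trait Rule {
    fn optimize(&self, plan: Plan) -> Plan;
}

// Toy rule: wrap any scan that has fewer partitions than `target`
// in a repartition node, to expose more parallelism.
struct ToyRepartition {
    target: usize,
}

impl Rule for ToyRepartition {
    fn optimize(&self, plan: Plan) -> Plan {
        match plan {
            Plan::Scan { partitions } if partitions < self.target => Plan::Repartition {
                input: Box::new(Plan::Scan { partitions }),
                partitions: self.target,
            },
            // Recurse through filters so scans below them are also visited.
            Plan::Filter { input } => Plan::Filter {
                input: Box::new(self.optimize(*input)),
            },
            // Everything else is left unchanged.
            other => other,
        }
    }
}

// Apply each rule in sequence, feeding the output of one into the next.
fn run_rules(rules: &[Box<dyn Rule>], plan: Plan) -> Plan {
    rules.iter().fold(plan, |p, rule| rule.optimize(p))
}

fn main() {
    let plan = Plan::Filter {
        input: Box::new(Plan::Scan { partitions: 1 }),
    };
    let rules: Vec<Box<dyn Rule>> = vec![Box::new(ToyRepartition { target: 4 })];
    println!("{:?}", run_rules(&rules, plan));
}
```

In this sketch the optimizer is just an ordered fold over the rules, so the order of the rule list determines the final plan.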

Re-exports

pub use optimizer::PhysicalOptimizerRule;

Modules

Utilizes exact statistics from sources to avoid scanning data
CoalesceBatches optimizer that groups rows together into bigger batches to avoid the overhead of small batches
EnforceDistribution optimizer rule inspects the physical plan with respect to distribution requirements and adds RepartitionExecs to satisfy them when necessary.
Select the efficient global sort implementation based on sort details.
Select the proper PartitionMode and build side based on the available statistics for hash join.
Physical optimizer traits
The PipelineChecker rule ensures that a given plan can accommodate its infinite sources, if there are any. It will reject non-runnable query plans that use pipeline-breaking operators on infinite input(s).
The PipelineFixer rule tries to modify a given plan so that it can accommodate its infinite sources, if there are any. In other words, it tries to obtain a runnable query (with the given infinite sources) from a non-runnable query by transforming pipeline-breaking operations into pipeline-friendly ones. If this cannot be done, the rule emits a diagnostic error message.
This module contains code to prune “containers” of row groups based on statistics prior to execution. This can lead to significant performance improvements by avoiding the need to evaluate a plan on entire containers (e.g. an entire file).
Repartition optimizer that introduces repartition nodes to increase the level of parallelism available
EnforceSorting optimizer rule inspects the physical plan with respect to local sorting requirements and does the following: