Module datafusion::physical_optimizer
This module contains a query optimizer that operates on a physical plan and applies rules to it, such as “Repartition”.
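To illustrate what applying a rule to a physical plan can look like, here is a minimal, self-contained sketch. It does not use DataFusion's actual types: `ToyPlan`, `ToyRule`, `ToyRepartition`, and `apply_rules` are hypothetical names invented for this example, and the real `PhysicalOptimizerRule` trait works on DataFusion's own plan representation and may differ in shape.

```rust
// Toy stand-in for a physical plan tree (hypothetical, NOT DataFusion's plan type).
#[derive(Debug, Clone, PartialEq)]
enum ToyPlan {
    Scan { partitions: usize },
    Filter { input: Box<ToyPlan> },
    Repartition { input: Box<ToyPlan>, partitions: usize },
}

impl ToyPlan {
    // Number of output partitions this node produces.
    fn output_partitions(&self) -> usize {
        match self {
            ToyPlan::Scan { partitions } => *partitions,
            ToyPlan::Filter { input } => input.output_partitions(),
            ToyPlan::Repartition { partitions, .. } => *partitions,
        }
    }
}

// Hypothetical rule interface: a rule takes a plan and returns a rewritten plan.
trait ToyRule {
    fn name(&self) -> &str;
    fn optimize(&self, plan: ToyPlan) -> ToyPlan;
}

// A toy "Repartition"-style rule: wraps under-partitioned scans in a
// Repartition node so that downstream operators can run in parallel.
struct ToyRepartition {
    target: usize,
}

impl ToyRule for ToyRepartition {
    fn name(&self) -> &str {
        "toy_repartition"
    }

    fn optimize(&self, plan: ToyPlan) -> ToyPlan {
        match plan {
            // Scans with too few partitions get a Repartition node on top.
            ToyPlan::Scan { partitions } if partitions < self.target => ToyPlan::Repartition {
                input: Box::new(ToyPlan::Scan { partitions }),
                partitions: self.target,
            },
            // Recurse through filters to reach the scan below.
            ToyPlan::Filter { input } => ToyPlan::Filter {
                input: Box::new(self.optimize(*input)),
            },
            // Everything else (including existing Repartition nodes) is left unchanged.
            other => other,
        }
    }
}

// Applies each rule in order, feeding the output of one rule into the next.
fn apply_rules(rules: &[Box<dyn ToyRule>], plan: ToyPlan) -> ToyPlan {
    rules.iter().fold(plan, |p, rule| rule.optimize(p))
}
```

In this sketch, passing a `Filter` over a single-partition `Scan` through `apply_rules` with `ToyRepartition { target: 4 }` yields a plan whose `output_partitions()` is 4, while a scan that already has enough partitions is returned unchanged. The ordered fold mirrors the general idea of running a list of rules one after another; it is an assumption made for illustration, not a description of how DataFusion schedules its rules.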
Re-exports
pub use optimizer::PhysicalOptimizerRule;
Modules
Uses exact statistics from sources to avoid scanning data
CoalesceBatches optimizer that groups small batches of rows together
into bigger batches to avoid the overhead of processing many small batches
EnforceDistribution optimizer rule inspects the physical plan with respect
to distribution requirements and adds RepartitionExecs to satisfy them
when necessary.
Selects an efficient global sort implementation based on sort details.
Selects the proper PartitionMode and build side for hash joins based on the available statistics.
Physical optimizer traits
The PipelineChecker rule ensures that a given plan can accommodate its
infinite sources, if there are any. It will reject non-runnable query plans
that use pipeline-breaking operators on infinite input(s).
The PipelineFixer rule tries to modify a given plan so that it can
accommodate its infinite sources, if there are any. In other words,
it tries to obtain a runnable query (with the given infinite sources)
from a non-runnable query by transforming pipeline-breaking operations
into pipeline-friendly ones. If this cannot be done, the rule emits a
diagnostic error message.
This module contains code to prune “containers” of row groups
based on statistics prior to execution. This can lead to
significant performance improvements by avoiding the need
to evaluate a plan on entire containers (e.g. an entire file).
Repartition optimizer that introduces repartition nodes to increase the level of parallelism available
EnforceSorting optimizer rule inspects the physical plan with respect
to local sorting requirements and does the following: