Module datafusion::physical_optimizer
This module contains a query optimizer that operates on a physical plan by applying rewrite rules to it, such as “Repartition”.
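To illustrate the idea of applying rules to a physical plan, here is a minimal, self-contained sketch. It does not use DataFusion's types: `Plan`, `Rule`, `ToyRepartition`, and `optimize_all` are hypothetical names invented for this example, and the real `PhysicalOptimizerRule` trait works on `ExecutionPlan` trees and differs in its details.

```rust
use std::sync::Arc;

/// Toy stand-in for a physical plan node (hypothetical, not DataFusion's `ExecutionPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf node that produces `partitions` output partitions.
    Scan { partitions: usize },
    /// Redistributes its input into `partitions` output partitions.
    Repartition { input: Arc<Plan>, partitions: usize },
    /// Keeps the partitioning of its input unchanged.
    Filter { input: Arc<Plan> },
}

impl Plan {
    /// Number of partitions this node outputs.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::Repartition { partitions, .. } => *partitions,
            Plan::Filter { input } => input.output_partitions(),
        }
    }
}

/// Toy rule interface (an assumption for this sketch): a rule takes a plan
/// and returns a plan that is equivalent but possibly rewritten.
trait Rule {
    fn name(&self) -> &str;
    fn optimize(&self, plan: &Arc<Plan>) -> Arc<Plan>;
}

/// Wraps scans that have fewer than `target` partitions in a `Repartition` node.
struct ToyRepartition {
    target: usize,
}

impl Rule for ToyRepartition {
    fn name(&self) -> &str {
        "toy_repartition"
    }

    fn optimize(&self, plan: &Arc<Plan>) -> Arc<Plan> {
        match plan.as_ref() {
            // Under-partitioned scan: add a repartition node on top of it.
            Plan::Scan { partitions } if *partitions < self.target => {
                Arc::new(Plan::Repartition {
                    input: Arc::clone(plan),
                    partitions: self.target,
                })
            }
            // Recurse through filters so the rule reaches the scan below.
            Plan::Filter { input } => Arc::new(Plan::Filter {
                input: self.optimize(input),
            }),
            // Everything else is left untouched.
            _ => Arc::clone(plan),
        }
    }
}

/// Applies each rule in order, feeding the output of one rule into the next.
fn optimize_all(rules: &[Box<dyn Rule>], plan: Arc<Plan>) -> Arc<Plan> {
    rules.iter().fold(plan, |current, rule| rule.optimize(&current))
}
```

In this sketch, running `optimize_all` with a single `ToyRepartition { target: 4 }` rule on a `Filter` over a one-partition `Scan` inserts a `Repartition` node between the two. Running the same rule again leaves the plan unchanged, because the `Repartition` node falls into the catch-all arm.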
Re-exports
pub use optimizer::PhysicalOptimizerRule;
Modules
- Utilizing exact statistics from sources to avoid scanning data
- CoalesceBatches optimizer that combines small batches of rows into larger batches to avoid the overhead of processing many small batches
- CombinePartialFinalAggregate optimizer rule checks adjacent Partial and Final AggregateExecs and tries to combine them if necessary
- EnforceDistribution optimizer rule inspects the physical plan with respect to distribution requirements and adds RepartitionExecs to satisfy them when necessary.
- Select the efficient global sort implementation based on sort details.
- Select the proper PartitionMode and build side based on the available statistics for hash join.
- Physical optimizer traits
- The PipelineChecker rule ensures that a given plan can accommodate its infinite sources, if there are any. It will reject non-runnable query plans that use pipeline-breaking operators on infinite input(s).
- The PipelineFixer rule tries to modify a given plan so that it can accommodate its infinite sources, if there are any. In other words, it tries to obtain a runnable query (with the given infinite sources) from a non-runnable query by transforming pipeline-breaking operations into pipeline-friendly ones. If this cannot be done, the rule emits a diagnostic error message.
- This module contains code to prune “containers” of row groups based on statistics prior to execution. This can lead to significant performance improvements by avoiding the need to evaluate a plan on entire containers (e.g. an entire file).
- Repartition optimizer that introduces repartition nodes to increase the level of parallelism available
- EnforceSorting optimizer rule inspects the physical plan with respect to local sorting requirements and does the following: