Module datafusion::physical_optimizer
Optimizer that rewrites ExecutionPlans.
These rules take advantage of physical plan properties, such as “Repartition” or “Sortedness”.
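As a rough illustration of what "a rule that rewrites a plan" means, here is a minimal, self-contained sketch. It does not use DataFusion's types: `Plan`, `Rule`, `RemoveRedundantRepartition`, and `run_rules` are hypothetical stand-ins invented for this example, and the rewrite itself is a toy that ignores the partitioning scheme.

```rust
/// Hypothetical, simplified plan node (NOT DataFusion's `ExecutionPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { partitions: usize },
    Repartition { input: Box<Plan>, partitions: usize },
    Sort { input: Box<Plan> },
}

impl Plan {
    /// Number of output partitions: the plan property this toy rule inspects.
    fn partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } | Plan::Repartition { partitions, .. } => *partitions,
            Plan::Sort { input } => input.partitions(),
        }
    }
}

/// Hypothetical rule interface: take a plan, return a rewritten plan.
trait Rule {
    fn name(&self) -> &str;
    fn optimize(&self, plan: Plan) -> Plan;
}

/// Toy rule: drop a `Repartition` whose input already has the target
/// partition count.
struct RemoveRedundantRepartition;

impl Rule for RemoveRedundantRepartition {
    fn name(&self) -> &str {
        "remove_redundant_repartition"
    }

    fn optimize(&self, plan: Plan) -> Plan {
        match plan {
            Plan::Repartition { input, partitions } => {
                // Rewrite the child first, then decide whether this node is needed.
                let input = self.optimize(*input);
                if input.partitions() == partitions {
                    input
                } else {
                    Plan::Repartition { input: Box::new(input), partitions }
                }
            }
            Plan::Sort { input } => Plan::Sort { input: Box::new(self.optimize(*input)) },
            scan @ Plan::Scan { .. } => scan,
        }
    }
}

/// Apply each rule in order, feeding the output of one into the next.
fn run_rules(rules: &[Box<dyn Rule>], plan: Plan) -> Plan {
    rules.iter().fold(plan, |p, rule| rule.optimize(p))
}
```

Passing a `Sort` over a four-partition `Repartition` over a four-partition `Scan` through `run_rules` yields the `Sort` directly over the `Scan`. Modeling rules as an ordered list mirrors the idea that each rule sees the plan produced by the previous one.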
Re-exports
pub use optimizer::PhysicalOptimizerRule;
Modules
- Utilizing exact statistics from sources to avoid scanning data
- CoalesceBatches optimizer that groups rows together into bigger batches to avoid the overhead of small batches
- CombinePartialFinalAggregate optimizer rule checks the adjacent Partial and Final AggregateExecs and tries to combine them if necessary
- EnforceDistribution optimizer rule inspects the physical plan with respect to distribution requirements and adds RepartitionExecs to satisfy them when necessary. If increasing parallelism is beneficial (and also desirable according to the configuration), this rule increases partition counts in the physical plan.
- EnforceSorting optimizer rule inspects the physical plan with respect to local sorting requirements and does the following:
- The JoinSelection rule tries to modify a given plan so that it can accommodate infinite sources and utilize statistical information (if there is any) to obtain more performant plans. To achieve the first goal, it tries to transform a non-runnable query (with the given infinite sources) into a runnable query by replacing pipeline-breaking join operations with pipeline-friendly ones. To achieve the second goal, it selects the proper PartitionMode and the build side using the available statistics for hash joins.
- A special-case optimizer rule that pushes limit into a grouped aggregation which has no aggregate expressions or sorting requirements
- Physical optimizer traits
- The GlobalOrderRequire optimizer rule either:
- The PipelineChecker rule ensures that a given plan can accommodate its infinite sources, if there are any. It will reject non-runnable query plans that use pipeline-breaking operators on infinite input(s).
- PruningPredicate to apply filter Expr to prune “containers” based on statistics (e.g. Parquet Row Groups)
- Optimizer rule that replaces executors that lose ordering with their order-preserving variants when it is helpful, either in terms of performance or to accommodate unbounded streams by fixing the pipeline.
- An optimizer rule that detects aggregate operations that could use a limited bucket count
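Pruning “containers” based on statistics, as mentioned in the list above, can be sketched in a few lines. This is only an illustration under simplifying assumptions: a single integer column, min/max statistics only, and no nulls. `ContainerStats`, `Predicate`, `may_contain_match`, and `prune` are hypothetical names for this example, not DataFusion's API.

```rust
/// Hypothetical min/max statistics for one container
/// (for example, one Parquet row group) over a single integer column.
#[derive(Debug, Clone, Copy)]
struct ContainerStats {
    min: i64,
    max: i64,
}

/// Hypothetical filter on that column.
#[derive(Debug, Clone, Copy)]
enum Predicate {
    Eq(i64),
    Gt(i64),
    Lt(i64),
}

/// Returns `false` only when the statistics prove that no row in the
/// container can satisfy the predicate; `true` means "must be scanned".
fn may_contain_match(stats: ContainerStats, pred: Predicate) -> bool {
    match pred {
        // A value can only be present if it lies within [min, max].
        Predicate::Eq(v) => stats.min <= v && v <= stats.max,
        // Some row exceeds v only if the maximum does.
        Predicate::Gt(v) => stats.max > v,
        // Some row is below v only if the minimum is.
        Predicate::Lt(v) => stats.min < v,
    }
}

/// One flag per container: `true` = keep (scan it), `false` = skip it.
fn prune(containers: &[ContainerStats], pred: Predicate) -> Vec<bool> {
    containers
        .iter()
        .map(|stats| may_contain_match(*stats, pred))
        .collect()
}
```

With three containers covering 0..=9, 10..=19, and 20..=29, the predicate `Gt(15)` keeps only the last two. The check is deliberately conservative: a `true` result never guarantees a match, it only means the container cannot be ruled out.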