This module contains a query optimizer that operates on a physical plan by applying rules to it, such as “Repartition”.
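The idea of applying rules to a physical plan can be sketched with a toy model. Everything below is hypothetical and only illustrates the pattern: the `Plan` enum, the `Rule` trait, `ToyRepartition`, and `run_rules` are names invented for this example and are not this crate's actual types or signatures.

```rust
// Toy physical plan. This is an assumption for illustration, not the crate's plan type.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { partitions: usize },
    Repartition { input: Box<Plan>, partitions: usize },
    Filter { input: Box<Plan> },
}

impl Plan {
    // Number of output partitions the node produces.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::Repartition { partitions, .. } => *partitions,
            Plan::Filter { input } => input.output_partitions(),
        }
    }
}

// Hypothetical rule interface: take a plan, return a (possibly) rewritten plan.
trait Rule {
    fn optimize(&self, plan: Plan) -> Plan;
}

// Toy rule: wrap under-partitioned scans in a Repartition node.
struct ToyRepartition {
    target: usize,
}

impl Rule for ToyRepartition {
    fn optimize(&self, plan: Plan) -> Plan {
        match plan {
            Plan::Scan { partitions } if partitions < self.target => Plan::Repartition {
                input: Box::new(Plan::Scan { partitions }),
                partitions: self.target,
            },
            // Recurse through filters so scans below them are visited.
            Plan::Filter { input } => Plan::Filter {
                input: Box::new(self.optimize(*input)),
            },
            // Leave every other node unchanged.
            other => other,
        }
    }
}

// Apply each rule in order, feeding the output of one into the next.
fn run_rules(rules: &[Box<dyn Rule>], plan: Plan) -> Plan {
    rules.iter().fold(plan, |p, rule| rule.optimize(p))
}

fn main() {
    let plan = Plan::Filter {
        input: Box::new(Plan::Scan { partitions: 1 }),
    };
    let rules: Vec<Box<dyn Rule>> = vec![Box::new(ToyRepartition { target: 4 })];
    let optimized = run_rules(&rules, plan);
    println!("{:?} ({} partitions)", optimized, optimized.output_partitions());
}
```

In this sketch the rules run in list order, so a later rule sees the plan as rewritten by the earlier ones. The toy says nothing about how the real optimizer orders its rules.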

Modules

  • Utilizing exact statistics from sources to avoid scanning data
  • CoalesceBatches optimizer that groups small batches of rows into bigger batches to avoid the overhead of processing many small batches
  • CombinePartialFinalAggregate optimizer rule checks adjacent Partial and Final AggregateExecs and tries to combine them if necessary
  • EnforceDistribution optimizer rule inspects the physical plan with respect to distribution requirements and adds RepartitionExecs to satisfy them when necessary.
  • Select an efficient global sort implementation based on sort details.
  • Select the proper PartitionMode and build side for a hash join based on the available statistics.
  • Physical optimizer traits
  • The PipelineChecker rule ensures that a given plan can accommodate its infinite sources, if there are any. It will reject non-runnable query plans that use pipeline-breaking operators on infinite input(s).
  • The PipelineFixer rule tries to modify a given plan so that it can accommodate its infinite sources, if there are any. In other words, it tries to obtain a runnable query (with the given infinite sources) from a non-runnable query by transforming pipeline-breaking operations into pipeline-friendly ones. If this cannot be done, the rule emits a diagnostic error message.
  • This module contains code to prune “containers” of row groups based on statistics prior to execution. This can lead to significant performance improvements by avoiding the need to evaluate a plan on entire containers (e.g. an entire file).
  • Repartition optimizer that introduces repartition nodes to increase the level of parallelism available
  • EnforceSorting optimizer rule inspects the physical plan with respect to local sorting requirements and does the following: