Skip to main content

Module pushdown_sort

Module pushdown_sort 

Source
Expand description

Sort Pushdown Optimization

This optimizer attempts to push sort requirements down through the execution plan tree to data sources that can natively handle them (e.g., by scanning files in reverse order).

§How it works

  1. Detects SortExec nodes in the plan
  2. Calls try_pushdown_sort() on the input to recursively push the sort requirement
  3. Each node type defines its own pushdown behavior:
    • Transparent nodes (CoalesceBatchesExec, RepartitionExec, etc.) delegate to their children and wrap the result
    • Data sources (DataSourceExec) check if they can optimize for the ordering
    • Blocking nodes return Unsupported to stop pushdown
  4. Based on the result:
    • Exact: Remove the Sort operator (data source guarantees perfect ordering)
    • Inexact: Keep Sort but use optimized input (enables early termination for TopK)
    • Unsupported: No change

§Current capabilities (Phase 1)

  • Reverse scan optimization: when required sort is the reverse of the data source’s natural ordering, enable reverse scanning (reading row groups in reverse order)
  • Supports prefix matching: if data has ordering [A DESC, B ASC] and query needs [A ASC], reversing gives [A ASC, B DESC] which satisfies the requirement

TODO Issue: https://github.com/apache/datafusion/issues/19329

§Future enhancements (Phase 2),

  • File reordering based on statistics
  • Return Exact when files are known to be perfectly sorted
  • Complete Sort elimination when ordering is guaranteed

Structs§

PushdownSort
A PhysicalOptimizerRule that attempts to push down sort requirements to data sources.