Expand description
Sort Pushdown Optimization
This optimizer attempts to push sort requirements down through the execution plan tree to data sources that can natively handle them (e.g., by scanning files in reverse order).
§How it works
- Detects
SortExecnodes in the plan - Calls
try_pushdown_sort()on the input to recursively push the sort requirement - Each node type defines its own pushdown behavior:
- Transparent nodes (CoalesceBatchesExec, RepartitionExec, etc.) delegate to their children and wrap the result
- Data sources (DataSourceExec) check if they can optimize for the ordering
- Blocking nodes return
Unsupportedto stop pushdown
- Based on the result:
Exact: Remove the Sort operator (data source guarantees perfect ordering)Inexact: Keep Sort but use optimized input (enables early termination for TopK)Unsupported: No change
§Current capabilities (Phase 1)
- Reverse scan optimization: when required sort is the reverse of the data source’s natural ordering, enable reverse scanning (reading row groups in reverse order)
- Supports prefix matching: if data has ordering [A DESC, B ASC] and query needs [A ASC], reversing gives [A ASC, B DESC] which satisfies the requirement
TODO Issue: https://github.com/apache/datafusion/issues/19329
§Future enhancements (Phase 2),
- File reordering based on statistics
- Return
Exactwhen files are known to be perfectly sorted - Complete Sort elimination when ordering is guaranteed
Structs§
- Pushdown
Sort - A PhysicalOptimizerRule that attempts to push down sort requirements to data sources.