Skip to main content

Module topk_executor

Module topk_executor 

Source
Expand description

ORDER BY + LIMIT Optimization with Streaming Top-K

This module fixes a critical semantic bug in the query execution path and provides efficient top-K query support for queue-like access patterns.

§The Bug: Incorrect ORDER BY + LIMIT Semantics

The previous implementation applied LIMIT during scan collection, then sorted:

❌ WRONG:
   for row in scan:
       output.push(row)
       if output.len() >= limit:
           break
   output.sort(order_by)  # BUG: Sorting wrong subset!

This is NOT semantically equivalent to ORDER BY ... LIMIT K unless the scan order matches the requested order (generally false).

Example of the bug:

Table: [priority=5, priority=1, priority=3, priority=2, priority=4]
Query: ORDER BY priority ASC LIMIT 1

Wrong (limit-then-sort): Returns priority=5 (first row)
Correct: Returns priority=1 (minimum)

§The Fix: Three Strategies

StrategyTimeSpaceWhen to Use
IndexPushdownO(log N+K)O(K)Ordered index on ORDER BY col
StreamingTopKO(N log K)O(K)No index, K << N
FullSortO(N log N)O(N)No index, K ≈ N

§Streaming Top-K Algorithm

For ORDER BY col ASC LIMIT K:

  1. Maintain a max-heap of size K
  2. For each row, if row.col < heap.max, evict max and insert row
  3. At end, drain heap in sorted order

This gives O(N log K) time and O(K) space, vs O(N log N) and O(N) for full sort.

§Queue Optimization

For “get highest priority task” (ORDER BY priority ASC LIMIT 1):

  • With 10K tasks: ~10K comparisons with O(1) memory
  • With ordered index: ~14 comparisons (log₂ 10000)

Structs§

IndexAwareTopK
Top-K with index awareness
OrderByColumn
A single ORDER BY column specification
OrderByLimitExecutor
Executor for ORDER BY + LIMIT queries
OrderByLimitStats
Result statistics from ORDER BY + LIMIT execution
OrderBySpec
Full ORDER BY specification
SingleColumnTopK
Optimized top-K for single-column ORDER BY
TopKHeap
A bounded heap for streaming top-K selection

Enums§

ColumnRef
Reference to a column
ExecutionStrategy
Strategy for ORDER BY + LIMIT execution
SortDirection
Sort direction