Expand description
ORDER BY + LIMIT Optimization with Streaming Top-K
This module fixes a critical semantic bug in the query execution path and provides efficient top-K query support for queue-like access patterns.
§The Bug: Incorrect ORDER BY + LIMIT Semantics
The previous implementation applied LIMIT during scan collection, then sorted:
❌ WRONG:
for row in scan:
output.push(row)
if output.len() >= limit:
break
output.sort(order_by) # BUG: Sorting wrong subset!This is NOT semantically equivalent to ORDER BY ... LIMIT K unless the
scan order matches the requested order (generally false).
Example of the bug:
Table: [priority=5, priority=1, priority=3, priority=2, priority=4]
Query: ORDER BY priority ASC LIMIT 1
Wrong (limit-then-sort): Returns priority=5 (first row)
Correct: Returns priority=1 (minimum)§The Fix: Three Strategies
| Strategy | Time | Space | When to Use |
|---|---|---|---|
| IndexPushdown | O(log N+K) | O(K) | Ordered index on ORDER BY col |
| StreamingTopK | O(N log K) | O(K) | No index, K << N |
| FullSort | O(N log N) | O(N) | No index, K ≈ N |
§Streaming Top-K Algorithm
For ORDER BY col ASC LIMIT K:
- Maintain a max-heap of size K
- For each row, if row.col < heap.max, evict max and insert row
- At end, drain heap in sorted order
This gives O(N log K) time and O(K) space, vs O(N log N) and O(N) for full sort.
§Queue Optimization
For “get highest priority task” (ORDER BY priority ASC LIMIT 1):
- With 10K tasks: ~10K comparisons with O(1) memory
- With ordered index: ~14 comparisons (log₂ 10000)
Structs§
- Index
Aware TopK - Top-K with index awareness
- Order
ByColumn - A single ORDER BY column specification
- Order
ByLimit Executor - Executor for ORDER BY + LIMIT queries
- Order
ByLimit Stats - Result statistics from ORDER BY + LIMIT execution
- Order
BySpec - Full ORDER BY specification
- Single
Column TopK - Optimized top-K for single-column ORDER BY
- TopK
Heap - A bounded heap for streaming top-K selection
Enums§
- Column
Ref - Reference to a column
- Execution
Strategy - Strategy for ORDER BY + LIMIT execution
- Sort
Direction - Sort direction