sql-cli 1.73.1

SQL query tool for CSV/JSON with both interactive TUI and non-interactive CLI modes - perfect for exploration and automation
Documentation
# Parser Analysis: Extending ORDER BY to Support Expressions

## Current State

### AST Structure
```rust
// Current: ORDER BY only supports column names (strings)
pub struct OrderByColumn {
    pub column: String,        // Just a column name!
    pub direction: SortDirection,
}
```

### Parser Implementation
Location: `src/sql/recursive_parser.rs:1309` (`parse_order_by_list()`)

**Current behavior:**
- Only parses **identifiers** (column names)
- Supports qualified names (`table.column`)
- Supports quoted identifiers
- Handles ASC/DESC keywords
- Returns `Vec<OrderByColumn>`

**Limitation:**
When parsing `ORDER BY SUM(sales_amount)`:
- Sees `SUM` as identifier
- Stops at `(` token
- Never parses the function call

## Required Changes

### 1. Update AST Structure

**Current:**
```rust
pub struct OrderByColumn {
    pub column: String,
    pub direction: SortDirection,
}
```

**Proposed:**
```rust
pub struct OrderByItem {
    pub expr: SqlExpression,   // Can be Column, FunctionCall, BinaryOp, etc.
    pub direction: SortDirection,
}
```

**Why this works:**
- `SqlExpression` already exists and handles all expression types
- Can represent:
  - Simple columns: `SqlExpression::Column(ColumnRef { name: "total", ... })`
  - Aggregates: `SqlExpression::FunctionCall { name: "SUM", args: [...], ... }`
  - Arithmetic: `SqlExpression::BinaryOp { ... }`
  - CASE: `SqlExpression::Case { ... }`

### 2. Update Parser Method

**Current signature:**
```rust
fn parse_order_by_list(&mut self) -> Result<Vec<OrderByColumn>, String>
```

**Proposed signature:**
```rust
fn parse_order_by_list(&mut self) -> Result<Vec<OrderByItem>, String>
```

**Implementation strategy:**
```rust
fn parse_order_by_list(&mut self) -> Result<Vec<OrderByItem>, String> {
    let mut order_items = Vec::new();

    loop {
        // Parse ANY expression (not just column names)
        let expr = self.parse_expression()?;

        // Check for ASC/DESC
        let direction = match &self.current_token {
            Token::Asc => {
                self.advance();
                SortDirection::Asc
            }
            Token::Desc => {
                self.advance();
                SortDirection::Desc
            }
            _ => SortDirection::Asc,
        };

        order_items.push(OrderByItem { expr, direction });

        // Handle comma-separated list
        if matches!(self.current_token, Token::Comma) {
            self.advance();
        } else {
            break;
        }
    }

    Ok(order_items)
}
```

**Why this works:**
- `parse_expression()` already exists and is battle-tested
- Handles ALL expression types:
  - Columns
  - Function calls (including aggregates)
  - Binary operators (+, -, *, /, etc.)
  - CASE expressions
  - Parentheses
  - Literals

### 3. Update SelectStatement AST

**Current:**
```rust
pub struct SelectStatement {
    // ...
    pub order_by: Option<Vec<OrderByColumn>>,
    // ...
}
```

**Proposed:**
```rust
pub struct SelectStatement {
    // ...
    pub order_by: Option<Vec<OrderByItem>>,
    // ...
}
```

### 4. Update Query Executor

The executor currently expects strings. We'll need to:

**Current execution:**
```rust
// Expects order_by to be Vec<OrderByColumn> with column: String
for order_col in order_by {
    // Look up column by name
    let column_name = &order_col.column;
    // ...
}
```

**Updated execution:**
```rust
// Needs to evaluate expressions
for order_item in order_by {
    match &order_item.expr {
        SqlExpression::Column(col_ref) => {
            // Simple column reference - use column name
            // ...
        }
        _ => {
            // Complex expression - evaluate it
            let value = evaluate_expression(&order_item.expr, row, context)?;
            // Use value for sorting
        }
    }
}
```

## Impact Analysis

### Files That Need Changes

1. **AST Definition** (`src/sql/parser/ast.rs`)
   - Add `OrderByItem` struct
   - Update `SelectStatement.order_by` type
   - Keep `OrderByColumn` as deprecated alias for compatibility

2. **Parser** (`src/sql/recursive_parser.rs`)
   - Update `parse_order_by_list()` to use `parse_expression()`
   - ~10 lines changed

3. **Query Executor** (`src/data/query_executor.rs`)
   - Update sorting logic to evaluate expressions
   - ~30-50 lines changed

4. **AST Formatter** (`src/sql/parser/ast_formatter.rs`)
   - Update to format ORDER BY expressions
   - ~10-20 lines

5. **All existing transformers** - NO CHANGES NEEDED!
   - Transformers work on `SqlExpression` already
   - `OrderByAliasTransformer` will work automatically

### Backward Compatibility

**Breaking change:** Yes, but minimal impact

**Migration strategy:**
```rust
// Helper function to create OrderByItem from column name (compatibility)
impl OrderByItem {
    pub fn from_column_name(name: String, direction: SortDirection) -> Self {
        Self {
            expr: SqlExpression::Column(ColumnRef {
                name,
                quote_style: QuoteStyle::None,
                table_prefix: None,
            }),
            direction,
        }
    }
}
```

## Benefits Beyond ORDER BY

This pattern can be reused for other clauses:

### 1. GROUP BY Expressions (Already Partially Done)
```sql
-- Currently supported (aliased)
GROUP BY YEAR(order_date)

-- Could support directly with same pattern
```

### 2. PARTITION BY in Window Functions
```sql
-- Already uses expressions, but could be more consistent
PARTITION BY CASE WHEN amount > 1000 THEN 'High' ELSE 'Low' END
```

### 3. Future: DISTINCT ON (PostgreSQL)
```sql
-- Would need expression support
DISTINCT ON (customer_id, order_date)
```

## Testing Strategy

### Unit Tests (Parser)
```rust
#[test]
fn test_order_by_simple_column() {
    // ORDER BY region
}

#[test]
fn test_order_by_aggregate() {
    // ORDER BY SUM(sales_amount)
}

#[test]
fn test_order_by_expression() {
    // ORDER BY sales_amount * 1.1
}

#[test]
fn test_order_by_case() {
    // ORDER BY CASE WHEN region = 'North' THEN 1 ELSE 2 END
}

#[test]
fn test_order_by_multiple() {
    // ORDER BY region, SUM(amount) DESC
}
```

### Integration Tests (SQL)
```sql
-- Test aggregate in ORDER BY
SELECT region, SUM(sales_amount) AS total
FROM sales
GROUP BY region
ORDER BY SUM(sales_amount) DESC;

-- Test expression in ORDER BY
SELECT region, sales_amount
FROM sales
ORDER BY sales_amount * 1.1 DESC;

-- Test CASE in ORDER BY
SELECT region, product
FROM sales
ORDER BY CASE WHEN region = 'North' THEN 1 ELSE 2 END;
```

## Implementation Phases

### Phase 1: AST Changes (30 min)
- Add `OrderByItem` struct to `ast.rs`
- Update `SelectStatement.order_by` type
- Add compatibility helper

### Phase 2: Parser Changes (1 hour)
- Update `parse_order_by_list()` to call `parse_expression()`
- Update all call sites
- Add unit tests

### Phase 3: Executor Changes (2 hours)
- Update sorting logic to handle expressions
- Add expression evaluation in sort key generation
- Handle edge cases (NULL, type mismatches)

### Phase 4: Formatter Changes (30 min)
- Update AST formatter for ORDER BY
- Ensure pretty-printing works

### Phase 5: Testing (1 hour)
- Add comprehensive tests
- Test with `--show-transformations`
- Verify transformer works

**Total estimated time: ~5 hours**

## Risk Assessment

### Low Risk
- Parser already has `parse_expression()` - battle-tested
- Executor already evaluates expressions elsewhere
- Transformers already work with expressions

### Medium Risk
- Executor sorting logic needs careful handling of:
  - Type mismatches (can't compare string to number)
  - NULL values in expression results
  - Performance (evaluating expressions for every row)

### Mitigation
- Comprehensive testing
- Error messages for type mismatches
- Consider caching evaluated expressions

## Performance Considerations

### Concern: Expression Evaluation Overhead
Evaluating `SUM(sales_amount)` for every row during sorting.

### Solution 1: Recognize Aggregate Aliases (Transformer)
```sql
-- Input
ORDER BY SUM(sales_amount)

-- Transformer rewrites to
ORDER BY __agg_1  -- Reference the alias from SELECT
```

This is exactly what `OrderByAliasTransformer` does!

### Solution 2: Expression Caching
Cache evaluation results during sort:
```rust
// Evaluate once per row, cache result
let sort_key = row_cache.get_or_insert(row_id, || {
    evaluate_expression(&expr, row, context)
});
```

## Conclusion

**Recommendation: Implement this change**

**Benefits:**
1. ✅ Supports standard SQL (`ORDER BY SUM(...)`)
2. ✅ Minimal code changes (~100 lines total)
3. ✅ Reuses existing infrastructure (`parse_expression()`)
4. ✅ Enables future features (GROUP BY expressions, etc.)
5. ✅ OrderByAliasTransformer will optimize automatically

**Effort:** ~5 hours total

**Risk:** Low - leverages existing, tested code

**Next Step:** Implement Phase 1 (AST changes)