sql-cli 1.73.1

SQL query tool for CSV/JSON with both interactive TUI and non-interactive CLI modes - perfect for exploration and automation
# PIVOT/UNPIVOT Implementation Plan

- **Feature**: PIVOT/UNPIVOT SQL syntax support
- **Priority**: Medium
- **Difficulty**: Medium
- **Estimated Effort**: 4-5 days
- **Approach**: Transformer-based (rewrite to CASE expressions + GROUP BY)

## Overview

PIVOT transforms rows into columns by aggregating values. It's a common data reshaping operation, especially useful for reporting and data analysis.

## Example Use Case (FoodEaten)

### Input Data
```sql
SELECT * FROM FoodEaten;
```

| Id | Date       | FoodName | AmountEaten |
|----|------------|----------|-------------|
| 1  | 2019-08-01 | Sammich  | 2           |
| 2  | 2019-08-01 | Pickle   | 3           |
| 3  | 2019-08-01 | Apple    | 1           |
| 4  | 2019-08-02 | Sammich  | 1           |
| 5  | 2019-08-02 | Pickle   | 1           |
| 6  | 2019-08-02 | Apple    | 4           |
| 7  | 2019-08-03 | Cake     | 2           |
| 8  | 2019-08-04 | Sammich  | 1           |
| 9  | 2019-08-04 | Pickle   | 2           |
| 10 | 2019-08-04 | Apple    | 3           |

### PIVOT Query
```sql
SELECT [Date] AS 'Day',
    [Sammich], [Pickle], [Apple], [Cake]
FROM (
    SELECT [Date], FoodName, AmountEaten FROM FoodEaten
) AS SourceTable
PIVOT (
    MAX(AmountEaten)
    FOR FoodName IN ([Sammich], [Pickle], [Apple], [Cake])
) AS PivotTable
```

### Expected Output

| Day        | Sammich | Pickle | Apple | Cake |
|------------|---------|--------|-------|------|
| 2019-08-01 | 2       | 3      | 1     | NULL |
| 2019-08-02 | 1       | 1      | 4     | NULL |
| 2019-08-03 | NULL    | NULL   | NULL  | 2    |
| 2019-08-04 | 1       | 2      | 3     | NULL |

### With NULL Handling (using COALESCE)
```sql
SELECT [Date] AS 'Day',
    COALESCE([Sammich], 0) AS Sammich,
    COALESCE([Pickle],  0) AS Pickle,
    COALESCE([Apple],   0) AS Apple,
    COALESCE([Cake],    0) AS Cake
FROM (
    SELECT [Date], FoodName, AmountEaten FROM FoodEaten
) AS SourceTable
PIVOT (
    MAX(AmountEaten)
    FOR FoodName IN ([Sammich], [Pickle], [Apple], [Cake])
) AS PivotTable
```

| Day        | Sammich | Pickle | Apple | Cake |
|------------|---------|--------|-------|------|
| 2019-08-01 | 2       | 3      | 1     | 0    |
| 2019-08-02 | 1       | 1      | 4     | 0    |
| 2019-08-03 | 0       | 0      | 0     | 2    |
| 2019-08-04 | 1       | 2      | 3     | 0    |

## Transformation Strategy

### Input SQL (PIVOT syntax)
```sql
SELECT Date, Sammich, Pickle, Apple, Cake
FROM (
    SELECT Date, FoodName, AmountEaten FROM FoodEaten
) AS SourceTable
PIVOT (
    MAX(AmountEaten)
    FOR FoodName IN (Sammich, Pickle, Apple, Cake)
) AS PivotTable
```

### Transformed SQL (Standard SQL)
```sql
SELECT Date,
    MAX(CASE WHEN FoodName = 'Sammich' THEN AmountEaten ELSE NULL END) AS Sammich,
    MAX(CASE WHEN FoodName = 'Pickle' THEN AmountEaten ELSE NULL END) AS Pickle,
    MAX(CASE WHEN FoodName = 'Apple' THEN AmountEaten ELSE NULL END) AS Apple,
    MAX(CASE WHEN FoodName = 'Cake' THEN AmountEaten ELSE NULL END) AS Cake
FROM (
    SELECT Date, FoodName, AmountEaten FROM FoodEaten
) AS SourceTable
GROUP BY Date
```
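
The rewrite can be illustrated with a small standalone sketch that builds the transformed SQL text from a pivot specification. Names and structure here (`PivotSpec`, `expand_pivot_sql`) are illustrative only, not the actual sql-cli AST; the real transformer works on AST nodes (Phase 2), but the generated SQL is the same:

```rust
struct PivotSpec {
    aggregate_fn: String,      // e.g. "MAX"
    value_column: String,      // e.g. "AmountEaten"
    pivot_column: String,      // e.g. "FoodName"
    pivot_values: Vec<String>, // e.g. ["Sammich", "Pickle", ...]
    group_by: Vec<String>,     // remaining columns, e.g. ["Date"]
}

/// Rewrite a PIVOT specification into standard CASE + GROUP BY SQL text.
fn expand_pivot_sql(spec: &PivotSpec, source_sql: &str) -> String {
    // Grouping columns come first, then one aggregated CASE per pivot value.
    let mut items: Vec<String> = spec.group_by.clone();
    for v in &spec.pivot_values {
        items.push(format!(
            "{agg}(CASE WHEN {pc} = '{val}' THEN {vc} ELSE NULL END) AS {val}",
            agg = spec.aggregate_fn,
            pc = spec.pivot_column,
            vc = spec.value_column,
            val = v
        ));
    }
    format!(
        "SELECT {}\nFROM ({}) AS SourceTable\nGROUP BY {}",
        items.join(",\n    "),
        source_sql,
        spec.group_by.join(", ")
    )
}

fn main() {
    let spec = PivotSpec {
        aggregate_fn: "MAX".into(),
        value_column: "AmountEaten".into(),
        pivot_column: "FoodName".into(),
        pivot_values: vec!["Sammich".into(), "Pickle".into(), "Apple".into(), "Cake".into()],
        group_by: vec!["Date".into()],
    };
    println!("{}", expand_pivot_sql(&spec, "SELECT Date, FoodName, AmountEaten FROM FoodEaten"));
}
```

Running this on the FoodEaten spec prints exactly the transformed query shown above.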

## Implementation Phases

### Phase 1: Parser Support (1 day)

**Goal**: Parse PIVOT syntax into AST

**Changes needed**:
1. Extend `TableSource` enum in `src/sql/parser/ast.rs`:
```rust
pub enum TableSource {
    Table { name: String, alias: Option<String> },
    Subquery { query: Box<SelectStatement>, alias: String },

    // NEW
    Pivot {
        source: Box<TableSource>,      // The input table/subquery
        aggregate: PivotAggregate,      // MAX(AmountEaten)
        pivot_column: String,           // FoodName
        pivot_values: Vec<String>,      // [Sammich, Pickle, Apple, Cake]
        alias: Option<String>,          // PivotTable
    },
}

pub struct PivotAggregate {
    pub function: String,      // "MAX", "SUM", "MIN", etc.
    pub column: String,         // "AmountEaten"
}
```

2. Add PIVOT keyword to lexer (`src/sql/lexer.rs`):
```rust
"PIVOT" => Token::Keyword(Keyword::Pivot),
```

3. Extend parser in `src/sql/recursive_parser.rs`:
```rust
fn parse_table_source(&mut self) -> Result<TableSource, String> {
    // ... existing code for Table, Subquery ...

    // After parsing subquery or table, check for PIVOT
    if self.peek_keyword("PIVOT") {
        return self.parse_pivot(base_source);
    }

    Ok(base_source)
}

fn parse_pivot(&mut self, source: TableSource) -> Result<TableSource, String> {
    self.consume_keyword("PIVOT")?;
    self.consume(Token::LeftParen)?;

    // Parse aggregate: MAX(AmountEaten)
    let aggregate = self.parse_pivot_aggregate()?;

    self.consume_keyword("FOR")?;

    // Parse pivot column: FoodName
    let pivot_column = self.parse_identifier()?;

    self.consume_keyword("IN")?;
    self.consume(Token::LeftParen)?;

    // Parse pivot values: Sammich, Pickle, Apple, Cake
    let pivot_values = self.parse_identifier_list()?;

    self.consume(Token::RightParen)?;
    self.consume(Token::RightParen)?;

    // Optional alias
    let alias = if self.peek_keyword("AS") {
        self.consume_keyword("AS")?;
        Some(self.parse_identifier()?)
    } else {
        None
    };

    Ok(TableSource::Pivot {
        source: Box::new(source),
        aggregate,
        pivot_column,
        pivot_values,
        alias,
    })
}
```
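
The sketch above calls two helpers that are not shown (`parse_pivot_aggregate`, `parse_identifier_list`). As a standalone illustration of what the aggregate parser needs to produce, here is a simplified version operating on a plain string; the real implementation would consume tokens from the lexer instead:

```rust
#[derive(Debug, PartialEq)]
pub struct PivotAggregate {
    pub function: String, // "MAX", "SUM", "MIN", etc.
    pub column: String,   // "AmountEaten"
}

/// Simplified stand-in for parse_pivot_aggregate: parses "MAX(AmountEaten)".
fn parse_pivot_aggregate(input: &str) -> Result<PivotAggregate, String> {
    let open = input.find('(').ok_or("expected '(' after aggregate name")?;
    let close = input.rfind(')').ok_or("expected closing ')'")?;
    let function = input[..open].trim().to_uppercase();
    // Reject anything that is not a supported aggregate up front.
    if !["MAX", "MIN", "SUM", "AVG", "COUNT"].contains(&function.as_str()) {
        return Err(format!("unsupported aggregate function: {function}"));
    }
    Ok(PivotAggregate {
        function,
        column: input[open + 1..close].trim().to_string(),
    })
}

fn main() {
    let agg = parse_pivot_aggregate("MAX(AmountEaten)").unwrap();
    println!("{agg:?}");
}
```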

**Testing**:
- Parse PIVOT syntax without errors
- AST structure captures all components
- Test with various aggregate functions (MAX, SUM, MIN, AVG, COUNT)

---

### Phase 2: PivotExpander Transformer (2 days)

**Goal**: Transform PIVOT AST into standard SQL with CASE expressions

**Location**: `src/query_plan/pivot_expander.rs` (new file)

**Algorithm**:
1. Extract PIVOT specification from TableSource
2. Generate CASE expression for each pivot value
3. Wrap each CASE in the aggregate function
4. Determine GROUP BY columns (all non-pivot, non-aggregate columns)
5. Build new SelectStatement with GROUP BY

**Implementation**:
```rust
pub struct PivotExpander;

impl PivotExpander {
    pub fn transform(stmt: SelectStatement) -> Result<SelectStatement> {
        // Check if FROM contains a PIVOT; clone the node out first so
        // `stmt` can still be moved into expand_pivot (borrowing out of
        // `stmt.from` while also passing `stmt` by value won't compile).
        if let Some(TableSource::Pivot { source, aggregate, pivot_column, pivot_values, .. }) = stmt.from.clone() {
            return Self::expand_pivot(stmt, &source, &aggregate, &pivot_column, &pivot_values);
        }
        Ok(stmt)
    }

    fn expand_pivot(
        stmt: SelectStatement,
        source: &TableSource,
        aggregate: &PivotAggregate,
        pivot_column: &str,
        pivot_values: &[String],
    ) -> Result<SelectStatement> {
        // 1. Build CASE expressions for each pivot value
        let mut new_select_items = Vec::new();

        // Keep non-pivoted columns (for GROUP BY)
        let group_by_columns = Self::extract_grouping_columns(source, pivot_column, &aggregate.column);

        for col in &group_by_columns {
            new_select_items.push(SelectItem::Column {
                column: ColumnRef::unquoted(col.clone()),
                leading_comments: vec![],
                trailing_comment: None,
            });
        }

        // Generate pivoted columns
        for value in pivot_values {
            let case_expr = Self::build_case_expression(
                pivot_column,
                value,
                &aggregate.column,
            );

            let aggregate_expr = SqlExpression::FunctionCall {
                name: aggregate.function.clone(),
                args: vec![case_expr],
                distinct: false,
            };

            new_select_items.push(SelectItem::Expression {
                expr: aggregate_expr,
                alias: value.clone(),
                leading_comments: vec![],
                trailing_comment: None,
            });
        }

        // 2. Build the new SELECT with GROUP BY; WHERE/HAVING/ORDER BY/LIMIT
        // carry over unchanged from the original statement via `..stmt`.
        Ok(SelectStatement {
            select_items: new_select_items,
            from: Some(source.clone()),
            group_by: Some(group_by_columns.iter().map(|c|
                SqlExpression::Column(ColumnRef::unquoted(c.clone()))
            ).collect()),
            ..stmt
        })
    }

    fn build_case_expression(
        pivot_column: &str,
        pivot_value: &str,
        aggregate_column: &str,
    ) -> SqlExpression {
        // CASE WHEN FoodName = 'Sammich' THEN AmountEaten ELSE NULL END
        SqlExpression::CaseExpression {
            when_branches: vec![
                WhenBranch {
                    condition: Box::new(SqlExpression::BinaryOp {
                        left: Box::new(SqlExpression::Column(
                            ColumnRef::unquoted(pivot_column.to_string())
                        )),
                        op: "=".to_string(),
                        right: Box::new(SqlExpression::StringLiteral(pivot_value.to_string())),
                    }),
                    result: Box::new(SqlExpression::Column(
                        ColumnRef::unquoted(aggregate_column.to_string())
                    )),
                }
            ],
            else_branch: Some(Box::new(SqlExpression::Null)),
        }
    }

    fn extract_grouping_columns(
        _source: &TableSource,
        _pivot_column: &str,
        _aggregate_column: &str,
    ) -> Vec<String> {
        // Requires schema information: every source column except the
        // pivot column and the aggregated column should be grouped on.
        // TODO: infer from the subquery's SELECT list instead of hardcoding.
        vec!["Date".to_string()]
    }
}
```

**Challenges**:
1. **Schema inference**: Need to know which columns to GROUP BY
   - Solution: Analyze subquery SELECT list
   - Exclude pivot_column and aggregate_column
   - Use remaining columns for GROUP BY

2. **Column selection in outer query**:
   - User may select specific columns: `SELECT Date, Sammich FROM ... PIVOT ...`
   - Need to filter generated columns to match SELECT list
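
Challenge 1 reduces to a set difference over the subquery's SELECT list. A minimal sketch, assuming the source column names have already been extracted as strings:

```rust
/// Grouping columns are whatever the source exposes, minus the pivot
/// column and the aggregated value column.
fn grouping_columns(
    source_columns: &[&str],
    pivot_column: &str,
    aggregate_column: &str,
) -> Vec<String> {
    source_columns
        .iter()
        .filter(|c| {
            // Case-insensitive comparison, matching typical SQL identifier rules.
            !c.eq_ignore_ascii_case(pivot_column) && !c.eq_ignore_ascii_case(aggregate_column)
        })
        .map(|c| c.to_string())
        .collect()
}

fn main() {
    let cols = grouping_columns(&["Date", "FoodName", "AmountEaten"], "FoodName", "AmountEaten");
    assert_eq!(cols, vec!["Date".to_string()]);
}
```

This naturally handles multiple grouping columns (test case 7): every leftover column participates in the GROUP BY.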

**Testing**:
- Transform FoodEaten example correctly
- Verify CASE expressions are generated properly
- Check GROUP BY columns are correct
- Test with different aggregates (SUM, MIN, AVG)

---

### Phase 3: Integration & Testing (1 day)

**Changes needed**:
1. Register transformer in `src/execution/statement_executor.rs`:
```rust
transformers.push(Box::new(PivotExpander));
```

2. Add to transformer pipeline (after subquery processing, before execution)

**Test Cases**:

1. **Basic PIVOT** (FoodEaten example)
2. **PIVOT with SUM** instead of MAX
3. **PIVOT with COUNT**
4. **PIVOT with NULL handling** (COALESCE in outer SELECT)
5. **PIVOT with WHERE clause** on source
6. **PIVOT with ORDER BY** on result
7. **Multiple grouping columns**

**Create test files**:
- `data/food_eaten.csv` - Test dataset
- `examples/pivot_demo.sql` - Comprehensive examples
- `tests/python_tests/test_pivot.py` - Formal tests

---

### Phase 4: UNPIVOT Support (Optional - 1 day)

**UNPIVOT** does the reverse: columns → rows

**Example**:
```sql
SELECT Date, FoodName, AmountEaten
FROM PivotedData
UNPIVOT (
    AmountEaten FOR FoodName IN (Sammich, Pickle, Apple, Cake)
) AS UnpivotTable
```

**Transformation**:
```sql
SELECT Date, 'Sammich' AS FoodName, Sammich AS AmountEaten FROM PivotedData
UNION ALL
SELECT Date, 'Pickle' AS FoodName, Pickle AS AmountEaten FROM PivotedData
UNION ALL
SELECT Date, 'Apple' AS FoodName, Apple AS AmountEaten FROM PivotedData
UNION ALL
SELECT Date, 'Cake' AS FoodName, Cake AS AmountEaten FROM PivotedData
```
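
The UNION ALL rewrite can be sketched the same way as the PIVOT expansion, as a string-level illustration (not the AST-based transformer, and all names here are illustrative):

```rust
/// Rewrite an UNPIVOT specification into a UNION ALL chain:
/// one SELECT per unpivoted column.
fn expand_unpivot_sql(
    table: &str,
    keep_columns: &[&str],    // columns carried through unchanged, e.g. ["Date"]
    name_column: &str,        // e.g. "FoodName"
    value_column: &str,       // e.g. "AmountEaten"
    unpivot_columns: &[&str], // e.g. ["Sammich", "Pickle", ...]
) -> String {
    unpivot_columns
        .iter()
        .map(|c| {
            format!(
                "SELECT {keep}, '{c}' AS {name_column}, {c} AS {value_column} FROM {table}",
                keep = keep_columns.join(", "),
            )
        })
        .collect::<Vec<_>>()
        .join("\nUNION ALL\n")
}

fn main() {
    let sql = expand_unpivot_sql(
        "PivotedData",
        &["Date"],
        "FoodName",
        "AmountEaten",
        &["Sammich", "Pickle", "Apple", "Cake"],
    );
    println!("{sql}");
}
```

Running this reproduces the four-branch UNION ALL shown above.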

**Effort**: ~1 day if PIVOT is working

---

## SQL Syntax Variants to Support

### Microsoft SQL Server Style (Primary)
```sql
SELECT * FROM source
PIVOT (
    MAX(value_column)
    FOR pivot_column IN (val1, val2, val3)
) AS alias
```

### Optional Extensions
```sql
-- Multiple aggregates (harder)
PIVOT (
    MAX(amount), SUM(quantity)
    FOR month IN ('Jan', 'Feb', 'Mar')
)
```

---

## NULL Handling

PIVOT naturally produces NULLs when no data exists for a combination.

**User can handle with COALESCE**:
```sql
SELECT Date,
    COALESCE(Sammich, 0) AS Sammich,
    COALESCE(Pickle, 0) AS Pickle
FROM ... PIVOT ...
```

**Note**: We already have the COALESCE function, so this works out of the box.

---

## Edge Cases to Handle

1. **No data for a pivot value** → NULL (expected)
2. **Pivot column contains NULL** → Skip (or include as "NULL" value?)
3. **Empty result set** → Empty output
4. **Duplicate pivot values** → Should be caught by aggregate (MAX/MIN/SUM handle this)
5. **Pivot values not in data** → All NULL column
6. **Very large number of pivot values** → Performance concern (100s of CASE expressions)
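
A related detail if any code path renders pivot values into SQL text (the AST-based transformer sidesteps this until rendering time): values containing single quotes must be escaped, or the generated CASE predicate is malformed. A hypothetical helper, not existing sql-cli code:

```rust
/// Escape a pivot value for embedding as a SQL string literal
/// ('' is the standard SQL escape for a single quote).
fn sql_string_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

fn main() {
    println!("{}", sql_string_literal("O'Brien"));
}
```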

---

## Performance Considerations

PIVOT becomes:
- N CASE expressions (one per pivot value)
- GROUP BY on remaining columns
- One table scan

**Performance**: Should be comparable to manual CASE + GROUP BY (which users would write anyway)

**Optimization opportunity**: If pivot values are known to be mutually exclusive, we could use simpler logic, but CASE/GROUP BY is safe and standard.

---

## Documentation Updates

1. **README.md** - Add PIVOT to feature list
2. **CHANGELOG.md** - Document new feature
3. **examples/pivot_demo.sql** - Comprehensive examples
4. **docs/SQL_FEATURE_GAPS_AND_ROADMAP.md** - Mark as completed

---

## Success Criteria

- ✅ FoodEaten example works correctly
- ✅ Output matches expected pivoted table
- ✅ COALESCE for NULL handling works
- ✅ Multiple aggregates supported (MAX, SUM, MIN, AVG, COUNT)
- ✅ Works in all execution modes (-q, -f, --execute-statement)
- ✅ --show-transformations shows PIVOT → CASE rewrite
- ✅ Formal Python test coverage
- ✅ Example file demonstrates use cases

---

## Timeline

- **Day 1**: Parser support + basic AST
- **Day 2**: PivotExpander transformer (core logic)
- **Day 3**: Schema inference + GROUP BY column detection
- **Day 4**: Testing + edge cases
- **Day 5**: UNPIVOT (optional), documentation, examples

**Total**: 4-5 days

---

## Next Steps

1. Create FoodEaten test dataset
2. Implement parser support for PIVOT keyword
3. Build PivotExpander transformer
4. Test with FoodEaten example
5. Add comprehensive test coverage
6. Consider UNPIVOT if PIVOT goes smoothly

This feature would be a **significant addition** to SQL-CLI's data reshaping capabilities!