sql-cli 1.73.1

SQL query tool for CSV/JSON with both interactive TUI and non-interactive CLI modes - perfect for exploration and automation
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
# SQL Preprocessor Transformers

## Overview

SQL CLI uses AST (Abstract Syntax Tree) transformers to **rewrite queries before execution**. This allows us to support advanced SQL features that would be extremely complex or impossible to implement directly in the query executor.

**Key Principle:** Transformers are **enabled by default** (opt-out, not opt-in). This ensures queries "just work" without requiring users to enable flags.

## Philosophy: Opt-Out, Not Opt-In

All transformers are enabled by default because:

1. **Users expect standard SQL to work** - Features like `WHERE` with SELECT aliases should "just work"
2. **Complexity is hidden** - Users don't need to know about internal rewrites
3. **Compatibility first** - Only disable if a transformer breaks an existing query
4. **Future-proof** - New transformers can be added without breaking changes

If a transformer causes issues with specific queries, users can disable it with `--no-*` flags.

## Available Transformers

### 1. Expression Lifter (`ExpressionLifterTransformer`)

**Purpose:** Enables window functions and complex expressions in WHERE, GROUP BY, and HAVING clauses

**Flag:** `--no-expression-lifter` (disable)

**What it does:**
- Detects window functions or complex expressions in clauses where they can't be evaluated directly
- Creates a CTE that computes the expression
- Replaces the expression with a column reference to the CTE

**Example:**
```sql
-- Input
SELECT region, sales_amount,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS rank
FROM sales
WHERE ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales_amount DESC) <= 3

-- Rewritten to (conceptually)
WITH _lifted_0 AS (
    SELECT region, sales_amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS _expr_0
    FROM sales
)
SELECT region, sales_amount, _expr_0 AS rank
FROM _lifted_0
WHERE _expr_0 <= 3
```

**Why it matters:**
- Enables "top N per group" queries
- Allows filtering on window function results
- Supports complex CASE expressions in GROUP BY

**Status:** ✅ Implemented, enabled by default

---

### 2. WHERE Alias Expander (`WhereAliasExpander`)

**Purpose:** Allows using SELECT column aliases in WHERE clauses

**Flag:** `--no-where-expansion` (disable)

**What it does:**
- Finds alias references in WHERE clause
- Replaces them with the original expression from SELECT

**Example:**
```sql
-- Input
SELECT sales_amount * 1.1 AS adjusted, region
FROM sales
WHERE adjusted > 15000

-- Rewritten to
SELECT sales_amount * 1.1 AS adjusted, region
FROM sales
WHERE (sales_amount * 1.1) > 15000
```

**Why it matters:**
- Standard SQL feature that most engines don't support natively
- Makes queries more readable (define expression once, use multiple times)
- Reduces duplication

**Status:** ✅ Implemented, enabled by default

---

### 3. GROUP BY Alias Expander (`GroupByAliasExpander`)

**Purpose:** Allows using SELECT column aliases in GROUP BY clauses

**Flag:** `--no-group-by-expansion` (disable)

**What it does:**
- Finds alias references in GROUP BY clause
- Replaces them with the original expression from SELECT

**Example:**
```sql
-- Input
SELECT UPPER(region) AS region_code, SUM(sales_amount) AS total
FROM sales
GROUP BY region_code

-- Rewritten to
SELECT UPPER(region) AS region_code, SUM(sales_amount) AS total
FROM sales
GROUP BY UPPER(region)
```

**Why it matters:**
- SQL standard allows this (SELECT clause is conceptually evaluated first)
- Makes queries more maintainable
- Reduces expression duplication

**Status:** ✅ Implemented, enabled by default

---

### 4. HAVING Alias Transformer (`HavingAliasTransformer`)

**Purpose:** Automatically creates aliases for aggregates in SELECT, rewrites HAVING to use them

**Flag:** `--no-having-expansion` (disable)

**What it does:**
- Detects aggregate functions in HAVING clause
- Creates aliases for them in SELECT (if not already present)
- Rewrites HAVING to reference the aliases

**Example:**
```sql
-- Input
SELECT region, SUM(sales_amount)
FROM sales
GROUP BY region
HAVING SUM(sales_amount) > 50000

-- Rewritten to
SELECT region, SUM(sales_amount) AS _agg_0
FROM sales
GROUP BY region
HAVING _agg_0 > 50000
```

**Why it matters:**
- Avoids recomputing aggregates (compute once in SELECT, filter in HAVING)
- Simplifies executor logic
- Improves performance

**Status:** ✅ Implemented, enabled by default

---

### 5. CTE Hoister (`CTEHoisterTransformer`)

**Purpose:** Moves nested CTEs to the top level for cleaner AST structure

**Flag:** `--no-cte-hoister` (disable)

**What it does:**
- Finds CTEs in subqueries or nested contexts
- Hoists them to the top level
- Renames if necessary to avoid conflicts

**Example:**
```sql
-- Input (when we support subqueries)
SELECT * FROM (
    WITH regional AS (SELECT region, SUM(sales_amount) AS total FROM sales GROUP BY region)
    SELECT * FROM regional WHERE total > 1000
) AS filtered

-- Rewritten to
WITH regional AS (SELECT region, SUM(sales_amount) AS total FROM sales GROUP BY region)
SELECT * FROM (SELECT * FROM regional WHERE total > 1000) AS filtered
```

**Why it matters:**
- Simplifies AST structure
- Enables better optimization
- Prepares for future subquery support

**Status:** ✅ Implemented, enabled by default (ready for when we add subqueries)

---

### 6. IN Operator Lifter (`InOperatorLifterTransformer`)

**Purpose:** Optimizes large IN lists by converting to temporary tables or subqueries

**Flag:** `--no-in-lifter` (disable)

**What it does:**
- Detects IN clauses with many values (e.g., 100+ items)
- Converts to a temporary table or CTE
- Rewrites IN to use a subquery

**Example:**
```sql
-- Input (with 100+ values)
SELECT * FROM sales WHERE region IN ('East', 'West', ..., 'value_100')

-- Rewritten to (conceptually)
WITH _in_list_0 AS (
    SELECT 'East' AS value UNION ALL SELECT 'West' UNION ALL ...
)
SELECT * FROM sales WHERE region IN (SELECT value FROM _in_list_0)
```

**Why it matters:**
- Performance optimization for large IN lists
- Reduces memory pressure
- Enables index usage (in future)

**Status:** ✅ Implemented, enabled by default

---

## Transformer Pipeline

Transformers are applied in a **specific order** because some depend on others:

```
1. ExpressionLifter      (lifts complex expressions to CTEs)
   2. WhereAliasExpander    (expands aliases in WHERE)
   3. GroupByAliasExpander  (expands aliases in GROUP BY)
   4. HavingAliasTransformer (adds aliases for aggregates, rewrites HAVING)
   5. CTEHoister            (hoists nested CTEs to top level)
   6. InOperatorLifter      (optimizes large IN lists)
```

**Order matters!** For example:
- `ExpressionLifter` must run **before** `CTEHoister` (it creates CTEs that might need hoisting)
- `HavingAliasTransformer` must run **after** GROUP BY expansion (needs to see final GROUP BY structure)

## Configuration

### CLI Flags (Both `-q` and `-f` modes)

```bash
# Disable individual transformers
./target/release/sql-cli -q "SELECT ..." --no-expression-lifter
./target/release/sql-cli -q "SELECT ..." --no-where-expansion
./target/release/sql-cli -q "SELECT ..." --no-group-by-expansion
./target/release/sql-cli -q "SELECT ..." --no-having-expansion
./target/release/sql-cli -q "SELECT ..." --no-cte-hoister
./target/release/sql-cli -q "SELECT ..." --no-in-lifter

# Combine multiple disables
./target/release/sql-cli -q "SELECT ..." --no-expression-lifter --no-where-expansion

# Show preprocessing steps (currently limited output)
./target/release/sql-cli -q "SELECT ..." --show-preprocessing
```

### Programmatic Configuration

```rust
use crate::query_plan::TransformerConfig;

// All enabled (default)
let config = TransformerConfig::all_enabled();

// Custom configuration
let config = TransformerConfig {
    enable_expression_lifter: true,
    enable_where_expansion: true,
    enable_group_by_expansion: true,
    enable_having_expansion: true,
    enable_cte_hoister: false,  // Disable this one
    enable_in_lifter: true,
};

// Create pipeline with config
let mut pipeline = create_pipeline_with_config(false, config);
let transformed = pipeline.process(statement)?;
```

## Testing Transformers

### Test File
See `examples/expander_rewriters.sql` for comprehensive examples of all transformers.

### Quick Tests

```bash
# Test WHERE alias expansion
./target/release/sql-cli -q "SELECT sales_amount * 1.1 AS adj FROM sales WHERE adj > 15000" -d data/sales_data.csv

# Test GROUP BY alias expansion
./target/release/sql-cli -q "SELECT UPPER(region) AS r, SUM(sales_amount) AS t FROM sales GROUP BY r" -d data/sales_data.csv

# Test HAVING auto-aliasing
./target/release/sql-cli -q "SELECT region, SUM(sales_amount) FROM sales GROUP BY region HAVING SUM(sales_amount) > 50000" -d data/sales_data.csv

# Test expression lifter (window function in WHERE)
./target/release/sql-cli -q "SELECT region, ROW_NUMBER() OVER (ORDER BY sales_amount) AS rn FROM sales WHERE ROW_NUMBER() OVER (ORDER BY sales_amount) <= 5" -d data/sales_data.csv

# Run all examples
./target/release/sql-cli -f examples/expander_rewriters.sql
```

## Unified Execution Modes

**IMPORTANT:** As of the execution mode unification (Phases 0-3), both `-q` (single query) and `-f` (script) modes use the **same transformer pipeline**.

This means:
- ✅ All transformers work in both modes
- ✅ Same flags control both modes
- ✅ Consistent behavior everywhere
- ✅ Neovim plugin can use same transformers

**No more mode-specific features!**

## Adding New Transformers

To add a new transformer:

1. **Create transformer file** in `src/query_plan/your_transformer.rs`
2. **Implement `AstTransformer` trait**
3. **Add flag** to `NonInteractiveConfig` in `src/non_interactive.rs`
4. **Add to `TransformerConfig`** in `src/query_plan/mod.rs`
5. **Wire into pipeline** in `create_pipeline_with_config()`
6. **Add tests** in `tests/`
7. **Update this document**
8. **Add example** to `examples/expander_rewriters.sql`

**Remember:** New transformers should be **enabled by default** unless they have compatibility concerns.

## SQL Feature Gaps & Future Transformers

### Currently Missing (Could Add with Transformers)

1. **DISTINCT in aggregates** - `COUNT(DISTINCT column)`
   - Could lift to CTE with DISTINCT, then count

2. **Correlated subqueries** - `WHERE x IN (SELECT y FROM other WHERE other.z = main.z)`
   - Could rewrite as JOIN or EXISTS

3. **LATERAL joins** - `FROM table1, LATERAL (SELECT ... FROM table2 WHERE table2.id = table1.id)`
   - Could rewrite as correlated subquery or window function

4. **Recursive CTEs** - `WITH RECURSIVE ...`
   - Would need iterator/recursion support in executor

5. **PIVOT/UNPIVOT** - Transpose rows to columns
   - Could generate CASE expressions

6. **QUALIFY clause** - Filter on window functions (Snowflake syntax)
   - Essentially `WHERE` but for window functions (we have ExpressionLifter for this!)

7. **Computed columns in CTEs** - More aggressive lifting
   - Extract complex expressions to earlier CTEs

8. **Aggregate expressions in ORDER BY** - `ORDER BY SUM(amount) DESC`
   - Similar to HAVING transformer

### Priority Ranking

**High Priority** (standard SQL, high user value):
1. DISTINCT in aggregates
2. Correlated subqueries (basic)
3. Aggregate expressions in ORDER BY

**Medium Priority** (useful, but workarounds exist):
4. PIVOT/UNPIVOT (can be done with CASE)
5. QUALIFY clause (can use ExpressionLifter)

**Low Priority** (complex, limited use cases):
6. LATERAL joins
7. Recursive CTEs
8. Advanced correlated subqueries

## Performance Considerations

### Overhead
- Each transformer pass requires AST traversal
- Typical cost: ~1-2ms for complex queries
- Acceptable for interactive use

### Benefits
- Preprocessed queries often **faster** than naive execution
- Example: HAVING transformer avoids recomputing aggregates
- Example: IN lifter can enable index usage (future)

### When to Disable
- **Never disable for compatibility** - fix the transformer instead
- **Only disable for performance** - if profiling shows specific transformer is slow
- **Temporary debugging** - isolate which transformer causes an issue

## Debugging

### See Transformed SQL

Currently `--show-preprocessing` has limited output. To see the actual transformed AST:

```bash
# Use --query-plan to see the AST
./target/release/sql-cli -q "SELECT ..." --query-plan

# Enable debug logging
RUST_LOG=debug ./target/release/sql-cli -q "SELECT ..." 2>&1 | grep -i transform
```

### Isolate Transformer Issues

```bash
# Disable all except one
./target/release/sql-cli -q "SELECT ..." \
    --no-where-expansion \
    --no-group-by-expansion \
    --no-having-expansion \
    --no-cte-hoister \
    --no-in-lifter
    # Only ExpressionLifter is running

# Test with no transformers
./target/release/sql-cli -q "SELECT ..." \
    --no-expression-lifter \
    --no-where-expansion \
    --no-group-by-expansion \
    --no-having-expansion \
    --no-cte-hoister \
    --no-in-lifter
```

## Summary

Preprocessor transformers are the **secret sauce** that makes SQL CLI powerful:

- **Enabled by default** - queries "just work"
-**Fill feature gaps** - support SQL that executor can't handle
-**Unified across modes** - `-q`, `-f`, and TUI all use same pipeline
-**Extensible** - easy to add new transformers
-**Opt-out** - disable if needed, but shouldn't be necessary

**The goal:** Support as much standard SQL as possible by rewriting it into forms our executor can handle, without users needing to think about it.