sql-cli 1.73.1

SQL query tool for CSV/JSON with both interactive TUI and non-interactive CLI modes - perfect for exploration and automation
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
# SQL Feature Gaps & Transformer Roadmap

## Current State (Post-Unification)

As of the execution mode unification completion, we have:

✅ **Solid Foundation:**
- Unified execution path (`StatementExecutor` used by all modes: `-q`, `-f`, `--execute-statement`)
- 7 active preprocessor transformers (all enabled by default)
- WHERE/GROUP BY alias expansion
- HAVING auto-aliasing
- ORDER BY aggregate rewriting
- Expression lifting for window functions
- CTE hoisting (ready for subqueries)
- IN operator optimization

✅ **What Works Well:**
- Basic SELECT, WHERE, GROUP BY, HAVING, ORDER BY
- Window functions (ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, etc.)
- CTEs (WITH clauses)
- Aggregates (SUM, COUNT, AVG, MIN, MAX)
- String functions (UPPER, LOWER, TRIM, SUBSTR, etc.)
- Math functions
- Date functions
- Unit conversions (CONVERT)
- JSON data sources
- CSV data sources
- Template expansion
- Temp tables

## Feature Gaps (Ordered by Priority)

### HIGH PRIORITY - Standard SQL Features

#### 1. DISTINCT in Aggregates
**Status:** ✅ **COMPLETED!**
**Difficulty:** Medium
**Example:**
```sql
SELECT region, COUNT(DISTINCT salesperson) AS unique_salespeople
FROM sales
GROUP BY region
```

**Implementation:**
- ✅ Fully functional in executor (no transformer needed)
- ✅ Supports COUNT(DISTINCT x), SUM(DISTINCT x), AVG(DISTINCT x)
- ✅ Works with all aggregate functions
- ✅ Efficient deduplication within groups

**Completed:** Already working (v1.42.0)

---

#### 2. Aggregate Expressions in ORDER BY
**Status:** ✅ **COMPLETED!**
**Difficulty:** Easy
**Example:**
```sql
SELECT region, SUM(sales_amount) AS total
FROM sales
GROUP BY region
ORDER BY SUM(sales_amount) DESC  -- Works perfectly!
```

**Implementation:**
- **Transformer:** `OrderByAliasTransformer` (implemented)
- ✅ Extended parser to support expressions in ORDER BY
- ✅ Updated AST with `OrderByItem` struct
- ✅ Handles COUNT(*), all aggregates, auto-generates aliases
- ✅ Works in all execution modes (fixed dependency-aware path)
- ✅ See `examples/order_by_expressions.sql` with 8 examples
- ✅ Formal test coverage

**Completed:** 2025-11-01

---

#### 3. Basic Correlated Subqueries (EXISTS/NOT EXISTS)
**Status:** ❌ Not implemented
**Difficulty:** Hard
**Example:**
```sql
SELECT *
FROM sales s1
WHERE EXISTS (
    SELECT 1
    FROM sales s2
    WHERE s2.region = s1.region
    AND s2.sales_amount > s1.sales_amount
)
```

**Implementation Approach:**
- **Transformer:** `CorrelatedSubqueryRewriter`
- Detect correlated subqueries
- Rewrite as self-join with appropriate conditions
- Handle EXISTS → semi-join, NOT EXISTS → anti-join

**Rewrite:**
```sql
-- Input
SELECT * FROM sales s1
WHERE EXISTS (SELECT 1 FROM sales s2 WHERE s2.region = s1.region AND s2.sales_amount > s1.sales_amount)

-- Becomes
SELECT DISTINCT s1.*
FROM sales s1
INNER JOIN sales s2 ON s2.region = s1.region AND s2.sales_amount > s1.sales_amount
```

**Estimated Effort:** 1-2 weeks (requires JOIN support first)

---

### MEDIUM PRIORITY - Quality of Life Features

#### 4. QUALIFY Clause (Snowflake-style)
**Status:** ✅ **COMPLETED!**
**Difficulty:** Easy
**Example:**
```sql
SELECT region, sales_amount,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS rn
FROM sales
QUALIFY rn <= 3  -- Cleaner than WHERE with window function
```

**Implementation:**
- ✅ Full parser support for QUALIFY clause
- ✅ Executor handles window function filtering
- ✅ Works with all window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
- ✅ See examples in documentation

**Completed:** 2025-11-01 (v1.64.0)

---

#### 5. PIVOT/UNPIVOT
**Status:** ❌ Not implemented
**Difficulty:** Medium
**Example:**
```sql
SELECT *
FROM sales
PIVOT (SUM(sales_amount) FOR month IN ('2024-01', '2024-02', '2024-03'))
```

**Implementation Approach:**
- **Transformer:** `PivotExpander`
- Generate CASE expressions for each pivot value
- Create aggregates with CASE WHEN conditions

**Rewrite:**
```sql
-- Input
PIVOT (SUM(sales_amount) FOR month IN ('2024-01', '2024-02', '2024-03'))

-- Becomes
SELECT
    region,
    SUM(CASE WHEN month = '2024-01' THEN sales_amount ELSE 0 END) AS "2024-01",
    SUM(CASE WHEN month = '2024-02' THEN sales_amount ELSE 0 END) AS "2024-02",
    SUM(CASE WHEN month = '2024-03' THEN sales_amount ELSE 0 END) AS "2024-03"
FROM sales
GROUP BY region
```

**Estimated Effort:** 3-5 days

---

#### 6. Array/List Aggregate Functions
**Status:** ❌ Not implemented
**Difficulty:** Medium
**Example:**
```sql
SELECT region, ARRAY_AGG(salesperson) AS salespeople
FROM sales
GROUP BY region
```

**Implementation Approach:**
- Implement ARRAY_AGG, STRING_AGG, LIST, etc.
- Return as JSON array or delimited string
- May require new DataValue type

**Estimated Effort:** 3-4 days

---

### LOW PRIORITY - Advanced Features

#### 7. Recursive CTEs
**Status:** ❌ Not implemented
**Difficulty:** Very Hard
**Example:**
```sql
WITH RECURSIVE hierarchy AS (
    SELECT id, parent_id, 1 AS level FROM employees WHERE parent_id IS NULL
    UNION ALL
    SELECT e.id, e.parent_id, h.level + 1
    FROM employees e
    JOIN hierarchy h ON e.parent_id = h.id
)
SELECT * FROM hierarchy
```

**Implementation Approach:**
- **Executor-level change** (not a transformer)
- Requires iterative evaluation
- Track recursion depth
- Handle termination conditions

**Estimated Effort:** 2-3 weeks

---

#### 8. LATERAL Joins
**Status:** ❌ Not implemented
**Difficulty:** Hard
**Example:**
```sql
SELECT s1.region, top_sales.sales_amount
FROM sales s1,
LATERAL (
    SELECT sales_amount
    FROM sales s2
    WHERE s2.region = s1.region
    ORDER BY sales_amount DESC
    LIMIT 3
) AS top_sales
```

**Implementation Approach:**
- **Transformer:** Rewrite as correlated subquery or window function
- May require multiple passes

**Estimated Effort:** 1-2 weeks

---

#### 9. Multi-Value INSERT (for temp tables)
**Status:** ❌ Not implemented
**Difficulty:** Medium
**Example:**
```sql
INSERT INTO #temp_regions (region, target)
VALUES ('North', 100000), ('South', 120000), ('East', 90000), ('West', 130000)
```

**Implementation Approach:**
- Extend script parser to handle INSERT
- Support VALUES with multiple rows
- Wire into temp table system

**Estimated Effort:** 3-4 days

---

### NICE TO HAVE - Syntactic Sugar

#### 10. SELECT * EXCLUDE
**Status:** ✅ **COMPLETED!**
**Difficulty:** Easy
**Example:**
```sql
SELECT * EXCLUDE (id, internal_id) FROM sales
SELECT * EXCLUDE (password, ssn) FROM users WHERE active = true
```

**Implementation:**
- ✅ Parser support for `* EXCLUDE (column_list)` syntax
- ✅ AST extension with `SelectItem::StarExclude` variant
- ✅ Query engine expands at execution time (no transformer needed)
- ✅ Case-insensitive column matching
- ✅ Works with WHERE, ORDER BY, LIMIT, GROUP BY
- ✅ DuckDB compatibility

**Completed:** 2025-11-02 (v1.66.0)

---

#### 11. SELECT * REPLACE
**Status:** ❌ Not implemented
**Difficulty:** Easy
**Example:**
```sql
SELECT * REPLACE (UPPER(region) AS region) FROM sales
```

**Implementation Approach:**
- Parser support for `* REPLACE (expr AS column)` syntax
- Similar to EXCLUDE but replaces column values
- Expand at execution time in query engine

**Estimated Effort:** 1-2 days

---

#### 12. ILIKE (Case-Insensitive LIKE)
**Status:** ✅ **COMPLETED!**
**Difficulty:** Trivial
**Example:**
```sql
SELECT * FROM sales WHERE region ILIKE '%north%'
```

**Implementation:**
- **Transformer:** `ILikeToLikeTransformer` rewrites to `UPPER(column) LIKE UPPER(pattern)`
- ✅ Added LIKE operator support to arithmetic evaluator
- ✅ Full pattern matching with `%` (any chars) and `_` (single char) wildcards
- ✅ Works in all execution modes (-q, -f, --execute-statement)
- ✅ PostgreSQL compatibility

**Completed:** 2025-11-01 (v1.65.0)

---

## Implementation Priority Matrix

| Feature | User Value | Difficulty | SQL Standard | Priority | Status |
|---------|-----------|-----------|--------------|----------|--------|
| ~~ORDER BY aggregates~~ | High | Easy | Yes | ~~**1**~~ | ✅ Done (v1.64.0) |
| ~~DISTINCT in aggregates~~ | High | Medium | Yes | ~~**1**~~ | ✅ Done (v1.42.0) |
| ~~QUALIFY clause~~ | Medium | Easy | No (Snowflake) | ~~**2**~~ | ✅ Done (v1.64.0) |
| ~~ILIKE~~ | Low | Trivial | No (Postgres) | ~~**1**~~ | ✅ Done (v1.65.0) |
| ~~SELECT * EXCLUDE~~ | Low | Easy | No (DuckDB) | ~~**1**~~ | ✅ Done (v1.66.0) |
| SELECT * REPLACE | Low | Easy | No (DuckDB) | **1** | 📋 Next |
| PIVOT/UNPIVOT | Medium | Medium | Yes (SQL:2016) | **3** | 📋 Ready |
| ARRAY_AGG/STRING_AGG | Medium | Medium | Yes | **4** | 📋 Ready |
| Correlated subqueries | High | Hard | Yes | **5** | ⚠️ Complex |
| LATERAL joins | Low | Hard | Yes | **6** | ⚠️ Complex |
| Recursive CTEs | Low | Very Hard | Yes | **7** | ⚠️ Complex |

## Next Steps

### Immediate (1-2 weeks)
1. ✅ Complete execution mode unification (Phases 0-3) - **DONE!**
2. ✅ Document all transformers - **DONE!**
3. ✅ Create `examples/expander_rewriters.sql` - **DONE!**
4. ✅ Implement ORDER BY aggregate expansion - **DONE!** (2025-11-01)
5. ✅ Add QUALIFY clause support - **DONE!** (2025-11-01)
6. ✅ DISTINCT in aggregates - **Already working!** (v1.42.0)
7. **NEXT:** Quick wins with transformers (ILIKE, SELECT * EXCLUDE)

### Short-term (1-2 months)
1. ✅ Add ILIKE operator (case-insensitive LIKE) - **DONE!** (v1.65.0)
2. ✅ Implement SELECT * EXCLUDE - **DONE!** (v1.66.0)
3. Implement SELECT * REPLACE - **1-2 days** ⬅️ **NEXT**
4. Implement PIVOT/UNPIVOT - **3-5 days**
5. Add ARRAY_AGG and STRING_AGG - **3-4 days**
6. Improve window function support (PARTITION BY expressions)

### Medium-term (3-6 months)
1. Basic JOIN support (INNER, LEFT, RIGHT)
2. Correlated subqueries (EXISTS, NOT EXISTS, IN with subquery)
3. Subquery support in FROM clause
4. More window functions (NTILE, CUME_DIST, PERCENT_RANK)

### Long-term (6+ months)
1. LATERAL joins
2. Recursive CTEs
3. Full JOIN optimization
4. Query planner/optimizer

## Guidelines for Adding Transformers

When deciding whether to add a new transformer:

### ✅ Good Candidates for Transformers:
- **Syntactic sugar** (can be rewritten to simpler SQL)
- **Standard SQL features** that are complex to execute directly
- **High user value** (frequently requested)
- **Clear rewrite strategy** (deterministic transformation)
- **No executor changes needed** (pure AST manipulation)

### ❌ Bad Candidates for Transformers:
- **Fundamental execution changes** (e.g., JOIN algorithms)
- **Performance optimizations** (belongs in query planner)
- **Data type changes** (belongs in type system)
- **I/O operations** (reading files, network calls)

### Decision Framework:
1. **Can it be rewritten to simpler SQL?** → Transformer
2. **Requires new execution logic?** → Executor enhancement
3. **Needs performance optimization?** → Query planner (future)
4. **Syntactic convenience?** → Transformer
5. **Data manipulation?** → Executor or function

## Conclusion

The transformer-based approach gives us a **powerful lever** to add SQL features:

- **Proven:** 7 transformers working in production (v1.64.0)
-**Unified:** Same transformers in all execution modes (-q, -f, --execute-statement)
-**Extensible:** Easy to add new transformers
-**Maintainable:** Each transformer is independent
-**Testable:** Can test transformers in isolation
-**Debuggable:** --show-transformations flag shows entire pipeline

**Philosophy:** Use transformers to fill SQL feature gaps wherever possible, resort to executor changes only when necessary.

**Recent Wins:**
- ✅ v1.64.0: ORDER BY aggregates, QUALIFY clause, Unified execution
- ✅ v1.65.0: ILIKE operator - PostgreSQL compatibility
- ✅ v1.66.0: SELECT * EXCLUDE - DuckDB compatibility

**Transformer Count:** 8 active transformers in production pipeline

**Next Quick Win:**
1. SELECT * REPLACE (1-2 days) - Column value replacement ⬅️ **NEXT**
2. PIVOT/UNPIVOT (5 days) - CASE expression generation

This approach allows us to **incrementally improve** SQL support without major architectural changes, while keeping the executor simple and focused.