sql-cli 1.73.1

SQL query tool for CSV/JSON with both interactive TUI and non-interactive CLI modes - perfect for exploration and automation
Documentation
# HAVING Clause - Current Support & Limitations

## Executive Summary

**Current Status:** HAVING clause is **partially implemented** and works well for common use cases.

**What Works:** ✅ HAVING with column aliases
**What Doesn't:** ❌ HAVING with aggregate functions like `COUNT(*)`, `SUM(col)`

**Recommendation:** Two-track approach:
1. **Short-term (1-2 hours):** Document current limitations and workarounds
2. **Long-term (CTE Hoister):** Full aggregate expression support via AST rewriting

---

## Current Implementation

### Architecture
**Location:** `src/data/group_by_expressions.rs` lines 288-320

**How it works:**
1. After aggregation completes for each group
2. Creates a **temporary single-row table** containing:
   - GROUP BY column values
   - Computed aggregate results (with their aliases)
3. Evaluates HAVING expression against this temp table
4. Filters out groups that don't match

**Key limitation:** Only GROUP BY columns and aggregate **aliases** are available to HAVING - original columns are not accessible.

---

## What Works Today ✅

### 1. HAVING with Aggregate Aliases
```sql
SELECT trader, SUM(notional) as total
FROM trades
GROUP BY trader
HAVING total > 50000;  -- ✅ Works!
```

### 2. HAVING with COUNT Alias
```sql
SELECT trader, COUNT(*) as trade_count
FROM trades
GROUP BY trader
HAVING trade_count > 1;  -- ✅ Works!
```

### 3. HAVING with Multiple Conditions
```sql
SELECT trader, SUM(qty) as total_qty, AVG(price) as avg_price
FROM trades
GROUP BY trader
HAVING total_qty > 100 AND avg_price > 200;  -- ✅ Works!
```

### 4. HAVING with Expressions on Aliases
```sql
SELECT trader, SUM(qty) as total_qty, COUNT(*) as count
FROM trades
GROUP BY trader
HAVING total_qty / count > 120;  -- ✅ Works!
```

### 5. HAVING with Comparison Operators
```sql
SELECT trader, MAX(price) as max_price, MIN(price) as min_price
FROM trades
GROUP BY trader
HAVING max_price - min_price > 100;  -- ✅ Works!
```

---

## What Doesn't Work ❌

### 1. HAVING with Aggregate Functions
```sql
SELECT trader, COUNT(*) as trade_count
FROM trades
GROUP BY trader
HAVING COUNT(*) > 1;  -- ❌ Fails: can't re-evaluate COUNT(*)
```

**Error:** Varies depending on aggregate:
- `COUNT(*)`: Returns 0 rows (incorrect result)
- `SUM(column)`: Error: "Column 'column' not found"

**Why:** The temp table only contains the **result** of aggregates, not the underlying data to re-compute them.

### 2. HAVING with Original Column References
```sql
SELECT trader, SUM(notional) as total
FROM trades
GROUP BY trader
HAVING notional > 10000;  -- ❌ Error: Column 'notional' not found
```

**Why:** Original columns are not included in the HAVING evaluation temp table.

### 3. HAVING with Complex Aggregate Expressions
```sql
SELECT trader, SUM(qty * price) as value
FROM trades
GROUP BY trader
HAVING SUM(qty * price) > 50000;  -- ❌ Error: Column 'qty' not found
```

**Why:** Can't access `qty` or `price` to re-compute the expression.

---

## Standard SQL Comparison

### What SQL Standard Allows
```sql
-- All of these are valid in PostgreSQL, MySQL, SQL Server:
HAVING COUNT(*) > 1                    -- ❌ We don't support
HAVING SUM(amount) > 1000              -- ❌ We don't support
HAVING AVG(price) > MAX(price) * 0.8   -- ❌ We don't support
HAVING total_amount > 1000             -- ✅ We support (alias)
```

### Implications
- **Most SQL tutorials** show `HAVING COUNT(*) > 1` as the canonical example
- Our users must learn to use aliases instead: `HAVING trade_count > 1`
- This is a significant deviation from SQL standards

---

## Workarounds (Current Best Practice)

### Pattern: Always Use Aliases
```sql
-- ❌ Don't do this (doesn't work):
SELECT trader, COUNT(*)
FROM trades
GROUP BY trader
HAVING COUNT(*) > 2;

-- ✅ Do this instead (works):
SELECT trader, COUNT(*) as trade_count
FROM trades
GROUP BY trader
HAVING trade_count > 2;
```

### Pattern: Pre-compute Complex Expressions in SELECT
```sql
-- ❌ Don't do this:
SELECT trader, SUM(qty * price) as value
FROM trades
GROUP BY trader
HAVING SUM(qty * price) > 50000;

-- ✅ Do this instead:
SELECT trader, SUM(qty * price) as value
FROM trades
GROUP BY trader
HAVING value > 50000;
```

---

## Solutions Analysis

### Option 1: Quick Documentation Fix (1 hour)
**Approach:** Document the limitation and workarounds

**Pros:**
- No code changes
- Zero risk
- Users can work around it today

**Cons:**
- Deviation from SQL standard
- Confusing for new users
- Doesn't solve the problem

**Effort:** 1 hour (update docs, add examples)

---

### Option 2: Re-evaluate Aggregates in HAVING (2-4 hours)
**Approach:** When HAVING contains an aggregate function, re-run it against the group

**Implementation:**
```rust
// In group_by_expressions.rs, line 308:
if let Some(having_expr) = having {
    // Detect if HAVING contains aggregate functions
    if contains_aggregate_function(having_expr) {
        // Re-evaluate aggregates against the original group data
        let having_result = evaluate_with_group_data(
            having_expr,
            &group_rows,  // Original rows in this group
            group_by_exprs
        )?;
    } else {
        // Current path: use aliases from temp table
        let having_result = evaluator.evaluate(having_expr, 0)?;
    }
}
```

**Pros:**
- Solves the immediate problem
- Relatively straightforward
- No parser changes needed

**Cons:**
- Duplicates aggregation logic
- Performance impact (re-computing aggregates)
- Still doesn't handle `SUM(qty * price)` case (original columns not available)
- Medium complexity

**Effort:** 2-4 hours

**Limitations:** Still can't handle expressions like `SUM(qty * price)` because original columns aren't available in the aggregate phase.

---

### Option 3: CTE Hoister Approach (Long-term, 8+ hours)
**Approach:** AST rewriting to convert HAVING with aggregates into a CTE + WHERE

**Transformation:**
```sql
-- Input SQL:
SELECT trader, COUNT(*) as trade_count
FROM trades
GROUP BY trader
HAVING COUNT(*) > 1;

-- Rewritten to:
WITH grouped AS (
  SELECT trader, COUNT(*) as trade_count
  FROM trades
  GROUP BY trader
)
SELECT trader, trade_count
FROM grouped
WHERE trade_count > 1;
```

**Pros:**
- Clean, elegant solution
- Handles ALL cases (including complex expressions)
- Aligns with SQL standard behavior
- Reuses existing WHERE clause logic
- No performance penalty (same work, different order)

**Cons:**
- Requires CTE hoister infrastructure
- More complex implementation
- Longer timeline

**Effort:** 8+ hours (requires mature CTE hoister)

**This is the "correct" long-term solution**

---

## Recommendation

### Immediate Action (Today - 1 hour)
1. ✅ Document current behavior clearly
2. ✅ Add examples showing alias-based HAVING
3. ✅ Add note: "For aggregate functions in HAVING, use aliases"
4. ✅ Create example file: `examples/having_clause_patterns.sql`

### Short-term (Next Session - Optional, 2-4 hours)
- Implement Option 2 (re-evaluate aggregates) if critical for your work
- Only if alias workaround is causing significant pain

### Long-term (Future - 8+ hours)
- Resume CTE Hoister project
- Implement AST rewriting for HAVING → CTE + WHERE
- This unblocks multiple features, not just HAVING

---

## Impact Assessment

### Current State
**User Experience:** Confusing for new users, but workable for power users who know the pattern

**Your Workflow:** Likely **not a blocker** since:
- You're already a power user who understands SQL deeply
- Alias-based HAVING works for most real-world cases
- You can work around it easily

### Priority Ranking
1. **High:** CROSS JOIN (✅ Done!)
2. **High:** Better error messages (✅ Done!)
3. **Medium:** HAVING improvements (can wait)
4. **High (long-term):** CTE Hoister (enables HAVING + other features)

---

## Examples of What Users Can Do Today

### Risk Analysis (Works)
```sql
SELECT
  trader,
  COUNT(*) as trade_count,
  SUM(notional) as total_exposure,
  AVG(price) as avg_price
FROM trades
GROUP BY trader
HAVING total_exposure > 1000000
   AND trade_count > 10
   AND avg_price > 100;
```

### Counterparty Limits (Works)
```sql
SELECT
  counterparty,
  SUM(exposure) as total_exposure,
  MAX(exposure) as max_single_trade
FROM exposures
GROUP BY counterparty
HAVING total_exposure > risk_limit
   AND max_single_trade > single_trade_limit;
```

### Volume Analysis (Works)
```sql
SELECT
  symbol,
  SUM(volume) as total_volume,
  COUNT(DISTINCT trader) as unique_traders
FROM trades
GROUP BY symbol
HAVING total_volume > 100000
   AND unique_traders > 5;
```

---

## Conclusion

**HAVING clause works well for the majority of real-world use cases** when users follow the alias pattern.

**For your trade workflows:** Alias-based HAVING should handle most scenarios. The limitation is more of a "SQL standards purity" issue than a practical blocker.

**Suggested Path Forward:**
1. Document current behavior (1 hour) ✅
2. Continue with other priorities (CTE Hoister, etc.)
3. Revisit HAVING when CTE Hoister is mature (long-term)

This avoids the trap of "heavy lifting just to say we can" while still acknowledging it as a known limitation.