sql-cli 1.73.1 - Docs.rs

# Temporary Tables Design (#tmp Tables)

## Overview
This document outlines the design for temporary tables that persist across query batches within a script execution, enabling query result reuse and dynamic data injection into subsequent WEB CTEs.

## Use Cases

### Phase 1: Basic Persistence
```sql
-- Fetch trade data once, reuse multiple times
WITH WEB trades AS (
    URL 'https://api.example.com/trades'
    BODY { "date": "2024-01-15" }
    FORMAT JSON
)
SELECT * FROM trades INTO #tmp_trades;
GO

-- Later queries can reference #tmp_trades without refetching
SELECT COUNT(*) FROM #tmp_trades WHERE status = 'completed';
GO

SELECT symbol, AVG(price) FROM #tmp_trades GROUP BY symbol;
GO
```

### Phase 2: Dynamic Query Injection
```sql
-- Extract trade IDs from first query
WITH WEB trades AS (...)
SELECT trade_id FROM trades INTO #trade_ids;
GO

-- Use those IDs in next query's body
WITH WEB allocations AS (
    URL 'https://api.example.com/allocations'
    BODY {
        "trade_ids": #trade_ids  -- Expanded to array
    }
    FORMAT JSON
)
SELECT * FROM allocations;
GO
```

## Architecture

### 1. Current Script Execution Flow

```
┌─────────────────────────────────────────┐
│  Script File                             │
│  --------------------------------        │
│  SELECT ... FROM web_cte INTO #tmp;      │
│  GO                                      │
│  SELECT * FROM #tmp;                     │
│  GO                                      │
└─────────────────┬───────────────────────┘
                  ▼
┌─────────────────────────────────────────┐
│  ScriptParser::parse_statements()       │
│  Splits on "GO" separator                │
└─────────────────┬───────────────────────┘
                  ▼
┌─────────────────────────────────────────┐
│  non_interactive::execute_script()       │
│  Loop: for each statement                │
│    1. Parse SQL                          │
│    2. Execute query                      │
│    3. Display results                    │
│    4. **Context is lost**                │
└─────────────────────────────────────────┘
```

**Problem**: Each statement executes independently - no shared state between queries.

### 2. New Architecture with Temp Tables

```
┌─────────────────────────────────────────┐
│  TempTableRegistry (New)                 │
│  --------------------------------        │
│  - HashMap<String, Arc<DataTable>>       │
│  - Lives for script duration             │
│  - Not persisted to Redis                │
│  - Dropped after script completes        │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│  Enhanced execute_script()               │
│  --------------------------------        │
│  let mut temp_tables = TempTableRegistry │
│                                          │
│  for statement in statements:            │
│    1. Parse SQL                          │
│    2. Check for INTO #tmp                │
│    3. Execute with temp_tables context   │
│    4. If INTO: store result              │
│    5. If FROM #tmp: resolve from registry│
└─────────────────┬───────────────────────┘
                  ▼
┌─────────────────────────────────────────┐
│  QueryEngine (Enhanced)                  │
│  --------------------------------        │
│  - Accept TempTableRegistry              │
│  - Resolve #tmp in FROM clause           │
│  - Treat #tmp like CTE                   │
└─────────────────────────────────────────┘
```

## Implementation Plan

### Phase 1: Basic Persistence (Immediate Value)

#### 1.1 Add TempTableRegistry

```rust
// src/data/temp_table_registry.rs
use std::collections::HashMap;
use std::sync::Arc;
use crate::data::datatable::DataTable;

pub struct TempTableRegistry {
    tables: HashMap<String, Arc<DataTable>>,
}

impl TempTableRegistry {
    pub fn new() -> Self {
        Self {
            tables: HashMap::new(),
        }
    }

    /// Store a temporary table
    /// Returns error if table already exists
    pub fn insert(&mut self, name: String, table: Arc<DataTable>) -> Result<()> {
        if self.tables.contains_key(&name) {
            // Option 1: Error on duplicate
            return Err(anyhow!("Temporary table {} already exists", name));

            // Option 2: Drop and replace (SQL Server behavior)
            // self.tables.insert(name, table);
        } else {
            self.tables.insert(name, table);
        }
        Ok(())
    }

    /// Get a temporary table by name
    pub fn get(&self, name: &str) -> Option<Arc<DataTable>> {
        self.tables.get(name).cloned()
    }

    /// Check if a temporary table exists
    pub fn contains(&self, name: &str) -> bool {
        self.tables.contains_key(name)
    }

    /// Drop a temporary table
    pub fn drop(&mut self, name: &str) -> bool {
        self.tables.remove(name).is_some()
    }

    /// Get count of temp tables
    pub fn count(&self) -> usize {
        self.tables.len()
    }

    /// Clear all temp tables
    pub fn clear(&mut self) {
        self.tables.clear();
    }
}
```

#### 1.2 Extend AST for SELECT INTO

```rust
// In src/sql/parser/ast.rs

#[derive(Debug, Clone, PartialEq)]
pub struct SelectStatement {
    // ... existing fields ...

    /// Optional INTO clause for creating temporary tables
    pub into_table: Option<IntoTable>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct IntoTable {
    /// Name of the temporary table (with # prefix)
    pub name: String,
    /// Whether to replace if exists
    pub replace_if_exists: bool,
}
```

#### 1.3 Parser Enhancement

```rust
// In src/sql/parser/mod.rs or recursive_parser.rs

impl Parser {
    fn parse_select_statement(&mut self) -> Result<SelectStatement> {
        // Parse SELECT ...
        let select_items = self.parse_select_items()?;

        // Parse FROM ...
        let from = self.parse_from_clause()?;

        // NEW: Check for INTO clause
        let into_table = if self.current_token_is_keyword("INTO") {
            self.consume_keyword("INTO")?;
            Some(self.parse_into_clause()?)
        } else {
            None
        };

        // Continue with WHERE, GROUP BY, etc.
        // ...

        Ok(SelectStatement {
            // ... existing fields ...
            into_table,
        })
    }

    fn parse_into_clause(&mut self) -> Result<IntoTable> {
        // Expect identifier starting with #
        let name = self.expect_identifier()?;

        if !name.starts_with('#') {
            return Err(anyhow!("Temporary table must start with #"));
        }

        // Future: Handle "OR REPLACE" if needed
        Ok(IntoTable {
            name,
            replace_if_exists: false,
        })
    }
}
```

#### 1.4 Script Execution Enhancement

```rust
// In src/non_interactive.rs

pub fn execute_script(config: NonInteractiveConfig) -> Result<()> {
    let parser = ScriptParser::new(&config.query);
    let statements = parser.parse_and_validate()?;

    // NEW: Create temp table registry for script duration
    let mut temp_tables = TempTableRegistry::new();
    let mut script_result = ScriptResult::new();

    // Load initial data file if provided
    let mut base_table = if !config.data_file.is_empty() {
        Some(load_data_file(&config.data_file)?)
    } else {
        None
    };

    for (idx, sql) in statements.iter().enumerate() {
        let start = Instant::now();

        // Parse the statement
        let mut parser = Parser::new(sql);
        let statement = parser.parse_select()?;

        // Execute with temp table context
        let result = execute_with_temp_tables(
            &statement,
            &base_table,
            &mut temp_tables,
            &config
        );

        match result {
            Ok(data_view) => {
                let rows = data_view.row_count();

                // If this is an INTO statement, store the result
                if let Some(into) = &statement.into_table {
                    temp_tables.insert(
                        into.name.clone(),
                        Arc::new(data_view.to_datatable())
                    )?;

                    println!("({} rows affected) -> {}", rows, into.name);
                } else {
                    // Regular output
                    output_result(&data_view, &config)?;
                }

                script_result.add_success(idx + 1, sql.clone(), rows, start.elapsed());
            }
            Err(e) => {
                script_result.add_failure(idx + 1, sql.clone(), e.to_string(), start.elapsed());
                if !config.continue_on_error {
                    break;
                }
            }
        }
    }

    // Cleanup: temp_tables dropped when going out of scope
    Ok(())
}

fn execute_with_temp_tables(
    statement: &SelectStatement,
    base_table: &Option<DataTable>,
    temp_tables: &mut TempTableRegistry,
    config: &NonInteractiveConfig
) -> Result<DataView> {
    // Check if FROM clause references a temp table
    let table = if let Some(from_table_name) = &statement.from_table {
        if from_table_name.starts_with('#') {
            // Resolve from temp table registry
            let temp_table = temp_tables.get(from_table_name)
                .ok_or_else(|| anyhow!("Temporary table {} not found", from_table_name))?;
            Some((*temp_table).clone())
        } else {
            base_table.clone()
        }
    } else {
        base_table.clone()
    };

    // Execute query with resolved table
    let engine = QueryEngine::new();
    engine.execute(&table, statement)
}
```

#### 1.5 Query Engine Enhancement

Currently query engine receives a `table: Option<DataTable>`. Need to enhance to resolve temp tables in CTE context:

```rust
// In src/data/query_engine.rs

impl QueryEngine {
    pub fn execute_with_temp_tables(
        &self,
        table: &Option<DataTable>,
        statement: &SelectStatement,
        temp_tables: &TempTableRegistry,
    ) -> Result<DataView> {
        // When processing CTEs and FROM clauses, check temp_tables first
        // Treat #tmp tables like CTEs - they're already materialized

        // ... existing execution logic ...
    }
}
```

## Phase 2: Dynamic Query Injection (Future Enhancement)

### Use Case Expanded

```sql
-- Step 1: Extract subset of data
WITH WEB trades AS (
    URL 'https://api.example.com/trades'
    BODY { "date": "2024-01-15", "status": "pending" }
    FORMAT JSON
)
SELECT trade_id FROM trades INTO #pending_ids;
GO

-- Step 2: Use those IDs to fetch allocations
WITH WEB allocations AS (
    URL 'https://api.example.com/allocations'
    BODY {
        "trade_ids": #pending_ids,          -- Array injection
        "mode": "detailed"
    }
    FORMAT JSON
)
SELECT * FROM allocations;
GO
```

### Body Template Expansion

```rust
// In web_cte_parser.rs or http_fetcher.rs

fn expand_body_with_temp_tables(
    body: &str,
    temp_tables: &TempTableRegistry
) -> Result<String> {
    // Parse body as JSON/template
    let mut body_value: serde_json::Value = serde_json::from_str(body)?;

    // Find references to #table_name
    expand_references(&mut body_value, temp_tables)?;

    Ok(serde_json::to_string(&body_value)?)
}

fn expand_references(
    value: &mut serde_json::Value,
    temp_tables: &TempTableRegistry
) -> Result<()> {
    match value {
        Value::String(s) if s.starts_with('#') => {
            // Replace with table data
            let table = temp_tables.get(s)
                .ok_or_else(|| anyhow!("Temp table {} not found", s))?;

            // Convert first column to array
            let array = table.rows.iter()
                .map(|row| row.values[0].clone())
                .collect::<Vec<_>>();

            *value = serde_json::to_value(array)?;
        }
        Value::Object(map) => {
            for (_, v) in map.iter_mut() {
                expand_references(v, temp_tables)?;
            }
        }
        Value::Array(arr) => {
            for v in arr.iter_mut() {
                expand_references(v, temp_tables)?;
            }
        }
        _ => {}
    }
    Ok(())
}
```

### Advanced Templating (Future)

```sql
-- More flexible with explicit column selection
WITH WEB orders AS (
    URL 'https://api.example.com/orders'
    BODY {
        "customer_ids": #customers.customer_id,    -- Specific column
        "include_details": true,
        "filters": {
            "regions": #active_regions.region_code  -- Another column
        }
    }
    FORMAT JSON
)
SELECT * FROM orders;
```

## Benefits

### Phase 1 Benefits (Immediate)
1. **Code Reuse**: Write WEB CTE once, query results multiple ways
2. **Performance**: Avoid redundant HTTP calls
3. **Clarity**: Separate data fetching from analysis
4. **Debugging**: Inspect intermediate results

### Phase 2 Benefits (Future)
1. **Dynamic Queries**: Build queries based on previous results
2. **Data Pipelines**: Multi-stage ETL-style workflows
3. **Batch Operations**: Process data in chunks
4. **Complex Integrations**: Chain multiple API calls with dependencies

## Implementation Complexity

### Phase 1: **Medium** (1-2 days)
- ✅ Clear scope and boundaries
- ✅ Existing patterns to follow (CTEs)
- ✅ No Redis integration needed
- ✅ Minimal parser changes

**Risks**: Low
- Parser changes are straightforward (INTO clause)
- TempTableRegistry is simple HashMap wrapper
- Script execution already loops over statements

### Phase 2: **High** (3-5 days)
- ⚠️ JSON template parsing complexity
- ⚠️ Need robust error handling
- ⚠️ Multiple expansion strategies needed
- ⚠️ Type coercion challenges

**Risks**: Medium
- Need to handle various data types (strings, numbers, nulls)
- Template syntax needs careful design
- Error messages must be clear

## Recommended Approach

### Step 1: Phase 1 MVP (Start Here)
1. Implement TempTableRegistry
2. Add INTO clause parsing
3. Enhance execute_script() to maintain registry
4. Add temp table resolution in FROM clauses
5. Test with basic scripts

**Success Criteria**:
- Can save WEB CTE results to #tmp
- Can query #tmp in subsequent statements
- Temp tables cleared after script ends

### Step 2: Evaluate Phase 1 (After Testing)
- Gather feedback on syntax and UX
- Identify real-world use cases
- Determine if Phase 2 is needed
- Refine Phase 2 design based on usage patterns

### Step 3: Phase 2 Design Refinement
- Design template expansion syntax
- Decide on column selection syntax
- Plan type handling strategy
- Build prototype with limited scope

## Alternative Syntaxes Considered

### SQL Server Style
```sql
SELECT * INTO #tmp FROM web_trades;  -- SQL Server
SELECT * FROM web_trades INTO #tmp;  -- Our proposed syntax
```
**Decision**: Use "INTO #tmp" after FROM for consistency with CTEs

### Explicit Declaration
```sql
DECLARE @tmp AS TEMP TABLE;
SELECT * FROM web_trades INTO @tmp;
```
**Decision**: Too verbose, # prefix is sufficient

### Hash vs At
```sql
#tmp  -- Unix/SQL Server style (chosen)
@tmp  -- Variable style
$tmp  -- Shell style
```
**Decision**: # prefix is most SQL-like

## Testing Strategy

### Unit Tests
- TempTableRegistry operations
- Parser handles INTO clause
- AST includes into_table field

### Integration Tests
```sql
-- Test 1: Basic persistence
SELECT * FROM trades INTO #t1;
GO
SELECT COUNT(*) FROM #t1;
GO

-- Test 2: Error on duplicate
SELECT * FROM trades INTO #t1;
GO
SELECT * FROM orders INTO #t1;  -- Should error
GO

-- Test 3: Multiple temp tables
SELECT * FROM trades INTO #t1;
GO
SELECT * FROM orders INTO #t2;
GO
SELECT * FROM #t1 JOIN #t2 ON ...;
GO
```

### Performance Tests
- Memory usage with large temp tables
- Impact on script execution time
- Cleanup verification

## Documentation Needs

1. **User Guide**: How to use temp tables in scripts
2. **Examples**: Real-world use cases
3. **API Docs**: TempTableRegistry methods
4. **Migration**: From multiple WEB CTEs to temp tables

## Future Enhancements (Beyond Phase 2)

1. **Indexing**: `CREATE INDEX ON #tmp (column)`
2. **Explicit Drops**: `DROP TABLE #tmp`
3. **Table Variables**: `DECLARE @t TABLE (...)`
4. **Cross-Script Persistence**: Named temp tables in ~/.sql-cli/temp/
5. **Memory Limits**: Max size for temp tables

## Conclusion

**Phase 1 is highly feasible and provides immediate value** with:
- Clear scope and implementation path
- Low risk and complexity
- Leverages existing architecture
- Solves real pain point (redundant WEB CTEs)

**Phase 2 requires careful design** but is architecturally sound:
- Can build incrementally on Phase 1
- Template expansion has clear patterns
- Provides powerful data pipeline capabilities

**Recommendation**: Proceed with Phase 1 implementation, evaluate adoption, then refine Phase 2 design based on real-world usage patterns.