# Minimal Redis Integration for SQL CLI
## The Plan
- sql-cli remains transient (start → check cache → run → exit)
- Redis runs in Docker as persistent cache
- Cache key = SHA256 hash of (URL + Method + Headers + Body)
- Changes limited to `WebDataFetcher::fetch()` only
## 1. Start Redis (at work)
```bash
# Run Redis in Docker
docker run -d \
  --name sql-cli-cache \
  -p 6379:6379 \
  -v redis-data:/data \
  redis:alpine \
  redis-server --appendonly yes
# Verify it's running
docker ps
redis-cli ping   # should print PONG
# Or, if redis-cli isn't installed on the host:
docker exec sql-cli-cache redis-cli ping
```
## 2. Minimal Changes to sql-cli
### Add Redis dependency
```toml
# Cargo.toml
[dependencies]
redis = { version = "0.23", optional = true }
sha2 = "0.10"
[features]
default = ["redis-cache"]
redis-cache = ["redis"]
```
### Add cache module
```rust
// src/cache/mod.rs
#[cfg(feature = "redis-cache")]
pub mod redis_cache;

// src/cache/redis_cache.rs
use redis::{Client, Commands};
use sha2::{Digest, Sha256};

pub struct RedisCache {
    enabled: bool,
    client: Option<Client>,
}

impl RedisCache {
    pub fn new() -> Self {
        // Honor REDIS_URL if set; otherwise use the local default
        let url = std::env::var("REDIS_URL")
            .unwrap_or_else(|_| "redis://127.0.0.1:6379".to_string());
        // Try to connect, but don't fail if Redis isn't available
        let client = Client::open(url.as_str()).ok();
        Self {
            enabled: client.is_some() && std::env::var("SQL_CLI_NO_CACHE").is_err(),
            client,
        }
    }

    pub fn generate_key(
        url: &str,
        method: &str,
        headers: &[(String, String)],
        body: Option<&str>,
    ) -> String {
        let mut hasher = Sha256::new();
        // Hash every component that makes this request unique
        hasher.update(url.as_bytes());
        hasher.update(method.as_bytes());
        for (k, v) in headers {
            hasher.update(k.as_bytes());
            hasher.update(v.as_bytes());
        }
        if let Some(body) = body {
            hasher.update(body.as_bytes());
        }
        format!("sql-cli:web:{:x}", hasher.finalize())
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        if !self.enabled {
            return None;
        }
        let mut conn = self.client.as_ref()?.get_connection().ok()?;
        conn.get(key).ok()
    }

    pub fn set(&self, key: &str, data: &[u8], ttl_seconds: u64) -> Result<(), redis::RedisError> {
        if !self.enabled {
            return Ok(());
        }
        let mut conn = self
            .client
            .as_ref()
            .ok_or_else(|| redis::RedisError::from((redis::ErrorKind::IoError, "No client")))?
            .get_connection()?;
        // Annotate the discarded reply so type inference succeeds
        let _: () = conn.set_ex(key, data, ttl_seconds as usize)?;
        Ok(())
    }
}
```
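Because the key is just a SHA-256 over the concatenated request parts, it can be reproduced outside the binary, which is handy when poking at Redis by hand. A sketch (the URL, header, and body values are made up; it assumes the same concatenation order as `generate_key`: URL, method, each header key followed by its value, then body):

```shell
# Rebuild a cache key from the shell: concatenate the request parts in
# generate_key's order, then SHA-256 the result.
url='https://prod.api/trades'
method='POST'
headers='AuthorizationBearer token'   # header key immediately followed by its value
body='{"source": "Barclays", "date": "2025-09-01"}'

digest=$(printf '%s%s%s%s' "$url" "$method" "$headers" "$body" | sha256sum | cut -d' ' -f1)
key="sql-cli:web:$digest"
echo "$key"
```

The resulting key can be fed straight to `redis-cli GET` to inspect a cached entry.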
### Modify WebDataFetcher (the ONLY change to existing code)
```rust
// src/web/http_fetcher.rs
#[cfg(feature = "redis-cache")]
use crate::cache::redis_cache::RedisCache;

impl WebDataFetcher {
    pub fn fetch(&self, spec: &WebCTESpec, table_name: &str) -> Result<DataTable> {
        // NEW: Generate cache key from spec (guarded so the no-cache build still compiles)
        #[cfg(feature = "redis-cache")]
        let cache_key = RedisCache::generate_key(
            &spec.url,
            &format!("{:?}", spec.method.as_ref().unwrap_or(&HttpMethod::GET)),
            &spec.headers,
            spec.body.as_deref(),
        );

        // NEW: Try Redis cache first
        #[cfg(feature = "redis-cache")]
        {
            let cache = RedisCache::new();
            if let Some(cached_bytes) = cache.get(&cache_key) {
                // Deserialize from Parquet bytes
                if let Ok(table) = DataTable::from_parquet_bytes(&cached_bytes) {
                    eprintln!("Cache HIT for {} (key: {}...)", table_name, &cache_key[0..16]);
                    return Ok(table);
                }
            }
            eprintln!("Cache MISS for {} (key: {}...)", table_name, &cache_key[0..16]);
        }

        // Existing fetch logic
        let data = if spec.url.starts_with("file://") {
            self.fetch_file(spec, table_name)?
        } else {
            self.fetch_http(spec, table_name)?
        };

        // NEW: Store in cache
        #[cfg(feature = "redis-cache")]
        {
            let ttl = spec.cache_seconds.unwrap_or_else(|| {
                // Smart default: 1 hour for prod, 5 minutes for staging
                if spec.url.contains("prod") { 3600 } else { 300 }
            });
            if let Ok(parquet_bytes) = data.to_parquet_bytes() {
                let cache = RedisCache::new();
                let _ = cache.set(&cache_key, &parquet_bytes, ttl);
                eprintln!("Cached {} for {} seconds", table_name, ttl);
            }
        }

        Ok(data)
    }
}
```
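The TTL fallback above is worth sanity-checking; here is the same rule expressed as a throwaway shell function (illustration only, not shipped code):

```shell
# Same default-TTL rule as the Rust fallback: 1 hour for URLs containing
# "prod", 5 minutes otherwise. An explicit CACHE <seconds> overrides both.
default_ttl() {
  case "$1" in
    *prod*) echo 3600 ;;   # production data: cache for an hour
    *)      echo 300  ;;   # everything else: 5 minutes
  esac
}

default_ttl 'https://prod.api/trades'     # 3600
default_ttl 'https://staging.api/trades'  # 300
```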
### Add Parquet serialization to DataTable
```rust
// src/data/datatable.rs
impl DataTable {
    pub fn to_parquet_bytes(&self) -> Result<Vec<u8>> {
        // Use arrow2 to serialize to Parquet
        let mut buffer = Vec::new();
        // ... parquet serialization logic ...
        Ok(buffer)
    }

    pub fn from_parquet_bytes(bytes: &[u8]) -> Result<Self> {
        // Deserialize from Parquet
        // ... parquet deserialization logic ...
        todo!("rebuild a DataTable from Parquet bytes")
    }
}
```
## 3. How It Works in Practice
```sql
-- First execution (Monday morning)
WITH WEB trades AS (
URL 'https://prod.api/trades'
METHOD POST
HEADERS ('Authorization': 'Bearer token')
BODY '{"source": "Barclays", "date": "2025-09-01"}'
CACHE 86400 -- 24 hours
)
SELECT * FROM trades WHERE amount > 1000000;
-- Console output:
-- Cache MISS for trades (key: a3f4b2c1d5e7...)
-- Fetching from https://prod.api/trades...
-- Cached trades for 86400 seconds
-- [Results displayed]
-- Second execution (any time that day)
-- Same query runs instantly
-- Console output:
-- Cache HIT for trades (key: a3f4b2c1d5e7...)
-- [Results displayed immediately]
```
## 4. Cache Management
```bash
# Check what's cached
redis-cli
> KEYS sql-cli:*
1) "sql-cli:web:a3f4b2c1d5e7..."
2) "sql-cli:web:b5c6d8e9f0a1..."
# Check TTL
> TTL "sql-cli:web:a3f4b2c1d5e7..."
(integer) 85234
# Manual clear if needed
> DEL "sql-cli:web:a3f4b2c1d5e7..."
# Clear all sql-cli cache (quick hack: KEYS blocks Redis and EVAL errors
# if no keys match; fine for a local dev cache)
> EVAL "return redis.call('del', unpack(redis.call('keys', 'sql-cli:*')))" 0
# Monitor cache activity
> MONITOR
```
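The raw `TTL` output is in seconds; a one-liner makes it readable (using the 85234 value from the session above):

```shell
# Convert a remaining TTL in seconds to hours/minutes/seconds
ttl=85234
printf '%dh %dm %ds\n' $((ttl / 3600)) $((ttl % 3600 / 60)) $((ttl % 60))
# -> 23h 40m 34s
```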
## 5. Environment Control
```bash
# Normal operation (cache enabled)
sql-cli query.sql
# Bypass cache for testing (any value disables it, even 0)
SQL_CLI_NO_CACHE=1 sql-cli query.sql
# Different Redis host
REDIS_URL=redis://other-host:6379 sql-cli query.sql
```
## Key Design Decisions
1. **Graceful Degradation**: If Redis isn't running, sql-cli still works (just no cache)
2. **Minimal Changes**: Only touches `WebDataFetcher::fetch()`
3. **Smart Keys**: Hash includes all request components that affect the response
4. **Optional Feature**: Can compile without Redis support if needed
5. **Observable**: stderr shows cache hits/misses for debugging
## What This Gives You
- **Zero config**: Just run Redis and it works
- **Shared cache**: All sql-cli invocations share the cache
- **Persistent**: Cache survives sql-cli restarts
- **Fast**: Parquet bytes in Redis = millisecond retrieval
- **Simple**: ~100 lines of code total
## Next Steps
1. Add Redis to Cargo.toml
2. Create `src/cache/redis_cache.rs`
3. Add cache check to `WebDataFetcher::fetch()`
4. Add Parquet serialization helpers
5. Test with your Barclays/Bloomberg queries
This keeps sql-cli simple and stable while giving you powerful caching!