# Parser Keyword Cleanup Plan
## Current State Analysis
The recursive parser currently has many hardcoded string comparisons for keywords, causing:
- Performance overhead from repeated `to_uppercase()` calls
- String allocations in hot paths
- Inconsistent keyword handling
- Maintenance challenges
## Phase 1: Audit Current Keyword Usage
### Identified Problem Patterns
1. **Autocomplete Context (lines 1880-1885)**
```rust
if trimmed.to_uppercase().ends_with(" AND")
    || trimmed.to_uppercase().ends_with(" OR")
    || trimmed.to_uppercase().ends_with(" AND ")
    || trimmed.to_uppercase().ends_with(" OR ")
```
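This allocates a fresh uppercased `String` on every comparison, and if `trimmed` has actually been trimmed, the two variants ending in a space can never match. A minimal sketch of the fix, hoisting the conversion so it happens once (the helper name is illustrative, not existing code):

```rust
// Uppercase once, compare many times. The trailing-space variants
// from the original are dropped: a trimmed string cannot end in " ".
fn ends_with_logical_op(trimmed: &str) -> bool {
    let upper = trimmed.to_uppercase();
    upper.ends_with(" AND") || upper.ends_with(" OR")
}
```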
2. **Direct String Comparisons**
```rust
if token_str.to_uppercase() == "SELECT"
if word.to_uppercase() == "FROM"
```
3. **Multiple Case Variations**
```rust
matches!(s.to_uppercase().as_str(), "ASC" | "DESC")
```
## Phase 2: Solution Design
### Option A: Enhance Token System (Recommended)
Leverage the existing Token enum more effectively:
```rust
// The current Token enum already covers many keywords
pub enum Token {
    Select,
    From,
    Where,
    And,
    Or,
    // etc.
}

// Add helper methods
impl Token {
    pub fn from_keyword(s: &str) -> Option<Token> {
        match s.to_uppercase().as_str() {
            "SELECT" => Some(Token::Select),
            "FROM" => Some(Token::From),
            // ... etc
            _ => None,
        }
    }

    pub fn is_logical_operator(&self) -> bool {
        matches!(self, Token::And | Token::Or)
    }
}
```
### Option B: Keyword Module with Constants
Create a dedicated keywords module:
```rust
pub mod keywords {
    use lazy_static::lazy_static;
    use std::collections::HashSet;

    pub const SELECT: &str = "SELECT";
    pub const FROM: &str = "FROM";

    pub fn is_keyword(s: &str) -> bool {
        KEYWORDS.contains(s.to_uppercase().as_str())
    }

    lazy_static! {
        static ref KEYWORDS: HashSet<&'static str> = {
            // All SQL keywords go here
            HashSet::from([SELECT, FROM /* , ... */])
        };
    }
}
```
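If the crate targets Rust 1.80 or later, the same idea works without the `lazy_static` dependency via `std::sync::LazyLock`. A sketch, with an illustrative (not complete) keyword list:

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

// Lazily built keyword set; the listing here is a placeholder.
static KEYWORDS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
    ["SELECT", "FROM", "WHERE", "AND", "OR", "ASC", "DESC"]
        .into_iter()
        .collect()
});

pub fn is_keyword(s: &str) -> bool {
    KEYWORDS.contains(s.to_uppercase().as_str())
}
```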
## Phase 3: Implementation Steps
### Step 1: Create Keyword Infrastructure
- [ ] Enhance Token enum with all SQL keywords
- [ ] Add Token::from_keyword() method
- [ ] Add Token classification methods (is_join_type, is_aggregate, etc.)
- [ ] Create case-insensitive comparison utilities
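For the comparison utilities, SQL keywords are pure ASCII, so `str::eq_ignore_ascii_case` can replace `to_uppercase()` comparisons with zero allocations. A sketch (the helper name is an assumption):

```rust
// Allocation-free, case-insensitive keyword comparison. Avoids both
// the to_uppercase() heap allocation and Unicode case-mapping cost.
pub fn is_kw(s: &str, keyword: &str) -> bool {
    s.eq_ignore_ascii_case(keyword)
}
```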
### Step 2: Replace String Comparisons
- [ ] Replace autocomplete context checks (lines 1880-1885)
- [ ] Replace parse_select_list keyword checks
- [ ] Replace parse_from_clause keyword checks
- [ ] Replace parse_where_clause keyword checks
- [ ] Replace parse_order_by keyword checks
- [ ] Replace parse_group_by keyword checks
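The replacements above all follow one pattern: convert the raw word to a token once, then compare tokens. A before/after sketch at a single hypothetical call site, with `Token` reduced to the variants the illustration needs:

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Select,
    From,
}

impl Token {
    fn from_keyword(s: &str) -> Option<Token> {
        match s.to_uppercase().as_str() {
            "SELECT" => Some(Token::Select),
            "FROM" => Some(Token::From),
            _ => None,
        }
    }
}

// Before: token_str.to_uppercase() == "SELECT"
// After: one lookup, then a cheap enum comparison.
fn starts_select(token_str: &str) -> bool {
    Token::from_keyword(token_str) == Some(Token::Select)
}
```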
### Step 3: Optimize Hot Paths
- [ ] Cache uppercase conversions where needed
- [ ] Use token lookahead instead of string peeking
- [ ] Pre-compute keyword lengths for position tracking
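A minimal sketch of what "token lookahead instead of string peeking" could look like; the `Tokens` type and method names are assumptions, not the parser's existing API:

```rust
// Peek at the next token without consuming it, instead of
// re-inspecting the raw input string at every decision point.
struct Tokens {
    toks: Vec<&'static str>,
    pos: usize,
}

impl Tokens {
    fn peek(&self) -> Option<&'static str> {
        self.toks.get(self.pos).copied()
    }

    fn next_is(&self, kw: &str) -> bool {
        self.peek().map_or(false, |t| t.eq_ignore_ascii_case(kw))
    }
}
```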
### Step 4: Testing
- [ ] Ensure all existing tests pass
- [ ] Add performance benchmarks for parsing
- [ ] Test case-insensitive parsing thoroughly
## Phase 4: Web CTE Module Extraction
After the keyword cleanup, extract the WEB CTE support into its own module:
### Module Structure
```
src/web_cte/
├── mod.rs # Public API
├── spec.rs # WebCTESpec types
├── parser.rs # Parse WEB keyword and spec
├── executor.rs # HTTP execution
├── auth/
│ ├── mod.rs # Auth trait
│ ├── bearer.rs # Bearer token auth
│ └── basic.rs # Basic auth
├── cache.rs # Response caching
└── error.rs # Error types
```
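The `auth/` subtree hinges on a shared trait that `bearer.rs` and `basic.rs` implement. A sketch of what that trait could look like; the names and method signature are assumptions:

```rust
// Hypothetical contents of auth/mod.rs and auth/bearer.rs.
pub trait AuthProvider {
    /// Produce the value of the Authorization header, if any.
    fn authorization_header(&self) -> Option<String>;
}

pub struct BearerAuth {
    pub token: String,
}

impl AuthProvider for BearerAuth {
    fn authorization_header(&self) -> Option<String> {
        Some(format!("Bearer {}", self.token))
    }
}
```

Keeping the trait object-safe means the executor can hold a `Box<dyn AuthProvider>` and stay agnostic of the concrete scheme, which is what makes the future NTLM variant below a drop-in addition.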
### Future NTLM Support (if needed)
```rust
// In auth/ntlm.rs (future)
pub struct NtlmAuthProvider {
    // Could delegate to the Flask proxy
    proxy_url: Option<String>,
}
```
## Success Criteria
1. **Performance**: Fewer string allocations, no redundant uppercase conversions
2. **Maintainability**: Clear keyword handling, easy to add new keywords
3. **Correctness**: All tests pass, case-insensitive parsing works
4. **Modularity**: WEB CTE in its own module with clear interfaces
## Order of Operations
1. Start with Token enum enhancements
2. Replace easiest string comparisons first (single keywords)
3. Handle complex cases (autocomplete, multi-word keywords)
4. Extract WEB CTE module
5. Add auth provider abstraction
---
*Note on NTLM: The Flask proxy solution is actually quite good. Native NTLM in Rust is complex and platform-specific. We could document the proxy pattern as an official solution for legacy auth systems.*