sql-cli 1.73.1 - Docs.rs

# Window Functions Implementation Architecture

## Overview
Window functions require a helper class (`WindowContext`) that manages partitioned and ordered views of data, similar to GROUP BY but maintaining row-level access for functions like LAG/LEAD.

## Core Concept

```rust
// Window functions operate on "frames" within partitions
// Each partition is an ordered subset of rows
// Unlike GROUP BY, we don't collapse rows - we augment them

WindowContext {
    partitions: BTreeMap<PartitionKey, OrderedPartition>,
    frame_spec: FrameSpecification,
}

OrderedPartition {
    rows: Vec<usize>,          // Row indices in original DataView
    sort_order: Vec<SortKey>,  // How this partition is ordered
}
```

## Architecture Design

### 1. WindowContext Helper Class

```rust
pub struct WindowContext {
    source: Arc<DataView>,
    partitions: BTreeMap<Vec<DataValue>, OrderedPartition>,
    window_spec: WindowSpecification,
}

impl WindowContext {
    /// Create window context from DataView with PARTITION BY and ORDER BY
    pub fn new(
        view: Arc<DataView>,
        partition_by: Vec<String>,
        order_by: Vec<OrderByColumn>,
    ) -> Result<Self> {
        // 1. Partition the data (like GROUP BY)
        // 2. Sort each partition independently
        // 3. Store ordered row indices
    }
    
    /// Get value at offset from current row (for LAG/LEAD)
    pub fn get_offset_value(
        &self,
        current_row: usize,
        offset: i32,
        column: &str,
    ) -> Option<DataValue> {
        // Find which partition this row belongs to
        // Navigate within that partition's ordered rows
    }
    
    /// Get row number within partition
    pub fn get_row_number(&self, row_index: usize) -> usize {
        // Return position within partition
    }
    
    /// Get first/last value in partition
    pub fn get_first_value(&self, row_index: usize, column: &str) -> DataValue
    pub fn get_last_value(&self, row_index: usize, column: &str) -> DataValue
}
```

### 2. OrderedPartition Structure

```rust
pub struct OrderedPartition {
    /// Original row indices from DataView, in sorted order
    rows: Vec<usize>,
    
    /// Quick lookup: row_index -> position in partition
    row_positions: HashMap<usize, usize>,
    
    /// The sort keys used for ordering
    sort_spec: Vec<SortColumn>,
}

impl OrderedPartition {
    /// Navigate to offset from current position
    pub fn get_row_at_offset(&self, current_pos: usize, offset: i32) -> Option<usize> {
        let target_pos = (current_pos as i32) + offset;
        if target_pos >= 0 && target_pos < self.rows.len() as i32 {
            Some(self.rows[target_pos as usize])
        } else {
            None
        }
    }
    
    /// Get position of row in this partition
    pub fn get_position(&self, row_index: usize) -> Option<usize> {
        self.row_positions.get(&row_index).copied()
    }
}
```

### 3. Integration with Query Engine

```rust
// In query_engine.rs or window_functions.rs

pub fn evaluate_window_function(
    view: &DataView,
    func: &WindowFunction,
    spec: &WindowSpec,
) -> Result<Vec<DataValue>> {
    // Create WindowContext with partition and order specs
    let context = WindowContext::new(
        Arc::new(view.clone()),
        spec.partition_by.clone(),
        spec.order_by.clone(),
    )?;
    
    // Evaluate function for each row
    let mut results = Vec::new();
    for row_idx in view.get_visible_rows() {
        let value = match func {
            WindowFunction::Lag { column, offset, default } => {
                context.get_offset_value(row_idx, -offset, column)
                    .unwrap_or_else(|| default.clone())
            }
            WindowFunction::Lead { column, offset, default } => {
                context.get_offset_value(row_idx, offset, column)
                    .unwrap_or_else(|| default.clone())
            }
            WindowFunction::RowNumber => {
                DataValue::Integer(context.get_row_number(row_idx) as i64)
            }
            WindowFunction::FirstValue { column } => {
                context.get_first_value(row_idx, column)
            }
            WindowFunction::LastValue { column } => {
                context.get_last_value(row_idx, column)
            }
            // ... other functions
        };
        results.push(value);
    }
    
    Ok(results)
}
```

## Implementation Phases

### Phase 1: LAG/LEAD (Simplest)
```sql
SELECT 
    price,
    LAG(price, 1) OVER (ORDER BY trade_time) as prev_price,
    price - LAG(price, 1) OVER (ORDER BY trade_time) as price_change
FROM trades
```

**Implementation:**
1. Create WindowContext with no partitions (single partition)
2. Order by trade_time
3. For each row, navigate offset within ordered partition

### Phase 2: ROW_NUMBER
```sql
SELECT 
    trader,
    price,
    ROW_NUMBER() OVER (ORDER BY price DESC) as price_rank
FROM trades
```

**Implementation:**
1. Single partition, ordered by price DESC
2. Return position in ordered list

### Phase 3: PARTITION BY
```sql
SELECT 
    trader,
    price,
    ROW_NUMBER() OVER (PARTITION BY trader ORDER BY price DESC) as rank_within_trader
FROM trades
```

**Implementation:**
1. Multiple partitions (one per trader)
2. Each partition independently ordered
3. Row number restarts at 1 for each partition

### Phase 4: FIRST_VALUE/LAST_VALUE
```sql
SELECT 
    trader,
    price,
    FIRST_VALUE(price) OVER (PARTITION BY trader ORDER BY trade_time) as opening_price,
    LAST_VALUE(price) OVER (PARTITION BY trader ORDER BY trade_time) as current_price
FROM trades
```

## Key Design Decisions

### 1. Partition Storage
- Use `BTreeMap` for sorted partition keys
- Each partition maintains its own ordered row indices
- Quick lookup from row -> partition -> position

### 2. Memory Efficiency
- Don't duplicate data, only store indices
- Share source DataView across all partitions
- Lazy evaluation where possible

### 3. API Design
- WindowContext encapsulates all partition logic
- Clean separation from DataView (composition, not modification)
- Reusable for different window functions

### 4. Performance Considerations
- Pre-compute partitions and ordering once
- O(1) lookup for row position within partition
- O(1) navigation for LAG/LEAD within partition

## Example Usage Flow

```rust
// 1. Parser creates WindowSpec from SQL
let window_spec = WindowSpec {
    partition_by: vec!["trader".to_string()],
    order_by: vec![OrderByColumn { 
        column: "price".to_string(), 
        ascending: false 
    }],
};

// 2. Create WindowContext
let window_ctx = WindowContext::new(data_view, window_spec)?;

// 3. Evaluate window function for each row
for row in data_view.get_visible_rows() {
    let lag_value = window_ctx.get_offset_value(row, -1, "price");
    let row_num = window_ctx.get_row_number(row);
    // Add to result columns
}
```

## Benefits of This Architecture

1. **Separation of Concerns**: Window logic separate from DataView
2. **Reusability**: Same context works for multiple window functions
3. **Efficiency**: Partitions computed once, used many times
4. **Extensibility**: Easy to add new window functions
5. **Similarity to GROUP BY**: Leverages existing partitioning concepts

## Next Steps

1. Implement WindowContext struct
2. Add LAG/LEAD as proof of concept
3. Extend parser to recognize OVER clause
4. Add tests with sample data
5. Optimize for large datasets