oxibase 0.2.2

Autonomous relational database management system with MVCC, time-travel queries, and full ACID compliance
Documentation
---
layout: default
title: Window Functions
parent: Functions
nav_order: 4
---

# Window Functions

Window functions perform calculations across a set of table rows that are related to the current row. Unlike regular aggregate functions, window functions do not cause rows to become grouped into a single output row - the rows retain their separate identities.

## Architecture Overview

```mermaid
graph TB
    subgraph "Entry Points"
        Query["execute_select<br/>[query.rs:155]"]
        HasWindow["has_window_functions<br/>[window.rs:2522]"]
    end
    
    subgraph "Execution Coordinators"
        ExecWindow["execute_select_with_window_functions<br/>[window.rs:115]"]
        ExecPresorted["execute_select_with_window_functions_presorted<br/>[window.rs:135]"]
        ExecPregrouped["execute_select_with_window_functions_pregrouped<br/>[window.rs:156]"]
        ExecStreaming["execute_select_with_window_functions_streaming<br/>[window.rs:422]"]
        ExecLazy["execute_select_with_window_functions_lazy_partition<br/>[window.rs:591]"]
    end
    
    subgraph "Core Computation"
        ParseWF["parse_window_functions<br/>[window.rs:974]"]
        ComputeWF["compute_window_function<br/>[window.rs:1216]"]
        ComputePartition["compute_window_for_partition<br/>[window.rs:1423]"]
        ComputeAggWF["compute_aggregate_window_function<br/>[window.rs:2111]"]
    end
    
    subgraph "Specialized Computations"
        ComputeLeadLag["compute_lead_lag<br/>[window.rs:1541]"]
        ComputeNtile["compute_ntile<br/>[window.rs:1585]"]
        ComputeRank["compute_rank<br/>[window.rs:1639]"]
        ComputePercentRank["compute_percent_rank<br/>[window.rs:1821]"]
        ComputeCumeDist["compute_cume_dist<br/>[window.rs:1859]"]
    end
    
    subgraph "Support Functions"
        PrecomputeOrderBy["precompute_order_by_values<br/>[window.rs:1967]"]
        SortByOrderValues["sort_by_order_values<br/>[window.rs:2092]"]
        ParseSelectList["parse_select_list_for_window<br/>[window.rs:748]"]
    end
    
    Query -->|"has window functions?"| HasWindow
    HasWindow -->|"yes"| ExecWindow
    ExecWindow --> ParseWF
    
    ExecWindow --> ComputeWF
    ExecPresorted --> ComputeWF
    ExecPregrouped --> ComputeWF
    ExecStreaming --> ComputePartition
    ExecLazy --> ComputePartition
    
    ComputeWF -->|"partition rows"| ComputePartition
    ComputeWF -->|"aggregate functions"| ComputeAggWF
    
    ComputePartition --> PrecomputeOrderBy
    ComputePartition --> SortByOrderValues
    ComputePartition --> ComputeLeadLag
    ComputePartition --> ComputeNtile
    ComputePartition --> ComputeRank
    
    ComputeAggWF --> PrecomputeOrderBy
    ComputeAggWF --> SortByOrderValues
    
    ExecWindow --> ParseSelectList
    ExecStreaming --> ParseSelectList
    ExecLazy --> ParseSelectList
```

## Execution Pipeline

```mermaid
graph TD
    Start["Query with Window Function"] --> Detect["Detect Window Functions<br/>[window.rs:2522-2536]"]
    Detect --> Parse["Parse Window Functions<br/>[window.rs:974-1031]<br/>Extract WindowFunctionInfo"]
    
    Parse --> OptCheck{"Optimization<br/>Eligible?"}
    
    OptCheck -->|"PARTITION BY + LIMIT<br/>+ indexed partition col"| LazyPartition["Lazy Partition Fetch<br/>[window.rs:591-745]<br/>Fetch partitions one-by-one<br/>Stop at LIMIT"]
    
    OptCheck -->|"No PARTITION BY<br/>+ indexed ORDER BY col"| PreSorted["Pre-sorted Optimization<br/>[window.rs:135-151]<br/>Skip sorting"]
    
    OptCheck -->|"PARTITION BY<br/>+ indexed partition col"| PreGrouped["Pre-grouped Optimization<br/>[window.rs:156-172]<br/>Skip hash grouping"]
    
    OptCheck -->|"PARTITION BY + LIMIT"| Streaming["Streaming with LIMIT<br/>[window.rs:422-586]<br/>Process partitions until LIMIT"]
    
    OptCheck -->|"Standard case"| Standard["Standard Execution<br/>[window.rs:115-131]"]
    
    LazyPartition --> BuildResult["Build Result Rows<br/>[window.rs:352-417]"]
    PreSorted --> Compute["Compute Window Values<br/>[window.rs:1216-1416]"]
    PreGrouped --> Compute
    Streaming --> BuildResult
    Standard --> Compute
    
    Compute --> Partition{"Has<br/>PARTITION BY?"}
    
    Partition -->|"No"| SinglePartition["Single Partition<br/>[window.rs:1250-1293]<br/>All rows together"]
    Partition -->|"Yes"| MultiPartition["Hash-based Partitioning<br/>[window.rs:1295-1354]<br/>Group by partition key"]
    
    SinglePartition --> ComputePartition["compute_window_for_partition<br/>[window.rs:1423-1538]"]
    MultiPartition -->|"≥10 partitions<br/>+ ≥1000 rows"| Parallel["Parallel Partition Processing<br/>[window.rs:1359-1390]<br/>Rayon parallel_iter"]
    MultiPartition -->|"Small dataset"| Sequential["Sequential Processing<br/>[window.rs:1391-1415]"]
    
    Parallel --> ComputePartition
    Sequential --> ComputePartition
    
    ComputePartition --> FuncType{"Function<br/>Type?"}
    
    FuncType -->|"Aggregate<br/>(SUM, COUNT, etc.)"| AggCompute["compute_aggregate_window_function<br/>[window.rs:2111-2520]<br/>With frame bounds"]
    FuncType -->|"LEAD/LAG"| LeadLag["compute_lead_lag<br/>[window.rs:1541-1582]"]
    FuncType -->|"NTILE"| Ntile["compute_ntile<br/>[window.rs:1585-1636]"]
    FuncType -->|"RANK/DENSE_RANK"| Rank["compute_rank<br/>[window.rs:1639-1693]"]
    FuncType -->|"Other ranking"| Other["ROW_NUMBER, etc.<br/>[window.rs:1530-1532]"]
    
    AggCompute --> BuildResult
    LeadLag --> BuildResult
    Ntile --> BuildResult
    Rank --> BuildResult
    Other --> BuildResult
    
    BuildResult --> End["Return ExecutorMemoryResult<br/>or streaming result"]
```

## Syntax

```sql
function_name([expression]) OVER (
    [PARTITION BY partition_expression [, ...]]
    [ORDER BY sort_expression [ASC | DESC] [, ...]]
)
```

- **PARTITION BY**: Divides rows into groups (partitions) that share the same values
- **ORDER BY**: Defines the order of rows within each partition

## Available Window Functions

### Ranking Functions

#### ROW_NUMBER()
Assigns a unique sequential number to each row within a partition.

```sql
SELECT name, dept, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM employees;
```

Result:
```
name   | dept        | salary | row_num
-------+-------------+--------+--------
Diana  | Engineering | 80000  | 1
Frank  | Engineering | 75000  | 2
Charlie| Engineering | 70000  | 3
Bob    | Sales       | 60000  | 4
Eve    | Sales       | 55000  | 5
Alice  | Sales       | 50000  | 6
```

#### RANK()
Assigns a rank to each row within a partition. Rows with equal values receive the same rank, with gaps in the sequence.

```sql
SELECT name, salary,
       RANK() OVER (ORDER BY salary DESC) as rank
FROM employees;
```

If two employees have the same salary and are ranked 2, the next rank would be 4 (not 3).

#### DENSE_RANK()
Similar to RANK(), but without gaps in the ranking sequence.

```sql
SELECT name, salary,
       DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;
```

If two employees have the same salary and are ranked 2, the next rank would be 3.

#### NTILE(n)
Distributes rows into a specified number of buckets.

```sql
SELECT name, salary,
       NTILE(3) OVER (ORDER BY salary DESC) as tertile
FROM employees;
```

Divides employees into 3 groups based on salary.

#### PERCENT_RANK()
Returns the relative rank of the current row as a percentage (0 to 1).

```sql
SELECT name, salary,
       PERCENT_RANK() OVER (ORDER BY salary) as pct_rank
FROM employees;
```

#### CUME_DIST()
Returns the cumulative distribution of a value (fraction of rows with values <= current row).

```sql
SELECT name, salary,
       CUME_DIST() OVER (ORDER BY salary) as cume_dist
FROM employees;
```

### Value Functions

#### LAG(expression [, offset [, default]])
Accesses a value from a previous row within the partition.

```sql
-- Get previous row's salary
SELECT name, salary,
       LAG(salary) OVER (ORDER BY salary) as prev_salary
FROM employees;

-- Get salary from 2 rows back, with default of 0
SELECT name, salary,
       LAG(salary, 2, 0) OVER (ORDER BY salary) as prev2_salary
FROM employees;
```

#### LEAD(expression [, offset [, default]])
Accesses a value from a following row within the partition.

```sql
-- Get next row's salary
SELECT name, salary,
       LEAD(salary) OVER (ORDER BY salary) as next_salary
FROM employees;

-- Get salary from 2 rows ahead, with default of 0
SELECT name, salary,
       LEAD(salary, 2, 0) OVER (ORDER BY salary) as next2_salary
FROM employees;
```

#### FIRST_VALUE(expression)
Returns the first value within the partition.

```sql
SELECT name, dept, salary,
       FIRST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) as max_in_dept
FROM employees;
```

#### LAST_VALUE(expression)
Returns the last value within the current window frame.

```sql
SELECT name, dept, salary,
       LAST_VALUE(salary) OVER (PARTITION BY dept ORDER BY salary DESC) as current_last
FROM employees;
```

#### NTH_VALUE(expression, n)
Returns the nth value within the partition.

```sql
SELECT name, salary,
       NTH_VALUE(salary, 2) OVER (ORDER BY salary DESC) as second_highest
FROM employees;
```

## Using PARTITION BY

PARTITION BY divides the result set into partitions, and window functions are applied to each partition separately.

```sql
-- Row numbers within each department
SELECT name, dept, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank
FROM employees
ORDER BY dept, dept_rank;
```

Result:
```
name   | dept        | salary | dept_rank
-------+-------------+--------+----------
Diana  | Engineering | 80000  | 1
Frank  | Engineering | 75000  | 2
Charlie| Engineering | 70000  | 3
Bob    | Sales       | 60000  | 1
Eve    | Sales       | 55000  | 2
Alice  | Sales       | 50000  | 3
```

## Aggregate Functions as Window Functions

Standard aggregate functions can also be used as window functions:

```sql
-- Running total
SELECT name, salary,
       SUM(salary) OVER (ORDER BY salary) as running_total
FROM employees
ORDER BY salary;
```

Result:
```
name   | salary | running_total
-------+--------+--------------
Alice  | 50000  | 50000
Eve    | 55000  | 105000
Bob    | 60000  | 165000
Charlie| 70000  | 235000
Frank  | 75000  | 310000
Diana  | 80000  | 390000
```

## Common Use Cases

### Top N per Group

Find the highest paid employee in each department:

```sql
SELECT * FROM (
    SELECT name, dept, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as rn
    FROM employees
) ranked
WHERE rn = 1;
```

### Running Totals

Calculate cumulative sales:

```sql
SELECT order_date, amount,
       SUM(amount) OVER (ORDER BY order_date) as cumulative_sales
FROM orders;
```

### Compare to Previous/Next

Calculate month-over-month change:

```sql
SELECT month, revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) as change
FROM monthly_sales;
```

### Percentile Calculation

Find employees in the top 10% of salaries:

```sql
SELECT * FROM (
    SELECT name, salary,
           PERCENT_RANK() OVER (ORDER BY salary DESC) as pct
    FROM employees
) ranked
WHERE pct <= 0.10;
```

## Notes

- When ORDER BY is omitted, the entire partition is treated as a single group
- NULL values are handled according to SQL standards
- Window functions can only appear in SELECT and ORDER BY clauses
- Multiple window functions can be used in the same query