oxibase 0.2.3

Autonomous relational database management system with MVCC, time-travel queries, and full ACID compliance
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
---
layout: default
title: Query Optimizer
parent: Performance
nav_order: 2
---

# Query Optimizer

Oxibase includes a sophisticated cost-based query optimizer that analyzes queries and chooses efficient execution plans. The optimizer uses table statistics, index information, and cost models to make intelligent decisions.

## Overview

The query optimizer performs several tasks:
- **Cost estimation**: Calculates the expected cost of different execution strategies
- **Access method selection**: Chooses between table scans, index lookups, and primary key access
- **Join ordering**: Determines the optimal order to join multiple tables
- **Join algorithm selection**: Chooses Hash Join, Merge Join, or Nested Loop Join
- **Predicate pushdown**: Moves filters close to data sources
- **Expression simplification**: Optimizes boolean and arithmetic expressions

### Optimization Phases

```mermaid
graph TD
    SQL["SQL Query"]
    Parse["Parser<br/>AST Generation"]
    Analyze["Query Analysis<br/>• Collect predicates<br/>• Detect aggregations<br/>• Identify subqueries"]
    
    subgraph "Optimization Phases"
        Simplify["Expression Simplification<br/>ExpressionSimplifier"]
        SubqueryOpt["Subquery Optimization<br/>• Cache non-correlated<br/>• Semi-join conversion<br/>• EXISTS to index probe"]
        PredicateOpt["Predicate Optimization<br/>• Pushdown to storage<br/>• Partition for JOINs<br/>• Zone map pruning"]
        IndexOpt["Index Selection<br/>• PK lookup detection<br/>• Single/multi-column<br/>• Type-based selection"]
        JoinOpt["Join Optimization<br/>• AQE algorithm selection<br/>• Build side selection<br/>• Filter pushdown"]
        LimitOpt["LIMIT Optimization<br/>• Storage pushdown<br/>• TOP-N heap<br/>• Early termination"]
    end
    
    Physical["Physical Plan<br/>ScannerResult pipeline"]
    Execute["Execution<br/>Storage Layer"]
    
    SQL --> Parse
    Parse --> Analyze
    Analyze --> Simplify
    Simplify --> SubqueryOpt
    SubqueryOpt --> PredicateOpt
    PredicateOpt --> IndexOpt
    IndexOpt --> JoinOpt
    JoinOpt --> LimitOpt
    LimitOpt --> Physical
    Physical --> Execute
```

## Cost Model

### Cost Components

The optimizer considers two main cost components:

1. **I/O Cost**: Reading data from storage
   - Sequential page reads (table scans)
   - Random page reads (index access)

2. **CPU Cost**: Processing data
   - Tuple processing
   - Predicate evaluation
   - Join operations

### Access Methods

| Method | Use Case | Cost |
|--------|----------|------|
| Primary Key Lookup | WHERE pk = value | Lowest (O(1)) |
| Index Scan | WHERE indexed_col = value | Low (O(log n)) |
| Index Range Scan | WHERE indexed_col > value | Medium |
| Sequential Scan | No usable index | Highest (O(n)) |

#### Primary Key Lookup Flow

```mermaid
graph LR
    Query["WHERE id = 42"]
    Detect["try_pk_lookup<br/>Extract PK value"]
    Storage["get_visible_version<br/>O(1) HashMap lookup"]
    Result["Single Row"]
    
    Query --> Detect
    Detect -->|"pk_id = 42"| Storage
    Storage --> Result
```

#### Single-Column Index Usage Flow

```mermaid
graph TD
    Predicate["WHERE Predicate"]
    Extract["collect_comparisons<br/>Extract column comparisons"]
    
    subgraph "Index Selection"
        CheckEq["Equality?<br/>col = value"]
        CheckRange["Range?<br/>col > value"]
        CheckIn["IN List?<br/>col IN (...)"]
        CheckLike["Prefix LIKE?<br/>col LIKE 'abc%'"]
    end
    
    IndexLookup["Index Lookup"]
    Results["Filtered row_ids"]
    
    Predicate --> Extract
    Extract --> CheckEq
    Extract --> CheckRange
    Extract --> CheckIn
    Extract --> CheckLike
    
    CheckEq -->|"get_row_ids_equal"| IndexLookup
    CheckRange -->|"get_row_ids_in_range"| IndexLookup
    CheckIn -->|"get_row_ids_in"| IndexLookup
    CheckLike -->|"find_range (prefix scan)"| IndexLookup
    
    IndexLookup --> Results
```

### Example

```sql
-- The optimizer evaluates multiple strategies
SELECT * FROM users WHERE id = 123;

-- Strategy 1: Sequential scan - O(n) rows
-- Strategy 2: PK lookup - O(1)
-- Winner: PK lookup (much lower cost)
```

## Table Statistics

The optimizer uses statistics collected by `ANALYZE` to make better decisions.

### ANALYZE Command

```sql
-- Collect statistics for a table
ANALYZE users;

-- Statistics are stored in system tables
SELECT * FROM _sys_table_stats WHERE table_name = 'users';
SELECT * FROM _sys_column_stats WHERE table_name = 'users';
```

### Statistics Collected

| Statistic | Description |
|-----------|-------------|
| Row count | Total number of rows in the table |
| Distinct count | Number of unique values per column |
| NULL count | Number of NULL values per column |
| Min/Max values | Range of values per column |
| Histograms | Distribution of values for selectivity estimation |

### Selectivity Estimation

Statistics help estimate how selective predicates are:

```sql
-- Without statistics: assume 10% selectivity
-- With statistics: use actual data distribution
SELECT * FROM orders WHERE status = 'pending';
```

## Join Optimization

### Join Ordering

For queries joining multiple tables, the optimizer uses dynamic programming to find the optimal join nav_order:

```sql
-- The optimizer considers all possible join orders
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id;

-- Possible orders:
-- 1. (orders JOIN customers) JOIN products
-- 2. (orders JOIN products) JOIN customers
-- 3. (customers JOIN orders) JOIN products
-- etc.
```

### Join Algorithms

The optimizer selects the best join algorithm:

| Algorithm | Best For | Requirements |
|-----------|----------|--------------|
| Hash Join | Large tables, equality joins | Memory for hash table |
| Merge Join | Pre-sorted inputs | Sorted data |
| Nested Loop | Small tables, any join condition | None |

### Semi-Join Optimization

For EXISTS and IN subqueries, the optimizer may use semi-join:

```sql
-- Can use semi-join optimization
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```

## Adaptive Query Execution

Oxibase implements Adaptive Query Execution (AQE), which can adjust the execution plan at runtime based on actual data characteristics.

### How AQE Works

1. **Initial Plan**: Optimizer creates plan using statistics
2. **Runtime Monitoring**: Actual row counts are tracked during execution
3. **Plan Adjustment**: If actual differs significantly from estimated, the plan may be adjusted

### Cardinality Feedback

The optimizer learns from query execution:

```sql
-- First execution: uses estimated cardinality
SELECT * FROM orders WHERE amount > 1000;

-- Subsequent executions: uses feedback from actual row counts
-- This improves future estimates for similar queries
```

## Bloom Filter Propagation

For join-heavy queries, the optimizer can use Bloom filters to reduce data movement:

```sql
-- Bloom filter can pre-filter orders before join
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'US';
```

The filter on `customers.region` creates a Bloom filter that filters `orders` early.

## Expression Simplification

The optimizer simplifies expressions when possible:

### Constant Folding

```sql
-- Before: SELECT * FROM t WHERE x > 1 + 1
-- After:  SELECT * FROM t WHERE x > 2
```

### Boolean Optimization

```sql
-- Before: SELECT * FROM t WHERE true AND x > 5
-- After:  SELECT * FROM t WHERE x > 5

-- Before: SELECT * FROM t WHERE false OR x > 5
-- After:  SELECT * FROM t WHERE x > 5
```

### Predicate Merging

```sql
-- Before: WHERE x > 5 AND x > 10
-- After:  WHERE x > 10

-- Before: WHERE x = 5 AND x = 5
-- After:  WHERE x = 5
```

### Filter Pushdown to Storage

```mermaid
graph TD
    WHERE["WHERE age > 30<br/>AND status = 'active'"]
    
    subgraph "Compilation"
        Parse["Parse to Expression AST"]
        Compile["RowFilter::new<br/>Compile to bytecode"]
        Program["Arc<Program><br/>Shared bytecode"]
    end
    
    subgraph "Storage Layer"
        Arena["Arena Read<br/>get_all_visible_rows_filtered"]
        Filter["CompiledFilter::matches<br/>Eliminate virtual dispatch"]
        Collect["Collect matching rows"]
    end
    
    WHERE --> Parse
    Parse --> Compile
    Compile --> Program
    Program --> Arena
    Arena --> Filter
    Filter --> Collect
    Collect --> Results["Filtered Results<br/>3-5x faster"]
```

## LIMIT and TOP-N Optimization

### LIMIT Pushdown

For queries with LIMIT but no ORDER BY, the optimizer enables true early termination by pushing LIMIT directly to the storage layer:

```mermaid
graph TD
    Query["SELECT * FROM users<br/>LIMIT 100"]
    
    Check["Has ORDER BY?"]
    
    Ordered["Sorted LIMIT<br/>get_visible_rows_with_limit<br/>• Scan all rows<br/>• Sort by row_id<br/>• Take first 100"]
    
    Unordered["Unordered LIMIT<br/>get_visible_rows_with_limit_unordered<br/>• Early termination<br/>• Stop after 100 rows<br/>• 50x faster"]
    
    Query --> Check
    Check -->|"No"| Unordered
    Check -->|"Yes"| Ordered
```

### TOP-N Heap Optimization

For `ORDER BY ... LIMIT N` queries, the optimizer uses a bounded heap instead of full sorting:

```mermaid
graph TD
    Query["SELECT * FROM users<br/>ORDER BY age DESC<br/>LIMIT 10"]
    
    Detect["Detect ORDER BY + LIMIT"]
    
    TopN["TopNResult<br/>Bounded heap of size 10"]
    
    subgraph "Processing"
        Scan["Scan rows"]
        Compare["Compare with heap top"]
        Insert["Insert if smaller<br/>Evict largest"]
    end
    
    Query --> Detect
    Detect --> TopN
    TopN --> Scan
    Scan --> Compare
    Compare -->|"Better than top"| Insert
    Insert --> Scan
    
    Compare -->|"Worse than top"| Skip["Skip row"]
    Skip --> Scan
    
    Scan --> Results["10 best rows<br/>O(n log k) vs O(n log n)<br/>5-50x faster"]
```

## Viewing Query Plans

### EXPLAIN

Shows the planned execution strategy:

```sql
EXPLAIN SELECT * FROM users WHERE id = 1;
```

Output:
```
plan
----
SELECT
  Columns: *
  -> PK Lookup on users
       id = 1
```

### EXPLAIN ANALYZE

Shows the plan with actual runtime statistics:

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE status = 'active';
```

Output:
```
plan
----
SELECT (actual time=2.5ms, rows=1500)
  Columns: *
  -> Seq Scan on users (actual rows=1500)
       Filter: (status = 'active')
```

## Best Practices

### Keep Statistics Updated

Run ANALYZE after significant data changes:

```sql
-- After bulk insert
INSERT INTO orders SELECT * FROM staging_orders;
ANALYZE orders;
```

### Create Appropriate Indexes

The optimizer can only use indexes that exist:

```sql
-- Create index for common query patterns
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
```

### Use EXPLAIN to Understand Plans

Before optimizing, understand current behavior:

```sql
-- Check what plan is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
```

### Consider Query Patterns

Design indexes based on how queries filter and join:

```sql
-- If queries often filter by status and date together
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
```

## Cost Constants

The optimizer uses these tunable constants:

| Constant | Description | Default |
|----------|-------------|---------|
| cpu_tuple_cost | Cost to process one row | 0.01 |
| cpu_operator_cost | Cost to evaluate one predicate | 0.0025 |
| seq_page_cost | Cost for sequential I/O | 1.0 |
| random_page_cost | Cost for random I/O | 2.0 |
| pk_lookup_cost | Cost for primary key lookup | 0.1 |

These values are tuned for in-memory operation and may differ from disk-based databases.