# FeatherDB Query Engine
The query engine transforms SQL strings into executable query plans and produces results. It features a cost-based optimizer with index selection, join ordering, and prepared statement support with plan caching.
## Architecture Overview

```
                          SQL String
                              |
                              v
+-------------------------------------------------------------+
| Parser (parser.rs)                                           |
|  - Uses sqlparser-rs for SQL parsing                         |
|  - Rich error messages with keyword suggestions              |
|  - Data type conversion to FeatherDB types                   |
+-------------------------------------------------------------+
                              |
                              v
+-------------------------------------------------------------+
| Planner (planner.rs)                                         |
|  - AST to LogicalPlan conversion                             |
|  - Table/column resolution with suggestions                  |
|  - Parameter placeholder support ($1, $2, ?1, ?2)            |
+-------------------------------------------------------------+
                              |
                              v
+-------------------------------------------------------------+
| Optimizer (optimizer.rs)                                     |
|  - Predicate pushdown                                        |
|  - Projection pushdown                                       |
|  - Constant folding                                          |
|  - Index selection (100-1000x speedup)                       |
|  - Join order optimization (greedy algorithm)                |
|  - Cost estimation with table statistics                     |
+-------------------------------------------------------------+
                              |
                              v
+-------------------------------------------------------------+
| Executor (executor.rs)                                       |
|  - Volcano model (pull-based iteration)                      |
|  - Hash join for equi-joins                                  |
|  - Index scan execution                                      |
|  - MVCC visibility checking                                  |
|  - EXPLAIN with cost annotations                             |
+-------------------------------------------------------------+
                              |
                              v
                        Query Results
```
## Components
### 1. Parser (parser.rs)

Wraps sqlparser-rs to parse SQL into an AST with enhanced error messages.

Features:
- Parse SQL strings into an AST using `GenericDialect`
- Rich error context with line/column positions
- Keyword typo detection and suggestions (e.g., "SELEKT" suggests "SELECT")
- Data type conversion to FeatherDB's `ColumnType`
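Suggestions like "SELEKT" -> "SELECT" are the kind of behavior an edit-distance check provides. The sketch below illustrates the idea; the helper names (`edit_distance`, `suggest_keyword`) are hypothetical, not the actual parser.rs code:

```rust
/// Levenshtein distance between two strings (single-row DP).
fn edit_distance(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut cur = vec![i];
        for j in 1..=b.len() {
            let sub = prev[j - 1] + (a[i - 1] != b[j - 1]) as usize;
            cur.push(sub.min(prev[j] + 1).min(cur[j - 1] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest known keyword within a small edit distance.
fn suggest_keyword(word: &str) -> Option<&'static str> {
    const KEYWORDS: &[&str] = &["SELECT", "INSERT", "UPDATE", "DELETE", "WHERE", "FROM"];
    KEYWORDS
        .iter()
        .map(|k| (*k, edit_distance(&word.to_uppercase(), k)))
        .filter(|&(_, d)| d <= 2)
        .min_by_key(|&(_, d)| d)
        .map(|(k, _)| k)
}

fn main() {
    assert_eq!(suggest_keyword("SELEKT"), Some("SELECT"));
    assert_eq!(suggest_keyword("zzzzzz"), None);
}
```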
Supported SQL:

```sql
-- DDL
CREATE TABLE name (col TYPE [constraints], ...)
DROP TABLE [IF EXISTS] name
CREATE INDEX name ON table (col)

-- DML
SELECT [DISTINCT] cols FROM table [AS alias]
    [JOIN table ON condition]
    [WHERE condition]
    [GROUP BY cols] [HAVING condition]
    [ORDER BY cols [ASC|DESC]]
    [LIMIT n [OFFSET m]]

-- CTEs (Common Table Expressions)
WITH cte_name [(col1, col2, ...)] AS (
    SELECT ...
)
SELECT * FROM cte_name;

-- Subqueries
SELECT * FROM table WHERE col IN (SELECT ...);
SELECT * FROM table WHERE EXISTS (SELECT ...);
SELECT * FROM (SELECT ...) AS derived_table;

INSERT INTO table [(cols)] VALUES (...)
UPDATE table SET col = expr [WHERE condition]
DELETE FROM table [WHERE condition]

-- Query analysis
EXPLAIN [VERBOSE] [ANALYZE] query
```
Data Types:

| SQL Type | FeatherDB Type |
|---|---|
| BOOLEAN | `Boolean` |
| SMALLINT, INT, INTEGER, BIGINT, TINYINT | `Integer` |
| FLOAT, REAL, DOUBLE, DECIMAL, NUMERIC | `Real` |
| CHAR(n), VARCHAR(n), TEXT | `Text { max_len }` |
| BINARY, VARBINARY, BLOB, BYTEA | `Blob { max_len }` |
| TIMESTAMP, DATETIME, DATE | `Timestamp` |
### 2. Planner (planner.rs)
Converts SQL AST to logical query plans with comprehensive error handling.
Logical Plan Types:
Join Types:
- `Inner` - Standard inner join
- `Left` - Left outer join
- `Right` - Right outer join
- `Full` - Full outer join

Index Range Types:
- `point(value)` - Equality lookup (fastest)
- `between(start, end)` - Range with both bounds
- `from_inclusive/exclusive(value)` - Range >= or >
- `to_inclusive/exclusive(value)` - Range <= or <
- `full()` - Full index scan
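These range variants can be modeled as a bound on each end of the scan. A minimal sketch over integer keys, assuming a simplified `IndexRange` (the real planner.rs type is richer):

```rust
/// One end of an index scan range.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Bound {
    Inclusive(i64),
    Exclusive(i64),
    Unbounded,
}

struct IndexRange {
    start: Bound,
    end: Bound,
}

impl IndexRange {
    fn point(v: i64) -> Self { Self { start: Bound::Inclusive(v), end: Bound::Inclusive(v) } }
    fn between(lo: i64, hi: i64) -> Self { Self { start: Bound::Inclusive(lo), end: Bound::Inclusive(hi) } }
    fn from_exclusive(v: i64) -> Self { Self { start: Bound::Exclusive(v), end: Bound::Unbounded } }
    fn full() -> Self { Self { start: Bound::Unbounded, end: Bound::Unbounded } }

    /// Would this key fall inside the scanned range?
    fn contains(&self, key: i64) -> bool {
        let lo_ok = match self.start {
            Bound::Inclusive(v) => key >= v,
            Bound::Exclusive(v) => key > v,
            Bound::Unbounded => true,
        };
        let hi_ok = match self.end {
            Bound::Inclusive(v) => key <= v,
            Bound::Exclusive(v) => key < v,
            Bound::Unbounded => true,
        };
        lo_ok && hi_ok
    }
}

fn main() {
    assert!(IndexRange::point(5).contains(5) && !IndexRange::point(5).contains(6));
    assert!(IndexRange::between(18, 65).contains(30));
    assert!(!IndexRange::from_exclusive(18).contains(18));
    assert!(IndexRange::full().contains(i64::MIN));
}
```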
Error Handling:
- Table not found with suggestions for typos
- Column not found with similar name suggestions
- Ambiguous column references in joins
- Type validation for constraints
### 3. Optimizer (optimizer.rs)
Applies transformation rules and cost-based optimization.
Optimization Rules:
| Rule | Description | Benefit |
|---|---|---|
| Predicate Pushdown | Push filters closer to table scans | Reduces rows early |
| Projection Pushdown | Only read needed columns | Reduces I/O |
| Constant Folding | Evaluate constant expressions at plan time | Avoids runtime computation |
| Index Selection | Convert Scan+Filter to IndexScan | 100-1000x speedup |
| Join Order Optimization | Reorder joins by estimated cardinality | 10-100x speedup |
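As an illustration of the constant-folding rule, a toy expression tree can be folded bottom-up. This is a sketch of the technique over a made-up `Expr` enum, not FeatherDB's actual expression type:

```rust
/// Toy expression tree: constants, column references, and two operators.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Fold children first; if both sides reduce to constants, evaluate now.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

fn main() {
    // price * (2 + 3)  folds to  price * 5
    let e = Expr::Mul(
        Box::new(Expr::Column("price".into())),
        Box::new(Expr::Add(Box::new(Expr::Const(2)), Box::new(Expr::Const(3)))),
    );
    assert_eq!(
        fold(e),
        Expr::Mul(Box::new(Expr::Column("price".into())), Box::new(Expr::Const(5)))
    );
}
```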
Index Selection:
The optimizer automatically selects indexes for:
- Equality predicates: `WHERE id = 5` -> point lookup
- Range predicates: `WHERE age > 18 AND age < 65` -> range scan
- Combined predicates: indexed predicate + residual filter

```sql
-- Example: query with index
SELECT * FROM users WHERE id = 5 AND name = 'Alice';
-- Optimizer creates: IndexScan(id=5) + residual_filter(name='Alice')
```
Cost Estimation:
The CostEstimator provides:
- Cardinality estimation for each plan node
- Cost estimation based on:
- Sequential scan: 1.0 per row
- Index scan: 1.5 per row (but fewer rows)
- Hash join: build cost + probe cost
- Sort: n log n amortized
Cost Constants:

```
SEQ_SCAN_COST_PER_ROW:     1.0
INDEX_SCAN_COST_PER_ROW:   1.5
HASH_JOIN_COST_PER_ROW:    0.1
HASH_BUILD_COST_PER_ROW:   1.0
SORT_COST_PER_ROW:         2.0
DEFAULT_SELECTIVITY:       0.1
DEFAULT_RANGE_SELECTIVITY: 0.3
DEFAULT_JOIN_SELECTIVITY:  0.1
```
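Plugging these constants into a back-of-envelope comparison shows why an index scan wins on a selective predicate, even though it costs more per row. This arithmetic sketch is not `CostEstimator`'s actual API:

```rust
const SEQ_SCAN_COST_PER_ROW: f64 = 1.0;
const INDEX_SCAN_COST_PER_ROW: f64 = 1.5;
const DEFAULT_SELECTIVITY: f64 = 0.1;

fn main() {
    let table_rows = 100_000.0;

    // Sequential scan touches every row.
    let seq_cost = table_rows * SEQ_SCAN_COST_PER_ROW;

    // Index scan touches only the estimated matching fraction.
    let index_rows = table_rows * DEFAULT_SELECTIVITY;
    let index_cost = index_rows * INDEX_SCAN_COST_PER_ROW;

    assert_eq!(seq_cost, 100_000.0);
    assert_eq!(index_cost, 15_000.0);
    // The index scan wins whenever selectivity < 1.0 / 1.5.
    assert!(index_cost < seq_cost);
}
```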
Join Order Optimization:
For multi-way joins (3+ tables), the optimizer uses a greedy algorithm:
1. Extract all base relations and predicates
2. Estimate cardinalities using catalog statistics
3. Iteratively join the pair with the smallest estimated intermediate result
4. Place the smaller table on the build side of each hash join
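The greedy loop above can be sketched as follows, assuming a single default join selectivity in place of real catalog statistics and predicate analysis:

```rust
const DEFAULT_JOIN_SELECTIVITY: f64 = 0.1;

/// Repeatedly merge the pair of relations with the smallest estimated
/// intermediate cardinality until one joined relation remains.
/// Each relation is (name, estimated row count).
fn greedy_join_order(mut rels: Vec<(String, f64)>) -> (String, f64) {
    while rels.len() > 1 {
        let (mut bi, mut bj, mut best) = (0, 1, f64::INFINITY);
        for i in 0..rels.len() {
            for j in (i + 1)..rels.len() {
                let card = rels[i].1 * rels[j].1 * DEFAULT_JOIN_SELECTIVITY;
                if card < best {
                    (bi, bj, best) = (i, j, card);
                }
            }
        }
        let (rn, _) = rels.remove(bj); // remove the later index first
        let (ln, _) = rels.remove(bi);
        rels.push((format!("({ln} JOIN {rn})"), best));
    }
    rels.pop().unwrap()
}

fn main() {
    let rels: Vec<(String, f64)> = vec![
        ("orders".to_string(), 1_000_000.0),
        ("users".to_string(), 10_000.0),
        ("countries".to_string(), 200.0),
    ];
    let (order, _card) = greedy_join_order(rels);
    // The smallest pair (users, countries) is joined first.
    assert_eq!(order, "(orders JOIN (users JOIN countries))");
}
```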
### 4. Executor (executor.rs)
Executes physical operators using the Volcano (iterator) model.
Physical Operators:

| Operator | Description |
|---|---|
| `SeqScan` | Sequential table scan with optional pushed filter |
| `IndexScan` | Index-based lookup (point or range) |
| `SubqueryScan` | Execute subquery and expose as table |
| `SemiJoin` | Semi-join for EXISTS and IN subqueries |
| `AntiJoin` | Anti-join for NOT EXISTS and NOT IN subqueries |
| `Filter` | Evaluate predicate on each row |
| `Project` | Compute output expressions, expand wildcards |
| `HashJoin` | Hash-based equi-join (preferred for equality) |
| `NestedLoopJoin` | Fallback for non-equi joins |
| `HashAggregate` | Group-by aggregation with hash table |
| `Sort` | In-memory quicksort |
| `Limit` | Limit/offset row output |
| `Distinct` | Remove duplicate rows via hash set |
Execution Context:
Query Results:
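In the Volcano model, every operator exposes a pull-style `next()` that draws rows from its child on demand, so results stream through the plan one row at a time. A minimal sketch with illustrative types (not executor.rs's actual definitions):

```rust
type Row = Vec<i64>;

/// Every physical operator implements the same pull interface.
trait Operator {
    fn next(&mut self) -> Option<Row>;
}

/// Leaf operator: yields stored rows in order.
struct SeqScan {
    rows: std::vec::IntoIter<Row>,
}
impl Operator for SeqScan {
    fn next(&mut self) -> Option<Row> {
        self.rows.next()
    }
}

/// Pulls from its child until a row passes the predicate.
struct Filter<P: Fn(&Row) -> bool> {
    child: Box<dyn Operator>,
    pred: P,
}
impl<P: Fn(&Row) -> bool> Operator for Filter<P> {
    fn next(&mut self) -> Option<Row> {
        while let Some(row) = self.child.next() {
            if (self.pred)(&row) {
                return Some(row);
            }
        }
        None
    }
}

fn main() {
    let scan = SeqScan { rows: vec![vec![1, 10], vec![2, 99], vec![3, 7]].into_iter() };
    let mut plan = Filter { child: Box::new(scan), pred: |r: &Row| r[1] > 8 };
    let mut out = Vec::new();
    while let Some(row) = plan.next() {
        out.push(row);
    }
    assert_eq!(out, vec![vec![1, 10], vec![2, 99]]);
}
```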
EXPLAIN Support:
```sql
-- Show query plan with cost estimates
EXPLAIN SELECT * FROM users WHERE id = 5;

-- Show detailed costs
EXPLAIN VERBOSE SELECT * FROM users WHERE id = 5;

-- Actually execute and show timing
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 5;
```
Output includes:
- Plan tree structure
- Cost estimates per node
- Row estimates per node
- For ANALYZE: actual execution time and row counts
### 5. Expressions (expr.rs)
Expression representation and evaluation.
Expression Types:
Binary Operators:
- Arithmetic: `+`, `-`, `*`, `/`, `%`
- Comparison: `=`, `!=`, `<`, `<=`, `>`, `>=`
- Logical: `AND`, `OR`
- String: `||` (concatenation)

Unary Operators:
- `NOT` - Logical negation
- `-` - Numeric negation
- `IS NULL` / `IS NOT NULL`

Aggregate Functions:
- `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`
- `SUM(col)`, `AVG(col)`
- `MIN(col)`, `MAX(col)`
Built-in Functions:

String Functions:
- `UPPER(text)`, `LOWER(text)` - Case conversion
- `LENGTH(text)`, `LENGTH(blob)` - Get length
- `SUBSTR(text, start [, length])` - Extract substring (1-based indexing, SQL standard)
- `TRIM(text)` - Remove leading and trailing whitespace
- `LTRIM(text)` - Remove leading whitespace
- `RTRIM(text)` - Remove trailing whitespace
- `REPLACE(text, from, to)` - Replace all occurrences

Date/Time Functions:
- `DATE(timestamp)` - Extract date portion ("YYYY-MM-DD")
- `YEAR(timestamp)` - Extract year as integer
- `MONTH(timestamp)` - Extract month as integer (1-12)
- `DAY(timestamp)` - Extract day as integer (1-31)
- `NOW()` - Get current timestamp

Numeric Functions:
- `ABS(numeric)` - Absolute value

Utility Functions:
- `COALESCE(val1, val2, ...)` - First non-NULL value
- `NULLIF(val1, val2)` - Return NULL if equal
NULL Handling:
- Most operations propagate NULL
- `AND`: `FALSE AND NULL` = `FALSE`
- `OR`: `TRUE OR NULL` = `TRUE`
- Division by zero returns NULL
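These three-valued semantics can be modeled with `Option<bool>`, treating NULL as `None`. A sketch of the truth tables, not expr.rs's actual evaluator:

```rust
/// SQL AND over three-valued logic (NULL = None).
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // FALSE dominates: FALSE AND NULL = FALSE
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // TRUE AND NULL = NULL
        _ => None,
    }
}

/// SQL OR over three-valued logic (NULL = None).
fn or3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // TRUE dominates: TRUE OR NULL = TRUE
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        // FALSE OR NULL = NULL
        _ => None,
    }
}

fn main() {
    assert_eq!(and3(Some(false), None), Some(false));
    assert_eq!(or3(Some(true), None), Some(true));
    assert_eq!(and3(Some(true), None), None);
    assert_eq!(or3(Some(false), None), None);
}
```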
### 6. Prepared Statements (prepared.rs)
Plan caching for repeated queries (5-10x speedup).
PreparedStatement:
Parameter Placeholders:
- PostgreSQL style: `$1`, `$2`, `$3`, ...
- Question mark with index: `?1`, `?2`, `?3`, ...
PlanCache:
- LRU cache with configurable max size (default: 1000)
- Automatic eviction of least recently used plans
- Schema-aware invalidation (on DDL)
- Table-specific invalidation
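The LRU behavior described above can be sketched in a few lines. This toy cache stores plans as strings and uses naive substring matching for table invalidation, unlike the real `PlanCache`:

```rust
use std::collections::HashMap;

/// Minimal LRU plan-cache sketch (the real PlanCache also tracks
/// hit/miss statistics and stores optimized plans, not strings).
struct PlanCache {
    max_size: usize,
    entries: HashMap<String, String>, // sql -> cached "plan"
    lru: Vec<String>,                 // front = least recently used
}

impl PlanCache {
    fn new(max_size: usize) -> Self {
        Self { max_size, entries: HashMap::new(), lru: Vec::new() }
    }

    fn insert(&mut self, sql: String, plan: String) {
        if self.entries.len() >= self.max_size && !self.entries.contains_key(&sql) {
            // Evict the least recently used entry.
            let victim = self.lru.remove(0);
            self.entries.remove(&victim);
        }
        self.lru.retain(|s| s != &sql);
        self.lru.push(sql.clone());
        self.entries.insert(sql, plan);
    }

    fn get(&mut self, sql: &str) -> Option<&String> {
        if self.entries.contains_key(sql) {
            // Move to the most-recently-used position.
            self.lru.retain(|s| s != sql);
            self.lru.push(sql.to_string());
        }
        self.entries.get(sql)
    }

    fn invalidate_table(&mut self, table: &str) {
        // Crude sketch: drop any cached plan whose SQL mentions the table.
        self.entries.retain(|sql, _| !sql.contains(table));
        let entries = &self.entries;
        self.lru.retain(|sql| entries.contains_key(sql));
    }
}

fn main() {
    let mut cache = PlanCache::new(2);
    cache.insert("SELECT * FROM users".into(), "plan1".into());
    cache.insert("SELECT * FROM orders".into(), "plan2".into());
    let _ = cache.get("SELECT * FROM users"); // touch: users is now most recent
    cache.insert("SELECT * FROM items".into(), "plan3".into()); // evicts orders
    assert!(cache.get("SELECT * FROM orders").is_none());
    assert!(cache.get("SELECT * FROM users").is_some());
    cache.invalidate_table("users");
    assert!(cache.get("SELECT * FROM users").is_none());
}
```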
Cache Statistics:
Usage Example:

```rust
// NOTE: illustrative reconstruction; module imports and some API details
// were elided in the original.

// Create cache
let cache = PlanCache::new(1000);

// Prepare statement (parse + plan + optimize once)
let stmt = db.prepare("SELECT name FROM users WHERE id = $1")?;

// Execute multiple times with different parameters
let params = vec![Value::Integer(42)];
let plan = substitute_plan_params(&stmt.plan, &params)?;
let result = execute(&plan, &ctx)?;

// Cache statistics
println!("{:?}", cache.stats());
```
## Query Execution Flow

```
1. Parse:      "SELECT name FROM users WHERE id = $1"
        |
2. Plan:       Project(name)
                 |__ Filter(id = $1)
                       |__ Scan(users)
        |
3. Optimize:   IndexScan(users, idx_id, point($1), project=[name])
        |
4. Substitute: IndexScan(users, idx_id, point(42), project=[name])
        |
5. Execute:    IndexScan iterator -> MVCC visibility check -> project
        |
6. Results:    Row { name: "Alice" }
```
## Usage Examples

### Basic Query Execution

```rust
// NOTE: illustrative reconstruction; module imports were elided in the original.

// Parse SQL
let stmt = parse_one("SELECT name FROM users WHERE id = 5")?;

// Create logical plan
let planner = Planner::new(&catalog);
let plan = planner.plan(&stmt)?;

// Optimize (with index selection and cost-based optimization)
let optimizer = Optimizer::with_catalog(&catalog);
let optimized = optimizer.optimize(plan);

// Execute
let ctx = ExecutionContext::default();
let result = execute(&optimized, &ctx)?;
```
### Using Prepared Statements

```rust
// NOTE: illustrative reconstruction; module imports were elided in the original.

// Create plan cache
let cache = PlanCache::new(1000);

// Parse and plan with parameters
let sql = "SELECT * FROM users WHERE id = $1 AND status = $2";
let stmt = parse_one(sql)?;
let plan = planner.plan(&stmt)?;
let optimized = optimizer.optimize(plan);

// Create prepared statement
let param_count = count_plan_params(&optimized);
let prepared = PreparedStatement::new(sql, optimized, param_count);

// Cache it
cache.insert(sql.to_string(), prepared.clone());

// Execute with parameters
let params = vec![Value::Integer(42), Value::Text("active".into())];
prepared.validate_params(&params)?;
let bound_plan = substitute_plan_params(&prepared.plan, &params)?;
let result = execute(&bound_plan, &ctx)?;
```
### Using EXPLAIN

```rust
// Show query plan
let stmt = parse_one("EXPLAIN SELECT * FROM users WHERE id = 5")?;
let plan = planner.plan(&stmt)?;
let result = execute(&plan, &ctx)?;
// Output:
// QUERY PLAN
// ============================================================
// -> Index Lookup using idx_users_id on users
//
// Estimated total cost: 1.50
// Estimated rows: 1

// With verbose costs
let stmt = parse_one("EXPLAIN VERBOSE SELECT * FROM users WHERE id = 5")?;
// Shows cost and row estimates for each node

// With actual execution stats
let stmt = parse_one("EXPLAIN ANALYZE SELECT * FROM users WHERE id = 5")?;
// Shows actual execution time and row counts
```
## Configuration Options

### Optimizer Configuration

```rust
// Without catalog (no index selection or join ordering)
let optimizer = Optimizer::new();

// With catalog (enables all optimizations)
let optimizer = Optimizer::with_catalog(&catalog);
```

### Plan Cache Configuration

```rust
// Custom cache size (example value)
let cache = PlanCache::new(500);

// Default size (1000 entries)
let cache = PlanCache::default_size();

// Invalidate on schema changes
cache.invalidate_all();

// Invalidate specific table
cache.invalidate_table("users");
```
## Testing

```sh
# Run all query engine tests (using Make)
# Or with cargo directly
# Test specific component
# Run with coverage
```
## Performance Considerations
- Index Selection: Ensure indexes exist on frequently filtered columns
- Predicate Pushdown: Filters are automatically pushed to scans
- Projection Pruning: Select only needed columns
- Join Order: Optimizer reorders joins for multi-table queries
- Prepared Statements: Use for repeated queries (5-10x speedup)
- Statistics: Keep table statistics updated for accurate cost estimates
## Implemented Features
Core Query Processing:
- SQL parsing with rich error messages
- Logical plan generation
- Predicate pushdown optimization
- Projection pushdown optimization
- Constant folding
- Index selection (point and range)
- Join order optimization
- Cost-based estimation with statistics
- Prepared statements with plan caching
- EXPLAIN with cost annotations
- Hash join for equi-joins
- MVCC-aware execution
Advanced SQL:
- Common Table Expressions (WITH clause, column aliasing, chained CTEs)
- Subquery support (scalar, IN, NOT IN, EXISTS, NOT EXISTS, correlated)
- Semi-join and anti-join optimization
- Derived tables (subquery in FROM clause)
- UNION / UNION ALL / INTERSECT / EXCEPT set operations
- CREATE VIEW / DROP VIEW
- CAST, EXTRACT, date interval arithmetic
Functions:
- String: UPPER, LOWER, LENGTH, SUBSTR, TRIM, LTRIM, RTRIM, REPLACE
- Date/time: DATE, YEAR, MONTH, DAY, NOW
- Math: ABS, SQRT, POWER, FLOOR, CEIL, LOG, MOD
- Aggregate: COUNT, SUM, AVG, MIN, MAX
- Window: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD
- Utility: COALESCE, NULLIF
Constraints:
- CHECK constraints (parsing and enforcement)
- Foreign key constraints (CASCADE, RESTRICT, SET NULL, NO ACTION)
- UNIQUE constraints
- NOT NULL constraints
## Future Improvements
- Parallel query execution
- Lateral joins
- Recursive CTEs
- External sort for large datasets
- Merge join algorithm
- Bitmap index scans
- Adaptive hash aggregate with spill-to-disk