Why Stoolap?
Stoolap is a feature-rich embedded SQL database with capabilities that rival established databases like PostgreSQL and DuckDB - all in a single dependency with zero external requirements.
Performance
┌───────────────────────────────────────────────────────────────┐
│ STOOLAP vs SQLite: 43 wins / 10 losses (81% win rate) │
│ STOOLAP vs DuckDB: 49 wins / 4 losses (92% win rate) │
└───────────────────────────────────────────────────────────────┘
See BENCHMARKS.md for detailed comparisons.
Unique Features
| Feature | Stoolap | SQLite | DuckDB | PostgreSQL |
|---|---|---|---|---|
| AS OF Time-Travel Queries | ✅ | ❌ | ❌ | ❌* |
| MVCC Transactions | ✅ | ❌ | ✅ | ✅ |
| Cost-Based Optimizer | ✅ | ❌ | ✅ | ✅ |
| Adaptive Query Execution | ✅ | ❌ | ❌ | ❌ |
| Semantic Query Caching | ✅ | ❌ | ❌ | ❌ |
| Parallel Query Execution | ✅ | ❌ | ✅ | ✅ |
| Pure Rust (Memory Safe) | ✅ | ❌ | ❌ | ❌ |
| No C/C++ Required | ✅ | ❌ | ❌ | ❌ |
*PostgreSQL requires extensions for temporal queries
Quick Start
Installation
Add to your Cargo.toml:
[]
= "0.2"
Or build from source:
Library Usage
use Database;
Command Line Interface
# Interactive REPL (in-memory)
# Persistent database
# Execute query directly
# Execute SQL file
Features
MVCC Transactions
Full multi-version concurrency control with isolation levels:
-- Read Committed (default)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- Snapshot Isolation (repeatable reads)
BEGIN TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT * FROM accounts; -- Consistent view throughout transaction
COMMIT;
Time-Travel Queries
Query historical data at any point in time - a feature typically only found in enterprise databases:
-- Query data as it existed at a specific timestamp
SELECT * FROM orders AS OF TIMESTAMP '2024-01-15 10:30:00';
-- Query data as of a specific transaction
SELECT * FROM inventory AS OF TRANSACTION 1234;
-- Compare current vs historical data
SELECT
c.price AS current_price,
h.price AS old_price,
c.price - h.price AS change
FROM products c
JOIN products AS OF TIMESTAMP '2024-01-01 00:00:00' h ON c.id = h.id
WHERE c.price != h.price;
Smart Indexes
Automatic index type selection based on data characteristics:
-- B-tree (auto-selected for INTEGER, FLOAT, TIMESTAMP)
-- Best for: range queries, sorting, prefix matching
(created_at);
SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
-- Hash (auto-selected for TEXT, JSON)
-- Best for: O(1) equality lookups
(email);
SELECT * FROM users WHERE email = 'alice@example.com';
-- Bitmap (auto-selected for BOOLEAN)
-- Best for: low-cardinality columns, efficient AND/OR
(status) USING BITMAP;
-- Multi-column composite indexes
(user_id, event_type);
(customer_id, order_date);
Window Functions
Full analytical query support:
SELECT
employee_name,
department,
salary,
ROW_NUMBER OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
LAG(salary) OVER (ORDER BY hire_date) AS prev_salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
SUM(salary) OVER (ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;
Common Table Expressions (CTEs)
Including recursive queries for hierarchical data:
-- Recursive CTE: organizational hierarchy
WITH RECURSIVE org_chart AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees under managers
SELECT e.id, e.name, e.manager_id, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level, name;
Advanced Aggregations
-- ROLLUP: hierarchical subtotals
SELECT region, product, SUM(sales)
FROM sales GROUP BY ROLLUP(region, product);
-- CUBE: all dimension combinations
SELECT region, product, SUM(sales)
FROM sales GROUP BY CUBE(region, product);
-- GROUPING SETS: custom combinations
SELECT region, product, SUM(sales)
FROM sales GROUP BY GROUPING SETS ((region, product), (region), );
Subqueries
Scalar, correlated, EXISTS, IN, ANY/ALL:
-- Correlated subquery
SELECT * FROM employees e
WHERE salary > (
SELECT AVG(salary) FROM employees
WHERE department = e.department
);
-- EXISTS with correlation
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.amount > 1000
);
Query Optimizer
PostgreSQL-style cost-based optimizer with runtime adaptation:
-- Collect statistics for better query plans
ANALYZE orders;
ANALYZE customers;
-- View query plan with cost estimates
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
-- View plan with actual execution statistics
EXPLAIN ANALYZE
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US';
Data Types
| Type | Description | Example |
|---|---|---|
INTEGER |
64-bit signed integer | 42, -100 |
FLOAT |
64-bit floating point | 3.14, -0.001 |
TEXT |
UTF-8 string | 'hello', '日本語' |
BOOLEAN |
true/false | TRUE, FALSE |
TIMESTAMP |
Date and time | '2024-01-15 10:30:00' |
JSON |
JSON data | '{"key": "value"}' |
Built-in Functions (100+)
UPPER, LOWER, LENGTH, TRIM, LTRIM, RTRIM, CONCAT, SUBSTRING, REPLACE, REVERSE, LEFT, RIGHT, LPAD, RPAD, REPEAT, POSITION, LOCATE, INSTR, SPLIT_PART, INITCAP, ASCII, CHR, TRANSLATE
ABS, CEIL, FLOOR, ROUND, TRUNC, SQRT, POWER, MOD, SIGN, GREATEST, LEAST, EXP, LN, LOG, LOG10, LOG2, SIN, COS, TAN, ASIN, ACOS, ATAN, ATAN2, DEGREES, RADIANS, PI, RAND, RANDOM
NOW, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, EXTRACT, DATE_TRUNC, DATE_ADD, DATE_SUB, DATEDIFF, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, DAYOFWEEK, DAYOFYEAR, WEEK, QUARTER, TO_CHAR, TO_DATE, TO_TIMESTAMP
JSON_EXTRACT, JSON_EXTRACT_PATH, JSON_TYPE, JSON_TYPEOF, JSON_VALID, JSON_KEYS, JSON_ARRAY_LENGTH
COUNT, SUM, AVG, MIN, MAX, STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP, STRING_AGG, ARRAY_AGG, FIRST, LAST, BIT_AND, BIT_OR, BIT_XOR, BOOL_AND, BOOL_OR
ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, CUME_DIST
COALESCE, NULLIF, CAST, CASE, IF, IIF, NVL, NVL2, DECODE, GENERATE_SERIES
Storage & Persistence
# In-memory (fastest, data lost on exit)
# File-based (durable storage with WAL)
Durability features:
- Write-Ahead Logging (WAL): All changes logged before applied
- Periodic Snapshots: Fast recovery from crashes
- Index Persistence: All indexes saved and restored automatically
Architecture
src/
├── api/ # Public API (Database, Connection, Rows)
├── core/ # Core types (Value, Row, Schema, Error)
├── parser/ # SQL lexer and parser
├── optimizer/ # Cost-based query optimizer
│ ├── cost.rs # Cost model with I/O and CPU costs
│ ├── join.rs # Join optimization (dynamic programming)
│ ├── bloom.rs # Bloom filter propagation
│ └── aqe.rs # Adaptive query execution
├── executor/ # Query execution engine
│ ├── operators/ # Volcano-style operators
│ ├── parallel.rs # Parallel execution (Rayon)
│ └── expression/ # Expression VM
├── functions/ # 100+ built-in functions
│ ├── scalar/ # String, math, date, JSON
│ ├── aggregate/ # COUNT, SUM, AVG, etc.
│ └── window/ # ROW_NUMBER, RANK, LAG, etc.
└── storage/ # Storage engine
├── mvcc/ # Multi-version concurrency control
└── index/ # B-tree, Hash, Bitmap indexes
Development
Building
Testing
Code Quality
Documentation
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
Apache License 2.0. See LICENSE for details.