RustMemDB
A lightweight, in-memory SQL database engine written in pure Rust with a focus on educational clarity and extensibility.
Table of Contents
- Overview
- Mission & Purpose
- Architecture
- Features
- Installation
- Quick Start
- Usage Examples
- API Documentation
- Performance Characteristics
- Design Patterns
- Extensibility
- Limitations
- Roadmap
- Contributing
- Educational Resources
- License
Additional Documentation
- DEVELOPER_GUIDE.md - Complete guide for adding new features to RustMemoDB
- PRODUCTION_READINESS_ANALYSIS.md - Production readiness assessment
Overview
RustMemDB is an educational in-memory SQL database that demonstrates how modern relational databases work under the hood. Built entirely in Rust, it implements a complete SQL query execution pipeline from parsing to result generation, while maintaining clean architecture and extensible design.
Unlike production databases (PostgreSQL, MySQL), RustMemDB prioritizes:
- Code Clarity - Easy to understand implementation
- Educational Value - Learn database internals by reading/modifying code
- Extensibility - Plugin-based architecture for adding features
- Type Safety - Leveraging Rust's strong type system
What Makes It Unique?
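// Simple, clean API (the SQL statements shown are illustrative)
let mut db = InMemoryDB::new();
db.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")?;
db.execute("INSERT INTO users VALUES (1, 'Alice', 30)")?;
let result = db.execute("SELECT * FROM users WHERE age > 25")?;
result.print();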
Under the hood, this simple query goes through a complete database pipeline:
- SQL Parsing → AST (Abstract Syntax Tree)
- Query Planning → Logical execution plan
- Optimization → (Future: predicate pushdown, join ordering)
- Execution → Physical operators (scan, filter, project, sort)
- Result Formatting → User-friendly output
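The execution stage can be pictured with a small, self-contained sketch. The `Row` struct, `run_query`, and `sample_users` below are hypothetical names invented for this illustration, not RustMemDB's actual types; the sketch only shows how scan, filter, sort, and project compose for one hard-coded query.

```rust
/// Hypothetical row type for illustration only.
#[derive(Debug, Clone)]
struct Row {
    id: i64,
    name: String,
    age: i64,
}

/// Toy equivalent of:
///   SELECT id, name FROM users WHERE age > min_age ORDER BY age
fn run_query(table: &[Row], min_age: i64) -> Vec<(i64, String)> {
    // Scan + Filter: walk every row and keep those matching the predicate.
    let mut matched: Vec<&Row> = table.iter().filter(|row| row.age > min_age).collect();
    // Sort: `sort_by_key` is a stable sort, so ties keep their scan order.
    matched.sort_by_key(|row| row.age);
    // Project: keep only the requested columns.
    matched
        .into_iter()
        .map(|row| (row.id, row.name.clone()))
        .collect()
}

/// Sample data made up for this sketch.
fn sample_users() -> Vec<Row> {
    vec![
        Row { id: 3, name: "Charlie".to_string(), age: 35 },
        Row { id: 2, name: "Bob".to_string(), age: 25 },
        Row { id: 1, name: "Alice".to_string(), age: 30 },
    ]
}
```

Calling `run_query(&sample_users(), 25)` drops Bob in the filter step and returns Alice before Charlie because of the sort step.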
Mission & Purpose
Primary Mission
"Make database internals accessible and understandable through clean, well-documented Rust code."
Target Audience
- Students & Educators
  - Learn how SQL databases work internally
  - Understand query processing pipelines
  - Study classic database algorithms (sorting, filtering, etc.)
- Rust Developers
  - See real-world application of design patterns
  - Learn concurrent data structure design
  - Understand plugin architectures
- Database Enthusiasts
  - Prototype new database features
  - Experiment with query optimization algorithms
  - Build custom storage engines
- Embedded Systems
  - Lightweight SQL for resource-constrained environments
  - No external dependencies (pure Rust)
  - Small memory footprint
What This Project Is For
- Learning - Study database architecture
- Prototyping - Test database algorithms quickly
- Testing - In-memory database for unit tests
- Embedded SQL - Simple queries in Rust applications
- Research - Academic database research projects
What This Project Is NOT For
- Production Databases - Use PostgreSQL, MySQL, or SQLite instead
- Persistent Storage - Data lost on shutdown (in-memory only)
- High Performance - Educational focus over optimization
- Full SQL Compliance - Subset of SQL features
Architecture
RustMemDB follows the classic three-stage query pipeline (parse, plan, execute) used by most relational databases, layered on top of an in-memory storage engine:
┌────────────────────────────────────────────────────────────┐
│                         SQL Query                          │
│            "SELECT * FROM users WHERE age > 25"            │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                        PARSER LAYER                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ SqlParserAdapter (Facade Pattern)                    │  │
│  │ - Uses sqlparser crate for SQL parsing               │  │
│  │ - Converts external AST → internal AST               │  │
│  │ - Plugin-based expression conversion                 │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼ Statement AST
┌────────────────────────────────────────────────────────────┐
│                       PLANNER LAYER                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ QueryPlanner (Strategy Pattern)                      │  │
│  │ - AST → LogicalPlan transformation                   │  │
│  │ - Logical operators: Scan, Filter, Project, Sort    │  │
│  │ - Future: Query optimization                         │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼ LogicalPlan
┌────────────────────────────────────────────────────────────┐
│                       EXECUTOR LAYER                       │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ ExecutorPipeline (Chain of Responsibility)           │  │
│  │  ┌────────────────────────────────────────────────┐  │  │
│  │  │ DDL: CreateTableExecutor, DropTableExecutor    │  │  │
│  │  │ DML: InsertExecutor, UpdateExecutor,           │  │  │
│  │  │      DeleteExecutor                            │  │  │
│  │  │ DQL: QueryExecutor                             │  │  │
│  │  │   - TableScan → Filter → Aggregate/Sort        │  │  │
│  │  │   - Project → Limit                            │  │  │
│  │  └────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                       STORAGE LAYER                        │
│  ┌───────────────────────┐    ┌─────────────────────────┐  │
│  │ Catalog               │    │ InMemoryStorage         │  │
│  │ (Copy-on-Write)       │    │ (Row-based)             │  │
│  │                       │    │                         │  │
│  │ - Table schemas       │    │ - Per-table RwLock      │  │
│  │ - Arc<HashMap>        │    │ - Concurrent access     │  │
│  │ - Lock-free reads     │    │ - Vec<Row> storage      │  │
│  └───────────────────────┘    └─────────────────────────┘  │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
                      ┌───────────────┐
                      │  QueryResult  │
                      │ - Columns     │
                      │ - Rows        │
                      └───────────────┘
Key Components
1. Parser (src/parser/)
Converts SQL text into an Abstract Syntax Tree (AST).
- SqlParserAdapter - Facade over the `sqlparser` crate
- Plugin System - Extensible expression conversion
- AST Definition - Internal representation optimized for our needs
2. Planner (src/planner/)
Transforms AST into a logical execution plan.
- QueryPlanner - AST → LogicalPlan converter
- LogicalPlan Nodes - TableScan, Filter, Projection, Sort, Limit
- Future - Query optimization passes
3. Executor (src/executor/)
Executes logical plans against storage.
- ExecutorPipeline - Chain of Responsibility pattern
- Specialized Executors - DDL, DML, DQL handlers
- Physical Operators - Actual data processing
- EvaluatorRegistry - Plugin-based expression evaluation
4. Storage (src/storage/)
In-memory data storage with concurrent access.
- Catalog - Metadata (schemas) with lock-free reads
- InMemoryStorage - Actual row data with fine-grained locking
- TableSchema - Column definitions and constraints
5. Evaluator (src/evaluator/)
Runtime expression evaluation system.
- Plugin Architecture - Extensible evaluators
- Built-in Evaluators - Arithmetic, comparison, logical, LIKE, BETWEEN, IS NULL
- EvaluationContext - Thread-safe expression evaluation
Features
Currently Implemented
SQL Support
- DDL (Data Definition Language)
  - `CREATE TABLE` with column types and constraints
  - `DROP TABLE` with `IF EXISTS` support
  - `CREATE INDEX` for faster lookups
  - `ALTER TABLE` (basic support: add/drop column)
- Constraints
  - `PRIMARY KEY` (enforces uniqueness and NOT NULL)
  - `UNIQUE` (enforces uniqueness, allows multiple NULLs)
- DML (Data Manipulation Language)
  - `INSERT INTO` with multiple rows
  - `UPDATE` with `SET` and `WHERE` clauses
  - `DELETE FROM` with conditional filtering
- DQL (Data Query Language)
  - `SELECT` with full query capabilities
  - Aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`)
- Transaction Control
  - `BEGIN` / `START TRANSACTION` - Start a new transaction
  - `COMMIT` - Commit all changes atomically
  - `ROLLBACK` - Undo all changes in the transaction
  - Full MVCC support with snapshot isolation
  - Manual `close()` required for safety (a warning is emitted on connection drop)
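Snapshot isolation under MVCC comes down to a visibility rule over row versions. The sketch below is a minimal illustration under stated assumptions: the `RowVersion` struct, its field names, and the `visible` and `read` functions are hypothetical, and transaction ids are assumed to be assigned in commit order with every id at or below the snapshot already committed. It does not describe RustMemDB's actual implementation.

```rust
/// Hypothetical row version; field names are assumptions for this sketch.
#[derive(Debug, Clone)]
struct RowVersion {
    value: i64,
    /// Id of the transaction that wrote this version.
    created_by: u64,
    /// Id of the transaction that replaced or deleted it, if any.
    deleted_by: Option<u64>,
}

/// A version is visible to a snapshot if it was created at or before the
/// snapshot and had not yet been replaced or deleted at that point.
fn visible(version: &RowVersion, snapshot: u64) -> bool {
    let created = version.created_by <= snapshot;
    let still_live = match version.deleted_by {
        Some(txn) => txn > snapshot,
        None => true,
    };
    created && still_live
}

/// Read the value a transaction with the given snapshot would see.
fn read(versions: &[RowVersion], snapshot: u64) -> Option<i64> {
    versions.iter().find(|v| visible(v, snapshot)).map(|v| v.value)
}

/// Version chain for one row: transaction 1 wrote 100, transaction 3 updated it to 150.
fn sample_chain() -> Vec<RowVersion> {
    vec![
        RowVersion { value: 100, created_by: 1, deleted_by: Some(3) },
        RowVersion { value: 150, created_by: 3, deleted_by: None },
    ]
}
```

A reader whose snapshot is 2 keeps seeing 100 even after transaction 3 commits, which is the behaviour snapshot isolation promises.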
Query Capabilities
- Projection - `SELECT col1, col2` or `SELECT *`
- Filtering - `WHERE` with complex predicates and parentheses
- Aggregation - `COUNT(*)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`
- Sorting - `ORDER BY col1 ASC, col2 DESC` (multiple columns)
- Limiting - `LIMIT n` for result pagination
- Indexing - B-Tree backed indexes for O(log n) lookups
- Expressions - Full arithmetic and logical expressions in all clauses
Operators & Functions
- Arithmetic - `+`, `-`, `*`, `/`, `%`
- Comparison - `=`, `!=`, `<`, `<=`, `>`, `>=`
- Logical - `AND`, `OR`, `NOT` with parentheses support
- Pattern Matching - `LIKE`, `NOT LIKE` (with `%` and `_` wildcards)
- Range - `BETWEEN x AND y`
- Null Checking - `IS NULL`, `IS NOT NULL`
- List Membership - `IN (value1, value2, ...)`
- Aggregate Functions - `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`
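To make the `LIKE` wildcard semantics concrete, here is a minimal standalone matcher. The function name `like_match` is hypothetical, the comparison is case-sensitive, and escape sequences are not handled; whether RustMemDB's own evaluator behaves identically in those respects is not stated in this README.

```rust
/// Returns true if `text` matches the SQL LIKE `pattern`, where `%` matches
/// any run of characters (including none) and `_` matches exactly one.
fn like_match(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    let (mut ti, mut pi) = (0usize, 0usize);
    // Position to resume from when backtracking: (pattern index after the
    // last `%`, text index that `%` has absorbed up to).
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '%' {
            // Remember the wildcard and first try matching it against nothing.
            star = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '_' || p[pi] == t[ti]) {
            // Single-character match.
            ti += 1;
            pi += 1;
        } else if let Some((resume_p, resume_t)) = star {
            // Mismatch: let the last `%` absorb one more character and retry.
            pi = resume_p;
            ti = resume_t + 1;
            star = Some((resume_p, resume_t + 1));
        } else {
            return false;
        }
    }
    // Text is exhausted; any remaining pattern must be only `%`.
    p[pi..].iter().all(|&c| c == '%')
}
```

For example, `like_match("Alice", "A%")` and `like_match("Alice", "%ic_")` both hold, while `like_match("Alice", "A_")` does not because `_` consumes exactly one character.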
Data Types
- INTEGER - 64-bit signed integers
- FLOAT - 64-bit floating point
- TEXT - Variable-length strings
- BOOLEAN - true/false values
- NULL - Null value support with proper handling
Advanced Features
- Multi-column sorting with NULL handling
- Expression evaluation in WHERE, ORDER BY, SELECT, UPDATE
- Concurrent access - Fine-grained table locking with global singleton
- Plugin system - Extensible parsers and evaluators
- Type coercion - Automatic INTEGER/FLOAT conversion
- Client API - PostgreSQL/MySQL-like connection interface
- Connection pooling - Efficient connection management
- User management - Authentication and authorization system
- Persistence - WAL-based durability and snapshots
Performance Features
- Per-table locking - Concurrent access to different tables
- Lock-free catalog reads - Copy-on-Write metadata
- Stable sorting - Predictable ORDER BY results
- Efficient aggregation - Single-pass aggregate computation
- Global singleton - Shared state for all connections
- Indexing - High-performance data retrieval
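Single-pass aggregation means all five aggregates are folded from one traversal of the column. The sketch below assumes a numeric column modelled as `Option<f64>`, with `None` standing in for SQL NULL; the `Aggregates` struct and `aggregate` function are hypothetical names, and `count` here corresponds to `COUNT(col)` rather than `COUNT(*)`.

```rust
/// Running state for COUNT, SUM, AVG, MIN, MAX over one numeric column.
#[derive(Debug, Default)]
struct Aggregates {
    count: u64,
    sum: f64,
    min: Option<f64>,
    max: Option<f64>,
}

impl Aggregates {
    /// Fold one value into the running state. NULLs (`None`) are skipped,
    /// as standard SQL aggregates do.
    fn update(&mut self, value: Option<f64>) {
        if let Some(v) = value {
            self.count += 1;
            self.sum += v;
            self.min = Some(self.min.map_or(v, |m| m.min(v)));
            self.max = Some(self.max.map_or(v, |m| m.max(v)));
        }
    }

    /// AVG is derived from the running sum and count; it is NULL for empty input.
    fn avg(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// One pass over the column computes every aggregate at once.
fn aggregate(column: &[Option<f64>]) -> Aggregates {
    let mut agg = Aggregates::default();
    for value in column {
        agg.update(*value);
    }
    agg
}
```

Because each row is visited once and the state is constant-size, the cost is O(n) time and O(1) extra space regardless of how many aggregates the query requests.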
Performance Metrics
Sequential UPDATE: 2.9M updates/sec (5,000 rows)
Mixed operations: 7,083 ops/sec (UPDATE + SELECT)
Concurrent access: Stable with 4 threads
Aggregate functions: Fast single-pass computation
Index Scan: O(log n) retrieval vs O(n) full scan
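The O(log n) versus O(n) contrast can be sketched with the standard library's `BTreeMap`. The `ColumnIndex` type and its methods are hypothetical and only illustrate the idea of a B-Tree index mapping column values to row positions; RustMemDB's real index structures may differ.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Hypothetical single-column index: column value -> positions of matching rows.
struct ColumnIndex {
    entries: BTreeMap<i64, Vec<usize>>,
}

impl ColumnIndex {
    /// Build the index with one pass over the column.
    fn build(column: &[i64]) -> Self {
        let mut entries: BTreeMap<i64, Vec<usize>> = BTreeMap::new();
        for (pos, value) in column.iter().enumerate() {
            entries.entry(*value).or_default().push(pos);
        }
        ColumnIndex { entries }
    }

    /// Equality lookup: a tree descent, O(log n) in the number of distinct keys.
    fn lookup(&self, value: i64) -> &[usize] {
        self.entries.get(&value).map(|rows| rows.as_slice()).unwrap_or(&[])
    }

    /// Range lookup (`col > bound`): visits only keys above the bound, in key order.
    fn greater_than(&self, bound: i64) -> Vec<usize> {
        self.entries
            .range((Excluded(bound), Unbounded))
            .flat_map(|(_, rows)| rows.iter().copied())
            .collect()
    }
}

/// Full scan for comparison: touches every row, O(n).
fn scan_equal(column: &[i64], value: i64) -> Vec<usize> {
    column
        .iter()
        .enumerate()
        .filter(|(_, v)| **v == value)
        .map(|(pos, _)| pos)
        .collect()
}
```

Both paths return the same row positions for an equality predicate; the index simply avoids visiting rows that cannot match.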
Installation
Prerequisites
- Rust 1.70 or higher
- Cargo (comes with Rust)
From Source
# Clone the repository
# Build the project
# Run tests
# Run the demo application
As a Library
Add to your Cargo.toml:
[dependencies]
rustmemodb = { path = "../rustmemodb" } # or from crates.io when published
Quick Start
Basic Example
use InMemoryDB;
Output:
┌────┬─────────┬─────┐
│ id │ name    │ age │
├────┼─────────┼─────┤
│ 1  │ Alice   │ 30  │
│ 3  │ Charlie │ 35  │
└────┴─────────┴─────┘
Usage Examples
Example 1: User Management System
use InMemoryDB;
Example 2: Product Catalog
use InMemoryDB;
Example 3: Advanced Queries
use InMemoryDB;
Example 4: NULL Value Handling
use InMemoryDB;
Example 5: UPDATE and DELETE Operations
use Client;
Example 6: Transactions (ACID Support)
use Client;
Example 7: Transaction Rollback
use Client;
Example 8: Aggregate Functions
use Client;
Example 9: Database Statistics
use InMemoryDB;
API Documentation
Core Types
InMemoryDB
The main database facade providing a simple API.
Client
PostgreSQL/MySQL-style client API with connection pooling.
QueryResult
Result of a query execution.
Value
Represents a SQL value.
DataType
Column data type.
Error Handling
All operations return Result<T, DbError>.
Performance Characteristics
Time Complexity
| Operation | Complexity | Notes |
|---|---|---|
| CREATE TABLE | O(n) | Clones entire catalog (n = tables) |
| DROP TABLE | O(1) | HashMap removal |
| INSERT | O(1) | Amortized vector push |
| UPDATE | O(n) | n = rows in table (full scan) |
| DELETE | O(n + m log m) | n = scan, m = matches to delete |
| SELECT (full scan) | O(n) | n = rows in table |
| SELECT (with WHERE) | O(n) | Full scan; O(log n) lookup when a B-Tree index applies |
| SELECT (with ORDER BY) | O(n log n) | Stable sort |
| SELECT (with LIMIT) | O(n) | Must scan before limiting |
| SELECT (with aggregates) | O(n) | Single-pass computation |
Space Complexity
| Structure | Space | Notes |
|---|---|---|
| Row | O(columns) | Vector of values |
| Table | O(rows × columns) | Vector of rows |
| Catalog | O(tables × columns) | Metadata only |
Concurrency
- Catalog Reads: Lock-free (Copy-on-Write via Arc)
- Table Reads: Multiple concurrent readers (RwLock)
- Table Writes: Exclusive lock per table
- Cross-Table: Different tables can be accessed concurrently
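Per-table locking can be sketched with one `RwLock` per table. The `Storage` type, its methods, and `concurrent_inserts` below are hypothetical names for illustration; the point is only that writers on different tables take different locks and therefore do not block each other.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

type Row = Vec<i64>;

/// Hypothetical storage: the table map is fixed after construction and each
/// table carries its own lock.
struct Storage {
    tables: HashMap<String, RwLock<Vec<Row>>>,
}

impl Storage {
    fn new(names: &[&str]) -> Self {
        let tables: HashMap<String, RwLock<Vec<Row>>> = names
            .iter()
            .map(|name| (name.to_string(), RwLock::new(Vec::new())))
            .collect();
        Storage { tables }
    }

    /// Takes the exclusive write lock of one table only.
    /// Returns false if the table does not exist.
    fn insert(&self, table: &str, row: Row) -> bool {
        match self.tables.get(table) {
            Some(lock) => {
                lock.write().unwrap().push(row);
                true
            }
            None => false,
        }
    }

    /// Takes a shared read lock; many readers may hold it at the same time.
    fn count(&self, table: &str) -> Option<usize> {
        self.tables.get(table).map(|lock| lock.read().unwrap().len())
    }
}

/// Two threads insert into two different tables concurrently.
fn concurrent_inserts() -> (Option<usize>, Option<usize>) {
    let storage = Arc::new(Storage::new(&["users", "orders"]));
    let handles: Vec<_> = vec!["users", "orders"]
        .into_iter()
        .map(|name| {
            let storage = Arc::clone(&storage);
            thread::spawn(move || {
                for i in 0..100 {
                    storage.insert(name, vec![i]);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    (storage.count("users"), storage.count("orders"))
}
```

A single global lock would serialize these two writers; with per-table locks they only contend when they target the same table.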
Benchmark Results
Concurrent reads (different tables): ~145ms
Operations: 800 SELECTs
Throughput: ~5,500 ops/sec
Mixed read/write (different tables): ~85ms
Operations: 400 SELECTs + 100 INSERTs
Note: Benchmarks run on M1 Mac, results vary by hardware
Design Patterns
RustMemDB demonstrates several classic software design patterns:
1. Facade Pattern (InMemoryDB)
Provides a simple interface to a complex subsystem.
// Simple facade hides parser, planner, executor complexity
db.execute("SELECT * FROM users WHERE age > 25")?;
2. Chain of Responsibility (ExecutorPipeline)
Each executor decides if it can handle a statement.
for executor in &self.executors
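A dispatch loop of that shape can be sketched as follows. The `Statement` enum, the `Executor` trait with `can_handle` and `execute`, and the two executor structs are simplified stand-ins invented for this example; the real trait signatures in `src/executor/` are not shown in this README.

```rust
/// Hypothetical, heavily simplified statement type.
#[derive(Debug)]
enum Statement {
    CreateTable(String),
    Insert(String),
    Select(String),
}

/// Each executor reports whether it can handle a statement, then runs it.
trait Executor {
    fn can_handle(&self, stmt: &Statement) -> bool;
    fn execute(&self, stmt: &Statement) -> String;
}

struct DdlExecutor;
struct DmlExecutor;

impl Executor for DdlExecutor {
    fn can_handle(&self, stmt: &Statement) -> bool {
        matches!(stmt, Statement::CreateTable(_))
    }
    fn execute(&self, stmt: &Statement) -> String {
        match stmt {
            Statement::CreateTable(name) => format!("created table {}", name),
            other => format!("unsupported: {:?}", other),
        }
    }
}

impl Executor for DmlExecutor {
    fn can_handle(&self, stmt: &Statement) -> bool {
        matches!(stmt, Statement::Insert(_))
    }
    fn execute(&self, stmt: &Statement) -> String {
        match stmt {
            Statement::Insert(table) => format!("inserted into {}", table),
            other => format!("unsupported: {:?}", other),
        }
    }
}

struct ExecutorPipeline {
    executors: Vec<Box<dyn Executor>>,
}

impl ExecutorPipeline {
    /// Walk the chain and hand the statement to the first executor that accepts it.
    fn execute(&self, stmt: &Statement) -> Result<String, String> {
        for executor in &self.executors {
            if executor.can_handle(stmt) {
                return Ok(executor.execute(stmt));
            }
        }
        Err(format!("no executor can handle {:?}", stmt))
    }
}

fn default_pipeline() -> ExecutorPipeline {
    let executors: Vec<Box<dyn Executor>> = vec![Box::new(DdlExecutor), Box::new(DmlExecutor)];
    ExecutorPipeline { executors }
}
```

Supporting a new statement kind then means pushing one more executor onto the chain; the loop itself stays unchanged. In this sketch no executor accepts `Select`, so the pipeline returns an error for it.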
3. Strategy Pattern (QueryPlanner, Executors)
Different strategies for different statement types.
4. Plugin/Registry Pattern (Expression Evaluators)
Extensible evaluation system.
registry.register;
registry.register;
// Users can add custom evaluators
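The registry idea can be shown end to end in a small standalone sketch. Everything here is an assumption made for illustration: the `Expr` enum, the `Evaluator` trait, the integer-only values, and the two evaluator structs are not RustMemDB's real types, whose registration arguments are not visible above.

```rust
/// Hypothetical expression tree over integers.
#[derive(Debug, Clone)]
enum Expr {
    Literal(i64),
    Binary { op: String, left: Box<Expr>, right: Box<Expr> },
}

/// A plugin declares which operators it understands and evaluates them.
trait Evaluator {
    fn can_evaluate(&self, op: &str) -> bool;
    fn evaluate(&self, op: &str, left: i64, right: i64) -> Option<i64>;
}

struct ArithmeticEvaluator;
struct ComparisonEvaluator;

impl Evaluator for ArithmeticEvaluator {
    fn can_evaluate(&self, op: &str) -> bool {
        matches!(op, "+" | "-" | "*" | "/" | "%")
    }
    fn evaluate(&self, op: &str, left: i64, right: i64) -> Option<i64> {
        match op {
            "+" => left.checked_add(right),
            "-" => left.checked_sub(right),
            "*" => left.checked_mul(right),
            "/" => left.checked_div(right), // None on division by zero
            "%" => left.checked_rem(right),
            _ => None,
        }
    }
}

impl Evaluator for ComparisonEvaluator {
    fn can_evaluate(&self, op: &str) -> bool {
        matches!(op, "=" | "<" | ">")
    }
    fn evaluate(&self, op: &str, left: i64, right: i64) -> Option<i64> {
        let result = match op {
            "=" => left == right,
            "<" => left < right,
            ">" => left > right,
            _ => return None,
        };
        Some(result as i64) // 1 for true, 0 for false in this toy model
    }
}

struct EvaluatorRegistry {
    evaluators: Vec<Box<dyn Evaluator>>,
}

impl EvaluatorRegistry {
    fn new() -> Self {
        EvaluatorRegistry { evaluators: Vec::new() }
    }

    fn register(&mut self, evaluator: Box<dyn Evaluator>) {
        self.evaluators.push(evaluator);
    }

    /// Evaluate recursively, delegating each operator to the first plugin that claims it.
    fn eval(&self, expr: &Expr) -> Option<i64> {
        match expr {
            Expr::Literal(value) => Some(*value),
            Expr::Binary { op, left, right } => {
                let l = self.eval(left)?;
                let r = self.eval(right)?;
                let evaluator = self.evaluators.iter().find(|e| e.can_evaluate(op))?;
                evaluator.evaluate(op, l, r)
            }
        }
    }
}

/// Convenience constructor for binary expressions.
fn binary(op: &str, left: Expr, right: Expr) -> Expr {
    Expr::Binary { op: op.to_string(), left: Box::new(left), right: Box::new(right) }
}

fn default_registry() -> EvaluatorRegistry {
    let mut registry = EvaluatorRegistry::new();
    registry.register(Box::new(ArithmeticEvaluator));
    registry.register(Box::new(ComparisonEvaluator));
    registry
}
```

An operator with no registered plugin yields `None` here, which is where a custom evaluator would plug in without touching `eval`.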
5. Adapter Pattern (SqlParserAdapter)
Adapts external sqlparser API to internal AST.
6. Copy-on-Write (Catalog)
Immutable data structure for lock-free reads.
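A minimal sketch of the copy-on-write idea, assuming a catalog that is an immutable `Arc<HashMap>` and a schema reduced to a list of column names. The `Catalog` methods shown are hypothetical, and how RustMemDB publishes a new catalog version to other threads is not covered here.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified schema: column names only.
type Schema = Vec<String>;

/// Each `Catalog` value is an immutable snapshot. Cloning it only bumps the
/// `Arc` reference count, so readers never wait on a writer.
#[derive(Clone, Default)]
struct Catalog {
    tables: Arc<HashMap<String, Schema>>,
}

impl Catalog {
    /// Schema change: copy the whole map, modify the copy, and return a new
    /// catalog. Existing snapshots are untouched. Cost is O(number of tables).
    fn with_table(&self, name: &str, columns: &[&str]) -> Catalog {
        let mut tables = (*self.tables).clone();
        tables.insert(
            name.to_string(),
            columns.iter().map(|c| c.to_string()).collect(),
        );
        Catalog { tables: Arc::new(tables) }
    }

    /// Read path: a plain map lookup on the shared, immutable data.
    fn get(&self, name: &str) -> Option<&Schema> {
        self.tables.get(name)
    }

    /// True when both catalogs point at the same underlying map.
    fn shares_storage_with(&self, other: &Catalog) -> bool {
        Arc::ptr_eq(&self.tables, &other.tables)
    }
}
```

The full-map copy in `with_table` is the same trade-off behind the O(n) cost listed for CREATE TABLE: reads stay cheap because every schema change pays for a clone.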
7. Builder Pattern (Logical Plan construction)
Composable query plans.
Extensibility
RustMemDB uses a plugin-based architecture that makes it easy to add new SQL operators, functions, and statement types without modifying the core engine.
Developer Guide
See DEVELOPER_GUIDE.md for comprehensive instructions on:
- Understanding the two plugin systems (conversion + evaluation)
- Adding new SQL operators and functions (step-by-step)
- Adding new statement types (e.g., CREATE INDEX)
- Best practices and common pitfalls
- Testing guidelines
- Complete working examples
Quick Example: Adding UPPER() Function
Step 1: Create Conversion Plugin (src/plugins/string_functions.rs)
Step 2: Create Evaluation Plugin (src/evaluator/plugins/string_functions.rs)
Step 3: Register Both Plugins
// In src/plugins/mod.rs
registry.register;
// In src/evaluator/plugins/mod.rs
registry.register;
That's it! Now you can use SELECT UPPER(name) FROM users.
For detailed instructions, examples, and best practices, see DEVELOPER_GUIDE.md.
Limitations
Current Limitations
- No JOINs - Single table queries only
- No GROUP BY/HAVING - Aggregates work on full result set only
- No FOREIGN KEY constraints - Referential integrity not enforced
- No views - No CREATE VIEW
- Limited SQL - Subset of SQL-92
- No query optimization - Plans not optimized (basic index usage only)
- Single process - No client-server architecture
What We Have
- Transactions - Full ACID transaction support with MVCC
- Connection Pooling - Efficient connection management
- User Authentication - Secure password hashing with bcrypt
- Concurrent Access - Fine-grained locking for multiple connections
- Manual Rollback - Explicit close() or rollback() is required; dropping a connection without one triggers a warning
- Indexes - B-Tree indexes for fast lookups
- Persistence - Write-Ahead Log (WAL) and Snapshots
Known Issues
See CODE_REVIEW_REPORT.md for detailed issue analysis.
Critical:
- Float comparison uses fixed epsilon (incorrect for large numbers)
- Benchmarks use write locks instead of read locks
- Silent error swallowing in sort comparisons
High:
- Catalog clones entire HashMap on schema changes
- Transaction system exists but not integrated
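The fixed-epsilon issue is easiest to see with two adjacent doubles near 1e16, which differ by 2.0. The sketch below contrasts a fixed threshold with a tolerance that scales with magnitude. All three constants are assumed values chosen for the illustration (the epsilon the code base actually uses is not stated here), and `approx_eq` is one possible fix rather than the project's chosen one.

```rust
/// Hypothetical fixed epsilon, standing in for the kind of constant the issue describes.
const FIXED_EPS: f64 = 1e-9;
/// Assumed tolerances for this sketch; tune them for real workloads.
const REL_TOL: f64 = 1e-9;
const ABS_TOL: f64 = 1e-12;

/// Fixed-epsilon comparison: fine near 1.0, too strict for large magnitudes.
fn fixed_eps_eq(a: f64, b: f64) -> bool {
    (a - b).abs() < FIXED_EPS
}

/// Comparison whose tolerance grows with the magnitude of the operands,
/// with an absolute floor for values close to zero.
fn approx_eq(a: f64, b: f64) -> bool {
    if a == b {
        return true; // exact matches, including equal infinities
    }
    if !a.is_finite() || !b.is_finite() {
        return false; // NaN or mismatched infinities never compare equal
    }
    let diff = (a - b).abs();
    let scale = a.abs().max(b.abs());
    diff <= ABS_TOL.max(REL_TOL * scale)
}
```

With these constants, `fixed_eps_eq(1e16, 1e16 + 2.0)` is false even though the two values are neighbouring representable doubles, while `approx_eq` treats them as equal and still distinguishes 1.0 from 1.1.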
Roadmap
Phase 1: Stability (Completed)
- Basic SELECT, INSERT, CREATE TABLE
- WHERE clause with complex predicates
- ORDER BY with multiple columns
- Plugin-based architecture
- DROP TABLE support
- UPDATE and DELETE statements
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
- Client API and connection pooling
- User management system
- Comprehensive test coverage (380+ passing tests)
- Performance benchmarks (load tests)
- Password hashing with bcrypt
- Transaction support (BEGIN, COMMIT, ROLLBACK)
- MVCC with snapshot isolation
- Basic Indexes (CREATE INDEX)
- Persistence (WAL + Snapshots)
Phase 2: Core Features (Current)
- GROUP BY and HAVING
- Subqueries
- Fix remaining bugs from code review
- `ALTER TABLE` full support
Phase 3: Advanced Features
- INNER JOIN support
- LEFT/RIGHT JOIN support
- Query optimizer (predicate pushdown, join ordering)
- Secondary indexes (optimization)
- Views (CREATE VIEW)
- Constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE)
Phase 4: Production Readiness (Future)
- Query caching
- SQL-92 compliance
Phase 5: Ecosystem
- Client-server architecture
- Wire protocol
- Language bindings (Python, JavaScript)
- SQL shell/REPL
- Migration tools
- Performance profiling tools
Contributing
Contributions are welcome! This is an educational project, so clear, well-documented code is more valuable than clever optimizations.
Developer Resources
New to the codebase? Start with these guides:
- DEVELOPER_GUIDE.md - Complete guide to adding new features
- PRODUCTION_READINESS_ANALYSIS.md - Architecture analysis and known issues
- CODE_REVIEW_REPORT.md - Detailed code review findings
How to Contribute
- Read DEVELOPER_GUIDE.md for architecture overview
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for your changes
- Ensure all tests pass (`cargo test`)
- Run clippy (`cargo clippy -- -D warnings`)
- Format code (`cargo fmt`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Guidelines
- Code Clarity > Performance (unless critical path)
- Add tests for all new features (see DEVELOPER_GUIDE.md for test checklist)
- Document public APIs with `///` comments
- Follow Rust conventions (cargo fmt, clippy)
- Update README if adding user-facing features
- Update DEVELOPER_GUIDE.md if changing plugin architecture
- Reference issues in commits when applicable
Good First Issues
Looking to contribute? Try these:
- CRITICAL: Implement password hashing (bcrypt/argon2) to replace plaintext storage
- Add missing documentation comments
- Implement GROUP BY and HAVING clauses
- Add more expression evaluators (string functions, date functions)
- Improve error messages
- Add more integration tests
- Fix issues from CODE_REVIEW_REPORT.md
Educational Resources
Understanding the Code
- Start Here: Read `src/main.rs` for a complete example
- Architecture: Review the architecture diagram above
- Query Flow: Follow a query through parser → planner → executor
- Tests: Read tests in `src/executor/query.rs` for examples
Learning Database Internals
Recommended Reading:
- "Database Internals" by Alex Petrov
- "Database System Concepts" by Silberschatz, Korth, Sudarshan
- "Architecture of a Database System" (Hellerstein, Stonebraker, Hamilton)
- CMU Database Systems Course (free online)
Related Projects:
- SQLite - Simple, embedded SQL database
- DuckDB - In-process OLAP database
- ToyDB - Educational distributed SQL database in Rust
Rust Resources
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- sqlparser-rs - SQL parsing library
- Rust Community - Excellent documentation and tools
- Database Research - Decades of academic research in database systems
Contact
- GitHub Issues: For bugs and feature requests
- Discussions: For questions and ideas
- Pull Requests: For contributions
Star History
If you find this project useful for learning, please consider giving it a star!
Built with ❤️ in Rust