# sql-rs Architecture
## Overview
sql-rs is a lightweight embedded database combining traditional relational database features with vector database capabilities. It follows a modular, layered architecture inspired by SQLite and modern vector databases.
## System Architecture
```
┌─────────────────────────────────────────────────────────┐
│ CLI Layer │
│ (Command-line interface using clap) │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Query Engine │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ SQL Parser │→ │ Executor │→ │ Optimizer │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Storage Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ B-Tree │ │ Page Cache │ │ WAL │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Vector Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ HNSW Index │ │ Collections │ │ Similarity │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ File Layer │
│ (Memory-mapped files, page management) │
└─────────────────────────────────────────────────────────┘
```
## Module Structure
### 1. Storage Layer (`src/storage/`)
#### B-Tree (`btree.rs`)
- Implements a B-tree index structure for key-value storage
- Node size: 64 keys per node (MAX_KEYS_PER_NODE)
- Supports: insert, get, delete, scan operations
- Uses serialization with bincode for node persistence
#### Page Management (`page.rs`)
- Fixed-size pages (4KB default)
- Page types: Data, Index, Vector, Free
- PageCache: LRU-based caching for frequently accessed pages
- PageId: Unique identifier for each page
#### File Manager (`file.rs`)
- Memory-mapped file I/O using memmap2
- Page allocation and deallocation
- File growth and management
#### Storage Trait (`mod.rs`)
- Abstract interface for storage operations
- Methods: get, put, delete, scan, flush
- Implemented by StorageEngine
### 2. Query Engine (`src/query/`)
#### Parser (`parser.rs`)
- SQL parsing for basic operations
- Supported statements:
- CREATE TABLE
- INSERT INTO
- SELECT (with WHERE clause)
- UPDATE
- Simple tokenization and parsing logic
#### Executor (`executor.rs`)
- Query execution engine
- Schema management
- Row-level operations
- WHERE clause evaluation
- Type conversion and validation
### 3. Vector Layer (`src/vector/`)
#### HNSW Index (`hnsw.rs`)
- Hierarchical Navigable Small World graph
- Approximate nearest neighbor search
- Configurable parameters: M, ef_construction, ef_search
#### Vector Collection (`collection.rs`)
- Manages vectors with metadata
- Dimension validation
- Distance metric support
#### Similarity (`similarity.rs`)
- Distance metrics:
- Cosine similarity
- Euclidean distance
- Dot product
- Vector normalization utilities
#### Vector Store (`store.rs`)
- High-level API for vector operations
- Collection management
- Integration with storage layer
### 4. Type System (`src/types/`)
#### Value (`value.rs`)
- Core data types: Null, Integer, Float, Text, Blob, Boolean
- Serialization support
- Type conversions
#### Schema (`schema.rs`)
- Table schema definition
- Column metadata
- Type information
### 5. CLI Layer (`src/cli/`)
#### Commands (`commands.rs`)
- Command-line interface using clap
- Subcommands:
- create: Create database
- query: Execute SQL
- insert: Insert data
- info: Database info
- vector: Vector operations (add, search, create)
## Data Flow
### SQL Query Execution
1. User inputs SQL via CLI
2. Parser tokenizes and parses SQL into Statement
3. Executor validates schema and executes statement
4. Storage layer performs B-tree operations
5. Results returned to user
### Vector Search
1. User provides query vector via CLI
2. VectorStore retrieves collection
3. HNSW index performs approximate nearest neighbor search
4. Results sorted by distance metric
5. Top-k results returned with metadata
## File Format
### Database File Structure
```
┌─────────────────────────────────────┐
│ File Header │
│ - Magic bytes │
│ - Version │
│ - Page size │
│ - Root page ID │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Page 0 (Root) │
│ - B-tree root node │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Page 1..N │
│ - Data pages │
│ - Index pages │
│ - Vector pages │
└─────────────────────────────────────┘
```
### Page Layout
```
┌─────────────────────────────────────┐
│ Page Header │
│ - Page ID (4 bytes) │
│ - Page Type (1 byte) │
│ - Data Length (4 bytes) │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Page Data │
│ - Serialized node/data │
└─────────────────────────────────────┘
```
## Concurrency Model
- **Read-Write Lock**: Uses parking_lot::RwLock for thread-safe access
- **Page Cache**: Thread-safe LRU cache
- **File Manager**: Synchronized file operations
## Performance Characteristics
### Storage
- **Insert**: O(log n) - B-tree insertion
- **Get**: O(log n) - B-tree lookup
- **Scan**: O(k + log n) - Range scan with k results
- **Page Cache**: O(1) - LRU cache lookup
### Vector Search
- **HNSW Build**: O(n log n) - Index construction
- **Search**: O(log n) - Approximate nearest neighbor
- **Exact Search**: O(n) - Brute force fallback
## Memory Management
- **Page Cache**: LRU eviction policy
- **Memory-mapped Files**: OS-managed memory mapping
- **Vector Index**: In-memory HNSW graph
## Error Handling
- Custom error type: `sql-rsError`
- Error categories:
- Io: File system errors
- Serialization: Data encoding errors
- Storage: B-tree operation errors
- Query: SQL parsing/execution errors
- Vector: Vector operation errors
- NotFound: Missing data errors
- InvalidOperation: Logic errors
## Future Enhancements
### Planned Features
1. **WAL (Write-Ahead Logging)**: Durability and crash recovery
2. **Transactions**: ACID compliance with BEGIN/COMMIT/ROLLBACK
3. **Secondary Indexes**: Multiple indexes per table
4. **Query Optimizer**: Cost-based query optimization
5. **Compression**: Data compression for large values
6. **Replication**: Master-slave replication
### Performance Improvements
1. **Parallel Query Execution**: Multi-threaded queries
2. **Batch Operations**: Bulk insert/update
3. **Adaptive Indexing**: Automatic index creation
4. **Query Caching**: Result caching for repeated queries
## Testing Strategy
- **Unit Tests**: Module-level testing
- **Integration Tests**: End-to-end workflows
- **Benchmark Tests**: Performance validation
- **Fuzz Tests**: Parser robustness
- **Property Tests**: Invariant validation
## Dependencies
- **clap**: CLI parsing
- **serde/serde_json**: Serialization
- **bincode**: Binary encoding
- **thiserror/anyhow**: Error handling
- **memmap2**: Memory-mapped files
- **parking_lot**: Synchronization primitives
- **tempfile**: Test utilities
- **criterion**: Benchmarking
## License
Copyright 2025 SQL-RS Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.