yamlbase 0.4.4

A lightweight SQL server that serves YAML-defined tables over standard SQL protocols
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
# YAML SQL Server - Product Requirements Document

## Executive Summary

A lightweight SQL server application that serves YAML-defined tables over standard SQL protocols, designed for local development and testing environments. The application reads table definitions and data from YAML files and exposes them via SQL queries on the standard SQL port (1433 for SQL Server protocol, 3306 for MySQL protocol, or 5432 for PostgreSQL protocol).

## Problem Statement

Developers frequently need to test applications against databases during development, but setting up full database servers with test data is time-consuming and resource-intensive. Existing solutions either require complex database installations or don't provide authentic SQL protocol compatibility for testing database drivers and ORMs.

## Goals

**Primary Goals:**
- Enable instant database setup for local testing using simple YAML files
- Provide authentic SQL protocol compatibility for testing database connections
- Minimize resource usage compared to full database servers
- Support standard SQL SELECT queries for read operations

**Secondary Goals:**
- Future extensibility for write operations (INSERT, UPDATE, DELETE)
- Multiple SQL protocol support (PostgreSQL, MySQL, SQL Server)
- Hot-reloading of YAML files during development

## Target Users

- Application developers testing database integrations
- QA engineers setting up test environments
- DevOps engineers creating lightweight testing infrastructure
- Students learning SQL and database concepts

## Core Features

### Phase 1: Read-Only SQL Server

**YAML Schema Support**
- Parse YAML files with table definitions including column names and data types
- Support common SQL data types: INTEGER, VARCHAR(n), TEXT, TIMESTAMP, BOOLEAN, DECIMAL(p,s)
- Handle multiple tables per YAML file or multiple YAML files
- Validate data against defined schema on startup

**SQL Protocol Server**
- Listen on configurable port (default based on chosen SQL dialect)
- Implement core SQL protocol handshake and authentication
- Support basic authentication (username/password) with configurable credentials
- Handle connection pooling and concurrent client connections

**Query Processing**
- Parse and execute SELECT statements with WHERE clauses
- Support basic SQL operations: filtering, sorting (ORDER BY), limiting (LIMIT/TOP)
- Implement JOIN operations between tables
- Support aggregate functions: COUNT, SUM, AVG, MIN, MAX
- Provide basic SQL functions: string manipulation, date functions

**Data Management**
- Load YAML data into in-memory table structures on startup
- Validate data types and constraints
- Support file watching for hot-reload during development
- Handle large datasets efficiently with lazy loading if needed

### Phase 2: Write Operations (Future)

**Mutation Support**
- INSERT statements with validation against schema
- UPDATE statements with WHERE clause support
- DELETE statements
- Transaction support (BEGIN, COMMIT, ROLLBACK)

**Persistence Options**
- Write changes back to YAML files
- Optional in-memory only mode for temporary testing
- Backup and restore functionality

## Technical Requirements

### Performance
- Handle up to 10,000 records per table efficiently
- Support up to 10 concurrent connections
- Query response time under 100ms for simple queries
- Memory usage under 100MB for typical test datasets

### Compatibility
- Support PostgreSQL wire protocol (recommended for broad compatibility)
- Optional MySQL protocol support
- Standard SQL syntax compliance for core operations
- Compatible with popular database drivers and ORMs

### Configuration
- YAML file path specification via command line or config file
- Port configuration
- Authentication settings
- Logging levels and output options

### Error Handling
- Graceful handling of malformed YAML files
- Clear error messages for SQL syntax errors
- Connection timeout and cleanup
- Detailed logging for debugging

## YAML File Format Specification

```yaml
database:
  name: "test_db"

tables:
  users:
    columns:
      id: "INTEGER PRIMARY KEY"
      name: "VARCHAR(100) NOT NULL"
      email: "VARCHAR(255) UNIQUE"
      created_at: "TIMESTAMP DEFAULT CURRENT_TIMESTAMP"
      is_active: "BOOLEAN DEFAULT true"
    data:
      - id: 1
        name: "John Doe"
        email: "john@example.com"
        created_at: "2024-01-15 10:30:00"
        is_active: true
      - id: 2
        name: "Jane Smith"
        email: "jane@example.com"
        created_at: "2024-01-16 14:22:00"
        is_active: false

  orders:
    columns:
      id: "INTEGER PRIMARY KEY"
      user_id: "INTEGER REFERENCES users(id)"
      amount: "DECIMAL(10,2)"
      status: "VARCHAR(50)"
    data:
      - id: 101
        user_id: 1
        amount: 29.99
        status: "completed"
      - id: 102
        user_id: 2
        amount: 45.50
        status: "pending"
```

## Implementation Architecture

## Rust Implementation Architecture

### Project Structure
```
yaml-sql-server/
├── Cargo.toml
├── src/
│   ├── main.rs              # CLI entry point and server setup
│   ├── lib.rs               # Public API and module exports
│   ├── config.rs            # Configuration management
│   ├── yaml/
│   │   ├── mod.rs           # YAML parsing and validation
│   │   ├── parser.rs        # YAML to internal schema conversion
│   │   └── watcher.rs       # File watching for hot-reload
│   ├── database/
│   │   ├── mod.rs           # Database abstraction layer
│   │   ├── schema.rs        # Table schema definitions
│   │   ├── storage.rs       # In-memory data storage
│   │   └── index.rs         # Basic indexing for performance
│   ├── sql/
│   │   ├── mod.rs           # SQL processing coordination
│   │   ├── parser.rs        # SQL AST parsing wrapper
│   │   ├── executor.rs      # Query execution engine
│   │   └── functions.rs     # SQL function implementations
│   ├── protocol/
│   │   ├── mod.rs           # Protocol abstraction
│   │   ├── postgres.rs      # PostgreSQL wire protocol
│   │   └── connection.rs    # Connection management
│   └── server.rs            # Main server loop and client handling
├── tests/
│   ├── integration/         # End-to-end protocol tests
│   ├── sql/                 # SQL query execution tests
│   └── yaml/                # YAML parsing tests
└── examples/
    ├── sample_data.yaml     # Example database files
    └── client_test.rs       # Example client connections
```

### Key Rust Dependencies

```toml
[dependencies]
# Async runtime and networking
tokio = { version = "1.0", features = ["full"] }
tokio-postgres = "0.7"  # For protocol reference
bytes = "1.0"

# SQL and data processing
sqlparser = "0.40"
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.9"

# CLI and configuration
clap = { version = "4.0", features = ["derive"] }
config = "0.14"

# Error handling and logging
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

# File watching
notify = "6.0"

# Performance and utilities
dashmap = "5.0"  # Concurrent HashMap for indexing
chrono = { version = "0.4", features = ["serde"] }
uuid = { version = "1.0", features = ["v4"] }

[dev-dependencies]
tokio-test = "0.4"
criterion = "0.5"
tempfile = "3.0"
```

### Core Type Definitions

```rust
// src/database/schema.rs
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SqlType {
    Integer,
    Varchar(usize),
    Text,
    Timestamp,
    Boolean,
    Decimal(u8, u8), // precision, scale
}

#[derive(Debug, Clone)]
pub struct Column {
    pub name: String,
    pub sql_type: SqlType,
    pub primary_key: bool,
    pub nullable: bool,
    pub unique: bool,
    pub default: Option<String>,
}

#[derive(Debug, Clone)]
pub struct Table {
    pub name: String,
    pub columns: Vec<Column>,
    pub data: Vec<HashMap<String, serde_yaml::Value>>,
}

#[derive(Debug)]
pub struct Database {
    pub name: String,
    pub tables: HashMap<String, Table>,
}
```

### Performance Optimizations

**Connection Pooling**
- Use `tokio::sync::Semaphore` to limit concurrent connections
- Implement connection reuse with `Arc<Mutex<Connection>>` patterns
- Graceful connection cleanup on client disconnect

**Query Optimization**
- Basic hash indexing for primary keys using `DashMap`
- Query plan caching for repeated queries
- Lazy evaluation for large result sets

**Memory Management**
- `Arc<RwLock<Database>>` for shared read access, exclusive write access
- Copy-on-write for data modifications in future write support
- Efficient string handling with `Cow<'_, str>` where appropriate

### Technology Stack (Rust Implementation)

**Core Language:** Rust 1.70+
**SQL Parsing:** `sqlparser` crate for robust SQL AST parsing
**PostgreSQL Protocol:** `tokio-postgres` protocol implementation or custom using `bytes`
**Async Runtime:** `tokio` for async I/O and connection handling
**YAML Processing:** `serde_yaml` for parsing, `serde` for serialization
**Configuration:** `clap` for CLI, `config` crate for file-based config
**Logging:** `tracing` with `tracing-subscriber` for structured logging
**Error Handling:** `anyhow` for error chaining, `thiserror` for custom errors
**Testing:** `tokio-test` for async testing, `criterion` for benchmarks

## Command Line Interface

```bash
yaml-sql-server [OPTIONS]

Options:
  -f, --file <YAML_FILE>     Path to YAML database file (required)
  -p, --port <PORT>          Port to listen on (default: 5432)
  --protocol <PROTOCOL>      SQL protocol: postgres, mysql, sqlserver (default: postgres)
  -u, --username <USER>      Authentication username (default: admin)
  -P, --password <PASS>      Authentication password (default: password)
  --hot-reload               Enable hot-reloading of YAML file changes
  -v, --verbose             Enable verbose logging
  --log-level <LEVEL>       Set log level: debug, info, warn, error
  -h, --help                Display help information

Examples:
  yaml-sql-server -f test_data.yaml
  yaml-sql-server -f test_data.yaml -p 3306 --protocol mysql
  yaml-sql-server -f test_data.yaml --hot-reload -v
```

## Success Metrics

**Functionality Metrics**
- Successfully parse and serve 95%+ of valid YAML database files
- Handle all basic SQL SELECT operations correctly
- Maintain connection stability for extended testing sessions

**Performance Metrics**
- Query response time consistently under 100ms
- Support 10+ concurrent connections without degradation
- Memory usage scaling linearly with data size

**Adoption Metrics**
- Integration with popular development frameworks and ORMs
- Positive developer experience feedback
- Usage in CI/CD pipelines for automated testing

## Risks and Mitigations

**Risk: SQL Protocol Complexity**
- Mitigation: Start with PostgreSQL protocol (well-documented), leverage existing libraries

**Risk: Performance with Large Datasets**
- Mitigation: Implement basic indexing, add pagination for large result sets

**Risk: SQL Feature Completeness**
- Mitigation: Focus on core features used in testing, document limitations clearly

**Risk: Protocol Compatibility Issues**
- Mitigation: Test with popular database drivers and ORMs, maintain compatibility matrix

## Future Enhancements

- Web-based admin interface for browsing tables and data
- Multiple YAML file support with schema merging
- Advanced SQL features: subqueries, window functions, CTEs
- Export functionality to real database formats
- Integration with testing frameworks
- Docker container distribution
- Performance monitoring and query optimization tools

## Development Phases (Rust-Specific)

### Phase 1: Core Infrastructure (Weeks 1-3)
- Set up Rust project with async tokio runtime
- Implement YAML parsing with `serde_yaml` and schema validation
- Create in-memory database structures with `Arc<RwLock<Database>>`
- Basic CLI with `clap` for configuration

### Phase 2: SQL Engine (Weeks 4-6)
- Integrate `sqlparser` crate for SQL AST parsing
- Implement query executor for SELECT statements with WHERE clauses
- Add support for JOINs, ORDER BY, and basic aggregate functions
- Create comprehensive test suite for SQL compatibility

### Phase 3: PostgreSQL Protocol (Weeks 7-8)
- Implement PostgreSQL wire protocol message handling
- Add authentication and connection management
- Handle multiple concurrent connections with tokio spawning
- Protocol compliance testing with real PostgreSQL clients

### Phase 4: Polish & Performance (Weeks 9-10)
- Add hot-reload functionality using `notify` crate
- Performance optimization and benchmarking with `criterion`
- Comprehensive documentation and examples
- Integration testing with popular Rust ORMs (diesel, sea-orm, sqlx)

## Rust-Specific Implementation Notes

**Error Handling Strategy**
```rust
#[derive(thiserror::Error, Debug)]
pub enum YamlSqlError {
    #[error("YAML parsing error: {0}")]
    YamlParse(#[from] serde_yaml::Error),

    #[error("SQL parsing error: {0}")]
    SqlParse(#[from] sqlparser::parser::ParserError),

    #[error("Database error: {message}")]
    Database { message: String },

    #[error("Protocol error: {0}")]
    Protocol(#[from] std::io::Error),
}
```

**Async Architecture**
- Use `tokio::spawn` for handling each client connection
- `tokio::sync::broadcast` for notifying connections of schema changes
- `tokio::time::timeout` for connection timeouts and query limits
- `tokio::fs` for async file operations during hot-reload

**Testing Strategy**
- Unit tests for each module with `#[tokio::test]`
- Integration tests using `tokio-postgres` client to verify protocol compliance
- Property-based testing with `proptest` for SQL query generation
- Benchmark suite comparing performance against SQLite for similar datasets