# real-rs Design Document
## Executive Summary
**real-rs** is a proof-of-concept demonstrating that relational algebra can serve as a universal abstraction layer for querying **fundamentally different** data storage systems - not just SQL variants.
The key innovation: **YottaDB backend support**. While most "universal" query engines only translate between SQL dialects, real-rs compiles relational algebra to hierarchical global storage, proving genuine universality.
## Architecture
### Three-Layer Design
```
┌─────────────────────────────────────────────┐
│ Application Layer │
│ (Build queries using algebra combinators) │
└──────────────────┬──────────────────────────┘
│
┌──────────────────▼──────────────────────────┐
│ Relational Algebra Core │
│ • Expr (algebra AST) │
│ • Predicate (boolean expressions) │
│ • Schema (type information) │
│ • Type inference & validation │
└──────────────────┬──────────────────────────┘
│
┌──────────┼──────────┐
│ │ │
┌───────▼────┐ ┌───▼──────┐ ┌▼─────────┐
│ YottaDB │ │ SQLite │ │ MongoDB │
│ (M code) │ │ (SQL) │ │ (aggr.) │
│ Hier. KV │ │ Tables │ │ Document │
└────────────┘ └──────────┘ └──────────┘
```
### Core Relational Algebra
All operations defined formally:
| σ | Selection | Filter rows by predicate |
| π | Projection | Select specific columns |
| ⨝ | Join | Combine relations |
| ∪ | Union | Set union |
| ∩ | Intersection | Set intersection |
| - | Difference | Set difference |
| ρ | Rename | Rename columns |
| γ | Aggregation | Group and aggregate |
## The YottaDB Differentiator
### Why YottaDB Matters
YottaDB/GT.M uses **hierarchical global variables** instead of tables:
```m
^Users(1,"name") = "Alice"
^Users(1,"age") = 30
^Users(2,"name") = "Bob"
^Users(2,"age") = 25
```
This is fundamentally different from:
- **Relational** (rows/columns)
- **Document** (JSON trees)
- **Graph** (nodes/edges)
Supporting this proves the algebra abstraction isn't just SQL-to-SQL translation.
### Mapping Relational → Hierarchical
#### Tables → Globals
```
Relational:
┌────┬──────┬─────┐
│ id │ name │ age │
├────┼──────┼─────┤
│ 1 │Alice │ 30 │
│ 2 │ Bob │ 25 │
└────┴──────┴─────┘
Hierarchical:
^Users(1) = [subscripts...]
^Users(1,"name") = "Alice"
^Users(1,"age") = 30
^Users(2,"name") = "Bob"
^Users(2,"age") = 25
```
#### Selection → Traversal
**Algebra:** σ(age > 25) users
**SQL:**
```sql
SELECT * FROM users WHERE age > 25
```
**M Code:**
```m
SET id=""
FOR SET id=$ORDER(^Users(id)) QUIT:id="" DO
. SET age=$GET(^Users(id,"age"))
. IF age>25 DO
. . ; Emit row
```
**MongoDB:**
```js
db.users.aggregate([
{ $match: { "age": { "$gt": 25 } } }
])
```
Same algebra → Three completely different execution models.
## Type Safety
### Compile-Time Schema Validation
```rust
// Schema defines structure
let users = Schema::new("users")
.with_column("id", DataType::Integer)
.with_column("name", DataType::String);
// Query builder knows column names
let query = Expr::relation("users", users)
.select(col("id").eq(1)) // ✓ id exists
.project(vec!["name"]); // ✓ name exists
// This would fail at schema validation:
// .select(col("email").eq("x")) // ✗ email not in schema
```
### Type Inference
```rust
let query = users_table
.project(vec!["name", "age"]);
// Infer result schema
let result_schema = query.infer_schema();
// result_schema.columns = ["name": String, "age": Integer]
```
## Backend Trait
The universal interface:
```rust
pub trait Backend {
type Connection;
type CompiledQuery;
// Translate algebra → native query
fn compile(&self, expr: &Expr) -> Result<Self::CompiledQuery>;
// Execute native query
fn execute(&self, conn: &mut Self::Connection,
query: &Self::CompiledQuery) -> Result<ResultSet>;
// Get relation schema
fn get_schema(&self, conn: &mut Self::Connection,
relation: &str) -> Result<Schema>;
}
```
This abstraction forces **true universality**. You can't fake hierarchical support.
## Comparison Matrix
| Algebra-first | ✓ | ~ | ~ | ✗ | ✗ |
| Hierarchical DBs | ✓ | ✗ | ✗ | ✗ | ✗ |
| Compile-time types | ✓ | ~ | ~ | ✓ | ~ |
| No Arrow runtime | ✓ | ✗ | ✗ | ✗ | ✓ |
| Formal semantics | ✓ | ~ | ✓ | ~ | ✗ |
| Backend-agnostic | ✓ | ~ | ~ | ✗ | ~ |
Legend:
- ✓ = Full support
- ~ = Partial support
- ✗ = No support
## What Makes This Different
### 1. Not Just SQL Variants
Most "universal" engines support:
- PostgreSQL ✓
- MySQL ✓
- SQLite ✓
- MS SQL ✓
**That's still just SQL.** They're table→table translators.
real-rs supports:
- Relational (SQLite) ✓
- Hierarchical (YottaDB) ✓
- Document (MongoDB) ✓
### 2. Algebra as Source of Truth
```
Other engines:
SQL → Parse → Maybe optimize → Execute
real-rs:
Algebra → Validate → Compile to backend → Execute
```
SQL becomes a **compilation target**, not the source language.
### 3. Lightweight
No dependencies on:
- Apache Arrow (multi-GB runtime)
- Complex query planners
- Distributed systems
Just:
- Relational algebra types
- Backend trait
- Compilation logic
### 4. Provable Correctness
Each operation has formal relational algebra semantics. Mission-critical systems can audit:
```
σ(P₁ ∧ P₂) R ≡ σ(P₁) (σ(P₂) R) // Selection is commutative
π(A) (π(B) R) ≡ π(A) R if A ⊆ B // Projection elimination
```
## Implementation Status
### ✓ Phase 1: Core (Complete)
- [x] Relational algebra types (`Expr`, `Predicate`)
- [x] Schema system with type information
- [x] Backend trait abstraction
- [x] YottaDB backend (M code generation)
- [x] SQLite backend (SQL generation)
- [x] MongoDB backend (aggregation pipeline)
- [x] Type inference
- [x] Basic tests
### ⧗ Phase 2: Completeness (Planned)
- [ ] Full YottaDB FFI (currently just M code generation)
- [ ] PostgreSQL backend
- [ ] All algebra operations (currently: σ, π, ⨝ basic)
- [ ] Aggregation (γ) with GROUP BY
- [ ] Window functions
- [ ] Subqueries
- [ ] Advanced predicates (LIKE, IN, EXISTS)
### ⧗ Phase 3: Optimization (Future)
- [ ] Query optimization passes
- Selection pushdown
- Projection elimination
- Join reordering
- [ ] Cost-based backend selection
- [ ] Query plan visualization
- [ ] Macro-based DSL for ergonomics:
```rust
query!(SELECT name FROM users WHERE age > 25)
```
### ⧗ Phase 4: Advanced (Research)
- [ ] Distributed backends (Cassandra, etc.)
- [ ] Query federation (join across backends!)
- [ ] Streaming/incremental evaluation
- [ ] Approximate query processing
## Technical Challenges
### 1. Impedance Mismatch
Relational algebra assumes:
- Flat relations (2D tables)
- Set semantics (no duplicates)
- Unordered rows
But backends have:
- **YottaDB**: Hierarchical keys, ordered traversal
- **MongoDB**: Nested documents, duplicate-friendly
- **SQLite**: NULL semantics, bag (not set) semantics
**Solution:** Define canonical mapping with documented semantics:
- Hierarchical → Flatten to relation on read
- Documents → Enforce schema via validation
- Bags → Remove duplicates in algebra layer if needed
### 2. Type Systems Mismatch
| NULL | Native | `null` | `""` (empty string) |
| Boolean | 0/1 | `true`/`false` | 1/0 |
| Date | Text/Int | ISODate | String |
**Solution:** Canonical type system in `DataType` enum. Backends handle conversion.
### 3. Performance
Direct SQL is faster than algebra→SQL→execution.
**Mitigation:**
- Optimize algebra before compilation
- Backend-specific optimizations (e.g., SQLite can optimize compiled SQL)
- For YottaDB, generate efficient M code (minimize $ORDER calls)
### 4. Feature Coverage
Not all operations make sense everywhere:
- Hierarchical stores: How to represent JOIN?
- Document stores: How to handle relational constraints?
**Solution:** Backend capabilities matrix. Query fails at compile time if unsupported.
## Code Examples
### Basic Query
```rust
let query = Expr::relation("users", users_schema)
.select(Predicate::Compare {
left: ColumnRef::new("age"),
op: CompareOp::Gt,
right: Operand::Literal(Value::Integer(25)),
})
.project(vec!["name".to_string()]);
```
### Compilation
```rust
// To SQLite
let sqlite = SQLiteBackend::new();
let sql = sqlite.compile(&query)?;
println!("{}", sql.sql);
// SELECT name FROM (SELECT * FROM users WHERE age > ?)
// To YottaDB
let ydb = YottaDBBackend::new();
let m = ydb.compile(&query)?;
println!("{}", m.m_code);
// FOR SET id=$ORDER(^Users(id)) ...
```
### Full Execution
```rust
let mut conn = Connection::open(":memory:")?;
let results = sqlite.query(&mut conn, &query)?;
for row in results {
println!("{:?}", row);
}
```
## Future Work
### Short Term
1. Complete YottaDB FFI integration
2. Add compile-time column validation via proc macros
3. Implement all relational algebra operations
4. Query optimization passes
### Medium Term
1. Additional backends (Postgres, Redis, DuckDB)
2. Query plan introspection/visualization
3. Benchmark suite
4. Property-based testing (algebra laws)
### Long Term
1. Query federation (join SQLite + MongoDB!)
2. Distributed query execution
3. Adaptive query execution
4. Integration with Polars/Arrow for analytics
## Conclusion
**real-rs proves a controversial thesis:** Relational algebra can be a truly universal query abstraction, not just for SQL variants but for fundamentally different data models.
The YottaDB backend is the differentiator. No other "universal" query engine handles hierarchical storage.
This isn't production-ready, but it demonstrates:
1. ✓ Formal algebra works as abstraction
2. ✓ Compile-time type safety is achievable
3. ✓ Hierarchical DBs can be supported
4. ✓ Lightweight implementation is possible
5. ✓ Backend extensibility via traits
## References
- Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks"
- YottaDB Documentation: https://docs.yottadb.com/
- MongoDB Aggregation: https://www.mongodb.com/docs/manual/aggregation/
- SQLite Query Planner: https://www.sqlite.org/queryplanner.html
- Relational Algebra: https://en.wikipedia.org/wiki/Relational_algebra