# KoruDelta Architecture
This document describes the internal architecture, design decisions, and coding standards for KoruDelta.
## Overview
KoruDelta is a zero-configuration causal database built on top of [koru-lambda-core](https://github.com/swyrknt/koru-lambda-core). It provides Git-like versioning, Redis-like simplicity, and mathematical guarantees from distinction calculus.
## Architecture Layers
KoruDelta is architected in layers that enable distinction-driven operations:
```
┌─────────────────────────────────────────┐
│ KoruDelta Public API │ ← Simple, async interface
│ (put, get, history, get_at) │
├─────────────────────────────────────────┤
│ Vector Search Layer (v2) │ ← AI/ML embeddings
│ (SNSW, HNSW, CausalIndex) │
├─────────────────────────────────────────┤
│ Auth Layer (v2) │ ← Self-sovereign identity
│ (Identity, Session, Capability) │
├─────────────────────────────────────────┤
│ Reconciliation Layer (v2) │ ← Distributed sync
│ (MerkleTree, BloomFilter, WorldRec) │
├─────────────────────────────────────────┤
│ Evolutionary Processes (v2) │ ← Automated management
│ (Consolidation, Distillation, Genome) │
├─────────────────────────────────────────┤
│ Memory Tiering (v2) │ ← Hot/Warm/Cold/Deep
│ (LRU cache, chronicle, epochs, DNA) │
├─────────────────────────────────────────┤
│ Causal Storage Layer │ ← Versioning & history
│ (CausalGraph, ReferenceGraph) │
├─────────────────────────────────────────┤
│ Distinction Engine (core) │ ← Mathematical foundation
│ (DistinctionEngine, synthesis) │
└─────────────────────────────────────────┘
```
### Layer 1: Public API (`src/core.rs`)
The user-facing interface that abstracts away all internal complexity.
**Key Components:**
- `KoruDelta` - Main database handle
- `DatabaseStats` - Metrics and monitoring
**Design Principles:**
- Async-first (future-proof for distributed operations)
- Simple method signatures (namespace, key, value)
- Hide mathematical concepts from users
- Thread-safe via `Arc` cloning
**Example:**
```rust
let db = KoruDelta::start().await?;
db.put("users", "alice", json!({"name": "Alice"})).await?;
let value = db.get("users", "alice").await?;
```
### Layer 2: Vector Search (`src/vector/`)
Native vector storage and similarity search for AI embeddings. **Fully implemented in Phase 4.**
**Key Components:**
- `Vector` - Embedding with model metadata and serialization
- `VectorStorage` trait - Extension trait for `KoruDelta` adding `embed()` and `embed_search()`
- `SNSW` (Synthesis-Navigable Small World) - Distinction-based ANN search
- `HNSW` (Hierarchical Navigable Small World) - Traditional ANN implementation
- `CausalVectorIndex` - Time-travel vector search (query historical embeddings)
**Design Principles:**
- **Content-addressed**: Blake3 hash = vector identity (automatic deduplication)
- **Causal-aware**: Vector history preserved like any data
- **Time-travel search**: `similar_at()` queries embeddings at past timestamps
- **Model-agnostic**: Works with OpenAI, local models, any embedding
**Search Tiers (SNSW):**
```
🔥 Hot (Exact Cache) → O(1) hash lookup
🌤️ Warm-Fast → Beam search, low ef
🌤️ Warm-Thorough → Deep synthesis navigation
❄️ Cold (Exact) → Linear scan with proximity
```
**API Example:**
```rust
// Store embedding
let embedding = Vector::new(vec![0.1, 0.2, 0.3], "text-embedding-3-small");
db.embed("documents", "doc1", embedding, Some(json!({"title": "AI"}))).await?;
// Search similar vectors
let query = Vector::new(vec![0.1, 0.2, 0.3], "text-embedding-3-small");
let results = db.embed_search(Some("documents"), &query, VectorSearchOptions::new().top_k(5)).await?;
// Time-travel search (what was similar yesterday?)
let past_results = db.similar_at(Some("documents"), &query, "2026-02-01T00:00:00Z", opts).await?;
```
### Layer 3: Causal Storage (`src/storage.rs`)
Manages versioned key-value storage with complete causal history.
**Key Components:**
- `CausalStorage` - Storage engine with causal graph
- `VersionedValue` - Value + dual IDs (write_id, distinction_id)
- `CausalGraph` - Tracks all writes and their causal relationships
**Design Principles:**
- Immutable history (append-only, never overwrite)
- **Dual identification**: `write_id` (unique per write) + `distinction_id` (content hash)
- Thread-safe concurrent access via `DashMap`
- Time-travel queries by traversing causal graph
**Data Structures:**
```rust
current_state: DashMap<FullKey, VersionedValue> // Latest version per key
version_store: DashMap<WriteId, VersionedValue> // All versions by write_id
causal_graph: CausalGraph // Causal relationships
value_store: DashMap<DistinctionId, Arc<Value>> // Deduplicated values
```
**Version Linking:**
```
write_1 ← write_2 ← write_3 ← write_4 (current)
↑ ↑ ↑
distinction_id: hash(content)
write_id: hash + timestamp_nanos
previous_version: links via write_id
```
### Layer 4: Document Mapping (`src/mapper.rs`)
Bridge between JSON data and distinction structures.
**Key Components:**
- `DocumentMapper` - Stateless conversion utility
**Algorithm:**
1. Serialize JSON → canonical bytes
2. Map each byte → distinction (cached O(1) lookup)
3. Fold distinctions → single root distinction (deterministic)
**Properties:**
- Same JSON → same distinction ID (content-addressed)
- Deterministic (order-independent for objects, order-dependent for arrays)
- Efficient via koru-lambda-core's byte caching
### Layer 5: Memory Tiering ✅ (`src/memory/`)
Brain-like memory hierarchy for efficient resource usage. **Fully implemented in Phase 7.**
**Components:**
- `HotMemory` - LRU cache for frequently accessed distinctions (ACTIVE)
- `WarmMemory` - Recent chronicle with idle detection (ACTIVE)
- `ColdMemory` - Consolidated epochs with fitness filtering (ACTIVE)
- `DeepMemory` - Genomic storage for 1KB portable backups (ACTIVE)
**GET Cascade with Promotion:**
```
User → get(key)
↓
HotMemory? → Return (<1ms)
↓ No
WarmMemory? → Promote to Hot → Return
↓ No
ColdMemory? → Promote through tiers → Return
↓ No
CausalStorage → Promote to Hot → Return
```
**Background Processes:**
- **Consolidation** (5 min): Hot ↔ Warm ↔ Cold ↔ Deep
- **Distillation** (1 hour): Fitness-based selection
- **Genome Update** (daily): Extract causal topology
### Layer 6: Evolutionary Processes ✅ (`src/processes/`)
Automated memory management through natural selection. **Running in Phase 7.**
**Components:**
- `ConsolidationProcess` - Rhythmic movement between memory layers (5 min interval)
- `DistillationProcess` - Fitness-based natural selection (1 hour interval)
- `GenomeUpdateProcess` - DNA maintenance and disaster recovery (daily interval)
**Analogy:** Like sleep consolidating memories—unfit distinctions are archived, essence is preserved.
**Implementation:**
```rust
// Spawned on KoruDelta initialization
tokio::spawn(async move {
loop {
tokio::select! {
_ = interval.tick() => run_consolidation(),
_ = shutdown.changed() => break,
}
}
});
```
### Layer 7: Reconciliation (`src/reconciliation/`)
Efficient distributed sync via set reconciliation.
**Components:**
- `MerkleTree` - Hash tree for O(log n) set comparison
- `BloomFilter` - Probabilistic membership testing
- `WorldReconciliation` - Protocol for merging causal graphs
**Protocol:**
```
1. Exchange Merkle roots
2. If different, drill down to find differences
3. Send only missing distinctions
4. Merge causal graphs (conflicts become branches)
```
### Layer 8: Auth Layer (`src/auth/`)
Self-sovereign identity and capability-based authorization using distinctions.
**Key Components:**
- `Identity` - Mined identity with Ed25519 keys and proof-of-work
- `Session` - Authenticated session with derived encryption keys
- `Capability` - Signed permission grants (granter → grantee)
- `AuthManager` - High-level authentication coordinator
**Design Principles:**
- **Self-sovereign**: Users generate and own their keys
- **Distinction-based**: Auth state stored as `_auth:*` distinctions
- **Capability-based**: No roles, only explicit permission grants
- **Reconcilable**: Auth state syncs between nodes like any data
**Storage Layout:**
```
_auth:identity:{pubkey} → Identity (mined, proof-of-work)
_auth:capability:{id} → Capability (signed grant)
_auth:revocation:{cap_id} → Revocation (tombstone)
```
**Authentication Flow:**
```
1. Mine identity (proof-of-work, ~1s)
2. Store identity as distinction
3. Request challenge (ephemeral, 5min TTL)
4. Sign challenge with private key
5. Verify signature, create session
6. Session keys derived via HKDF-SHA256
```
### Foundation: Distinction Engine
Provided by `koru-lambda-core`, this gives us:
- **Deterministic synthesis**: `synthesize(a, b)` always produces same result
- **Content addressing**: Distinction IDs are SHA256 hashes
- **Thread-safety**: Lock-free concurrent operations via `DashMap`
- **Mathematical guarantees**: Five core axioms ensure consistency
## Key Data Types
### `FullKey` (`src/types.rs`)
Combines namespace + key into a single identifier.
```rust
FullKey {
namespace: "users",
key: "alice"
}
// Canonical: "users:alice"
```
### `VersionedValue` (`src/types.rs`)
Every write creates a versioned entry with dual identification:
```rust
VersionedValue {
value: JsonValue, // The actual data (Arc-wrapped)
timestamp: DateTime<Utc>, // When written (nanosecond precision)
write_id: String, // Unique per write: "{hash}_{timestamp_nanos}"
distinction_id: String, // Content hash (SHA256)
previous_version: Option<String>, // Causal link via write_id
}
```
**Dual ID Design:**
- `write_id` enables **complete history**—writing the same value 100 times = 100 unique writes
- `distinction_id` enables **deduplication**—same content shares storage in value_store
- `version_id()` returns `distinction_id` for content-addressing compatibility
### `HistoryEntry` (`src/types.rs`)
Simplified view for history queries (omits previous_version link).
## Concurrency Model
### Thread Safety
All core structures are thread-safe:
- `DistinctionEngine` - Uses `DashMap` for lock-free synthesis
- `CausalStorage` - Uses `DashMap` for lock-free state updates
- `KoruDelta` - Uses `Arc` for cheap cloning across threads
### Concurrent Writes
**Same Key:**
- Multiple threads can write to the same key concurrently
- Each write appends to the history log (thread-safe)
- Causal chain is maintained correctly via `DashMap` atomic operations
- All writes are recorded, none are lost
**Different Keys:**
- Fully parallel, no contention
- Each key has independent history
### Memory Model
- **Shared engine**: Single `DistinctionEngine` shared via `Arc`
- **Shared storage**: Single `CausalStorage` shared via `Arc`
- **Clone semantics**: `KoruDelta::clone()` is cheap (Arc increment)
## Error Handling
All errors use the `DeltaError` enum for type-safe matching.
### Error Types
```rust
pub enum DeltaError {
KeyNotFound { namespace, key },
NoValueAtTimestamp { namespace, key, timestamp },
SerializationError(serde_json::Error),
InvalidData { reason },
EngineError(String),
StorageError(String),
TimeError(String),
}
```
### Error Philosophy
- **Explicit errors**: No silent failures
- **Rich context**: Errors include relevant metadata (namespace, key, etc.)
- **Pattern matching**: Users can match on specific error variants
- **No panics**: Public API never panics (except for unrecoverable bugs)
## Code Style Guidelines
### General Principles
1. **Simplicity over cleverness** - Straightforward code beats clever code
2. **Documentation first** - Every public item has docs
3. **Test everything** - Comprehensive unit + integration tests
4. **Hide complexity** - Mathematical concepts stay internal
5. **Thread-safe by default** - All structures support concurrent access
### Documentation Standards
```rust
/// One-line summary (ends with period).
///
/// Detailed explanation with:
/// - Use cases
/// - Examples
/// - Thread safety notes
/// - Performance characteristics
///
/// # Example
///
/// ```ignore
/// let result = function_name(args);
/// ```
pub fn function_name() { }
```
### Naming Conventions
- **Modules**: `snake_case` (e.g., `causal_storage`)
- **Types**: `PascalCase` (e.g., `VersionedValue`)
- **Functions**: `snake_case` (e.g., `get_at`)
- **Constants**: `SCREAMING_SNAKE_CASE` (e.g., `MAX_HISTORY`)
### Module Organization
```
src/
├── lib.rs # Public API exports, crate docs
├── core.rs # KoruDelta main implementation
├── storage.rs # CausalStorage implementation
├── causal_graph.rs # Causal graph tracking
├── reference_graph.rs # Reference tracking for GC
├── mapper.rs # DocumentMapper implementation
├── types.rs # Shared data structures
├── error.rs # Error types
├── memory/ # Memory tiering (v2)
│ ├── hot.rs # Hot memory (LRU cache)
│ ├── warm.rs # Warm memory (chronicle)
│ ├── cold.rs # Cold memory (epochs)
│ └── deep.rs # Deep memory (genomic)
├── processes/ # Evolutionary processes (v2)
│ ├── consolidation.rs
│ ├── distillation.rs
│ └── genome_update.rs
├── reconciliation/ # Set reconciliation (v2)
│ ├── mod.rs # ReconciliationManager
│ ├── merkle.rs # Merkle trees
│ ├── bloom.rs # Bloom filters
│ └── world.rs # World reconciliation
└── auth/ # Self-sovereign authentication (v2)
├── mod.rs # Public API exports
├── types.rs # Identity, Session, Capability
├── identity.rs # Proof-of-work mining
├── verification.rs # Challenge-response
├── session.rs # Session management
├── capability.rs # Permission grants
├── storage.rs # Storage adapter
└── manager.rs # High-level API
```
### Testing Strategy
**Unit Tests:**
- Located in same file as implementation (`#[cfg(test)] mod tests`)
- Test individual functions and edge cases
- Fast, isolated, deterministic
**Integration Tests:**
- Located in `tests/` directory
- Test end-to-end workflows
- Test concurrency scenarios
- Test error conditions
**Test Naming:**
```rust
#[test]
fn test_<feature>_<scenario>() {
// Arrange
let db = setup();
// Act
let result = db.operation();
// Assert
assert_eq!(result, expected);
}
```
## Performance Characteristics
### Time Complexity
| `put()` | O(n) | n = bytes in value (JSON serialization + mapping) |
| `get()` | O(1) | HashMap lookup |
| `get_at()` | O(h) | h = history size (causal graph traversal) |
| `history()` | O(h) | h = history size |
| `contains()` | O(1) | HashMap lookup |
| **Reconciliation** |||
| `MerkleTree::diff()` | O(log n) | n = distinctions (best case) |
| `BloomFilter::might_contain()` | O(1) | k hash functions |
| `WorldReconciliation::reconcile()` | O(d) | d = differences |
| **Memory Tiering** |||
| `HotMemory::get()` | O(1) | LRU cache |
| `WarmMemory::get()` | O(1) | HashMap + disk |
| `ColdMemory::consolidate()` | O(n) | n = distinctions to consolidate |
### Space Complexity
- **Per key**: O(h) where h = number of versions
- **Total**: O(k × h̄) where k = keys, h̄ = average history size
- **Bloom filter**: O(-n·ln(p)/ln²(2)) bits for n items at FP rate p
- **Merkle tree**: O(n) nodes for n distinctions
### Sync Efficiency
| Identical sets | 32 bytes (root hash) | 100% |
| 1% difference | ~1% of data + tree overhead | 99% |
| 50% difference | ~50% of data | 50% |
| Bloom filter (1% FP) | ~1KB for 10K items | ~99% |
## Phase 2: Distribution (Complete)
Multi-node clustering is now implemented:
### Architecture
```
┌─────────────────────────────────────────┐
│ KoruDelta Public API │ ← Unchanged
├─────────────────────────────────────────┤
│ Cluster Layer │ ← NEW
│ (ClusterNode, PeerManager, Sync) │
├─────────────────────────────────────────┤
│ Causal Storage Layer │ ← Minor updates
├─────────────────────────────────────────┤
│ Distinction Engine (core) │ ← Unchanged
└─────────────────────────────────────────┘
```
### New Modules
- `network.rs` - TCP communication, message protocol
- `cluster.rs` - ClusterNode, peer management, gossip
### Features
1. **Join**: `kdelta start --join <ip>` joins an existing cluster
2. **Discovery**: Gossip protocol shares peer information
3. **Sync**: Full snapshot sync on join, incremental broadcast for writes
4. **Health**: Heartbeat pings track peer status
### Usage
```bash
# Start first node
kdelta start --port 7878
# Join from another machine
kdelta start --join 192.168.1.100:7878
```
## Phase 3: Queries, Views, and Subscriptions (Complete)
Phase 3 adds powerful query capabilities:
### Architecture
```
┌─────────────────────────────────────────┐
│ KoruDelta Public API │ ← Extended with query/view/subscription
│ (query, create_view, subscribe) │
├─────────────────────────────────────────┤
│ Query Layer │ ← NEW
│ (Filter, Query, Aggregation) │
├─────────────────────────────────────────┤
│ Views Layer │ ← NEW
│ (ViewManager, ViewDefinition) │
├─────────────────────────────────────────┤
│ Subscriptions Layer │ ← NEW
│ (SubscriptionManager, ChangeEvent) │
├─────────────────────────────────────────┤
│ Cluster Layer │ ← Unchanged
├─────────────────────────────────────────┤
│ Causal Storage Layer │ ← Extended with scan_collection
├─────────────────────────────────────────┤
│ Distinction Engine (core) │ ← Unchanged
└─────────────────────────────────────────┘
```
### New Modules
- `query.rs` - Filter, Query, Aggregation, QueryExecutor
- `views.rs` - ViewDefinition, ViewManager, ViewData
- `subscriptions.rs` - Subscription, SubscriptionManager, ChangeEvent
### Features
1. **Query Engine**
- Filter: Eq, Ne, Gt, Gte, Lt, Lte, Contains, Exists, Matches, And, Or, Not
- Projection: Select specific fields
- Sorting: Ascending/descending by field
- Limiting: Offset and limit results
- Aggregation: Count, Sum, Avg, Min, Max, Distinct, GroupBy
2. **Materialized Views**
- Create views with query definitions
- Auto-refresh on writes (optional)
- Manual refresh on demand
- List and delete views
3. **Subscriptions**
- Subscribe to all changes
- Subscribe to specific collection/key
- Filter by change type (insert, update, delete)
- Filter by value conditions
- `put_notify()` for writes with notifications
### Usage
```rust
// Query
let results = db.query("users", Query::new()
.filter(Filter::gt("age", 30))
.sort_by("name", true)
.limit(10)
).await?;
// Views
db.create_view(ViewDefinition::new("active_users", "users")
.with_query(Query::new().filter(Filter::eq("status", "active")))
).await?;
// Subscriptions
let (id, mut rx) = db.subscribe(Subscription::collection("users")).await;
```
## Future Extensions
### Phase 4: Cloud & Deployment
Potential features:
- Managed cloud service
- Kubernetes operator
- Auto-scaling clusters
- Multi-region replication
### Phase 5: Storage Backends
Current: In-memory only
Future: Pluggable backends
- Disk persistence (RocksDB, SQLite)
- Cloud storage (S3, etc.)
- Maintain same API regardless of backend
## Design Decisions
### Why Async?
Even though Phase 1 is synchronous under the hood, we use async APIs because:
1. **Future-proof**: Distribution (Phase 2) will be naturally async
2. **Consistency**: Same API from development to production
3. **Ecosystem**: Integrates with Tokio ecosystem
### Why Content-Addressed Versioning?
Distinction IDs (SHA256 hashes) provide:
1. **Deduplication**: Identical values share the same distinction
2. **Integrity**: Can verify data hasn't been corrupted
3. **Distribution**: Natural merge semantics for distributed sync
### Why Immutable History?
Append-only history gives us:
1. **Audit trails**: Complete provenance of all changes
2. **Time travel**: Query any point in the past
3. **Debugging**: Understand how state evolved
4. **Causal consistency**: Clear ordering of events
### Why Hide the Math?
The distinction calculus is powerful but abstract. We hide it because:
1. **Accessibility**: More developers can use the database
2. **Simplicity**: Users think in terms of keys and values
3. **Flexibility**: Can change internal implementation
4. **Marketing**: "Just works" is better than "read this paper"
The math is our **secret weapon**, not our pitch.
## Getting Started (Development)
### Build
```bash
cargo build
```
### Test
```bash
cargo test
cargo test --release # Optimized
```
### Documentation
```bash
cargo doc --open
```
### Benchmarks
```bash
cargo bench
```
Performance characteristics (benchmarked):
- **Read latency**: ~340ns (2.9M ops/sec)
- **Write throughput**: ~27K ops/sec
- **History query**: 4.3M elements/sec
## Contributing Guidelines
When adding new features:
1. **Design first**: Update this document with your design
2. **Test first**: Write tests before implementation
3. **Document**: Add comprehensive docs
4. **Simplify**: Can this API be simpler?
5. **Verify**: All tests must pass
For questions, see [DESIGN.md](DESIGN.md) for the product vision.
## References
- [DESIGN.md](DESIGN.md) - Product vision and roadmap
- [koru-lambda-core](https://github.com/swyrknt/koru-lambda-core) - Underlying engine
- [README.md](README.md) - User-facing introduction