koru-delta 3.0.1

The invisible database: causal, consistent, and everywhere—without configuration
# KoruDelta Design Philosophy

This document describes the design philosophy, core principles, and architectural decisions behind KoruDelta.

## Vision

KoruDelta is a **zero-configuration causal database** that combines:

- **Git-like versioning** - Every change is tracked in an immutable history
- **Redis-like simplicity** - Simple key-value API with no setup required
- **Distributed consistency** - Multi-node clusters that sync automatically

The goal is to make distributed, versioned data storage as easy as using a local hash map.

## Core Principles

### 1. Invisible Complexity

The underlying mathematical foundation (distinction calculus via koru-lambda-core) provides strong guarantees but should never be exposed to users.

**What users see:**
```rust
let db = KoruDelta::start().await?;
db.put("users", "alice", json!({"name": "Alice"})).await?;
```

**What happens internally:**
- JSON is serialized to bytes
- Bytes are mapped to distinctions
- Distinctions are synthesized into a content-addressed version ID
- Causal chains track the relationship between versions

Users don't need to understand this. They just store and retrieve data.
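
The internal steps above can be sketched in a few lines. This is only an illustration, not KoruDelta's implementation: `content_id`, `Version`, and `record_write` are hypothetical names, the distinction calculus is not modeled, and the standard library's `DefaultHasher` stands in for the real content hash so the sketch has no dependencies. It also links versions by content ID for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (ASSUMPTION: DefaultHasher replaces the real
/// cryptographic hash so the sketch needs only the standard library).
fn content_id(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical version record: an ID derived from content plus a causal link.
#[derive(Debug, Clone, PartialEq)]
struct Version {
    id: String,
    previous: Option<String>,
}

/// Serialized JSON -> bytes -> content-addressed ID -> causal link.
fn record_write(serialized_json: &str, previous: Option<&Version>) -> Version {
    let bytes = serialized_json.as_bytes();
    Version {
        id: content_id(bytes),
        previous: previous.map(|v| v.id.clone()),
    }
}
```

Calling `record_write` twice, passing the first result as `previous` to the second call, yields a two-element causal chain.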

### 2. History as a First-Class Citizen

Unlike traditional databases where history is an afterthought (if available at all), KoruDelta treats history as fundamental:

- Every write creates a new version
- All versions are retained
- Time-travel queries are built-in
- Diffs between versions are trivial

This enables:
- Complete audit trails
- Easy debugging ("how did we get here?")
- Rollback capabilities
- Causal consistency guarantees
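
To make the idea concrete, here is a minimal sketch of an append-only store with a time-travel read. `HistoryStore`, `Entry`, and `get_at` are hypothetical names rather than the KoruDelta API, and the sketch assumes the caller supplies non-decreasing timestamps per key.

```rust
use std::collections::HashMap;

/// One retained version of a key (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    timestamp: u64, // caller-supplied logical time (assumption)
    value: String,
}

/// Append-only store: writes push new entries and never overwrite old ones.
#[derive(Default)]
struct HistoryStore {
    log: HashMap<String, Vec<Entry>>,
}

impl HistoryStore {
    /// Every write creates a new version.
    fn put(&mut self, key: &str, timestamp: u64, value: &str) {
        self.log.entry(key.to_string()).or_default().push(Entry {
            timestamp,
            value: value.to_string(),
        });
    }

    /// Latest value for a key.
    fn get(&self, key: &str) -> Option<&str> {
        self.log.get(key)?.last().map(|e| e.value.as_str())
    }

    /// Time travel: the newest value written at or before `timestamp`.
    fn get_at(&self, key: &str, timestamp: u64) -> Option<&str> {
        self.log
            .get(key)?
            .iter()
            .rev()
            .find(|e| e.timestamp <= timestamp)
            .map(|e| e.value.as_str())
    }

    /// All retained versions, oldest first.
    fn history(&self, key: &str) -> &[Entry] {
        match self.log.get(key) {
            Some(entries) => entries.as_slice(),
            None => &[],
        }
    }
}
```

Because nothing is overwritten, rollback amounts to re-writing an earlier value as a new version, and an audit trail is simply the output of `history`.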

### 3. Zero Configuration

Starting a database should require exactly zero configuration:

```bash
kdelta start  # That's it
```

Joining a cluster should require exactly one piece of information:

```bash
kdelta start --join 192.168.1.100  # Join existing cluster
```

No config files, no schema definitions, no consensus tuning, no port configurations (unless you want them).

### 4. Universal Runtime

The same code runs everywhere:
- Linux, macOS, Windows
- Server, laptop, edge device
- Browser (via WASM)

This is achieved through Rust's cross-compilation and WASM support.

## API Design Principles

### Simplicity Over Power

Prefer simple APIs that cover 90% of use cases over complex APIs that cover 100%.

**Good:**
```rust
db.get("users", "alice").await?
```

**Avoided:**
```rust
db.get_with_options("users", "alice", GetOptions::builder()
    .consistency(ConsistencyLevel::Strong)
    .timeout(Duration::from_secs(5))
    .build())
```

### Async by Default

All public APIs are async, even where the current implementation is synchronous. This keeps the API:
- Future-proof for network operations
- Consistent regardless of deployment mode
- Compatible with async ecosystems (Tokio)

### Explicit Errors

All fallible operations return `Result<T, DeltaError>`. No silent failures, no panics in library code.

```rust
pub enum DeltaError {
    KeyNotFound { namespace: String, key: String },
    SerializationError(serde_json::Error),
    StorageError(String),
    // ...
}
```
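
A caller can then branch on the specific failure instead of guessing. The sketch below uses a simplified, standard-library-only version of the enum (the `SerializationError` variant is omitted because it wraps a `serde_json` type); `lookup` and `describe` are hypothetical helpers, not part of the KoruDelta API.

```rust
use std::collections::HashMap;
use std::fmt;

/// Simplified stand-in for the error enum shown above (assumption).
#[derive(Debug, PartialEq)]
enum DeltaError {
    KeyNotFound { namespace: String, key: String },
    StorageError(String),
}

impl fmt::Display for DeltaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeltaError::KeyNotFound { namespace, key } => {
                write!(f, "key not found: {namespace}/{key}")
            }
            DeltaError::StorageError(msg) => write!(f, "storage error: {msg}"),
        }
    }
}

impl std::error::Error for DeltaError {}

/// Hypothetical lookup: a missing key is reported as a typed error, not a panic.
fn lookup(
    store: &HashMap<(String, String), String>,
    namespace: &str,
    key: &str,
) -> Result<String, DeltaError> {
    store
        .get(&(namespace.to_string(), key.to_string()))
        .cloned()
        .ok_or_else(|| DeltaError::KeyNotFound {
            namespace: namespace.to_string(),
            key: key.to_string(),
        })
}

/// Callers match on the variant they care about and fall back for the rest.
fn describe(result: &Result<String, DeltaError>) -> String {
    match result {
        Ok(value) => format!("found {value}"),
        Err(DeltaError::KeyNotFound { namespace, key }) => {
            format!("missing {namespace}/{key}")
        }
        Err(other) => format!("failed: {other}"),
    }
}
```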

## Architecture Decisions

### Why Content-Addressed Versioning?

Each version has two identifiers:
- **`distinction_id`**: SHA256 hash of content (content-addressed)
- **`write_id`**: `{distinction_id}_{timestamp_nanos}` (write-addressed)

Benefits:
- **Deduplication**: Identical values share the same `distinction_id` (value store)
- **Complete History**: Every write has unique `write_id` (version store)
- **Integrity**: Corruption is detectable
- **Distribution**: Natural merge semantics for sync
- **Causal Chains**: `previous_version` links via `write_id` (not content hash)

### Why Immutable History?

All history is append-only with dual identification:
- **Value Store**: Maps `distinction_id` → value (deduplication)
- **Version Store**: Maps `write_id` → VersionedValue (complete history)

Benefits:
- **Audit**: Complete provenance of all changes (even rewrites of same value)
- **Time travel**: Query any historical state via causal graph traversal
- **Concurrency**: No locks needed for reads
- **Deduplication**: Same content stored once, referenced many times
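
The interplay of the two stores can be sketched as follows. This is an illustration under stated assumptions, not the actual storage layer: `Store`, `heads`, and `content_hash` are hypothetical names, timestamps are passed in by the caller, and `DefaultHasher` stands in for the real content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for the content hash (ASSUMPTION: not the real hash function).
fn content_hash(value: &str) -> String {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

#[derive(Debug, Clone)]
struct VersionedValue {
    write_id: String,
    distinction_id: String,
    previous_version: Option<String>, // links via write_id
}

#[derive(Default)]
struct Store {
    values: HashMap<String, String>,           // distinction_id -> value
    versions: HashMap<String, VersionedValue>, // write_id -> version
    heads: HashMap<String, String>,            // key -> latest write_id (hypothetical)
}

impl Store {
    /// Append a write; returns its unique write_id.
    fn put(&mut self, key: &str, value: &str, timestamp_nanos: u128) -> String {
        let distinction_id = content_hash(value);
        let write_id = format!("{distinction_id}_{timestamp_nanos}");

        // Value store: identical content is stored only once.
        self.values
            .entry(distinction_id.clone())
            .or_insert_with(|| value.to_string());

        // Version store: every write is recorded and linked to its parent.
        let previous_version = self.heads.insert(key.to_string(), write_id.clone());
        self.versions.insert(
            write_id.clone(),
            VersionedValue {
                write_id: write_id.clone(),
                distinction_id,
                previous_version,
            },
        );
        write_id
    }
}
```

Writing the same value twice at different timestamps leaves one entry in `values` and two in `versions`, which is the deduplication and complete-history split described above.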

### Why DashMap?

We use `DashMap` (a sharded concurrent hash map) for:
- Thread-safe concurrent access
- Typically better throughput than a single `RwLock<HashMap>` under contention
- Simpler code than manual locking

### Why JSON?

We chose JSON as the data format because:
- Universal (every language has JSON support)
- Human-readable (easy debugging)
- Flexible (no schema required)
- Good enough performance for most use cases

## Distinction-Driven Architecture

KoruDelta is evolving toward a **distinction calculus system** that captures the emergent behavior of distinctions.

### Core Insight

The system doesn't just store data—it tracks the **becoming** of distinctions:
- **Synthesis**: New distinctions emerge from prior ones (causal graph)
- **Reference**: Distinctions point to other distinctions (reference graph)
- **Memory**: Like a brain, distinctions flow through layers (Hot → Warm → Cold → Deep)
- **Evolution**: Unfit distinctions are archived, essence is preserved (distillation)

### Two IDs, Two Purposes

```rust
struct VersionedValue {
    write_id: String,        // Unique per write: "{hash}_{timestamp_nanos}"
    distinction_id: String,  // Content hash: SHA256(value)
    previous_version: Option<String>, // Links via write_id
    // ...
}
```

- **`write_id`**: Enables complete history—even writing the same value 100 times creates 100 unique writes
- **`distinction_id`**: Enables deduplication—identical values share storage

### The Causal Graph

The causal graph is the **source of truth** for history:
- Nodes are `write_id`s (every write)
- Edges represent causality (parent → child)
- Traversal yields complete history
- Time travel queries navigate this graph
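
A minimal sketch of such a traversal, assuming each node stores its value, a timestamp, and the `write_id` of its parent. `Node`, `history`, and `value_at` are hypothetical names used for illustration only.

```rust
use std::collections::HashMap;

/// One write in the causal graph (hypothetical shape).
struct Node {
    value: String,
    timestamp: u64,
    previous_version: Option<String>, // parent write_id
}

/// Walk parent links from `head` back to the first write.
/// Returns write_ids, newest first.
fn history(graph: &HashMap<String, Node>, head: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cursor = Some(head.to_string());
    while let Some(id) = cursor {
        match graph.get(&id) {
            Some(node) => {
                cursor = node.previous_version.clone();
                out.push(id);
            }
            None => break, // dangling link: stop rather than panic
        }
    }
    out
}

/// Time travel: the value of the newest write at or before `timestamp`.
fn value_at<'a>(graph: &'a HashMap<String, Node>, head: &str, timestamp: u64) -> Option<&'a str> {
    let mut cursor = graph.get(head);
    while let Some(node) = cursor {
        if node.timestamp <= timestamp {
            return Some(node.value.as_str());
        }
        cursor = node.previous_version.as_ref().and_then(|id| graph.get(id));
    }
    None
}
```

Both functions follow the same parent-to-child edges in reverse; a time-travel query is just a history walk that stops early.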

## Local Causal Agent (LCA) Design

KoruDelta implements the **Local Causal Agent** pattern, where every component is an agent with a local causal perspective in a unified field.

### The Core Formula

All operations follow:
```
ΔNew = ΔLocal_Root ⊕ ΔAction
```

This is not just documentation—it's the actual implementation pattern:

```rust
// Every agent has a local root (its causal perspective)
local_root: Distinction,

// Every operation synthesizes action with local root
let action_distinction = action.to_canonical_structure(engine);
let new_root = engine.synthesize(&local_root, &action_distinction);
self.local_root = new_root.clone();
```

### Why LCA?

**1. Deterministic Identity**
- Same action + same root = same distinction ID
- Content-addressed (Blake3 hash)
- No UUIDs, no randomness

**2. Complete Audit Trail**
- Every operation leaves a causal trace
- Query: "How did we get here?"
- Answer: Follow the synthesis chain

**3. Composable Agents**
- Agents combine through synthesis
- Cross-agent causality is natural
- `orchestrator.synthesize_cross_agent(&["agent1", "agent2"], action)`

**4. Universal Addressing**
- Distinction IDs are universal
- Same data = same ID on any node
- Natural for distributed systems
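
The deterministic-identity property can be demonstrated with a toy synthesis function. This is a sketch only: `Distinction` here is a plain wrapper around a `u64`, and `synthesize`, `action_distinction`, and `replay` are hypothetical stand-ins built on `DefaultHasher`, not the koru-lambda-core API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy distinction: just an ID (ASSUMPTION: the real type is richer).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Distinction(u64);

/// Stand-in for synthesis: a pure function of (local_root, action).
fn synthesize(local_root: &Distinction, action: &Distinction) -> Distinction {
    let mut hasher = DefaultHasher::new();
    local_root.hash(&mut hasher);
    action.hash(&mut hasher);
    Distinction(hasher.finish())
}

/// Map an action description to a distinction (hypothetical encoding).
fn action_distinction(description: &str) -> Distinction {
    let mut hasher = DefaultHasher::new();
    description.hash(&mut hasher);
    Distinction(hasher.finish())
}

/// Apply a sequence of actions: each step is new_root = root (+) action.
fn replay(root: Distinction, actions: &[&str]) -> Distinction {
    actions
        .iter()
        .fold(root, |r, a| synthesize(&r, &action_distinction(a)))
}
```

Replaying the same actions from the same root on two different nodes produces the same final root with no coordination, which is what makes the IDs usable as universal addresses.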

### Interior Mutability Pattern

For ergonomic APIs, agents use interior mutability:

```rust
// Internal: RwLock for local_root
local_root: RwLock<Distinction>,

// Public: &self API
pub fn do_something(&self, data: Data) -> Result<Distinction> {
    // Synthesize internally
    let new_root = self.synthesize_action(data)?;
    // Clone so the new root can be both stored and returned
    *self.local_root.write().unwrap() = new_root.clone();
    Ok(new_root)
}
```

This preserves the simple `&self` API while following LCA internally.

### The Unified Field

All agents share one `DistinctionEngine` (the "field"):

```
┌─────────────────────────────────────┐
│       DistinctionEngine             │  ← The unified field
│  (single instance, shared by all)   │
└─────────────────────────────────────┘
         │         │         │
    ┌────┘    ┌────┘    ┌────┘
┌───┴───┐ ┌───┴───┐ ┌───┴───┐
│Storage│ │Vector │ │Identity│  ← Agents with local roots
│ Agent │ │ Agent │ │ Agent  │
└───────┘ └───────┘ └────────┘
```

Each agent has its own `local_root` (perspective), but they all synthesize into the same field.
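
One way to picture this sharing is an `Arc` around a single engine, with each agent guarding its own root behind an `RwLock`. The types below are simplified stand-ins: `DistinctionEngine` here only hashes and counts calls, distinctions are reduced to `u64` IDs, and `Agent::apply` is a hypothetical method.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Toy "field": one instance shared by every agent (ASSUMPTION: simplified).
#[derive(Default)]
struct DistinctionEngine {
    syntheses: AtomicUsize, // how many syntheses the field has performed
}

impl DistinctionEngine {
    fn synthesize(&self, a: u64, b: u64) -> u64 {
        self.syntheses.fetch_add(1, Ordering::Relaxed);
        let mut hasher = DefaultHasher::new();
        (a, b).hash(&mut hasher);
        hasher.finish()
    }
}

/// An agent: a handle to the shared engine plus its own local root.
struct Agent {
    engine: Arc<DistinctionEngine>,
    local_root: RwLock<u64>,
}

impl Agent {
    fn new(engine: Arc<DistinctionEngine>, seed: u64) -> Self {
        Agent { engine, local_root: RwLock::new(seed) }
    }

    /// `&self` API: hold the write lock across read-synthesize-store
    /// so concurrent callers cannot lose an update.
    fn apply(&self, action: u64) -> u64 {
        let mut root = self.local_root.write().expect("local_root lock poisoned");
        *root = self.engine.synthesize(*root, action);
        *root
    }

    fn root(&self) -> u64 {
        *self.local_root.read().expect("local_root lock poisoned")
    }
}
```

Two agents created from clones of the same `Arc` advance independent roots, while the engine's counter reflects the work of both.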

## Development Phases

### Phase 1: Single Node (Complete)

Foundation with all core features:
- Put/Get/History operations
- Time travel queries
- Visual diffs
- CLI tool
- Disk persistence

### Phase 2: Distribution (Complete)

Multi-node clustering:
- Peer discovery via gossip
- Automatic data sync
- Cluster health monitoring
- Join/leave operations

### Phase 3: Advanced Features (Complete)

Query engine and real-time features:
- Filter, sort, project, aggregate
- Materialized views
- Real-time subscriptions
- History queries

### Future Considerations

Potential future enhancements:
- Pluggable storage backends (RocksDB, SQLite, S3)
- Conflict resolution strategies
- Schema validation (optional)
- Web dashboard
- Cloud-managed service

## Trade-offs

### Consistency vs Availability

KoruDelta prioritizes **consistency**. In a network partition:
- Writes may fail if sync cannot be verified
- Reads return the last known consistent state

### Memory vs Disk

The current implementation keeps the working set in memory:
- Pros: Fast reads, simple implementation
- Cons: Limited by available RAM
- Future: Tiered storage with hot/cold separation

### Simplicity vs Features

We consciously limit features to maintain simplicity:
- No SQL (use the query API instead)
- No transactions (use versioning for consistency)
- No triggers (use subscriptions instead)

## References

- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical implementation details
- [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines
- [koru-lambda-core](https://github.com/swyrknt/koru-lambda-core) - Underlying distinction engine