phago 0.2.0

Self-evolving knowledge substrates through biological computing primitives
# Phago — Biological Computing Primitives

**Status: Beta / Production-Ready**

A framework that maps cellular biology mechanisms to computational operations. Agents self-organize, consume documents, build a Hebbian knowledge graph, share vocabulary, detect anomalies, and exhibit emergent collective behavior — all without top-down orchestration.

## Latest Results (Production Release)

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Tests passing | 32/34 | **99/99** | +67 tests, 100% pass rate |
| Graph edges (100 docs) | 255,888 | **4,472** | **-98.3%** density reduction |
| Best P@5 | 0.658 (TF-IDF) | **0.742** (Hybrid) | **+12.8%** |
| Best MRR | 0.714 (Graph) | **0.800** (Hybrid) | **+12.0%** |
| Genome parameters | 5 | **8** | +3 wiring strategy params |
| Query types | 1 | **5** | BFS, Hybrid, Path, Centrality, Bridge |
| MCP tools | 0 | **3** | remember, recall, explore |

## What It Does

Feed the colony documents. Agents digest them into concepts, wire a knowledge graph through co-activation (Hebbian learning), share vocabulary across agent boundaries (horizontal gene transfer), and detect anomalies (negative selection). The graph structure IS the memory — frequently used connections strengthen, unused ones decay.

```
Documents → Agents digest → Concepts extracted → Graph wired → Knowledge emerges
                ↑                                      ↓
                └──── Transfer, Symbiosis, Dissolution ─┘
```

## Quick Start

### Run the Demos

```bash
# Build
cargo build

# Run the proof-of-concept (120-tick simulation)
cargo run --bin phago-poc

# Run all tests (99 tests)
cargo test --workspace

# Open the interactive visualization (generated by POC)
open output/phago-colony.html
```

### Use as a Library

Add to your `Cargo.toml`:

```toml
[dependencies]
phago = { git = "https://github.com/Clemens865/Phago_Project.git" }
```

Basic usage with the prelude:

```rust
use phago::prelude::*;

fn main() {
    let mut colony = Colony::new();

    // Ingest documents
    colony.ingest_document("doc1", "Cell membrane transport proteins", Position::new(0.0, 0.0));
    colony.ingest_document("doc2", "Protein folding and membrane insertion", Position::new(1.0, 0.0));

    // Spawn digesters and run
    colony.spawn(Box::new(Digester::new(Position::new(0.0, 0.0)).with_max_idle(30)));
    colony.run(30);

    // Query with hybrid scoring
    let results = hybrid_query(&colony, "membrane protein", &HybridConfig {
        alpha: 0.5, max_results: 5, candidate_multiplier: 3,
    });

    for r in results {
        println!("{} (score: {:.3})", r.label, r.final_score);
    }
}
```

See [`docs/INTEGRATION_GUIDE.md`](docs/INTEGRATION_GUIDE.md) for complete examples and API reference.

### Production Features

- **Single import**: `use phago::prelude::*` gives you everything
- **Structured errors**: `Result<T, PhagoError>` with typed error categories
- **Deterministic testing**: `Digester::with_seed(pos, seed)` for reproducible simulations
- **Session persistence**: Save/restore colony state across sessions (JSON or SQLite)
- **SQLite persistence**: `ColonyBuilder` with auto-save for production deployments
- **Async runtime**: `AsyncColony` with `TickTimer` for real-time visualization
- **MCP adapter**: Ready for external LLM/agent integration
- **Semantic embeddings**: Vector-based concept extraction (optional `semantic` feature)

### SQLite Persistence (Phase 10)

Enable durable storage with automatic save/load:

```toml
[dependencies]
phago-runtime = { version = "0.1", features = ["sqlite"] }
```

```rust
use phago_runtime::prelude::*;

// Create colony with persistent storage
let mut colony = ColonyBuilder::new()
    .with_persistence("knowledge.db")  // SQLite file
    .auto_save(true)                   // Save on drop
    .build()?;

// Use normally — persistence is automatic
colony.ingest_document("title", "content", Position::new(0.0, 0.0));
colony.run(100);
colony.save()?;  // Explicit save (also happens on drop)

// Later: reload with full state preserved
let colony2 = ColonyBuilder::new()
    .with_persistence("knowledge.db")
    .build()?;
```

### Async Runtime (Phase 10)

Enable controlled-rate simulation for visualization:

```toml
[dependencies]
phago-runtime = { version = "0.1", features = ["async"] }
```

```rust
use phago_runtime::prelude::*;
use phago_runtime::async_runtime::{run_in_local, TickTimer};

#[tokio::main]
async fn main() {
    let colony = Colony::new();

    // Fast async simulation
    run_in_local(colony, |ac| async move {
        ac.run_async(100).await
    }).await;

    // Or controlled tick rate for visualization
    let colony2 = Colony::new();
    run_in_local(colony2, |ac| async move {
        let mut timer = TickTimer::new(100);  // 100ms per tick
        timer.run_timed(&ac, 50).await;
    }).await;
}
```

### Semantic Embeddings (Phase 9)

Enable vector embeddings for semantic understanding:

```toml
[dependencies]
phago = { version = "0.1", features = ["semantic"] }
```

```rust
use phago::prelude::*;
use std::sync::Arc;

// Create an embedder (SimpleEmbedder or API-backed)
let embedder: Arc<dyn Embedder> = Arc::new(SimpleEmbedder::new(256));

// SemanticDigester uses embeddings for concept extraction
let mut digester = SemanticDigester::new(Position::new(0.0, 0.0), embedder.clone());
let concepts = digester.digest_text("The mitochondria is the powerhouse of the cell.".into());

// Find semantically similar concepts
let similar = digester.find_similar("cellular energy", 5);
```

The `semantic` feature adds:
- **SimpleEmbedder** — Hash-based embeddings (no dependencies)
- **SemanticDigester** — Embedding-backed agent for semantic concept extraction
- **Chunker** — Document chunking with configurable overlap
- **Similarity functions** — cosine_similarity, euclidean_distance, normalize_l2

### LLM Integration (Phase 9.2)

Enable LLM-backed concept extraction:

```toml
[dependencies]
# Enable exactly one of the following lines; a TOML table cannot define `phago` twice.

# Local LLM (Ollama)
phago = { version = "0.1", features = ["llm-local"] }

# Cloud APIs (Claude, OpenAI)
# phago = { version = "0.1", features = ["llm-api"] }

# All backends
# phago = { version = "0.1", features = ["llm-full"] }
```

```rust,ignore
use phago::prelude::*;

// Local Ollama backend (no API key needed)
let ollama = OllamaBackend::localhost().with_model("llama3.2");
let concepts = ollama.extract_concepts("Cell membrane transport").await?;

// Claude backend
let claude = ClaudeBackend::new("sk-ant-...").sonnet();
let concepts = claude.extract_concepts("Cell membrane transport").await?;

// OpenAI backend
let openai = OpenAiBackend::new("sk-...").gpt4o_mini();
let concepts = openai.extract_concepts("Cell membrane transport").await?;
```

The `llm` features add:
- **OllamaBackend** — Local LLM via Ollama (no API key needed)
- **ClaudeBackend** — Anthropic Claude API
- **OpenAiBackend** — OpenAI GPT API
- **LlmBackend trait** — Common interface for all backends
- **Concept extraction** — Extract structured concepts from text
- **Relationship identification** — Find relationships between concepts
- **Query expansion** — Expand queries for better recall

## The Ten Biological Primitives

| Primitive | Biological Analog | What It Does |
|-----------|-------------------|-------------|
| **DIGEST** | Phagocytosis | Consume input, extract fragments, present to graph |
| **APOPTOSE** | Programmed cell death | Self-assess health, gracefully self-terminate |
| **SENSE** | Chemotaxis | Detect signals, follow gradients |
| **TRANSFER** | Horizontal gene transfer | Export/import vocabulary between agents |
| **EMERGE** | Quorum sensing | Detect threshold, activate collective behavior |
| **WIRE** | Hebbian learning | Strengthen used connections, prune unused |
| **SYMBIOSE** | Endosymbiosis | Integrate another agent as permanent symbiont |
| **STIGMERGE** | Stigmergy | Coordinate through environmental traces |
| **NEGATE** | Negative selection | Learn self-model, detect anomalies by exclusion |
| **DISSOLVE** | Holobiont boundary | Modulate agent-substrate boundaries |

## Agent Types

- **Digester** — Consumes documents, extracts keywords, presents concepts to the knowledge graph. Implements DIGEST + SENSE + APOPTOSE + TRANSFER + SYMBIOSE + DISSOLVE.
- **Synthesizer** — Dormant until quorum reached, then identifies bridge concepts and topic clusters. Implements EMERGE + SENSE + APOPTOSE.
- **Sentinel** — Learns what "normal" looks like, flags anomalies by deviation from self-model. Implements NEGATE + SENSE + APOPTOSE.
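
To make the Sentinel's "detect by exclusion" idea concrete, here is a minimal sketch of negative selection over a vocabulary. It is an illustration only: `SelfModel`, `tokens`, the unfamiliarity ratio, and the threshold are assumptions made for this example, not Phago's actual Sentinel implementation.

```rust
use std::collections::HashSet;

/// Lowercased alphanumeric tokens; a deliberately simple tokenizer for the sketch.
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical self-model: the set of terms seen in "normal" input.
struct SelfModel {
    known: HashSet<String>,
}

impl SelfModel {
    fn new() -> Self {
        Self { known: HashSet::new() }
    }

    /// Add every token of a normal document to the self-model.
    fn learn(&mut self, text: &str) {
        self.known.extend(tokens(text));
    }

    /// Fraction of tokens the self-model has never seen (0.0 for empty input).
    fn unfamiliarity(&self, text: &str) -> f64 {
        let toks = tokens(text);
        if toks.is_empty() {
            return 0.0;
        }
        let unknown = toks.iter().filter(|t| !self.known.contains(*t)).count();
        unknown as f64 / toks.len() as f64
    }

    /// Flag input whose unfamiliar fraction exceeds the (assumed) threshold.
    fn is_anomalous(&self, text: &str, threshold: f64) -> bool {
        self.unfamiliarity(text) > threshold
    }
}
```

After `learn("Cell membrane transport proteins")`, a text such as "membrane proteins" scores 0.0, while a text made mostly of unseen terms scores close to 1.0 and is flagged.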

## Research Branches

Four falsifiable hypotheses, each with a working prototype, benchmark, visualization, and papers.

### 1. Bio-RAG — Self-Reinforcing Retrieval

Hebbian-reinforced knowledge graph retrieval with hybrid scoring (TF-IDF + graph re-ranking).

```bash
cargo run --bin phago-bio-rag-demo
```

| Metric | Graph-only | TF-IDF | **Hybrid** |
|--------|-----------|--------|------------|
| P@5 | 0.280 | 0.742 | **0.742** |
| MRR | 0.650 | 0.775 | **0.800** |
| NDCG@10 | 0.357 | 0.404 | **0.410** |
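
For readers unfamiliar with the metrics in this table, the sketch below shows how P@k and MRR are commonly computed under binary relevance. The function names are chosen for this example and are not part of Phago's API; the benchmark's own evaluation code may differ in details such as tie handling.

```rust
use std::collections::HashSet;

/// Precision at k: fraction of the top-k results that are relevant.
/// Divides by k even when fewer than k results were returned.
fn precision_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let hits = ranked
        .iter()
        .take(k)
        .filter(|id| relevant.contains(*id))
        .count();
    hits as f64 / k as f64
}

/// Reciprocal rank: 1 / (1-based position of the first relevant result),
/// or 0.0 when no relevant result appears.
fn reciprocal_rank(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|id| relevant.contains(id))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// MRR: mean of the per-query reciprocal ranks (0.0 for an empty set of queries).
fn mean_reciprocal_rank(reciprocal_ranks: &[f64]) -> f64 {
    if reciprocal_ranks.is_empty() {
        return 0.0;
    }
    reciprocal_ranks.iter().sum::<f64>() / reciprocal_ranks.len() as f64
}
```

With a ranking `["b", "a", "c", "d", "e"]` and relevant set `{a, c}`, P@5 is 2/5 and the reciprocal rank is 1/2, because the first relevant result sits at position 2.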

**Key insight:** The graph's value is not in replacing TF-IDF but in *re-ranking* candidates using structural context. Hybrid scoring beats pure TF-IDF on MRR (first relevant result ranked higher).
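
The re-ranking idea can be sketched in two stages: select candidates by TF-IDF, then blend in a graph score. The field names mirror `HybridConfig` (`alpha`, `max_results`, `candidate_multiplier`), but the blend formula, the direction of `alpha`, and the `Candidate` type are assumptions made for illustration; the actual `hybrid_query` may combine scores differently.

```rust
/// Hypothetical candidate record; not Phago's actual result type.
#[derive(Debug, Clone)]
struct Candidate {
    label: String,
    tfidf: f64, // lexical score, assumed normalized to [0, 1]
    graph: f64, // structural score, assumed normalized to [0, 1]
}

/// Two-stage hybrid ranking (assumed formula):
/// final = alpha * tfidf + (1 - alpha) * graph
fn hybrid_rerank(
    mut candidates: Vec<Candidate>,
    alpha: f64,
    max_results: usize,
    candidate_multiplier: usize,
) -> Vec<(String, f64)> {
    // Stage 1: keep the top (max_results * candidate_multiplier) by TF-IDF.
    candidates.sort_by(|a, b| b.tfidf.total_cmp(&a.tfidf));
    candidates.truncate(max_results.saturating_mul(candidate_multiplier));

    // Stage 2: blend lexical and structural scores, then re-sort.
    let mut scored: Vec<(String, f64)> = candidates
        .into_iter()
        .map(|c| {
            let score = alpha * c.tfidf + (1.0 - alpha) * c.graph;
            (c.label, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(max_results);
    scored
}
```

Because the graph score only reorders documents that TF-IDF already surfaced, a document with a strong graph score but a weak lexical match never enters the result list. That matches the claim above that the graph re-ranks rather than replaces TF-IDF.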

### 2. Agent Evolution — Evolutionary Agents Through Apoptosis

Agents evolving through intrinsic selection pressure (death + mutation + inheritance) produce richer knowledge graphs.

```bash
cargo run --bin phago-agent-evolution-demo
```

| Metric (tick 300) | Evolved | Static | Random |
|-------------------|---------|--------|--------|
| Nodes | 1,582 | 864 | 1,191 |
| Edges | 101,824 | 8,769 | 38,399 |
| Clustering coeff. | 0.969 | 0.948 | 0.970 |
| Spawns / Generations | 140 / 135 | 0 / 0 | 144 / 144 |

### 3. KG Training — Knowledge Graph to Training Data

Hebbian-weighted triples with curriculum ordering for language model fine-tuning.

```bash
cargo run --bin phago-kg-training-demo
```

| Metric | Value |
|--------|-------|
| Communities detected | 548 |
| NMI vs ground truth | 0.170 |
| Triples exported | 252,641 |
| Foundation coherence | 100% same-community |
| Weight ratio (foundation/periphery) | 1.3x |

### 4. Agentic Memory — Persistent Code Knowledge

Self-organizing code knowledge graph that persists across sessions.

```bash
cargo run --bin phago-agentic-memory-demo
```

| Metric | Value |
|--------|-------|
| Code elements extracted | 830 |
| Graph nodes / edges | 659 / 33,490 |
| Session persistence | 100% fidelity |
| Graph P@5 | 0.140 |

## New Features (Ralph Loop Phase 1)

### Hebbian LTP Model (Tentative Edge Wiring)
- First co-occurrence creates edge at **0.1 weight** (tentative)
- Subsequent co-occurrences reinforce: `weight += 0.1`
- Single-document edges decay quickly under synaptic pruning
- Cross-document reinforced edges survive
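
The wiring rule above can be sketched with a plain hash map. The 0.1 initial weight and 0.1 increment come from the list; the multiplicative decay factor, the pruning threshold, and the `EdgeTable` type are assumptions for this example and do not reflect Phago's internal graph representation.

```rust
use std::collections::HashMap;

const TENTATIVE_WEIGHT: f64 = 0.1; // first co-occurrence (from the list above)
const REINFORCEMENT: f64 = 0.1; // each later co-occurrence (from the list above)

/// Hypothetical undirected edge store keyed by an ordered pair of labels.
struct EdgeTable {
    weights: HashMap<(String, String), f64>,
}

impl EdgeTable {
    fn new() -> Self {
        Self { weights: HashMap::new() }
    }

    /// Order the pair so (a, b) and (b, a) address the same edge.
    fn key(a: &str, b: &str) -> (String, String) {
        if a <= b {
            (a.to_string(), b.to_string())
        } else {
            (b.to_string(), a.to_string())
        }
    }

    /// Create a tentative edge on first co-activation, reinforce it afterwards.
    fn co_activate(&mut self, a: &str, b: &str) {
        self.weights
            .entry(Self::key(a, b))
            .and_modify(|w| *w += REINFORCEMENT)
            .or_insert(TENTATIVE_WEIGHT);
    }

    /// Multiply every weight by `decay` (assumed rule), then drop edges below `threshold`.
    fn decay_and_prune(&mut self, decay: f64, threshold: f64) {
        self.weights.retain(|_, w| {
            *w *= decay;
            *w >= threshold
        });
    }

    fn weight(&self, a: &str, b: &str) -> Option<f64> {
        self.weights.get(&Self::key(a, b)).copied()
    }
}
```

With a decay factor of 0.5 and a threshold of 0.1, an edge seen once drops to 0.05 and is pruned, while an edge seen three times drops from 0.3 to 0.15 and survives. This is the behavior the last two bullets describe.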

### Multi-Objective Fitness
4-dimensional evolution:
- **30% Productivity** — concepts + edges per tick
- **30% Novelty** — novel concepts / total concepts
- **20% Quality** — strong edges (co_act ≥ 2) / total edges
- **20% Connectivity** — bridge edges / total edges
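
A minimal sketch of the weighted sum follows. The four weights and the ratio definitions are taken from the list above; the `AgentStats` struct and the normalization of productivity against a caller-supplied maximum are assumptions, since the list does not say how the per-tick rate is scaled.

```rust
/// Hypothetical per-agent counters; field names are illustrative.
struct AgentStats {
    concepts: usize,
    novel_concepts: usize,
    edges: usize,
    strong_edges: usize, // edges with co-activation count >= 2
    bridge_edges: usize,
    ticks: usize,
}

/// Safe ratio that returns 0.0 for an empty denominator.
fn ratio(numerator: usize, denominator: usize) -> f64 {
    if denominator == 0 {
        0.0
    } else {
        numerator as f64 / denominator as f64
    }
}

/// Weighted fitness in [0, 1]. `max_productivity` is an assumed normalizer
/// for the (concepts + edges) per tick rate.
fn fitness(stats: &AgentStats, max_productivity: f64) -> f64 {
    let raw_rate = ratio(stats.concepts + stats.edges, stats.ticks);
    let productivity = if max_productivity > 0.0 {
        (raw_rate / max_productivity).min(1.0)
    } else {
        0.0
    };
    let novelty = ratio(stats.novel_concepts, stats.concepts);
    let quality = ratio(stats.strong_edges, stats.edges);
    let connectivity = ratio(stats.bridge_edges, stats.edges);

    0.30 * productivity + 0.30 * novelty + 0.20 * quality + 0.20 * connectivity
}
```

For an agent with 10 concepts (5 novel), 30 edges (15 strong, 3 bridges) over 20 ticks and `max_productivity = 4.0`, the components are 0.5, 0.5, 0.5, and 0.1, giving 0.15 + 0.15 + 0.10 + 0.02 = 0.42.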

### Structural Queries
```rust
// Path queries — "What connects A to B?"
let path: Option<(Vec<NodeId>, f64)> = graph.shortest_path(&from, &to);

// Centrality queries — "What's most important?"
let central: Vec<(NodeId, f64)> = graph.betweenness_centrality(100);

// Bridge queries — "What concepts connect domains?"
let bridges: Vec<(NodeId, f64)> = graph.bridge_nodes(10);

// Component queries — "How many disconnected regions?"
let components: usize = graph.connected_components();
```

### MCP Integration
External LLMs/agents can interact via typed request/response API:
- `phago_remember(title, content, ticks)` — ingest document
- `phago_recall(query, max_results, alpha)` — hybrid query
- `phago_explore(type: path|centrality|bridges|stats)` — structural queries
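
One way to picture the typed request side of these three tools is as a Rust enum. The tool names, parameter names, and the four explore types come from the list above; the enum layout, the field types (`u64`, `usize`, `f64`), and the helper names are assumptions for this sketch and may not match the adapter in `phago-rag`.

```rust
use std::str::FromStr;

/// The four structural query types accepted by `phago_explore`.
#[derive(Debug, PartialEq)]
enum ExploreKind {
    Path,
    Centrality,
    Bridges,
    Stats,
}

impl FromStr for ExploreKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "path" => Ok(Self::Path),
            "centrality" => Ok(Self::Centrality),
            "bridges" => Ok(Self::Bridges),
            "stats" => Ok(Self::Stats),
            other => Err(format!("unknown explore type: {other}")),
        }
    }
}

/// Hypothetical typed request; field types are assumed.
#[derive(Debug, PartialEq)]
enum Request {
    Remember { title: String, content: String, ticks: u64 },
    Recall { query: String, max_results: usize, alpha: f64 },
    Explore { kind: ExploreKind },
}

impl Request {
    /// Name of the MCP tool that would handle this request.
    fn tool_name(&self) -> &'static str {
        match self {
            Request::Remember { .. } => "phago_remember",
            Request::Recall { .. } => "phago_recall",
            Request::Explore { .. } => "phago_explore",
        }
    }
}
```

Parsing the `type` argument up front means an unsupported value such as `"graph"` is rejected before any query runs.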

## Architecture

```
crates/
├── phago/            # Unified facade crate (use this!)
├── phago-cli/        # Command-line interface (ingest, query, stats, session)
├── phago-core/       # Traits (10 primitives) + shared types + error handling
├── phago-runtime/    # Colony, substrate, topology, corpus, sessions, SQLite, async
├── phago-agents/     # Digester, Sentinel, Synthesizer, SemanticDigester, genome, evolution
├── phago-embeddings/ # Vector embeddings (SimpleEmbedder, OnnxEmbedder, API providers)
├── phago-llm/        # LLM integration (Ollama, Claude, OpenAI)
├── phago-rag/        # Query engine, scoring, baselines, hybrid, MCP adapter
├── phago-viz/        # Self-contained HTML visualization (D3.js)
└── phago-wasm/       # WASM integration (future)
poc/
├── knowledge-ecosystem/   # Original proof of concept
├── bio-rag-demo/          # Branch 1: self-reinforcing RAG
├── agent-evolution-demo/  # Branch 2: evolutionary agents
├── kg-training-demo/      # Branch 3: KG → training data
├── agentic-memory-demo/   # Branch 4: persistent code knowledge
└── data/corpus/           # 100-doc test corpus (4 topics × 25 docs)
docs/papers/               # White papers + explainers for each branch
```

### Colony Lifecycle (per tick)

1. **Sense** — All agents observe substrate (signals, documents, traces)
2. **Act** — Colony processes agent actions (move, digest, present, wire)
3. **Transfer** — Agents export/integrate vocabulary, attempt symbiosis
4. **Dissolve** — Mature agents modulate boundaries, reinforce graph nodes
5. **Death** — Remove agents that self-assessed for termination
6. **Decay** — Signals, traces, and edge weights decay; weak edges pruned

### Key Design Choices

- **Rust ownership = biological resource management.** `move` semantics model consumption (you can't eat something twice). `Drop` models apoptosis. No garbage collector = deterministic death.
- **The graph IS the memory.** No separate storage layer. The topology of the knowledge graph, shaped by Hebbian learning, encodes all accumulated knowledge.
- **No LLMs in the loop.** The v0.1 primitives must prove emergence without external intelligence. LLM-backed concept extraction is optional and lives behind the `llm-local`, `llm-api`, and `llm-full` features.
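
The ownership analogy in the first bullet can be shown in plain Rust. The sketch below uses only the standard library; `Document`, `Cell`, and the shared log are illustrative types invented for this example, not Phago's own.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Something that can be consumed exactly once.
struct Document {
    title: String,
}

/// An agent that records its own death in a shared log.
struct Cell {
    name: String,
    log: Rc<RefCell<Vec<String>>>,
}

impl Cell {
    /// Takes the document by value: after this call the caller no longer owns it,
    /// so passing the same document again is rejected at compile time.
    fn digest(&self, doc: Document) -> usize {
        doc.title.split_whitespace().count()
    }
}

impl Drop for Cell {
    /// Runs at a deterministic point: the moment the cell goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("{} died", self.name));
    }
}
```

Calling `cell.digest(doc)` a second time with the same `doc` fails with a "use of moved value" error, and the log entry appears exactly when `cell` leaves its scope, with no collector deciding the timing.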

## Quantitative Proof (Phase 5)

Running `cargo run --bin phago-poc` produces metrics proving the model works:

| Metric | What It Proves |
|--------|---------------|
| **Transfer Effect** | Vocabulary sharing across agents (shared terms ratio, export/integration counts) |
| **Dissolution Effect** | Boundary modulation reinforces knowledge (concept vs non-concept access ratio) |
| **Graph Richness** | Colony builds meaningful structure (density, clustering coefficient, bridge concepts) |
| **Vocabulary Spread** | Knowledge propagates across agents (Gini coefficient of vocabulary sizes) |
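
The Gini coefficient used for Vocabulary Spread measures how unevenly a quantity is distributed: 0 means every agent has the same vocabulary size, and values near 1 mean a few agents hold almost all of it. The sketch below uses the standard sorted-rank formula; whether Phago's metrics module computes it in exactly this form is an assumption.

```rust
/// Gini coefficient of non-negative values using the sorted-rank formula:
/// G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
/// and x sorted in ascending order. Returns 0.0 for empty or all-zero input.
fn gini(values: &[f64]) -> f64 {
    let n = values.len();
    let total: f64 = values.iter().sum();
    if n == 0 || total <= 0.0 {
        return 0.0;
    }

    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Rank-weighted sum over the ascending values.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, x)| (i as f64 + 1.0) * x)
        .sum();

    let n = n as f64;
    2.0 * weighted / (n * total) - (n + 1.0) / n
}
```

Four agents with vocabulary sizes `[5, 5, 5, 5]` give 0.0, while `[0, 0, 0, 10]` gives 0.75, the maximum for four values.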

The POC also generates `output/phago-colony.html` — an interactive D3.js visualization with:
- Force-directed knowledge graph
- Agent spatial canvas
- Event timeline
- Metrics dashboard with tick slider

## Implementation Status

| Phase | Status | Description |
|-------|--------|-------------|
| 0 — Scaffold | ✅ Done | Workspace, 10 primitive traits, shared types |
| 1 — First Cell | ✅ Done | Digester agent, apoptosis, colony lifecycle |
| 2 — Self-Organization | ✅ Done | Chemotaxis, document ingestion, Hebbian wiring |
| 3 — Emergence | ✅ Done | Synthesizer (quorum sensing), Sentinel (negative selection) |
| 4 — Cooperation | ✅ Done | Transfer, Symbiosis, Dissolution |
| 5 — Prove It Works | ✅ Done | Metrics, visualization, hardening tests, performance optimization |
| 6 — Research Branches | ✅ Done | 4 branches with prototypes, benchmarks, papers |
| 7 — Production Ready | ✅ Done | Facade crate, preludes, error types, deterministic testing |
| 8 — Distribution | ✅ Done | Published to crates.io, CLI tool with all commands |
| 9.1 — Embeddings | ✅ Done | phago-embeddings crate, SemanticDigester agent |
| 9.2 — LLM Integration | ✅ Done | phago-llm crate (Ollama, Claude, OpenAI) |
| 9.3 — Vector Wiring | ✅ Done | SemanticWiringConfig, similarity-based edge weights |
| 10.1 — Agent Serialization | ✅ Done | SerializableAgent trait, session persistence with agents |
| 10.2 — SQLite Persistence | ✅ Done | ColonyBuilder, auto-save, WAL mode, full roundtrip |
| 10.3 — Async Runtime | ✅ Done | AsyncColony, TickTimer, run_in_local, spawn_simulation_local |

## Tests

```bash
# All tests
cargo test --workspace

# With all features (sqlite + async)
cargo test --workspace --features "sqlite,async"

# By category
cargo test --test transfer_tests       # Vocabulary export/import
cargo test --test symbiosis_tests      # Agent absorption
cargo test --test dissolution_tests    # Boundary modulation
cargo test --test phase4_integration   # Full colony integration
cargo test -p phago-runtime metrics    # Quantitative metrics
cargo test -p phago-viz                # HTML visualization

# Benchmarks (with features)
cargo test --release --features "sqlite,async" -p phago-runtime --test benchmarks -- --nocapture
```

### Phase 10 Benchmark Results

| Category | Metric | Result |
|----------|--------|--------|
| **Throughput** | Ticks/sec (small colony) | 733 |
| **SQLite** | Save/load time | <1ms |
| **Async** | Overhead vs sync | <5% |
| **Serialization** | 200 agents | 8µs |
| **Semantic wiring** | Overhead | ~11% |

## Documentation

- [`docs/INTEGRATION_GUIDE.md`](docs/INTEGRATION_GUIDE.md) — **How to use Phago** — installation, examples, API reference
- [`docs/papers/phago-whitepaper-v2.md`](docs/papers/phago-whitepaper-v2.md) — **Main whitepaper (v2.0)** — comprehensive technical paper
- [`docs/EXECUTIVE_SUMMARY.md`](docs/EXECUTIVE_SUMMARY.md) — Latest results and roadmap
- [`docs/COMPETITIVE_ANALYSIS.md`](docs/COMPETITIVE_ANALYSIS.md) — Where Phago wins vs traditional approaches
- [`docs/USE_CASES.md`](docs/USE_CASES.md) — Practical applications
- [`docs/WHITEPAPER.md`](docs/WHITEPAPER.md) — Original theoretical foundation
- [`docs/PRD.md`](docs/PRD.md) — Product requirements and specifications
- [`docs/BUILD_PLAN.md`](docs/BUILD_PLAN.md) — Phased implementation roadmap

### Research Papers

| Branch | White Paper | Explainer |
|--------|-----------|-----------|
| Bio-RAG | [`bio-rag-whitepaper.md`](docs/papers/bio-rag-whitepaper.md) | [`bio-rag-explainer.md`](docs/papers/bio-rag-explainer.md) |
| Agent Evolution | [`agent-evolution-whitepaper.md`](docs/papers/agent-evolution-whitepaper.md) | [`agent-evolution-explainer.md`](docs/papers/agent-evolution-explainer.md) |
| KG Training | [`kg-training-whitepaper.md`](docs/papers/kg-training-whitepaper.md) | [`kg-training-explainer.md`](docs/papers/kg-training-explainer.md) |
| Agentic Memory | [`agentic-memory-whitepaper.md`](docs/papers/agentic-memory-whitepaper.md) | [`agentic-memory-explainer.md`](docs/papers/agentic-memory-explainer.md) |

## License

MIT