phago 1.0.0

Self-evolving knowledge substrates through biological computing primitives
# Phago — Biological Computing Primitives

**Version 1.0.0 | Production-Ready**

A Rust framework that maps cellular biology mechanisms to computational operations. Agents self-organize, consume documents, build a Hebbian knowledge graph, share vocabulary, detect anomalies, and exhibit emergent collective behavior — all without top-down orchestration. Now with distributed multi-node sharding for horizontal scaling.

## Key Results (v1.0.0)

| Metric | Value | Notes |
|--------|-------|-------|
| Tests passing | **155+** | 100% pass rate across 14 crates |
| Graph edge reduction | **98.3%** | 256k to 4.5k via Hebbian LTP |
| Hybrid MRR | **0.800** | Beats TF-IDF (0.775) on first-result ranking |
| Hybrid P@5 | **0.742** | Matches TF-IDF precision |
| Evolved vs static edges | **11.6x** | Self-healing through agent evolution |
| Community detection NMI | **1.000** | Perfect topic recovery (Louvain) |
| Session persistence | **100%** | Full temporal state fidelity |
| Distributed shards | **3+** | Consistent hashing, ghost nodes, cross-shard queries |

## What It Does

Feed the colony documents. Agents digest them into concepts, wire a knowledge graph through co-activation (Hebbian learning), share vocabulary across agent boundaries (horizontal gene transfer), and detect anomalies (negative selection). The graph structure IS the memory — frequently used connections strengthen, unused ones decay.

```
Documents → Agents digest → Concepts extracted → Graph wired → Knowledge emerges
                ↑                                      ↓
                └──── Transfer, Symbiosis, Dissolution ─┘
```

## Quick Start

### Run the Demos

```bash
# Build
cargo build

# Run the proof-of-concept (120-tick simulation)
cargo run --bin phago-poc

# Run all tests
cargo test --workspace --exclude phago-python --exclude phago-web

# Build with distributed feature
cargo build -p phago --features distributed

# Run distributed benchmarks
cargo run --bin phago-bench -- quick

# Open the interactive visualization (generated by POC; `open` is macOS, use `xdg-open` on Linux)
open output/phago-colony.html
```

### Use as a Library

Add to your `Cargo.toml`:

```toml
[dependencies]
phago = { git = "https://github.com/Clemens865/Phago_Project.git" }

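# Or, with distributed support (use this instead of the line above;
# a manifest cannot declare the same dependency key twice)
# phago = { git = "https://github.com/Clemens865/Phago_Project.git", features = ["distributed"] }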
```

Basic usage with the prelude:

```rust
use phago::prelude::*;

fn main() {
    let mut colony = Colony::new();

    // Ingest documents
    colony.ingest_document("doc1", "Cell membrane transport proteins", Position::new(0.0, 0.0));
    colony.ingest_document("doc2", "Protein folding and membrane insertion", Position::new(1.0, 0.0));

    // Spawn digesters and run
    colony.spawn(Box::new(Digester::new(Position::new(0.0, 0.0)).with_max_idle(30)));
    colony.run(30);

    // Query with hybrid scoring
    let results = hybrid_query(&colony, "membrane protein", &HybridConfig {
        alpha: 0.5, max_results: 5, candidate_multiplier: 3,
    });

    for r in results {
        println!("{} (score: {:.3})", r.label, r.final_score);
    }
}
```

See [`docs/INTEGRATION_GUIDE.md`](docs/INTEGRATION_GUIDE.md) for complete examples and API reference.

### Production Features

- **Single import**: `use phago::prelude::*` gives you everything
- **Structured errors**: `Result<T, PhagoError>` with typed error categories
- **Deterministic testing**: `Digester::with_seed(pos, seed)` for reproducible simulations
- **Session persistence**: Save/restore colony state across sessions (JSON or SQLite)
- **SQLite persistence**: `ColonyBuilder` with auto-save for production deployments
- **Async runtime**: `AsyncColony` with `TickTimer` for real-time visualization
- **MCP adapter**: Ready for external LLM/agent integration
- **Semantic embeddings**: Vector-based concept extraction (optional `semantic` feature)
- **Distributed colony**: Multi-node sharding with consistent hashing (optional `distributed` feature)
- **Vector DB integration**: Qdrant, Pinecone, Weaviate adapters
- **Streaming ingestion**: Async channels with backpressure and file watching
- **Web dashboard**: Axum + D3.js real-time colony visualization
- **Python bindings**: PyO3 with LangChain and LlamaIndex adapters
- **Louvain communities**: Topic clustering with NMI = 1.0 on the synthetic benchmark

### SQLite Persistence (Phase 10)

Enable durable storage with automatic save/load:

```toml
[dependencies]
phago-runtime = { version = "1.0", features = ["sqlite"] }
```

```rust
use phago_runtime::prelude::*;

// Create colony with persistent storage
let mut colony = ColonyBuilder::new()
    .with_persistence("knowledge.db")  // SQLite file
    .auto_save(true)                   // Save on drop
    .build()?;

// Use normally — persistence is automatic
colony.ingest_document("title", "content", Position::new(0.0, 0.0));
colony.run(100);
colony.save()?;  // Explicit save (also happens on drop)

// Later: reload with full state preserved
let colony2 = ColonyBuilder::new()
    .with_persistence("knowledge.db")
    .build()?;
```

### Async Runtime (Phase 10)

Enable controlled-rate simulation for visualization:

```toml
[dependencies]
phago-runtime = { version = "1.0", features = ["async"] }
```

```rust
use phago_runtime::prelude::*;
use phago_runtime::async_runtime::{run_in_local, TickTimer};

#[tokio::main]
async fn main() {
    let colony = Colony::new();

    // Fast async simulation
    run_in_local(colony, |ac| async move {
        ac.run_async(100).await
    }).await;

    // Or controlled tick rate for visualization
    let colony2 = Colony::new();
    run_in_local(colony2, |ac| async move {
        let mut timer = TickTimer::new(100);  // 100ms per tick
        timer.run_timed(&ac, 50).await;
    }).await;
}
```

### Semantic Embeddings (Phase 9)

Enable vector embeddings for semantic understanding:

```toml
[dependencies]
phago = { version = "1.0", features = ["semantic"] }
```

```rust
use phago::prelude::*;
use std::sync::Arc;

// Create an embedder (SimpleEmbedder or API-backed)
let embedder: Arc<dyn Embedder> = Arc::new(SimpleEmbedder::new(256));

// SemanticDigester uses embeddings for concept extraction
let mut digester = SemanticDigester::new(Position::new(0.0, 0.0), embedder.clone());
let concepts = digester.digest_text("The mitochondria is the powerhouse of the cell.".into());

// Find semantically similar concepts
let similar = digester.find_similar("cellular energy", 5);
```

The `semantic` feature adds:
- **SimpleEmbedder** — Hash-based embeddings (no dependencies)
- **SemanticDigester** — Embedding-backed agent for semantic concept extraction
- **Chunker** — Document chunking with configurable overlap
- **Similarity functions** — cosine_similarity, euclidean_distance, normalize_l2

### LLM Integration (Phase 9.2)

Enable LLM-backed concept extraction:

```toml
[dependencies]
# Pick ONE of the following lines; a manifest cannot declare `phago` twice.

# Local LLM (Ollama)
phago = { version = "1.0", features = ["llm-local"] }

# Cloud APIs (Claude, OpenAI)
# phago = { version = "1.0", features = ["llm-api"] }

# All backends
# phago = { version = "1.0", features = ["llm-full"] }
```

```rust,ignore
use phago::prelude::*;

// Local Ollama backend (no API key needed)
let ollama = OllamaBackend::localhost().with_model("llama3.2");
let concepts = ollama.extract_concepts("Cell membrane transport").await?;

// Claude backend
let claude = ClaudeBackend::new("sk-ant-...").sonnet();
let concepts = claude.extract_concepts("Cell membrane transport").await?;

// OpenAI backend
let openai = OpenAiBackend::new("sk-...").gpt4o_mini();
let concepts = openai.extract_concepts("Cell membrane transport").await?;
```

The `llm` features add:
- **OllamaBackend** — Local LLM via Ollama (no API key needed)
- **ClaudeBackend** — Anthropic Claude API
- **OpenAiBackend** — OpenAI GPT API
- **LlmBackend trait** — Common interface for all backends
- **Concept extraction** — Extract structured concepts from text
- **Relationship identification** — Find relationships between concepts
- **Query expansion** — Expand queries for better recall

## The Ten Biological Primitives

| Primitive | Biological Analog | What It Does |
|-----------|-------------------|-------------|
| **DIGEST** | Phagocytosis | Consume input, extract fragments, present to graph |
| **APOPTOSE** | Programmed cell death | Self-assess health, gracefully self-terminate |
| **SENSE** | Chemotaxis | Detect signals, follow gradients |
| **TRANSFER** | Horizontal gene transfer | Export/import vocabulary between agents |
| **EMERGE** | Quorum sensing | Detect threshold, activate collective behavior |
| **WIRE** | Hebbian learning | Strengthen used connections, prune unused |
| **SYMBIOSE** | Endosymbiosis | Integrate another agent as permanent symbiont |
| **STIGMERGE** | Stigmergy | Coordinate through environmental traces |
| **NEGATE** | Negative selection | Learn self-model, detect anomalies by exclusion |
| **DISSOLVE** | Holobiont boundary | Modulate agent-substrate boundaries |

## Agent Types

- **Digester** — Consumes documents, extracts keywords, presents concepts to the knowledge graph. Implements DIGEST + SENSE + APOPTOSE + TRANSFER + SYMBIOSE + DISSOLVE.
- **Synthesizer** — Dormant until quorum reached, then identifies bridge concepts and topic clusters. Implements EMERGE + SENSE + APOPTOSE.
- **Sentinel** — Learns what "normal" looks like, flags anomalies by deviation from self-model. Implements NEGATE + SENSE + APOPTOSE.
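
The Sentinel's "learn normal, flag by exclusion" behavior can be illustrated with a toy self-model. This is a minimal sketch under stated assumptions, not the Sentinel's actual implementation: the `SelfModel` type, the term-set representation, and the 0.5 threshold are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical self-model: the set of terms seen during a learning phase.
#[derive(Default)]
struct SelfModel {
    known: HashSet<String>,
}

/// Lowercased whitespace tokens (a deliberately naive tokenizer).
fn tokens(text: &str) -> impl Iterator<Item = String> + '_ {
    text.split_whitespace().map(|w| w.to_lowercase())
}

impl SelfModel {
    /// Add every term of a "normal" document to the self-model.
    fn learn(&mut self, text: &str) {
        self.known.extend(tokens(text));
    }

    /// Fraction of the text's terms that are absent from the self-model.
    fn anomaly_score(&self, text: &str) -> f64 {
        let terms: Vec<String> = tokens(text).collect();
        if terms.is_empty() {
            return 0.0;
        }
        let unknown = terms.iter().filter(|t| !self.known.contains(*t)).count();
        unknown as f64 / terms.len() as f64
    }

    /// ASSUMPTION: a fixed threshold decides what counts as anomalous.
    fn is_anomalous(&self, text: &str, threshold: f64) -> bool {
        self.anomaly_score(text) > threshold
    }
}

fn main() {
    let mut model = SelfModel::default();
    model.learn("Cell membrane transport proteins");
    model.learn("Protein folding and membrane insertion");

    for text in &["membrane transport proteins", "quarterly revenue forecast membrane"] {
        println!(
            "{:?}: score {:.2}, anomalous: {}",
            text,
            model.anomaly_score(text),
            model.is_anomalous(text, 0.5)
        );
    }
}
```

Detection by exclusion means the model never needs examples of anomalies: anything sufficiently unlike what it has learned is flagged.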

## Research Branches

Four falsifiable hypotheses, each with a working prototype, benchmark, visualization, and papers.

### 1. Bio-RAG — Self-Reinforcing Retrieval

Hebbian-reinforced knowledge graph retrieval with hybrid scoring (TF-IDF + graph re-ranking).

```bash
cargo run --bin phago-bio-rag-demo
```

| Metric | Graph-only | TF-IDF | **Hybrid** |
|--------|-----------|--------|------------|
| P@5 | 0.280 | 0.742 | **0.742** |
| MRR | 0.650 | 0.775 | **0.800** |
| NDCG@10 | 0.357 | 0.404 | **0.410** |
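
For readers unfamiliar with the metrics in this table, P@k and MRR follow their standard definitions, sketched below. The function names and the string-ID representation are illustrative choices, not part of the benchmark code.

```rust
/// Precision at k: fraction of the top-k results that are relevant.
fn precision_at_k(ranked: &[&str], relevant: &[&str], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let hits = ranked
        .iter()
        .take(k)
        .filter(|d| relevant.contains(*d))
        .count();
    hits as f64 / k as f64
}

/// Reciprocal rank of the first relevant result (0.0 if none is found).
fn reciprocal_rank(ranked: &[&str], relevant: &[&str]) -> f64 {
    ranked
        .iter()
        .position(|d| relevant.contains(d))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean reciprocal rank over a set of (ranked results, relevant set) pairs.
fn mean_reciprocal_rank(runs: &[(Vec<&str>, Vec<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let total: f64 = runs.iter().map(|(r, rel)| reciprocal_rank(r, rel)).sum();
    total / runs.len() as f64
}

fn main() {
    let runs = vec![
        (vec!["a", "b", "c", "d", "e"], vec!["b", "e", "z"]),
        (vec!["x", "y"], vec!["x"]),
    ];
    println!("P@5 (query 1) = {:.3}", precision_at_k(&runs[0].0, &runs[0].1, 5));
    println!("MRR           = {:.3}", mean_reciprocal_rank(&runs));
}
```

MRR rewards placing the first relevant result early, which is why a re-ranker can improve MRR while leaving P@5 unchanged.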

**Key insight:** The graph's value is not in replacing TF-IDF but in *re-ranking* candidates using structural context. Hybrid scoring beats pure TF-IDF on MRR (first relevant result ranked higher).
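
The re-ranking idea can be sketched as follows. This is an illustration only, not the `hybrid_query` implementation: the linear blend, the `Candidate` type, and the reading of `candidate_multiplier` as "pool size = `max_results` × multiplier" are assumptions based on the `HybridConfig` fields shown in the Quick Start.

```rust
/// Hypothetical candidate: a label plus two scores already normalized to [0, 1].
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    label: String,
    tfidf: f64,
    graph: f64,
}

/// Blend the two scores: alpha = 1.0 is pure TF-IDF, alpha = 0.0 is pure graph.
/// ASSUMPTION: a linear blend, chosen for illustration only.
fn hybrid_score(c: &Candidate, alpha: f64) -> f64 {
    alpha * c.tfidf + (1.0 - alpha) * c.graph
}

/// Select a candidate pool by TF-IDF, then re-rank the pool by hybrid score.
fn rerank(
    mut candidates: Vec<Candidate>,
    alpha: f64,
    max_results: usize,
    candidate_multiplier: usize,
) -> Vec<Candidate> {
    // Stage 1: TF-IDF picks the pool (ASSUMED size: max_results * multiplier).
    candidates.sort_by(|a, b| b.tfidf.total_cmp(&a.tfidf));
    candidates.truncate(max_results * candidate_multiplier);
    // Stage 2: graph structure re-orders the pool.
    candidates.sort_by(|a, b| hybrid_score(b, alpha).total_cmp(&hybrid_score(a, alpha)));
    candidates.truncate(max_results);
    candidates
}

fn main() {
    let pool = vec![
        Candidate { label: "membrane".into(), tfidf: 0.9, graph: 0.2 },
        Candidate { label: "protein".into(), tfidf: 0.8, graph: 0.9 },
        Candidate { label: "folding".into(), tfidf: 0.3, graph: 0.4 },
    ];
    for c in rerank(pool, 0.5, 2, 3) {
        println!("{} (score: {:.3})", c.label, hybrid_score(&c, 0.5));
    }
}
```

In this toy pool, "membrane" leads on TF-IDF alone, but "protein" moves to first place once its stronger graph score is blended in, which is the kind of first-result change that MRR measures.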

### 2. Agent Evolution — Evolutionary Agents Through Apoptosis

Agents evolving through intrinsic selection pressure (death + mutation + inheritance) produce richer knowledge graphs.

```bash
cargo run --bin phago-agent-evolution-demo
```

| Metric (tick 300) | Evolved | Static | Random |
|-------------------|---------|--------|--------|
| Nodes | 1,582 | 864 | 1,191 |
| Edges | 101,824 | 8,769 | 38,399 |
| Clustering coeff. | 0.969 | 0.948 | 0.970 |
| Spawns / Generations | 140 / 135 | 0 / 0 | 144 / 144 |

### 3. KG Training — Knowledge Graph to Training Data

Hebbian-weighted triples with Louvain community detection and curriculum ordering for LLM fine-tuning.

```bash
cargo run --bin phago-kg-training-demo
```

| Metric | Before (Label Prop) | After (Louvain) |
|--------|--------------------|--------------------|
| Communities | 1 mega + 547 singletons | Correct structure |
| NMI vs ground truth | 0.170 | **1.000** (perfect) |
| Modularity | N/A | 0.609-0.816 |
| Triples exported | 252,641 | 252,641 |
| Foundation coherence | 100% | 100% |

### 4. Agentic Memory — Persistent Code Knowledge

Self-organizing code knowledge graph that persists across sessions.

```bash
cargo run --bin phago-agentic-memory-demo
```

| Metric | Value |
|--------|-------|
| Code elements extracted | 830 |
| Graph nodes / edges | 659 / 33,490 |
| Session persistence | 100% fidelity |
| Graph P@5 | 0.140 |

## New Features (Ralph Loop Phase 1)

### Hebbian LTP Model (Tentative Edge Wiring)
- First co-occurrence creates edge at **0.1 weight** (tentative)
- Subsequent co-occurrences reinforce: `weight += 0.1`
- Single-document edges decay quickly under synaptic pruning
- Cross-document reinforced edges survive
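
The wiring rule above can be sketched in a few lines. The 0.1 initial weight and the `weight += 0.1` reinforcement come from the list; the multiplicative decay factor, the pruning threshold, and the `Graph` type are assumptions made for illustration and may differ from Phago's actual values.

```rust
use std::collections::HashMap;

/// Undirected edge key with endpoints stored in sorted order.
type EdgeKey = (String, String);

const TENTATIVE_WEIGHT: f64 = 0.1; // from the text: first co-occurrence
const REINFORCEMENT: f64 = 0.1; // from the text: weight += 0.1
const DECAY_FACTOR: f64 = 0.5; // ASSUMPTION: illustrative per-tick decay
const PRUNE_BELOW: f64 = 0.04; // ASSUMPTION: illustrative pruning threshold

#[derive(Default)]
struct Graph {
    edges: HashMap<EdgeKey, f64>,
}

impl Graph {
    fn key(a: &str, b: &str) -> EdgeKey {
        if a <= b {
            (a.to_string(), b.to_string())
        } else {
            (b.to_string(), a.to_string())
        }
    }

    /// Wire every pair of concepts that co-occur in one document.
    fn co_activate(&mut self, concepts: &[&str]) {
        for (i, a) in concepts.iter().enumerate() {
            for b in &concepts[i + 1..] {
                self.edges
                    .entry(Self::key(a, b))
                    .and_modify(|w| *w += REINFORCEMENT)
                    .or_insert(TENTATIVE_WEIGHT);
            }
        }
    }

    /// Decay all weights, then prune edges that fell below the threshold.
    fn decay_and_prune(&mut self) {
        for w in self.edges.values_mut() {
            *w *= DECAY_FACTOR;
        }
        self.edges.retain(|_, w| *w >= PRUNE_BELOW);
    }
}

fn main() {
    let mut g = Graph::default();
    g.co_activate(&["membrane", "protein", "transport"]); // document 1
    g.co_activate(&["membrane", "protein", "folding"]); // document 2
    println!("edges before decay: {}", g.edges.len());
    g.decay_and_prune();
    g.decay_and_prune();
    for (k, w) in &g.edges {
        println!("survivor {:?}: {:.3}", k, w);
    }
}
```

With these illustrative constants, only the `membrane`/`protein` edge, which was reinforced by both documents, survives two decay rounds. The single-document edges are pruned.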

### Multi-Objective Fitness
4-dimensional evolution:
- **30% Productivity** — concepts + edges per tick
- **30% Novelty** — novel concepts / total concepts
- **20% Quality** — strong edges (co_act ≥ 2) / total edges
- **20% Connectivity** — bridge edges / total edges
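
As a worked example, the four components can be combined into one score with the 30/30/20/20 weights listed above. The `AgentStats` fields and the `productivity_cap` normalization are assumptions: the list does not say how raw per-tick productivity is scaled, so this sketch simply clamps it to [0, 1].

```rust
/// Hypothetical per-agent counters; the field names are assumptions.
struct AgentStats {
    concepts: u32,
    edges: u32,
    ticks: u32,
    novel_concepts: u32,
    strong_edges: u32, // edges with co-activation count >= 2
    bridge_edges: u32,
}

/// Safe ratio that treats an empty denominator as 0.0.
fn ratio(num: u32, den: u32) -> f64 {
    if den == 0 {
        0.0
    } else {
        f64::from(num) / f64::from(den)
    }
}

/// Weighted sum using the 30/30/20/20 split from the text.
/// ASSUMPTION: productivity is divided by `productivity_cap` and clamped to 1.0,
/// because raw output per tick is unbounded.
fn fitness(s: &AgentStats, productivity_cap: f64) -> f64 {
    let productivity = (ratio(s.concepts + s.edges, s.ticks) / productivity_cap).min(1.0);
    let novelty = ratio(s.novel_concepts, s.concepts);
    let quality = ratio(s.strong_edges, s.edges);
    let connectivity = ratio(s.bridge_edges, s.edges);
    0.3 * productivity + 0.3 * novelty + 0.2 * quality + 0.2 * connectivity
}

fn main() {
    let stats = AgentStats {
        concepts: 20,
        edges: 80,
        ticks: 50,
        novel_concepts: 10,
        strong_edges: 40,
        bridge_edges: 20,
    };
    // productivity = (100 / 50) / 4.0 = 0.5, novelty = 0.5,
    // quality = 0.5, connectivity = 0.25
    println!("fitness = {:.3}", fitness(&stats, 4.0));
}
```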

### Structural Queries
```rust
// Path queries: "What connects A to B?"
let path: Option<(Vec<NodeId>, f64)> = graph.shortest_path(&from, &to);

// Centrality queries: "What's most important?"
let central: Vec<(NodeId, f64)> = graph.betweenness_centrality(100);

// Bridge queries: "What concepts connect domains?"
let bridges: Vec<(NodeId, f64)> = graph.bridge_nodes(10);

// Component queries: "How many disconnected regions?"
let components: usize = graph.connected_components();
```

### Distributed Colony (v1.0.0)

Scale horizontally across multiple nodes:

```bash
# Start coordinator
cargo run --bin phago -- cluster start-coordinator --port 9000

# Start shards (in separate terminals)
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9001
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9002

# Check cluster status
cargo run --bin phago -- cluster status --coordinator 127.0.0.1:9000

# Or use Docker Compose
cd deploy && docker-compose up
```

Architecture:
- **Consistent hash ring** with 150 virtual nodes per shard for even distribution
- **Ghost nodes** for lazy-resolved cross-shard edge references
- **Phase-synchronized ticks** (Sense/Act/Decay/Advance) via barrier coordination
- **Two-phase distributed TF-IDF** with scatter-gather for globally accurate scoring
- **tarpc RPC** with connection pooling for inter-shard communication
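
To make the first point concrete, here is a minimal consistent hash ring with 150 virtual nodes per shard. It is a sketch using only the standard library, not the `phago-distributed` implementation: the `HashRing` type and its methods are hypothetical, and `DefaultHasher` stands in for whatever hash function the crate actually uses (its output is not guaranteed stable across Rust releases, so a real cluster would need a fixed hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

const VIRTUAL_NODES: u32 = 150; // from the text

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

/// Hypothetical ring: position on the ring -> owning shard id.
struct HashRing {
    ring: BTreeMap<u64, String>,
}

impl HashRing {
    fn new() -> Self {
        HashRing { ring: BTreeMap::new() }
    }

    /// Place VIRTUAL_NODES points for this shard on the ring.
    fn add_shard(&mut self, shard: &str) {
        for v in 0..VIRTUAL_NODES {
            self.ring.insert(hash_of(&(shard, v)), shard.to_string());
        }
    }

    /// Remove all of a shard's virtual nodes.
    fn remove_shard(&mut self, shard: &str) {
        self.ring.retain(|_, s| s.as_str() != shard);
    }

    /// Owner = first virtual node clockwise from the key's hash, wrapping around.
    fn shard_for(&self, key: &str) -> Option<&str> {
        let h = hash_of(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, s)| s.as_str())
    }
}

fn main() {
    let mut ring = HashRing::new();
    ring.add_shard("shard-a");
    ring.add_shard("shard-b");
    ring.add_shard("shard-c");
    for doc in ["doc1", "doc2", "doc3"].iter() {
        println!("{} -> {:?}", doc, ring.shard_for(doc));
    }
}
```

The property that matters for resharding is that removing one shard only reassigns the keys that shard owned. Keys owned by the remaining shards stay where they were.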

### MCP Integration
External LLMs/agents can interact via typed request/response API:
- `phago_remember(title, content, ticks)` — ingest document
- `phago_recall(query, max_results, alpha)` — hybrid query
- `phago_explore(type: path|centrality|bridges|stats)` — structural queries

## Architecture

```
crates/
├── phago/              # Unified facade crate (use this!)
├── phago-cli/          # CLI (ingest, query, stats, session, cluster)
├── phago-core/         # Traits (10 primitives) + shared types + Louvain
├── phago-runtime/      # Colony, substrate, topology, sessions, SQLite, async, streaming
├── phago-agents/       # Digester, Sentinel, Synthesizer, SemanticDigester, genome
├── phago-embeddings/   # Vector embeddings (Simple, ONNX, API providers)
├── phago-llm/          # LLM integration (Ollama, Claude, OpenAI)
├── phago-rag/          # Query engine, hybrid scoring, MCP adapter
├── phago-viz/          # Self-contained HTML visualization (D3.js)
├── phago-web/          # Axum web dashboard + WebSocket
├── phago-python/       # PyO3 bindings (LangChain, LlamaIndex)
├── phago-vectors/      # Vector DB adapters (Qdrant, Pinecone, Weaviate)
├── phago-distributed/  # Multi-node sharding, tarpc RPC, consistent hashing
└── phago-wasm/         # WASM integration (future)
poc/
├── knowledge-ecosystem/   # Full system demo (120-tick simulation)
├── bio-rag-demo/          # Hybrid retrieval benchmark
├── agent-evolution-demo/  # Evolutionary agents experiment
├── kg-training-demo/      # Curriculum ordering with Louvain
├── agentic-memory-demo/   # Persistent code knowledge
└── data/corpus/           # 100-doc test corpus (4 topics × 25 docs)
deploy/
└── docker-compose.yml     # Distributed cluster deployment
docs/
├── ABOUT_PHAGO.md         # Comprehensive project paper
├── papers/                # Research branch whitepapers
└── ...                    # Integration guide, executive summary, etc.
```

### Colony Lifecycle (per tick)

1. **Sense** — All agents observe substrate (signals, documents, traces)
2. **Act** — Colony processes agent actions (move, digest, present, wire)
3. **Transfer** — Agents export/integrate vocabulary, attempt symbiosis
4. **Dissolve** — Mature agents modulate boundaries, reinforce graph nodes
5. **Death** — Remove agents that self-assessed for termination
6. **Decay** — Signals, traces, and edge weights decay; weak edges pruned
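
The fixed phase order can be expressed as a small driver loop. This skeleton only records which phase ran on which tick; the `Phase` enum, `ToyColony`, and its methods are hypothetical stand-ins, and a real colony would dispatch to agents and the substrate inside `run_phase`.

```rust
/// The six per-tick phases, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Sense,
    Act,
    Transfer,
    Dissolve,
    Death,
    Decay,
}

const TICK_ORDER: [Phase; 6] = [
    Phase::Sense,
    Phase::Act,
    Phase::Transfer,
    Phase::Dissolve,
    Phase::Death,
    Phase::Decay,
];

/// Hypothetical stand-in for a colony: it only traces phase execution.
#[derive(Default)]
struct ToyColony {
    tick: u64,
    trace: Vec<(u64, Phase)>,
}

impl ToyColony {
    fn run_phase(&mut self, phase: Phase) {
        // Placeholder: real work (sensing, digesting, pruning) would happen here.
        self.trace.push((self.tick, phase));
    }

    /// One tick = every phase once, in order.
    fn step(&mut self) {
        for phase in TICK_ORDER.iter().copied() {
            self.run_phase(phase);
        }
        self.tick += 1;
    }

    fn run(&mut self, ticks: u64) {
        for _ in 0..ticks {
            self.step();
        }
    }
}

fn main() {
    let mut colony = ToyColony::default();
    colony.run(2);
    for (tick, phase) in &colony.trace {
        println!("tick {}: {:?}", tick, phase);
    }
}
```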

### Key Design Choices

- **Rust ownership = biological resource management.** `move` semantics model consumption (you can't eat something twice). `Drop` models apoptosis. No garbage collector = deterministic death.
- **The graph IS the memory.** No separate storage layer. The topology of the knowledge graph, shaped by Hebbian learning, encodes all accumulated knowledge.
- **No LLMs in the core loop.** The v0.1 primitives had to prove emergence without external intelligence. LLM-backed concept extraction was added later and remains opt-in through the `llm-local`, `llm-api`, and `llm-full` features.
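
The ownership analogy in the first design choice can be demonstrated with plain Rust. The `Document` and `Agent` types below are hypothetical and much simpler than Phago's own. The sketch only shows the two language features involved: passing by value consumes the document, and `Drop` runs deterministically when the agent goes out of scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Document {
    title: String,
}

struct Agent {
    name: String,
    // Shared event log so the effect of `drop` can be observed from outside.
    log: Rc<RefCell<Vec<String>>>,
}

impl Agent {
    /// Takes the document by value: after this call the caller no longer owns it.
    fn digest(&self, doc: Document) -> Vec<String> {
        doc.title.split_whitespace().map(str::to_lowercase).collect()
    }
}

impl Drop for Agent {
    /// Runs exactly once, at a point the compiler determines statically.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("{} died", self.name));
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    let doc = Document { title: "Cell Membrane".to_string() };
    {
        let agent = Agent { name: "digester-1".to_string(), log: Rc::clone(&log) };
        let fragments = agent.digest(doc);
        // agent.digest(doc); // would not compile: `doc` was moved above
        println!("fragments: {:?}", fragments);
    } // `agent` goes out of scope here and `drop` runs
    println!("log: {:?}", log.borrow());
}
```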

## Quantitative Proof (Phase 5)

Running `cargo run --bin phago-poc` prints the following metrics, each tied to one claim about the model:

| Metric | What It Proves |
|--------|---------------|
| **Transfer Effect** | Vocabulary sharing across agents (shared terms ratio, export/integration counts) |
| **Dissolution Effect** | Boundary modulation reinforces knowledge (concept vs non-concept access ratio) |
| **Graph Richness** | Colony builds meaningful structure (density, clustering coefficient, bridge concepts) |
| **Vocabulary Spread** | Knowledge propagates across agents (Gini coefficient of vocabulary sizes) |
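
The Gini coefficient used for Vocabulary Spread can be computed as shown below. This is the standard sorted-rank formula. Whether the POC uses exactly this variant is an assumption, and the `gini` function name is illustrative.

```rust
/// Gini coefficient of non-negative sizes.
/// 0.0 means all agents have equal vocabularies; values near 1.0 mean
/// the vocabulary is concentrated in a few agents.
fn gini(sizes: &[f64]) -> f64 {
    let n = sizes.len();
    let total: f64 = sizes.iter().sum();
    if n == 0 || total == 0.0 {
        return 0.0;
    }
    let mut sorted = sizes.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, for ranks i = 1..=n
    // over the values in ascending order.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, x)| (i as f64 + 1.0) * x)
        .sum();
    let n = n as f64;
    2.0 * weighted / (n * total) - (n + 1.0) / n
}

fn main() {
    println!("even spread:   {:.3}", gini(&[10.0, 10.0, 10.0, 10.0]));
    println!("one hoarder:   {:.3}", gini(&[0.0, 0.0, 0.0, 40.0]));
    println!("mixed colony:  {:.3}", gini(&[5.0, 10.0, 15.0, 30.0]));
}
```

A falling Gini coefficient over the course of a run would indicate that vocabulary transfer is evening out what individual agents know.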

The POC also generates `output/phago-colony.html` — an interactive D3.js visualization with:
- Force-directed knowledge graph
- Agent spatial canvas
- Event timeline
- Metrics dashboard with tick slider

## Implementation Status

| Phase | Version | Status | Description |
|-------|---------|--------|-------------|
| 0-4 — Core Framework | 0.1.0 | ✅ Done | 10 primitives, 3 agent types, colony lifecycle |
| 5-6 — Research | 0.2.0 | ✅ Done | 4 branches with prototypes, benchmarks, papers |
| 7-8 — Production | 0.2.0 | ✅ Done | Facade crate, CLI, preludes, error types |
| 9 — Semantic Intelligence | 0.3.0 | ✅ Done | Embeddings, LLM backends, semantic wiring |
| 10 — Persistence & Scale | 0.3.0 | ✅ Done | SQLite, async runtime, agent serialization |
| Config File Support | 0.3.0 | ✅ Done | phago.toml with ColonyBuilder integration |
| Web Dashboard | 0.4.0 | ✅ Done | Axum + D3.js real-time colony visualization |
| Python Bindings | 0.5.0 | ✅ Done | PyO3 with LangChain and LlamaIndex adapters |
| Louvain Communities | 0.5.0 | ✅ Done | Perfect NMI = 1.0 on synthetic benchmarks |
| Streaming Ingestion | 0.6.0 | ✅ Done | Async channels, backpressure, file watching |
| Vector DB Integration | 0.7.0 | ✅ Done | Qdrant, Pinecone, Weaviate adapters |
| **Distributed Colony** | **1.0.0** | ✅ Done | **Sharding, tarpc RPC, consistent hashing, ghost nodes** |

## Tests

```bash
# All tests (excludes phago-python, which requires maturin, and phago-web)
cargo test --workspace --exclude phago-python --exclude phago-web

# Distributed crate tests (146 unit + 9 integration)
cargo test -p phago-distributed

# By category
cargo test --test transfer_tests       # Vocabulary export/import
cargo test --test symbiosis_tests      # Agent absorption
cargo test --test dissolution_tests    # Boundary modulation
cargo test --test phase4_integration   # Full colony integration
cargo test -p phago-runtime metrics    # Quantitative metrics

# Distributed benchmarks
cargo run --bin phago-bench -- quick
```

### Benchmark Results

| Category | Metric | Result |
|----------|--------|--------|
| **Throughput** | Ticks/sec (small colony) | 733 |
| **SQLite** | Save/load time | <1ms |
| **Async** | Overhead vs sync | <5% |
| **Serialization** | 200 agents | 8µs |
| **Semantic wiring** | Overhead | ~11% |

## Documentation

- [`docs/ABOUT_PHAGO.md`](docs/ABOUT_PHAGO.md) **About Phago** — comprehensive project paper (v1.0.0)
- [`docs/INTEGRATION_GUIDE.md`](docs/INTEGRATION_GUIDE.md) **How to use Phago** — installation, examples, API reference
- [`docs/papers/phago-whitepaper-v2.md`](docs/papers/phago-whitepaper-v2.md) **Main whitepaper (v2.0)** — technical paper
- [`docs/EXECUTIVE_SUMMARY.md`](docs/EXECUTIVE_SUMMARY.md) — Latest results and roadmap
- [`docs/COMPETITIVE_ANALYSIS.md`](docs/COMPETITIVE_ANALYSIS.md) — Where Phago wins vs traditional approaches
- [`docs/USE_CASES.md`](docs/USE_CASES.md) — Practical applications
- [`docs/WHITEPAPER.md`](docs/WHITEPAPER.md) — Original theoretical foundation
- [`docs/NEXT_PRIORITIES.md`](docs/NEXT_PRIORITIES.md) — Development plan (all 7 priorities complete)

### Research Papers

| Branch | White Paper | Explainer |
|--------|-----------|-----------|
| Bio-RAG | [`bio-rag-whitepaper.md`](docs/papers/bio-rag-whitepaper.md) | [`bio-rag-explainer.md`](docs/papers/bio-rag-explainer.md) |
| Agent Evolution | [`agent-evolution-whitepaper.md`](docs/papers/agent-evolution-whitepaper.md) | [`agent-evolution-explainer.md`](docs/papers/agent-evolution-explainer.md) |
| KG Training | [`kg-training-whitepaper.md`](docs/papers/kg-training-whitepaper.md) | [`kg-training-explainer.md`](docs/papers/kg-training-explainer.md) |
| Agentic Memory | [`agentic-memory-whitepaper.md`](docs/papers/agentic-memory-whitepaper.md) | [`agentic-memory-explainer.md`](docs/papers/agentic-memory-explainer.md) |

## License

MIT