Phago — Biological Computing Primitives
Status: Beta / Production-Ready
A framework that maps cellular biology mechanisms to computational operations. Agents self-organize, consume documents, build a Hebbian knowledge graph, share vocabulary, detect anomalies, and exhibit emergent collective behavior — all without top-down orchestration.
Latest Results (Production Release)
| Metric | Before | After | Change |
|---|---|---|---|
| Tests passing | 32/34 | 99/99 | +67 tests, 100% pass rate |
| Graph edges (100 docs) | 255,888 | 4,472 | -98.3% edges (sparser graph) |
| Best P@5 | 0.658 (TF-IDF) | 0.742 (Hybrid) | +12.8% |
| Best MRR | 0.714 (Graph) | 0.800 (Hybrid) | +12.0% |
| Genome parameters | 5 | 8 | +3 wiring strategy params |
| Query types | 1 | 5 | BFS, Hybrid, Path, Centrality, Bridge |
| MCP tools | 0 | 3 | remember, recall, explore |
What It Does
Feed the colony documents. Agents digest them into concepts, wire a knowledge graph through co-activation (Hebbian learning), share vocabulary across agent boundaries (horizontal gene transfer), and detect anomalies (negative selection). The graph structure IS the memory — frequently used connections strengthen, unused ones decay.
Documents → Agents digest → Concepts extracted → Graph wired → Knowledge emerges
                  ↑                                                    ↓
                  └──────── Transfer, Symbiosis, Dissolution ──────────┘
Quick Start
Run the Demos
# Build
cargo build
# Run the proof-of-concept (120-tick simulation)
cargo run --bin phago-poc
# Run all tests (99 tests)
cargo test
# Open the interactive visualization (generated by POC)
open output/phago-colony.html
Use as a Library
Add to your Cargo.toml:
[dependencies]
phago = { git = "https://github.com/Clemens865/Phago_Project.git" }
Basic usage with the prelude:
use phago::prelude::*;
See docs/INTEGRATION_GUIDE.md for complete examples and API reference.
Production Features
- Single import: `use phago::prelude::*` gives you everything
- Structured errors: `Result<T, PhagoError>` with typed error categories
- Deterministic testing: `Digester::with_seed(pos, seed)` for reproducible simulations
- Session persistence: Save/restore colony state across sessions (JSON or SQLite)
- SQLite persistence: `ColonyBuilder` with auto-save for production deployments
- Async runtime: `AsyncColony` with `TickTimer` for real-time visualization
- MCP adapter: Ready for external LLM/agent integration
- Semantic embeddings: Vector-based concept extraction (optional `semantic` feature)
SQLite Persistence (Phase 10)
Enable durable storage with automatic save/load:
[dependencies]
phago = { version = "0.1", features = ["sqlite"] }
use phago::prelude::*;
// Create colony with persistent storage
let mut colony = ColonyBuilder::new()
    .with_persistence(/* ... */) // SQLite file
    .auto_save(/* ... */) // Save on drop
    .build()?;
// Use normally; persistence is automatic
colony.ingest_document(/* ... */);
colony.run(/* ... */);
colony.save()?; // Explicit save (also happens on drop)
// Later: reload with full state preserved
let colony2 = ColonyBuilder::new()
    .with_persistence(/* ... */)
    .build()?;
Async Runtime (Phase 10)
Enable controlled-rate simulation for visualization:
[dependencies]
phago = { version = "0.1", features = ["async"] }
use phago::prelude::*;
use ;
async
Semantic Embeddings (Phase 9)
Enable vector embeddings for semantic understanding:
[dependencies]
phago = { version = "0.1", features = ["semantic"] }
use phago::prelude::*;
use std::sync::Arc;
// Create an embedder (SimpleEmbedder or API-backed)
let embedder = Arc::new(SimpleEmbedder::new(/* ... */));
// SemanticDigester uses embeddings for concept extraction
let mut digester = SemanticDigester::new(/* ... */);
let concepts = digester.digest_text(/* ... */);
// Find semantically similar concepts
let similar = digester.find_similar(/* ... */);
The semantic feature adds:
- SimpleEmbedder — Hash-based embeddings (no dependencies)
- SemanticDigester — Embedding-backed agent for semantic concept extraction
- Chunker — Document chunking with configurable overlap
- Similarity functions — cosine_similarity, euclidean_distance, normalize_l2
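The three similarity helpers named above have standard mathematical definitions, sketched below for reference. These are independent illustrations that reuse the listed names; the signatures (`&[f32]` slices, in-place normalization) are assumptions and may differ from what the `phago-embeddings` crate actually exposes.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Euclidean (L2) distance between two vectors of equal dimension.
pub fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Scale a vector in place to unit L2 norm; a zero vector is left unchanged.
pub fn normalize_l2(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

After `normalize_l2`, cosine similarity reduces to a plain dot product, which is one reason embeddings are often stored normalized.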
LLM Integration (Phase 9.2)
Enable LLM-backed concept extraction:
[dependencies]
# Local LLM (Ollama)
phago = { version = "0.1", features = ["llm-local"] }
# Cloud APIs (Claude, OpenAI)
phago = { version = "0.1", features = ["llm-api"] }
# All backends
phago = { version = "0.1", features = ["llm-full"] }
use phago::prelude::*;
// Local Ollama backend (no API key needed)
let ollama = OllamaBackend::localhost().with_model(/* ... */);
let concepts = ollama.extract_concepts(/* ... */).await?;
// Claude backend
let claude = ClaudeBackend::new(/* ... */).sonnet();
let concepts = claude.extract_concepts(/* ... */).await?;
// OpenAI backend
let openai = OpenAiBackend::new(/* ... */).gpt4o_mini();
let concepts = openai.extract_concepts(/* ... */).await?;
The llm features add:
- OllamaBackend — Local LLM via Ollama (no API key needed)
- ClaudeBackend — Anthropic Claude API
- OpenAiBackend — OpenAI GPT API
- LlmBackend trait — Common interface for all backends
- Concept extraction — Extract structured concepts from text
- Relationship identification — Find relationships between concepts
- Query expansion — Expand queries for better recall
The Ten Biological Primitives
| Primitive | Biological Analog | What It Does |
|---|---|---|
| DIGEST | Phagocytosis | Consume input, extract fragments, present to graph |
| APOPTOSE | Programmed cell death | Self-assess health, gracefully self-terminate |
| SENSE | Chemotaxis | Detect signals, follow gradients |
| TRANSFER | Horizontal gene transfer | Export/import vocabulary between agents |
| EMERGE | Quorum sensing | Detect threshold, activate collective behavior |
| WIRE | Hebbian learning | Strengthen used connections, prune unused |
| SYMBIOSE | Endosymbiosis | Integrate another agent as permanent symbiont |
| STIGMERGE | Stigmergy | Coordinate through environmental traces |
| NEGATE | Negative selection | Learn self-model, detect anomalies by exclusion |
| DISSOLVE | Holobiont boundary | Modulate agent-substrate boundaries |
Agent Types
- Digester — Consumes documents, extracts keywords, presents concepts to the knowledge graph. Implements DIGEST + SENSE + APOPTOSE + TRANSFER + SYMBIOSE + DISSOLVE.
- Synthesizer — Dormant until quorum reached, then identifies bridge concepts and topic clusters. Implements EMERGE + SENSE + APOPTOSE.
- Sentinel — Learns what "normal" looks like, flags anomalies by deviation from self-model. Implements NEGATE + SENSE + APOPTOSE.
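The Sentinel's detection-by-exclusion idea can be illustrated with a small standalone sketch. The `SelfModel` type, its methods, and the threshold parameter are hypothetical names chosen for this example, not Phago's API: the model records the vocabulary seen in "normal" documents and flags a document when too large a share of its terms falls outside that vocabulary.

```rust
use std::collections::HashSet;

/// Hypothetical self-model: the set of terms observed in normal documents.
pub struct SelfModel {
    known: HashSet<String>,
    /// Fraction of unknown terms above which a document is flagged (assumed).
    threshold: f64,
}

impl SelfModel {
    pub fn new(threshold: f64) -> Self {
        Self { known: HashSet::new(), threshold }
    }

    /// Learning phase: absorb the vocabulary of a normal document.
    pub fn learn(&mut self, text: &str) {
        for term in tokenize(text) {
            self.known.insert(term);
        }
    }

    /// Share of a document's terms that the self-model has never seen.
    pub fn unknown_ratio(&self, text: &str) -> f64 {
        let terms = tokenize(text);
        if terms.is_empty() {
            return 0.0;
        }
        let unknown = terms.iter().filter(|t| !self.known.contains(*t)).count();
        unknown as f64 / terms.len() as f64
    }

    /// Detection by exclusion: anomalous if it does not look like "self".
    pub fn is_anomalous(&self, text: &str) -> bool {
        self.unknown_ratio(text) > self.threshold
    }
}

/// Lowercase, split on non-alphanumeric characters, drop empty tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}
```

The key property is that the model never needs examples of anomalies; it only learns what belongs and reports what does not.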
Research Branches
Four falsifiable hypotheses, each with a working prototype, benchmark, visualization, and papers.
1. Bio-RAG — Self-Reinforcing Retrieval
Hebbian-reinforced knowledge graph retrieval with hybrid scoring (TF-IDF + graph re-ranking).
| Metric | Graph-only | TF-IDF | Hybrid |
|---|---|---|---|
| P@5 | 0.280 | 0.742 | 0.742 |
| MRR | 0.650 | 0.775 | 0.800 |
| NDCG@10 | 0.357 | 0.404 | 0.410 |
Key insight: The graph's value is not in replacing TF-IDF but in re-ranking candidates using structural context. Hybrid scoring matches pure TF-IDF on P@5 (0.742) and beats it on MRR (0.800 vs 0.775, meaning the first relevant result is ranked higher) and on NDCG@10 (0.410 vs 0.404).
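One way to picture hybrid re-ranking is a linear blend of the two scores controlled by `alpha` (a parameter name that appears in `phago_recall`). The blend formula, the `Candidate` struct, and the assumption that both scores are normalized to [0, 1] are illustrative guesses, not the scoring code in `phago-rag`.

```rust
/// Hypothetical retrieval candidate with two scores normalized to [0, 1].
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub doc_id: usize,
    pub tfidf: f64,
    pub graph: f64,
}

/// Blend the scores: alpha = 1.0 is pure TF-IDF, alpha = 0.0 is pure graph.
pub fn hybrid_score(c: &Candidate, alpha: f64) -> f64 {
    alpha * c.tfidf + (1.0 - alpha) * c.graph
}

/// Re-rank TF-IDF candidates by the blended score, best first.
pub fn rerank(mut candidates: Vec<Candidate>, alpha: f64) -> Vec<Candidate> {
    candidates.sort_by(|a, b| hybrid_score(b, alpha).total_cmp(&hybrid_score(a, alpha)));
    candidates
}
```

With two candidates that TF-IDF scores almost equally, a strong graph score can promote the second one to the top, which is the kind of change MRR rewards.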
2. Agent Evolution — Evolutionary Agents Through Apoptosis
Agents evolving through intrinsic selection pressure (death + mutation + inheritance) produce richer knowledge graphs.
| Metric (tick 300) | Evolved | Static | Random |
|---|---|---|---|
| Nodes | 1,582 | 864 | 1,191 |
| Edges | 101,824 | 8,769 | 38,399 |
| Clustering coeff. | 0.969 | 0.948 | 0.970 |
| Spawns / Generations | 140 / 135 | 0 / 0 | 144 / 144 |
3. KG Training — Knowledge Graph to Training Data
Hebbian-weighted triples with curriculum ordering for language model fine-tuning.
| Metric | Value |
|---|---|
| Communities detected | 548 |
| NMI vs ground truth | 0.170 |
| Triples exported | 252,641 |
| Foundation coherence | 100% same-community |
| Weight ratio (foundation/periphery) | 1.3x |
4. Agentic Memory — Persistent Code Knowledge
Self-organizing code knowledge graph that persists across sessions.
| Metric | Value |
|---|---|
| Code elements extracted | 830 |
| Graph nodes / edges | 659 / 33,490 |
| Session persistence | 100% fidelity |
| Graph P@5 | 0.140 |
New Features (Ralph Loop Phase 1)
Hebbian LTP Model (Tentative Edge Wiring)
- First co-occurrence creates edge at 0.1 weight (tentative)
- Subsequent co-occurrences reinforce: `weight += 0.1`
- Single-document edges decay quickly under synaptic pruning
- Cross-document reinforced edges survive
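The wiring rules above can be sketched as a tiny edge store. The 0.1 creation and reinforcement step comes from the list; the multiplicative decay, the `rate` and `min_weight` parameters, and the `Graph` type itself are assumptions for illustration, since the source does not give Phago's decay formula or pruning threshold.

```rust
use std::collections::HashMap;

/// Initial weight of a tentative edge and the increment per reinforcement.
const STEP: f64 = 0.1;

/// Minimal edge store keyed by an unordered pair of concept names.
#[derive(Default)]
pub struct Graph {
    edges: HashMap<(String, String), f64>,
}

impl Graph {
    /// Order the pair so (a, b) and (b, a) address the same edge.
    fn key(a: &str, b: &str) -> (String, String) {
        if a <= b {
            (a.to_string(), b.to_string())
        } else {
            (b.to_string(), a.to_string())
        }
    }

    /// Record a co-occurrence: create a tentative edge at 0.1 or add 0.1.
    pub fn co_activate(&mut self, a: &str, b: &str) {
        *self.edges.entry(Self::key(a, b)).or_insert(0.0) += STEP;
    }

    /// Shrink every weight by `rate`, then drop edges below `min_weight`.
    /// Both parameters are hypothetical.
    pub fn decay_and_prune(&mut self, rate: f64, min_weight: f64) {
        for w in self.edges.values_mut() {
            *w *= 1.0 - rate;
        }
        self.edges.retain(|_, w| *w >= min_weight);
    }

    pub fn weight(&self, a: &str, b: &str) -> Option<f64> {
        self.edges.get(&Self::key(a, b)).copied()
    }
}
```

An edge seen once sits at 0.1 and falls below the pruning threshold after a single decay step, while an edge reinforced across several documents has enough weight to survive.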
Multi-Objective Fitness
4-dimensional evolution:
- 30% Productivity — concepts + edges per tick
- 30% Novelty — novel concepts / total concepts
- 20% Quality — strong edges (co_act ≥ 2) / total edges
- 20% Connectivity — bridge edges / total edges
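The four weighted objectives combine into a single score, sketched here as a weighted sum. The 30/30/20/20 weights and the ratio definitions come from the list above; the `AgentStats` field names and the `max_productivity` normalizer (used to cap productivity at 1.0 so it is comparable with the three ratios) are assumptions.

```rust
/// Per-agent counters; field names are hypothetical.
pub struct AgentStats {
    pub concepts: u32,
    pub novel_concepts: u32,
    pub edges: u32,
    /// Edges with a co-activation count of at least 2.
    pub strong_edges: u32,
    pub bridge_edges: u32,
    pub ticks: u32,
}

/// Safe ratio that treats a zero denominator as 0.0.
fn ratio(num: u32, den: u32) -> f64 {
    if den == 0 {
        0.0
    } else {
        num as f64 / den as f64
    }
}

/// Weighted sum of the four objectives; the result lies in [0, 1].
pub fn fitness(s: &AgentStats, max_productivity: f64) -> f64 {
    // Productivity: (concepts + edges) per tick, scaled by an assumed cap.
    let productivity = if s.ticks == 0 || max_productivity <= 0.0 {
        0.0
    } else {
        let per_tick = (s.concepts + s.edges) as f64 / s.ticks as f64;
        (per_tick / max_productivity).min(1.0)
    };
    let novelty = ratio(s.novel_concepts, s.concepts);
    let quality = ratio(s.strong_edges, s.edges);
    let connectivity = ratio(s.bridge_edges, s.edges);
    0.30 * productivity + 0.30 * novelty + 0.20 * quality + 0.20 * connectivity
}
```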
Structural Queries
// Path queries: "What connects A to B?"
graph.shortest_path(/* ... */);
// Centrality queries: "What's most important?"
graph.betweenness_centrality(/* ... */);
// Bridge queries: "What concepts connect domains?"
graph.bridge_nodes(/* ... */);
// Component queries: "How many disconnected regions?"
graph.connected_components(/* ... */);
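To make two of these queries concrete, here is a standalone sketch of a path query and a component query over a plain adjacency list. It uses unweighted breadth-first search and `u32` node ids; both choices, along with the function signatures, are assumptions for illustration, and Phago's own graph methods may weight edges or return different types.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Undirected, unweighted adjacency list keyed by node id.
pub type Adjacency = HashMap<u32, Vec<u32>>;

/// Build an adjacency list from undirected edges.
pub fn from_edges(edges: &[(u32, u32)]) -> Adjacency {
    let mut adj = Adjacency::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }
    adj
}

/// Breadth-first shortest path (fewest hops) from `from` to `to`.
pub fn shortest_path(adj: &Adjacency, from: u32, to: u32) -> Option<Vec<u32>> {
    if !adj.contains_key(&from) || !adj.contains_key(&to) {
        return None;
    }
    let mut prev: HashMap<u32, u32> = HashMap::new();
    let mut seen: HashSet<u32> = HashSet::from([from]);
    let mut queue = VecDeque::from([from]);
    while let Some(node) = queue.pop_front() {
        if node == to {
            // Walk predecessor links back to the start, then reverse.
            let mut path = vec![to];
            let mut cur = to;
            while let Some(&p) = prev.get(&cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in adj.get(&node).into_iter().flatten() {
            if seen.insert(next) {
                prev.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}

/// Number of disconnected regions, found by flood-filling from each unseen node.
pub fn connected_components(adj: &Adjacency) -> usize {
    let mut seen = HashSet::new();
    let mut count = 0;
    for &start in adj.keys() {
        if !seen.insert(start) {
            continue;
        }
        count += 1;
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            for &next in adj.get(&node).into_iter().flatten() {
                if seen.insert(next) {
                    stack.push(next);
                }
            }
        }
    }
    count
}
```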
MCP Integration
External LLMs/agents can interact via typed request/response API:
- `phago_remember(title, content, ticks)`: ingest document
- `phago_recall(query, max_results, alpha)`: hybrid query
- `phago_explore(type: path|centrality|bridges|stats)`: structural queries
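A typed request layer for these three tools might look like the sketch below. The tool names, parameter names, and the four explore types are taken from the list above; the `Request` and `ExploreKind` types, the field types, and the validation rules are hypothetical and are not the MCP adapter's actual definitions.

```rust
use std::str::FromStr;

/// The four explore types listed for `phago_explore`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExploreKind {
    Path,
    Centrality,
    Bridges,
    Stats,
}

impl FromStr for ExploreKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "path" => Ok(Self::Path),
            "centrality" => Ok(Self::Centrality),
            "bridges" => Ok(Self::Bridges),
            "stats" => Ok(Self::Stats),
            other => Err(format!("unknown explore type: {}", other)),
        }
    }
}

/// Hypothetical typed requests mirroring the three tools.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    Remember { title: String, content: String, ticks: u64 },
    Recall { query: String, max_results: usize, alpha: f64 },
    Explore { kind: ExploreKind },
}

impl Request {
    /// Name of the tool this request maps to.
    pub fn tool_name(&self) -> &'static str {
        match self {
            Request::Remember { .. } => "phago_remember",
            Request::Recall { .. } => "phago_recall",
            Request::Explore { .. } => "phago_explore",
        }
    }

    /// Assumed sanity checks applied before a request reaches the colony.
    pub fn validate(&self) -> Result<(), String> {
        match self {
            Request::Remember { title, content, .. }
                if title.is_empty() || content.is_empty() =>
            {
                Err("title and content must be non-empty".to_string())
            }
            Request::Recall { max_results: 0, .. } => {
                Err("max_results must be at least 1".to_string())
            }
            Request::Recall { alpha, .. } if !(0.0..=1.0).contains(alpha) => {
                Err("alpha must be within [0, 1]".to_string())
            }
            _ => Ok(()),
        }
    }
}
```

Encoding each tool as an enum variant lets the compiler check that every request type is handled wherever requests are dispatched.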
Architecture
crates/
├── phago/ # Unified facade crate (use this!)
├── phago-cli/ # Command-line interface (ingest, query, stats, session)
├── phago-core/ # Traits (10 primitives) + shared types + error handling
├── phago-runtime/ # Colony, substrate, topology, corpus, sessions, SQLite, async
├── phago-agents/ # Digester, Sentinel, Synthesizer, SemanticDigester, genome, evolution
├── phago-embeddings/ # Vector embeddings (SimpleEmbedder, OnnxEmbedder, API providers)
├── phago-llm/ # LLM integration (Ollama, Claude, OpenAI)
├── phago-rag/ # Query engine, scoring, baselines, hybrid, MCP adapter
├── phago-viz/ # Self-contained HTML visualization (D3.js)
└── phago-wasm/ # WASM integration (future)
poc/
├── knowledge-ecosystem/ # Original proof of concept
├── bio-rag-demo/ # Branch 1: self-reinforcing RAG
├── agent-evolution-demo/ # Branch 2: evolutionary agents
├── kg-training-demo/ # Branch 3: KG → training data
├── agentic-memory-demo/ # Branch 4: persistent code knowledge
└── data/corpus/ # 100-doc test corpus (4 topics × 25 docs)
docs/papers/ # White papers + explainers for each branch
Colony Lifecycle (per tick)
- Sense — All agents observe substrate (signals, documents, traces)
- Act — Colony processes agent actions (move, digest, present, wire)
- Transfer — Agents export/integrate vocabulary, attempt symbiosis
- Dissolve — Mature agents modulate boundaries, reinforce graph nodes
- Death — Remove agents that self-assessed for termination
- Decay — Signals, traces, and edge weights decay; weak edges pruned
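The fixed phase order can be sketched as an enum that a tick function walks through. Only the Death and Decay phases do any work in this simplified example; the `Agent`, `Colony`, and `decay_rate` names are hypothetical and much smaller than the real types in `phago-runtime`.

```rust
/// The six per-tick phases, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Sense,
    Act,
    Transfer,
    Dissolve,
    Death,
    Decay,
}

pub const TICK_ORDER: [Phase; 6] = [
    Phase::Sense,
    Phase::Act,
    Phase::Transfer,
    Phase::Dissolve,
    Phase::Death,
    Phase::Decay,
];

/// Hypothetical, heavily simplified agent state.
pub struct Agent {
    pub id: u32,
    pub wants_to_die: bool,
}

/// Hypothetical, heavily simplified colony state.
pub struct Colony {
    pub agents: Vec<Agent>,
    pub signal_strength: f64,
    pub ticks: u64,
}

impl Colony {
    /// Run one tick, visiting every phase in order.
    pub fn tick(&mut self, decay_rate: f64) {
        for phase in TICK_ORDER {
            match phase {
                // Remove agents that self-assessed for termination.
                Phase::Death => self.agents.retain(|a| !a.wants_to_die),
                // Signals fade a little every tick.
                Phase::Decay => self.signal_strength *= 1.0 - decay_rate,
                // Sense, Act, Transfer, and Dissolve are omitted in this sketch.
                _ => {}
            }
        }
        self.ticks += 1;
    }
}
```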
Key Design Choices
- Rust ownership = biological resource management. `move` semantics model consumption (you can't eat something twice). `Drop` models apoptosis. No garbage collector = deterministic death.
- The graph IS the memory. No separate storage layer. The topology of the knowledge graph, shaped by Hebbian learning, encodes all accumulated knowledge.
- No LLMs in the core loop. The v0.1 primitives must prove emergence without external intelligence. LLM-backed agents are layered on top as optional features (see LLM Integration, Phase 9.2).
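The ownership analogy can be shown with plain Rust and no Phago types. In this sketch, `Document`, `Cell`, and `demo` are made-up names: a document is passed by value so it can only be digested once, and a `Drop` implementation records the agent's death at the exact point where it leaves scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A document that can be eaten exactly once.
pub struct Document {
    pub text: String,
}

/// An agent that logs its own death when it is dropped.
pub struct Cell {
    pub name: String,
    pub log: Rc<RefCell<Vec<String>>>,
}

impl Cell {
    /// Takes the document by value: the caller cannot use it afterwards.
    pub fn digest(&self, doc: Document) -> usize {
        doc.text.split_whitespace().count()
    }
}

impl Drop for Cell {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("{} died", self.name));
    }
}

/// Runs one digest-then-die cycle and returns the event log.
pub fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let cell = Cell { name: "digester-1".to_string(), log: Rc::clone(&log) };
        let doc = Document { text: "membrane transport protein".to_string() };
        let words = cell.digest(doc);
        // cell.digest(doc); // would not compile: `doc` was moved above
        log.borrow_mut().push(format!("digested {} words", words));
    } // `cell` leaves scope here, so its Drop runs at a known point
    let events = log.borrow().clone();
    events
}
```

Because there is no garbage collector, the "died" entry is always written at the closing brace of the inner block, never at some later, unpredictable time.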
Quantitative Proof (Phase 5)
Running cargo run --bin phago-poc produces metrics proving the model works:
| Metric | What It Proves |
|---|---|
| Transfer Effect | Vocabulary sharing across agents (shared terms ratio, export/integration counts) |
| Dissolution Effect | Boundary modulation reinforces knowledge (concept vs non-concept access ratio) |
| Graph Richness | Colony builds meaningful structure (density, clustering coefficient, bridge concepts) |
| Vocabulary Spread | Knowledge propagates across agents (Gini coefficient of vocabulary sizes) |
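The Vocabulary Spread metric relies on the Gini coefficient, which measures how unevenly a quantity is distributed: 0 means every agent has the same vocabulary size, and values near 1 mean a few agents hold almost everything. The function below is a standard formulation written for illustration; it is not taken from the Phago codebase.

```rust
/// Gini coefficient of non-negative values.
/// Returns 0.0 for an empty slice or when all values are zero.
pub fn gini(values: &[f64]) -> f64 {
    let n = values.len();
    let total: f64 = values.iter().sum();
    if n == 0 || total <= 0.0 {
        return 0.0;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    // where x_i are the sorted values and i runs from 1 to n.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, x)| (i as f64 + 1.0) * x)
        .sum();
    2.0 * weighted / (n as f64 * total) - (n as f64 + 1.0) / n as f64
}
```

For four agents with equal vocabularies the result is 0; if one agent holds all terms and the other three hold none, the result is 0.75, the maximum for a sample of four.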
The POC also generates output/phago-colony.html — an interactive D3.js visualization with:
- Force-directed knowledge graph
- Agent spatial canvas
- Event timeline
- Metrics dashboard with tick slider
Implementation Status
| Phase | Status | Description |
|---|---|---|
| 0 — Scaffold | ✅ Done | Workspace, 10 primitive traits, shared types |
| 1 — First Cell | ✅ Done | Digester agent, apoptosis, colony lifecycle |
| 2 — Self-Organization | ✅ Done | Chemotaxis, document ingestion, Hebbian wiring |
| 3 — Emergence | ✅ Done | Synthesizer (quorum sensing), Sentinel (negative selection) |
| 4 — Cooperation | ✅ Done | Transfer, Symbiosis, Dissolution |
| 5 — Prove It Works | ✅ Done | Metrics, visualization, hardening tests, performance optimization |
| 6 — Research Branches | ✅ Done | 4 branches with prototypes, benchmarks, papers |
| 7 — Production Ready | ✅ Done | Facade crate, preludes, error types, deterministic testing |
| 8 — Distribution | ✅ Done | Published to crates.io, CLI tool with all commands |
| 9.1 — Embeddings | ✅ Done | phago-embeddings crate, SemanticDigester agent |
| 9.2 — LLM Integration | ✅ Done | phago-llm crate (Ollama, Claude, OpenAI) |
| 9.3 — Vector Wiring | ✅ Done | SemanticWiringConfig, similarity-based edge weights |
| 10.1 — Agent Serialization | ✅ Done | SerializableAgent trait, session persistence with agents |
| 10.2 — SQLite Persistence | ✅ Done | ColonyBuilder, auto-save, WAL mode, full roundtrip |
| 10.3 — Async Runtime | ✅ Done | AsyncColony, TickTimer, run_in_local, spawn_simulation_local |
Tests
# All tests
# With all features (sqlite + async)
# By category
# Benchmarks (with features)
Phase 10 Benchmark Results
| Category | Metric | Result |
|---|---|---|
| Throughput | Ticks/sec (small colony) | 733 |
| SQLite | Save/load time | <1ms |
| Async | Overhead vs sync | <5% |
| Serialization | 200 agents | 8µs |
| Semantic wiring | Overhead | ~11% |
Documentation
- `docs/INTEGRATION_GUIDE.md`: How to use Phago (installation, examples, API reference)
- `docs/papers/phago-whitepaper-v2.md`: Main whitepaper (v2.0), the comprehensive technical paper
- `docs/EXECUTIVE_SUMMARY.md`: Latest results and roadmap
- `docs/COMPETITIVE_ANALYSIS.md`: Where Phago wins vs traditional approaches
- `docs/USE_CASES.md`: Practical applications
- `docs/WHITEPAPER.md`: Original theoretical foundation
- `docs/PRD.md`: Product requirements and specifications
- `docs/BUILD_PLAN.md`: Phased implementation roadmap
Research Papers
| Branch | White Paper | Explainer |
|---|---|---|
| Bio-RAG | bio-rag-whitepaper.md | bio-rag-explainer.md |
| Agent Evolution | agent-evolution-whitepaper.md | agent-evolution-explainer.md |
| KG Training | kg-training-whitepaper.md | kg-training-explainer.md |
| Agentic Memory | agentic-memory-whitepaper.md | agentic-memory-explainer.md |
License
MIT