Phago — Biological Computing Primitives

Version 1.0.0 | Production-Ready

A Rust framework that maps cellular biology mechanisms to computational operations. Agents self-organize, consume documents, build a Hebbian knowledge graph, share vocabulary, detect anomalies, and exhibit emergent collective behavior, all without top-down orchestration. Version 1.0.0 adds distributed multi-node sharding for horizontal scaling.

Key Results (v1.0.0)

Metric	Value	Notes
Tests passing	155+	100% pass rate across 14 crates
Graph edge reduction	98.3%	256k to 4.5k via Hebbian LTP
Hybrid MRR	0.800	Beats TF-IDF (0.775) on first-result ranking
Hybrid P@5	0.742	Matches TF-IDF precision
Evolved vs static edges	11.6x	Self-healing through agent evolution
Community detection NMI	1.000	Perfect topic recovery (Louvain)
Session persistence	100%	Full temporal state fidelity
Distributed shards	3+	Consistent hashing, ghost nodes, cross-shard queries

What It Does

Feed the colony documents. Agents digest them into concepts, wire a knowledge graph through co-activation (Hebbian learning), share vocabulary across agent boundaries (horizontal gene transfer), and detect anomalies (negative selection). The graph structure IS the memory — frequently used connections strengthen, unused ones decay.

Documents → Agents digest → Concepts extracted → Graph wired → Knowledge emerges
                ↑                                      ↓
                └──── Transfer, Symbiosis, Dissolution ─┘

Quick Start

Run the Demos

# Build
cargo build

# Run the proof-of-concept (120-tick simulation)
cargo run --bin phago-poc

# Run all tests (phago-python and phago-web are excluded)
cargo test --workspace --exclude phago-python --exclude phago-web

# Build with distributed feature
cargo build -p phago --features distributed

# Run distributed benchmarks
cargo run --bin phago-bench -- quick

# Open the interactive visualization generated by the POC (macOS; use xdg-open on Linux)
open output/phago-colony.html

Use as a Library

Add to your Cargo.toml:

[dependencies]
phago = { git = "https://github.com/Clemens865/Phago_Project.git" }

# With distributed support
phago = { git = "https://github.com/Clemens865/Phago_Project.git", features = ["distributed"] }

Basic usage with the prelude:

use phago::prelude::*;

fn main() {
    let mut colony = Colony::new();

    // Ingest documents
    colony.ingest_document("doc1", "Cell membrane transport proteins", Position::new(0.0, 0.0));
    colony.ingest_document("doc2", "Protein folding and membrane insertion", Position::new(1.0, 0.0));

    // Spawn digesters and run
    colony.spawn(Box::new(Digester::new(Position::new(0.0, 0.0)).with_max_idle(30)));
    colony.run(30);

    // Query with hybrid scoring
    let results = hybrid_query(&colony, "membrane protein", &HybridConfig {
        alpha: 0.5, max_results: 5, candidate_multiplier: 3,
    });

    for r in results {
        println!("{} (score: {:.3})", r.label, r.final_score);
    }
}

See docs/INTEGRATION_GUIDE.md for complete examples and API reference.

Production Features

Single import: use phago::prelude::* gives you everything
Structured errors: Result<T, PhagoError> with typed error categories
Deterministic testing: Digester::with_seed(pos, seed) for reproducible simulations
Session persistence: Save/restore colony state across sessions (JSON or SQLite)
SQLite persistence: ColonyBuilder with auto-save for production deployments
Async runtime: AsyncColony with TickTimer for real-time visualization
MCP adapter: Ready for external LLM/agent integration
Semantic embeddings: Vector-based concept extraction (optional semantic feature)
Distributed colony: Multi-node sharding with consistent hashing (optional distributed feature)
Vector DB integration: Qdrant, Pinecone, Weaviate adapters
Streaming ingestion: Async channels with backpressure and file watching
Web dashboard: Axum + D3.js real-time colony visualization
Python bindings: PyO3 with LangChain and LlamaIndex adapters
Louvain communities: Topic clustering with NMI = 1.0 on synthetic benchmarks

SQLite Persistence (Phase 10)

Enable durable storage with automatic save/load:

[dependencies]
phago-runtime = { version = "1.0", features = ["sqlite"] }

use phago_runtime::prelude::*;

// Assumes the crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create colony with persistent storage
    let mut colony = ColonyBuilder::new()
        .with_persistence("knowledge.db")  // SQLite file
        .auto_save(true)                   // Save on drop
        .build()?;

    // Use normally; persistence is automatic
    colony.ingest_document("title", "content", Position::new(0.0, 0.0));
    colony.run(100);
    colony.save()?;  // Explicit save (also happens on drop)

    // Later: reload with full state preserved
    let colony2 = ColonyBuilder::new()
        .with_persistence("knowledge.db")
        .build()?;

    Ok(())
}

Async Runtime (Phase 10)

Enable controlled-rate simulation for visualization:

[dependencies]
phago-runtime = { version = "1.0", features = ["async"] }

use phago_runtime::prelude::*;
use phago_runtime::async_runtime::{run_in_local, TickTimer};

#[tokio::main]
async fn main() {
    let colony = Colony::new();

    // Fast async simulation
    run_in_local(colony, |ac| async move {
        ac.run_async(100).await
    }).await;

    // Or controlled tick rate for visualization
    let colony2 = Colony::new();
    run_in_local(colony2, |ac| async move {
        let mut timer = TickTimer::new(100);  // 100ms per tick
        timer.run_timed(&ac, 50).await;
    }).await;
}

Semantic Embeddings (Phase 9)

Enable vector embeddings for semantic understanding:

[dependencies]
phago = { version = "1.0", features = ["semantic"] }

use phago::prelude::*;
use std::sync::Arc;

// Create an embedder (SimpleEmbedder or API-backed)
let embedder: Arc<dyn Embedder> = Arc::new(SimpleEmbedder::new(256));

// SemanticDigester uses embeddings for concept extraction
let mut digester = SemanticDigester::new(Position::new(0.0, 0.0), embedder.clone());
let concepts = digester.digest_text("The mitochondria is the powerhouse of the cell.".into());

// Find semantically similar concepts
let similar = digester.find_similar("cellular energy", 5);

The semantic feature adds:

SimpleEmbedder — Hash-based embeddings (no dependencies)
SemanticDigester — Embedding-backed agent for semantic concept extraction
Chunker — Document chunking with configurable overlap
Similarity functions — cosine_similarity, euclidean_distance, normalize_l2
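
As a rough illustration of what the three similarity helpers compute, here is a dependency-free sketch. The function names come from the list above, but the signatures (slices of f32, in-place normalization) and the zero-vector handling are assumptions, not the crate's actual API.

```rust
/// Cosine of the angle between two equal-length vectors.
/// Returns 0.0 when either vector has zero length (an assumed convention).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Straight-line (L2) distance between two equal-length vectors.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Scale a vector in place so its L2 norm is 1. Zero vectors are left unchanged.
fn normalize_l2(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn main() {
    let mut v = vec![3.0_f32, 4.0];
    println!("distance from origin: {}", euclidean_distance(&v, &[0.0, 0.0]));
    normalize_l2(&mut v);
    println!("normalized: {:?}", v);
    println!("cosine with x-axis: {}", cosine_similarity(&v, &[1.0, 0.0]));
}
```

After normalization, cosine similarity reduces to a plain dot product, which is why embedding pipelines often normalize once at insert time.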

LLM Integration (Phase 9.2)

Enable LLM-backed concept extraction:

[dependencies]
# Local LLM (Ollama)
phago = { version = "1.0", features = ["llm-local"] }

# Cloud APIs (Claude, OpenAI)
phago = { version = "1.0", features = ["llm-api"] }

# All backends
phago = { version = "1.0", features = ["llm-full"] }

use phago::prelude::*;

// Assumes a Tokio runtime (as in the async example) and that the backend
// error types implement std::error::Error.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Local Ollama backend (no API key needed)
    let ollama = OllamaBackend::localhost().with_model("llama3.2");
    let concepts = ollama.extract_concepts("Cell membrane transport").await?;

    // Claude backend
    let claude = ClaudeBackend::new("sk-ant-...").sonnet();
    let concepts = claude.extract_concepts("Cell membrane transport").await?;

    // OpenAI backend
    let openai = OpenAiBackend::new("sk-...").gpt4o_mini();
    let concepts = openai.extract_concepts("Cell membrane transport").await?;

    Ok(())
}

The llm features add:

OllamaBackend — Local LLM via Ollama (no API key needed)
ClaudeBackend — Anthropic Claude API
OpenAiBackend — OpenAI GPT API
LlmBackend trait — Common interface for all backends
Concept extraction — Extract structured concepts from text
Relationship identification — Find relationships between concepts
Query expansion — Expand queries for better recall

The Ten Biological Primitives

Primitive	Biological Analog	What It Does
DIGEST	Phagocytosis	Consume input, extract fragments, present to graph
APOPTOSE	Programmed cell death	Self-assess health, gracefully self-terminate
SENSE	Chemotaxis	Detect signals, follow gradients
TRANSFER	Horizontal gene transfer	Export/import vocabulary between agents
EMERGE	Quorum sensing	Detect threshold, activate collective behavior
WIRE	Hebbian learning	Strengthen used connections, prune unused
SYMBIOSE	Endosymbiosis	Integrate another agent as permanent symbiont
STIGMERGE	Stigmergy	Coordinate through environmental traces
NEGATE	Negative selection	Learn self-model, detect anomalies by exclusion
DISSOLVE	Holobiont boundary	Modulate agent-substrate boundaries

Agent Types

Digester — Consumes documents, extracts keywords, presents concepts to the knowledge graph. Implements DIGEST + SENSE + APOPTOSE + TRANSFER + SYMBIOSE + DISSOLVE.
Synthesizer — Dormant until quorum reached, then identifies bridge concepts and topic clusters. Implements EMERGE + SENSE + APOPTOSE.
Sentinel — Learns what "normal" looks like, flags anomalies by deviation from self-model. Implements NEGATE + SENSE + APOPTOSE.

Research Branches

Four falsifiable hypotheses, each with a working prototype, benchmark, visualization, and papers.

1. Bio-RAG — Self-Reinforcing Retrieval

Hebbian-reinforced knowledge graph retrieval with hybrid scoring (TF-IDF + graph re-ranking).

cargo run --bin phago-bio-rag-demo

Metric	Graph-only	TF-IDF	Hybrid
P@5	0.280	0.742	0.742
MRR	0.650	0.775	0.800
NDCG@10	0.357	0.404	0.410

Key insight: The graph's value is not in replacing TF-IDF but in re-ranking candidates using structural context. Hybrid scoring beats pure TF-IDF on MRR (first relevant result ranked higher).
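
The re-ranking idea can be sketched in a few lines. This is not the crate's implementation: it assumes that alpha linearly blends a normalized TF-IDF score with a normalized graph score, and that candidate_multiplier widens the TF-IDF candidate pool before re-ranking. The Candidate struct and its fields are hypothetical.

```rust
/// A retrieval candidate with two independent scores, both assumed to be
/// normalized to [0, 1]. The struct and field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    label: String,
    tfidf_score: f64,
    graph_score: f64,
}

/// Two-stage retrieval: shortlist by TF-IDF, then re-rank the shortlist by an
/// alpha-weighted blend of both scores (an assumed formula).
fn hybrid_rerank(
    mut candidates: Vec<Candidate>,
    alpha: f64,
    max_results: usize,
    candidate_multiplier: usize,
) -> Vec<(String, f64)> {
    // Stage 1: keep the best `max_results * candidate_multiplier` by TF-IDF.
    candidates.sort_by(|a, b| b.tfidf_score.total_cmp(&a.tfidf_score));
    candidates.truncate(max_results * candidate_multiplier);

    // Stage 2: blend the two scores and sort by the blended value.
    let mut scored: Vec<(String, f64)> = candidates
        .into_iter()
        .map(|c| {
            let score = alpha * c.tfidf_score + (1.0 - alpha) * c.graph_score;
            (c.label, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(max_results);
    scored
}

fn main() {
    let candidates = vec![
        Candidate { label: "doc_a".into(), tfidf_score: 0.9, graph_score: 0.1 },
        Candidate { label: "doc_b".into(), tfidf_score: 0.8, graph_score: 0.9 },
        Candidate { label: "doc_c".into(), tfidf_score: 0.2, graph_score: 1.0 },
    ];
    for (label, score) in hybrid_rerank(candidates, 0.5, 2, 1) {
        println!("{} (score: {:.3})", label, score);
    }
}
```

In this toy run the graph score lifts doc_b above doc_a, while doc_c never enters the shortlist despite its high graph score. That mirrors the point above: the graph re-orders what TF-IDF finds rather than replacing it.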

2. Agent Evolution — Evolutionary Agents Through Apoptosis

Agents evolving through intrinsic selection pressure (death + mutation + inheritance) produce richer knowledge graphs.

cargo run --bin phago-agent-evolution-demo

Metric (tick 300)	Evolved	Static	Random
Nodes	1,582	864	1,191
Edges	101,824	8,769	38,399
Clustering coeff.	0.969	0.948	0.970
Spawns / Generations	140 / 135	0 / 0	144 / 144

3. KG Training — Knowledge Graph to Training Data

Hebbian-weighted triples with Louvain community detection and curriculum ordering for LLM fine-tuning.

cargo run --bin phago-kg-training-demo

Metric	Before (Label Prop)	After (Louvain)
Communities	1 mega + 547 singletons	Correct structure
NMI vs ground truth	0.170	1.000 (perfect)
Modularity	N/A	0.609-0.816
Triples exported	252,641	252,641
Foundation coherence	100%	100%

4. Agentic Memory — Persistent Code Knowledge

Self-organizing code knowledge graph that persists across sessions.

cargo run --bin phago-agentic-memory-demo

Metric	Value
Code elements extracted	830
Graph nodes / edges	659 / 33,490
Session persistence	100% fidelity
Graph P@5	0.140

New Features (Ralph Loop Phase 1)

Hebbian LTP Model (Tentative Edge Wiring)

First co-occurrence creates edge at 0.1 weight (tentative)
Subsequent co-occurrences reinforce: weight += 0.1
Single-document edges decay quickly under synaptic pruning
Cross-document reinforced edges survive
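
The wiring rule above can be sketched with a plain hash map. The 0.1 tentative weight and the 0.1 reinforcement step come from the list; the decay factor, the prune threshold, and the EdgeWeights type itself are assumptions chosen only to make the effect visible.

```rust
use std::collections::HashMap;

const TENTATIVE_WEIGHT: f64 = 0.1; // from the text: first co-occurrence
const REINFORCEMENT: f64 = 0.1; // from the text: weight += 0.1
const DECAY_FACTOR: f64 = 0.5; // assumption: multiplicative decay per pass
const PRUNE_THRESHOLD: f64 = 0.08; // assumption: edges below this are removed

/// Hypothetical undirected edge store keyed by an ordered pair of concept labels.
#[derive(Default)]
struct EdgeWeights {
    weights: HashMap<(String, String), f64>,
}

impl EdgeWeights {
    /// Order the endpoints so (a, b) and (b, a) address the same edge.
    fn key(a: &str, b: &str) -> (String, String) {
        if a <= b {
            (a.to_string(), b.to_string())
        } else {
            (b.to_string(), a.to_string())
        }
    }

    /// Create a tentative edge on first co-occurrence, reinforce it afterwards.
    fn co_activate(&mut self, a: &str, b: &str) {
        self.weights
            .entry(Self::key(a, b))
            .and_modify(|w| *w += REINFORCEMENT)
            .or_insert(TENTATIVE_WEIGHT);
    }

    /// Decay every edge, then drop the ones that fell below the threshold.
    fn decay_and_prune(&mut self) {
        for w in self.weights.values_mut() {
            *w *= DECAY_FACTOR;
        }
        self.weights.retain(|_, w| *w >= PRUNE_THRESHOLD);
    }

    fn weight(&self, a: &str, b: &str) -> Option<f64> {
        self.weights.get(&Self::key(a, b)).copied()
    }
}

fn main() {
    let mut graph = EdgeWeights::default();
    // "membrane" and "protein" co-occur in three documents.
    graph.co_activate("membrane", "protein");
    graph.co_activate("protein", "membrane");
    graph.co_activate("membrane", "protein");
    // "membrane" and "golgi" co-occur in a single document.
    graph.co_activate("membrane", "golgi");

    graph.decay_and_prune();
    println!("membrane-protein: {:?}", graph.weight("membrane", "protein"));
    println!("membrane-golgi:   {:?}", graph.weight("membrane", "golgi"));
}
```

With these assumed constants, one decay pass removes the single-document edge while the thrice-reinforced edge survives.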

Multi-Objective Fitness

Evolution scores each agent on four weighted dimensions:

30% Productivity — concepts + edges per tick
30% Novelty — novel concepts / total concepts
20% Quality — strong edges (co_act ≥ 2) / total edges
20% Connectivity — bridge edges / total edges
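
A minimal sketch of how the four weights could combine into one score. The weights are taken from the list above; the FitnessComponents struct, the ratio helper, and the assumption that productivity is already scaled to [0, 1] are illustrative and not taken from the crate.

```rust
/// Per-agent fitness components, each assumed to lie in [0, 1].
/// The struct is hypothetical.
#[derive(Debug, Clone, Copy)]
struct FitnessComponents {
    productivity: f64,
    novelty: f64,
    quality: f64,
    connectivity: f64,
}

// Weights from the list above.
const W_PRODUCTIVITY: f64 = 0.30;
const W_NOVELTY: f64 = 0.30;
const W_QUALITY: f64 = 0.20;
const W_CONNECTIVITY: f64 = 0.20;

/// Safe ratio that returns 0.0 for an empty denominator.
fn ratio(numerator: usize, denominator: usize) -> f64 {
    if denominator == 0 {
        0.0
    } else {
        numerator as f64 / denominator as f64
    }
}

/// Weighted sum of the four components.
fn fitness(c: &FitnessComponents) -> f64 {
    W_PRODUCTIVITY * c.productivity
        + W_NOVELTY * c.novelty
        + W_QUALITY * c.quality
        + W_CONNECTIVITY * c.connectivity
}

fn main() {
    let agent = FitnessComponents {
        productivity: 0.5,          // assumed to be pre-scaled to [0, 1]
        novelty: ratio(12, 60),     // novel concepts / total concepts
        quality: ratio(30, 60),     // strong edges / total edges
        connectivity: ratio(6, 60), // bridge edges / total edges
    };
    println!("fitness = {:.3}", fitness(&agent));
}
```

Because the weights sum to 1.0, the score stays in [0, 1] whenever every component does, which keeps agents comparable across generations.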

Structural Queries

// Path queries — "What connects A to B?"
graph.shortest_path(&from, &to) -> Option<(Vec<NodeId>, f64)>

// Centrality queries — "What's most important?"
graph.betweenness_centrality(100) -> Vec<(NodeId, f64)>

// Bridge queries — "What concepts connect domains?"
graph.bridge_nodes(10) -> Vec<(NodeId, f64)>

// Component queries — "How many disconnected regions?"
graph.connected_components() -> usize
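
To make the component query concrete, here is a standalone sketch of counting connected components with a depth-first search. It uses plain u32 node ids and an edge list instead of the crate's graph type, so the function signature is an assumption.

```rust
use std::collections::{HashMap, HashSet};

/// Count connected components of an undirected graph given as a node list
/// and an edge list. Isolated nodes count as their own component.
fn connected_components(nodes: &[u32], edges: &[(u32, u32)]) -> usize {
    // Build an adjacency list that also contains isolated nodes.
    let mut adjacency: HashMap<u32, Vec<u32>> = HashMap::new();
    for &n in nodes {
        adjacency.entry(n).or_default();
    }
    for &(a, b) in edges {
        adjacency.entry(a).or_default().push(b);
        adjacency.entry(b).or_default().push(a);
    }

    let mut seen: HashSet<u32> = HashSet::new();
    let mut count = 0;
    for &start in adjacency.keys() {
        // Skip nodes already reached from an earlier start node.
        if !seen.insert(start) {
            continue;
        }
        count += 1;
        // Iterative depth-first search from `start`.
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            for &next in &adjacency[&node] {
                if seen.insert(next) {
                    stack.push(next);
                }
            }
        }
    }
    count
}

fn main() {
    let nodes = [1, 2, 3, 4, 5, 6];
    let edges = [(1, 2), (2, 3), (4, 5)];
    println!("components: {}", connected_components(&nodes, &edges));
}
```

A count above one after pruning is a quick signal that the knowledge graph has split into disconnected topic regions.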

Distributed Colony (v1.0.0)

Scale horizontally across multiple nodes:

# Start coordinator
cargo run --bin phago -- cluster start-coordinator --port 9000

# Start shards (in separate terminals)
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9001
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9002

# Check cluster status
cargo run --bin phago -- cluster status --coordinator 127.0.0.1:9000

# Or use Docker Compose
cd deploy && docker-compose up

Architecture:

Consistent hash ring with 150 virtual nodes per shard for even distribution
Ghost nodes for lazy-resolved cross-shard edge references
Phase-synchronized ticks (Sense/Act/Decay/Advance) via barrier coordination
Two-phase distributed TF-IDF with scatter-gather for globally accurate scoring
tarpc RPC with connection pooling for inter-shard communication
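
The consistent hash ring is the piece that decides which shard owns a document. The sketch below shows the general technique using only the standard library. The 150 virtual nodes per shard come from the list above; the HashRing type, its methods, and the use of DefaultHasher are assumptions and may differ from what phago-distributed does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

const VIRTUAL_NODES_PER_SHARD: usize = 150; // from the text

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical ring: maps positions on a u64 circle to shard names.
struct HashRing {
    ring: BTreeMap<u64, String>,
}

impl HashRing {
    fn new() -> Self {
        HashRing { ring: BTreeMap::new() }
    }

    /// Place many virtual nodes per shard so keys spread evenly.
    fn add_shard(&mut self, shard: &str) {
        for i in 0..VIRTUAL_NODES_PER_SHARD {
            self.ring.insert(hash_of(&(shard, i)), shard.to_string());
        }
    }

    fn remove_shard(&mut self, shard: &str) {
        self.ring.retain(|_, owner| owner.as_str() != shard);
    }

    /// Walk clockwise from the key's position to the first virtual node,
    /// wrapping around to the start of the ring if necessary.
    fn shard_for(&self, key: &str) -> Option<&str> {
        let position = hash_of(&key);
        self.ring
            .range(position..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, owner)| owner.as_str())
    }
}

fn main() {
    let mut ring = HashRing::new();
    ring.add_shard("shard-1");
    ring.add_shard("shard-2");
    ring.add_shard("shard-3");
    for key in ["membrane", "protein", "golgi"].iter() {
        println!("{} -> {:?}", key, ring.shard_for(key));
    }
}
```

The property that matters for resharding is that removing a shard only reassigns the keys that shard owned; every other key keeps its owner.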

MCP Integration

External LLMs/agents can interact via typed request/response API:

phago_remember(title, content, ticks) — ingest document
phago_recall(query, max_results, alpha) — hybrid query
phago_explore(type: path|centrality|bridges|stats) — structural queries

Architecture

crates/
├── phago/              # Unified facade crate (use this!)
├── phago-cli/          # CLI (ingest, query, stats, session, cluster)
├── phago-core/         # Traits (10 primitives) + shared types + Louvain
├── phago-runtime/      # Colony, substrate, topology, sessions, SQLite, async, streaming
├── phago-agents/       # Digester, Sentinel, Synthesizer, SemanticDigester, genome
├── phago-embeddings/   # Vector embeddings (Simple, ONNX, API providers)
├── phago-llm/          # LLM integration (Ollama, Claude, OpenAI)
├── phago-rag/          # Query engine, hybrid scoring, MCP adapter
├── phago-viz/          # Self-contained HTML visualization (D3.js)
├── phago-web/          # Axum web dashboard + WebSocket
├── phago-python/       # PyO3 bindings (LangChain, LlamaIndex)
├── phago-vectors/      # Vector DB adapters (Qdrant, Pinecone, Weaviate)
├── phago-distributed/  # Multi-node sharding, tarpc RPC, consistent hashing
└── phago-wasm/         # WASM integration (future)
poc/
├── knowledge-ecosystem/   # Full system demo (120-tick simulation)
├── bio-rag-demo/          # Hybrid retrieval benchmark
├── agent-evolution-demo/  # Evolutionary agents experiment
├── kg-training-demo/      # Curriculum ordering with Louvain
├── agentic-memory-demo/   # Persistent code knowledge
└── data/corpus/           # 100-doc test corpus (4 topics × 25 docs)
deploy/
└── docker-compose.yml     # Distributed cluster deployment
docs/
├── ABOUT_PHAGO.md         # Comprehensive project paper
├── papers/                # Research branch whitepapers
└── ...                    # Integration guide, executive summary, etc.

Colony Lifecycle (per tick)

Sense — All agents observe substrate (signals, documents, traces)
Act — Colony processes agent actions (move, digest, present, wire)
Transfer — Agents export/integrate vocabulary, attempt symbiosis
Dissolve — Mature agents modulate boundaries, reinforce graph nodes
Death — Remove agents that self-assessed for termination
Decay — Signals, traces, and edge weights decay; weak edges pruned
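
The fixed phase order can be expressed as a small state-free loop. The sketch below is a toy model: the phase names follow the list above, but ToyColony, ToyAgent, the decay factor, and the prune threshold are hypothetical, and only the Death and Decay phases do real work here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Sense,
    Act,
    Transfer,
    Dissolve,
    Death,
    Decay,
}

/// Phase order per tick, as listed above.
const TICK_ORDER: [Phase; 6] = [
    Phase::Sense,
    Phase::Act,
    Phase::Transfer,
    Phase::Dissolve,
    Phase::Death,
    Phase::Decay,
];

const EDGE_DECAY: f64 = 0.9; // assumption
const PRUNE_THRESHOLD: f64 = 0.05; // assumption

/// Hypothetical agent that terminates after too many idle ticks.
struct ToyAgent {
    idle_ticks: u32,
    max_idle: u32,
}

struct ToyColony {
    agents: Vec<ToyAgent>,
    edge_weights: Vec<f64>,
    tick: u64,
}

impl ToyColony {
    /// Run one tick and return the phases in the order they executed.
    fn tick(&mut self) -> Vec<Phase> {
        let mut trace = Vec::new();
        for phase in TICK_ORDER.iter().copied() {
            match phase {
                // Placeholders: sensing, transfer, and dissolution are not modeled.
                Phase::Sense | Phase::Transfer | Phase::Dissolve => {}
                // Toy behavior: every agent finds nothing to do and idles.
                Phase::Act => {
                    for agent in &mut self.agents {
                        agent.idle_ticks += 1;
                    }
                }
                // Remove agents that reached their idle limit.
                Phase::Death => {
                    self.agents.retain(|a| a.idle_ticks < a.max_idle);
                }
                // Decay edge weights and prune the weak ones.
                Phase::Decay => {
                    for w in &mut self.edge_weights {
                        *w *= EDGE_DECAY;
                    }
                    self.edge_weights.retain(|w| *w >= PRUNE_THRESHOLD);
                }
            }
            trace.push(phase);
        }
        self.tick += 1;
        trace
    }
}

fn main() {
    let mut colony = ToyColony {
        agents: vec![
            ToyAgent { idle_ticks: 0, max_idle: 1 },
            ToyAgent { idle_ticks: 0, max_idle: 3 },
        ],
        edge_weights: vec![1.0, 0.05],
        tick: 0,
    };
    let trace = colony.tick();
    println!("tick {}: {:?}", colony.tick, trace);
    println!("agents left: {}, edges left: {}", colony.agents.len(), colony.edge_weights.len());
}
```

Running Death before Decay means an agent's last contributions are still in the graph when decay is applied in the same tick.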

Key Design Choices

Rust ownership = biological resource management. Move semantics model consumption (you can't eat something twice), Drop models apoptosis, and the absence of a garbage collector makes death deterministic.
The graph IS the memory. No separate storage layer. The topology of the knowledge graph, shaped by Hebbian learning, encodes all accumulated knowledge.
No LLMs in the loop. The core primitives (v0.1) must prove emergence without external intelligence. LLM-backed concept extraction (Phase 9.2) is an optional feature layered on top, not a requirement.
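
The ownership analogy can be shown with standard Rust alone. In this sketch, Document and Cell are hypothetical stand-ins rather than Phago types: digesting takes the document by value, so it cannot be consumed twice, and Drop runs at a known point when the cell goes out of scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical input that can be consumed exactly once.
struct Document {
    text: String,
}

/// Hypothetical agent that records its own death in a shared log.
struct Cell {
    name: String,
    death_log: Rc<RefCell<Vec<String>>>,
}

impl Cell {
    /// Takes `doc` by value: after this call the caller no longer owns it.
    fn digest(&self, doc: Document) -> Vec<String> {
        doc.text
            .split_whitespace()
            .map(|word| word.to_lowercase())
            .collect()
    }
}

impl Drop for Cell {
    /// Runs deterministically when the cell leaves scope.
    fn drop(&mut self) {
        self.death_log
            .borrow_mut()
            .push(format!("{} died", self.name));
    }
}

fn main() {
    let death_log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let doc = Document { text: "Cell membrane transport".to_string() };
    {
        let cell = Cell {
            name: "digester-1".to_string(),
            death_log: Rc::clone(&death_log),
        };
        let fragments = cell.digest(doc);
        // cell.digest(doc); // would not compile: `doc` was moved above
        println!("fragments: {:?}", fragments);
    } // `cell` is dropped here, so the log entry already exists on the next line
    println!("log: {:?}", death_log.borrow());
}
```

Uncommenting the second digest call produces a compile-time "use of moved value" error, which is the "can't eat something twice" rule enforced by the type system.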

Quantitative Proof (Phase 5)

Running cargo run --bin phago-poc produces metrics proving the model works:

Metric	What It Proves
Transfer Effect	Vocabulary sharing across agents (shared terms ratio, export/integration counts)
Dissolution Effect	Boundary modulation reinforces knowledge (concept vs non-concept access ratio)
Graph Richness	Colony builds meaningful structure (density, clustering coefficient, bridge concepts)
Vocabulary Spread	Knowledge propagates across agents (Gini coefficient of vocabulary sizes)
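
For readers unfamiliar with the Gini coefficient used in the Vocabulary Spread row, here is a standalone sketch of the standard sorted-rank formula applied to per-agent vocabulary sizes. The function name and signature are illustrative; how the POC computes its metric internally is not shown here.

```rust
/// Gini coefficient of a set of non-negative values.
/// 0.0 means perfectly even; values near 1.0 mean one holder has almost everything.
/// Returns None for empty input or when all values are zero.
fn gini(values: &[f64]) -> Option<f64> {
    let n = values.len();
    let total: f64 = values.iter().sum();
    if n == 0 || total <= 0.0 {
        return None;
    }

    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Sum of rank * value with 1-based ranks over the ascending order.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, x)| (i as f64 + 1.0) * x)
        .sum();

    let n = n as f64;
    Some(2.0 * weighted / (n * total) - (n + 1.0) / n)
}

fn main() {
    // Vocabulary sizes of four hypothetical agents.
    let even = [10.0, 10.0, 10.0, 10.0];
    let skewed = [0.0, 0.0, 0.0, 40.0];
    println!("even:   {:?}", gini(&even));
    println!("skewed: {:?}", gini(&skewed));
}
```

A low value indicates that vocabulary is spread evenly across agents, while a high value indicates that a few agents hold most of the terms.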

The POC also generates output/phago-colony.html — an interactive D3.js visualization with:

Force-directed knowledge graph
Agent spatial canvas
Event timeline
Metrics dashboard with tick slider

Implementation Status

Phase	Version	Status	Description
0-4 — Core Framework	0.1.0	✅ Done	10 primitives, 3 agent types, colony lifecycle
5-6 — Research	0.2.0	✅ Done	4 branches with prototypes, benchmarks, papers
7-8 — Production	0.2.0	✅ Done	Facade crate, CLI, preludes, error types
9 — Semantic Intelligence	0.3.0	✅ Done	Embeddings, LLM backends, semantic wiring
10 — Persistence & Scale	0.3.0	✅ Done	SQLite, async runtime, agent serialization
Config File Support	0.3.0	✅ Done	phago.toml with ColonyBuilder integration
Web Dashboard	0.4.0	✅ Done	Axum + D3.js real-time colony visualization
Python Bindings	0.5.0	✅ Done	PyO3 with LangChain and LlamaIndex adapters
Louvain Communities	0.5.0	✅ Done	Perfect NMI = 1.0 on synthetic benchmarks
Streaming Ingestion	0.6.0	✅ Done	Async channels, backpressure, file watching
Vector DB Integration	0.7.0	✅ Done	Qdrant, Pinecone, Weaviate adapters
Distributed Colony	1.0.0	✅ Done	Sharding, tarpc RPC, consistent hashing, ghost nodes

Tests

# All tests (excludes phago-python, which requires maturin, and phago-web)
cargo test --workspace --exclude phago-python --exclude phago-web

# Distributed crate tests (146 unit + 9 integration)
cargo test -p phago-distributed

# By category
cargo test --test transfer_tests       # Vocabulary export/import
cargo test --test symbiosis_tests      # Agent absorption
cargo test --test dissolution_tests    # Boundary modulation
cargo test --test phase4_integration   # Full colony integration
cargo test -p phago-runtime metrics    # Quantitative metrics

# Distributed benchmarks
cargo run --bin phago-bench -- quick

Benchmark Results

Category	Metric	Result
Throughput	Ticks/sec (small colony)	733
SQLite	Save/load time	<1ms
Async	Overhead vs sync	<5%
Serialization	200 agents	8µs
Semantic wiring	Overhead	~11%

Documentation

docs/ABOUT_PHAGO.md — About Phago — comprehensive project paper (v1.0.0)
docs/INTEGRATION_GUIDE.md — How to use Phago — installation, examples, API reference
docs/papers/phago-whitepaper-v2.md — Main whitepaper (v2.0) — technical paper
docs/EXECUTIVE_SUMMARY.md — Latest results and roadmap
docs/COMPETITIVE_ANALYSIS.md — Where Phago wins vs traditional approaches
docs/USE_CASES.md — Practical applications
docs/WHITEPAPER.md — Original theoretical foundation
docs/NEXT_PRIORITIES.md — Development plan (all 7 priorities complete)

Research Papers

Branch	White Paper	Explainer
Bio-RAG	`bio-rag-whitepaper.md`	`bio-rag-explainer.md`
Agent Evolution	`agent-evolution-whitepaper.md`	`agent-evolution-explainer.md`
KG Training	`kg-training-whitepaper.md`	`kg-training-explainer.md`
Agentic Memory	`agentic-memory-whitepaper.md`	`agentic-memory-explainer.md`

License

MIT
