phago 1.0.0

Self-evolving knowledge substrates through biological computing primitives

Phago — Biological Computing Primitives

Version 1.0.0 | Production-Ready

A Rust framework that maps cellular biology mechanisms to computational operations. Agents self-organize, consume documents, build a Hebbian knowledge graph, share vocabulary, detect anomalies, and exhibit emergent collective behavior — all without top-down orchestration. Now with distributed multi-node sharding for horizontal scaling.

Key Results (v1.0.0)

| Metric | Value | Notes |
|---|---|---|
| Tests passing | 155+ | 100% pass rate across 14 crates |
| Graph edge reduction | 98.3% | 256k to 4.5k via Hebbian LTP |
| Hybrid MRR | 0.800 | Beats TF-IDF (0.775) on first-result ranking |
| Hybrid P@5 | 0.742 | Matches TF-IDF precision |
| Evolved vs static edges | 11.6x | Self-healing through agent evolution |
| Community detection NMI | 1.000 | Perfect topic recovery (Louvain) |
| Session persistence | 100% | Full temporal state fidelity |
| Distributed shards | 3+ | Consistent hashing, ghost nodes, cross-shard queries |

What It Does

Feed the colony documents. Agents digest them into concepts, wire a knowledge graph through co-activation (Hebbian learning), share vocabulary across agent boundaries (horizontal gene transfer), and detect anomalies (negative selection). The graph structure IS the memory — frequently used connections strengthen, unused ones decay.

Documents → Agents digest → Concepts extracted → Graph wired → Knowledge emerges
                ↑                                      ↓
                └──── Transfer, Symbiosis, Dissolution ─┘

Quick Start

Run the Demos

# Build
cargo build

# Run the proof-of-concept (120-tick simulation)
cargo run --bin phago-poc

# Run all tests
cargo test --workspace --exclude phago-python --exclude phago-web

# Build with distributed feature
cargo build -p phago --features distributed

# Run distributed benchmarks
cargo run --bin phago-bench -- quick

# Open the interactive visualization (generated by POC)
open output/phago-colony.html

Use as a Library

Add to your Cargo.toml:

[dependencies]
phago = { git = "https://github.com/Clemens865/Phago_Project.git" }

# Or, with distributed support (use this line instead of the one above;
# a key may appear only once per table in TOML)
# phago = { git = "https://github.com/Clemens865/Phago_Project.git", features = ["distributed"] }

Basic usage with the prelude:

use phago::prelude::*;

fn main() {
    let mut colony = Colony::new();

    // Ingest documents
    colony.ingest_document("doc1", "Cell membrane transport proteins", Position::new(0.0, 0.0));
    colony.ingest_document("doc2", "Protein folding and membrane insertion", Position::new(1.0, 0.0));

    // Spawn digesters and run
    colony.spawn(Box::new(Digester::new(Position::new(0.0, 0.0)).with_max_idle(30)));
    colony.run(30);

    // Query with hybrid scoring
    let results = hybrid_query(&colony, "membrane protein", &HybridConfig {
        alpha: 0.5, max_results: 5, candidate_multiplier: 3,
    });

    for r in results {
        println!("{} (score: {:.3})", r.label, r.final_score);
    }
}

See docs/INTEGRATION_GUIDE.md for complete examples and API reference.

Production Features

  • Single import: use phago::prelude::* gives you everything
  • Structured errors: Result<T, PhagoError> with typed error categories
  • Deterministic testing: Digester::with_seed(pos, seed) for reproducible simulations
  • Session persistence: Save/restore colony state across sessions (JSON or SQLite)
  • SQLite persistence: ColonyBuilder with auto-save for production deployments
  • Async runtime: AsyncColony with TickTimer for real-time visualization
  • MCP adapter: Ready for external LLM/agent integration
  • Semantic embeddings: Vector-based concept extraction (optional semantic feature)
  • Distributed colony: Multi-node sharding with consistent hashing (optional distributed feature)
  • Vector DB integration: Qdrant, Pinecone, Weaviate adapters
  • Streaming ingestion: Async channels with backpressure and file watching
  • Web dashboard: Axum + D3.js real-time colony visualization
  • Python bindings: PyO3 with LangChain and LlamaIndex adapters
  • Louvain communities: Perfect topic clustering on synthetic benchmarks (NMI = 1.0)

SQLite Persistence (Phase 10)

Enable durable storage with automatic save/load:

[dependencies]
phago-runtime = { version = "1.0", features = ["sqlite"] }

use phago_runtime::prelude::*;

// Create colony with persistent storage
let mut colony = ColonyBuilder::new()
    .with_persistence("knowledge.db")  // SQLite file
    .auto_save(true)                   // Save on drop
    .build()?;

// Use normally — persistence is automatic
colony.ingest_document("title", "content", Position::new(0.0, 0.0));
colony.run(100);
colony.save()?;  // Explicit save (also happens on drop)

// Later: reload with full state preserved
let colony2 = ColonyBuilder::new()
    .with_persistence("knowledge.db")
    .build()?;

Async Runtime (Phase 10)

Enable controlled-rate simulation for visualization:

[dependencies]
phago-runtime = { version = "1.0", features = ["async"] }

use phago_runtime::prelude::*;
use phago_runtime::async_runtime::{run_in_local, TickTimer};

#[tokio::main]
async fn main() {
    let colony = Colony::new();

    // Fast async simulation
    run_in_local(colony, |ac| async move {
        ac.run_async(100).await
    }).await;

    // Or controlled tick rate for visualization
    let colony2 = Colony::new();
    run_in_local(colony2, |ac| async move {
        let mut timer = TickTimer::new(100);  // 100ms per tick
        timer.run_timed(&ac, 50).await;
    }).await;
}

Semantic Embeddings (Phase 9)

Enable vector embeddings for semantic understanding:

[dependencies]
phago = { version = "1.0", features = ["semantic"] }

use phago::prelude::*;
use std::sync::Arc;

// Create an embedder (SimpleEmbedder or API-backed)
let embedder: Arc<dyn Embedder> = Arc::new(SimpleEmbedder::new(256));

// SemanticDigester uses embeddings for concept extraction
let mut digester = SemanticDigester::new(Position::new(0.0, 0.0), embedder.clone());
let concepts = digester.digest_text("The mitochondria is the powerhouse of the cell.".into());

// Find semantically similar concepts
let similar = digester.find_similar("cellular energy", 5);

The semantic feature adds:

  • SimpleEmbedder — Hash-based embeddings (no dependencies)
  • SemanticDigester — Embedding-backed agent for semantic concept extraction
  • Chunker — Document chunking with configurable overlap
  • Similarity functions — cosine_similarity, euclidean_distance, normalize_l2

LLM Integration (Phase 9.2)

Enable LLM-backed concept extraction:

[dependencies]
# Pick one of the following lines (a key may appear only once per table in TOML).

# Local LLM (Ollama)
phago = { version = "1.0", features = ["llm-local"] }

# Cloud APIs (Claude, OpenAI)
# phago = { version = "1.0", features = ["llm-api"] }

# All backends
# phago = { version = "1.0", features = ["llm-full"] }

use phago::prelude::*;

// Local Ollama backend (no API key needed)
let ollama = OllamaBackend::localhost().with_model("llama3.2");
let concepts = ollama.extract_concepts("Cell membrane transport").await?;

// Claude backend
let claude = ClaudeBackend::new("sk-ant-...").sonnet();
let concepts = claude.extract_concepts("Cell membrane transport").await?;

// OpenAI backend
let openai = OpenAiBackend::new("sk-...").gpt4o_mini();
let concepts = openai.extract_concepts("Cell membrane transport").await?;

The llm features add:

  • OllamaBackend — Local LLM via Ollama (no API key needed)
  • ClaudeBackend — Anthropic Claude API
  • OpenAiBackend — OpenAI GPT API
  • LlmBackend trait — Common interface for all backends
  • Concept extraction — Extract structured concepts from text
  • Relationship identification — Find relationships between concepts
  • Query expansion — Expand queries for better recall

The Ten Biological Primitives

| Primitive | Biological Analog | What It Does |
|---|---|---|
| DIGEST | Phagocytosis | Consume input, extract fragments, present to graph |
| APOPTOSE | Programmed cell death | Self-assess health, gracefully self-terminate |
| SENSE | Chemotaxis | Detect signals, follow gradients |
| TRANSFER | Horizontal gene transfer | Export/import vocabulary between agents |
| EMERGE | Quorum sensing | Detect threshold, activate collective behavior |
| WIRE | Hebbian learning | Strengthen used connections, prune unused |
| SYMBIOSE | Endosymbiosis | Integrate another agent as permanent symbiont |
| STIGMERGE | Stigmergy | Coordinate through environmental traces |
| NEGATE | Negative selection | Learn self-model, detect anomalies by exclusion |
| DISSOLVE | Holobiont boundary | Modulate agent-substrate boundaries |

Agent Types

  • Digester — Consumes documents, extracts keywords, presents concepts to the knowledge graph. Implements DIGEST + SENSE + APOPTOSE + TRANSFER + SYMBIOSE + DISSOLVE.
  • Synthesizer — Dormant until quorum reached, then identifies bridge concepts and topic clusters. Implements EMERGE + SENSE + APOPTOSE.
  • Sentinel — Learns what "normal" looks like, flags anomalies by deviation from self-model. Implements NEGATE + SENSE + APOPTOSE.

Research Branches

Four falsifiable hypotheses, each with a working prototype, benchmark, visualization, and papers.

1. Bio-RAG — Self-Reinforcing Retrieval

Hebbian-reinforced knowledge graph retrieval with hybrid scoring (TF-IDF + graph re-ranking).

cargo run --bin phago-bio-rag-demo

| Metric | Graph-only | TF-IDF | Hybrid |
|---|---|---|---|
| P@5 | 0.280 | 0.742 | 0.742 |
| MRR | 0.650 | 0.775 | 0.800 |
| NDCG@10 | 0.357 | 0.404 | 0.410 |

Key insight: The graph's value is not in replacing TF-IDF but in re-ranking candidates using structural context. Hybrid scoring beats pure TF-IDF on MRR (first relevant result ranked higher).
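
The re-ranking idea can be illustrated with a small standalone sketch. It does not use phago's `hybrid_query`; the blending formula, and the reading of `alpha` as the weight on the TF-IDF side, are assumptions made for illustration. Only the parameter names `alpha`, `max_results`, and `candidate_multiplier` come from the `HybridConfig` shown earlier. `Candidate` and `hybrid_rank` are hypothetical names.

```rust
/// Hypothetical candidate with two pre-normalized scores in [0, 1].
struct Candidate {
    label: String,
    tfidf: f64, // lexical relevance
    graph: f64, // structural relevance from the knowledge graph
}

/// Assumed scheme: take the top `max_results * candidate_multiplier`
/// candidates by TF-IDF, then re-rank that pool by a linear blend.
fn hybrid_rank(
    mut candidates: Vec<Candidate>,
    alpha: f64,
    max_results: usize,
    candidate_multiplier: usize,
) -> Vec<(String, f64)> {
    // Step 1: TF-IDF selects the candidate pool.
    candidates.sort_by(|a, b| b.tfidf.total_cmp(&a.tfidf));
    candidates.truncate(max_results * candidate_multiplier);

    // Step 2: blend both scores (assumption: alpha weights TF-IDF).
    let mut scored: Vec<(String, f64)> = candidates
        .into_iter()
        .map(|c| {
            let score = alpha * c.tfidf + (1.0 - alpha) * c.graph;
            (c.label, score)
        })
        .collect();

    // Step 3: the graph only reorders the pool; it never adds candidates.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(max_results);
    scored
}
```

Under this scheme a document with a slightly lower TF-IDF score but strong graph support can move to rank 1, which is the kind of change MRR rewards.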

2. Agent Evolution — Evolutionary Agents Through Apoptosis

Agents evolving through intrinsic selection pressure (death + mutation + inheritance) produce richer knowledge graphs.

cargo run --bin phago-agent-evolution-demo

| Metric (tick 300) | Evolved | Static | Random |
|---|---|---|---|
| Nodes | 1,582 | 864 | 1,191 |
| Edges | 101,824 | 8,769 | 38,399 |
| Clustering coeff. | 0.969 | 0.948 | 0.970 |
| Spawns / Generations | 140 / 135 | 0 / 0 | 144 / 144 |

3. KG Training — Knowledge Graph to Training Data

Hebbian-weighted triples with Louvain community detection and curriculum ordering for LLM fine-tuning.

cargo run --bin phago-kg-training-demo

| Metric | Before (Label Prop) | After (Louvain) |
|---|---|---|
| Communities | 1 mega + 547 singletons | Correct structure |
| NMI vs ground truth | 0.170 | 1.000 (perfect) |
| Modularity | N/A | 0.609-0.816 |
| Triples exported | 252,641 | 252,641 |
| Foundation coherence | 100% | 100% |
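
For readers unfamiliar with the modularity figure reported above, the following standalone sketch computes the standard Newman modularity of a given partition. It is not phago's Louvain implementation; it assumes an unweighted, undirected graph without self-loops, and the function name and signature are hypothetical.

```rust
/// Modularity Q = sum over communities c of
/// (intra_c / m) - (degree_c / 2m)^2,
/// where m is the edge count, intra_c the number of edges inside c,
/// and degree_c the summed degree of the nodes in c.
/// `community[i]` is the community index of node i.
fn modularity(edges: &[(usize, usize)], community: &[usize]) -> f64 {
    let m = edges.len() as f64;
    if m == 0.0 {
        return 0.0;
    }
    let n_comm = community.iter().copied().max().map_or(0, |c| c + 1);
    let mut intra = vec![0.0_f64; n_comm];
    let mut degree = vec![0.0_f64; n_comm];
    for &(a, b) in edges {
        degree[community[a]] += 1.0;
        degree[community[b]] += 1.0;
        if community[a] == community[b] {
            intra[community[a]] += 1.0;
        }
    }
    (0..n_comm)
        .map(|c| intra[c] / m - (degree[c] / (2.0 * m)).powi(2))
        .sum()
}
```

Louvain searches greedily for the partition that maximizes this quantity; a partition that puts every node in one community scores 0.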

4. Agentic Memory — Persistent Code Knowledge

Self-organizing code knowledge graph that persists across sessions.

cargo run --bin phago-agentic-memory-demo

| Metric | Value |
|---|---|
| Code elements extracted | 830 |
| Graph nodes / edges | 659 / 33,490 |
| Session persistence | 100% fidelity |
| Graph P@5 | 0.140 |

New Features (Ralph Loop Phase 1)

Hebbian LTP Model (Tentative Edge Wiring)

  • First co-occurrence creates edge at 0.1 weight (tentative)
  • Subsequent co-occurrences reinforce: weight += 0.1
  • Single-document edges decay quickly under synaptic pruning
  • Cross-document reinforced edges survive
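
The wiring rule above can be sketched in a few lines of standalone Rust. The 0.1 starting weight and the 0.1 increment come from the list; the multiplicative decay and the pruning threshold are illustrative assumptions, and `HebbianGraph` is a hypothetical type rather than phago's graph.

```rust
use std::collections::HashMap;

/// Weight of a new tentative edge, and the increment per repeat.
const TENTATIVE_WEIGHT: f64 = 0.1;

/// Undirected edge key: the two concept labels in sorted order.
type EdgeKey = (String, String);

#[derive(Default)]
struct HebbianGraph {
    weights: HashMap<EdgeKey, f64>,
}

impl HebbianGraph {
    fn key(a: &str, b: &str) -> EdgeKey {
        if a <= b {
            (a.to_string(), b.to_string())
        } else {
            (b.to_string(), a.to_string())
        }
    }

    /// One co-occurrence: create a tentative edge or reinforce it.
    fn co_activate(&mut self, a: &str, b: &str) {
        *self.weights.entry(Self::key(a, b)).or_insert(0.0) += TENTATIVE_WEIGHT;
    }

    /// Assumed pruning rule: scale every weight by `decay`, then
    /// drop edges whose weight fell below `prune_below`.
    fn decay_and_prune(&mut self, decay: f64, prune_below: f64) {
        for w in self.weights.values_mut() {
            *w *= decay;
        }
        self.weights.retain(|_, w| *w >= prune_below);
    }

    fn weight(&self, a: &str, b: &str) -> Option<f64> {
        self.weights.get(&Self::key(a, b)).copied()
    }
}
```

With a decay of 0.9 and a threshold of 0.1, an edge seen once is pruned after a single decay step, while an edge seen twice survives it.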

Multi-Objective Fitness

Agent fitness is a weighted sum of four dimensions:

  • 30% Productivity — concepts + edges per tick
  • 30% Novelty — novel concepts / total concepts
  • 20% Quality — strong edges (co_act ≥ 2) / total edges
  • 20% Connectivity — bridge edges / total edges
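
As a rough illustration, the weighted sum could be computed as follows. The weights and the ratio definitions come from the list above; the `AgentStats` struct, its field names, and the cap used to normalize productivity into [0, 1] are assumptions for this sketch.

```rust
/// Hypothetical per-agent counters.
struct AgentStats {
    concepts: u32,
    novel_concepts: u32,
    edges: u32,
    strong_edges: u32, // edges with co-activation count >= 2
    bridge_edges: u32,
    ticks: u32,
}

/// Safe ratio that treats an empty denominator as 0.
fn ratio(numerator: u32, denominator: u32) -> f64 {
    if denominator == 0 {
        0.0
    } else {
        numerator as f64 / denominator as f64
    }
}

/// Weighted fitness in [0, 1]. `productivity_cap` (items per tick that
/// count as full productivity) is an assumed normalization.
fn fitness(s: &AgentStats, productivity_cap: f64) -> f64 {
    let per_tick = ratio(s.concepts + s.edges, s.ticks);
    let productivity = (per_tick / productivity_cap).min(1.0);
    let novelty = ratio(s.novel_concepts, s.concepts);
    let quality = ratio(s.strong_edges, s.edges);
    let connectivity = ratio(s.bridge_edges, s.edges);
    0.3 * productivity + 0.3 * novelty + 0.2 * quality + 0.2 * connectivity
}
```

Because three of the four terms are ratios, an agent cannot maximize fitness by producing many weak, redundant edges.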

Structural Queries

// Path queries — "What connects A to B?"
graph.shortest_path(&from, &to) -> Option<(Vec<NodeId>, f64)>

// Centrality queries — "What's most important?"
graph.betweenness_centrality(100) -> Vec<(NodeId, f64)>

// Bridge queries — "What concepts connect domains?"
graph.bridge_nodes(10) -> Vec<(NodeId, f64)>

// Component queries — "How many disconnected regions?"
graph.connected_components() -> usize
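
To make the component query concrete, here is a standalone union-find sketch that counts disconnected regions. It works on plain node indices rather than phago's `NodeId`, and it is one common way to implement such a query, not necessarily the one the crate uses.

```rust
/// Count connected components of an undirected graph with
/// nodes `0..node_count` and the given edge list.
fn connected_components(node_count: usize, edges: &[(usize, usize)]) -> usize {
    // Each node starts as its own root.
    let mut parent: Vec<usize> = (0..node_count).collect();

    // Find the root of `x`, halving the path as we go.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            let grandparent = parent[parent[x]];
            parent[x] = grandparent;
            x = grandparent;
        }
        x
    }

    let mut components = node_count;
    for &(a, b) in edges {
        let root_a = find(&mut parent, a);
        let root_b = find(&mut parent, b);
        if root_a != root_b {
            // Merging two regions reduces the count by one.
            parent[root_a] = root_b;
            components -= 1;
        }
    }
    components
}
```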

Distributed Colony (v1.0.0)

Scale horizontally across multiple nodes:

# Start coordinator
cargo run --bin phago -- cluster start-coordinator --port 9000

# Start shards (in separate terminals)
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9001
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9002

# Check cluster status
cargo run --bin phago -- cluster status --coordinator 127.0.0.1:9000

# Or use Docker Compose
cd deploy && docker-compose up

Architecture:

  • Consistent hash ring with 150 virtual nodes per shard for even distribution
  • Ghost nodes for lazy-resolved cross-shard edge references
  • Phase-synchronized ticks (Sense/Act/Decay/Advance) via barrier coordination
  • Two-phase distributed TF-IDF with scatter-gather for globally accurate scoring
  • tarpc RPC with connection pooling for inter-shard communication
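
The first bullet can be illustrated with a minimal consistent hash ring. This is a standalone sketch, not the `phago-distributed` implementation: the 150 virtual nodes per shard come from the list above, while the `HashRing` type, the use of `DefaultHasher`, and the way virtual nodes are derived are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

const VIRTUAL_NODES_PER_SHARD: u32 = 150;

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Ring positions mapped to the shard that owns them.
#[derive(Default)]
struct HashRing {
    ring: BTreeMap<u64, String>,
}

impl HashRing {
    /// Place 150 virtual nodes for this shard around the ring.
    fn add_shard(&mut self, shard: &str) {
        for v in 0..VIRTUAL_NODES_PER_SHARD {
            self.ring.insert(hash_of(&(shard, v)), shard.to_string());
        }
    }

    fn remove_shard(&mut self, shard: &str) {
        self.ring.retain(|_, owner| owner != shard);
    }

    /// A key belongs to the first virtual node at or after its hash,
    /// wrapping around to the start of the ring.
    fn shard_for(&self, key: &str) -> Option<&str> {
        let h = hash_of(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, owner)| owner.as_str())
    }
}
```

The property that matters for resharding is that removing a shard only moves the keys that shard owned; every other key keeps its placement.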

MCP Integration

External LLMs/agents can interact via typed request/response API:

  • phago_remember(title, content, ticks) — ingest document
  • phago_recall(query, max_results, alpha) — hybrid query
  • phago_explore(type: path|centrality|bridges|stats) — structural queries

Architecture

crates/
├── phago/              # Unified facade crate (use this!)
├── phago-cli/          # CLI (ingest, query, stats, session, cluster)
├── phago-core/         # Traits (10 primitives) + shared types + Louvain
├── phago-runtime/      # Colony, substrate, topology, sessions, SQLite, async, streaming
├── phago-agents/       # Digester, Sentinel, Synthesizer, SemanticDigester, genome
├── phago-embeddings/   # Vector embeddings (Simple, ONNX, API providers)
├── phago-llm/          # LLM integration (Ollama, Claude, OpenAI)
├── phago-rag/          # Query engine, hybrid scoring, MCP adapter
├── phago-viz/          # Self-contained HTML visualization (D3.js)
├── phago-web/          # Axum web dashboard + WebSocket
├── phago-python/       # PyO3 bindings (LangChain, LlamaIndex)
├── phago-vectors/      # Vector DB adapters (Qdrant, Pinecone, Weaviate)
├── phago-distributed/  # Multi-node sharding, tarpc RPC, consistent hashing
└── phago-wasm/         # WASM integration (future)
poc/
├── knowledge-ecosystem/   # Full system demo (120-tick simulation)
├── bio-rag-demo/          # Hybrid retrieval benchmark
├── agent-evolution-demo/  # Evolutionary agents experiment
├── kg-training-demo/      # Curriculum ordering with Louvain
├── agentic-memory-demo/   # Persistent code knowledge
└── data/corpus/           # 100-doc test corpus (4 topics × 25 docs)
deploy/
└── docker-compose.yml     # Distributed cluster deployment
docs/
├── ABOUT_PHAGO.md         # Comprehensive project paper
├── papers/                # Research branch whitepapers
└── ...                    # Integration guide, executive summary, etc.

Colony Lifecycle (per tick)

  1. Sense — All agents observe substrate (signals, documents, traces)
  2. Act — Colony processes agent actions (move, digest, present, wire)
  3. Transfer — Agents export/integrate vocabulary, attempt symbiosis
  4. Dissolve — Mature agents modulate boundaries, reinforce graph nodes
  5. Death — Remove agents that self-assessed for termination
  6. Decay — Signals, traces, and edge weights decay; weak edges pruned
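
The fixed phase order can be sketched as a small state machine. Everything below is a toy model written for this README, not phago's `Colony`: the `MiniColony` and `Agent` types, the idle counter (loosely modeled on `with_max_idle`), and the 0.9 signal decay are assumptions. Sense, Transfer, and Dissolve are left as no-ops to keep the sketch short.

```rust
/// The six per-tick phases, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Sense,
    Act,
    Transfer,
    Dissolve,
    Death,
    Decay,
}

impl Phase {
    const ORDER: [Phase; 6] = [
        Phase::Sense,
        Phase::Act,
        Phase::Transfer,
        Phase::Dissolve,
        Phase::Death,
        Phase::Decay,
    ];
}

/// Hypothetical agent that only tracks how long it has been idle.
struct Agent {
    idle_ticks: u32,
    max_idle: u32,
}

/// Hypothetical minimal colony.
struct MiniColony {
    agents: Vec<Agent>,
    pending_docs: u32,
    signal: f64,
}

impl MiniColony {
    /// Run one tick and return the phases in execution order.
    fn tick(&mut self) -> Vec<Phase> {
        let mut trace = Vec::with_capacity(Phase::ORDER.len());
        for phase in Phase::ORDER {
            match phase {
                Phase::Act => {
                    for agent in &mut self.agents {
                        if self.pending_docs > 0 {
                            // Digesting a document resets idleness
                            // and deposits a signal.
                            self.pending_docs -= 1;
                            agent.idle_ticks = 0;
                            self.signal += 1.0;
                        } else {
                            agent.idle_ticks += 1;
                        }
                    }
                }
                // Agents idle for too long are removed.
                Phase::Death => self.agents.retain(|a| a.idle_ticks <= a.max_idle),
                Phase::Decay => self.signal *= 0.9,
                // Omitted in this sketch.
                Phase::Sense | Phase::Transfer | Phase::Dissolve => {}
            }
            trace.push(phase);
        }
        trace
    }
}
```

Running Death before Decay means an agent's last contributions are still in the substrate when it is removed; they then fade like any other trace.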

Key Design Choices

  • Rust ownership = biological resource management. move semantics model consumption (you can't eat something twice). Drop models apoptosis. No garbage collector = deterministic death.
  • The graph IS the memory. No separate storage layer. The topology of the knowledge graph, shaped by Hebbian learning, encodes all accumulated knowledge.
  • No LLMs in the core loop. The v0.1 primitives were built to prove emergence without external intelligence. LLM-backed concept extraction (Phase 9.2) is an optional feature layered on top, not a requirement.
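
The ownership analogy in the first bullet can be shown with plain Rust. The `Document` and `Digester` types below are stand-ins written for this sketch and do not match phago's real types; the shared `deaths` counter exists only to make the `Drop` call observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in document type.
struct Document {
    text: String,
}

/// Stand-in agent that records its own death in a shared counter.
struct Digester {
    vocabulary: Vec<String>,
    deaths: Rc<Cell<u32>>,
}

impl Digester {
    /// Takes the document by value: the caller gives up ownership,
    /// so the same document cannot be digested twice.
    fn digest(&mut self, doc: Document) {
        self.vocabulary
            .extend(doc.text.split_whitespace().map(|w| w.to_lowercase()));
        // `doc` is dropped here; its memory is released immediately.
    }
}

impl Drop for Digester {
    /// Runs exactly once, at a point known at compile time.
    fn drop(&mut self) {
        self.deaths.set(self.deaths.get() + 1);
    }
}
```

A second `digester.digest(doc)` with the same `doc` binding is rejected by the compiler with a "use of moved value" error, which is the "can't eat something twice" rule enforced statically.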

Quantitative Proof (Phase 5)

Running cargo run --bin phago-poc produces metrics proving the model works:

| Metric | What It Proves |
|---|---|
| Transfer Effect | Vocabulary sharing across agents (shared terms ratio, export/integration counts) |
| Dissolution Effect | Boundary modulation reinforces knowledge (concept vs non-concept access ratio) |
| Graph Richness | Colony builds meaningful structure (density, clustering coefficient, bridge concepts) |
| Vocabulary Spread | Knowledge propagates across agents (Gini coefficient of vocabulary sizes) |
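
For reference, the Gini coefficient used for Vocabulary Spread can be computed as in the sketch below. This is the textbook formula over sorted values, written as a standalone function; it is not taken from phago's metrics module, and the function name is hypothetical.

```rust
/// Gini coefficient of a set of vocabulary sizes.
/// 0.0 means every agent knows the same number of terms;
/// values near 1.0 mean a few agents hold almost all of them.
fn gini(sizes: &[usize]) -> f64 {
    let n = sizes.len();
    let total: usize = sizes.iter().sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = sizes.to_vec();
    sorted.sort_unstable();
    // Sum of rank * value over ascending values, with ranks from 1.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &x)| (i as f64 + 1.0) * x as f64)
        .sum();
    let n = n as f64;
    2.0 * weighted / (n * total as f64) - (n + 1.0) / n
}
```

A falling Gini value over the course of a run indicates that vocabulary is spreading from a few agents to the rest of the colony.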

The POC also generates output/phago-colony.html — an interactive D3.js visualization with:

  • Force-directed knowledge graph
  • Agent spatial canvas
  • Event timeline
  • Metrics dashboard with tick slider

Implementation Status

| Phase | Version | Status | Description |
|---|---|---|---|
| 0-4: Core Framework | 0.1.0 | ✅ Done | 10 primitives, 3 agent types, colony lifecycle |
| 5-6: Research | 0.2.0 | ✅ Done | 4 branches with prototypes, benchmarks, papers |
| 7-8: Production | 0.2.0 | ✅ Done | Facade crate, CLI, preludes, error types |
| 9: Semantic Intelligence | 0.3.0 | ✅ Done | Embeddings, LLM backends, semantic wiring |
| 10: Persistence & Scale | 0.3.0 | ✅ Done | SQLite, async runtime, agent serialization |
| Config File Support | 0.3.0 | ✅ Done | phago.toml with ColonyBuilder integration |
| Web Dashboard | 0.4.0 | ✅ Done | Axum + D3.js real-time colony visualization |
| Python Bindings | 0.5.0 | ✅ Done | PyO3 with LangChain and LlamaIndex adapters |
| Louvain Communities | 0.5.0 | ✅ Done | Perfect NMI = 1.0 on synthetic benchmarks |
| Streaming Ingestion | 0.6.0 | ✅ Done | Async channels, backpressure, file watching |
| Vector DB Integration | 0.7.0 | ✅ Done | Qdrant, Pinecone, Weaviate adapters |
| Distributed Colony | 1.0.0 | ✅ Done | Sharding, tarpc RPC, consistent hashing, ghost nodes |

Tests

# All tests (excludes phago-python, which requires maturin, and phago-web)
cargo test --workspace --exclude phago-python --exclude phago-web

# Distributed crate tests (146 unit + 9 integration)
cargo test -p phago-distributed

# By category
cargo test --test transfer_tests       # Vocabulary export/import
cargo test --test symbiosis_tests      # Agent absorption
cargo test --test dissolution_tests    # Boundary modulation
cargo test --test phase4_integration   # Full colony integration
cargo test -p phago-runtime metrics    # Quantitative metrics

# Distributed benchmarks
cargo run --bin phago-bench -- quick

Benchmark Results

| Category | Metric | Result |
|---|---|---|
| Throughput | Ticks/sec (small colony) | 733 |
| SQLite | Save/load time | <1ms |
| Async | Overhead vs sync | <5% |
| Serialization | 200 agents | 8µs |
| Semantic wiring | Overhead | ~11% |

Documentation

Research Papers

| Branch | White Paper | Explainer |
|---|---|---|
| Bio-RAG | bio-rag-whitepaper.md | bio-rag-explainer.md |
| Agent Evolution | agent-evolution-whitepaper.md | agent-evolution-explainer.md |
| KG Training | kg-training-whitepaper.md | kg-training-explainer.md |
| Agentic Memory | agentic-memory-whitepaper.md | agentic-memory-explainer.md |

License

MIT