
# Phago — Biological Computing Primitives

**Version 1.0.0 | Production-Ready**

A Rust framework that maps cellular biology mechanisms to computational operations. Agents self-organize, consume documents, build a Hebbian knowledge graph, share vocabulary, detect anomalies, and exhibit emergent collective behavior, all without top-down orchestration. Version 1.0.0 adds distributed multi-node sharding for horizontal scaling.

## Key Results (v1.0.0)

| Metric | Value | Notes |
|--------|-------|-------|
| Tests passing | **155+** | 100% pass rate across 14 crates |
| Graph edge reduction | **98.3%** | 256k to 4.5k via Hebbian LTP |
| Hybrid MRR | **0.800** | Beats TF-IDF (0.775) on first-result ranking |
| Hybrid P@5 | **0.742** | Matches TF-IDF precision |
| Evolved vs static edges | **11.6x** | Self-healing through agent evolution |
| Community detection NMI | **1.000** | Perfect topic recovery (Louvain) |
| Session persistence | **100%** | Full temporal state fidelity |
| Distributed shards | **3+** | Consistent hashing, ghost nodes, cross-shard queries |

## What It Does

Feed the colony documents. Agents digest them into concepts, wire a knowledge graph through co-activation (Hebbian learning), share vocabulary across agent boundaries (horizontal gene transfer), and detect anomalies (negative selection). The graph structure IS the memory — frequently used connections strengthen, unused ones decay.

```
Documents → Agents digest → Concepts extracted → Graph wired → Knowledge emerges
                ↑                                      ↓
                └──── Transfer, Symbiosis, Dissolution ─┘
```

## Quick Start

### Run the Demos

```bash
# Build
cargo build

# Run the proof-of-concept (120-tick simulation)
cargo run --bin phago-poc

# Run all tests
cargo test --workspace --exclude phago-python --exclude phago-web

# Build with distributed feature
cargo build -p phago --features distributed

# Run distributed benchmarks
cargo run --bin phago-bench -- quick

# Open the interactive visualization (generated by the POC)
# `open` is the macOS command; on Linux use `xdg-open` instead
open output/phago-colony.html
```

### Use as a Library

Add to your `Cargo.toml`:

```toml
[dependencies]
phago = { git = "https://github.com/Clemens865/Phago_Project.git" }

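# Or, with distributed support (use this line instead of the one above;
# TOML does not allow the same key twice in one table):
# phago = { git = "https://github.com/Clemens865/Phago_Project.git", features = ["distributed"] }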
```

Basic usage with the prelude:

```rust
use phago::prelude::*;

fn main() {
    let mut colony = Colony::new();

    // Ingest documents
    colony.ingest_document("doc1", "Cell membrane transport proteins", Position::new(0.0, 0.0));
    colony.ingest_document("doc2", "Protein folding and membrane insertion", Position::new(1.0, 0.0));

    // Spawn digesters and run
    colony.spawn(Box::new(Digester::new(Position::new(0.0, 0.0)).with_max_idle(30)));
    colony.run(30);

    // Query with hybrid scoring
    let results = hybrid_query(&colony, "membrane protein", &HybridConfig {
        alpha: 0.5, max_results: 5, candidate_multiplier: 3,
    });

    for r in results {
        println!("{} (score: {:.3})", r.label, r.final_score);
    }
}
```

See [`docs/INTEGRATION_GUIDE.md`](docs/INTEGRATION_GUIDE.md) for complete examples and API reference.

### Production Features

- **Single import**: `use phago::prelude::*` gives you everything
- **Structured errors**: `Result<T, PhagoError>` with typed error categories
- **Deterministic testing**: `Digester::with_seed(pos, seed)` for reproducible simulations
- **Session persistence**: Save/restore colony state across sessions (JSON or SQLite)
- **SQLite persistence**: `ColonyBuilder` with auto-save for production deployments
- **Async runtime**: `AsyncColony` with `TickTimer` for real-time visualization
- **MCP adapter**: Ready for external LLM/agent integration
- **Semantic embeddings**: Vector-based concept extraction (optional `semantic` feature)
- **Distributed colony**: Multi-node sharding with consistent hashing (optional `distributed` feature)
- **Vector DB integration**: Qdrant, Pinecone, Weaviate adapters
- **Streaming ingestion**: Async channels with backpressure and file watching
- **Web dashboard**: Axum + D3.js real-time colony visualization
- **Python bindings**: PyO3 with LangChain and LlamaIndex adapters
- **Louvain communities**: Topic clustering with NMI = 1.0 on synthetic benchmarks

### SQLite Persistence (Phase 10)

Enable durable storage with automatic save/load:

```toml
[dependencies]
phago-runtime = { version = "1.0", features = ["sqlite"] }
```

```rust
use phago_runtime::prelude::*;

// Create colony with persistent storage
let mut colony = ColonyBuilder::new()
    .with_persistence("knowledge.db")  // SQLite file
    .auto_save(true)                   // Save on drop
    .build()?;

// Use normally — persistence is automatic
colony.ingest_document("title", "content", Position::new(0.0, 0.0));
colony.run(100);
colony.save()?;  // Explicit save (also happens on drop)

// Later: reload with full state preserved
let colony2 = ColonyBuilder::new()
    .with_persistence("knowledge.db")
    .build()?;
```

### Async Runtime (Phase 10)

Enable controlled-rate simulation for visualization:

```toml
[dependencies]
phago-runtime = { version = "1.0", features = ["async"] }
```

```rust
use phago_runtime::prelude::*;
use phago_runtime::async_runtime::{run_in_local, TickTimer};

#[tokio::main]
async fn main() {
    let colony = Colony::new();

    // Fast async simulation
    run_in_local(colony, |ac| async move {
        ac.run_async(100).await
    }).await;

    // Or controlled tick rate for visualization
    let colony2 = Colony::new();
    run_in_local(colony2, |ac| async move {
        let mut timer = TickTimer::new(100);  // 100ms per tick
        timer.run_timed(&ac, 50).await;
    }).await;
}
```

### Semantic Embeddings (Phase 9)

Enable vector embeddings for semantic understanding:

```toml
[dependencies]
phago = { version = "1.0", features = ["semantic"] }
```

```rust
use phago::prelude::*;
use std::sync::Arc;

// Create an embedder (SimpleEmbedder or API-backed)
let embedder: Arc<dyn Embedder> = Arc::new(SimpleEmbedder::new(256));

// SemanticDigester uses embeddings for concept extraction
let mut digester = SemanticDigester::new(Position::new(0.0, 0.0), embedder.clone());
let concepts = digester.digest_text("The mitochondria is the powerhouse of the cell.".into());

// Find semantically similar concepts
let similar = digester.find_similar("cellular energy", 5);
```

The `semantic` feature adds:
- **SimpleEmbedder** — Hash-based embeddings (no dependencies)
- **SemanticDigester** — Embedding-backed agent for semantic concept extraction
- **Chunker** — Document chunking with configurable overlap
- **Similarity functions** — cosine_similarity, euclidean_distance, normalize_l2
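
For readers unfamiliar with the three similarity helpers named above, here is a minimal sketch of what functions with those names conventionally compute. The signatures (`&[f32]` slices, in-place normalization) are assumptions for illustration and may not match the crate's actual API; both inputs are assumed to have the same length.

```rust
/// Dot product of two equal-length vectors (helper, not named in the README).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) length of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    dot(v, v).sqrt()
}

/// Cosine of the angle between `a` and `b`; 0.0 if either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = l2_norm(a) * l2_norm(b);
    if denom == 0.0 { 0.0 } else { dot(a, b) / denom }
}

/// Straight-line distance between `a` and `b`.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}

/// Scale `v` in place to unit length; a zero vector is left unchanged.
fn normalize_l2(v: &mut [f32]) {
    let n = l2_norm(v);
    if n > 0.0 {
        for x in v.iter_mut() {
            *x /= n;
        }
    }
}
```

After `normalize_l2`, cosine similarity reduces to a plain dot product, which is why embedding pipelines commonly normalize once at insertion time.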

### LLM Integration (Phase 9.2)

Enable LLM-backed concept extraction:

```toml
[dependencies]
# Pick ONE of the lines below; TOML does not allow the same key twice.

# Local LLM (Ollama)
phago = { version = "1.0", features = ["llm-local"] }

# Cloud APIs (Claude, OpenAI)
# phago = { version = "1.0", features = ["llm-api"] }

# All backends
# phago = { version = "1.0", features = ["llm-full"] }
```

```rust,ignore
use phago::prelude::*;

// Local Ollama backend (no API key needed)
let ollama = OllamaBackend::localhost().with_model("llama3.2");
let concepts = ollama.extract_concepts("Cell membrane transport").await?;

// Claude backend
let claude = ClaudeBackend::new("sk-ant-...").sonnet();
let concepts = claude.extract_concepts("Cell membrane transport").await?;

// OpenAI backend
let openai = OpenAiBackend::new("sk-...").gpt4o_mini();
let concepts = openai.extract_concepts("Cell membrane transport").await?;
```

The `llm` features add:
- **OllamaBackend** — Local LLM via Ollama (no API key needed)
- **ClaudeBackend** — Anthropic Claude API
- **OpenAiBackend** — OpenAI GPT API
- **LlmBackend trait** — Common interface for all backends
- **Concept extraction** — Extract structured concepts from text
- **Relationship identification** — Find relationships between concepts
- **Query expansion** — Expand queries for better recall

## The Ten Biological Primitives

| Primitive | Biological Analog | What It Does |
|-----------|-------------------|-------------|
| **DIGEST** | Phagocytosis | Consume input, extract fragments, present to graph |
| **APOPTOSE** | Programmed cell death | Self-assess health, gracefully self-terminate |
| **SENSE** | Chemotaxis | Detect signals, follow gradients |
| **TRANSFER** | Horizontal gene transfer | Export/import vocabulary between agents |
| **EMERGE** | Quorum sensing | Detect threshold, activate collective behavior |
| **WIRE** | Hebbian learning | Strengthen used connections, prune unused |
| **SYMBIOSE** | Endosymbiosis | Integrate another agent as permanent symbiont |
| **STIGMERGE** | Stigmergy | Coordinate through environmental traces |
| **NEGATE** | Negative selection | Learn self-model, detect anomalies by exclusion |
| **DISSOLVE** | Holobiont boundary | Modulate agent-substrate boundaries |

## Agent Types

- **Digester** — Consumes documents, extracts keywords, presents concepts to the knowledge graph. Implements DIGEST + SENSE + APOPTOSE + TRANSFER + SYMBIOSE + DISSOLVE.
- **Synthesizer** — Dormant until quorum reached, then identifies bridge concepts and topic clusters. Implements EMERGE + SENSE + APOPTOSE.
- **Sentinel** — Learns what "normal" looks like, flags anomalies by deviation from self-model. Implements NEGATE + SENSE + APOPTOSE.

## Research Branches

Four falsifiable hypotheses, each with a working prototype, benchmark, visualization, and papers.

### 1. Bio-RAG — Self-Reinforcing Retrieval

Hebbian-reinforced knowledge graph retrieval with hybrid scoring (TF-IDF + graph re-ranking).

```bash
cargo run --bin phago-bio-rag-demo
```

| Metric | Graph-only | TF-IDF | **Hybrid** |
|--------|-----------|--------|------------|
| P@5 | 0.280 | 0.742 | **0.742** |
| MRR | 0.650 | 0.775 | **0.800** |
| NDCG@10 | 0.357 | 0.404 | **0.410** |

**Key insight:** The graph's value is not in replacing TF-IDF but in *re-ranking* candidates using structural context. Hybrid scoring beats pure TF-IDF on MRR (first relevant result ranked higher).
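
The P@5 and MRR columns above use standard retrieval metrics. As a reference for how such numbers are typically computed, here is a minimal sketch. It is not the benchmark's actual code: the function names and the use of `u32` document IDs are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Precision@k: fraction of the top-k ranked results that are relevant.
fn precision_at_k(ranked: &[u32], relevant: &HashSet<u32>, k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let hits = ranked
        .iter()
        .take(k)
        .filter(|doc| relevant.contains(*doc))
        .count();
    hits as f64 / k as f64
}

/// Reciprocal rank: 1 / (1-based position of the first relevant result),
/// or 0.0 when no relevant result is returned.
fn reciprocal_rank(ranked: &[u32], relevant: &HashSet<u32>) -> f64 {
    ranked
        .iter()
        .position(|doc| relevant.contains(doc))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Mean reciprocal rank over a set of (ranking, relevant set) query results.
fn mean_reciprocal_rank(queries: &[(Vec<u32>, HashSet<u32>)]) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|(ranked, relevant)| reciprocal_rank(ranked, relevant))
        .sum();
    total / queries.len() as f64
}
```

Because MRR only looks at the first relevant hit, a re-ranker can raise MRR while leaving P@5 unchanged, which is the pattern the table shows for Hybrid versus TF-IDF.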

### 2. Agent Evolution — Evolutionary Agents Through Apoptosis

Agents evolving through intrinsic selection pressure (death + mutation + inheritance) produce richer knowledge graphs.

```bash
cargo run --bin phago-agent-evolution-demo
```

| Metric (tick 300) | Evolved | Static | Random |
|-------------------|---------|--------|--------|
| Nodes | 1,582 | 864 | 1,191 |
| Edges | 101,824 | 8,769 | 38,399 |
| Clustering coeff. | 0.969 | 0.948 | 0.970 |
| Spawns / Generations | 140 / 135 | 0 / 0 | 144 / 144 |

### 3. KG Training — Knowledge Graph to Training Data

Hebbian-weighted triples with Louvain community detection and curriculum ordering for LLM fine-tuning.

```bash
cargo run --bin phago-kg-training-demo
```

| Metric | Before (Label Prop) | After (Louvain) |
|--------|--------------------|--------------------|
| Communities | 1 mega + 547 singletons | Correct structure |
| NMI vs ground truth | 0.170 | **1.000** (perfect) |
| Modularity | N/A | 0.609-0.816 |
| Triples exported | 252,641 | 252,641 |
| Foundation coherence | 100% | 100% |

### 4. Agentic Memory — Persistent Code Knowledge

Self-organizing code knowledge graph that persists across sessions.

```bash
cargo run --bin phago-agentic-memory-demo
```

| Metric | Value |
|--------|-------|
| Code elements extracted | 830 |
| Graph nodes / edges | 659 / 33,490 |
| Session persistence | 100% fidelity |
| Graph P@5 | 0.140 |

## New Features (Ralph Loop Phase 1)

### Hebbian LTP Model (Tentative Edge Wiring)
- First co-occurrence creates edge at **0.1 weight** (tentative)
- Subsequent co-occurrences reinforce: `weight += 0.1`
- Single-document edges decay quickly under synaptic pruning
- Cross-document reinforced edges survive
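
The wiring rule above can be sketched in a few lines. This is an illustrative model, not the crate's implementation: `EdgeTable`, the string edge keys, and the multiplicative decay with a pruning threshold are assumptions. Only the 0.1 initial weight and the `weight += 0.1` reinforcement come from the list above.

```rust
use std::collections::HashMap;

/// Weight given to an edge on first co-occurrence (from the list above).
const TENTATIVE_WEIGHT: f64 = 0.1;
/// Amount added on every later co-occurrence (from the list above).
const REINFORCEMENT: f64 = 0.1;

/// Hypothetical edge store: undirected edges keyed by a sorted label pair.
#[derive(Default)]
struct EdgeTable {
    weights: HashMap<(String, String), f64>,
}

impl EdgeTable {
    /// Order the two labels so (a, b) and (b, a) map to the same edge.
    fn key(a: &str, b: &str) -> (String, String) {
        if a <= b {
            (a.to_string(), b.to_string())
        } else {
            (b.to_string(), a.to_string())
        }
    }

    /// Record one co-occurrence: create a tentative edge or reinforce it.
    fn co_activate(&mut self, a: &str, b: &str) {
        self.weights
            .entry(Self::key(a, b))
            .and_modify(|w| *w += REINFORCEMENT)
            .or_insert(TENTATIVE_WEIGHT);
    }

    /// Assumed pruning model: scale every weight by (1 - decay_rate),
    /// then drop edges whose weight fell below `prune_below`.
    fn decay_and_prune(&mut self, decay_rate: f64, prune_below: f64) {
        for w in self.weights.values_mut() {
            *w *= 1.0 - decay_rate;
        }
        self.weights.retain(|_, w| *w >= prune_below);
    }

    fn weight(&self, a: &str, b: &str) -> Option<f64> {
        self.weights.get(&Self::key(a, b)).copied()
    }
}
```

Under this model an edge seen once sits at 0.1 and is removed by the first sufficiently strong decay pass, while an edge reinforced across several documents has enough headroom to survive.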

### Multi-Objective Fitness
4-dimensional evolution:
- **30% Productivity** — concepts + edges per tick
- **30% Novelty** — novel concepts / total concepts
- **20% Quality** — strong edges (co_act ≥ 2) / total edges
- **20% Connectivity** — bridge edges / total edges
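
As a worked illustration, the four weighted terms can be combined into a single score. This is a sketch under stated assumptions: the `AgentStats` struct and its field names are hypothetical, and productivity is normalized against a caller-supplied maximum so that all four terms fall in the range 0 to 1. The README does not say how the real implementation normalizes productivity.

```rust
/// Hypothetical per-agent counters; field names are illustrative only.
struct AgentStats {
    concepts: u32,
    novel_concepts: u32,
    edges: u32,
    strong_edges: u32, // edges with co-activation count >= 2
    bridge_edges: u32,
    ticks_alive: u32,
}

/// Safe ratio that treats division by zero as 0.0.
fn ratio(numerator: u32, denominator: u32) -> f64 {
    if denominator == 0 {
        0.0
    } else {
        numerator as f64 / denominator as f64
    }
}

/// Weighted sum of the four objectives (30/30/20/20, as listed above).
/// `max_productivity` is an assumed normalization constant.
fn fitness(s: &AgentStats, max_productivity: f64) -> f64 {
    let raw = ratio(s.concepts + s.edges, s.ticks_alive);
    let productivity = if max_productivity > 0.0 {
        (raw / max_productivity).min(1.0)
    } else {
        0.0
    };
    let novelty = ratio(s.novel_concepts, s.concepts);
    let quality = ratio(s.strong_edges, s.edges);
    let connectivity = ratio(s.bridge_edges, s.edges);
    0.30 * productivity + 0.30 * novelty + 0.20 * quality + 0.20 * connectivity
}
```

For example, an agent with 10 concepts (5 novel) and 20 edges (10 strong, 5 bridges) over 10 ticks, with `max_productivity = 6.0`, scores 0.30 × 0.5 + 0.30 × 0.5 + 0.20 × 0.5 + 0.20 × 0.25 = 0.45.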

### Structural Queries
```rust
// Path queries: "What connects A to B?"
let path: Option<(Vec<NodeId>, f64)> = graph.shortest_path(&from, &to);

// Centrality queries: "What's most important?"
let central: Vec<(NodeId, f64)> = graph.betweenness_centrality(100);

// Bridge queries: "What concepts connect domains?"
let bridges: Vec<(NodeId, f64)> = graph.bridge_nodes(10);

// Component queries: "How many disconnected regions?"
let components: usize = graph.connected_components();
```

### Distributed Colony (v1.0.0)

Scale horizontally across multiple nodes:

```bash
# Start coordinator
cargo run --bin phago -- cluster start-coordinator --port 9000

# Start shards (in separate terminals)
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9001
cargo run --bin phago -- cluster start-shard --coordinator 127.0.0.1:9000 --port 9002

# Check cluster status
cargo run --bin phago -- cluster status --coordinator 127.0.0.1:9000

# Or use Docker Compose
cd deploy && docker-compose up
```

Architecture:
- **Consistent hash ring** with 150 virtual nodes per shard for even distribution
- **Ghost nodes** for lazy-resolved cross-shard edge references
- **Phase-synchronized ticks** (Sense/Act/Decay/Advance) via barrier coordination
- **Two-phase distributed TF-IDF** with scatter-gather for globally accurate scoring
- **tarpc RPC** with connection pooling for inter-shard communication
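
To make the first architecture point concrete, here is a minimal consistent hash ring with 150 virtual nodes per shard. It is a sketch, not the `phago-distributed` implementation: `HashRing`, its methods, and the use of the standard library's `DefaultHasher` are assumptions for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Virtual nodes placed on the ring per shard (value from the list above).
const VIRTUAL_NODES_PER_SHARD: u32 = 150;

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical ring: maps a position on the u64 ring to the owning shard.
struct HashRing {
    ring: BTreeMap<u64, String>,
}

impl HashRing {
    fn new() -> Self {
        Self { ring: BTreeMap::new() }
    }

    /// Place 150 virtual nodes for `shard` at pseudo-random ring positions.
    fn add_shard(&mut self, shard: &str) {
        for v in 0..VIRTUAL_NODES_PER_SHARD {
            self.ring.insert(hash_of(&(shard, v)), shard.to_string());
        }
    }

    /// Remove every virtual node owned by `shard`.
    fn remove_shard(&mut self, shard: &str) {
        self.ring.retain(|_, owner| owner.as_str() != shard);
    }

    /// Owner of `key`: the first virtual node at or after the key's hash,
    /// wrapping around to the start of the ring if none is found.
    fn shard_for(&self, key: &str) -> Option<&str> {
        let h = hash_of(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, shard)| shard.as_str())
    }
}
```

The property that matters for resharding is that removing a shard only reassigns the keys that shard owned; every other key keeps its owner. Many virtual nodes per shard smooth out the otherwise uneven arc lengths on the ring.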

### MCP Integration
External LLMs/agents can interact via typed request/response API:
- `phago_remember(title, content, ticks)` — ingest document
- `phago_recall(query, max_results, alpha)` — hybrid query
- `phago_explore(type: path|centrality|bridges|stats)` — structural queries

## Architecture

```
crates/
├── phago/              # Unified facade crate (use this!)
├── phago-cli/          # CLI (ingest, query, stats, session, cluster)
├── phago-core/         # Traits (10 primitives) + shared types + Louvain
├── phago-runtime/      # Colony, substrate, topology, sessions, SQLite, async, streaming
├── phago-agents/       # Digester, Sentinel, Synthesizer, SemanticDigester, genome
├── phago-embeddings/   # Vector embeddings (Simple, ONNX, API providers)
├── phago-llm/          # LLM integration (Ollama, Claude, OpenAI)
├── phago-rag/          # Query engine, hybrid scoring, MCP adapter
├── phago-viz/          # Self-contained HTML visualization (D3.js)
├── phago-web/          # Axum web dashboard + WebSocket
├── phago-python/       # PyO3 bindings (LangChain, LlamaIndex)
├── phago-vectors/      # Vector DB adapters (Qdrant, Pinecone, Weaviate)
├── phago-distributed/  # Multi-node sharding, tarpc RPC, consistent hashing
└── phago-wasm/         # WASM integration (future)
poc/
├── knowledge-ecosystem/   # Full system demo (120-tick simulation)
├── bio-rag-demo/          # Hybrid retrieval benchmark
├── agent-evolution-demo/  # Evolutionary agents experiment
├── kg-training-demo/      # Curriculum ordering with Louvain
├── agentic-memory-demo/   # Persistent code knowledge
└── data/corpus/           # 100-doc test corpus (4 topics × 25 docs)
deploy/
└── docker-compose.yml     # Distributed cluster deployment
docs/
├── ABOUT_PHAGO.md         # Comprehensive project paper
├── papers/                # Research branch whitepapers
└── ...                    # Integration guide, executive summary, etc.
```

### Colony Lifecycle (per tick)

1. **Sense** — All agents observe substrate (signals, documents, traces)
2. **Act** — Colony processes agent actions (move, digest, present, wire)
3. **Transfer** — Agents export/integrate vocabulary, attempt symbiosis
4. **Dissolve** — Mature agents modulate boundaries, reinforce graph nodes
5. **Death** — Remove agents that self-assessed for termination
6. **Decay** — Signals, traces, and edge weights decay; weak edges pruned
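
The fixed phase order can be sketched as a small state machine. This toy model is an assumption-laden illustration, not the runtime's code: `ToyColony`, `ToyAgent`, the idle-based death rule, and the 0.9 decay factor are all hypothetical. Only the six phase names and their order come from the list above.

```rust
/// The six phases of one tick, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Sense,
    Act,
    Transfer,
    Dissolve,
    Death,
    Decay,
}

const TICK_ORDER: [Phase; 6] = [
    Phase::Sense,
    Phase::Act,
    Phase::Transfer,
    Phase::Dissolve,
    Phase::Death,
    Phase::Decay,
];

/// Toy agent that only tracks idleness (hypothetical).
struct ToyAgent {
    idle_ticks: u32,
    max_idle: u32,
    wants_to_die: bool,
}

/// Toy colony: a document queue, one scalar signal, and a tick counter.
struct ToyColony {
    agents: Vec<ToyAgent>,
    pending_docs: u32,
    signal: f64,
    ticks: u64,
}

impl ToyColony {
    fn run_phase(&mut self, phase: Phase) {
        match phase {
            // A real colony would let agents read signals and traces here.
            Phase::Sense => {}
            Phase::Act => {
                for agent in &mut self.agents {
                    if self.pending_docs > 0 {
                        self.pending_docs -= 1; // digest one document
                        agent.idle_ticks = 0;
                    } else {
                        agent.idle_ticks += 1;
                    }
                    // Self-assessment: idle too long means request termination.
                    agent.wants_to_die = agent.idle_ticks >= agent.max_idle;
                }
            }
            // Vocabulary exchange and boundary modulation are omitted here.
            Phase::Transfer | Phase::Dissolve => {}
            Phase::Death => self.agents.retain(|a| !a.wants_to_die),
            Phase::Decay => self.signal *= 0.9, // assumed decay factor
        }
    }

    /// Run all six phases once, in order.
    fn tick(&mut self) {
        for phase in TICK_ORDER.iter().copied() {
            self.run_phase(phase);
        }
        self.ticks += 1;
    }
}
```

Keeping Death after Act and before Decay means an agent that flags itself during a tick is removed within that same tick, before the substrate decays.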

### Key Design Choices

- **Rust ownership = biological resource management.** `move` semantics model consumption (you can't eat something twice). `Drop` models apoptosis. No garbage collector = deterministic death.
- **The graph IS the memory.** No separate storage layer. The topology of the knowledge graph, shaped by Hebbian learning, encodes all accumulated knowledge.
- **No LLMs in the core loop.** The v0.1 primitives had to prove emergence without external intelligence. LLM-backed concept extraction arrived later (Phase 9.2) as an optional feature, so the core colony still runs without any model.

## Quantitative Proof (Phase 5)

Running `cargo run --bin phago-poc` produces metrics proving the model works:

| Metric | What It Proves |
|--------|---------------|
| **Transfer Effect** | Vocabulary sharing across agents (shared terms ratio, export/integration counts) |
| **Dissolution Effect** | Boundary modulation reinforces knowledge (concept vs non-concept access ratio) |
| **Graph Richness** | Colony builds meaningful structure (density, clustering coefficient, bridge concepts) |
| **Vocabulary Spread** | Knowledge propagates across agents (Gini coefficient of vocabulary sizes) |
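
For the Vocabulary Spread row, the Gini coefficient can be computed from per-agent vocabulary sizes as below. This is the textbook formula, shown as a sketch; the function name and the `usize` input are assumptions, and the POC may compute the metric differently.

```rust
/// Gini coefficient of a set of non-negative sizes.
/// 0.0 means every agent has the same vocabulary size; values near
/// (n - 1) / n mean one agent holds almost all of the vocabulary.
fn gini(sizes: &[usize]) -> f64 {
    let n = sizes.len();
    let total: usize = sizes.iter().sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = sizes.to_vec();
    sorted.sort_unstable();
    // G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    // with i the 1-based rank of x_i in ascending order.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &x)| (i as f64 + 1.0) * x as f64)
        .sum();
    2.0 * weighted / (n as f64 * total as f64) - (n as f64 + 1.0) / n as f64
}
```

Four agents with 10 terms each give a Gini of 0, while sizes of 0, 0, 0, and 40 give 0.75, the maximum for four agents. A low value is therefore the signal that vocabulary has spread evenly.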

The POC also generates `output/phago-colony.html` — an interactive D3.js visualization with:
- Force-directed knowledge graph
- Agent spatial canvas
- Event timeline
- Metrics dashboard with tick slider

## Implementation Status

| Phase | Version | Status | Description |
|-------|---------|--------|-------------|
| 0-4 — Core Framework | 0.1.0 | ✅ Done | 10 primitives, 3 agent types, colony lifecycle |
| 5-6 — Research | 0.2.0 | ✅ Done | 4 branches with prototypes, benchmarks, papers |
| 7-8 — Production | 0.2.0 | ✅ Done | Facade crate, CLI, preludes, error types |
| 9 — Semantic Intelligence | 0.3.0 | ✅ Done | Embeddings, LLM backends, semantic wiring |
| 10 — Persistence & Scale | 0.3.0 | ✅ Done | SQLite, async runtime, agent serialization |
| Config File Support | 0.3.0 | ✅ Done | phago.toml with ColonyBuilder integration |
| Web Dashboard | 0.4.0 | ✅ Done | Axum + D3.js real-time colony visualization |
| Python Bindings | 0.5.0 | ✅ Done | PyO3 with LangChain and LlamaIndex adapters |
| Louvain Communities | 0.5.0 | ✅ Done | Perfect NMI = 1.0 on synthetic benchmarks |
| Streaming Ingestion | 0.6.0 | ✅ Done | Async channels, backpressure, file watching |
| Vector DB Integration | 0.7.0 | ✅ Done | Qdrant, Pinecone, Weaviate adapters |
| **Distributed Colony** | **1.0.0** | ✅ Done | **Sharding, tarpc RPC, consistent hashing, ghost nodes** |

## Tests

```bash
# All tests (excludes phago-python, which requires maturin, and phago-web)
cargo test --workspace --exclude phago-python --exclude phago-web

# Distributed crate tests (146 unit + 9 integration)
cargo test -p phago-distributed

# By category
cargo test --test transfer_tests       # Vocabulary export/import
cargo test --test symbiosis_tests      # Agent absorption
cargo test --test dissolution_tests    # Boundary modulation
cargo test --test phase4_integration   # Full colony integration
cargo test -p phago-runtime metrics    # Quantitative metrics

# Distributed benchmarks
cargo run --bin phago-bench -- quick
```

### Benchmark Results

| Category | Metric | Result |
|----------|--------|--------|
| **Throughput** | Ticks/sec (small colony) | 733 |
| **SQLite** | Save/load time | <1ms |
| **Async** | Overhead vs sync | <5% |
| **Serialization** | 200 agents | 8µs |
| **Semantic wiring** | Overhead | ~11% |

## Documentation

- [`docs/ABOUT_PHAGO.md`](docs/ABOUT_PHAGO.md) — **About Phago** — comprehensive project paper (v1.0.0)
- [`docs/INTEGRATION_GUIDE.md`](docs/INTEGRATION_GUIDE.md) — **How to use Phago** — installation, examples, API reference
- [`docs/papers/phago-whitepaper-v2.md`](docs/papers/phago-whitepaper-v2.md) — **Main whitepaper (v2.0)** — technical paper
- [`docs/EXECUTIVE_SUMMARY.md`](docs/EXECUTIVE_SUMMARY.md) — Latest results and roadmap
- [`docs/COMPETITIVE_ANALYSIS.md`](docs/COMPETITIVE_ANALYSIS.md) — Where Phago wins vs traditional approaches
- [`docs/USE_CASES.md`](docs/USE_CASES.md) — Practical applications
- [`docs/WHITEPAPER.md`](docs/WHITEPAPER.md) — Original theoretical foundation
- [`docs/NEXT_PRIORITIES.md`](docs/NEXT_PRIORITIES.md) — Development plan (all 7 priorities complete)

### Research Papers

| Branch | White Paper | Explainer |
|--------|-----------|-----------|
| Bio-RAG | [`bio-rag-whitepaper.md`](docs/papers/bio-rag-whitepaper.md) | [`bio-rag-explainer.md`](docs/papers/bio-rag-explainer.md) |
| Agent Evolution | [`agent-evolution-whitepaper.md`](docs/papers/agent-evolution-whitepaper.md) | [`agent-evolution-explainer.md`](docs/papers/agent-evolution-explainer.md) |
| KG Training | [`kg-training-whitepaper.md`](docs/papers/kg-training-whitepaper.md) | [`kg-training-explainer.md`](docs/papers/kg-training-explainer.md) |
| Agentic Memory | [`agentic-memory-whitepaper.md`](docs/papers/agentic-memory-whitepaper.md) | [`agentic-memory-explainer.md`](docs/papers/agentic-memory-explainer.md) |

## License

MIT