coreason-runtime 0.1.0

Kinetic Plane execution engine for the CoReason Tripartite Cybernetic Manifold
# 🚀 Deployment & Infrastructure: The Enterprise Manifold

The `coreason-runtime` is not designed to be run as a fragile local script in production. It is engineered as a mathematically isolated, air-gapped **Deployment Manifold**.

This guide explains how to deploy the entire cybernetic mesh (Runtime + Orchestrator + Tensor Engine) onto a bare-metal Linux host or a VM Hypervisor (like Proxmox) utilizing hardware GPU passthrough.

---

## 1. Prerequisites

Before booting the mesh, your host machine must meet the SOTA 2026 infrastructure standards:

* **OS:** Linux (Ubuntu 24.04 LTS or Debian 12 recommended).
* **Orchestration:** Docker Engine (v24+) and Docker Compose (v2+).
* **Hardware:** A physical NVIDIA GPU (e.g., RTX 3090/4090, A100, H100).
* **Drivers:** [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed and configured for the **Container Device Interface (CDI)**.

---

## 2. Host Preparation: The NVIDIA CDI

Because the Cognitive Topology needs a very low Time-To-First-Token (TTFT) for its active inference loop, we cannot use CPU-bound models or heavy virtualization abstraction layers. We expose the physical GPU directly to the `sglang` container.

Ensure your Docker daemon is configured to use the NVIDIA runtime. Generate the CDI specification on your host:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

*Note: With the CDI specification in place, our `compose.yaml` can request `capabilities: [gpu]` declaratively, so no `--gpus all` flag is needed on the command line.*
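
Before moving on, it can help to confirm that the host is actually in the state this section describes. The following is a minimal sketch, not part of the repository: the `preflight` helper is hypothetical, and it only checks that the `docker` and `nvidia-ctk` executables are on `PATH` and that the CDI file written by the command above exists and is non-empty. It does not validate the contents of the specification.

```python
import shutil
from pathlib import Path

# Output path taken from the `nvidia-ctk cdi generate` command above.
CDI_SPEC = Path("/etc/cdi/nvidia.yaml")
# Executables this guide invokes on the host.
REQUIRED_TOOLS = ("docker", "nvidia-ctk")


def preflight(cdi_spec: Path = CDI_SPEC, tools=REQUIRED_TOOLS) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    if not cdi_spec.is_file():
        problems.append(f"CDI spec missing: {cdi_spec}")
    elif cdi_spec.stat().st_size == 0:
        problems.append(f"CDI spec is empty: {cdi_spec}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("FAIL:", issue)
    print("host ready" if not issues else f"{len(issues)} issue(s) found")
```

Returning a list of problems rather than raising on the first one lets you fix everything in a single pass.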

---

## 3. Environment Projection

Clone the repository to your host machine and project the environment topology.

We recommend running the auto-detection script, which inspects host capabilities (such as an NVIDIA GPU) and generates and configures the `.env` file:

```bash
# On Linux/macOS
chmod +x ./scripts/detect_capabilities.sh
./scripts/detect_capabilities.sh
```

Or copy the example environment file manually:

```bash
cp .env.example .env
```

Open `.env` and configure the necessary variables.
**Crucially, you must provide an `HF_TOKEN`** (Hugging Face User Access Token) for SGLang. If you configure the environment manually instead of using the auto-detection script, also set the GPU-specific profile and extras:

```env
# .env
SGLANG_URL=http://sglang:30000
LANCEDB_URI=/app/data/lancedb
ECOSYSTEM_REGISTRY_URL=http://ecosystem:8080
TELEMETRY_BROKER_URL=http://localhost:8000
TEMPORAL_HOST=temporal:7233
HF_TOKEN=hf_your_secure_token_here

# Hardware configuration (automatically configured by scripts)
COMPOSE_PROFILES=gpu
EXTRAS=inference
```
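
A manually edited `.env` is easy to get subtly wrong, for example by leaving the placeholder token in place. The sketch below shows one way to check the file before booting. The `parse_env` and `validate_env` helpers are hypothetical, the parser handles only simple `KEY=VALUE` lines, and treating these four keys as mandatory is an assumption drawn from the example above rather than from the runtime itself.

```python
from pathlib import Path

# Keys shown in the example above; treating them as required is an assumption.
REQUIRED_KEYS = ("SGLANG_URL", "LANCEDB_URI", "TEMPORAL_HOST", "HF_TOKEN")
PLACEHOLDER_TOKEN = "hf_your_secure_token_here"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    if values.get("HF_TOKEN") == PLACEHOLDER_TOKEN:
        problems.append("HF_TOKEN is still the placeholder value")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        for problem in validate_env(parse_env(env_file.read_text())):
            print("FAIL:", problem)
    else:
        print("FAIL: .env not found")
```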

---

## 4. Booting the Mesh

With the environment projected and the GPU primed, you can boot the entire organism. We execute this via the root `compose.yaml`:

```bash
docker compose up -d --build
```


### The Topography of the Mesh
Once booted, the manifold isolates the components into three microservices:

1. **`runtime` (Port `8000`):** The FastAPI edge and Temporal Worker. This is the *only* service that exposes an API to the public network/frontend IDE. It runs securely as a non-root user (`UID 10000`).
2. **`sglang` (Port `30000`):** The Cognitive Engine. It monopolizes the GPU and binds only to the internal Docker bridge network, so external port scanners cannot reach it.
3. **`temporal` (Port `7233` / Web UI `8233`):** The Orchestration Substrate. Handles the heavy gRPC task queue routing and durable state serialization.
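
The exposure rules above can be spot-checked from the Docker host with plain TCP probes. This is a rough sketch under stated assumptions: the `port_open` and `check_exposure` helpers are hypothetical, and the expectation table assumes that `8000` and `8233` are published to the host while `30000` is not. Port `7233` is left out because this guide does not say whether it is published.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Expected reachability from the Docker host (assumption, see lead-in).
EXPECTED = {8000: True, 8233: True, 30000: False}


def check_exposure(host: str = "127.0.0.1", expected=EXPECTED) -> dict[int, bool]:
    """Map each port to whether its observed state matches the expectation."""
    return {port: port_open(host, port) == want for port, want in expected.items()}


if __name__ == "__main__":
    for port, ok in check_exposure().items():
        print(f"port {port}: {'as expected' if ok else 'UNEXPECTED'}")
```

An `UNEXPECTED` result for `30000` would suggest the inference engine is published to the host and worth a second look at the port mappings.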

---

## 5. Epistemic Persistence & Volume Mounts

If a container crashes, the Linux OOM-killer terminates a process, or you upgrade the daemon, the Cognitive Topology **must not experience amnesia**.

The `compose.yaml` maps physical host directories into the containers via strict bind mounts. Because the runtime executes as the unprivileged `coreason` user (for WASM sandbox security), you must ensure the host directories have the correct permissions:

```bash
# Create the physical data directories on the host
mkdir -p data/lancedb data/bronze data/silver data/gold

# Grant read/write access to the internal container user (UID 10000)
sudo chown -R 10000:10000 data/
```

* **`data/lancedb/`**: The continuous Epistemic Vector Ledger.
* **`data/bronze/`, `data/silver/`, `data/gold/`**: The Medallion ETL intelligence matrices.
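
Permission mistakes on bind mounts tend to surface only as write errors inside the container, so it can be worth verifying ownership up front. The sketch below assumes the directory layout and UID given in this section; the `check_data_dirs` helper is hypothetical and inspects only the four top-level directories, not their contents.

```python
from pathlib import Path

CONTAINER_UID = 10000  # UID of the runtime user, as stated above
DATA_DIRS = ("lancedb", "bronze", "silver", "gold")


def check_data_dirs(root: Path, uid: int = CONTAINER_UID) -> list[str]:
    """Report data directories that are missing or owned by the wrong UID."""
    problems = []
    for name in DATA_DIRS:
        path = root / name
        if not path.is_dir():
            problems.append(f"{path}: missing")
            continue
        owner = path.stat().st_uid
        if owner != uid:
            problems.append(f"{path}: owned by UID {owner}, expected {uid}")
    return problems


if __name__ == "__main__":
    for problem in check_data_dirs(Path("data")):
        print("FAIL:", problem)
```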

---

## 6. Observability & Analytics

Once the mesh is running, you have two primary observability vectors:

### A. Real-Time Topology Topography (Temporal)
To watch the deterministic AST execution, visualize retries, or manually intervene in a stalled workflow, navigate to the Temporal Web UI:
👉 **`http://<your-host-ip>:8233`**

### B. Offline Epistemic Auditing (The Medallion Pipeline)
The daemon automatically funnels Server-Sent Events (SSE) telemetry into local `.parquet` files. To audit token economics, epistemic drift rates, or node latency, simply point a Jupyter Notebook or a local Polars script at your host's `data/gold/` directory:

```python
import polars as pl

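# Load the gold-tier metrics written by the Medallion pipeline
df = pl.read_parquet("./data/gold/metrics.parquet")

# Inspect the available columns first, then summarize them
print(df.schema)
print(df.describe())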
```

---

## 7. Edge & Mesh Topologies (Raspberry Pi & Server-Client Hybrid)

The `coreason-runtime` can be projected onto resource-constrained environments (like a Raspberry Pi) and orchestrated in hybrid or decentralized topologies.

### A. Single Edge Device / Raspberry Pi Specs
* **Minimum Specifications**:
  * **Device**: Raspberry Pi 4 or 5 (ARM64 architecture).
  * **RAM**: **8 GB RAM minimum** (required to load the Python daemon, LanceDB, and execute serialization).
  * **Storage**: High-end microSD (Class 10, UHS-I U3) or external SSD.
  * **OS**: 64-bit Linux (ARM64 mandatory).
* **CPU-Only Optimization**:
  * Disable local GPU-bound services (like `sglang` containers).
  * Route model inference to remote endpoints or Tier 2 cloud providers by updating capability profiles in `.env`.
* **Lightweight Sandbox**:
  * Local capability execution uses the minimalist `pi_bridge.py` client (`src/coreason_runtime/execution_plane/gateway_bridge/pi_bridge.py`), backed by `pi-agent-core` or `@mariozechner/pi-coding-agent` via `npx`.
  * Basic primitives (`read_file`, `write_file`, `edit_file`, and `bash`) are exposed while isolating the edge device from memory exhaustion.

### B. Server-Client Hybrid (Control & Extension)
A resource-constrained edge device (the client) can control a more capable machine (the server) and extend its execution onto it using the **Tiered Routing Circuit** and **Model Context Protocol (MCP)** bridges.

* **Task Queue Partitioning**:
  * A shared Temporal cluster manages the workflow state.
  * The small edge device runs a worker subscribing to a local `edge-task-queue` (executing physical sensor I/O or edge actuators).
  * The large server runs a worker subscribing to a `heavy-compute-queue` (executing model inference, PyTorch training, or heavy vector database indexing).
  * Workflows orchestrate execution across both queues seamlessly.
* **Master MCP Gateway**:
  * The edge device queries a remote `coreason-master-gateway` via the `master_mcp.py` bridge (`src/coreason_runtime/execution_plane/gateway_bridge/master_mcp.py`) over secure mTLS.
  * Deep prompts, tools, and vector lookups are delegated to the server while leaving logical control on the edge.
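
The queue partitioning described above comes down to a routing decision per unit of work. The following sketch illustrates only that decision in plain Python; the actual dispatch would go through the Temporal SDK, which is not shown. The two queue names come from this section, while the activity names and the `queue_for` and `plan` helpers are hypothetical.

```python
EDGE_QUEUE = "edge-task-queue"
HEAVY_QUEUE = "heavy-compute-queue"

# Hypothetical activity names, grouped by the kinds of work listed above.
EDGE_ACTIVITIES = {"read_sensor", "drive_actuator"}
HEAVY_ACTIVITIES = {"run_inference", "train_model", "index_vectors"}


def queue_for(activity: str) -> str:
    """Pick the task queue whose worker is able to run this activity."""
    if activity in EDGE_ACTIVITIES:
        return EDGE_QUEUE
    if activity in HEAVY_ACTIVITIES:
        return HEAVY_QUEUE
    raise ValueError(f"no task queue registered for activity {activity!r}")


def plan(steps: list[str]) -> list[tuple[str, str]]:
    """Pair every workflow step with the queue it should be dispatched to."""
    return [(step, queue_for(step)) for step in steps]


if __name__ == "__main__":
    for step, queue in plan(["read_sensor", "run_inference", "drive_actuator"]):
        print(f"{step} -> {queue}")
```

Raising on an unknown activity, rather than falling back to a default queue, keeps heavy work from silently landing on the edge device.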

### C. Mesh Topology (Decentralized Federation)
Decentralized groups of edge nodes coordinate without a single point of failure using consensus orchestration:
* **Consensus Workflows**: The `consensus_federation_execution_workflow.py` module (`src/coreason_runtime/orchestration/workflows/consensus_federation_execution_workflow.py`) unrolls voting policies into child workflows.
* **Cryptographic Verification**: Untrusted nodes provide Proof-of-Valid-Inference (PoVI) receipts generated with zk-ML frameworks (e.g., EZKL), and Byzantine Fault Tolerance is computed over those receipts.
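
To make the fault-tolerance arithmetic concrete, here is a small sketch of a quorum decision. It uses the classic Byzantine bound (a federation of `n` nodes tolerates `f` faulty nodes when `n >= 3f + 1`, and a decision needs `2f + 1` matching votes). Whether the runtime's voting policies use exactly this rule is an assumption, the `decide` helper is hypothetical, and receipt verification itself is treated as an input rather than implemented.

```python
from collections import Counter


def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1 (classic BFT bound)."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Matching votes needed for a decision: 2f + 1."""
    return 2 * max_faulty(n) + 1


def decide(votes: dict[str, str], verified: set[str], n: int):
    """Return the value that reaches quorum, or None if none does.

    votes:    node id -> proposed result
    verified: node ids whose PoVI receipt was checked successfully
    n:        total number of nodes in the federation
    """
    # Only votes backed by a verified receipt are counted.
    tally = Counter(value for node, value in votes.items() if node in verified)
    if not tally:
        return None
    value, count = tally.most_common(1)[0]
    return value if count >= quorum(n) else None
```

With four nodes, one faulty node is tolerated and three matching verified votes are required, so a single node with a bad receipt or a dissenting answer cannot block or forge a decision.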

---

## 8. Viral Agent Migration & Body Shedding

The cybernetic manifold supports **Viral Migration**: an agent can dynamically package its entire state (memory, LanceDB capability tables, workflow context, database specifications) and transmit it to another machine. In effect, the agent can grow, shed its body, or migrate to a new hardware host.

This is executed deterministically by the `HollowPlaneBridgeWorkflow`:

### A. Migration Primitives
State packaging and rehydration are handled via two core activities registered on the worker:
1. **Serialization (`serialize_agent_state_activity`)**:
   * **Database Fetching**: Extracts the agent's graph representation from Neo4j (nodes and relationships).
   * **LanceDB Compression**: Converts local LanceDB capability tables to a PyArrow table and compresses them into a base64-encoded Arrow IPC stream.
   * **Canonicalization**: Packages the data into a JSON Canonicalization Scheme (JCS) payload.
   * **Cryptographic Signature**: Signs the JCS payload with an Ed25519 private key so the destination can verify its authenticity and integrity.
   * **Blob Persistence**: Writes the finalized state package to the `UniversalBlobStore` using OpenDAL.
2. **Rehydration (`rehydrate_agent_state_activity`)**:
   * **Blob Retrieval**: Reads the migration package from the OpenDAL blob store.
   * **Signature Verification**: Validates the payload signature using the source agent's Ed25519 public key.
   * **Database Injection**: Reconstructs the Neo4j graph nodes and relationships on the destination host.
   * **LanceDB Rehydration**: Decompresses the base64-encoded Arrow IPC stream and overwrites the local capability table.

### B. Execution Flow
The migration sequence is orchestrated as follows:

```mermaid
sequenceDiagram
    participant S as Source Node (Worker 1)
    participant B as OpenDAL Blob Store
    participant T as Temporal Orchestrator
    participant D as Destination Node (Worker 2)

    T->>S: Execute serialize_agent_state_activity
    S->>S: Package Neo4j Graph & LanceDB Table
    S->>S: Sign payload with Ed25519
    S->>B: Persist state package via OpenDAL
    S-->>T: Return snapshot key and signature
    T->>D: Execute rehydrate_agent_state_activity (via target task queue)
    B->>D: Fetch state package
    D->>D: Verify Ed25519 signature
    D->>D: Rehydrate Neo4j Graph & LanceDB Table
    D-->>T: Return success (Body Shedding complete)
```