# Runtime Architecture
Understanding the Symbi runtime system architecture and core components.
---
## Overview
The Symbi runtime system provides a secure, scalable, and policy-aware execution environment for autonomous agents. Built on Rust for performance and safety, it implements a multi-tier security model with comprehensive audit capabilities.
### Core Principles
- **Security by Default**: Every operation is subject to policy enforcement
- **Zero Trust**: All components and communications are verified
- **Complete Auditability**: Every action is logged with cryptographic integrity
- **Policy-Driven**: Declarative policies control all system behavior
- **High Performance**: Rust-native implementation for production workloads
---
## System Architecture
```mermaid
graph TB
subgraph "Runtime Core"
ARS[Agent Runtime Scheduler]
ALC[Agent Lifecycle Controller]
ARM[Agent Resource Manager]
ACB[Agent Communication Bus]
AEH[Agent Error Handler]
end
subgraph "Reasoning Loop"
RL[ReasoningLoopRunner]
IP[Inference Provider]
PG[Policy Gate]
AE[Action Executor]
KBR[Knowledge Bridge]
end
subgraph "Context and Knowledge"
ACM[Agent Context Manager]
VDB[Vector Database]
RAG[RAG Engine]
KB[Knowledge Base]
end
subgraph "Security and Policy"
PE[Policy Engine]
AT[Audit Trail]
SO[Sandbox Orchestrator]
CRYPTO[Crypto Operations]
end
subgraph "External Integration"
MCP[MCP Client]
TV[Tool Verification]
API[HTTP API]
end
subgraph "Sandbox Tiers"
T1[Tier 1: Docker]
T2[Tier 2: gVisor]
end
ARS --> ACM
ARS --> PE
ARS --> RL
ALC --> SO
ACB --> CRYPTO
ACM --> VDB
ACM --> RAG
RL --> IP
RL --> PG
RL --> AE
KBR --> ACM
KBR --> RL
PG --> PE
SO --> T1
SO --> T2
MCP --> TV
PE --> AT
```
---
## Core Components
### Agent Runtime Scheduler
The central orchestrator responsible for managing agent execution with real task execution and graceful shutdown capabilities.
**Key Responsibilities:**
- **Task Scheduling**: Priority-based scheduling with resource awareness
- **Real Task Execution**: Actual process spawning and monitoring with comprehensive metrics
- **Load Balancing**: Distribution across available resources
- **Resource Allocation**: Memory, CPU, and I/O assignment
- **Policy Coordination**: Integration with policy enforcement
- **Graceful Shutdown**: Coordinated shutdown with resource cleanup
**Execution Modes:**
- **Ephemeral**: Run-once tasks that terminate after completion
- **Persistent**: Long-running agents with continuous monitoring
- **Scheduled**: Interval-based execution with automatic rescheduling
- **Event-Driven**: Triggered execution based on system events
**Performance Characteristics:**
- Support for 10,000+ logical agents in memory. Concurrent sandbox executions depend on host resources and sandbox tier; Docker containers require more resources than in-process agents.
- Sub-millisecond scheduling decisions
- Priority-based preemption
- Resource-aware placement
- Real-time process monitoring and health checks
- Graceful shutdown with 30-second timeout before force termination
**Real Task Execution Features:**
- Process spawning with secure execution environments
- Resource monitoring (memory, CPU usage) every 5 seconds
- Task timeout enforcement with configurable limits
- Comprehensive execution metrics and statistics
- Process health monitoring and automatic failure detection
**Graceful Shutdown Process:**
1. **Stop New Tasks**: Prevent new agent scheduling
2. **Graceful Termination**: Attempt graceful shutdown of running agents (30s timeout)
3. **Force Termination**: Force-kill remaining processes if needed
4. **Metrics Flush**: Save performance and usage statistics
5. **Resource Cleanup**: Release allocated resources and clean up state
6. **Queue Cleanup**: Clear pending agent queue
```rust
pub struct AgentScheduler {
priority_queues: Vec<PriorityQueue<AgentTask>>,
resource_pool: ResourcePool,
policy_engine: Arc<PolicyEngine>,
load_balancer: LoadBalancer,
task_manager: Arc<TaskManager>,
}
impl AgentScheduler {
pub async fn schedule_agent(&self, config: AgentConfig) -> Result<AgentId>;
pub async fn shutdown_agent(&self, agent_id: AgentId) -> Result<()>;
pub async fn get_system_status(&self) -> SystemStatus;
pub async fn shutdown(&self) -> Result<()>;
}
```
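To make the priority-based scheduling idea concrete, here is a minimal sketch built on the standard library's `BinaryHeap`. The `QueuedTask` and `TaskQueue` types are hypothetical stand-ins for illustration, not the runtime's actual `PriorityQueue<AgentTask>`, and FIFO tie-breaking among equal priorities is an assumption.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical queue entry; the runtime's real `AgentTask` type is not shown here.
#[derive(Debug, PartialEq, Eq)]
struct QueuedTask {
    priority: u8,  // higher values are scheduled first
    sequence: u64, // insertion counter, breaks ties in FIFO order
    name: String,
}

impl Ord for QueuedTask {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap, so the sequence comparison is reversed:
        // among equal priorities the earliest-inserted task compares as largest.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.sequence.cmp(&self.sequence))
    }
}

impl PartialOrd for QueuedTask {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Hypothetical priority queue wrapper.
#[derive(Default)]
struct TaskQueue {
    heap: BinaryHeap<QueuedTask>,
    next_sequence: u64,
}

impl TaskQueue {
    /// Enqueue a task with the given priority.
    fn push(&mut self, name: &str, priority: u8) {
        let task = QueuedTask {
            priority,
            sequence: self.next_sequence,
            name: name.to_string(),
        };
        self.next_sequence += 1;
        self.heap.push(task);
    }

    /// Remove and return the name of the highest-priority task, if any.
    fn pop(&mut self) -> Option<String> {
        self.heap.pop().map(|task| task.name)
    }
}
```

With this ordering, a task pushed at priority 9 is popped before earlier tasks at priority 1, and tasks sharing a priority come out in insertion order.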
### Agent Lifecycle Controller
Manages the complete lifecycle of agents from initialization to termination.
**Lifecycle States:**
1. **Initializing**: Parsing DSL and validating configuration
2. **Ready**: Waiting for task assignment
3. **Running**: Actively executing tasks
4. **Suspended**: Paused due to policy violation or resource constraints
5. **Failed**: Configuration was invalid; the agent proceeds to termination
6. **Terminated**: Gracefully shut down or forcibly stopped
```mermaid
stateDiagram-v2
[*] --> Initializing
Initializing --> Ready: Valid Config
Initializing --> Failed: Invalid Config
Ready --> Running: Task Assigned
Running --> Suspended: Policy Violation
Running --> Ready: Task Complete
Suspended --> Running: Policy Cleared
Suspended --> Terminated: Manual Override
Ready --> Terminated: Shutdown
Failed --> Terminated
Terminated --> [*]
```
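The state diagram can be read as a transition table. The sketch below encodes exactly the edges drawn above as a pure function; the enum and event names are illustrative, and `Cleanup` is a hypothetical name for the unlabeled `Failed --> Terminated` edge.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentState {
    Initializing,
    Ready,
    Running,
    Suspended,
    Failed,
    Terminated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LifecycleEvent {
    ValidConfig,
    InvalidConfig,
    TaskAssigned,
    TaskComplete,
    PolicyViolation,
    PolicyCleared,
    ManualOverride,
    Shutdown,
    Cleanup, // hypothetical label for the unlabeled Failed -> Terminated edge
}

/// Returns the next state, or `None` when the diagram has no such edge.
fn next_state(state: AgentState, event: LifecycleEvent) -> Option<AgentState> {
    use AgentState::*;
    use LifecycleEvent::*;
    match (state, event) {
        (Initializing, ValidConfig) => Some(Ready),
        (Initializing, InvalidConfig) => Some(Failed),
        (Ready, TaskAssigned) => Some(Running),
        (Ready, Shutdown) => Some(Terminated),
        (Running, PolicyViolation) => Some(Suspended),
        (Running, TaskComplete) => Some(Ready),
        (Suspended, PolicyCleared) => Some(Running),
        (Suspended, ManualOverride) => Some(Terminated),
        (Failed, Cleanup) => Some(Terminated),
        _ => None, // every other combination is rejected
    }
}
```

Returning `Option` keeps invalid transitions explicit: a caller has to decide what to do when an event arrives in a state that does not accept it.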
### Resource Management
**Resource Types Managed:**
- **Memory**: Heap allocation with limits and monitoring
- **CPU**: Core allocation and utilization tracking
- **Disk I/O**: Read/write bandwidth limits
- **Network I/O**: Bandwidth and connection limits
- **Execution Time**: Timeout enforcement
**Resource Allocation Strategies:**
- **First Fit**: Fastest allocation for low-latency scenarios
- **Best Fit**: Optimal utilization for resource efficiency
- **Priority-Based**: Guarantee resources for critical agents
```rust
pub struct ResourceLimits {
pub memory_mb: usize,
pub cpu_cores: f32,
pub disk_io_mbps: usize,
pub network_io_mbps: usize,
pub execution_timeout: Duration,
}
```
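The difference between the First Fit and Best Fit strategies is easiest to see on a single resource dimension. The following is a simplified sketch that considers only free memory per node; the function names are hypothetical, and the real allocator presumably weighs all fields of `ResourceLimits`.

```rust
/// First Fit: index of the first node with enough free memory.
/// Stops at the first match, which keeps allocation latency low.
fn first_fit(free_mb: &[usize], request_mb: usize) -> Option<usize> {
    free_mb.iter().position(|&free| free >= request_mb)
}

/// Best Fit: index of the node that leaves the least memory unused.
/// Scans every node, trading latency for tighter packing.
fn best_fit(free_mb: &[usize], request_mb: usize) -> Option<usize> {
    free_mb
        .iter()
        .copied()
        .enumerate()
        .filter(|&(_, free)| free >= request_mb)
        .min_by_key(|&(_, free)| free - request_mb)
        .map(|(index, _)| index)
}
```

For nodes with 512, 128, and 256 MB free, a 100 MB request lands on node 0 under First Fit and on node 1 under Best Fit.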
---
## Multi-Tier Security
### Sandbox Architecture
The runtime implements two security tiers based on operation risk:
#### Tier 1: Docker Isolation
**Use Case**: Low-risk operations, development tasks
- Container-based isolation
- Resource limits and capability dropping
- Network isolation and read-only filesystems
- Suitable for trusted code with minimal security requirements
#### Tier 2: gVisor Isolation
**Use Case**: Standard production tasks, data processing
- User-space kernel with system call interception
- Memory protection and I/O virtualization
- Enhanced security with minimal performance impact
- Default tier for most agent operations
> **Note**: Additional isolation tiers are available in Enterprise editions for maximum security requirements.
---
## Communication System
### Message Types
The runtime supports multiple communication patterns:
**Direct Messaging**: Point-to-point communication with delivery guarantees
```rust
let response = agent_bus.send_message(
target_agent_id,
SecureMessage::new(payload)
).await?;
```
**Publish-Subscribe**: Topic-based event distribution
```rust
agent_bus.publish("data_processing.completed", event_data).await?;
agent_bus.subscribe("security.alerts", alert_handler).await?;
```
**Request-Response**: Synchronous communication with timeout
```rust
let result = agent_bus.request(
target_agent,
request_payload,
timeout_duration
).await?;
```
### Security Features
**Message Encryption**: AES-256-GCM for payload protection
**Digital Signatures**: Ed25519 signatures for authenticity
**Message Routing**: Policy-based routing controls
**Rate Limiting**: Per-agent message rate enforcement
```rust
pub struct SecureMessage {
pub id: MessageId,
pub sender: AgentId,
pub recipient: Option<AgentId>,
pub encrypted_payload: Vec<u8>,
pub signature: Ed25519Signature,
pub timestamp: SystemTime,
}
```
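Per-agent rate limiting is commonly implemented with a token bucket. The sketch below is one possible shape, not the runtime's implementation: `TokenBucket` and `RateLimiter` are hypothetical names, and the clock is passed in as seconds so the behavior is deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical token bucket: holds up to `capacity` tokens and
/// refills continuously at `refill_per_sec`.
struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now_secs: f64) -> Self {
        TokenBucket { capacity, refill_per_sec, tokens: capacity, last_refill_secs: now_secs }
    }

    /// Refill based on elapsed time, then try to spend one token.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Hypothetical limiter that keeps one bucket per agent id.
struct RateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        RateLimiter { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if `agent_id` may send a message at `now_secs`.
    fn allow(&mut self, agent_id: &str, now_secs: f64) -> bool {
        let (capacity, refill) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(agent_id.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, refill, now_secs))
            .try_acquire(now_secs)
    }
}
```

Because each agent has its own bucket, one agent exhausting its budget does not affect the others.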
### Communication Policy Gate
The `CommunicationPolicyGate` sits between the DSL builtins and the CommunicationBus. All five inter-agent builtins (`ask`, `delegate`, `send_to`, `parallel`, `race`) are routed through it:
1. **Policy evaluation** — Cedar-style rules checked before any message is sent
2. **Message creation** — `SecureMessage` with Ed25519 signature and AES-256-GCM encryption
3. **Delivery tracking** — message status and audit trail via the CommunicationBus
4. **Response logging** — request/response pairs tracked with `RequestId` correlation
When the policy gate is not configured (e.g., standalone REPL), builtins behave identically to their original implementation — no policy check, no message tracking. This preserves backward compatibility.
The `ReasoningBuiltinContext` carries three optional fields:
- `sender_agent_id` — identity of the calling agent
- `comm_bus` — reference to the CommunicationBus for message routing
- `comm_policy` — reference to the CommunicationPolicyGate for authorization
### Cross-instance agent messaging
The in-process `CommunicationBus` has a distributed counterpart — `RemoteCommunicationBus` — that forwards the same message types (`ask`, `send_to`, `delegate`, `parallel`, `race`) over HTTP between separate runtime instances. This is what lets a coordinator deployed on one host talk to workers deployed on another without giving up policy enforcement, signatures, or audit trails.
Key properties:
- **Same contract** — `RemoteCommunicationBus` implements the same trait as the local bus, so agent code and DSL builtins don't change between in-process and cross-instance topologies.
- **HTTP messaging endpoints** — exposed on the runtime HTTP API and wired into `RuntimeBridge`'s default context, so `symbi up` in one location can receive messages from `symbi up` elsewhere.
- **AgentPin-anchored identity** — senders present an AgentPin ES256 token; recipients verify against the sender's domain-anchored key before the policy gate runs.
- **SchemaPin verification** — any tool manifests referenced across instances are verified against their pinned signatures before execution.
- **Audit** — remote message sends and receives are logged with the same cryptographic tamper-evident format as local messages, so the audit trail follows the message hop.
Deployment topology is typically a coordinator instance plus one or more worker instances, each deployed via `symbi shell /deploy …` (Beta) to Docker, Cloud Run, or App Runner. See the [Symbi Shell deployment guide](/symbi-shell#deployment-beta).
---
## Context & Knowledge Systems
### Agent Context Manager
Provides persistent memory and knowledge management for agents, with keyword, temporal, similarity, and hybrid search, and policy-based access control.
**Context Types:**
- **Short-term Memory**: Recent interactions and immediate context
- **Long-term Memory**: Persistent knowledge and learned patterns
- **Working Memory**: Active processing and temporary state
- **Episodic Memory**: Structured experiences with events and outcomes
- **Semantic Memory**: Concepts, relationships, and structured knowledge
- **Shared Knowledge**: Cross-agent knowledge sharing with access control
**Advanced Search Modes:**
- **Keyword Search**: Text-based search with relevance scoring
- **Temporal Search**: Time-range based queries with recency factors
- **Similarity Search**: Vector-based semantic similarity using embeddings
- **Hybrid Search**: Combined keyword and similarity search with weighted scoring
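
As an illustration of the weighted scoring used by hybrid search, the sketch below blends a keyword score and a similarity score with a single weight and ranks results by the blend. The linear formula and the function names are assumptions; the document does not specify how the runtime combines the two signals.

```rust
/// Blend two scores in [0, 1]; `keyword_weight` is clamped to [0, 1]
/// and the similarity score receives the remaining weight.
fn hybrid_score(keyword_score: f64, similarity_score: f64, keyword_weight: f64) -> f64 {
    let w = keyword_weight.clamp(0.0, 1.0);
    w * keyword_score + (1.0 - w) * similarity_score
}

/// Rank `(id, keyword_score, similarity_score)` tuples, best first.
fn rank<'a>(items: &[(&'a str, f64, f64)], keyword_weight: f64) -> Vec<&'a str> {
    let mut scored: Vec<(&'a str, f64)> = items
        .iter()
        .map(|&(id, keyword, similarity)| (id, hybrid_score(keyword, similarity, keyword_weight)))
        .collect();
    // Sort descending by blended score; total_cmp gives a total order on f64.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

Shifting the weight changes the ranking: a document with strong keyword overlap but weak semantic similarity rises as `keyword_weight` approaches 1.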
**Access Control & Policy Integration:**
- **Policy Engine Integration**: Connected to resource access policies
- **Agent-Scoped Access**: Isolated contexts per agent with secure boundaries
- **Knowledge Sharing Controls**: Granular permissions for cross-agent knowledge access
- **Access Level Management**: Public, Restricted, Confidential, and Secret classifications
**Importance Calculation Algorithm:**
- **Multi-Factor Scoring**: Base importance, access frequency, recency, and user feedback
- **Memory Type Weighting**: Different importance multipliers per memory type
- **Age Decay**: Exponential decay with configurable half-life per memory type
- **Access Pattern Analysis**: Logarithmic scaling for frequently accessed items
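
One way these factors could combine is sketched below. The exact formula is an assumption made for illustration (the document names the factors but not how they are composed), and the user-feedback term is omitted for brevity.

```rust
/// Hypothetical importance score:
///   base * type_weight * 0.5^(age / half_life) * (1 + ln(1 + access_count))
fn importance(
    base: f64,
    type_weight: f64,
    age_days: f64,
    half_life_days: f64,
    access_count: u32,
) -> f64 {
    // Exponential age decay: the score halves every `half_life_days`.
    let decay = 0.5_f64.powf(age_days / half_life_days);
    // Logarithmic scaling: frequent access helps, with diminishing returns.
    let access_boost = 1.0 + (1.0 + f64::from(access_count)).ln();
    base * type_weight * decay * access_boost
}
```

Under this formula a fresh, never-accessed item scores `base * type_weight`, and the same item scores half of that once its age equals the half-life.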
**Context Archiving & Retention:**
- **Automatic Archiving**: Policy-driven archiving of old memory items
- **Retention Policies**: Configurable retention periods per data type
- **Compressed Storage**: Gzip compression for archived data
- **Incremental Cleanup**: Background cleanup with retention statistics
**Context Statistics & Monitoring:**
- **Memory Usage Tracking**: Accurate byte-level memory calculations
- **Retention Analytics**: Items eligible for archiving and deletion
- **Performance Metrics**: Context retrieval latency and throughput
- **Health Monitoring**: Context manager health checks and status
**Knowledge Search Fallback:**
- **Primary Vector Search**: Semantic search via vector database when available
- **Fallback Keyword Search**: Text-based search when vector DB unavailable
- **Relevance Scoring**: Sophisticated scoring combining multiple factors
- **Trust Score Calculation**: Trust metrics for shared knowledge items
```rust
pub trait ContextManager {
async fn store_context(&self, agent_id: AgentId, context: AgentContext) -> Result<ContextId>;
async fn retrieve_context(&self, agent_id: AgentId, session_id: Option<SessionId>) -> Result<Option<AgentContext>>;
async fn query_context(&self, agent_id: AgentId, query: ContextQuery) -> Result<Vec<ContextItem>>;
async fn update_memory(&self, agent_id: AgentId, updates: Vec<MemoryUpdate>) -> Result<()>;
async fn search_knowledge(&self, agent_id: AgentId, query: &str, limit: usize) -> Result<Vec<KnowledgeItem>>;
async fn share_knowledge(&self, from_agent: AgentId, to_agent: AgentId, knowledge_id: KnowledgeId, access_level: AccessLevel) -> Result<()>;
async fn archive_context(&self, agent_id: AgentId, before: SystemTime) -> Result<u32>;
async fn get_context_stats(&self, agent_id: AgentId) -> Result<ContextStats>;
async fn shutdown(&self) -> Result<()>;
}
```
### RAG Engine Integration
**RAG Pipeline:**
1. **Query Analysis**: Understanding agent information needs
2. **Vector Search**: Semantic similarity search in knowledge base
3. **Document Retrieval**: Fetching relevant knowledge documents
4. **Context Ranking**: Relevance scoring and filtering
5. **Response Generation**: Context-augmented response synthesis
**Performance Targets:**
- Context retrieval: <50ms average
- Vector search: <100ms for 1M+ embeddings
- RAG pipeline: <500ms end-to-end
### Vector Database
**Supported Operations:**
- **Semantic Search**: Similarity-based document retrieval
- **Metadata Filtering**: Constraint-based search refinement
- **Batch Operations**: Efficient bulk operations
- **Real-time Updates**: Dynamic knowledge base updates
**Vector Database Abstraction:**
Symbi uses a pluggable vector database backend. **LanceDB** is the zero-config default (embedded, no external service required). **Qdrant** is available as an optional backend behind the `vector-qdrant` feature flag.
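| Backend | Feature Flag | Configuration | Use Case |
|---------|--------------|---------------|----------|
| LanceDB (default) | _built-in_ | None (zero-config) | Development, single-node, embedded deployments |
| Qdrant | `vector-qdrant` | `SYMBIONT_VECTOR_HOST` | Distributed production clusters |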
```rust
pub struct VectorConfig {
    pub backend: VectorBackend,          // LanceDB (default) or Qdrant
    pub dimension: usize,                // 1536 for OpenAI embeddings
    pub distance_metric: DistanceMetric, // e.g. DistanceMetric::Cosine
    pub index_type: IndexType,           // e.g. IndexType::HNSW
    pub data_path: PathBuf,              // LanceDB storage path
}
```
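For readers unfamiliar with the cosine distance metric, the sketch below shows the underlying computation on two embedding vectors. It is a plain reference implementation for illustration; the vector backends compute this internally, typically through an index rather than pairwise.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns `None` for mismatched lengths, empty input, or a zero vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

A result near 1 means the vectors point in the same direction, and a result near 0 means they are orthogonal.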
---
## Agentic Reasoning Loop
The reasoning loop implements an **Observe-Reason-Gate-Act (ORGA)** cycle that drives autonomous agent behavior. It unifies LLM inference, policy enforcement, tool execution, and knowledge management into a single, type-safe loop.
For a complete guide, see the [Reasoning Loop Guide](reasoning-loop.md).
### Architecture Overview
```mermaid
graph LR
subgraph "ORGA Cycle"
R[Reasoning\nLLM Inference] --> P[Policy Check\nGate Evaluation]
P --> D[Tool Dispatching\nAction Execution]
D --> O[Observing\nResult Collection]
O --> R
end
subgraph "Knowledge Bridge"
KB[Knowledge\nContext Manager]
KT[recall_knowledge\nstore_knowledge]
end
subgraph "Infrastructure"
CB[Circuit Breakers]
J[Durable Journal]
M[Metrics and Tracing]
end
KB -->|inject context| R
KT -->|tool calls| D
CB --> D
J --> R
J --> P
J --> D
M --> R
```
### Typestate-Enforced Phase Transitions
Phase transitions are enforced at compile time using zero-sized type markers. Invalid transitions (e.g., dispatching tools without reasoning first) are structurally impossible:
```
AgentLoop<Reasoning> → AgentLoop<PolicyCheck> → AgentLoop<ToolDispatching> → AgentLoop<Observing>
↑ │
└──────────────────────── LoopContinuation::Continue ──────────────────────┘
```
Each phase consumes `self` and produces the next phase, making skipping phases a compile error.
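A stripped-down sketch of the typestate pattern is shown below. The marker names mirror the phases above, but the method names, the `iteration` field, and the `Complete` variant of `LoopContinuation` are hypothetical; only `LoopContinuation::Continue` is named in this document.

```rust
use std::marker::PhantomData;

// Zero-sized phase markers.
struct Reasoning;
struct PolicyCheck;
struct ToolDispatching;
struct Observing;

/// The loop is generic over its current phase; the marker costs nothing at runtime.
struct AgentLoop<Phase> {
    iteration: u32,
    _phase: PhantomData<Phase>,
}

/// `Complete` is a hypothetical variant added for this sketch.
enum LoopContinuation {
    Continue(AgentLoop<Reasoning>),
    Complete { iterations: u32 },
}

impl AgentLoop<Reasoning> {
    fn new() -> Self {
        AgentLoop { iteration: 0, _phase: PhantomData }
    }
    /// Consumes the Reasoning phase and yields the PolicyCheck phase.
    fn propose_actions(self) -> AgentLoop<PolicyCheck> {
        AgentLoop { iteration: self.iteration, _phase: PhantomData }
    }
}

impl AgentLoop<PolicyCheck> {
    fn check_policy(self) -> AgentLoop<ToolDispatching> {
        AgentLoop { iteration: self.iteration, _phase: PhantomData }
    }
}

impl AgentLoop<ToolDispatching> {
    fn dispatch_tools(self) -> AgentLoop<Observing> {
        AgentLoop { iteration: self.iteration, _phase: PhantomData }
    }
}

impl AgentLoop<Observing> {
    /// Either loops back to Reasoning or finishes.
    fn observe(self, done: bool) -> LoopContinuation {
        let completed = self.iteration + 1;
        if done {
            LoopContinuation::Complete { iterations: completed }
        } else {
            LoopContinuation::Continue(AgentLoop { iteration: completed, _phase: PhantomData })
        }
    }
}
```

Because `dispatch_tools` exists only on `AgentLoop<ToolDispatching>`, an expression such as `AgentLoop::<Reasoning>::new().dispatch_tools()` does not compile, and because every method takes `self` by value, a phase cannot be reused after it has transitioned.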
### ReasoningLoopRunner
The main entry point wires together all components:
```rust
pub struct ReasoningLoopRunner {
pub provider: Arc<dyn InferenceProvider>, // Cloud or SLM inference
pub policy_gate: Arc<dyn ReasoningPolicyGate>, // Action evaluation
pub executor: Arc<dyn ActionExecutor>, // Tool dispatch
pub context_manager: Arc<dyn ContextManager>, // Token budget
pub circuit_breakers: Arc<CircuitBreakerRegistry>,
pub journal: Arc<dyn JournalWriter>, // Durable event log
pub knowledge_bridge: Option<Arc<KnowledgeBridge>>, // Optional knowledge integration
}
```
### Knowledge-Reasoning Bridge
When a `KnowledgeBridge` is provided, the reasoning loop gains access to the agent's knowledge store:
- **Before each reasoning step**: Relevant knowledge is retrieved and injected as a system message
- **During tool dispatch**: `recall_knowledge` and `store_knowledge` tool calls are intercepted by `KnowledgeAwareExecutor`
- **After loop completion**: Conversation learnings are persisted as episodic memory
The bridge is fully opt-in: without it, the loop runs unchanged, with no knowledge injection, no interception of knowledge tool calls, and no post-loop persistence.
### Loop Phases
| Phase | Module | Description |
|-------|--------|-------------|
| **Reasoning** | `phases.rs` | LLM inference produces proposed actions (tool calls or text response) |
| **Policy Check** | `policy_bridge.rs` | Each action evaluated: Allow, Deny, or Modify |
| **Tool Dispatching** | `executor.rs` | Approved actions executed in parallel with circuit breakers |
| **Observing** | `phases.rs` | Results collected, loop continues or terminates |
### Supporting Infrastructure
| Component | Module | Purpose |
|-----------|--------|---------|
| Circuit Breakers | `circuit_breaker.rs` | Failure thresholds, recovery timeouts, half-open probing |
| Durable Journal | `loop_types.rs` | Sequenced event log for replay and debugging |
| Human Critic | `human_critic.rs` | Human-in-the-loop approval for sensitive actions |
| Cedar Gate | `cedar_gate.rs` | Cedar policy engine integration for fine-grained authorization |
| Saga Pattern | `saga.rs` | Multi-step distributed operations with checkpoint/rollback |
| Agent Registry | `agent_registry.rs` | Persistent agent metadata with lifecycle management |
| Tracing | `tracing_spans.rs` | OpenTelemetry distributed tracing for each loop phase |
| Metrics | `metrics.rs` | Iteration counts, token usage, latency histograms |
| Tool Profile | `tool_profile.rs` | Glob-based tool filtering before LLM sees them (`orga-adaptive`) |
| Progress Tracker | `progress_tracker.rs` | Per-step reattempt limits with stuck-loop detection (`orga-adaptive`) |
| Pre-Hydration | `pre_hydrate.rs` | Deterministic context pre-fetch from task references (`orga-adaptive`) |
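
To illustrate the circuit breaker behavior listed above (failure thresholds, recovery timeouts, half-open probing), here is a minimal state machine sketch. It is not the contents of `circuit_breaker.rs`; the type and method names are hypothetical, time is passed in as seconds for determinism, and the half-open state is simplified to allow calls until the first recorded result.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BreakerState {
    Closed,   // calls flow normally
    Open,     // calls are rejected until the recovery timeout elapses
    HalfOpen, // a probe call is allowed to test recovery
}

struct CircuitBreaker {
    failure_threshold: u32,
    recovery_timeout_secs: u64,
    consecutive_failures: u32,
    opened_at_secs: Option<u64>,
    state: BreakerState,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, recovery_timeout_secs: u64) -> Self {
        CircuitBreaker {
            failure_threshold,
            recovery_timeout_secs,
            consecutive_failures: 0,
            opened_at_secs: None,
            state: BreakerState::Closed,
        }
    }

    /// Returns true if a call may proceed at `now_secs`.
    fn allow_call(&mut self, now_secs: u64) -> bool {
        if self.state == BreakerState::Open {
            let opened_at = self.opened_at_secs.unwrap_or(now_secs);
            if now_secs.saturating_sub(opened_at) >= self.recovery_timeout_secs {
                self.state = BreakerState::HalfOpen;
            }
        }
        self.state != BreakerState::Open
    }

    /// A success closes the breaker and resets the failure count.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at_secs = None;
        self.state = BreakerState::Closed;
    }

    /// A failed probe, or too many consecutive failures, opens the breaker.
    fn record_failure(&mut self, now_secs: u64) {
        self.consecutive_failures += 1;
        if self.state == BreakerState::HalfOpen
            || self.consecutive_failures >= self.failure_threshold
        {
            self.state = BreakerState::Open;
            self.opened_at_secs = Some(now_secs);
        }
    }
}
```

A failed probe in the half-open state reopens the breaker immediately and restarts the recovery timer, which keeps a still-broken tool from being hammered.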
---
## MCP Integration
### Model Context Protocol Client
Enables agents to access external tools and resources securely.
**Core Capabilities:**
- **Server Discovery**: Automatic discovery of available MCP servers
- **Tool Management**: Dynamic tool discovery and invocation
- **Resource Access**: Secure access to external data sources
- **Protocol Handling**: Full MCP specification compliance
### Tool Discovery Process
```mermaid
sequenceDiagram
participant Agent
participant MCP as MCP Client
participant Server as MCP Server
participant Verifier as Tool Verifier
Agent->>MCP: Request Tools
MCP->>Server: Connect and List Tools
Server-->>MCP: Tool Definitions
MCP->>Verifier: Verify Tool Schemas
Verifier-->>MCP: Verification Results
MCP-->>Agent: Verified Tools
Agent->>MCP: Invoke Tool
MCP->>Server: Tool Invocation
Server-->>MCP: Tool Response
MCP-->>Agent: Verified Response
```
### Tool Verification with SchemaPin
**Verification Process:**
1. **Schema Discovery**: Retrieve tool schema from MCP server
2. **Signature Verification**: Verify cryptographic signature
3. **Trust-On-First-Use**: Pin trusted keys for future verification
4. **Policy Enforcement**: Apply tool usage policies
5. **Audit Logging**: Log all tool interactions
```rust
pub struct ToolVerifier {
key_store: SchemaPinKeyStore,
policy_engine: Arc<PolicyEngine>,
audit_logger: AuditLogger,
}
impl ToolVerifier {
pub async fn verify_tool(&self, tool: &MCPTool) -> VerificationResult;
pub async fn enforce_policies(&self, agent_id: AgentId, tool: &MCPTool) -> PolicyResult;
}
```
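The Trust-On-First-Use step can be sketched independently of any signature scheme. The example below is a conceptual illustration only: `PinStore` and `PinOutcome` are hypothetical names, not the SchemaPin library's API, and key fingerprints are treated as opaque strings.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
enum PinOutcome {
    PinnedFirstUse, // no prior pin: the presented key is now trusted
    Matched,        // presented key equals the pinned key
    Mismatch,       // key changed since pinning: treat as untrusted
}

/// Hypothetical pin store mapping a tool provider to its pinned key fingerprint.
#[derive(Default)]
struct PinStore {
    pinned: HashMap<String, String>,
}

impl PinStore {
    /// Compare `fingerprint` with the pin for `provider`, pinning on first use.
    /// A mismatch never overwrites the existing pin.
    fn check(&mut self, provider: &str, fingerprint: &str) -> PinOutcome {
        match self.pinned.get(provider) {
            None => {
                self.pinned.insert(provider.to_string(), fingerprint.to_string());
                PinOutcome::PinnedFirstUse
            }
            Some(known) if known.as_str() == fingerprint => PinOutcome::Matched,
            Some(_) => PinOutcome::Mismatch,
        }
    }
}
```

The important property is that a changed key is reported rather than silently accepted, so a later step (policy or a human) decides whether to re-pin.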
---
## Policy Enforcement
### Policy Engine Architecture
**Policy Types:**
- **Access Control**: Who can access what resources
- **Data Flow**: How data moves through the system
- **Resource Usage**: Limits on computational resources
- **Audit Requirements**: What must be logged and how
**Policy Evaluation:**
```rust
pub enum PolicyDecision {
Allow,
Deny { reason: String },
AllowWithConditions { conditions: Vec<PolicyCondition> },
}
pub trait PolicyEngine {
async fn evaluate_policy(&self, context: PolicyContext, action: Action) -> PolicyDecision;
async fn register_policy(&self, policy: Policy) -> Result<PolicyId>;
}
```
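As a rough illustration of how a decision might be derived from a rule set, the sketch below evaluates simple allow/deny rules. The `Rule` and `Decision` types are hypothetical and much simpler than `PolicyDecision`; the deny-by-default and deny-overrides-allow semantics are assumptions borrowed from Cedar-style evaluation, not a statement about the runtime's engine.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny { reason: String },
}

/// Hypothetical rule: `agent` may be "*" to match any agent.
struct Rule {
    agent: &'static str,
    action: &'static str,
    allow: bool,
}

/// Deny wins over allow, and the absence of a matching rule is a deny.
fn evaluate(rules: &[Rule], agent: &str, action: &str) -> Decision {
    let matching: Vec<&Rule> = rules
        .iter()
        .filter(|rule| (rule.agent == "*" || rule.agent == agent) && rule.action == action)
        .collect();

    if matching.iter().any(|rule| !rule.allow) {
        return Decision::Deny { reason: format!("explicit deny for '{action}'") };
    }
    if matching.is_empty() {
        return Decision::Deny { reason: format!("no rule permits '{action}'") };
    }
    Decision::Allow
}
```

Carrying a reason string on every deny gives the audit trail something concrete to record.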
### Real-time Enforcement
**Enforcement Points:**
- Agent creation and configuration
- Message sending and routing
- Resource allocation requests
- External tool invocation
- Data access operations
**Performance:**
- Policy evaluation: <1ms per decision
- Batch evaluation: 10,000+ decisions per second
- Real-time updates: Policy changes propagated instantly
---
## Audit and Compliance
### Cryptographic Audit Trail
**Audit Event Structure:**
```rust
pub struct AuditEvent {
pub event_id: Uuid,
pub timestamp: SystemTime,
pub agent_id: AgentId,
pub event_type: AuditEventType,
pub details: AuditDetails,
pub signature: Ed25519Signature,
pub chain_hash: Hash,
}
```
**Integrity Guarantees:**
- **Digital Signatures**: Ed25519 signatures on all events
- **Hash Chaining**: Events linked in immutable chain
- **Timestamp Verification**: Cryptographic timestamps
- **Batch Verification**: Efficient bulk verification
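
Hash chaining can be sketched in a few lines: each event's `chain_hash` covers the previous event's hash plus its own content, so editing any earlier event invalidates every hash after it. To stay dependency-free, this sketch uses the standard library's `DefaultHasher`, which is **not** cryptographically secure; a real audit trail would use a cryptographic hash and the Ed25519 signatures described above. The `ChainedEvent` type is a hypothetical reduction of `AuditEvent`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, reduced audit record.
struct ChainedEvent {
    details: String,
    chain_hash: u64,
}

/// Hash of (previous hash, event details).
/// DefaultHasher is a non-cryptographic stand-in used only for illustration.
fn link(previous_hash: u64, details: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    previous_hash.hash(&mut hasher);
    details.hash(&mut hasher);
    hasher.finish()
}

/// Append an event, chaining it to the last one (or to 0 for the first event).
fn append(chain: &mut Vec<ChainedEvent>, details: &str) {
    let previous = chain.last().map_or(0, |event| event.chain_hash);
    chain.push(ChainedEvent {
        details: details.to_string(),
        chain_hash: link(previous, details),
    });
}

/// Recompute every link; any edited event breaks the chain from that point on.
fn verify(chain: &[ChainedEvent]) -> bool {
    let mut previous = 0u64;
    for event in chain {
        if link(previous, &event.details) != event.chain_hash {
            return false;
        }
        previous = event.chain_hash;
    }
    true
}
```

Verification is a single linear pass, which is what makes bulk verification of long event ranges practical.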
### Compliance Features
**Regulatory Support:**
- **HIPAA**: Healthcare data protection compliance
- **GDPR**: European data protection requirements
- **SOX**: Financial audit trail requirements
- **Custom**: Configurable compliance frameworks
**Audit Capabilities:**
- Real-time event streaming
- Historical event querying
- Compliance report generation
- Integrity verification
---
## Performance Characteristics
### Scalability Metrics
**Agent Management:**
- **Concurrent Agents**: 10,000+ logical agents (in-process); sandboxed agents scale with host resources
- **Agent Startup**: <1s for standard agents
- **Memory Usage**: 1-5MB per agent (varies by configuration)
- **CPU Overhead**: <5% system overhead for runtime
**Communication Performance:**
- **Message Throughput**: 100,000+ messages/second
- **Message Latency**: <10ms for local routing
- **Encryption Overhead**: <1ms per message
- **Memory Pooling**: Zero-allocation message passing
**Context & Knowledge:**
- **Context Retrieval**: <50ms average
- **Vector Search**: <100ms for 1M+ embeddings
- **Knowledge Updates**: Real-time with <10ms latency
- **Storage Efficiency**: Compressed embeddings with 80% size reduction
### Resource Management
**Memory Management:**
- **Allocation Strategy**: Pool-based allocation for performance
- **Garbage Collection**: Incremental cleanup with bounded pause times
- **Memory Protection**: Guard pages and overflow detection
- **Leak Prevention**: Automatic cleanup and monitoring
**CPU Utilization:**
- **Scheduler Overhead**: <2% CPU for 10,000 agents
- **Context Switching**: Hardware-assisted virtual threads
- **Load Balancing**: Dynamic load distribution
- **Priority Scheduling**: Real-time and batch processing tiers
---
## Configuration
### Runtime Configuration
```toml
[runtime]
max_concurrent_agents = 10000
scheduler_threads = 8
message_buffer_size = 1048576
gc_interval_ms = 100
[security]
default_sandbox_tier = "gvisor"
enforce_policies = true
audit_enabled = true
crypto_provider = "ring"
[context]
vector_backend = "lancedb" # "lancedb" (default) or "qdrant"
vector_data_path = "./data/vectors" # LanceDB storage path
embedding_dimension = 1536
context_cache_size = "1GB"
knowledge_retention_days = 365
# Optional: only needed when vector_backend = "qdrant"
# [context.qdrant]
# host = "localhost"
# port = 6334
[mcp]
discovery_enabled = true
tool_verification = "strict"
connection_timeout_s = 30
max_concurrent_connections = 100
```
### Environment Variables
```bash
# Core runtime
export SYMBI_LOG_LEVEL=info
export SYMBI_RUNTIME_MODE=production
export SYMBI_CONFIG_PATH=/etc/symbi/config.toml
# Security
export SYMBI_CRYPTO_PROVIDER=ring
export SYMBI_AUDIT_STORAGE=/var/log/symbi/audit
# Vector database (LanceDB is the zero-config default)
export SYMBIONT_VECTOR_BACKEND=lancedb # or "qdrant"
export SYMBIONT_VECTOR_DATA_PATH=./data/vectors # LanceDB storage path
# Optional: only needed when using Qdrant backend
# export SYMBIONT_VECTOR_HOST=localhost
# export SYMBIONT_VECTOR_PORT=6334
# External dependencies
export OPENAI_API_KEY=your_api_key_here
export MCP_SERVER_DISCOVERY=enabled
```
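A consumer of these variables needs a defaulting rule. The sketch below shows one way to resolve `SYMBIONT_VECTOR_BACKEND` with LanceDB as the fallback; the function names are hypothetical, and treating an empty value as unset and matching case-insensitively are assumptions rather than documented runtime behavior.

```rust
#[derive(Debug, PartialEq, Eq)]
enum VectorBackend {
    LanceDb,
    Qdrant,
}

/// Parse a backend name; `None` or an empty value selects the LanceDB default.
fn parse_backend(value: Option<&str>) -> Result<VectorBackend, String> {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") | Some("lancedb") => Ok(VectorBackend::LanceDb),
        Some("qdrant") => Ok(VectorBackend::Qdrant),
        Some(other) => Err(format!("unknown vector backend: {other}")),
    }
}

/// Read the backend from the process environment.
fn backend_from_env() -> Result<VectorBackend, String> {
    parse_backend(std::env::var("SYMBIONT_VECTOR_BACKEND").ok().as_deref())
}
```

Separating parsing from the environment lookup keeps the defaulting rule testable without mutating process state.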
---
## Monitoring and Observability
### Metrics Collection
**System Metrics:**
- Agent count and resource usage
- Message throughput and latency
- Policy evaluation performance
- Security event rates
**Business Metrics:**
- Task completion rates
- Error frequencies by type
- Resource utilization efficiency
- Compliance audit results
**Integration:**
- **Prometheus**: Metrics collection and alerting
- **Grafana**: Visualization and dashboards
- **Jaeger**: Distributed tracing
- **ELK Stack**: Log aggregation and analysis
### Health Monitoring
```rust
pub struct HealthStatus {
pub overall_status: SystemStatus,
pub component_health: HashMap<String, ComponentHealth>,
pub resource_utilization: ResourceUtilization,
pub recent_errors: Vec<ErrorSummary>,
}
pub async fn health_check() -> HealthStatus {
// Comprehensive system health assessment
}
```
---
## Deployment
### Container Deployment
```dockerfile
FROM rust:1.88-slim AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --features production
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/symbi /usr/local/bin/
EXPOSE 8080
CMD ["symbi", "mcp", "--config", "/etc/symbi/config.toml"]
```
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: symbi-runtime
spec:
  replicas: 3
  selector:
    matchLabels:
      app: symbi-runtime
  template:
    metadata:
      labels:
        app: symbi-runtime
    spec:
      containers:
        - name: runtime
          image: ghcr.io/thirdkeyai/symbi:latest
          ports:
            - containerPort: 8080
          env:
            - name: SYMBI_RUNTIME_MODE
              value: "production"
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2"
```
---
## Development and Testing
### Local Development
```bash
# Start dependencies (LanceDB is embedded — no external service needed)
docker-compose up -d redis postgres
# Run in development mode
RUST_LOG=debug cargo run --example full_system
# Run tests
cargo test --all --features test-utils
```
### Integration Testing
The runtime includes comprehensive test suites:
- **Unit Tests**: Component-level testing
- **Integration Tests**: Cross-component testing
- **Performance Tests**: Load and stress testing
- **Security Tests**: Penetration and compliance testing
```bash
# Run all test suites
cargo test --workspace
# Run performance benchmarks
cargo bench
# Run security tests
cargo test --features security-tests
```
---
## Next Steps
- **[Reasoning Loop Guide](/reasoning-loop)** - Complete guide to the agentic reasoning loop
- **[Security Model](/security-model)** - Deep dive into security implementation
- **[Contributing](/contributing)** - Development and contribution guidelines
- **[API Reference](/api-reference)** - Complete API documentation
- **[Examples](https://github.com/thirdkeyai/symbiont/tree/main/runtime/examples)** - Runtime examples and tutorials
The runtime architecture provides a robust foundation for building secure, scalable AI agents. Its modular design and comprehensive security model make it suitable for both development and production environments.