
# Oxios Security, DevOps & Production Readiness Analysis

**Date:** 2026-05-06  
**Version:** 0.2.0-alpha  
**Total Source Lines:** ~21,141 (Rust, excluding target/)  
**Crates Analyzed:** oxios-kernel, oxios-ouroboros, oxios-gateway, oxios-web (channel), oxios-frontend  

---

## Executive Summary

Oxios is at **alpha stage**: well-architected, with solid security primitives (RBAC, access manager, audit logging), roughly 162 test functions concentrated in `oxios-kernel`, and consistent `tracing`-based structured logging. However, it has critical production blockers: **no authentication on the HTTP API**, **permissive CORS**, **no CI/CD**, **no containerization/deployment config**, and **`anyhow` everywhere** with no typed error handling. The codebase is clean (zero `unsafe`, zero `TODO/FIXME/HACK`) but needs hardening before any production use.

---

## 1. `unwrap()`, `panic!()`, `expect()` Audit

### Production Code (non-test)

| Category | Count | Severity |
|----------|-------|----------|
| `unwrap()` in production code | **0** | ✅ Clean |
| `panic!()` in production code | **0** | ✅ Clean |
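| `expect()` in production code | **3** on runtime paths (plus the startup/setup call sites listed under "Other Production `expect()` Call Sites") | ⚠️ Low |

The 3 `expect()` calls on runtime paths are in `orchestrator.rs`: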

```
crates/oxios-kernel/src/orchestrator.rs:119   sessions.get(&session_id).expect("session exists");
crates/oxios-kernel/src/orchestrator.rs:304   current_seed.as_ref().expect("seed exists");
crates/oxios-kernel/src/orchestrator.rs:390   current_seed.expect("at least one seed exists");
```

**Risk:** These represent invariant assumptions. If violated (race condition, logic bug), the orchestrator panics and takes down the process.

**Recommendation:** Replace with proper error propagation using `anyhow::Context` or `bail!()`.
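
As a minimal sketch of that recommendation, the lookup below returns an error instead of panicking when the "session exists" invariant does not hold. It uses only the standard library so it stands alone; `OrchestratorError`, `get_session`, and the `HashMap<String, String>` session store are hypothetical stand-ins, not the orchestrator's real types. With `anyhow`, calling `.context("session exists")?` on the `Option` would play the same role as `ok_or_else` here.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type; the real orchestrator may use a different one.
#[derive(Debug, PartialEq)]
pub enum OrchestratorError {
    SessionNotFound(String),
}

impl fmt::Display for OrchestratorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SessionNotFound(id) => write!(f, "session {id} not found"),
        }
    }
}

impl std::error::Error for OrchestratorError {}

/// Looks up a session and reports a missing entry as an error,
/// so a violated invariant no longer takes down the process.
pub fn get_session<'a>(
    sessions: &'a HashMap<String, String>,
    session_id: &str,
) -> Result<&'a String, OrchestratorError> {
    sessions
        .get(session_id)
        .ok_or_else(|| OrchestratorError::SessionNotFound(session_id.to_string()))
}
```

The caller can then propagate the failure with `?` and log it, rather than losing every in-flight session to a panic.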

### Test Code

| Category | Count | Context |
|----------|-------|---------|
| `unwrap()` in tests | ~80+ | Expected in test code |
| `expect()` in tests | ~25+ | Expected — with descriptive messages |
| `panic!()` in tests | 1 | `assert_eq!` pattern match |

Test usage is appropriate: `unwrap()` and `expect()` in test code are idiomatic Rust.

### Other Production `expect()` Call Sites

- `server.rs`: `.expect("Invalid bind address")` — acceptable at startup
- `mcp.rs`: `.expect("stdin/stdout not captured")` — reasonable for MCP protocol
- `main.rs`: `.expect("Failed to install Ctrl+C handler")` — acceptable at startup

---

## 2. TODO / FIXME / HACK Comments

**Result: ZERO found.** ✅

No TODO, FIXME, or HACK comments in the codebase. This is exceptionally clean.

---

## 3. `unsafe` Blocks

**Result: ZERO found.** ✅

No `unsafe` blocks anywhere in the codebase. This is excellent from a memory safety perspective.

---

## 4. Dependency Version Analysis

### Workspace Dependencies (`Cargo.toml`)

| Dependency | Version | Pinned? | Notes |
|------------|---------|---------|-------|
| tokio | `"1"` | Major-wide | ✅ Standard |
| futures | `"0.3"` | Minor-wide | ✅ Standard |
| serde | `"1"` | Major-wide | ✅ Standard |
| serde_json | `"1"` | Major-wide | ✅ Standard |
| toml | `"0.8"` | Minor-wide | ✅ Standard |
| uuid | `"1"` | Major-wide | ✅ Standard |
| tracing | `"0.1"` | Minor-wide | ✅ Standard |
| tracing-subscriber | `"0.3"` | Minor-wide | ✅ Standard |
| anyhow | `"1"` | Major-wide | ✅ Standard |
| thiserror | `"1"` | Major-wide | ✅ Standard |
| chrono | `"0.4"` | Minor-wide | ✅ Standard |
| parking_lot | `"0.12"` | Minor-wide | ✅ Standard |
| axum | `"0.8"` | Minor-wide | ✅ Standard |
| tower-http | `"0.6"` | Minor-wide | ✅ Standard |
| clap | `"4"` | Major-wide | ✅ Standard |
| reqwest | `"0.12"` | Minor-wide | ✅ Standard |
| dioxus | `"0.7"` | Minor-wide | ✅ Frontend |
| gloo-net | `"0.6"` | Minor-wide | ✅ Frontend |

### Path Dependencies

| Dependency | Source | Risk |
|------------|--------|------|
| oxi-ai | `path = "../oxi/oxi-ai"` | ⚠️ Local path — not published to crates.io |
| oxi-agent | `path = "../oxi/oxi-agent"` | ⚠️ Local path — not published to crates.io |

**Risk:** Path dependencies mean builds are not reproducible without the sibling `oxi/` directory. No version pinning for these critical dependencies.

### Security Concerns

- **No `cargo-audit` integration** — no CI to check for known CVEs in dependencies
- **`reqwest` with `json` feature** — network client present, ensure it's not used for untrusted endpoints without validation
- **`Cargo.lock` committed** (3303 lines, 328 packages) — ✅ Good for reproducible builds

**Recommendation:** 
1. Add `cargo-audit` to CI pipeline
2. Consider publishing oxi-ai/oxi-agent or using git dependencies with tag-based versioning
3. Run `cargo outdated` periodically

---

## 5. CI/CD Configuration

**Result: ❌ NONE**

No CI/CD configuration found:
- No `.github/workflows/`
- No `.gitlab-ci.yml`
- No `Jenkinsfile`
- No `.ci/` directory
- No Makefile or Justfile with CI targets

**Recommendation — Minimum CI Pipeline:**

```yaml
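# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
defaults:
  run:
    working-directory: oxios  # cargo must run inside the oxios checkout
jobs:
  check:
    runs-on: macos-latest  # Apple Container requires macOS
    steps:
      - uses: actions/checkout@v4
        with: { path: 'oxios' }
      # Sibling checkout so the `../oxi/...` path dependencies resolve
      - uses: actions/checkout@v4
        with: { repository: 'owner/oxi', path: 'oxi' }
      - run: cargo fmt --check
      - run: cargo clippy --workspace -- -D warnings
      - run: cargo test --workspace
      # cargo-audit may not be preinstalled on the runner
      - run: cargo install cargo-audit --locked
      - run: cargo audit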
```

---

## 6. Dockerfile / Deployment Configuration

**Result: ❌ NONE**

- No `Dockerfile`
- No `docker-compose.yml`
- No Kubernetes manifests
- No deployment scripts

**Note:** The project uses Apple Container (Apple silicon Macs only), which doesn't use Docker. However, there's still no deployment automation.

The `.programs/deploy/` directory contains a program definition with a SKILL.md describing deployment procedures, but this is an agent skill, not infrastructure configuration.

**Recommendation:**
1. Document deployment steps for macOS hosts
2. Consider a `Makefile` or `justfile` for common operations
3. Add launchd plist template for macOS daemonization
4. Document reverse proxy (nginx/caddy) configuration

---

## 7. `.gitignore` Analysis

```gitignore
/target
channels/oxios-web/frontend/target/
channels/oxios-web/static/dioxus/wasm/*.wasm
channels/oxios-web/static/dioxus/wasm/snippets/
channels/oxios-web/static/dioxus/assets/
*.swp
*.swo
.DS_Store
.env
.secrets.toml
```

**Assessment:** ✅ Good

- Properly excludes build artifacts
- Excludes `.env` and `.secrets.toml` — good practice
- Excludes editor swap files and macOS `.DS_Store`

**Missing:**
- No exclusion for `*.pem`, `*.key`, `*.p12` certificate files
- No exclusion for IDE directories (`.idea/`, `.vscode/`)
- Consider adding `*.log` for runtime logs

---

## 8. Logging & Observability

### Tracing Setup ✅

The project uses `tracing` + `tracing-subscriber` throughout, the de facto standard for structured logging in Rust.

**Main binary initialization:**
```rust
tracing_subscriber::fmt()
    .with_env_filter(
        tracing_subscriber::EnvFilter::try_from_default_env()
            .unwrap_or_else(|_| { /* info or debug based on -v flag */ })
    )
    .with_target(true)
    .compact()
    .init();
```

### Coverage by Module

| Module | tracing calls | Level |
|--------|--------------|-------|
| access_manager | 12+ | info, warn, debug |
| host_exec | 10+ | info, warn, error |
| supervisor | 6+ | info, error |
| scheduler | 12+ | info, warn, debug |
| agent_runtime | 6+ | info, warn, error |
| orchestrator | 5+ | info |
| context_manager | 8+ | info, debug |
| container_manager | 3+ | info |
| ouroboros_engine | 8+ | info, warn |
| gateway | 8+ | info, warn, error |

**Assessment:** ✅ Excellent coverage. Structured logging with fields (agent_id, seed_id, task_id, error).

### Missing Observability

- ❌ No metrics (no `prometheus`, `metrics`, or `opentelemetry` integration)
- ❌ No distributed tracing spans for request tracing
- ❌ No health check endpoint (or not verified)
- ❌ No structured error reporting (e.g., Sentry integration)

**Recommendation:**
1. Add `tracing-opentelemetry` for distributed tracing
2. Add a `/health` endpoint
3. Add `metrics` crate for Prometheus-compatible metrics
4. Consider `tracing-error` for enriched error spans
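
To make the `/health` recommendation concrete, here is a minimal sketch of the logic such an endpoint could sit on: aggregate a few component probes into a status code and a JSON body. It is framework-free on purpose; `Check`, `health_response`, and the component names are hypothetical, and the JSON is built by hand on the assumption that component names are static identifiers that need no escaping.

```rust
/// Result of one dependency probe; the component names are illustrative.
pub struct Check {
    pub name: &'static str,
    pub healthy: bool,
}

/// Builds the status code and JSON body a `/health` handler could return:
/// 200 when every check passes, 503 otherwise.
pub fn health_response(checks: &[Check]) -> (u16, String) {
    let all_ok = checks.iter().all(|c| c.healthy);
    let components: Vec<String> = checks
        .iter()
        .map(|c| {
            let state = if c.healthy { "ok" } else { "failing" };
            format!("\"{}\":\"{}\"", c.name, state)
        })
        .collect();
    let body = format!(
        "{{\"status\":\"{}\",\"components\":{{{}}}}}",
        if all_ok { "ok" } else { "degraded" },
        components.join(",")
    );
    (if all_ok { 200 } else { 503 }, body)
}
```

Returning 503 when any check fails lets a reverse proxy or process supervisor treat the instance as unhealthy without parsing the body.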

---

## 9. Error Type Analysis

### Current State

| Crate | Error Strategy | Assessment |
|-------|---------------|------------|
| oxios-kernel | `anyhow` everywhere (22 modules) | ⚠️ |
| oxios-ouroboros | `anyhow` everywhere | ⚠️ |
| oxios-gateway | `anyhow` everywhere | ⚠️ |
| oxios-web | `anyhow` everywhere | ⚠️ |
| mcp.rs | `McpError` struct (manual) | ⚠️ Partial |

**Key Findings:**

- **`thiserror` is in `Cargo.toml` but NEVER used** in any crate. Every module uses `anyhow::Result`.
- Only `McpError` exists as a typed error, but it's a plain struct (not `thiserror` derive).
- `anyhow` is used consistently — which is fine for applications but problematic for a library crate like `oxios-kernel` that should expose typed errors for downstream consumers.

**Risk:** API consumers cannot match on specific error variants. No structured error codes for the HTTP API.

**Recommendation:**
1. Use `thiserror` (already in deps!) for `oxios-kernel` public errors:
   ```rust
   #[derive(Debug, thiserror::Error)]
   pub enum KernelError {
       #[error("Agent {id} not found")]
       AgentNotFound { id: AgentId },
       #[error("Permission denied: {reason}")]
       PermissionDenied { reason: String },
       #[error("Container {name} unavailable")]
       ContainerUnavailable { name: String },
   }
   ```
2. Keep `anyhow` for the binary crate only
3. Map typed errors to HTTP status codes in the web layer
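
Item 3 can be sketched as a single exhaustive `match`. The example below is a standalone approximation: it re-declares the `KernelError` variants from item 1 with a hand-written `Display` impl instead of the `thiserror` derive, uses `String` where the real code would use `AgentId`, and the status codes chosen are an assumption about what the web layer would want.

```rust
use std::fmt;

/// Stand-in for the `thiserror`-derived enum; `id` is a String here
/// because the real `AgentId` type is not shown in this report.
#[derive(Debug)]
pub enum KernelError {
    AgentNotFound { id: String },
    PermissionDenied { reason: String },
    ContainerUnavailable { name: String },
}

impl fmt::Display for KernelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::AgentNotFound { id } => write!(f, "Agent {id} not found"),
            Self::PermissionDenied { reason } => write!(f, "Permission denied: {reason}"),
            Self::ContainerUnavailable { name } => write!(f, "Container {name} unavailable"),
        }
    }
}

impl std::error::Error for KernelError {}

/// Maps each error variant to an HTTP status code. The match is
/// exhaustive, so adding a variant forces a decision here.
pub fn http_status(err: &KernelError) -> u16 {
    match err {
        KernelError::AgentNotFound { .. } => 404,
        KernelError::PermissionDenied { .. } => 403,
        KernelError::ContainerUnavailable { .. } => 503,
    }
}
```

Because there is no wildcard arm, the compiler flags any new variant that lacks a status mapping, which is the main benefit over string-matching on `anyhow` messages.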

---

## 10. Hardcoded Credentials, Paths & Configuration

### Hardcoded Paths

| Location | Value | Risk |
|----------|-------|------|
| `config.rs:88` | `"127.0.0.1"` (default gateway host) | ✅ Safe — localhost only |
| `main.rs` | `"~/.oxios/config.toml"` (default config path) | ✅ Standard convention |
| `main.rs` | `"anthropic/claude-sonnet-4-20250514"` (default model) | ⚠️ Hardcoded model |
| `main.rs` | `4200` (default port) | ✅ Safe |

### Environment Variables Checked

- `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `API_KEY` — ✅ Properly loaded from env
- `OXIOS_MCP_*` — ✅ MCP server configuration from env
- `RUST_LOG` — ✅ Via tracing-subscriber env filter
- `HOME` — ✅ For path expansion

### Hardcoded Credentials: NONE found ✅

No passwords, API keys, tokens, or secrets found in source code.

### Other Hardcoded Values

| Location | Value | Notes |
|----------|-------|-------|
| `main.rs` | Default model: `anthropic/claude-sonnet-4-20250514` | Should be configurable via config.toml |
| `config.rs` | Default port: `4200` | Reasonable |
| `config.rs` | Default container image | Not verified |

**Recommendation:** Make the default model configurable in `config.toml` rather than hardcoded in main.rs.

---

## 11. Integration & Unit Test Coverage

### Test Inventory

| Location | Type | Test Functions | Lines |
|----------|------|---------------|-------|
| `oxios-kernel/tests/integration_tests.rs` | Integration | 44 | 1,090 |
| `oxios-kernel/src/access_manager.rs` | Unit | 43 | ~600 |
| `oxios-kernel/src/tools/container_exec.rs` | Unit | 4 | ~100 |
| `oxios-kernel/src/tools/host_exec_tool.rs` | Unit | 10 | ~150 |
| `oxios-kernel/src/tools/mcp_tool.rs` | Unit | 2 | ~40 |
| `oxios-kernel/src/tools/program_tool.rs` | Unit | 3 | ~80 |
| `oxios-kernel/src/host_exec.rs` | Unit | 10 | ~150 |
| `oxios-kernel/src/container_manager.rs` | Unit | 4 | ~100 |
| `oxios-kernel/src/context_manager.rs` | Unit | 1+ | ~20 |
| `oxios-kernel/src/mcp.rs` | Unit | 20 | ~400 |
| `oxios-kernel/src/program.rs` | Unit | 21 | ~400 |
| **Total** | | **~162** | |

### Coverage by Module

| Module | Has Tests | Coverage |
|--------|-----------|----------|
| access_manager | ✅ 43 unit + integration | Excellent |
| host_exec | ✅ 10 unit | Good |
| scheduler | ✅ Integration | Good |
| state_store | ✅ Integration | Good |
| event_bus | ✅ Integration | Good |
| supervisor | ✅ Integration | Good |
| orchestrator | ✅ Integration | Good |
| program | ✅ 21 unit | Excellent |
| mcp | ✅ 20 unit | Excellent |
| container_manager | ✅ 4 unit | Good |
| context_manager | ✅ 1 unit | Minimal |
| skill | ❌ No tests | Gap |
| persona/persona_manager | ❌ Minimal | Gap |
| a2a | ❌ No tests | Gap |
| config | ❌ No tests | Gap |
| **oxios-ouroboros** | ❌ No tests | **Major Gap** |
| **oxios-gateway** | ❌ No tests | **Major Gap** |
| **oxios-web** | ❌ No tests | **Major Gap** |

### Missing Test Coverage (Critical)

1. **oxios-ouroboros** — The spec-first protocol engine has **zero tests**. Interview → Seed → Execute → Evaluate → Evolve lifecycle is untested.
2. **oxios-gateway** — Message routing has **zero tests**.
3. **oxios-web** — HTTP routes (1798 lines!) have **zero tests**. No API endpoint tests.
4. **a2a.rs** — Inter-agent communication untested.
5. **config.rs** — Configuration loading/parsing untested.

**Recommendation:** Priority order for test additions:
1. Ouroboros protocol tests (interview, seed, evaluate, evolve)
2. HTTP API route tests (driving the `Router` with `tower::ServiceExt::oneshot`, or using the `axum-test` crate)
3. Gateway message routing tests
4. Config parsing tests

---

## 12. `Cargo.lock` Analysis

- **Packages:** 328 (workspace) + frontend lock
- **Lock file committed:** ✅ Yes
- **Format version:** 4 (Cargo 1.78+)

### Key Dependency Versions (from lock file)

| Package | Version | Status |
|---------|---------|--------|
| tokio | 1.x | ✅ Current |
| axum | 0.8.x | ✅ Current |
| serde | 1.x | ✅ Current |
| tracing | 0.1.x | ✅ Current |
| oxi-ai | 0.5.0 | ⚠️ Local path |
| oxi-agent | 0.5.0 | ⚠️ Local path |
| reqwest | 0.12.x | ✅ Current |

---

## Security Deep Dive

### ✅ Strengths

1. **Access Manager (RBAC)** — Comprehensive 3-tier RBAC (User/Superuser/Admin) with:
   - Tool-level access control
   - Path-based sandbox restrictions (glob patterns)
   - Agent identity tracking
   - Audit logging of all authorization decisions
   - Container workspace isolation

2. **Host Exec Bridge** — Sandboxed command execution with:
   - Allowlist-based command filtering
   - Path traversal protection (`../` detection)
   - Argument validation

3. **Container Isolation** — Apple Container-based per-project isolation

4. **No `unsafe`** — Pure safe Rust throughout

5. **No hardcoded credentials** — All secrets from environment variables

### ❌ Critical Security Issues

| # | Issue | Severity | Details |
|---|-------|----------|---------|
| 1 | **No API Authentication** | 🔴 Critical | HTTP API has zero auth — anyone who can reach port 4200 has full admin access |
| 2 | **Permissive CORS** | 🔴 Critical | `CorsLayer::permissive()` allows any origin — CSRF and data exfiltration risk |
| 3 | **No Rate Limiting on HTTP** | 🟠 High | No rate limiting on API endpoints — DoS and API abuse risk |
| 4 | **No HTTPS/TLS** | 🟠 High | HTTP only — all traffic including API keys in plaintext |
| 5 | **No Input Validation** | 🟠 High | Routes accept arbitrary strings without sanitization (1798 lines of routes, no validation layer) |
| 6 | **Host Command Injection** | 🟡 Medium | Host exec bridge has an allowlist, but complex command construction could bypass it in edge cases |
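
For issue 6, one way to keep validation hard to bypass is to check the command and each argument as separate values, never as one concatenated string. The sketch below is an illustration only and does not reproduce the actual host exec bridge; `validate_exec`, `ExecRejection`, and the allowlist contents are hypothetical.

```rust
use std::path::{Component, Path};

/// Why a request was refused (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum ExecRejection {
    CommandNotAllowed,
    PathTraversal,
}

/// True when any path component of `arg` is `..`.
fn has_parent_dir(arg: &str) -> bool {
    Path::new(arg)
        .components()
        .any(|c| matches!(c, Component::ParentDir))
}

/// Accepts a command only if it is on the allowlist (exact match)
/// and no argument contains a `..` path component.
pub fn validate_exec(
    allowlist: &[&str],
    command: &str,
    args: &[&str],
) -> Result<(), ExecRejection> {
    if !allowlist.iter().any(|allowed| *allowed == command) {
        return Err(ExecRejection::CommandNotAllowed);
    }
    if args.iter().any(|a| has_parent_dir(a)) {
        return Err(ExecRejection::PathTraversal);
    }
    Ok(())
}
```

Checking path components avoids false positives on names like `notes..txt` that a plain substring search for `..` would reject. Passing the validated arguments to `std::process::Command` as an argument vector also means no shell is involved, so shell metacharacters in arguments are not interpreted.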

### Security Recommendations (Priority Order)

1. **Add authentication middleware** — API key, JWT, or session-based auth
2. **Configure CORS properly** — Restrict to known origins
3. **Add rate limiting**: for example the `tower_governor` crate or the `tower::limit` middleware
4. **Add HTTPS** — TLS termination via reverse proxy or `axum-server` with rustls
5. **Input validation** — Add `validator` crate for request payloads
6. **Security headers**: set response headers such as `X-Content-Type-Options` with `tower-http`'s `SetResponseHeaderLayer`
7. **Audit log persistence** — Current audit log is in-memory only; persist to disk

---

## Production Readiness Checklist

| Category | Status | Notes |
|----------|--------|-------|
| Error handling | ⚠️ Partial | `anyhow` everywhere; no typed errors |
| Logging | ✅ Good | `tracing` throughout with structured fields |
| Metrics | ❌ Missing | No metrics collection |
| Health checks | ❌ Missing | No `/health` endpoint |
| Authentication | ❌ Missing | No auth on HTTP API |
| Authorization | ✅ Good | RBAC + audit logging |
| CORS | ❌ Insecure | Permissive CORS |
| HTTPS/TLS | ❌ Missing | HTTP only |
| Rate limiting | ❌ Missing | No HTTP rate limiting |
| CI/CD | ❌ Missing | No automated pipeline |
| Container deployment | ❌ Missing | No deployment config |
| Secrets management | ✅ Good | Env vars, .gitignore excludes secrets |
| Database migrations | N/A | Uses file-based state store |
| Graceful shutdown | ✅ Good | SIGINT/SIGTERM handling |
| Configuration | ✅ Good | TOML config with defaults |
| Documentation | ✅ Good | AGENTS.md, SKILL.md, inline docs |
| Test coverage | ⚠️ Partial | 162 tests but gaps in ouroboros/gateway/web |
| Dependency audit | ❌ Missing | No `cargo-audit` in pipeline |
| Reproducible builds | ⚠️ Partial | Cargo.lock present but path deps break it |

---

## Priority Action Items

### Immediate (Before Any External Access)

1. **Add API authentication** — API key header middleware
2. **Fix CORS** — Restrict to `localhost` or specific origins
3. **Add `/health` endpoint** — Essential for any deployment
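
The API key check in item 1 could look roughly like the sketch below, which is the comparison an auth middleware would call, not the middleware itself. The function names and the idea of reading the key from a header such as `X-API-Key` are assumptions. The comparison avoids returning early on the first mismatching byte; this is a best-effort measure, and production code would more likely rely on a vetted crate such as `subtle`.

```rust
/// Compares two byte strings without stopping at the first mismatch.
/// Note: the length check still reveals whether the lengths differ.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; zero means all bytes matched.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true only when a key was presented and it matches the
/// configured one. An empty configured key never authorizes anyone.
pub fn is_authorized(presented: Option<&str>, expected_key: &str) -> bool {
    match presented {
        Some(key) if !expected_key.is_empty() => {
            constant_time_eq(key.as_bytes(), expected_key.as_bytes())
        }
        _ => false,
    }
}
```

Rejecting an empty configured key means a missing or blank setting fails closed instead of silently disabling authentication.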

### Short-term (Before Staging)

4. **Add CI pipeline** — fmt, clippy, test, audit
5. **Add typed errors** — Use `thiserror` for kernel public API
6. **Add Ouroboros tests** — Core protocol must be tested
7. **Add HTTP route tests** — 1798 lines of untested routes

### Medium-term (Before Production)

8. **Add metrics/monitoring** — Prometheus + Grafana
9. **Add TLS termination** — Reverse proxy or built-in
10. **Add rate limiting** — Per-IP and per-endpoint
11. **Publish oxi dependencies** — Or switch to git deps with tags
12. **Add deployment automation** — Document or script the deploy process
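
For the per-IP rate limiting in item 10, a token bucket is one common approach. The sketch below is a standard-library illustration of the idea, not a replacement for an existing middleware crate; `TokenBucket`, `RateLimiter`, and their parameters are hypothetical. The caller passes the current time in, which keeps the logic deterministic and easy to test.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Instant;

/// One bucket: holds up to `capacity` tokens, refilled at a fixed rate.
pub struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self { capacity, refill_per_sec, tokens: capacity, last_refill: now }
    }

    /// Refills according to elapsed time, then tries to take one token.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill);
        self.tokens =
            (self.tokens + elapsed.as_secs_f64() * self.refill_per_sec).min(self.capacity);
        // Never move the refill timestamp backwards.
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Keeps one bucket per client IP.
pub struct RateLimiter {
    capacity: u32,
    refill_per_sec: f64,
    buckets: HashMap<IpAddr, TokenBucket>,
}

impl RateLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if the request from `ip` at time `now` is allowed.
    pub fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        let (capacity, rate) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(ip)
            .or_insert_with(|| TokenBucket::new(capacity, rate, now))
            .try_acquire(now)
    }
}
```

This sketch never evicts idle buckets, so the map grows with the number of distinct client addresses; a real deployment would need eviction and shared state across handlers, for example behind a mutex.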

---

## Summary

Oxios has a **solid architectural foundation** with excellent security primitives (RBAC, sandboxing, audit logging) and clean code (zero unsafe, zero TODOs). The main risks are **operational security gaps** (no auth, permissive CORS, no TLS) and **missing infrastructure** (no CI/CD, no deployment config). The test coverage is good for kernel internals but critically missing for the protocol engine and HTTP layer.

**Production readiness: ~40%** — Solid core, needs hardening in operational security, observability, and deployment infrastructure before any external-facing use.