unified-agent-api-codex 0.3.5

# Codex Rust Wrapper

Async wrapper around the Codex CLI focused on the headless `codex exec` flow. The client shells out to the bundled or system Codex binary, mirrors stdout/stderr when asked, and keeps the parent process environment untouched.

- crates.io package: `unified-agent-api-codex`
- Rust library crate: `codex`

## Capability + versioning release notes (Workstream F)
- Capability probes now capture `codex --version`, `codex features list` (`--json` when available), and `--help` hints, storing results as `CodexCapabilities` snapshots with `collected_at` timestamps and `BinaryFingerprint` metadata keyed by canonical binary path.
- Guard helpers (`guard_output_schema`, `guard_add_dir`, `guard_mcp_login`, `guard_features_list`) keep optional flags off when support is unknown; surface `CapabilityGuard.notes` to operators instead of passing flags blindly.
- Cache controls: configure `CapabilityCachePolicy::{PreferCache, Refresh, Bypass}` via `capability_cache_policy` or `bypass_capability_cache`. Use `Refresh` for TTL/backoff windows or hot-swaps that reuse the same path; use `Bypass` when metadata is missing (FUSE/overlay filesystems) or when you need an isolated probe that skips cache reads/writes.
- TTL/backoff helper: `capability_cache_ttl_decision` inspects `collected_at` and fingerprint presence to recommend `Refresh` vs `Bypass` for hot-swaps or metadata-missing paths (FUSE/overlay); start with a ~5 minute TTL and back off toward 10-15 minutes when metadata keeps failing.
- Overrides + persistence: `capability_snapshot` / `capability_overrides` accept manual snapshots and feature/version hints; `write_capabilities_snapshot`, `read_capabilities_snapshot`, and `capability_snapshot_matches_binary` let hosts reuse snapshots across processes while avoiding stale data when fingerprints diverge.
- Update advisories stay offline: supply `CodexLatestReleases` and call `update_advisory_from_capabilities` to prompt upgrades without this crate performing network I/O.
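The guard behavior above can be sketched as pure decision logic. The names below (`Support`, `GuardDecision`, `guard_flag`) are illustrative stand-ins, not the crate's actual `CapabilityGuard` types: the point is that a flag is only passed when support is positively confirmed, and a note is surfaced otherwise.

```rust
// Illustrative sketch of the guard pattern; names are hypothetical.
#[derive(Debug, PartialEq)]
enum Support {
    Confirmed, // probe saw the feature
    Absent,    // probe confirmed it is missing
    Unknown,   // no probe result yet
}

struct GuardDecision {
    pass_flag: bool,
    note: Option<String>,
}

fn guard_flag(flag: &str, support: Support) -> GuardDecision {
    match support {
        Support::Confirmed => GuardDecision { pass_flag: true, note: None },
        Support::Absent => GuardDecision {
            pass_flag: false,
            note: Some(format!("{flag}: unsupported by this binary; flag omitted")),
        },
        Support::Unknown => GuardDecision {
            pass_flag: false,
            note: Some(format!("{flag}: support unknown; omitting flag until probed")),
        },
    }
}

fn main() {
    // Unknown support keeps the optional flag off instead of passing it blindly.
    let d = guard_flag("--output-schema", Support::Unknown);
    assert!(!d.pass_flag);
    println!("{}", d.note.unwrap());
}
```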

## Snapshot reuse + cache policy quickstart
Run the snapshot example to see disk reuse with fingerprint checks plus cache policy guidance:

```sh
cargo run -p unified-agent-api-codex --example capability_snapshot -- ./codex ./codex-capabilities.json auto
```

- The example loads a prior snapshot when the fingerprint matches, falls back to `CapabilityCachePolicy::Refresh` after a TTL or hot-swap, and drops to `CapabilityCachePolicy::Bypass` when metadata is missing (typical on some FUSE/overlay mounts) to avoid persisting snapshots that cannot be validated.
- Refresh vs. Bypass: use `Refresh` to re-probe while still writing back to the cache (good for TTL/backoff windows or deployments that reuse the same path); use `Bypass` for one-off probes that should not read or write cache entries when metadata is unreliable.

See `crates/codex/examples/capability_snapshot.rs` for the full flow, including fingerprint validation and snapshot persistence helpers.

## TTL/backoff helper
Use `capability_cache_ttl_decision` to decide whether to reuse a cached snapshot or force a probe with the right cache policy:

```rust
use codex::{
    capability_cache_entry, capability_cache_ttl_decision, CapabilityCachePolicy, CodexClient,
};
use std::{path::Path, time::{Duration, SystemTime}};

async fn refresh_capabilities(client: &CodexClient, binary: &Path) {
    let cached = capability_cache_entry(binary);
    let ttl = Duration::from_secs(300); // start with ~5 minutes for binaries with fingerprints
    let decision = capability_cache_ttl_decision(cached.as_ref(), ttl, SystemTime::now());

    let capabilities = if let Some(snapshot) = cached.filter(|_| !decision.should_probe) {
        snapshot
    } else {
        client.probe_capabilities_with_policy(decision.policy).await
    };

    if decision.policy == CapabilityCachePolicy::Bypass {
        // FUSE/overlay path; back off toward 10-15 minutes to avoid hammering probes.
    }

    let _ = capabilities; // reuse, refresh, or bypass based on the helper decision
}
```
- `Refresh` covers hot-swaps that reuse the same binary path even when fingerprints look unchanged.
- `Bypass` is returned when metadata is missing; avoid cache writes and increase the TTL/backoff window to reduce probe churn.
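The suggested backoff schedule (start near 5 minutes, grow toward the 10-15 minute range while metadata keeps failing) can be written as a small pure function. This is a sketch of the schedule described above, not a crate API:

```rust
use std::time::Duration;

// Illustrative probe-TTL backoff: ~5 minute base, doubling per consecutive
// metadata failure, capped at ~15 minutes so unreliable FUSE/overlay paths
// are not re-probed aggressively.
fn probe_ttl(consecutive_metadata_failures: u32) -> Duration {
    let base = Duration::from_secs(300);    // ~5 minutes
    let ceiling = Duration::from_secs(900); // ~15 minutes
    let grown = base * 2u32.saturating_pow(consecutive_metadata_failures.min(4));
    grown.min(ceiling)
}

fn main() {
    assert_eq!(probe_ttl(0), Duration::from_secs(300)); // healthy path
    assert_eq!(probe_ttl(1), Duration::from_secs(600)); // one failure: 10 min
    assert_eq!(probe_ttl(5), Duration::from_secs(900)); // capped at 15 min
}
```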

## Binary and `CODEX_HOME` isolation

- Point the wrapper at a bundled Codex binary via [`CodexClientBuilder::binary`]; if unset, it honors `CODEX_BINARY` or falls back to `codex` on `PATH`.
- Apply an app-scoped home with [`CodexClientBuilder::codex_home`]. The resolved binary is mirrored into `CODEX_BINARY`, and the provided home is exported as `CODEX_HOME` for every spawn site (exec/login/status/logout). The parent environment is never mutated.
- Use [`CodexClientBuilder::create_home_dirs`] to control whether `CODEX_HOME`, `conversations/`, and `logs/` are created up front (defaults to `true` when a home is set). `RUST_LOG` defaults to `error` if you have not set it.

```rust
use codex::{CodexClient, CodexHomeLayout};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let binary = "/opt/myapp/bin/codex";
    let codex_home = "/var/lib/myapp/codex";

    // Discover (and optionally create) the CODEX_HOME layout.
    let layout = CodexHomeLayout::new(codex_home);
    layout.materialize(true)?;
    println!("Logs live at {}", layout.logs_dir().display());

    let client = CodexClient::builder()
        .binary(binary)
        .codex_home(codex_home)
        .create_home_dirs(true)
        .mirror_stdout(false)
        .quiet(true)
        .build();

    let reply = client.send_prompt("Health check").await?;
    println!("{reply}");
    Ok(())
}
```

## `CODEX_HOME` layout helper

`CodexHomeLayout` documents where Codex stores state under an app-scoped home:

- `config.toml`
- `auth.json`
- `.credentials.json`
- `history.jsonl`
- `conversations/` for transcript JSONL files
- `logs/` for `codex-*.log` files

Call [`CodexHomeLayout::materialize`] to create the root, `conversations/`, and `logs/` directories before spawning Codex.
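The layout itself is plain path joins on the home root. The sketch below mirrors the documented structure with a hypothetical `HomePaths` struct; the real crate exposes these locations through `CodexHomeLayout` accessors instead:

```rust
use std::path::{Path, PathBuf};

// Hypothetical mirror of the documented CODEX_HOME layout.
struct HomePaths {
    config: PathBuf,        // config.toml
    history: PathBuf,       // history.jsonl
    conversations: PathBuf, // transcript JSONL files
    logs: PathBuf,          // codex-*.log files
}

fn home_paths(home: &Path) -> HomePaths {
    HomePaths {
        config: home.join("config.toml"),
        history: home.join("history.jsonl"),
        conversations: home.join("conversations"),
        logs: home.join("logs"),
    }
}

fn main() {
    let p = home_paths(Path::new("/var/lib/myapp/codex"));
    assert_eq!(p.logs, Path::new("/var/lib/myapp/codex/logs"));
    assert_eq!(p.config, Path::new("/var/lib/myapp/codex/config.toml"));
    let _ = (p.history, p.conversations);
}
```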

## Stream JSONL events

Use the streaming surface to consume `codex exec --json` output as it arrives. Disable stdout mirroring so you control the console, and set an idle timeout to fail fast if the CLI stalls.

```rust
use codex::{CodexClient, ExecStreamRequest, ThreadEvent};
use futures_util::StreamExt;
use std::{path::PathBuf, time::Duration};

# async fn demo() -> Result<(), Box<dyn std::error::Error>> {
let client = CodexClient::builder()
    .json(true)
    .quiet(true)
    .mirror_stdout(false)
    .json_event_log("logs/codex_events.log")
    .build();

let mut stream = client
    .stream_exec(ExecStreamRequest {
        prompt: "List repo files".into(),
        idle_timeout: Some(Duration::from_secs(30)),
        output_last_message: Some(PathBuf::from("last_message.txt")),
        output_schema: None,
        json_event_log: None, // override per request if desired
    })
    .await?;

while let Some(event) = stream.events.next().await {
    match event {
        Ok(ThreadEvent::ItemDelta(delta)) => println!("delta: {:?}", delta.delta),
        Ok(other) => println!("event: {other:?}"),
        Err(err) => {
            eprintln!("stream error: {err}");
            break;
        }
    }
}

let completion = stream.completion.await?;
println!("codex exited with {}", completion.status);
if let Some(path) = completion.last_message_path {
    println!("last message saved to {}", path.display());
}
# Ok(()) }
```

## Log the raw JSON stream

Set `json_event_log` on the builder or per request to tee every raw JSONL line to disk before parsing:

- The log is appended to (existing files are preserved) and flushed per line.
- Parent directories are created automatically.
- An empty string is ignored; set a real path or leave `None` to disable.
- The per-request `json_event_log` overrides the builder default for that run.

Events still flow to your `events` stream even when teeing is enabled.
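The tee rules above (empty path disables, parents are created, existing files are appended to) boil down to a few lines of `std::fs`. This is a standalone sketch of that behavior, not the crate's internal implementation:

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

// Sketch of the documented tee-log rules: ignore an empty path, create
// parent directories, and open in append mode so existing logs survive.
fn open_event_log(path: &str) -> io::Result<Option<File>> {
    if path.is_empty() {
        return Ok(None); // empty string disables teeing
    }
    let path = Path::new(path);
    if let Some(parent) = path.parent() {
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?; // parents created automatically
        }
    }
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    Ok(Some(file))
}

fn main() -> io::Result<()> {
    assert!(open_event_log("")?.is_none());
    let log = std::env::temp_dir().join("codex_tee_demo").join("events.log");
    assert!(open_event_log(log.to_str().unwrap())?.is_some());
    assert!(log.exists());
    Ok(())
}
```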

## Apply or inspect task diffs

`CodexClient::apply_task` wraps `codex apply <TASK_ID>`, and `CodexClient::cloud_diff_task` wraps `codex cloud diff <TASK_ID>` when the binary supports it. `CodexClient::apply`/`CodexClient::diff` are convenience helpers that append the task ID from `CODEX_TASK_ID` when it is set.

All of these capture stdout/stderr and return the exit status via [`ApplyDiffArtifacts`](crates/codex/src/lib.rs). They honor the builder flags you already use for streaming:

- `mirror_stdout` controls whether stdout is echoed while still being captured.
- `quiet` suppresses stderr mirroring (stderr is always returned in the artifacts).
- `RUST_LOG` defaults to `error` for these subcommands when the environment is unset; set `RUST_LOG=info` (or higher) to inspect codex internals.

```rust
use codex::CodexClient;

# async fn demo() -> Result<(), Box<dyn std::error::Error>> {
let client = CodexClient::builder()
    .mirror_stdout(false) // silence stdout while capturing
    .quiet(true)          // silence stderr while capturing
    .build();

let apply = client.apply_task("t-123").await?;
println!("exit: {}", apply.status);
println!("stdout: {}", apply.stdout);
println!("stderr: {}", apply.stderr);
# Ok(()) }
```

## RUST_LOG defaults

If `RUST_LOG` is unset, the wrapper injects `RUST_LOG=error` for spawned commands to silence verbose upstream tracing. Any existing `RUST_LOG` value is respected.
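Because the default is applied to the child command only, the parent environment never changes. A minimal sketch of that rule (the current value is taken as a parameter to keep the sketch side-effect free; the crate reads the real environment internally):

```rust
use std::ffi::OsStr;
use std::process::Command;

// Inject RUST_LOG=error on the child command only when the caller's
// environment does not already define it.
fn with_rust_log_default(mut cmd: Command, existing: Option<&str>) -> Command {
    if existing.is_none() {
        cmd.env("RUST_LOG", "error"); // child-only; parent env untouched
    }
    cmd
}

fn main() {
    // Unset in the parent: the default is injected on the child.
    let cmd = with_rust_log_default(Command::new("codex"), None);
    let injected = cmd
        .get_envs()
        .any(|(k, v)| k == OsStr::new("RUST_LOG") && v == Some(OsStr::new("error")));
    assert!(injected);

    // Already set: the existing value is respected, nothing is injected.
    let cmd = with_rust_log_default(Command::new("codex"), Some("info"));
    assert_eq!(cmd.get_envs().count(), 0);
}
```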

## MCP + app-server helpers

- `codex::mcp` offers typed clients for `codex mcp-server --stdio` and `codex app-server --stdio`, along with config managers for `[mcp_servers]` and `[app_runtimes]` plus launcher helpers when you want to spawn from saved config.
- Use `CodexClient::spawn_mcp_login_process` (capability-guarded) when you need an interactive bearer token for HTTP transports before persisting it via `McpConfigManager::login`.
- Examples:
  - `mcp_codex_flow`: typed `tools/call` for `codex` + `codex-reply` with optional cancellation.
  - `mcp_codex_tool`/`mcp_codex_reply`: raw tool calls with `--sample` payloads; use the `session_id` from `session_configured` as the `conversationId`, and note that `codex-reply` requires the session to remain active inside the same `mcp-server` process on 0.61.0.
  - `app_server_turns`/`app_server_thread_turn`: thread start/resume plus optional interrupt.
  - Pair these with `feature_detection` if the binary may be missing server endpoints.
- MCP `codex-reply` does **not** rehydrate conversations from disk on 0.61.0; follow-up calls only work while the original `mcp-server` process is still running. For cross-process resumes, use `codex exec resume` (CLI) or the app-server `thread/resume` path instead.

## Runtime definitions and env prep
- `[mcp_servers]` and `[app_runtimes]` live in `config.toml`; `McpConfigManager` reads/writes them.
- `StdioServerConfig` should be built with the Workstream A env prep (binary path, `CODEX_HOME`, base env, timeouts). Runtime entries layer env/timeout overrides on top of those defaults, and `CODEX_HOME` is injected when `codex_home` is set.
- Resolution through the runtime/app APIs is read-only: stored config and metadata are not mutated.
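The layering described above (defaults first, per-runtime overrides win, `CODEX_HOME` injected when configured) can be modeled as a map merge. A minimal sketch with hypothetical names, not the crate's merge code:

```rust
use std::collections::HashMap;

// Illustrative env layering: base defaults, then runtime overrides,
// then CODEX_HOME injection when a home is configured.
fn merged_env(
    base: &HashMap<String, String>,
    overrides: &HashMap<String, String>,
    codex_home: Option<&str>,
) -> HashMap<String, String> {
    let mut env = base.clone();
    env.extend(overrides.iter().map(|(k, v)| (k.clone(), v.clone())));
    if let Some(home) = codex_home {
        env.insert("CODEX_HOME".into(), home.into());
    }
    env
}

fn main() {
    let base = HashMap::from([("RUST_LOG".to_string(), "error".to_string())]);
    let over = HashMap::from([("RUST_LOG".to_string(), "info".to_string())]);
    let env = merged_env(&base, &over, Some("/var/lib/myapp/codex"));
    assert_eq!(env["RUST_LOG"], "info"); // override wins over the default
    assert_eq!(env["CODEX_HOME"], "/var/lib/myapp/codex");
}
```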

## MCP runtime API (read-only)
- `McpRuntimeApi::from_config(&manager, &defaults)` loads launch-ready stdio configs or HTTP connectors from stored runtimes.
- `available` returns `McpRuntimeSummary` entries (description/tags/tool hints + transport kind).
- `launcher`, `stdio_launcher`, and `http_connector` hand back launchers/connectors without side effects; HTTP connectors resolve bearer tokens from env without overwriting existing `Authorization` headers.
- `prepare` spawns stdio runtimes or hands back HTTP connectors with tool hints preserved; use `ManagedStdioRuntime::stop` to shut down processes (drop is best-effort kill).
- Use `McpRuntimeManager` directly when you already have launchers and only need spawn/connector plumbing.

## App runtime API (read-only)
- `AppRuntimeApi::from_config(&manager, &defaults)` merges stored `[app_runtimes]` entries with defaults (binary/path/env/timeout) while keeping metadata/resume hints intact.
- `available` lists stored runtimes and metadata; `prepare`/`stdio_config` return merged stdio configs without launching.
- `start` launches an app-server and returns `ManagedAppRuntime` (metadata + merged env + `CodexAppServer` handle). Calls leave stored definitions untouched and preserve metadata for future starts.

## Pooled app runtimes
- `AppRuntimePoolApi::from_config(&manager, &defaults)` (or `AppRuntimeApi::pool_api`) wraps the pool that reuses running runtimes by name.
- `available` lists stored entries; `running` lists active runtimes; `start` reuses an existing process if one is already running; `stop`/`stop_all` clean up without altering stored definitions or metadata/resume hints.
- Pool handles still expose stdio configs via `launcher`/`prepare` so callers can inspect launch parameters without starting a process.
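The reuse-by-name contract can be pictured with a toy pool: `start` hands back the running entry when the name is already active and only "spawns" otherwise, while `stop_all` clears running state without touching stored definitions. This is a model of the behavior, not the crate's `AppRuntimePoolApi`:

```rust
use std::collections::HashMap;

// Toy model of reuse-by-name pooling; ids stand in for running processes.
struct Pool {
    running: HashMap<String, u32>, // name -> runtime id
    next_id: u32,
}

impl Pool {
    fn new() -> Self {
        Pool { running: HashMap::new(), next_id: 0 }
    }

    fn start(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.running.get(name) {
            return id; // reuse the already-running runtime
        }
        let id = self.next_id; // otherwise "spawn" a new one
        self.next_id += 1;
        self.running.insert(name.to_string(), id);
        id
    }

    fn stop_all(&mut self) {
        self.running.clear(); // stored definitions stay untouched
    }
}

fn main() {
    let mut pool = Pool::new();
    let a = pool.start("docs-runtime");
    assert_eq!(pool.start("docs-runtime"), a); // reused, not respawned
    pool.stop_all();
    assert!(pool.running.is_empty());
}
```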

## Examples and tests
- `examples/mcp_codex_flow.rs`: starts `codex mcp-server`, streams `codex/event`, supports `$/cancelRequest` and follow-up `codex/codex-reply` via `tools/call`; respects `CODEX_BINARY`/`CODEX_HOME` and does not touch stored `[mcp_servers]`.
- `examples/app_server_turns.rs`: starts/resumes `codex app-server` threads, streams items/task_complete, and can issue `turn/interrupt` after the first item; metadata/thread IDs come from server responses and are not persisted by the wrapper.
- `examples/responses_api_proxy.rs`: launches `codex responses-api-proxy` with an API key piped on stdin; falls back to a stub `--sample` path when no `OPENAI_API_KEY`/`CODEX_API_KEY` is available and polls `--server-info` for `{port,pid}`.
- `examples/stdio_to_uds_live.rs`: Unix-only live bridge that spins up a temp Unix socket listener, runs `codex stdio-to-uds <socket>`, sends `ping`, and prints the echoed `pong`.
- `cargo test -p unified-agent-api-codex` exercises env merging and non-destructive behavior (`runtime_api_*`, `app_runtime_*`, `app_runtime_pool_*` cover listing/prepare/start/stop without writing config or altering metadata).
- See `crates/codex/EXAMPLES.md` for one-to-one CLI parity examples, including `bundled_binary_home` to run Codex from an embedded binary with isolated state.

## Integration notes

- For a practical integration pattern in an async shell/orchestrator (Substrate), see `docs/integrations/substrate.md`.