cellos-supervisor 0.5.1

CellOS execution-cell runner — boots cells in Firecracker microVMs or gVisor, enforces narrow typed authority, emits signed CloudEvents.
cellos-supervisor

The CellOS runner. Takes an ExecutionCellDocument, brings up a cell on a host backend, enforces network/identity/secret policy, and emits a typed CloudEvent for every step it takes.

What it is

cellos-supervisor is primarily a binary (src/main.rs) — the composition root that wires a host backend, a secret broker, an event sink, and one or more export sinks into a single [Supervisor] (src/supervisor.rs:38) that owns the cell lifecycle:

network_scope → trust_plane_observability (optional) → secrets →
lifecycle.started → run → export → destroy → revoke → lifecycle.destroyed

It sits across L4–L6 of the layer model: above the L1 ports defined by cellos-core, above the L2/L3 host-backend abstractions (cellos-host-*), sinks (cellos-sink-*), brokers (cellos-broker-*), and exports (cellos-export-*), and below cellos-server and cellos-cortex, which treat the supervisor as an opaque emitter of CloudEvents. With ~15,000 lines of Rust across two dozen modules, this is the largest crate in the workspace. This README is a map, not a manual — for any module you care about, the linked source is the source of truth.

What cellos-supervisor deliberately does NOT do:

  • It does not own the spec vocabulary. Every type (ExecutionCellDocument, AuthorityBundle, PolicyPackSpec, RunSpec) comes from cellos-core.
  • It does not own the wire format of CloudEvents. Every event is built by a *_data_v1 / cloud_event_v1_* function from cellos-core::events.
  • It does not host an HTTP server. Operators talk to cellos-server; the supervisor only publishes events.
  • It does not depend on Cortex. The Cortex bridge lives in cellos-cortex and links to the supervisor, never the other way around (ADR-0008).
  • It does not run on a cellos-lite build with local LLM dependencies — the inference broker port is implemented by external crates.

Public API surface

The crate is a binary; lib.rs (src/lib.rs:1) exposes only the modules integration tests need:

  • dns_proxy — the SEAM-1 / L2-04 DNS proxy. Forward-only UDP, enforces dnsAuthority.hostnameAllowlist at the protocol layer, emits one dns_query CloudEvent per query. src/dns_proxy/mod.rs:1. Submodules: parser, upstream, spawn, dnssec.
  • sni_proxy — TLS SNI / H2 :authority evaluator. src/sni_proxy/.
  • resolver_refresh — host-controlled DNS resolver refresh with TTL watchdog and drift CloudEvent emission. src/resolver_refresh/.
  • ebpf_flow — scaffolding for the eBPF/nflog per-flow listener (Phase 2). src/ebpf_flow.rs:1.
  • event_signing — Ed25519/HMAC per-event signing wrapper. The public posture mirror (event_signing_posture::SigningConfig, src/lib.rs:62) is doc(hidden) and exists only so integration tests can pin the Zeroizing<Vec<u8>> invariant on key material.
  • linux_cgroup — cgroup v2 helpers (target_os = "linux").
  • nft_counters — nftables counter readers for network-enforcement events.
  • per_flow — real-time per-flow nflog listener.
  • destruction_evidence — terminal-state evidence builder for the lifecycle.destroyed event.
  • spec_input — read ExecutionCellDocument from stdin/file + spec-hash computation. src/spec_input.rs.
  • trust_keyset_load — load SignedTrustKeysetEnvelope from CELLOS_TRUST_KEYSET_PATH and verify against the operator-supplied keyring.
  • host_telemetry (re-export of cellos_host_telemetry) — F1a Path B host-side probes + F3b vsock listener (per ADR-0006 §5.4).
  • __a2_02::resolve_caller_identity — doc(hidden) mirror of composition::resolve_caller_identity for an integration test that pins the CELLOS_CALLER_IDENTITY → trim → "default" fallback contract.
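
The CELLOS_CALLER_IDENTITY → trim → "default" contract mentioned in the last bullet can be sketched as follows. This is a minimal illustration, not the crate's code: the signature taking an Option<&str> and the helper caller_identity_from_env are assumptions made so the rule can be shown without touching the process environment.

```rust
/// Hypothetical stand-in for the caller-identity contract:
/// trim the raw value, then fall back to "default" when nothing is left.
pub fn resolve_caller_identity(raw: Option<&str>) -> String {
    match raw.map(str::trim) {
        Some(id) if !id.is_empty() => id.to_string(),
        // Unset, empty, or whitespace-only all collapse to the same subject.
        _ => "default".to_string(),
    }
}

/// Hypothetical helper: read the environment variable and apply the same rule.
pub fn caller_identity_from_env() -> String {
    resolve_caller_identity(std::env::var("CELLOS_CALLER_IDENTITY").ok().as_deref())
}
```

Keeping the pure function separate from the environment read is what makes a contract like this easy to pin in a test.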

Everything else lives in the binary's private module tree:

  • composition — env-driven wiring (host backend, broker, sinks, exports, policy/authz/authority/trust keys). src/composition.rs.
  • supervisor — the lifecycle orchestrator. src/supervisor.rs:38.
  • supervisor_helpers — helpers for cell-spec destructuring, target resolution, redaction. src/supervisor_helpers.rs.
  • network_policy, linux_isolation, linux_mount, linux_net, linux_seccomp — the Linux dataplane and isolation primitives.
  • runtime_secret, proxy_activation, command_runner, trust_plane_observability — the rest of the run-phase machinery.

Architecture / how it works

                ┌───────────────────────────────────────────────────────┐
                │ main.rs:                                              │
                │  - parse argv / stdin spec                            │
                │  - validate_execution_cell_document                   │
                │  - verify_authority_derivation                        │
                │  - enforce_derivation_scope_policy                    │
                │  - build_supervisor (composition.rs)                  │
                │  - emit_startup_banner                                │
                │  - Supervisor::run(spec)                              │
                └─────────────────────────┬─────────────────────────────┘
                                          ▼
       ┌──────────────────────────────────────────────────────────────┐
       │ Supervisor (src/supervisor.rs)                               │
       │                                                              │
       │   host:        Arc<dyn CellBackend>      ← cellos-host-*     │
       │   broker:      Arc<dyn SecretBroker>     ← cellos-broker-*   │
       │   event_sink:  Arc<dyn EventSink>        ← cellos-sink-*     │
       │   jsonl_sink:  Option<Arc<dyn EventSink>>← cellos-sink-jsonl │
       │   exports:     HashMap<String, Arc<dyn ExportSink>>          │
       │   policy_pack: Option<PolicyPackSpec>     (admission gate)   │
       │   authz_policy:Option<AuthorizationPolicy>(RBAC, ADR-0007)   │
       │   authority_keys, trust_verify_keys:Arc<HashMap<...>>         │
       │                                                              │
       │   lifecycle:                                                 │
       │      network_scope                                           │
       │      trust_plane_observability                               │
       │      secrets (mount / env / runtime lease)                   │
       │      lifecycle.started                                       │
       │      run → command_completed / observability.*               │
       │      export → export_completed_v2 / export_failed_v2         │
       │      destroy → revoke → lifecycle.destroyed (always)         │
       └──────────────────────────────────────────────────────────────┘
                                          │
                                          ▼
                  ┌───────────────────────────────────────────┐
                  │  CloudEventV1 → primary event_sink →      │
                  │  optional jsonl_sink (mirror)             │
                  │  → JetStream / file / DLQ / redacted      │
                  └───────────────────────────────────────────┘
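
The last box in the diagram (primary sink first, optional JSONL mirror second) can be illustrated with a small fan-out wrapper. This is a sketch under stated assumptions: the trait below is a simplified, synchronous stand-in for the real EventSink port, events are plain strings rather than CloudEventV1 values, and MemorySink and FanOut are hypothetical names. Whether a mirror failure should propagate is also an assumption here.

```rust
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the EventSink port (the real one may be async).
pub trait EventSink: Send + Sync {
    fn emit(&self, event: &str) -> Result<(), String>;
}

/// In-memory sink used only for this illustration.
#[derive(Default)]
pub struct MemorySink {
    pub events: Mutex<Vec<String>>,
}

impl EventSink for MemorySink {
    fn emit(&self, event: &str) -> Result<(), String> {
        self.events
            .lock()
            .map_err(|e| e.to_string())?
            .push(event.to_string());
        Ok(())
    }
}

/// Hypothetical wrapper: a primary sink plus an optional mirror.
pub struct FanOut {
    pub primary: Arc<dyn EventSink>,
    pub mirror: Option<Arc<dyn EventSink>>,
}

impl FanOut {
    /// Emit to the primary sink first, then to the mirror if one is configured.
    pub fn emit(&self, event: &str) -> Result<(), String> {
        self.primary.emit(event)?;
        if let Some(mirror) = &self.mirror {
            mirror.emit(event)?;
        }
        Ok(())
    }
}
```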

Teardown semantics: destroy and revoke_for_cell are called unconditionally even when a phase error has already been captured (src/supervisor.rs:5). Residue classes (ResidueClass, LifecycleResidueClass) on the terminal event let the projector and operator audit what was left behind.
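
The "always tear down" rule can be sketched as a function that stops the forward phases at the first failure but still runs destroy, revoke, and the terminal event. This is an illustration only: run_lifecycle and its fail_at parameter are hypothetical, the optional trust_plane_observability step is left out, and the real supervisor works with typed events and residue classes rather than strings.

```rust
/// Runs the forward phases in order, stops at the first failure, then always
/// tears down. Returns the steps taken plus the first captured phase error.
pub fn run_lifecycle(fail_at: Option<&str>) -> (Vec<String>, Option<String>) {
    // Forward phases, in the order given by the lifecycle line above.
    const PHASES: [&str; 5] = ["network_scope", "secrets", "lifecycle.started", "run", "export"];

    let mut trace = Vec::new();
    let mut first_error = None;

    for phase in PHASES {
        if Some(phase) == fail_at {
            // Capture the error and skip the remaining forward phases.
            first_error = Some(format!("{phase} failed"));
            break;
        }
        trace.push(phase.to_string());
    }

    // Teardown is unconditional: it runs whether or not an error was captured.
    for step in ["destroy", "revoke", "lifecycle.destroyed"] {
        trace.push(step.to_string());
    }

    (trace, first_error)
}
```

Calling run_lifecycle(Some("run")) yields a trace that skips export but still ends in destroy, revoke, lifecycle.destroyed, with the captured error returned alongside it.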

The DNS proxy (src/dns_proxy/) is forward-only UDP. It parses each query, evaluates dnsAuthority.hostnameAllowlist (literal or single-leading-*. wildcard), and either forwards verbatim to the declared upstream or builds a REFUSED response. Every observed query is emitted as a CloudEvent built by cellos_core::cloud_event_v1_dns_query. SERVFAIL is synthesized on upstream timeout so workloads see deterministic failure (src/dns_proxy/mod.rs:23).
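
The allowlist decision described above (literal entry or single leading "*." wildcard, otherwise REFUSED) can be sketched as a pure function. The names Verdict and evaluate are hypothetical, and three behaviours are assumptions rather than facts from the source: matching is case-insensitive, a trailing root dot is ignored, and a wildcard matches subdomains at any depth but not the apex itself.

```rust
/// Outcome for one query name.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Forward,
    Refused,
}

/// Lowercase the name and drop a trailing root dot (assumed normalization).
fn normalize(name: &str) -> String {
    name.trim_end_matches('.').to_ascii_lowercase()
}

/// Evaluate a query name against literal and "*."-prefixed allowlist entries.
pub fn evaluate(allowlist: &[&str], qname: &str) -> Verdict {
    let q = normalize(qname);
    let allowed = allowlist.iter().any(|entry| {
        let e = normalize(entry);
        match e.strip_prefix("*.") {
            // Wildcard: require at least one label, then a dot, then the suffix.
            Some(suffix) => {
                q.len() > suffix.len() + 1
                    && q.ends_with(suffix)
                    && q.as_bytes()[q.len() - suffix.len() - 1] == b'.'
            }
            // Literal: exact match after normalization.
            None => q == e,
        }
    });
    if allowed {
        Verdict::Forward
    } else {
        Verdict::Refused
    }
}
```

The explicit dot check is what keeps a name such as badexample.com from matching *.example.com on suffix alone.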

Configuration

The supervisor is configured almost entirely through environment variables. This is a representative slice (run cargo doc --open -p cellos-supervisor for the full list):

| Env var | Default | Effect |
|---|---|---|
| CELLOS_STRICT_CONFIG | unset | When truthy, refuse to start if any env var fell back to a default. Useful in CI. src/composition.rs:96. |
| CELLOS_CALLER_IDENTITY | "default" | RBAC subject used by authz_policy. Empty/whitespace → "default". src/composition.rs:148. |
| CELLOS_CELL_BACKEND | host-cellos | Selects host backend: cellctl, firecracker, or stub. src/composition.rs:157. |
| CELLOS_BROKER | env | Secret broker: env, file, oidc, vault. src/composition.rs:170. |
| CELLOS_EXPORT_DIR | unset | When set, mount the local-FS export sink. src/composition.rs:183. |
| CELLOS_EXPORT_HTTP_BASE_URL | unset | When set, mount the HTTP export sink. |
| CELLOS_DEPLOYMENT_PROFILE | hardened | Deployment profile (hardened / portable). Hardened auto-sets several REQUIRE_* flags; portable relaxes Darwin/dev-host isolation checks. src/composition.rs:231. |
| CELLOS_POLICY_PACK_PATH | unset | Path to the policy-pack JSON; loaded into Supervisor.policy_pack. |
| CELLOS_AUTHZ_POLICY_PATH | unset | Path to the authorization-policy JSON (ADR-0007). |
| CELLOS_AUTHORITY_KEYS_PATH | unset (required in hardened) | Operator-supplied role → Ed25519 verifying-key map. |
| CELLOS_TRUST_VERIFY_KEYS_PATH | unset | Trust-keyset signer kid → verifying-key map. |
| CELLOS_TRUST_KEYSET_PATH | unset | Signed trust-keyset envelope. |
| CELLOS_REQUIRE_AUTHORITY_DERIVATION | unset (auto in hardened) | Reject specs without a derivation token. |
| CELLOS_REQUIRE_SCOPED_DERIVATION_TOKENS | unset (auto in hardened) | Refuse non-scoped derivation tokens. |
| CELLOS_REQUIRE_TELEMETRY_DECLARED | unset (auto in hardened) | Require telemetry.declared in every spec. |
| CELL_OS_USE_NOOP_SINK | unset | Force the noop event sink (debugging). |
| CELL_OS_JSONL_EVENTS | unset | Mirror every event to a JSONL file (path-valued). |
| CELL_OS_REQUIRE_JETSTREAM | unset (auto in hardened) | Refuse to start if the JetStream sink can't connect. |
| CELLOS_RUN_ID | run-local-001 (validate for --validate) | Stamped onto every event in this run. |
| CELLOS_EVENT_SIGNING_* | unset | Configure the I5 per-event signing wrapper. See src/event_signing.rs. |

Hardened profile defaults are documented at src/composition.rs:253.
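
The "unset (auto in hardened)" entries above suggest a simple resolution rule, sketched below. This is not the code at src/composition.rs: DeploymentProfile, parse, and effective_require are hypothetical names, and two behaviours are assumptions: an empty value is treated like unset, and an explicitly set flag takes precedence over the profile default.

```rust
/// Hypothetical model of the two documented deployment profiles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentProfile {
    Hardened,
    Portable,
}

impl DeploymentProfile {
    /// Parse the raw env value; unset falls back to the documented default, hardened.
    pub fn parse(raw: Option<&str>) -> Result<Self, String> {
        match raw.map(str::trim) {
            // Treating an empty value like unset is an assumption.
            None | Some("") => Ok(Self::Hardened),
            Some("hardened") => Ok(Self::Hardened),
            Some("portable") => Ok(Self::Portable),
            Some(other) => Err(format!("unknown deployment profile: {other}")),
        }
    }
}

/// Resolve one REQUIRE_* flag: an explicit value wins (assumption),
/// otherwise the flag is on only under the hardened profile.
pub fn effective_require(explicit: Option<bool>, profile: DeploymentProfile) -> bool {
    explicit.unwrap_or(profile == DeploymentProfile::Hardened)
}
```

Under this rule, effective_require(None, DeploymentProfile::Hardened) is true while effective_require(None, DeploymentProfile::Portable) is false, which matches the "auto in hardened" wording in the table.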

Examples

Run a spec under the stub backend with JSONL output:

CELLOS_CELL_BACKEND=stub \
CELLOS_BROKER=env \
CELL_OS_USE_NOOP_SINK=1 \
CELL_OS_JSONL_EVENTS=/tmp/events.jsonl \
CELLOS_DEPLOYMENT_PROFILE=portable \
cargo run -p cellos-supervisor --bin cellos-supervisor -- /path/to/spec.yaml

Validate a spec without running it:

cargo run -p cellos-supervisor --bin cellos-supervisor -- --validate /path/to/spec.yaml

Project the resulting JSONL into a state snapshot:

cargo run -p cellos-projector --bin cellos-projector -- /tmp/events.jsonl --pretty

The cellos-supervisor binary's argv shape is contracted; see crates/cellos-supervisor/tests/argv_invariants.rs for the typed guarantees.

Testing

cargo test -p cellos-supervisor

crates/cellos-supervisor/tests/ carries ~80 integration tests covering break-attempt scenarios (DNS rebinding, DNSSEC downgrade, kernel UDP 443, SNI mismatch H2c, H2 CONTINUATION flood, post-isolation residue, capability drop/grant), event invariants (lifecycle reason typed, terminal state naming, manifest_failed, forced terminal exit code), secret hygiene (zeroization, debug redaction, per-backend delivery defaults), and trust-keyset behaviour.

Several tests are gated:

  • firecracker_e2e.rs requires a local Firecracker binary and host root caps; it is #[ignore]d in default runs.
  • The break-attempt tests that use nftables / nflog require Linux and run only on target_os = "linux".

To run everything including ignored tests:

cargo test -p cellos-supervisor -- --include-ignored

The supervisor crate has its own preflight skill, CellPreflight, that catches common Docker / Firecracker build mistakes before a 12-minute rebuild.

Related crates

  • cellos-core — owns every spec/event type this crate consumes and emits.
  • cellos-server — projects the CloudEvents this supervisor publishes; never imports this crate.
  • cellos-host-cellos, cellos-host-firecracker, cellos-host-stub — the three CellBackend implementations selected by CELLOS_CELL_BACKEND.
  • cellos-sink-jetstream, cellos-sink-jsonl, cellos-sink-redact, cellos-sink-dlq — the EventSink implementations layered behind the primary sink.
  • cellos-broker-env, cellos-broker-file, cellos-broker-oidc, cellos-broker-vault — SecretBroker implementations.
  • cellos-export-local, cellos-export-http, cellos-export-s3 — ExportSink implementations.
  • cellos-host-telemetry — Path B host-side probes + F3b vsock receiver, re-exported as host_telemetry.
  • cellos-cortex — the only crate allowed to import this one across the Cortex boundary.

ADRs

  • ADR-0001 — NATS JetStream as the proprietary host substrate.
  • ADR-0004 — TLS termination + fronting trust boundary (sni_proxy).
  • ADR-0005 — typed authority enforcement (the four variants admitted here).
  • ADR-0006 — in-VM observability evidence + the F3b host-side vsock receiver.
  • ADR-0007 — authorization policy + secret-ref admission.
  • ADR-0009 — doctrine → authority mapping, consumed by callers via cellos-cortex.
  • ADR-0010 — formation authority invariant.