# cellos-supervisor
The CellOS runner. Takes an `ExecutionCellDocument`, brings up a cell on a
host backend, enforces network/identity/secret policy, and emits a typed
CloudEvent for every step it takes.
## What it is
`cellos-supervisor` is primarily a binary (`src/main.rs`) — the composition
root that wires a host backend, a secret broker, an event sink, and one or
more export sinks into a single [`Supervisor`] (`src/supervisor.rs:38`)
that owns the cell lifecycle:
```
network_scope → trust_plane_observability (optional) → secrets →
lifecycle.started → run → export → destroy → revoke → lifecycle.destroyed
```
It sits across L4–L6 of the layer model: above the L1 ports defined by
`cellos-core`, above the L2/L3 host-backend abstractions (`cellos-host-*`),
sinks (`cellos-sink-*`), brokers (`cellos-broker-*`), and exports
(`cellos-export-*`), and below `cellos-server` and `cellos-cortex` which
treat the supervisor as an opaque emitter of CloudEvents. With ~15,000 lines
of Rust across two dozen modules, this is the largest crate in the
workspace. This README is a map, not a manual: for any module you care
about, the linked source is the source of truth.
What `cellos-supervisor` deliberately does NOT do:
- It does not own the spec vocabulary. Every type (`ExecutionCellDocument`,
`AuthorityBundle`, `PolicyPackSpec`, `RunSpec`) comes from `cellos-core`.
- It does not own the wire format of CloudEvents. Every event is built by a
`*_data_v1` / `cloud_event_v1_*` function from `cellos-core::events`.
- It does not host an HTTP server. Operators talk to `cellos-server`; the
supervisor only publishes events.
- It does not depend on Cortex. The Cortex bridge lives in `cellos-cortex`
and links *to* the supervisor, never the other way around (ADR-0008).
- It does not run on a `cellos-lite` build with local LLM dependencies —
the inference broker port is implemented by external crates.
## Public API surface
The crate is a binary; `lib.rs` (`src/lib.rs:1`) exposes only the modules
integration tests need:
- `dns_proxy` — the SEAM-1 / L2-04 DNS proxy. Forward-only UDP, enforces
`dnsAuthority.hostnameAllowlist` at the protocol layer, emits one
`dns_query` CloudEvent per query. `src/dns_proxy/mod.rs:1`.
Submodules: `parser`, `upstream`, `spawn`, `dnssec`.
- `sni_proxy` — TLS SNI / H2 `:authority` evaluator. `src/sni_proxy/`.
- `resolver_refresh` — host-controlled DNS resolver refresh with TTL
watchdog and drift CloudEvent emission. `src/resolver_refresh/`.
- `ebpf_flow` — scaffolding for the eBPF/nflog per-flow listener (Phase 2).
`src/ebpf_flow.rs:1`.
- `event_signing` — Ed25519/HMAC per-event signing wrapper. The public
posture mirror (`event_signing_posture::SigningConfig`,
`src/lib.rs:62`) is `doc(hidden)` and exists only so integration tests
can pin the `Zeroizing<Vec<u8>>` invariant on key material.
- `linux_cgroup` — cgroup v2 helpers (target_os = "linux").
- `nft_counters` — nftables counter readers for network-enforcement events.
- `per_flow` — real-time per-flow nflog listener.
- `destruction_evidence` — terminal-state evidence builder for the
`lifecycle.destroyed` event.
- `spec_input` — read `ExecutionCellDocument` from stdin/file +
spec-hash computation. `src/spec_input.rs`.
- `trust_keyset_load` — load `SignedTrustKeysetEnvelope` from
`CELLOS_TRUST_KEYSET_PATH` and verify against the operator-supplied
keyring.
- `host_telemetry` (re-export of `cellos_host_telemetry`) — F1a Path B
host-side probes + F3b vsock listener (per ADR-0006 §5.4).
- `__a2_02::resolve_caller_identity` — `doc(hidden)` mirror of
`composition::resolve_caller_identity` for an integration test that
pins the `CELLOS_CALLER_IDENTITY` → trim → `"default"` fallback
contract.
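
The `CELLOS_CALLER_IDENTITY` fallback contract mentioned above can be sketched as follows. This is a minimal illustration, not the crate's code: the signature (taking the raw value as an `Option<&str>`) and the helper `caller_identity_from_env` are hypothetical, and returning the trimmed value for non-empty input is an assumption.

```rust
/// Hypothetical stand-in for `composition::resolve_caller_identity`.
/// Takes the raw value so the logic can be exercised without touching
/// the process environment.
fn resolve_caller_identity(raw: Option<&str>) -> String {
    match raw.map(str::trim) {
        // Assumption: a non-empty value is returned in trimmed form.
        Some(trimmed) if !trimmed.is_empty() => trimmed.to_string(),
        // Unset, empty, or whitespace-only values fall back to "default".
        _ => "default".to_string(),
    }
}

/// Hypothetical helper: reads `CELLOS_CALLER_IDENTITY` and applies the fallback.
fn caller_identity_from_env() -> String {
    let raw = std::env::var("CELLOS_CALLER_IDENTITY").ok();
    resolve_caller_identity(raw.as_deref())
}
```

Splitting the pure function from the environment read keeps the contract testable without mutating process-wide state.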
Everything else lives in the binary's private module tree:
- `composition` — env-driven wiring (host backend, broker, sinks, exports,
policy/authz/authority/trust keys). `src/composition.rs`.
- `supervisor` — the lifecycle orchestrator. `src/supervisor.rs:38`.
- `supervisor_helpers` — helpers for cell-spec destructuring, target
resolution, redaction. `src/supervisor_helpers.rs`.
- `network_policy`, `linux_isolation`, `linux_mount`, `linux_net`,
`linux_seccomp` — the Linux dataplane and isolation primitives.
- `runtime_secret`, `proxy_activation`, `command_runner`,
`trust_plane_observability` — the rest of the run-phase machinery.
## Architecture / how it works
```
┌───────────────────────────────────────────────────────┐
│ main.rs: │
│ - parse argv / stdin spec │
│ - validate_execution_cell_document │
│ - verify_authority_derivation │
│ - enforce_derivation_scope_policy │
│ - build_supervisor (composition.rs) │
│ - emit_startup_banner │
│ - Supervisor::run(spec) │
└─────────────────────────┬─────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ Supervisor (src/supervisor.rs) │
│ │
│ host: Arc<dyn CellBackend> ← cellos-host-* │
│ broker: Arc<dyn SecretBroker> ← cellos-broker-* │
│ event_sink: Arc<dyn EventSink> ← cellos-sink-* │
│ jsonl_sink: Option<Arc<dyn EventSink>>← cellos-sink-jsonl │
│ exports: HashMap<String, Arc<dyn ExportSink>> │
│ policy_pack: Option<PolicyPackSpec> (admission gate) │
│ authz_policy:Option<AuthorizationPolicy>(RBAC, ADR-0007) │
│ authority_keys, trust_verify_keys:Arc<HashMap<...>> │
│ │
│ lifecycle: │
│ network_scope │
│ trust_plane_observability │
│ secrets (mount / env / runtime lease) │
│ lifecycle.started │
│ run → command_completed / observability.* │
│ export → export_completed_v2 / export_failed_v2 │
│ destroy → revoke → lifecycle.destroyed (always) │
└──────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────┐
│ CloudEventV1 → primary event_sink → │
│ optional jsonl_sink (mirror) │
│ → JetStream / file / DLQ / redacted │
└───────────────────────────────────────────┘
```
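
The event path at the bottom of the diagram (primary sink, then the optional JSONL mirror) can be pictured with the sketch below. Everything here is a simplified assumption: the real `EventSink` trait lives in `cellos-core` and its signature is not shown in this README, `MemorySink` and `FanOut` are hypothetical names, and propagating a mirror failure as an error is a choice made for the example only.

```rust
use std::sync::{Arc, Mutex};

/// Simplified, synchronous stand-in for the real `EventSink` port (assumption).
trait EventSink {
    fn emit(&self, event: &str) -> Result<(), String>;
}

/// Hypothetical in-memory sink used only to make the sketch observable.
#[derive(Default)]
struct MemorySink {
    events: Mutex<Vec<String>>,
}

impl EventSink for MemorySink {
    fn emit(&self, event: &str) -> Result<(), String> {
        self.events
            .lock()
            .map_err(|e| e.to_string())?
            .push(event.to_string());
        Ok(())
    }
}

/// Hypothetical fan-out mirroring the `event_sink` / `jsonl_sink` fields.
struct FanOut {
    event_sink: Arc<dyn EventSink>,
    jsonl_sink: Option<Arc<dyn EventSink>>,
}

impl FanOut {
    /// Emits to the primary sink first, then to the mirror when one is set.
    fn emit(&self, event: &str) -> Result<(), String> {
        self.event_sink.emit(event)?;
        if let Some(mirror) = &self.jsonl_sink {
            // Assumption: a mirror failure is surfaced rather than swallowed.
            mirror.emit(event)?;
        }
        Ok(())
    }
}
```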
Teardown semantics: `destroy` and `revoke_for_cell` are called
*unconditionally* even when a phase error has already been captured
(`src/supervisor.rs:5`). Residue classes (`ResidueClass`, `LifecycleResidueClass`)
on the terminal event let the projector and operator audit what was left
behind.
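
The unconditional-teardown rule can be sketched as below. This is an illustration under stated assumptions, not the code in `src/supervisor.rs`: the `Teardown` trait, `RecordingTeardown`, and `run_cell` are hypothetical names, errors are plain strings, and the real lifecycle is very likely asynchronous.

```rust
/// Hypothetical, simplified view of the teardown half of the lifecycle.
trait Teardown {
    fn destroy(&mut self) -> Result<(), String>;
    fn revoke_for_cell(&mut self) -> Result<(), String>;
}

/// Hypothetical recorder that logs which teardown steps were invoked.
#[derive(Default)]
struct RecordingTeardown {
    calls: Vec<&'static str>,
}

impl Teardown for RecordingTeardown {
    fn destroy(&mut self) -> Result<(), String> {
        self.calls.push("destroy");
        Ok(())
    }
    fn revoke_for_cell(&mut self) -> Result<(), String> {
        self.calls.push("revoke_for_cell");
        Ok(())
    }
}

/// Runs the pre-teardown phases, then always runs both teardown steps.
fn run_cell<T: Teardown>(
    phases: impl FnOnce() -> Result<(), String>,
    teardown: &mut T,
) -> Result<(), String> {
    // Capture the phase outcome instead of returning early on error.
    let phase_result = phases();
    // Both teardown steps run regardless of the phase outcome.
    let destroy_result = teardown.destroy();
    let revoke_result = teardown.revoke_for_cell();
    // Report the earliest error: phase first, then destroy, then revoke.
    phase_result.and(destroy_result).and(revoke_result)
}
```

The key point is that the phase error is stored, not propagated with `?`, so a failure in `secrets` or `run` cannot skip `destroy` or `revoke_for_cell`.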
The DNS proxy (`src/dns_proxy/`) is forward-only UDP. It parses each query,
evaluates `dnsAuthority.hostnameAllowlist` (literal or single-leading-`*.`
wildcard), and either forwards verbatim to the declared upstream or builds
a REFUSED response. Every observed query is emitted as a CloudEvent built
by `cellos_core::cloud_event_v1_dns_query`. SERVFAIL is synthesized on
upstream timeout so workloads see deterministic failure
(`src/dns_proxy/mod.rs:23`).
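
The allowlist decision described above can be sketched as a pure function. This is a hedged illustration, not the parser in `src/dns_proxy/`: the names `Decision`, `matches_pattern`, and `decide` are hypothetical, and several behaviours are assumptions (case-insensitive comparison, ignoring a trailing root dot, a `*.` wildcard matching one or more leading labels but not the bare suffix).

```rust
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Forward the query to the declared upstream.
    Forward,
    /// Answer with a REFUSED response.
    Refused,
}

/// Returns true when `qname` matches a literal or single-leading-`*.` pattern.
fn matches_pattern(pattern: &str, qname: &str) -> bool {
    // Assumption: names compare case-insensitively, without the root dot.
    let pattern = pattern.trim_end_matches('.').to_ascii_lowercase();
    let qname = qname.trim_end_matches('.').to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        // Wildcard: require "<labels>.<suffix>" with non-empty labels.
        Some(suffix) => {
            !suffix.is_empty()
                && !suffix.contains('*')
                && qname
                    .strip_suffix(suffix)
                    .and_then(|head| head.strip_suffix('.'))
                    .map_or(false, |labels| !labels.is_empty())
        }
        // Literal: exact match; a `*` anywhere else is never accepted.
        None => !pattern.contains('*') && pattern == qname,
    }
}

/// Evaluates a query name against the allowlist.
fn decide(allowlist: &[&str], qname: &str) -> Decision {
    if allowlist.iter().any(|p| matches_pattern(p, qname)) {
        Decision::Forward
    } else {
        Decision::Refused
    }
}
```

Requiring the `.` separator before the suffix prevents `*.internal.test` from matching a look-alike such as `xinternal.test`.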
## Configuration
The supervisor is configured almost entirely through environment variables.
This is a representative slice (run `cargo doc --open -p cellos-supervisor`
for the full list):
| Variable | Default | Effect |
| --- | --- | --- |
| `CELLOS_STRICT_CONFIG` | unset | When truthy, refuse to start if any env var fell back to a default. Useful in CI. `src/composition.rs:96`. |
| `CELLOS_CALLER_IDENTITY` | `"default"` | RBAC subject used by `authz_policy`. Empty/whitespace → `"default"`. `src/composition.rs:148`. |
| `CELLOS_CELL_BACKEND` | host-cellos | Selects host backend: `cellctl`, `firecracker`, or `stub`. `src/composition.rs:157`. |
| `CELLOS_BROKER` | `env` | Secret broker: `env`, `file`, `oidc`, `vault`. `src/composition.rs:170`. |
| `CELLOS_EXPORT_DIR` | unset | When set, mount the local-FS export sink. `src/composition.rs:183`. |
| `CELLOS_EXPORT_HTTP_BASE_URL` | unset | When set, mount the HTTP export sink. |
| `CELLOS_DEPLOYMENT_PROFILE` | `hardened` | Deployment profile (`hardened` / `permissive`). Hardened auto-sets several `REQUIRE_*` flags. `src/composition.rs:231`. |
| `CELLOS_POLICY_PACK_PATH` | unset | Path to the policy-pack JSON; loaded into `Supervisor.policy_pack`. |
| `CELLOS_AUTHZ_POLICY_PATH` | unset | Path to the authorization-policy JSON (ADR-0007). |
| `CELLOS_AUTHORITY_KEYS_PATH` | unset (required in `hardened`) | Operator-supplied role → Ed25519 verifying-key map. |
| `CELLOS_TRUST_VERIFY_KEYS_PATH` | unset | Trust-keyset signer kid → verifying-key map. |
| `CELLOS_TRUST_KEYSET_PATH` | unset | Signed trust-keyset envelope. |
| `CELLOS_REQUIRE_AUTHORITY_DERIVATION` | unset (auto in hardened) | Reject specs without a derivation token. |
| `CELLOS_REQUIRE_SCOPED_DERIVATION_TOKENS` | unset (auto in hardened) | Refuse non-scoped derivation tokens. |
| `CELLOS_REQUIRE_TELEMETRY_DECLARED` | unset (auto in hardened) | Require `telemetry.declared` in every spec. |
| `CELL_OS_USE_NOOP_SINK` | unset | Force the noop event sink (debugging). |
| `CELL_OS_JSONL_EVENTS` | unset | Mirror every event to a JSONL file (path-valued). |
| `CELL_OS_REQUIRE_JETSTREAM` | unset (auto in hardened) | Refuse to start if the JetStream sink can't connect. |
| `CELLOS_RUN_ID` | `run-local-001` (`validate` for `--validate`) | Stamped onto every event in this run. |
| `CELLOS_EVENT_SIGNING_*` | unset | Configure the I5 per-event signing wrapper. See `src/event_signing.rs`. |
Hardened profile defaults are documented at `src/composition.rs:253`.
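
As a rough picture of how a profile can imply the `REQUIRE_*` flags marked "auto in hardened", consider the sketch below. It is not the logic in `src/composition.rs`. The names `Profile`, `parse_profile`, `is_truthy`, and `require_flag` are hypothetical, the set of truthy strings is an assumption, rejecting unknown profile names is an assumption, and so is the rule that the hardened profile cannot be overridden by an explicit falsy value.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Profile {
    Hardened,
    Permissive,
}

/// Parses `CELLOS_DEPLOYMENT_PROFILE`; unset or empty means `hardened`.
fn parse_profile(raw: Option<&str>) -> Result<Profile, String> {
    match raw.map(str::trim) {
        None | Some("") | Some("hardened") => Ok(Profile::Hardened),
        Some("permissive") => Ok(Profile::Permissive),
        // Assumption: unknown names are rejected rather than defaulted.
        Some(other) => Err(format!("unknown CELLOS_DEPLOYMENT_PROFILE: {other}")),
    }
}

/// Assumption: these strings count as truthy, case-insensitively.
fn is_truthy(raw: &str) -> bool {
    matches!(
        raw.trim().to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Effective value of a `REQUIRE_*` flag given the profile and the raw env value.
fn require_flag(profile: Profile, explicit: Option<&str>) -> bool {
    // Assumption: hardened forces the flag on; permissive honours the env value.
    profile == Profile::Hardened || explicit.map_or(false, is_truthy)
}
```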
## Examples
Run a spec under the stub backend with JSONL output:
```bash
CELLOS_CELL_BACKEND=stub \
CELLOS_BROKER=env \
CELL_OS_USE_NOOP_SINK=1 \
CELL_OS_JSONL_EVENTS=/tmp/events.jsonl \
CELLOS_DEPLOYMENT_PROFILE=permissive \
cargo run -p cellos-supervisor --bin cellos-supervisor -- /path/to/spec.yaml
```
Validate a spec without running it:
```bash
cargo run -p cellos-supervisor --bin cellos-supervisor -- --validate /path/to/spec.yaml
```
Project the resulting JSONL into a state snapshot:
```bash
cargo run -p cellos-projector --bin cellos-projector -- /tmp/events.jsonl --pretty
```
The argv shape of the `cellos-supervisor` binary is treated as a contract;
see `crates/cellos-supervisor/tests/argv_invariants.rs` for the typed
guarantees.
## Testing
```bash
cargo test -p cellos-supervisor
```
`crates/cellos-supervisor/tests/` carries ~80 integration tests covering
break-attempt scenarios (DNS rebinding, DNSSEC downgrade, kernel UDP 443,
SNI mismatch H2c, H2 CONTINUATION flood, post-isolation residue,
capability drop/grant), event invariants (lifecycle reason typed,
terminal state naming, manifest_failed, forced terminal exit code),
secret hygiene (zeroization, debug redaction, per-backend delivery
defaults), and trust-keyset behaviour.
Several tests are gated:
- `firecracker_e2e.rs` requires a local Firecracker binary and root
capabilities on the host; it is `#[ignore]`d in default runs.
- The break-attempt tests that use nftables / nflog require Linux and
run only on `target_os = "linux"`.
To run everything including ignored tests:
```bash
cargo test -p cellos-supervisor -- --include-ignored
```
The supervisor crate has its own preflight skill, `CellPreflight`, that
catches common Docker / Firecracker build mistakes before a 12-minute
rebuild.
## Related crates
- [`cellos-core`](../cellos-core/README.md) — owns every spec/event type
this crate consumes and emits.
- [`cellos-server`](../cellos-server/README.md) — projects the
CloudEvents this supervisor publishes; never imports this crate.
- [`cellos-host-cellos`](../cellos-host-cellos), `cellos-host-firecracker`,
`cellos-host-stub` — the three `CellBackend` implementations selected
by `CELLOS_CELL_BACKEND`.
- [`cellos-sink-jetstream`](../cellos-sink-jetstream),
`cellos-sink-jsonl`, `cellos-sink-redact`, `cellos-sink-dlq` — the
`EventSink` implementations layered behind the primary sink.
- [`cellos-broker-env`](../cellos-broker-env), `cellos-broker-file`,
`cellos-broker-oidc`, `cellos-broker-vault` — `SecretBroker`
implementations.
- [`cellos-export-local`](../cellos-export-local),
`cellos-export-http`, `cellos-export-s3` — `ExportSink`
implementations.
- [`cellos-host-telemetry`](../cellos-host-telemetry) — Path B host-side
probes + F3b vsock receiver, re-exported as `host_telemetry`.
- [`cellos-cortex`](../cellos-cortex/README.md) — only crate allowed to
import this one *across* the Cortex boundary.
## ADRs
- [ADR-0001](../../docs/adr/0001-rust-nats-jetstream-proprietary-host.md)
— NATS JetStream as the proprietary host substrate.
- [ADR-0004](../../docs/adr/0004-tls-termination-fronting-trust-boundary.md)
— TLS termination + fronting trust boundary (sni_proxy).
- [ADR-0005](../../docs/adr/0005-tls-termination-design.md) — typed
authority enforcement (the four variants admitted here).
- [ADR-0006](../../docs/adr/0006-in-vm-observability-runner-evidence.md)
— in-VM observability evidence + the F3b host-side vsock receiver.
- [ADR-0007](../../docs/adr/0007-rbac-secret-ref-admission.md) —
authorization policy + secret-ref admission.
- [ADR-0009](../../docs/adr/0009-cortex-doctrine-to-cellos-authority-mapping.md)
— doctrine → authority mapping, consumed by callers via `cellos-cortex`.
- [ADR-0010](../../docs/adr/0010-formation-authority-invariant.md) —
formation authority invariant.