cellos-supervisor
The CellOS runner. Takes an ExecutionCellDocument, brings up a cell on a
host backend, enforces network/identity/secret policy, and emits a typed
CloudEvent for every step it takes.
What it is
cellos-supervisor is primarily a binary (src/main.rs) — the composition
root that wires a host backend, a secret broker, an event sink, and one or
more export sinks into a single [Supervisor] (src/supervisor.rs:38)
that owns the cell lifecycle:
network_scope → trust_plane_observability (optional) → secrets →
lifecycle.started → run → export → destroy → revoke → lifecycle.destroyed
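The phase order above is fixed except for the optional trust-plane step. The following is a minimal sketch of that ordering. The Phase enum and phase_plan helper are hypothetical and are not types exported by this crate.

```rust
/// Hypothetical model of the supervisor's lifecycle phases, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    NetworkScope,
    TrustPlaneObservability,
    Secrets,
    LifecycleStarted,
    Run,
    Export,
    Destroy,
    Revoke,
    LifecycleDestroyed,
}

/// Build the ordered phase plan; only trust-plane observability is optional.
fn phase_plan(trust_plane_observability: bool) -> Vec<Phase> {
    let mut plan = vec![Phase::NetworkScope];
    if trust_plane_observability {
        plan.push(Phase::TrustPlaneObservability);
    }
    plan.extend([
        Phase::Secrets,
        Phase::LifecycleStarted,
        Phase::Run,
        Phase::Export,
        Phase::Destroy,
        Phase::Revoke,
        Phase::LifecycleDestroyed,
    ]);
    plan
}

fn main() {
    for phase in phase_plan(true) {
        println!("{phase:?}");
    }
}
```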
It sits across L4–L6 of the layer model: above the L1 ports defined by
cellos-core, above the L2/L3 host-backend abstractions (cellos-host-*),
sinks (cellos-sink-*), brokers (cellos-broker-*), and exports
(cellos-export-*), and below cellos-server and cellos-cortex, which
treat the supervisor as an opaque emitter of CloudEvents. With ~15,000 lines
of Rust across two dozen modules, this is the biggest crate in the
workspace. This README is a map, not a manual — for any module you care
about, the linked source is the source of truth.
What cellos-supervisor deliberately does NOT do:
- It does not own the spec vocabulary. Every type (ExecutionCellDocument, AuthorityBundle, PolicyPackSpec, RunSpec) comes from cellos-core.
- It does not own the wire format of CloudEvents. Every event is built by a *_data_v1 / cloud_event_v1_* function from cellos-core::events.
- It does not host an HTTP server. Operators talk to cellos-server; the supervisor only publishes events.
- It does not depend on Cortex. The Cortex bridge lives in cellos-cortex and links to the supervisor, never the other way around (ADR-0008).
- It does not run on a cellos-lite build with local LLM dependencies — the inference broker port is implemented by external crates.
Public API surface
The crate is a binary; lib.rs (src/lib.rs:1) exposes only the modules
integration tests need:
- dns_proxy — the SEAM-1 / L2-04 DNS proxy. Forward-only UDP, enforces dnsAuthority.hostnameAllowlist at the protocol layer, emits one dns_query CloudEvent per query. src/dns_proxy/mod.rs:1. Submodules: parser, upstream, spawn, dnssec.
- sni_proxy — TLS SNI / H2 :authority evaluator. src/sni_proxy/.
- resolver_refresh — host-controlled DNS resolver refresh with TTL watchdog and drift CloudEvent emission. src/resolver_refresh/.
- ebpf_flow — scaffolding for the eBPF/nflog per-flow listener (Phase 2). src/ebpf_flow.rs:1.
- event_signing — Ed25519/HMAC per-event signing wrapper. The public posture mirror (event_signing_posture::SigningConfig, src/lib.rs:62) is doc(hidden) and exists only so integration tests can pin the Zeroizing<Vec<u8>> invariant on key material.
- linux_cgroup — cgroup v2 helpers (target_os = "linux").
- nft_counters — nftables counter readers for network-enforcement events.
- per_flow — real-time per-flow nflog listener.
- destruction_evidence — terminal-state evidence builder for the lifecycle.destroyed event.
- spec_input — read ExecutionCellDocument from stdin/file + spec-hash computation. src/spec_input.rs.
- trust_keyset_load — load SignedTrustKeysetEnvelope from CELLOS_TRUST_KEYSET_PATH and verify against the operator-supplied keyring.
- host_telemetry (re-export of cellos_host_telemetry) — F1a Path B host-side probes + F3b vsock listener (per ADR-0006 §5.4).
- __a2_02::resolve_caller_identity — doc(hidden) mirror of composition::resolve_caller_identity for an integration test that pins the CELLOS_CALLER_IDENTITY → trim → "default" fallback contract.
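The caller-identity contract pinned by the __a2_02 mirror is small enough to show in full. This is a minimal sketch of the behaviour described above: trim the value, then fall back to "default" if the result is empty. The function signature is illustrative and does not match the crate's actual signature. The function takes the raw value as an argument instead of reading the process environment, so it can be tested in isolation.

```rust
use std::env;

/// Illustrative sketch: trim the raw CELLOS_CALLER_IDENTITY value and fall
/// back to "default" when it is unset, empty, or whitespace-only.
fn resolve_caller_identity(raw: Option<&str>) -> String {
    match raw.map(str::trim) {
        Some(id) if !id.is_empty() => id.to_string(),
        _ => "default".to_string(),
    }
}

fn main() {
    // In a real run the raw value comes from the process environment.
    let raw = env::var("CELLOS_CALLER_IDENTITY").ok();
    println!("caller identity: {}", resolve_caller_identity(raw.as_deref()));
}
```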
Everything else lives in the binary's private module tree:
- composition — env-driven wiring (host backend, broker, sinks, exports, policy/authz/authority/trust keys). src/composition.rs.
- supervisor — the lifecycle orchestrator. src/supervisor.rs:38.
- supervisor_helpers — helpers for cell-spec destructuring, target resolution, redaction. src/supervisor_helpers.rs.
- network_policy, linux_isolation, linux_mount, linux_net, linux_seccomp — the Linux dataplane and isolation primitives.
- runtime_secret, proxy_activation, command_runner, trust_plane_observability — the rest of the run-phase machinery.
Architecture / how it works
┌───────────────────────────────────────────────────────┐
│ main.rs: │
│ - parse argv / stdin spec │
│ - validate_execution_cell_document │
│ - verify_authority_derivation │
│ - enforce_derivation_scope_policy │
│ - build_supervisor (composition.rs) │
│ - emit_startup_banner │
│ - Supervisor::run(spec) │
└─────────────────────────┬─────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ Supervisor (src/supervisor.rs) │
│ │
│ host: Arc<dyn CellBackend> ← cellos-host-* │
│ broker: Arc<dyn SecretBroker> ← cellos-broker-* │
│ event_sink: Arc<dyn EventSink> ← cellos-sink-* │
│ jsonl_sink: Option<Arc<dyn EventSink>>← cellos-sink-jsonl │
│ exports: HashMap<String, Arc<dyn ExportSink>> │
│ policy_pack: Option<PolicyPackSpec> (admission gate) │
│ authz_policy:Option<AuthorizationPolicy>(RBAC, ADR-0007) │
│ authority_keys, trust_verify_keys:Arc<HashMap<...>> │
│ │
│ lifecycle: │
│ network_scope │
│ trust_plane_observability │
│ secrets (mount / env / runtime lease) │
│ lifecycle.started │
│ run → command_completed / observability.* │
│ export → export_completed_v2 / export_failed_v2 │
│ destroy → revoke → lifecycle.destroyed (always) │
└──────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────┐
│ CloudEventV1 → primary event_sink → │
│ optional jsonl_sink (mirror) │
│ → JetStream / file / DLQ / redacted │
└───────────────────────────────────────────┘
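The bottom box describes a fan-out. Every event goes to the primary event_sink and, if a JSONL sink is configured, is mirrored there as well. The following is a minimal sketch that makes several assumptions:

- The EventSink trait shown here is simplified and hypothetical, with a synchronous emit method and a string-only Event type. The real traits live in cellos-core.
- A failure on the primary sink is returned to the caller.
- A mirror failure is only logged.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for CloudEventV1: an event type plus a JSON payload.
#[derive(Debug, Clone)]
struct Event {
    ty: String,
    data: String,
}

/// Hypothetical simplified sink trait (not the crate's real EventSink).
trait EventSink: Send + Sync {
    fn emit(&self, event: &Event) -> Result<(), String>;
}

/// In-memory sink standing in for JetStream or a JSONL file.
#[derive(Default)]
struct MemorySink {
    seen: Mutex<Vec<String>>,
}

impl EventSink for MemorySink {
    fn emit(&self, event: &Event) -> Result<(), String> {
        self.seen
            .lock()
            .unwrap()
            .push(format!("{} {}", event.ty, event.data));
        Ok(())
    }
}

/// Emit to the primary sink, then mirror to the optional JSONL sink.
/// Assumption: primary errors propagate; mirror errors are only logged.
fn publish(
    primary: &Arc<dyn EventSink>,
    mirror: Option<&Arc<dyn EventSink>>,
    event: &Event,
) -> Result<(), String> {
    primary.emit(event)?;
    if let Some(m) = mirror {
        if let Err(e) = m.emit(event) {
            eprintln!("jsonl mirror failed: {e}");
        }
    }
    Ok(())
}

fn main() {
    let primary = Arc::new(MemorySink::default());
    let mirror = Arc::new(MemorySink::default());
    let p: Arc<dyn EventSink> = primary.clone();
    let m: Arc<dyn EventSink> = mirror.clone();
    let ev = Event { ty: "lifecycle.started".into(), data: "{}".into() };
    publish(&p, Some(&m), &ev).unwrap();
    println!("{:?}", mirror.seen.lock().unwrap());
}
```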
Teardown semantics: destroy and revoke_for_cell are called
unconditionally even when a phase error has already been captured
(src/supervisor.rs:5). Residue classes (ResidueClass, LifecycleResidueClass)
on the terminal event let the projector and operator audit what was left
behind.
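The teardown rule works like this: record the first phase error, but still call destroy and revoke before the terminal event is produced. The following is a minimal illustration, not the crate's Supervisor::run. Plain functions and closures stand in for the host backend and broker calls. The destroy-then-revoke order follows the lifecycle listed at the top of this README.

```rust
/// Hypothetical outcome carried on the terminal lifecycle.destroyed event.
#[derive(Debug, PartialEq)]
struct Terminal {
    phase_error: Option<String>,
    destroy_ok: bool,
    revoke_ok: bool,
}

/// Run the fallible phases in order and stop at the first error, then run
/// teardown unconditionally. Teardown failures are recorded rather than
/// propagated, so a terminal outcome is always produced.
fn run_cell(
    phases: &[(&str, fn() -> Result<(), String>)],
    destroy: impl FnOnce() -> Result<(), String>,
    revoke: impl FnOnce() -> Result<(), String>,
) -> Terminal {
    let mut phase_error = None;
    for (name, phase) in phases {
        if let Err(e) = phase() {
            phase_error = Some(format!("{name}: {e}"));
            break;
        }
    }
    // Always tear down, even after a captured phase error.
    let destroy_ok = destroy().is_ok();
    let revoke_ok = revoke().is_ok();
    Terminal { phase_error, destroy_ok, revoke_ok }
}

fn main() {
    let phases: [(&str, fn() -> Result<(), String>); 2] = [
        ("secrets", || Ok(())),
        ("run", || Err("exit status 1".to_string())),
    ];
    let terminal = run_cell(&phases, || Ok(()), || Ok(()));
    println!("{terminal:?}");
}
```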
The DNS proxy (src/dns_proxy/) is forward-only UDP. It parses each query,
evaluates dnsAuthority.hostnameAllowlist (each entry is either a literal
hostname or a wildcard with a single leading *.), and either forwards the query verbatim to the declared upstream or builds
a REFUSED response. Every observed query is emitted as a CloudEvent built
by cellos_core::cloud_event_v1_dns_query. SERVFAIL is synthesized on
upstream timeout so workloads see deterministic failure
(src/dns_proxy/mod.rs:23).
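The following sketch shows the allowlist and response decision. It is an illustration, not the code in src/dns_proxy/, and it makes these assumptions:

- Hostnames are compared case-insensitively, after stripping one trailing root dot.
- A "*.example.com" entry matches any name with at least one label before ".example.com", but does not match the apex "example.com" itself.

The RCODE values are the standard ones from RFC 1035: SERVFAIL = 2 and REFUSED = 5.

```rust
/// Standard DNS response codes (RFC 1035, section 4.1.1).
const RCODE_SERVFAIL: u8 = 2;
const RCODE_REFUSED: u8 = 5;

/// Final outcome for one query.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Forwarded verbatim to the declared upstream, and the answer relayed.
    Forward,
    /// Answered locally with the given RCODE.
    Synthesize(u8),
}

/// Lowercase a name and drop a single trailing root dot.
fn normalize(name: &str) -> String {
    name.strip_suffix('.').unwrap_or(name).to_ascii_lowercase()
}

/// Match a query name against one entry: a literal hostname or a single
/// leading "*." wildcard (assumed here not to match the apex).
fn matches(entry: &str, qname: &str) -> bool {
    let entry = normalize(entry);
    let qname = normalize(qname);
    match entry.strip_prefix("*.") {
        Some(suffix) => qname
            .strip_suffix(suffix)
            .map_or(false, |head| head.len() > 1 && head.ends_with('.')),
        None => entry == qname,
    }
}

/// Refuse names outside the allowlist; synthesize SERVFAIL when an allowed
/// query's upstream timed out; otherwise the upstream answer is relayed.
fn decide(allowlist: &[&str], qname: &str, upstream_timed_out: bool) -> Verdict {
    if !allowlist.iter().any(|e| matches(e, qname)) {
        Verdict::Synthesize(RCODE_REFUSED)
    } else if upstream_timed_out {
        Verdict::Synthesize(RCODE_SERVFAIL)
    } else {
        Verdict::Forward
    }
}

fn main() {
    let allow = ["api.example.com", "*.internal.example"];
    for q in ["api.example.com.", "db.internal.example", "evil.test"] {
        println!("{q} -> {:?}", decide(&allow, q, false));
    }
}
```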
Configuration
The supervisor is configured almost entirely through environment variables.
This is a representative slice (run cargo doc --open -p cellos-supervisor
for the full list):
| Env var | Default | Effect |
|---|---|---|
| CELLOS_STRICT_CONFIG | unset | When truthy, refuse to start if any env var fell back to a default. Useful in CI. src/composition.rs:96. |
| CELLOS_CALLER_IDENTITY | "default" | RBAC subject used by authz_policy. Empty/whitespace → "default". src/composition.rs:148. |
| CELLOS_CELL_BACKEND | host-cellos | Selects host backend: cellctl, firecracker, or stub. src/composition.rs:157. |
| CELLOS_BROKER | env | Secret broker: env, file, oidc, vault. src/composition.rs:170. |
| CELLOS_EXPORT_DIR | unset | When set, mount the local-FS export sink. src/composition.rs:183. |
| CELLOS_EXPORT_HTTP_BASE_URL | unset | When set, mount the HTTP export sink. |
| CELLOS_DEPLOYMENT_PROFILE | hardened | Deployment profile (hardened / portable). Hardened auto-sets several REQUIRE_* flags; portable relaxes Darwin/dev-host isolation checks. src/composition.rs:231. |
| CELLOS_POLICY_PACK_PATH | unset | Path to the policy-pack JSON; loaded into Supervisor.policy_pack. |
| CELLOS_AUTHZ_POLICY_PATH | unset | Path to the authorization-policy JSON (ADR-0007). |
| CELLOS_AUTHORITY_KEYS_PATH | unset (required in hardened) | Operator-supplied role → Ed25519 verifying-key map. |
| CELLOS_TRUST_VERIFY_KEYS_PATH | unset | Trust-keyset signer kid → verifying-key map. |
| CELLOS_TRUST_KEYSET_PATH | unset | Signed trust-keyset envelope. |
| CELLOS_REQUIRE_AUTHORITY_DERIVATION | unset (auto in hardened) | Reject specs without a derivation token. |
| CELLOS_REQUIRE_SCOPED_DERIVATION_TOKENS | unset (auto in hardened) | Refuse non-scoped derivation tokens. |
| CELLOS_REQUIRE_TELEMETRY_DECLARED | unset (auto in hardened) | Require telemetry.declared in every spec. |
| CELL_OS_USE_NOOP_SINK | unset | Force the noop event sink (debugging). |
| CELL_OS_JSONL_EVENTS | unset | Mirror every event to a JSONL file (path-valued). |
| CELL_OS_REQUIRE_JETSTREAM | unset (auto in hardened) | Refuse to start if the JetStream sink can't connect. |
| CELLOS_RUN_ID | run-local-001 (validate for --validate) | Stamped onto every event in this run. |
| CELLOS_EVENT_SIGNING_* | unset | Configure the I5 per-event signing wrapper. See src/event_signing.rs. |
Hardened profile defaults are documented at src/composition.rs:253.
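CELLOS_DEPLOYMENT_PROFILE, the "auto in hardened" flags, and CELLOS_STRICT_CONFIG interact in a specific way, which can be sketched as a lookup. An explicit value wins. Otherwise the flag takes the profile default, and the fallback is recorded so that strict mode can refuse to start.

This is a hypothetical illustration, not the crate's implementation. It reads from a HashMap instead of the process environment, and it assumes both a truthiness rule ("1", "true", or "yes") and the precedence rule (an explicit value beats the profile default). src/composition.rs remains the source of truth.

```rust
use std::collections::HashMap;

/// Deployment profiles named in the configuration table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Profile {
    Hardened,
    Portable,
}

/// Flags the table marks "unset (auto in hardened)".
const AUTO_IN_HARDENED: [&str; 4] = [
    "CELLOS_REQUIRE_AUTHORITY_DERIVATION",
    "CELLOS_REQUIRE_SCOPED_DERIVATION_TOKENS",
    "CELLOS_REQUIRE_TELEMETRY_DECLARED",
    "CELL_OS_REQUIRE_JETSTREAM",
];

/// Assumed truthiness rule: "1", "true" or "yes", case-insensitive.
fn truthy(v: &str) -> bool {
    matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true" | "yes")
}

/// Parse the profile; an unset variable falls back to hardened and is recorded.
fn parse_profile(
    vars: &HashMap<String, String>,
    fallbacks: &mut Vec<String>,
) -> Result<Profile, String> {
    match vars.get("CELLOS_DEPLOYMENT_PROFILE").map(String::as_str) {
        None => {
            fallbacks.push("CELLOS_DEPLOYMENT_PROFILE".to_string());
            Ok(Profile::Hardened)
        }
        Some("hardened") => Ok(Profile::Hardened),
        Some("portable") => Ok(Profile::Portable),
        Some(other) => Err(format!("unknown deployment profile: {other}")),
    }
}

/// Explicit value wins; otherwise the flag is on only under the hardened
/// profile, and the fallback is recorded.
fn resolve_flag(
    vars: &HashMap<String, String>,
    name: &str,
    profile: Profile,
    fallbacks: &mut Vec<String>,
) -> bool {
    match vars.get(name) {
        Some(v) => truthy(v),
        None => {
            fallbacks.push(name.to_string());
            profile == Profile::Hardened
        }
    }
}

/// Strict mode: refuse to start if anything fell back to a default.
fn check_strict(vars: &HashMap<String, String>, fallbacks: &[String]) -> Result<(), String> {
    let strict = vars.get("CELLOS_STRICT_CONFIG").map_or(false, |v| truthy(v));
    if strict && !fallbacks.is_empty() {
        return Err(format!("strict config: defaulted {}", fallbacks.join(", ")));
    }
    Ok(())
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("CELLOS_DEPLOYMENT_PROFILE".to_string(), "portable".to_string());
    vars.insert("CELLOS_REQUIRE_TELEMETRY_DECLARED".to_string(), "1".to_string());
    let mut fallbacks = Vec::new();
    let profile = parse_profile(&vars, &mut fallbacks).unwrap();
    for name in AUTO_IN_HARDENED {
        println!("{name} = {}", resolve_flag(&vars, name, profile, &mut fallbacks));
    }
    println!("{:?}", check_strict(&vars, &fallbacks));
}
```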
Examples
Run a spec under the stub backend with JSONL output:
CELLOS_CELL_BACKEND=stub \
CELLOS_BROKER=env \
CELL_OS_USE_NOOP_SINK=1 \
CELL_OS_JSONL_EVENTS=/tmp/events.jsonl \
CELLOS_DEPLOYMENT_PROFILE=portable \
Validate a spec without running it:
Project the resulting JSONL into a state snapshot:
The cellos-supervisor binary's argv shape is a tested contract; see
crates/cellos-supervisor/tests/argv_invariants.rs for the typed
guarantees.
Testing
crates/cellos-supervisor/tests/ carries ~80 integration tests covering
break-attempt scenarios (DNS rebinding, DNSSEC downgrade, kernel UDP 443,
SNI mismatch H2c, H2 CONTINUATION flood, post-isolation residue,
capability drop/grant), event invariants (lifecycle reason typed,
terminal state naming, manifest_failed, forced terminal exit code),
secret hygiene (zeroization, debug redaction, per-backend delivery
defaults), and trust-keyset behaviour.
Several tests are gated:
- firecracker_e2e.rs requires a local Firecracker binary and host root caps; it is #[ignore]d in default runs.
- The break-attempt tests that use nftables / nflog require Linux and run only on target_os = "linux".
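Both gating mechanisms are standard Rust test attributes. The following generic sketch shows how they combine. The module name, test names, and helper are hypothetical and are not copied from tests/.

```rust
/// Hypothetical helper: whether nftables/nflog break-attempt tests can run here.
fn nft_tests_supported() -> bool {
    cfg!(target_os = "linux")
}

#[cfg(test)]
mod gated_tests {
    // Compiled only on Linux, so `cargo test` on other hosts never sees it.
    #[cfg(target_os = "linux")]
    #[test]
    fn nflog_break_attempt() {
        assert!(super::nft_tests_supported());
    }

    // Always compiled, but skipped unless ignored tests are requested,
    // e.g. `cargo test -- --ignored` or `cargo test -- --include-ignored`.
    #[test]
    #[ignore = "requires a local Firecracker binary and host root caps"]
    fn firecracker_e2e() {
        // A real test would boot a Firecracker microVM here.
    }
}

fn main() {
    println!("nft tests supported: {}", nft_tests_supported());
}
```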
To run everything including ignored tests:
The supervisor crate has its own preflight skill, CellPreflight, that
catches common Docker / Firecracker build mistakes before they cost a
12-minute rebuild.
Related crates
- cellos-core — owns every spec/event type this crate consumes and emits.
- cellos-server — projects the CloudEvents this supervisor publishes; never imports this crate.
- cellos-host-cellos, cellos-host-firecracker, cellos-host-stub — the three CellBackend implementations selected by CELLOS_CELL_BACKEND.
- cellos-sink-jetstream, cellos-sink-jsonl, cellos-sink-redact, cellos-sink-dlq — the EventSink implementations layered behind the primary sink.
- cellos-broker-env, cellos-broker-file, cellos-broker-oidc, cellos-broker-vault — SecretBroker implementations.
- cellos-export-local, cellos-export-http, cellos-export-s3 — ExportSink implementations.
- cellos-host-telemetry — Path B host-side probes + F3b vsock receiver, re-exported as host_telemetry.
- cellos-cortex — only crate allowed to import this one across the Cortex boundary.
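The broker crates listed above are chosen through CELLOS_BROKER. This selection step is a plain mapping from a string to a variant. The following is a minimal sketch that uses the four values and the env default from the configuration table. The BrokerKind enum and broker_kind function are hypothetical. Rejecting unknown values instead of silently defaulting is also an assumption.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the cellos-broker-* crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BrokerKind {
    Env,
    File,
    Oidc,
    Vault,
}

impl FromStr for BrokerKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "env" => Ok(Self::Env),
            "file" => Ok(Self::File),
            "oidc" => Ok(Self::Oidc),
            "vault" => Ok(Self::Vault),
            other => Err(format!("unknown CELLOS_BROKER value: {other:?}")),
        }
    }
}

/// Resolve the broker kind; an unset variable falls back to `env`.
fn broker_kind(raw: Option<&str>) -> Result<BrokerKind, String> {
    raw.map_or(Ok(BrokerKind::Env), |s| s.parse())
}

fn main() {
    let raw = std::env::var("CELLOS_BROKER").ok();
    println!("{:?}", broker_kind(raw.as_deref()));
}
```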
ADRs
- ADR-0001 — NATS JetStream as the proprietary host substrate.
- ADR-0004 — TLS termination + fronting trust boundary (sni_proxy).
- ADR-0005 — typed authority enforcement (the four variants admitted here).
- ADR-0006 — in-VM observability evidence + the F3b host-side vsock receiver.
- ADR-0007 — authorization policy + secret-ref admission.
- ADR-0009 — doctrine → authority mapping, consumed by callers via cellos-cortex.
- ADR-0010 — formation authority invariant.