cellos-host-telemetry
The host side of the in-VM observability pipeline — per-cell vsock UDS listener, host-stamping, agent-silenced detection, host-side probes, and supervisor-side per-event signing of outbound CloudEvents.
What it is
cellos-host-telemetry is the supervisor-side receiver for the
observability path defined in
ADR-0006 — in-VM observability runner evidence.
It is not the in-guest agent — that one lives in
cellos-telemetry — and the split is deliberate
(see "Channel-authenticity model" below).
The crate does five jobs:
- Bind a per-cell UDS at <vsock_uds_base>_9001 before the VM boots, mirroring the _9000 exit-code listener pattern in cellos-host-firecracker::listen_for_exit_code (crates/cellos-host-firecracker/src/lib.rs:1669). Bind-before-boot is what makes the channel-authenticity primitive hold: the host trusts WHICH UDS path the bytes arrived on, not anything in the payload.
- Decode CBOR-framed guest declarations (src/listener.rs:199) with a content_version major-version gate (ADR-0006 §12); unknown majors are rejected with TelemetryError::UnsupportedVersion.
- Host-stamp every frame (src/host_stamp.rs:28) so cell_id, run_id, host_received_at, and spec_signature_hash come exclusively from the supervisor. The GuestDeclaration type has no attribution fields at all, so a compromised guest cannot forge them (src/lib.rs:96-108).
- Detect agent silence (src/keepalive.rs): a KeepAlive tracker the listener pokes per-frame, a fire-once AgentSilencedTrigger, and a watch_for_silence watcher loop that fires cell.observability.guest.agent_silenced exactly once per run.
- Sign outbound envelopes (src/sign_outbound.rs): supervisor-side per-event signing using the canonical-JSON payload from cellos_core::trust_keys::canonical_event_signing_payload. Three modes, driven by env vars: Off, Hmac (HMAC-SHA256, FIPS 198), and Ed25519 (via ed25519-dalek).
It additionally exposes a small host-probe surface (src/probes/,
Slot F1a / Path B) — HostProbe trait + four built-in probes
(fc_metrics, cgroup, nftables, tap_link) that watch the cell
from outside the guest using primitives the supervisor already controls
(VMM /metrics endpoint, cgroup-v2 files, nftables counters, TAP link
state). These are the cross-witness for the guest's Path A
declarations (src/probes/mod.rs:1-54).
L2 sits in the layer model at "host runtime / isolation"; this crate is the host half of the observability spine that runs next to L2 and feeds the L3 supervisor's event sink.
What it deliberately does not do:
- It does not accept signing primitives the guest could use. Per ADR-0006 §5, the guest holds no key material; the supervisor signs. This is different from cellos-telemetry, which is forbidden from depending on a signer (src/lib.rs:18-23).
- It does not trust ANY attribution field the guest may have stuffed into the wire payload. Unknown CBOR keys are silently dropped at decode (src/listener.rs:181-217, structurally enforced by WireFrame having only the five permitted fields).
- It does not write to disk, NATS, or any other sink directly. The outputs are values (StampedDeclaration, AgentSilencedSignal, CloudEventV1, SignedEventEnvelopeV1); the supervisor projects them onto the configured EventSink.
Public API surface
Top-level (src/lib.rs):
| Item | Where |
|---|---|
| pub const VSOCK_TELEMETRY_PORT: u32 = 9001 | src/lib.rs:61 |
| pub const WIRE_CONTENT_VERSION_MAJOR: u16 = 1 | src/lib.rs:68 |
| pub enum TelemetryError { Bind, Wire, UnsupportedVersion } | src/lib.rs:72 |
| pub struct GuestDeclaration { probe_source, guest_pid, guest_comm, guest_monotonic_ns } | src/lib.rs:97 |
| pub struct HostStamp { cell_id, run_id, host_received_at, spec_signature_hash } | src/lib.rs:113 |
| pub struct HostProbeReading { probe, value_json, timestamp_ms } | src/lib.rs:134 |
| pub struct StampedDeclaration { cell_id, run_id, host_received_at, spec_signature_hash, probe_source, guest_pid, guest_comm, guest_monotonic_ns } | src/lib.rs:150 |
Listener (src/listener.rs):
| Item | Where |
|---|---|
| pub const MAX_FRAME_BYTES: u32 = 64 * 1024 | src/listener.rs:50 |
| pub struct VsockUdsListener | src/listener.rs:60 |
| VsockUdsListener::bind_for_cell(&Path), socket_path(), accept() | src/listener.rs:65-112 |
| pub struct VsockUdsStream | src/listener.rs:115 |
| VsockUdsStream::recv_stamped(&HostStamp, &KeepAlive) | src/listener.rs:128 |
| VsockUdsStream::recv_guest_declaration() | src/listener.rs:153 |
| pub fn decode_frame(body: &[u8]) -> Result<GuestDeclaration, TelemetryError> | src/listener.rs:199 |
Host-stamping (src/host_stamp.rs):
| Item | Where |
|---|---|
| pub fn stamp(GuestDeclaration, HostStamp) -> StampedDeclaration | src/host_stamp.rs:28 |
| pub fn stamp_now(GuestDeclaration, cell_id, run_id, spec_signature_hash) | src/host_stamp.rs:51 |
Keep-alive (src/keepalive.rs):
| Item | Where |
|---|---|
| pub const DEFAULT_KEEPALIVE_WINDOW: Duration = Duration::from_secs(10) | src/keepalive.rs:36 |
| pub struct KeepAlive (new, window, notify_frame, is_silenced) | src/keepalive.rs:41-79 |
| pub struct AgentSilencedSignal | src/keepalive.rs:85 |
| AgentSilencedSignal::CLOUDEVENT_TYPE = "dev.cellos.events.cell.observability.v1.guest.agent_silenced" | src/keepalive.rs:105 |
| pub struct AgentSilencedTrigger (new, fire, has_fired) | src/keepalive.rs:138-179 |
| pub async fn watch_for_silence(KeepAlive, Arc<AgentSilencedTrigger>, poll_interval: Duration) | src/keepalive.rs:190 |
Signing (src/sign_outbound.rs):
| Item | Where |
|---|---|
| pub struct StampedDeclaration { guest, host } (F4b local) | src/sign_outbound.rs:73 |
| pub const PROVENANCE_DECLARED: &str = "declared" | src/sign_outbound.rs:93 |
| pub const ENV_SIGN_ALG / ENV_SIGN_KID / ENV_SIGN_HMAC_KEY / ENV_SIGN_ED25519_SK | src/sign_outbound.rs:98-104 |
| pub enum SignOutboundError { InvalidConfig, Signer, Serialize } | src/sign_outbound.rs:108 |
| pub enum SigningKeyMaterial { Off, Hmac { kid, key }, Ed25519 { kid, signing_key } } | src/sign_outbound.rs:127 |
| pub enum SigningOutcome | src/sign_outbound.rs:283 |
| pub fn host_stamped_envelope(...) | src/sign_outbound.rs:306 |
| pub fn sign_host_stamped_envelope(...) | src/sign_outbound.rs:354 |
| pub fn host_stamp_and_sign(...) | src/sign_outbound.rs:374 |
Probes (src/probes/):
| Item | Where |
|---|---|
| pub const HOST_PROBE_EVENT_SOURCE = "cellos-host-telemetry/probes" | src/probes/mod.rs:78 |
| pub const HOST_PROBE_EVENT_TYPE_PREFIX = "dev.cellos.events.cell.observability.host.v1" | src/probes/mod.rs:84 |
| pub struct ProbeContext { cell_id, run_id, spec_signature_hash } | src/probes/mod.rs:93 |
| pub struct ProbeReading | src/probes/mod.rs:125 |
| pub enum ProbeError | src/probes/mod.rs:158 |
| pub trait HostProbe | src/probes/mod.rs:191 |
| pub fn build_host_probe_envelope(...) -> CloudEventV1 | src/probes/mod.rs:233 |
| pub fn emit_reading(...) | re-exported from src/lib.rs:50 |
| Built-in probes: FcMetricsProbe, CgroupProbe, NftablesProbe, TapLinkProbe | src/probes/{fc_metrics,cgroup,nftables,tap_link}.rs |
#![deny(unsafe_code)] and #![warn(missing_docs)] are enforced at
crate root (src/lib.rs:36-37).
Architecture / how it works
Channel-authenticity model (ADR-0006 §5). Firecracker proxies the
guest's vsock connection to a per-cell UDS at
<vsock_uds_base>_<port>. The supervisor passes the same base path
through to this crate (so the telemetry UDS sits alongside the exit-code
UDS in one socket dir, making teardown a single remove_dir_all), and
the listener binds at <base>_9001 before the workload's first
instruction. The host then trusts whichever stream the bytes arrived on
— payloads carry no attribution. This is structural: GuestDeclaration
literally has no cell_id / run_id / spec_signature_hash fields, and
a compile-time witness test in src/lib.rs:186-206 keeps it that way.
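The path convention above can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: the helper name uds_path_for_port and the constant name VSOCK_EXIT_CODE_PORT are hypothetical, and only the port values 9000 and 9001 and the `<base>_<port>` shape come from the text.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Telemetry port, matching the crate constant listed in the API tables.
const VSOCK_TELEMETRY_PORT: u32 = 9001;
/// Exit-code port described in the text (constant name is hypothetical).
const VSOCK_EXIT_CODE_PORT: u32 = 9000;

/// Hypothetical helper: derive `<base>_<port>` by appending to the raw
/// OS string, so non-UTF-8 base paths are not mangled.
fn uds_path_for_port(base: &Path, port: u32) -> PathBuf {
    let mut raw: OsString = base.as_os_str().to_os_string();
    raw.push(format!("_{port}"));
    PathBuf::from(raw)
}
```

Because both sockets are derived from the same base, they share a parent directory, which is what lets teardown remove one directory.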
Wire format (ADR-0006 §12). Each frame is u32 LE length + CBOR map
body. The map must contain content_version: u16 (high byte = major,
low byte = minor) — only WIRE_CONTENT_VERSION_MAJOR = 1 is accepted
today, and unknown majors are rejected with
TelemetryError::UnsupportedVersion (src/listener.rs:203-209). The
remaining permitted fields are probe_source, guest_pid,
guest_comm, guest_monotonic_ns; anything else is dropped at decode
because WireFrame (src/listener.rs:186-193) has nowhere to put it.
Frames whose body length is 0 or exceeds 64 KiB (MAX_FRAME_BYTES) are
rejected with TelemetryError::Wire.
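The framing and version gate described above can be sketched without a CBOR dependency. The following is an illustrative sketch only: FrameError, split_frame, and check_content_version are hypothetical names (the crate reports these conditions through TelemetryError), and CBOR decoding of the body is left out.

```rust
/// Upper bound on a frame body, matching MAX_FRAME_BYTES in the API tables.
const MAX_FRAME_BYTES: u32 = 64 * 1024;
/// The only major version accepted today.
const WIRE_CONTENT_VERSION_MAJOR: u16 = 1;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum FrameError {
    Truncated,
    BadLength(u32),
    UnsupportedMajor(u8),
}

/// Split one `u32 LE length + body` frame off the front of `buf`.
/// Returns the body and whatever bytes follow it.
fn split_frame(buf: &[u8]) -> Result<(&[u8], &[u8]), FrameError> {
    if buf.len() < 4 {
        return Err(FrameError::Truncated);
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    // Zero-length and oversized bodies are rejected before any allocation.
    if len == 0 || len > MAX_FRAME_BYTES {
        return Err(FrameError::BadLength(len));
    }
    let end = 4 + len as usize;
    if buf.len() < end {
        return Err(FrameError::Truncated);
    }
    Ok((&buf[4..end], &buf[end..]))
}

/// Gate on the major version in the high byte; return the minor on success.
fn check_content_version(content_version: u16) -> Result<u8, FrameError> {
    let major = (content_version >> 8) as u8;
    let minor = (content_version & 0xff) as u8;
    if u16::from(major) != WIRE_CONTENT_VERSION_MAJOR {
        return Err(FrameError::UnsupportedMajor(major));
    }
    Ok(minor)
}
```

Checking the length before reading the body means a hostile guest cannot make the host buffer more than 64 KiB per frame.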
Per-frame stamping. VsockUdsStream::recv_stamped re-stamps
host_received_at on every frame so the receive instant is accurate per
event, while cell_id / run_id / spec_signature_hash are per-run
and constant (src/listener.rs:138-148). The keep-alive tracker is
poked on every successful receive (src/listener.rs:137).
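A rough sketch of the per-frame stamping split follows. It mirrors the field names from the API tables, but the field types, the RunAttribution helper, and the stamp_at function are assumptions made for illustration; the crate's own stamp and stamp_now take a HostStamp or separate arguments.

```rust
use std::time::SystemTime;

/// Guest-declared fields only. Nothing here names a cell or a run.
/// Field types are assumed for this sketch.
#[derive(Debug, Clone)]
struct GuestDeclaration {
    probe_source: String,
    guest_pid: u32,
    guest_comm: String,
    guest_monotonic_ns: u64,
}

/// Per-run attribution held by the supervisor (hypothetical helper type).
#[derive(Debug, Clone)]
struct RunAttribution {
    cell_id: String,
    run_id: String,
    spec_signature_hash: String,
}

#[derive(Debug)]
struct StampedDeclaration {
    cell_id: String,
    run_id: String,
    host_received_at: SystemTime,
    spec_signature_hash: String,
    probe_source: String,
    guest_pid: u32,
    guest_comm: String,
    guest_monotonic_ns: u64,
}

/// Combine one decoded frame with host-owned attribution.
fn stamp_at(
    guest: GuestDeclaration,
    run: &RunAttribution,
    host_received_at: SystemTime,
) -> StampedDeclaration {
    StampedDeclaration {
        cell_id: run.cell_id.clone(),
        run_id: run.run_id.clone(),
        host_received_at,
        spec_signature_hash: run.spec_signature_hash.clone(),
        probe_source: guest.probe_source,
        guest_pid: guest.guest_pid,
        guest_comm: guest.guest_comm,
        guest_monotonic_ns: guest.guest_monotonic_ns,
    }
}

/// Per-frame variant: run fields are reused, the receive instant is fresh.
fn stamp_now(guest: GuestDeclaration, run: &RunAttribution) -> StampedDeclaration {
    stamp_at(guest, run, SystemTime::now())
}
```

Since the guest type has no attribution fields, the only way those fields reach the output is through the supervisor-owned value.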
Silence is observable. watch_for_silence polls the KeepAlive
tracker at a configurable interval; when last_frame_at.elapsed() >= window it calls AgentSilencedTrigger::fire exactly once. The
trigger's fire-once invariant is structural — a second fire call
returns None (src/keepalive.rs:161-173). The watcher reads
elapsed under the same critical section that checks the silence
condition, so a frame landing at the boundary cannot yield an
elapsed_ms < keepalive_window_ms in the emitted signal
(src/keepalive.rs:202-211).
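The two invariants above (elapsed is read under the same lock as the silence check, and the trigger fires at most once) can be illustrated with std primitives. This is a sketch under assumptions, not the crate's implementation: FireOnce and silenced_for are hypothetical names, and the real watcher is async.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Simplified stand-in for the keep-alive tracker.
struct KeepAlive {
    window: Duration,
    last_frame_at: Mutex<Instant>,
}

impl KeepAlive {
    fn new(window: Duration) -> Self {
        Self { window, last_frame_at: Mutex::new(Instant::now()) }
    }

    /// Called by the listener on every successfully received frame.
    fn notify_frame(&self) {
        *self.last_frame_at.lock().unwrap() = Instant::now();
    }

    /// Check and read share one critical section, so a returned value
    /// is always >= the window.
    fn silenced_for(&self) -> Option<Duration> {
        let last = self.last_frame_at.lock().unwrap();
        let elapsed = last.elapsed();
        (elapsed >= self.window).then_some(elapsed)
    }
}

/// Hypothetical fire-once latch: only the first caller gets `Some`.
struct FireOnce {
    fired: AtomicBool,
}

impl FireOnce {
    fn new() -> Self {
        Self { fired: AtomicBool::new(false) }
    }

    fn fire(&self, elapsed: Duration) -> Option<Duration> {
        // `swap` returns the previous value, so exactly one caller sees `false`.
        if self.fired.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(elapsed)
        }
    }

    fn has_fired(&self) -> bool {
        self.fired.load(Ordering::Acquire)
    }
}
```

An atomic swap is one simple way to get the fire-once property without holding a lock across the emit path.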
Signing (F4b). The supervisor calls
host_stamp_and_sign (src/sign_outbound.rs:374) with a
StampedDeclaration and a SigningKeyMaterial, and gets back either
SigningOutcome::Unsigned(CloudEventV1) or
SigningOutcome::Signed(SignedEventEnvelopeV1). The signing payload is the
canonical-JSON serialization of the FULL CloudEventV1 per
cellos_core::trust_keys::canonical_event_signing_payload; mutating
any field after the signature is computed makes verification fail (I5 /
O2 doctrine). HMAC keys land in a verifier's hmac_keys map; Ed25519
public keys land in verifying_keys. SigningKeyMaterial implements a
custom Debug that NEVER prints key bytes — only the variant and the
kid — so an accidental {:?} in a tracing span cannot leak key material
(src/sign_outbound.rs:147-150).
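A redacting Debug implementation of the kind described above might look like the following. The variant and field names come from the API table; the field types are assumptions (the real Ed25519 variant presumably holds an ed25519-dalek key type), and the exact output format of the crate's impl is not known here.

```rust
use std::fmt;

/// Stand-in for the key-material enum; field types are assumed.
#[allow(dead_code)] // key bytes are intentionally never read in this sketch
enum SigningKeyMaterial {
    Off,
    Hmac { kid: String, key: Vec<u8> },
    Ed25519 { kid: String, signing_key: [u8; 32] },
}

impl fmt::Debug for SigningKeyMaterial {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Off => f.write_str("Off"),
            // Print the variant and kid only; `..` marks the omitted secret.
            Self::Hmac { kid, .. } => f
                .debug_struct("Hmac")
                .field("kid", kid)
                .finish_non_exhaustive(),
            Self::Ed25519 { kid, .. } => f
                .debug_struct("Ed25519")
                .field("kid", kid)
                .finish_non_exhaustive(),
        }
    }
}
```

Writing the impl by hand, instead of deriving it, means adding a new secret field cannot silently start printing it.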
Host probes (Slot F1a / Path B). Each HostProbe implementation
reads concrete inputs from the host (a file path, an endpoint, a table
name, an interface name) and returns a ProbeReading carrying
probe_source, inputs, and output blocks — D12 doctrine ("every
probe-emitted event is attributable to a probe and its concrete
inputs/outputs"). build_host_probe_envelope projects readings into
cellos_core::CloudEventV1s with source = "cellos-host-telemetry/probes"
and type dev.cellos.events.cell.observability.host.v1.<probe>.
Per-probe read() implementations are #[cfg(target_os = "linux")]; on
other targets they return ProbeError::PlatformUnsupported.
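To make the probe shape concrete, here is a minimal sketch of a trait with a Linux-only read and a fallback on other targets. The trait methods, the FileProbe type, the ProbeReading field types, and event_type_for are all hypothetical; only the event-type prefix, the PlatformUnsupported variant, and the cfg gating come from the text.

```rust
use std::path::PathBuf;

const HOST_PROBE_EVENT_TYPE_PREFIX: &str = "dev.cellos.events.cell.observability.host.v1";

/// Simplified error type; the Io variant is an assumption.
#[derive(Debug, PartialEq)]
enum ProbeError {
    PlatformUnsupported,
    Io(String),
}

/// A reading names its probe, its concrete inputs, and its output.
#[allow(dead_code)]
struct ProbeReading {
    probe_source: &'static str,
    inputs: Vec<(String, String)>,
    output: String,
}

/// Assumed trait shape for this sketch.
trait HostProbe {
    fn name(&self) -> &'static str;
    fn read(&self) -> Result<ProbeReading, ProbeError>;
}

/// Hypothetical probe that reads one host file, e.g. a cgroup-v2 stat file.
#[allow(dead_code)]
struct FileProbe {
    path: PathBuf,
}

impl HostProbe for FileProbe {
    fn name(&self) -> &'static str {
        "cgroup"
    }

    #[cfg(target_os = "linux")]
    fn read(&self) -> Result<ProbeReading, ProbeError> {
        let output = std::fs::read_to_string(&self.path)
            .map_err(|e| ProbeError::Io(e.to_string()))?;
        Ok(ProbeReading {
            probe_source: self.name(),
            inputs: vec![("path".to_string(), self.path.display().to_string())],
            output,
        })
    }

    #[cfg(not(target_os = "linux"))]
    fn read(&self) -> Result<ProbeReading, ProbeError> {
        Err(ProbeError::PlatformUnsupported)
    }
}

/// Build the `<prefix>.<probe>` event type for a probe.
fn event_type_for(probe: &dyn HostProbe) -> String {
    format!("{}.{}", HOST_PROBE_EVENT_TYPE_PREFIX, probe.name())
}
```

Recording the input path alongside the output is what keeps a reading attributable to the exact host resource it came from.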
Configuration
| Env var | Default | Effect |
|---|---|---|
| CELLOS_HOST_TELEMETRY_SIGN_ALG | off | One of off, hmac-sha256, ed25519. Anything else is rejected. (src/sign_outbound.rs:98) |
| CELLOS_HOST_TELEMETRY_SIGN_KID | required when alg != off | Signer kid embedded in SignedEventEnvelopeV1. (src/sign_outbound.rs:100) |
| CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY | required when alg=hmac-sha256 | Base64url (no-pad, padding tolerated) of the shared HMAC key. (src/sign_outbound.rs:102) |
| CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK | required when alg=ed25519 | Base64url of the 32-byte Ed25519 seed. (src/sign_outbound.rs:104) |
Setting both *_HMAC_KEY and *_ED25519_SK is rejected — the operator
must pick one to avoid ambiguity over which key signed the stream
(src/sign_outbound.rs:40-43).
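The validation rules in the table can be sketched as a pure function over the four values. This is an illustration only: SignMode, ConfigError, and select_mode are hypothetical names, key decoding is omitted, and the order of checks (in particular, rejecting both keys even when the algorithm is off) is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum SignMode {
    Off,
    HmacSha256,
    Ed25519,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ConfigError {
    UnknownAlg(String),
    MissingVar(&'static str),
    AmbiguousKeys,
}

/// Decide the signing mode from the four env values (None = unset).
fn select_mode(
    alg: Option<&str>,
    kid: Option<&str>,
    hmac_key: Option<&str>,
    ed25519_sk: Option<&str>,
) -> Result<SignMode, ConfigError> {
    // Both keys set is ambiguous; this sketch rejects it up front.
    if hmac_key.is_some() && ed25519_sk.is_some() {
        return Err(ConfigError::AmbiguousKeys);
    }
    match alg.unwrap_or("off") {
        "off" => Ok(SignMode::Off),
        "hmac-sha256" => {
            kid.ok_or(ConfigError::MissingVar("CELLOS_HOST_TELEMETRY_SIGN_KID"))?;
            hmac_key.ok_or(ConfigError::MissingVar("CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY"))?;
            Ok(SignMode::HmacSha256)
        }
        "ed25519" => {
            kid.ok_or(ConfigError::MissingVar("CELLOS_HOST_TELEMETRY_SIGN_KID"))?;
            ed25519_sk.ok_or(ConfigError::MissingVar("CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK"))?;
            Ok(SignMode::Ed25519)
        }
        other => Err(ConfigError::UnknownAlg(other.to_string())),
    }
}

/// Thin wrapper that reads the process environment.
fn select_mode_from_env() -> Result<SignMode, ConfigError> {
    let get = |name: &str| std::env::var(name).ok();
    let alg = get("CELLOS_HOST_TELEMETRY_SIGN_ALG");
    let kid = get("CELLOS_HOST_TELEMETRY_SIGN_KID");
    let hmac = get("CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY");
    let ed = get("CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK");
    select_mode(alg.as_deref(), kid.as_deref(), hmac.as_deref(), ed.as_deref())
}
```

Keeping the decision logic separate from the environment read makes each rejection path easy to unit-test.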
There is no env var for the listener itself; the UDS base path is
chosen by the calling backend (cellos-host-firecracker) and passed to
VsockUdsListener::bind_for_cell.
Examples
Listener + host-stamping:
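```rust
use std::path::Path;
// Module paths are assumed from the file layout; adjust to the crate's re-exports.
use cellos_host_telemetry::keepalive::{KeepAlive, DEFAULT_KEEPALIVE_WINDOW};
use cellos_host_telemetry::{listener::VsockUdsListener, HostStamp};

let listener = VsockUdsListener::bind_for_cell(Path::new(/* per-cell vsock UDS base path */))?;
let mut stream = listener.accept().await?;
let stamp = HostStamp { /* cell_id, run_id, host_received_at, spec_signature_hash */ };
// Constructor argument assumed to be the keep-alive window.
let keepalive = KeepAlive::new(DEFAULT_KEEPALIVE_WINDOW);
while let Some(stamped) = stream.recv_stamped(&stamp, &keepalive).await? {
    // `stamped` is a StampedDeclaration; hand it to the supervisor's event path.
}
// Error type below is assumed.
# Ok::<(), Box<dyn std::error::Error>>(())
```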
Silence watcher:
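```rust
use std::sync::Arc;
use std::time::Duration;
// Module path is assumed from the file layout.
use cellos_host_telemetry::keepalive::{watch_for_silence, AgentSilencedTrigger, KeepAlive};

let keepalive = KeepAlive::new(/* keep-alive window */);
let trigger = Arc::new(AgentSilencedTrigger::new(/* constructor arguments */));
let signal = watch_for_silence(keepalive, trigger, /* poll_interval: Duration */).await;
// signal: Option<AgentSilencedSignal>, Some(_) on first silence detection
```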
Signing:
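```rust
// Module path is assumed from the file layout.
use cellos_host_telemetry::sign_outbound::{host_stamp_and_sign, SigningKeyMaterial, SigningOutcome};

// `from_env` is assumed to be an associated function on SigningKeyMaterial.
let key_material = SigningKeyMaterial::from_env()?;
let outcome: SigningOutcome = host_stamp_and_sign(/* StampedDeclaration, key material, ... */)?;
match outcome {
    SigningOutcome::Unsigned(event) => { /* CloudEventV1: forward as-is */ }
    SigningOutcome::Signed(envelope) => { /* SignedEventEnvelopeV1: forward the envelope */ }
}
// Error type below is assumed.
# Ok::<(), Box<dyn std::error::Error>>(())
```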
Testing
cargo test -p cellos-host-telemetry
In-source unit tests cover:
- Frame decode: unknown major rejected, known major accepted with unknown fields dropped, garbage rejected, UDS bind path, end-to-end round trip with attribution overwrite (src/listener.rs:244-345).
- Host stamping: host-stamped attribution overrides, host_received_at preserved when supplied explicitly (src/host_stamp.rs:68-111).
- Keep-alive: fresh tracker is not silenced, post-window is silenced, notify_frame resets timer, trigger fires exactly once, watcher fires after window (src/keepalive.rs:215-277).
- Constants pinned: vsock port 9001, wire major 1 (src/lib.rs:171-184).
Integration tests under tests/:
| File | Scope |
|---|---|
| smoke.rs | Host-probe envelope builder, emit_reading against a no-op sink, wire-version / port constants. |
| kill_the_agent.rs | Agent-silenced detection end-to-end. |
No #[ignore] gating — the crate's tests all run on every CI leg
because the listener works against an in-process Unix Domain Socket
(no vsock required).
Related crates
- cellos-telemetry: the in-guest agent. Forbidden from depending on a signer; emits unsigned declarations over vsock.
- cellos-host-firecracker: pairs the _9000 exit-code UDS with the _9001 telemetry UDS in the same per-cell socket dir.
- cellos-core: CloudEventV1, SignedEventEnvelopeV1, canonical_event_signing_payload, EventSink, the trust-key sign/verify primitives.
- cellos-supervisor: owns the receiver loop and the EventSink the projected envelopes flow into.
ADRs
- ADR-0006 (In-VM observability runner evidence): the doctrine reference for the entire host-receiver design. Specifically, §5 (channel-authenticity), §6 (host-stamped attribution is non-negotiable), §7 (agent_silenced is an observable signal), and §12 (wire-schema versioning) are all enforced in this crate.