cellos-host-telemetry 0.5.1

Host-side telemetry receiver for CellOS — vsock listener that host-stamps and signs CloudEvents emitted by the in-guest cellos-telemetry agent.

cellos-host-telemetry

The host side of the in-VM observability pipeline — per-cell vsock UDS listener, host-stamping, agent-silenced detection, host-side probes, and supervisor-side per-event signing of outbound CloudEvents.

What it is

cellos-host-telemetry is the supervisor-side receiver for the observability path defined in ADR-0006 — in-VM observability runner evidence. It is not the in-guest agent — that one lives in cellos-telemetry — and the split is deliberate (see "Channel-authenticity model" below).

The crate does five jobs:

  1. Bind a per-cell UDS at <vsock_uds_base>_9001 before the VM boots, mirroring the _9000 exit-code listener pattern in cellos-host-firecracker::listen_for_exit_code (crates/cellos-host-firecracker/src/lib.rs:1669). Bind-before-boot is what makes the channel-authenticity primitive hold: the host trusts WHICH UDS path the bytes arrived on, not anything in the payload.
  2. Decode CBOR-framed guest declarations (src/listener.rs:199) with a content_version major-version gate (ADR-0006 §12) — unknown majors are rejected with TelemetryError::UnsupportedVersion.
  3. Host-stamp every frame (src/host_stamp.rs:28) so cell_id, run_id, host_received_at, and spec_signature_hash come exclusively from the supervisor. The GuestDeclaration type has no attribution fields at all, so a compromised guest cannot forge them (src/lib.rs:96-108).
  4. Detect agent silence (src/keepalive.rs) — a KeepAlive tracker the listener pokes per-frame, a fire-once AgentSilencedTrigger, and a watch_for_silence watcher loop that fires cell.observability.guest.agent_silenced exactly once per run.
  5. Sign outbound envelopes (src/sign_outbound.rs): supervisor-side per-event signing using the canonical-JSON payload from cellos_core::trust_keys::canonical_event_signing_payload. There are three modes, selected by env vars: Off, Hmac (HMAC-SHA256, FIPS 198), and Ed25519 (via ed25519-dalek).

It additionally exposes a small host-probe surface (src/probes/, Slot F1a / Path B) — HostProbe trait + four built-in probes (fc_metrics, cgroup, nftables, tap_link) that watch the cell from outside the guest using primitives the supervisor already controls (VMM /metrics endpoint, cgroup-v2 files, nftables counters, TAP link state). These are the cross-witness for the guest's Path A declarations (src/probes/mod.rs:1-54).

In the layer model, L2 is "host runtime / isolation"; this crate is the host half of the observability spine that runs alongside L2 and feeds the L3 supervisor's event sink.

What it deliberately does not do:

  • It does not accept signing primitives the guest could use. Per ADR-0006 §5, the guest holds no key material — the supervisor signs. This is different from cellos-telemetry, which is forbidden from depending on a signer (src/lib.rs:18-23).
  • It does not trust ANY attribution field the guest may have stuffed into the wire payload. Unknown CBOR keys are silently dropped at decode (src/listener.rs:181-217, structurally enforced by WireFrame having only the five permitted fields).
  • It does not write to disk, NATS, or any other sink directly. The outputs are values (StampedDeclaration, AgentSilencedSignal, CloudEventV1, SignedEventEnvelopeV1); the supervisor projects them onto the configured EventSink.

Public API surface

Top-level (src/lib.rs):

Item | Where
pub const VSOCK_TELEMETRY_PORT: u32 = 9001 | src/lib.rs:61
pub const WIRE_CONTENT_VERSION_MAJOR: u16 = 1 | src/lib.rs:68
pub enum TelemetryError { Bind, Wire, UnsupportedVersion } | src/lib.rs:72
pub struct GuestDeclaration { probe_source, guest_pid, guest_comm, guest_monotonic_ns } | src/lib.rs:97
pub struct HostStamp { cell_id, run_id, host_received_at, spec_signature_hash } | src/lib.rs:113
pub struct HostProbeReading { probe, value_json, timestamp_ms } | src/lib.rs:134
pub struct StampedDeclaration { cell_id, run_id, host_received_at, spec_signature_hash, probe_source, guest_pid, guest_comm, guest_monotonic_ns } | src/lib.rs:150

Listener (src/listener.rs):

Item | Where
pub const MAX_FRAME_BYTES: u32 = 64 * 1024 | src/listener.rs:50
pub struct VsockUdsListener | src/listener.rs:60
VsockUdsListener::bind_for_cell(&Path), socket_path(), accept() | src/listener.rs:65-112
pub struct VsockUdsStream | src/listener.rs:115
VsockUdsStream::recv_stamped(&HostStamp, &KeepAlive) | src/listener.rs:128
VsockUdsStream::recv_guest_declaration() | src/listener.rs:153
pub fn decode_frame(body: &[u8]) -> Result<GuestDeclaration, TelemetryError> | src/listener.rs:199

Host-stamping (src/host_stamp.rs):

Item | Where
pub fn stamp(GuestDeclaration, HostStamp) -> StampedDeclaration | src/host_stamp.rs:28
pub fn stamp_now(GuestDeclaration, cell_id, run_id, spec_signature_hash) | src/host_stamp.rs:51

Keep-alive (src/keepalive.rs):

Item | Where
pub const DEFAULT_KEEPALIVE_WINDOW: Duration = Duration::from_secs(10) | src/keepalive.rs:36
pub struct KeepAlive (new, window, notify_frame, is_silenced) | src/keepalive.rs:41-79
pub struct AgentSilencedSignal | src/keepalive.rs:85
AgentSilencedSignal::CLOUDEVENT_TYPE = "dev.cellos.events.cell.observability.v1.guest.agent_silenced" | src/keepalive.rs:105
pub struct AgentSilencedTrigger (new, fire, has_fired) | src/keepalive.rs:138-179
pub async fn watch_for_silence(KeepAlive, Arc<AgentSilencedTrigger>, poll_interval: Duration) | src/keepalive.rs:190

Signing (src/sign_outbound.rs):

Item | Where
pub struct StampedDeclaration { guest, host } (F4b local) | src/sign_outbound.rs:73
pub const PROVENANCE_DECLARED: &str = "declared" | src/sign_outbound.rs:93
pub const ENV_SIGN_ALG / ENV_SIGN_KID / ENV_SIGN_HMAC_KEY / ENV_SIGN_ED25519_SK | src/sign_outbound.rs:98-104
pub enum SignOutboundError { InvalidConfig, Signer, Serialize } | src/sign_outbound.rs:108
pub enum SigningKeyMaterial { Off, Hmac { kid, key }, Ed25519 { kid, signing_key } } | src/sign_outbound.rs:127
pub enum SigningOutcome | src/sign_outbound.rs:283
pub fn host_stamped_envelope(...) | src/sign_outbound.rs:306
pub fn sign_host_stamped_envelope(...) | src/sign_outbound.rs:354
pub fn host_stamp_and_sign(...) | src/sign_outbound.rs:374

Probes (src/probes/):

Item | Where
pub const HOST_PROBE_EVENT_SOURCE = "cellos-host-telemetry/probes" | src/probes/mod.rs:78
pub const HOST_PROBE_EVENT_TYPE_PREFIX = "dev.cellos.events.cell.observability.host.v1" | src/probes/mod.rs:84
pub struct ProbeContext { cell_id, run_id, spec_signature_hash } | src/probes/mod.rs:93
pub struct ProbeReading | src/probes/mod.rs:125
pub enum ProbeError | src/probes/mod.rs:158
pub trait HostProbe | src/probes/mod.rs:191
pub fn build_host_probe_envelope(...) -> CloudEventV1 | src/probes/mod.rs:233
pub fn emit_reading(...) | re-exported from src/lib.rs:50
Built-in probes: FcMetricsProbe, CgroupProbe, NftablesProbe, TapLinkProbe | src/probes/{fc_metrics,cgroup,nftables,tap_link}.rs

#![deny(unsafe_code)] and #![warn(missing_docs)] are enforced at crate root (src/lib.rs:36-37).

Architecture / how it works

Channel-authenticity model (ADR-0006 §5). Firecracker proxies the guest's vsock connection to a per-cell UDS at <vsock_uds_base>_<port>. The supervisor passes the same base path through to this crate (so the telemetry UDS sits alongside the exit-code UDS in one socket dir, making teardown a single remove_dir_all), and the listener binds at <base>_9001 before the workload's first instruction. The host then trusts whichever stream the bytes arrived on — payloads carry no attribution. This is structural: GuestDeclaration literally has no cell_id / run_id / spec_signature_hash fields, and a compile-time witness test in src/lib.rs:186-206 keeps it that way.
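
The path convention can be sketched in a few lines. This is an illustration under stated assumptions, not the crate's implementation: uds_path_for_port and share_socket_dir are hypothetical helpers and the example base path is made up. Only the <base>_<port> shape and the port numbers 9000 and 9001 come from the text above.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Telemetry port, as documented for VSOCK_TELEMETRY_PORT.
pub const TELEMETRY_PORT: u32 = 9001;
/// Exit-code port used by the sibling listener.
pub const EXIT_CODE_PORT: u32 = 9000;

/// Hypothetical helper: derive `<base>_<port>` by appending to the raw
/// OS string, so non-UTF-8 paths are not mangled.
pub fn uds_path_for_port(base: &Path, port: u32) -> PathBuf {
    let mut raw: OsString = base.as_os_str().to_owned();
    raw.push(format!("_{}", port));
    PathBuf::from(raw)
}

/// Hypothetical check: both per-cell sockets live in the same directory,
/// which is what lets teardown be a single directory removal.
pub fn share_socket_dir(base: &Path) -> bool {
    uds_path_for_port(base, TELEMETRY_PORT).parent()
        == uds_path_for_port(base, EXIT_CODE_PORT).parent()
}
```

Because the suffix is appended to the file name rather than added as a new path component, the two sockets are siblings of each other under the base path's parent directory.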

Wire format (ADR-0006 §12). Each frame is u32 LE length + CBOR map body. The map must contain content_version: u16 (high byte = major, low byte = minor) — only WIRE_CONTENT_VERSION_MAJOR = 1 is accepted today, and unknown majors are rejected with TelemetryError::UnsupportedVersion (src/listener.rs:203-209). The remaining permitted fields are probe_source, guest_pid, guest_comm, guest_monotonic_ns; anything else is dropped at decode because WireFrame (src/listener.rs:186-193) has nowhere to put it. Frames whose body length is 0 or exceeds 64 KiB (MAX_FRAME_BYTES) are rejected with TelemetryError::Wire.
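
The framing and version gate described above can be sketched with the standard library alone. This is a rough illustration, not the crate's decoder: FrameError, split_frame, and check_content_version are hypothetical names standing in for TelemetryError and decode_frame, and the CBOR decoding of the body is left out entirely.

```rust
/// Largest accepted body, matching the documented 64 KiB cap.
pub const MAX_FRAME_BYTES: u32 = 64 * 1024;
/// The only major version accepted today.
pub const WIRE_CONTENT_VERSION_MAJOR: u16 = 1;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// Fewer bytes available than the header or declared length needs.
    Truncated,
    /// Declared body length is 0 or above MAX_FRAME_BYTES.
    BadLength(u32),
    /// The major half of content_version is not supported.
    UnsupportedVersion(u16),
}

/// Split one `u32 LE length + body` frame off the front of `buf`.
/// Returns the body and whatever bytes follow it.
pub fn split_frame(buf: &[u8]) -> Result<(&[u8], &[u8]), FrameError> {
    if buf.len() < 4 {
        return Err(FrameError::Truncated);
    }
    let (header, rest) = buf.split_at(4);
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    if len == 0 || len > MAX_FRAME_BYTES {
        return Err(FrameError::BadLength(len));
    }
    let len = len as usize;
    if rest.len() < len {
        return Err(FrameError::Truncated);
    }
    Ok(rest.split_at(len))
}

/// Gate on the major version: high byte is major, low byte is minor.
pub fn check_content_version(content_version: u16) -> Result<(), FrameError> {
    let major = content_version >> 8;
    if major == WIRE_CONTENT_VERSION_MAJOR {
        Ok(())
    } else {
        Err(FrameError::UnsupportedVersion(major))
    }
}
```

Checking the length before touching the body means an oversized declaration is refused without allocating for it, and gating only on the major byte lets minor revisions pass unchanged.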

Per-frame stamping. VsockUdsStream::recv_stamped re-stamps host_received_at on every frame so the receive instant is accurate per event, while cell_id / run_id / spec_signature_hash are per-run and constant (src/listener.rs:138-148). The keep-alive tracker is poked on every successful receive (src/listener.rs:137).
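
A minimal sketch of that split between per-run and per-frame values follows. The struct field names mirror the documented GuestDeclaration and StampedDeclaration, but the field types, the RunAttribution grouping, and the stamp_at function are assumptions made for illustration; the crate's own stamp and stamp_now may differ.

```rust
use std::time::SystemTime;

/// What the guest may declare. Note the absence of attribution fields.
#[derive(Debug, Clone, PartialEq)]
pub struct GuestDeclarationSketch {
    pub probe_source: String,
    pub guest_pid: u32,
    pub guest_comm: String,
    pub guest_monotonic_ns: u64,
}

/// Hypothetical grouping of the values that stay constant for a run.
#[derive(Debug, Clone, PartialEq)]
pub struct RunAttribution {
    pub cell_id: String,
    pub run_id: String,
    pub spec_signature_hash: String,
}

/// Flat output record: host-owned attribution plus the guest's claims.
#[derive(Debug, Clone, PartialEq)]
pub struct StampedSketch {
    pub cell_id: String,
    pub run_id: String,
    pub host_received_at: SystemTime,
    pub spec_signature_hash: String,
    pub probe_source: String,
    pub guest_pid: u32,
    pub guest_comm: String,
    pub guest_monotonic_ns: u64,
}

/// Attribution is copied from the supervisor's per-run values; only the
/// receive instant changes from frame to frame.
pub fn stamp_at(
    guest: GuestDeclarationSketch,
    run: &RunAttribution,
    received_at: SystemTime,
) -> StampedSketch {
    StampedSketch {
        cell_id: run.cell_id.clone(),
        run_id: run.run_id.clone(),
        host_received_at: received_at,
        spec_signature_hash: run.spec_signature_hash.clone(),
        probe_source: guest.probe_source,
        guest_pid: guest.guest_pid,
        guest_comm: guest.guest_comm,
        guest_monotonic_ns: guest.guest_monotonic_ns,
    }
}
```

Since the guest-side type has no attribution fields, there is nothing for the stamping step to overwrite or validate; it only adds.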

Silence is observable. watch_for_silence polls the KeepAlive tracker at a configurable interval; when last_frame_at.elapsed() >= window it calls AgentSilencedTrigger::fire exactly once. The trigger's fire-once invariant is structural — a second fire call returns None (src/keepalive.rs:161-173). The watcher reads elapsed under the same critical section that checks the silence condition, so a frame landing at the boundary cannot yield an elapsed_ms < keepalive_window_ms in the emitted signal (src/keepalive.rs:202-211).
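
The two invariants in this paragraph (elapsed is read in the same critical section as the silence check, and the trigger fires at most once) can be sketched as follows. KeepAliveSketch and FireOnce are hypothetical stand-ins written for this illustration; the real KeepAlive and AgentSilencedTrigger carry more state and may be built differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical tracker: remembers when the last frame arrived.
pub struct KeepAliveSketch {
    window: Duration,
    last_frame_at: Mutex<Instant>,
}

impl KeepAliveSketch {
    pub fn new(window: Duration) -> Self {
        Self { window, last_frame_at: Mutex::new(Instant::now()) }
    }

    /// Called by the listener for every received frame.
    pub fn notify_frame(&self) {
        *self.last_frame_at.lock().unwrap() = Instant::now();
    }

    /// Reads elapsed and compares it to the window under one lock, so a
    /// returned value is never smaller than the window.
    pub fn silence_elapsed(&self) -> Option<Duration> {
        let last = self.last_frame_at.lock().unwrap();
        let elapsed = last.elapsed();
        (elapsed >= self.window).then_some(elapsed)
    }
}

/// Hypothetical fire-once latch built on an atomic swap.
pub struct FireOnce {
    fired: AtomicBool,
}

impl FireOnce {
    pub fn new() -> Self {
        Self { fired: AtomicBool::new(false) }
    }

    /// Returns Some(elapsed_ms) for the first caller and None afterwards.
    pub fn fire(&self, elapsed: Duration) -> Option<u128> {
        if self.fired.swap(true, Ordering::SeqCst) {
            None
        } else {
            Some(elapsed.as_millis())
        }
    }

    pub fn has_fired(&self) -> bool {
        self.fired.load(Ordering::SeqCst)
    }
}
```

A watcher loop would call silence_elapsed at each poll and pass any Some value to fire. The atomic swap makes the latch safe even if two watchers race.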

Signing (F4b). The supervisor calls host_stamp_and_sign (src/sign_outbound.rs:374) with a StampedDeclaration and a SigningKeyMaterial, and gets back either SigningOutcome::Unsigned(CloudEventV1) or SigningOutcome::Signed(SignedEventEnvelopeV1). The signing payload is the canonical-JSON serialization of the full CloudEventV1, per cellos_core::trust_keys::canonical_event_signing_payload; mutating any field after the signature is computed makes verification fail (I5 / O2 doctrine). On the verifier side, HMAC keys land in the hmac_keys map and Ed25519 public keys land in verifying_keys. SigningKeyMaterial implements a custom Debug that never prints key bytes, only the variant and the kid, so an accidental {:?} in a tracing span cannot leak key material (src/sign_outbound.rs:147-150).
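
The redacting Debug pattern is easy to show in isolation. The sketch below is an assumption about how such an impl could look, not a copy of the crate's code: KeyMaterialSketch is a hypothetical type whose variants follow the documented SigningKeyMaterial, with a raw 32-byte seed standing in for the Ed25519 signing key.

```rust
use std::fmt;

/// Hypothetical mirror of the three signing modes.
pub enum KeyMaterialSketch {
    Off,
    Hmac { kid: String, key: Vec<u8> },
    Ed25519 { kid: String, seed: [u8; 32] },
}

/// Hand-written Debug: prints the variant and the kid, never the key bytes.
impl fmt::Debug for KeyMaterialSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Off => f.write_str("Off"),
            Self::Hmac { kid, .. } => f
                .debug_struct("Hmac")
                .field("kid", kid)
                .finish_non_exhaustive(),
            Self::Ed25519 { kid, .. } => f
                .debug_struct("Ed25519")
                .field("kid", kid)
                .finish_non_exhaustive(),
        }
    }
}
```

Writing the impl by hand, rather than deriving Debug, is what keeps the secret fields out of the output; finish_non_exhaustive adds a trailing ".." so a reader can tell fields were omitted on purpose.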

Host probes (Slot F1a / Path B). Each HostProbe implementation reads concrete inputs from the host (a file path, an endpoint, a table name, an interface name) and returns a ProbeReading carrying probe_source, inputs, and output blocks, following D12 doctrine ("every probe-emitted event is attributable to a probe and its concrete inputs/outputs"). build_host_probe_envelope projects readings into cellos_core::CloudEventV1 values with source = "cellos-host-telemetry/probes" and type dev.cellos.events.cell.observability.host.v1.<probe>. Per-probe read() implementations are #[cfg(target_os = "linux")]; on other targets they return ProbeError::PlatformUnsupported.
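
To make the probe shape concrete, here is a small sketch of a probe trait and the event-type naming. Only the two string constants come from the API table; HostProbeSketch, ReadingSketch, ProbeErrorSketch, FixedTapLink, and event_type_for are hypothetical, and the real HostProbe trait very likely has a different signature. The probe returns canned values so the example does not touch the host.

```rust
pub const HOST_PROBE_EVENT_SOURCE: &str = "cellos-host-telemetry/probes";
pub const HOST_PROBE_EVENT_TYPE_PREFIX: &str =
    "dev.cellos.events.cell.observability.host.v1";

/// Hypothetical reading: which probe, what it looked at, what it saw.
#[derive(Debug, PartialEq, Eq)]
pub struct ReadingSketch {
    pub probe_source: String,
    pub inputs: Vec<(String, String)>,
    pub outputs: Vec<(String, String)>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeErrorSketch {
    PlatformUnsupported,
}

/// Hypothetical probe trait, much smaller than a real one would be.
pub trait HostProbeSketch {
    fn name(&self) -> &'static str;
    fn read(&self) -> Result<ReadingSketch, ProbeErrorSketch>;
}

/// Stand-in for a TAP link probe; reports a fixed state instead of
/// querying the interface.
pub struct FixedTapLink {
    pub interface: String,
    pub operstate: String,
}

impl HostProbeSketch for FixedTapLink {
    fn name(&self) -> &'static str {
        "tap_link"
    }

    fn read(&self) -> Result<ReadingSketch, ProbeErrorSketch> {
        Ok(ReadingSketch {
            probe_source: self.name().to_string(),
            inputs: vec![("interface".to_string(), self.interface.clone())],
            outputs: vec![("operstate".to_string(), self.operstate.clone())],
        })
    }
}

/// Event type is the shared prefix plus the probe name.
pub fn event_type_for(probe: &dyn HostProbeSketch) -> String {
    format!("{}.{}", HOST_PROBE_EVENT_TYPE_PREFIX, probe.name())
}
```

Recording the inputs next to the outputs is what makes a reading attributable: a consumer can see which interface was inspected, not only the state that was reported.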

Configuration

Env var | Default | Effect
CELLOS_HOST_TELEMETRY_SIGN_ALG | off | One of off, hmac-sha256, ed25519. Anything else is rejected. (src/sign_outbound.rs:98)
CELLOS_HOST_TELEMETRY_SIGN_KID | required when alg != off | Signer kid embedded in SignedEventEnvelopeV1. (src/sign_outbound.rs:100)
CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY | required when alg=hmac-sha256 | Base64url (no-pad, padding tolerated) of the shared HMAC key. (src/sign_outbound.rs:102)
CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK | required when alg=ed25519 | Base64url of the 32-byte Ed25519 seed. (src/sign_outbound.rs:104)

Setting both *_HMAC_KEY and *_ED25519_SK is rejected — the operator must pick one to avoid ambiguity over which key signed the stream (src/sign_outbound.rs:40-43).
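
The validation rules in this section can be sketched as a pure function over a map of variables. This is an assumption-heavy illustration: select_alg, AlgSketch, and ConfigErrorSketch are hypothetical, the order of the checks is a guess, and whether the crate rejects two keys when the algorithm is off is not stated in this document. Key decoding is omitted.

```rust
use std::collections::HashMap;

pub const ENV_SIGN_ALG: &str = "CELLOS_HOST_TELEMETRY_SIGN_ALG";
pub const ENV_SIGN_KID: &str = "CELLOS_HOST_TELEMETRY_SIGN_KID";
pub const ENV_SIGN_HMAC_KEY: &str = "CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY";
pub const ENV_SIGN_ED25519_SK: &str = "CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AlgSketch {
    Off,
    HmacSha256,
    Ed25519,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigErrorSketch {
    UnknownAlg(String),
    MissingKid,
    MissingKey(&'static str),
    BothKeysSet,
}

/// Hypothetical validator. Takes a map instead of reading the process
/// environment so it can be exercised without side effects.
pub fn select_alg(env: &HashMap<&str, &str>) -> Result<AlgSketch, ConfigErrorSketch> {
    let has = |name: &str| env.get(name).map_or(false, |v| !v.is_empty());

    // Assumed to be checked first: two keys is ambiguous in any mode.
    if has(ENV_SIGN_HMAC_KEY) && has(ENV_SIGN_ED25519_SK) {
        return Err(ConfigErrorSketch::BothKeysSet);
    }
    let alg = match env.get(ENV_SIGN_ALG).copied().unwrap_or("off") {
        "off" => AlgSketch::Off,
        "hmac-sha256" => AlgSketch::HmacSha256,
        "ed25519" => AlgSketch::Ed25519,
        other => return Err(ConfigErrorSketch::UnknownAlg(other.to_string())),
    };
    if alg == AlgSketch::Off {
        return Ok(alg);
    }
    if !has(ENV_SIGN_KID) {
        return Err(ConfigErrorSketch::MissingKid);
    }
    match alg {
        AlgSketch::HmacSha256 if !has(ENV_SIGN_HMAC_KEY) => {
            Err(ConfigErrorSketch::MissingKey(ENV_SIGN_HMAC_KEY))
        }
        AlgSketch::Ed25519 if !has(ENV_SIGN_ED25519_SK) => {
            Err(ConfigErrorSketch::MissingKey(ENV_SIGN_ED25519_SK))
        }
        _ => Ok(alg),
    }
}
```

In real use the map would be populated from std::env::vars; keeping the validator separate from the environment read makes each rejection rule straightforward to test.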

There is no env var for the listener itself; the UDS base path is chosen by the calling backend (cellos-host-firecracker) and passed to VsockUdsListener::bind_for_cell.

Examples

Listener + host-stamping:

use std::path::Path;
use std::time::{Duration, SystemTime};
use cellos_host_telemetry::{
    keepalive::KeepAlive, listener::VsockUdsListener, HostStamp, TelemetryError,
};

async fn receive_loop() -> Result<(), TelemetryError> {
    // Bind before the VM boots; the argument is the per-cell UDS base path.
    let listener = VsockUdsListener::bind_for_cell(Path::new(
        "/tmp/cellos-vsock-cell-42.socket",
    ))?;
    let mut stream = listener.accept().await?;
    // Per-run attribution; recv_stamped re-stamps host_received_at on every frame.
    let stamp = HostStamp {
        cell_id: "cell-42".into(),
        run_id: "run-7".into(),
        host_received_at: SystemTime::now(),
        spec_signature_hash: "sha256:deadbeef".into(),
    };
    let keepalive = KeepAlive::new(Duration::from_secs(10));
    while let Some(stamped) = stream.recv_stamped(&stamp, &keepalive).await? {
        // stamped: StampedDeclaration with host-stamped attribution
        let _ = stamped;
    }
    Ok(())
}

Silence watcher:

use std::sync::Arc;
use std::time::Duration;
use cellos_host_telemetry::keepalive::{
    watch_for_silence, AgentSilencedSignal, AgentSilencedTrigger, KeepAlive,
};

async fn watch() -> Option<AgentSilencedSignal> {
    let keepalive = KeepAlive::new(Duration::from_secs(10));
    let trigger = Arc::new(AgentSilencedTrigger::new(
        "cell-42",
        "run-7",
        Duration::from_secs(10),
    ));
    // Polls every 250 ms; yields Some(_) on the first silence detection.
    watch_for_silence(keepalive, trigger, Duration::from_millis(250)).await
}

Signing:

use cellos_host_telemetry::{
    sign_outbound::{host_stamp_and_sign, SigningKeyMaterial, SigningOutcome},
    GuestDeclaration, HostStamp,
};

fn sign_one() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the CELLOS_HOST_TELEMETRY_SIGN_* variables described under Configuration.
    let key_material = SigningKeyMaterial::from_env()?;
    let outcome: SigningOutcome = host_stamp_and_sign(/* ...stamped declaration... */)?;
    match outcome {
        SigningOutcome::Unsigned(cloudevent) => { /* emit as-is */ }
        SigningOutcome::Signed(envelope)     => { /* emit wrapped */ }
    }
    Ok(())
}

Testing

cargo test -p cellos-host-telemetry

In-source unit tests cover:

  • Frame decode: unknown major rejected, known major accepted with unknown fields dropped, garbage rejected, UDS bind path, end-to-end round trip with attribution overwrite (src/listener.rs:244-345).
  • Host stamping: host-stamped attribution overrides, host_received_at preserved when supplied explicitly (src/host_stamp.rs:68-111).
  • Keep-alive: fresh tracker is not silenced, post-window is silenced, notify_frame resets timer, trigger fires exactly once, watcher fires after window (src/keepalive.rs:215-277).
  • Constants pinned: vsock port 9001, wire major 1 (src/lib.rs:171-184).

Integration tests under tests/:

File | Scope
smoke.rs | Host-probe envelope builder, emit_reading against a no-op sink, wire-version / port constants.
kill_the_agent.rs | Agent-silenced detection end-to-end.

No #[ignore] gating — the crate's tests all run on every CI leg because the listener works against an in-process Unix Domain Socket (no vsock required).

Related crates

  • cellos-telemetry — the in-guest agent. Forbidden from depending on a signer; emits unsigned declarations over vsock.
  • cellos-host-firecracker — pairs the _9000 exit-code UDS with the _9001 telemetry UDS in the same per-cell socket dir.
  • cellos-core provides CloudEventV1, SignedEventEnvelopeV1, canonical_event_signing_payload, EventSink, and the trust-key sign/verify primitives.
  • cellos-supervisor — owns the receiver loop and the EventSink the projected envelopes flow into.

ADRs

  • ADR-0006 — In-VM observability runner evidence — the doctrine reference for the entire host-receiver design. Specifically §5 (channel-authenticity), §6 (host-stamped attribution is non-negotiable), §7 (agent_silenced is an observable signal), and §12 (wire-schema versioning) are all enforced in this crate.