# cellos-host-telemetry
The host side of the in-VM observability pipeline — per-cell vsock UDS
listener, host-stamping, agent-silenced detection, host-side probes,
and supervisor-side per-event signing of outbound CloudEvents.
## What it is
`cellos-host-telemetry` is the supervisor-side receiver for the
observability path defined in
[ADR-0006 — in-VM observability runner evidence](../../docs/adr/0006-in-vm-observability-runner-evidence.md).
It is **not** the in-guest agent — that one lives in
[`cellos-telemetry`](../cellos-telemetry) — and the split is deliberate
(see "Channel-authenticity model" below).
The crate does five jobs:
1. **Bind a per-cell UDS** at `<vsock_uds_base>_9001` *before* the VM
boots, mirroring the `_9000` exit-code listener pattern in
`cellos-host-firecracker::listen_for_exit_code`
(`crates/cellos-host-firecracker/src/lib.rs:1669`). Bind-before-boot
is what makes the channel-authenticity primitive hold: the host
trusts WHICH UDS path the bytes arrived on, not anything in the
payload.
2. **Decode CBOR-framed guest declarations** (`src/listener.rs:199`)
with a `content_version` major-version gate
(ADR-0006 §12) — unknown majors are rejected with
`TelemetryError::UnsupportedVersion`.
3. **Host-stamp every frame** (`src/host_stamp.rs:28`) so `cell_id`,
`run_id`, `host_received_at`, and `spec_signature_hash` come
exclusively from the supervisor. The `GuestDeclaration` type has no
attribution fields at all, so a compromised guest cannot forge them
(`src/lib.rs:96-108`).
4. **Detect agent silence** (`src/keepalive.rs`) — a `KeepAlive` tracker
the listener pokes per-frame, a fire-once `AgentSilencedTrigger`, and
a `watch_for_silence` watcher loop that fires
`cell.observability.guest.agent_silenced` exactly once per run.
5. **Sign outbound envelopes** (`src/sign_outbound.rs`): supervisor-side
   per-event signing over the canonical-JSON payload from
   `cellos_core::trust_keys::canonical_event_signing_payload`. Three
   modes, selected by env vars: `Off`, `Hmac` (HMAC-SHA256, FIPS 198),
   and `Ed25519` (via `ed25519-dalek`).

It additionally exposes a small **host-probe** surface (`src/probes/`,
Slot F1a / Path B) — `HostProbe` trait + four built-in probes
(`fc_metrics`, `cgroup`, `nftables`, `tap_link`) that watch the cell
from outside the guest using primitives the supervisor already controls
(VMM `/metrics` endpoint, cgroup-v2 files, nftables counters, TAP link
state). These are the cross-witness for the guest's Path A
declarations (`src/probes/mod.rs:1-54`).

In the [layer model](../../LAYERS.md), L2 is "host runtime / isolation".
This crate is the host half of the observability spine: it runs *next
to* L2 and feeds the L3 supervisor's event sink.

What it deliberately does **not** do:
- It does **not** rely on any guest-side signing. Per ADR-0006 §5, the
  guest holds no key material; only the supervisor signs. By contrast,
  [`cellos-telemetry`](../cellos-telemetry) is forbidden from depending
  on a signer at all (`src/lib.rs:18-23`).
- It does **not** trust ANY attribution field the guest may have
stuffed into the wire payload. Unknown CBOR keys are silently dropped
at decode (`src/listener.rs:181-217`, structurally enforced by
`WireFrame` having only the five permitted fields).
- It does **not** write to disk, NATS, or any other sink directly. The
outputs are values (`StampedDeclaration`, `AgentSilencedSignal`,
`CloudEventV1`, `SignedEventEnvelopeV1`); the supervisor projects
them onto the configured `EventSink`.
## Public API surface
Top-level (`src/lib.rs`):

| Item | Location |
| --- | --- |
| `pub const VSOCK_TELEMETRY_PORT: u32 = 9001` | `src/lib.rs:61` |
| `pub const WIRE_CONTENT_VERSION_MAJOR: u16 = 1` | `src/lib.rs:68` |
| `pub enum TelemetryError { Bind, Wire, UnsupportedVersion }` | `src/lib.rs:72` |
| `pub struct GuestDeclaration { probe_source, guest_pid, guest_comm, guest_monotonic_ns }` | `src/lib.rs:97` |
| `pub struct HostStamp { cell_id, run_id, host_received_at, spec_signature_hash }` | `src/lib.rs:113` |
| `pub struct HostProbeReading { probe, value_json, timestamp_ms }` | `src/lib.rs:134` |
| `pub struct StampedDeclaration { cell_id, run_id, host_received_at, spec_signature_hash, probe_source, guest_pid, guest_comm, guest_monotonic_ns }` | `src/lib.rs:150` |

Listener (`src/listener.rs`):

| Item | Location |
| --- | --- |
| `pub const MAX_FRAME_BYTES: u32 = 64 * 1024` | `src/listener.rs:50` |
| `pub struct VsockUdsListener` | `src/listener.rs:60` |
| `VsockUdsListener::bind_for_cell(&Path)`, `socket_path()`, `accept()` | `src/listener.rs:65-112` |
| `pub struct VsockUdsStream` | `src/listener.rs:115` |
| `VsockUdsStream::recv_stamped(&HostStamp, &KeepAlive)` | `src/listener.rs:128` |
| `VsockUdsStream::recv_guest_declaration()` | `src/listener.rs:153` |
| `pub fn decode_frame(body: &[u8]) -> Result<GuestDeclaration, TelemetryError>` | `src/listener.rs:199` |

Host-stamping (`src/host_stamp.rs`):

| Item | Location |
| --- | --- |
| `pub fn stamp(GuestDeclaration, HostStamp) -> StampedDeclaration` | `src/host_stamp.rs:28` |
| `pub fn stamp_now(GuestDeclaration, cell_id, run_id, spec_signature_hash)` | `src/host_stamp.rs:51` |

Keep-alive (`src/keepalive.rs`):

| Item | Location |
| --- | --- |
| `pub const DEFAULT_KEEPALIVE_WINDOW: Duration = Duration::from_secs(10)` | `src/keepalive.rs:36` |
| `pub struct KeepAlive` (`new`, `window`, `notify_frame`, `is_silenced`) | `src/keepalive.rs:41-79` |
| `pub struct AgentSilencedSignal` | `src/keepalive.rs:85` |
| `AgentSilencedSignal::CLOUDEVENT_TYPE = "dev.cellos.events.cell.observability.v1.guest.agent_silenced"` | `src/keepalive.rs:105` |
| `pub struct AgentSilencedTrigger` (`new`, `fire`, `has_fired`) | `src/keepalive.rs:138-179` |
| `pub async fn watch_for_silence(KeepAlive, Arc<AgentSilencedTrigger>, poll_interval: Duration)` | `src/keepalive.rs:190` |

Signing (`src/sign_outbound.rs`):

| Item | Location |
| --- | --- |
| `pub struct StampedDeclaration { guest, host }` (F4b local) | `src/sign_outbound.rs:73` |
| `pub const PROVENANCE_DECLARED: &str = "declared"` | `src/sign_outbound.rs:93` |
| `pub const ENV_SIGN_ALG / ENV_SIGN_KID / ENV_SIGN_HMAC_KEY / ENV_SIGN_ED25519_SK` | `src/sign_outbound.rs:98-104` |
| `pub enum SignOutboundError { InvalidConfig, Signer, Serialize }` | `src/sign_outbound.rs:108` |
| `pub enum SigningKeyMaterial { Off, Hmac { kid, key }, Ed25519 { kid, signing_key } }` | `src/sign_outbound.rs:127` |
| `pub enum SigningOutcome` | `src/sign_outbound.rs:283` |
| `pub fn host_stamped_envelope(...)` | `src/sign_outbound.rs:306` |
| `pub fn sign_host_stamped_envelope(...)` | `src/sign_outbound.rs:354` |
| `pub fn host_stamp_and_sign(...)` | `src/sign_outbound.rs:374` |

Probes (`src/probes/`):

| Item | Location |
| --- | --- |
| `pub const HOST_PROBE_EVENT_SOURCE = "cellos-host-telemetry/probes"` | `src/probes/mod.rs:78` |
| `pub const HOST_PROBE_EVENT_TYPE_PREFIX = "dev.cellos.events.cell.observability.host.v1"` | `src/probes/mod.rs:84` |
| `pub struct ProbeContext { cell_id, run_id, spec_signature_hash }` | `src/probes/mod.rs:93` |
| `pub struct ProbeReading` | `src/probes/mod.rs:125` |
| `pub enum ProbeError` | `src/probes/mod.rs:158` |
| `pub trait HostProbe` | `src/probes/mod.rs:191` |
| `pub fn build_host_probe_envelope(...) -> CloudEventV1` | `src/probes/mod.rs:233` |
| `pub fn emit_reading(...)` | re-exported from `src/lib.rs:50` |
| Built-in probes: `FcMetricsProbe`, `CgroupProbe`, `NftablesProbe`, `TapLinkProbe` | `src/probes/{fc_metrics,cgroup,nftables,tap_link}.rs` |

`#![deny(unsafe_code)]` and `#![warn(missing_docs)]` are enforced at
crate root (`src/lib.rs:36-37`).
## Architecture / how it works
**Channel-authenticity model (ADR-0006 §5).** Firecracker proxies the
guest's vsock connection to a per-cell UDS at
`<vsock_uds_base>_<port>`. The supervisor passes the *same* base path
through to this crate (so the telemetry UDS sits alongside the exit-code
UDS in one socket dir, making teardown a single `remove_dir_all`), and
the listener binds at `<base>_9001` before the workload's first
instruction. The host then trusts whichever stream the bytes arrived on
— payloads carry no attribution. This is structural: `GuestDeclaration`
literally has no `cell_id` / `run_id` / `spec_signature_hash` fields, and
a compile-time witness test in `src/lib.rs:186-206` keeps it that way.
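
The `<vsock_uds_base>_<port>` naming convention can be illustrated with
a std-only sketch. `per_port_uds_path` is a hypothetical helper, not
part of this crate; it assumes the per-port path is the base path with
`_<port>` appended as a plain string suffix, which is the pattern this
section describes for `_9000` and `_9001`.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Port of the exit-code listener in `cellos-host-firecracker` (`_9000`).
const VSOCK_EXIT_CODE_PORT: u32 = 9000;
/// Mirrors this crate's `VSOCK_TELEMETRY_PORT`.
const VSOCK_TELEMETRY_PORT: u32 = 9001;

/// Hypothetical helper: appends `_<port>` to the vsock UDS base path,
/// giving the UDS that Firecracker proxies that guest vsock port to.
fn per_port_uds_path(base: &Path, port: u32) -> PathBuf {
    let mut s: OsString = base.as_os_str().to_owned();
    s.push(format!("_{port}"));
    PathBuf::from(s)
}
```

Because both listeners derive from one base, they land in the same
directory, which is what lets teardown be a single `remove_dir_all`.
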

**Wire format (ADR-0006 §12).** Each frame is `u32 LE length` + CBOR map
body. The map must contain `content_version: u16` (high byte = major,
low byte = minor) — only `WIRE_CONTENT_VERSION_MAJOR = 1` is accepted
today, and unknown majors are rejected with
`TelemetryError::UnsupportedVersion` (`src/listener.rs:203-209`). The
remaining permitted fields are `probe_source`, `guest_pid`,
`guest_comm`, `guest_monotonic_ns`; anything else is dropped at decode
because `WireFrame` (`src/listener.rs:186-193`) has nowhere to put it.
Frames whose body length is 0 or exceeds 64 KiB (`MAX_FRAME_BYTES`) are
rejected with `TelemetryError::Wire`.
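
The framing and the major-version gate can be sketched with `std`
alone. This sketch assumes the length prefix counts body bytes only and
leaves CBOR decoding out, so `content_version` is passed in directly;
`read_frame_body`, `check_content_version`, and `FrameError` are
hypothetical names, not this crate's API.

```rust
use std::io::{self, Read};

/// Mirrors `MAX_FRAME_BYTES` (64 KiB).
const MAX_FRAME_BYTES: u32 = 64 * 1024;
/// Mirrors `WIRE_CONTENT_VERSION_MAJOR`.
const WIRE_CONTENT_VERSION_MAJOR: u16 = 1;

/// Hypothetical error type; the crate reports these as `TelemetryError`.
#[derive(Debug)]
enum FrameError {
    Io(io::Error),
    /// Empty or oversized body (`TelemetryError::Wire` in the crate).
    Wire(String),
    /// Unknown major (`TelemetryError::UnsupportedVersion` in the crate).
    UnsupportedVersion(u16),
}

/// Reads one `u32 LE length` + body frame and returns the raw body bytes.
fn read_frame_body<R: Read>(r: &mut R) -> Result<Vec<u8>, FrameError> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf).map_err(FrameError::Io)?;
    let len = u32::from_le_bytes(len_buf);
    // Reject before allocating, so a hostile length cannot force a big buffer.
    if len == 0 || len > MAX_FRAME_BYTES {
        return Err(FrameError::Wire(format!("frame length {len} out of range")));
    }
    let mut body = vec![0u8; len as usize];
    r.read_exact(&mut body).map_err(FrameError::Io)?;
    Ok(body)
}

/// High byte of `content_version` is the major, low byte the minor.
fn check_content_version(content_version: u16) -> Result<(), FrameError> {
    let major = content_version >> 8;
    if major == WIRE_CONTENT_VERSION_MAJOR {
        Ok(())
    } else {
        Err(FrameError::UnsupportedVersion(major))
    }
}
```

Under this gate a minor bump such as `0x0103` is accepted, while
`0x0200` (major 2) is rejected.
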

**Per-frame stamping.** `VsockUdsStream::recv_stamped` re-stamps
`host_received_at` on every frame so the receive instant is accurate per
event, while `cell_id` / `run_id` / `spec_signature_hash` are per-run
and constant (`src/listener.rs:138-148`). The keep-alive tracker is
poked on every successful receive (`src/listener.rs:137`).
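
The stamping rule can be sketched as follows: per-run attribution is
copied from one supervisor-held value, and the receive instant is taken
fresh for every frame. `GuestDecl`, `RunAttribution`, `Stamped`, and
`stamp_frame` are hypothetical, simplified stand-ins for
`GuestDeclaration`, `HostStamp`, and `StampedDeclaration`; the real
field types may differ.

```rust
use std::time::SystemTime;

/// Guest-supplied fields only: nothing here can carry attribution.
#[derive(Clone, Debug)]
struct GuestDecl {
    probe_source: String,
    guest_pid: u32,
}

/// Per-run attribution owned by the supervisor, constant for the run.
#[derive(Clone, Debug)]
struct RunAttribution {
    cell_id: String,
    run_id: String,
    spec_signature_hash: String,
}

#[derive(Debug)]
struct Stamped {
    cell_id: String,
    run_id: String,
    spec_signature_hash: String,
    host_received_at: SystemTime,
    guest: GuestDecl,
}

/// Attribution comes from `run`, the timestamp from the host clock at
/// receive time; the guest payload is carried through unchanged.
fn stamp_frame(guest: GuestDecl, run: &RunAttribution) -> Stamped {
    Stamped {
        cell_id: run.cell_id.clone(),
        run_id: run.run_id.clone(),
        spec_signature_hash: run.spec_signature_hash.clone(),
        host_received_at: SystemTime::now(),
        guest,
    }
}
```
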

**Silence is observable.** `watch_for_silence` polls the `KeepAlive`
tracker at a configurable interval; when `last_frame_at.elapsed() >=
window` it calls `AgentSilencedTrigger::fire` exactly once. The
trigger's fire-once invariant is structural — a second `fire` call
returns `None` (`src/keepalive.rs:161-173`). The watcher reads
`elapsed` under the same critical section that checks the silence
condition, so a frame landing at the boundary cannot yield an
`elapsed_ms < keepalive_window_ms` in the emitted signal
(`src/keepalive.rs:202-211`).
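
Both invariants in this paragraph (fire-once, and reading `elapsed`
inside the same critical section as the silence check) can be shown
with std primitives. This is a minimal synchronous sketch assuming a
`Mutex<Instant>` for the last-frame time and an `AtomicBool` latch;
`KeepAliveSketch`, `FireOnce`, and `poll_once` are hypothetical, and
the crate's async watcher loop is not reproduced.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct KeepAliveSketch {
    window: Duration,
    last_frame_at: Mutex<Instant>,
}

impl KeepAliveSketch {
    fn new(window: Duration) -> Self {
        Self { window, last_frame_at: Mutex::new(Instant::now()) }
    }

    /// Called by the listener for every successfully received frame.
    fn notify_frame(&self) {
        *self.last_frame_at.lock().unwrap() = Instant::now();
    }

    /// One `elapsed` value, read while holding the lock, is used for both
    /// the check and the report, so a concurrent `notify_frame` cannot
    /// make the reported value smaller than the window.
    fn silenced_for(&self) -> Option<Duration> {
        let last = self.last_frame_at.lock().unwrap();
        let elapsed = last.elapsed();
        (elapsed >= self.window).then_some(elapsed)
    }
}

/// Fire-once latch: only the first successful `fire` returns `Some`.
struct FireOnce(AtomicBool);

impl FireOnce {
    fn fire(&self, elapsed: Duration) -> Option<u128> {
        self.0
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| elapsed.as_millis())
    }
}

/// One tick of a polling watcher; returns the elapsed ms on first firing.
fn poll_once(ka: &KeepAliveSketch, trigger: &FireOnce) -> Option<u128> {
    ka.silenced_for().and_then(|elapsed| trigger.fire(elapsed))
}
```

An async watcher could call something like `poll_once` on each tick of
its poll interval and stop after the first `Some`.
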

**Signing (F4b).** The supervisor calls `host_stamp_and_sign`
(`src/sign_outbound.rs:374`) with a `StampedDeclaration` and a
`SigningKeyMaterial`, and gets back either
`SigningOutcome::Unsigned(CloudEventV1)` or
`SigningOutcome::Signed(SignedEventEnvelopeV1)`. The signing payload is
the canonical-JSON serialization of the FULL `CloudEventV1` per
`cellos_core::trust_keys::canonical_event_signing_payload`, so mutating
any field after the signature is computed makes verification fail (I5 /
O2 doctrine). On the verifier side, HMAC keys go in the `hmac_keys` map
and Ed25519 public keys in `verifying_keys`. `SigningKeyMaterial`
implements a custom `Debug` that NEVER prints key bytes, only the
variant and the kid, so an accidental `{:?}` in a tracing span cannot
leak key material (`src/sign_outbound.rs:147-150`).
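
A redacting `Debug` of this kind might look like the sketch below.
`KeyMaterialSketch` is a hypothetical, simplified stand-in for
`SigningKeyMaterial` (the real `Ed25519` variant carries a
`signing_key`, not a raw seed); only the redaction pattern is the point.

```rust
use std::fmt;

enum KeyMaterialSketch {
    Off,
    Hmac { kid: String, key: Vec<u8> },
    Ed25519 { kid: String, seed: [u8; 32] },
}

impl fmt::Debug for KeyMaterialSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Key bytes are never formatted; `..` marks the omitted field.
        match self {
            Self::Off => f.write_str("Off"),
            Self::Hmac { kid, .. } => f
                .debug_struct("Hmac")
                .field("kid", kid)
                .finish_non_exhaustive(),
            Self::Ed25519 { kid, .. } => f
                .debug_struct("Ed25519")
                .field("kid", kid)
                .finish_non_exhaustive(),
        }
    }
}
```

With this impl, `format!("{:?}", material)` yields output such as
`Hmac { kid: "k1", .. }`, so the key never reaches a log line.
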

**Host probes (Slot F1a / Path B).** Each `HostProbe` implementation
reads concrete inputs from the host (a file path, an endpoint, a table
name, an interface name) and returns a `ProbeReading` carrying
`probe_source`, `inputs`, and `output` blocks — D12 doctrine ("every
probe-emitted event is attributable to a probe and its concrete
inputs/outputs"). `build_host_probe_envelope` projects readings into
`cellos_core::CloudEventV1`s with `source = "cellos-host-telemetry/probes"`
and type `dev.cellos.events.cell.observability.host.v1.<probe>`. Per-
probe `read()` implementations are `#[cfg(target_os = "linux")]`; on
other targets they return `ProbeError::PlatformUnsupported`.
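
The probe shape described here (concrete inputs in, an attributable
reading out, a Linux-only `read()`) can be sketched as a trait.
Everything below is hypothetical and simplified: `ProbeSketch`,
`FileProbe`, `ReadingSketch`, and `ProbeErr` are not the crate's
`HostProbe`, `ProbeReading`, or `ProbeError`, and reading a single
cgroup-v2-style file is only an example input. The event type is built
by appending the probe name to `HOST_PROBE_EVENT_TYPE_PREFIX`.

```rust
use std::path::PathBuf;

const HOST_PROBE_EVENT_TYPE_PREFIX: &str = "dev.cellos.events.cell.observability.host.v1";

#[derive(Debug)]
enum ProbeErr {
    PlatformUnsupported,
    Io(std::io::Error),
}

/// Attributable reading: which probe, with which inputs, produced what.
#[derive(Debug)]
struct ReadingSketch {
    probe_source: &'static str,
    inputs: Vec<(String, String)>,
    output: String,
}

trait ProbeSketch {
    fn name(&self) -> &'static str;
    fn read(&self) -> Result<ReadingSketch, ProbeErr>;
    /// `<prefix>.<probe>`, e.g. `...host.v1.cgroup`.
    fn event_type(&self) -> String {
        format!("{HOST_PROBE_EVENT_TYPE_PREFIX}.{}", self.name())
    }
}

/// Reads one host file, e.g. a cgroup-v2 counter such as `memory.current`.
struct FileProbe {
    path: PathBuf,
}

impl ProbeSketch for FileProbe {
    fn name(&self) -> &'static str {
        "cgroup"
    }

    #[cfg(target_os = "linux")]
    fn read(&self) -> Result<ReadingSketch, ProbeErr> {
        let raw = std::fs::read_to_string(&self.path).map_err(ProbeErr::Io)?;
        Ok(ReadingSketch {
            probe_source: self.name(),
            inputs: vec![("path".into(), self.path.display().to_string())],
            output: raw.trim().to_string(),
        })
    }

    #[cfg(not(target_os = "linux"))]
    fn read(&self) -> Result<ReadingSketch, ProbeErr> {
        Err(ProbeErr::PlatformUnsupported)
    }
}
```

Gating only `read()` keeps non-Linux builds compiling while reporting
the probe as unsupported at run time.
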
## Configuration

| Variable | Default | Meaning |
| --- | --- | --- |
| `CELLOS_HOST_TELEMETRY_SIGN_ALG` | `off` | One of `off`, `hmac-sha256`, `ed25519`. Anything else is rejected. (`src/sign_outbound.rs:98`) |
| `CELLOS_HOST_TELEMETRY_SIGN_KID` | required when alg != off | Signer kid embedded in `SignedEventEnvelopeV1`. (`src/sign_outbound.rs:100`) |
| `CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY` | required when alg=hmac-sha256 | Base64url (no-pad, padding tolerated) of the shared HMAC key. (`src/sign_outbound.rs:102`) |
| `CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK` | required when alg=ed25519 | Base64url of the 32-byte Ed25519 seed. (`src/sign_outbound.rs:104`) |

Setting both `*_HMAC_KEY` and `*_ED25519_SK` is rejected — the operator
must pick one to avoid ambiguity over which key signed the stream
(`src/sign_outbound.rs:40-43`).
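
The validation rules in this section can be expressed as a small pure
function. This is a sketch assuming the variables have already been
collected into a map (so it runs without touching the process
environment), and it leaves base64url decoding out. `SignMode` and
`parse_sign_config` are hypothetical names, not the crate's
`SigningKeyMaterial` loader, and checking the both-keys rule before the
algorithm is this sketch's own choice.

```rust
use std::collections::HashMap;

const ENV_SIGN_ALG: &str = "CELLOS_HOST_TELEMETRY_SIGN_ALG";
const ENV_SIGN_KID: &str = "CELLOS_HOST_TELEMETRY_SIGN_KID";
const ENV_SIGN_HMAC_KEY: &str = "CELLOS_HOST_TELEMETRY_SIGN_HMAC_KEY";
const ENV_SIGN_ED25519_SK: &str = "CELLOS_HOST_TELEMETRY_SIGN_ED25519_SK";

#[derive(Debug, PartialEq)]
enum SignMode {
    Off,
    Hmac { kid: String },
    Ed25519 { kid: String },
}

fn parse_sign_config(env: &HashMap<&str, &str>) -> Result<SignMode, String> {
    let has = |k: &str| env.contains_key(k);
    // Both key variables at once is ambiguous: which key signed the stream?
    if has(ENV_SIGN_HMAC_KEY) && has(ENV_SIGN_ED25519_SK) {
        return Err("set only one of the HMAC and Ed25519 key variables".into());
    }
    // Unset means the documented default, `off`.
    let alg = env.get(ENV_SIGN_ALG).copied().unwrap_or("off");
    let needed_key = match alg {
        "off" => return Ok(SignMode::Off),
        "hmac-sha256" => ENV_SIGN_HMAC_KEY,
        "ed25519" => ENV_SIGN_ED25519_SK,
        other => return Err(format!("unknown {ENV_SIGN_ALG} value: {other}")),
    };
    let kid = env
        .get(ENV_SIGN_KID)
        .ok_or_else(|| format!("{ENV_SIGN_KID} is required when alg != off"))?
        .to_string();
    if !has(needed_key) {
        return Err(format!("{needed_key} is required when alg={alg}"));
    }
    Ok(if alg == "ed25519" {
        SignMode::Ed25519 { kid }
    } else {
        SignMode::Hmac { kid }
    })
}
```

Taking a map instead of calling `std::env::var` directly keeps the
rules unit-testable.
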

There is no env var for the listener itself; the UDS base path is
chosen by the calling backend (`cellos-host-firecracker`) and passed to
`VsockUdsListener::bind_for_cell`.
## Examples
Listener + host-stamping:
```rust
use std::path::Path;
use std::time::{Duration, SystemTime};

use cellos_host_telemetry::{
    keepalive::KeepAlive, listener::VsockUdsListener, HostStamp, TelemetryError,
};

async fn receive_cell_telemetry() -> Result<(), TelemetryError> {
    // Pass the per-cell vsock UDS *base*; the listener binds `<base>_9001`
    // (here /tmp/cellos-vsock-cell-42.socket_9001) before the VM boots.
    let listener = VsockUdsListener::bind_for_cell(Path::new(
        "/tmp/cellos-vsock-cell-42.socket",
    ))?;
    let mut stream = listener.accept().await?;

    // Per-run attribution comes only from the supervisor; recv_stamped
    // re-stamps host_received_at on every frame.
    let stamp = HostStamp {
        cell_id: "cell-42".into(),
        run_id: "run-7".into(),
        host_received_at: SystemTime::now(),
        spec_signature_hash: "sha256:deadbeef".into(),
    };
    let keepalive = KeepAlive::new(Duration::from_secs(10));

    // Each item is a StampedDeclaration; the keep-alive is poked per frame.
    while let Some(stamped) = stream.recv_stamped(&stamp, &keepalive).await? {
        let _ = stamped;
    }
    Ok(())
}
```
Silence watcher:
```rust
use std::sync::Arc;
use std::time::Duration;

use cellos_host_telemetry::keepalive::{
    watch_for_silence, AgentSilencedTrigger, KeepAlive,
};

async fn watch_cell_for_silence() {
    let keepalive = KeepAlive::new(Duration::from_secs(10));
    let trigger = Arc::new(AgentSilencedTrigger::new(
        "cell-42",
        "run-7",
        Duration::from_secs(10),
    ));
    // Poll every 250 ms; the trigger fires at most once per run.
    let signal = watch_for_silence(keepalive, trigger, Duration::from_millis(250)).await;
    // signal: Option<AgentSilencedSignal>, Some(_) on first silence detection
    let _ = signal;
}
```
Signing:
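```rust
use cellos_host_telemetry::sign_outbound::{
    host_stamp_and_sign, SigningKeyMaterial, SigningOutcome,
};

// Built from the CELLOS_HOST_TELEMETRY_SIGN_* variables (see Configuration).
let key_material = SigningKeyMaterial::from_env()?;
// Pass the StampedDeclaration and `key_material`; see
// src/sign_outbound.rs:374 for the exact parameter list.
let outcome: SigningOutcome = host_stamp_and_sign(/* ...stamped declaration, key material... */)?;
match outcome {
    // No signature: a plain CloudEventV1.
    SigningOutcome::Unsigned(cloudevent) => { /* emit as-is */ }
    // Signed: a SignedEventEnvelopeV1 wrapping the event.
    SigningOutcome::Signed(envelope) => { /* emit wrapped */ }
}
# Ok::<(), Box<dyn std::error::Error>>(())
```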
## Testing
```sh
cargo test -p cellos-host-telemetry
```
In-source unit tests cover:
- Frame decode: unknown major rejected, known major accepted with
unknown fields dropped, garbage rejected, UDS bind path, end-to-end
round trip with attribution overwrite (`src/listener.rs:244-345`).
- Host stamping: host-stamped attribution overrides, `host_received_at`
preserved when supplied explicitly (`src/host_stamp.rs:68-111`).
- Keep-alive: fresh tracker is not silenced, post-window is silenced,
`notify_frame` resets timer, trigger fires exactly once, watcher fires
after window (`src/keepalive.rs:215-277`).
- Constants pinned: vsock port 9001, wire major 1
(`src/lib.rs:171-184`).

Integration tests under `tests/`:

| File | Covers |
| --- | --- |
| `smoke.rs` | Host-probe envelope builder, `emit_reading` against a no-op sink, wire-version / port constants. |
| `kill_the_agent.rs` | Agent-silenced detection end-to-end. |

No `#[ignore]` gating — the crate's tests all run on every CI leg
because the listener works against an in-process Unix Domain Socket
(no vsock required).
## Related crates
- [`cellos-telemetry`](../cellos-telemetry) — the in-guest agent. Forbidden
from depending on a signer; emits unsigned declarations over vsock.
- [`cellos-host-firecracker`](../cellos-host-firecracker) — pairs the
`_9000` exit-code UDS with the `_9001` telemetry UDS in the same
per-cell socket dir.
- [`cellos-core`](../cellos-core) — `CloudEventV1`,
`SignedEventEnvelopeV1`, `canonical_event_signing_payload`,
`EventSink`, the trust-key sign/verify primitives.
- [`cellos-supervisor`](../cellos-supervisor) — owns the receiver loop
and the `EventSink` the projected envelopes flow into.
## ADRs
- [ADR-0006 — In-VM observability runner evidence](../../docs/adr/0006-in-vm-observability-runner-evidence.md)
— the doctrine reference for the entire host-receiver design.
Specifically §5 (channel-authenticity), §6 (host-stamped attribution
is non-negotiable), §7 (`agent_silenced` is an observable signal), and
§12 (wire-schema versioning) are all enforced in this crate.