cellos-telemetry 0.5.1

In-guest telemetry agent for CellOS — runs as PID 2 inside Firecracker microVMs, emits CBOR-over-vsock observations. No signing key by design (ADR-0006).
# cellos-telemetry

The in-guest CellOS telemetry agent. Runs as PID 2 inside every
Firecracker microVM, declares process/network/capability events to the
host supervisor over vsock, and holds no signing key.

## What it is

`cellos-telemetry` is the runner-evidence wedge of ADR-0006: the
process the host trusts to *describe* what is happening inside a guest,
because the channel it speaks on (a vsock CID:port the supervisor
bound before the workload existed) is the authenticity primitive — not
a signature on the payload.

Layer L2 (host runtime / isolation) — guest-side. Forked by
[`cellos-init`](../cellos-init/README.md) as PID 2 *before* the
workload process (PID 3+) starts. The workload's seccomp profile
blocks `kill(2)`, `tgkill(2)`, and `ptrace(2)` against PIDs ≤ 2; the
agent is structurally unreachable from the workload.

What it isn't:
- **Not the host-side receiver.** `cellos-host-telemetry` (separate
  crate) is the host-resident CBOR-over-vsock listener that host-stamps
  arriving frames with `cell_id`, `run_id`, `host_received_at`, and
  the ADG `output` block. This crate produces only the guest fields.
- **Not a signer.** The crate's `Cargo.toml` carries a hard-coded
  DENY list (`ring`, `ed25519-dalek`, `hmac`, `rustls`, `webpki`,
  `sha2-as-mac`) enforced by `cargo-deny` in CI. ADR-0006 §5 — Claim 5a:
  the guest agent never holds a key. Dependencies are `libc` + value
  types from `cellos-core`. A compromised guest could sign anything;
  signature-in-guest is theatre. The host trusts the channel, not the
  payload.
- **Not a kernel tracer.** Probes are `/proc` deltas, one declared
  `inotify` watch, and (stubbed) connect/capability surfaces — no
  eBPF, no kprobes, no kernel modules.

## Public API surface

### Crate-level constants

| Constant | Value | Meaning |
|---|---|---|
| `WIRE_CONTENT_VERSION_MAJOR` | `1` | CBOR wire-format major. Must match `cellos_host_telemetry::WIRE_CONTENT_VERSION_MAJOR` or the host rejects the frame. |
| `VSOCK_TELEMETRY_PORT` | `9001` | Well-known guest→host vsock port. |
| `VMADDR_CID_HOST` | `2` | Host CID per the AF_VSOCK ABI. |
| `MAX_FRAME_BODY_BYTES` | `4096` | Per-frame body cap. |

### Probe identifiers (`probes::*`)

- `process.spawned`, `process.exited` (`/proc` delta walker — implemented)
- `capability.denied` (stub — kernel surface not yet wired)
- `fs.inotify_fired` (declared `inotify` watch)
- `net.connect_attempted` (stub)

`probes::ALL` enumerates every known probe id; `probes::is_known(s)`
gates declaration acceptance.
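
As a rough illustration of that gate, the sketch below mirrors the probe ids listed above in a constant table and checks membership by exact string match. The table contents come from this README; the real `probes::ALL` may be typed or ordered differently, and `first_unknown` is a hypothetical helper that is not part of the crate.

```rust
/// Probe ids as listed in this README. The crate's real `probes::ALL`
/// may differ in type or ordering; this table is an assumption.
pub const ALL: [&str; 5] = [
    "process.spawned",
    "process.exited",
    "capability.denied",
    "fs.inotify_fired",
    "net.connect_attempted",
];

/// True only for an exact, case-sensitive match against `ALL`.
pub fn is_known(id: &str) -> bool {
    ALL.iter().any(|known| *known == id)
}

/// Hypothetical helper: the first declared id that is not known, if any.
pub fn first_unknown<'a>(declared: &[&'a str]) -> Option<&'a str> {
    declared.iter().copied().find(|id| !is_known(id))
}
```

A declaration containing `"fs.bogus"` would be refused under this sketch, because `first_unknown` returns it instead of `None`.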

### Wire types

- `GuestTelemetryDeclaration` — the agent's declared probe surface,
  projected from `cellos_core::DeclaredAuthoritySurface` so the host
  can subset-check `declared ⊆ authorized` at admission (F3
  admission-path prep, 2026-05-16).
- `ProbeEvent` — a single emitted event (guest fields only:
  `probe_source`, `guest_pid`, `guest_comm`, `guest_monotonic_ns`,
  `content_version`).
- `WireError` — decode failure reasons.
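
The `declared ⊆ authorized` check mentioned for `GuestTelemetryDeclaration` can be pictured as a plain set difference. The sketch below is an assumption about shape only: `Admission` and `check_declared_subset` are hypothetical names, and the real admission path works on `cellos_core` types, not on string slices.

```rust
use std::collections::BTreeSet;

/// Outcome of the admission check in this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    Accepted,
    /// Probe ids that were declared but not authorized, in sorted order.
    Rejected(Vec<String>),
}

/// Accepts only if every declared probe id is also authorized.
pub fn check_declared_subset(declared: &[&str], authorized: &[&str]) -> Admission {
    let authorized: BTreeSet<&str> = authorized.iter().copied().collect();
    let declared: BTreeSet<&str> = declared.iter().copied().collect();

    // Anything left after removing the authorized ids is excess authority.
    let excess: Vec<String> = declared
        .difference(&authorized)
        .map(|id| id.to_string())
        .collect();

    if excess.is_empty() {
        Admission::Accepted
    } else {
        Admission::Rejected(excess)
    }
}
```

Returning the excess ids, and not just a boolean, is a design choice of the sketch: it lets the caller report exactly which probes exceeded the authorized surface.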

### CBOR codec (`src/lib.rs`)

- `encode_event_body` / `decode_event_body` — bare CBOR map(5) body.
- `encode_frame` / `decode_frame` — `u32 LE length || body` framing.

The codec is hand-rolled and minimal: definite-length `map(5)`, uint
major (0), text major (3). No floats, no tags, no indefinite lengths.
`content_version` is always emitted first so the host can short-circuit
unknown majors before walking unknown probe-source strings.
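
To make the codec description concrete, here is a minimal sketch of a CBOR subset with only the pieces named above: definite-length map, unsigned integers (major 0), and text strings (major 3), each with shortest-form heads as in RFC 8949. It is not the crate's code. The function names are hypothetical, and encoding the map keys as text strings is an assumption read off the wire shape.

```rust
/// Writes a CBOR head: 3-bit major type plus a shortest-form argument.
pub fn write_head(out: &mut Vec<u8>, major: u8, value: u64) {
    let m = major << 5;
    if value < 24 {
        out.push(m | value as u8); // argument fits in the initial byte
    } else if value <= u8::MAX as u64 {
        out.push(m | 24);
        out.push(value as u8);
    } else if value <= u16::MAX as u64 {
        out.push(m | 25);
        out.extend_from_slice(&(value as u16).to_be_bytes());
    } else if value <= u32::MAX as u64 {
        out.push(m | 26);
        out.extend_from_slice(&(value as u32).to_be_bytes());
    } else {
        out.push(m | 27);
        out.extend_from_slice(&value.to_be_bytes());
    }
}

/// Major type 0: unsigned integer.
pub fn write_uint(out: &mut Vec<u8>, value: u64) {
    write_head(out, 0, value);
}

/// Major type 3: UTF-8 text string, length-prefixed.
pub fn write_text(out: &mut Vec<u8>, s: &str) {
    write_head(out, 3, s.len() as u64);
    out.extend_from_slice(s.as_bytes());
}

/// Hypothetical stand-in for `encode_event_body`: a definite-length map(5)
/// with `content_version` emitted first.
pub fn sketch_encode_body(
    content_version: u16,
    probe_source: &str,
    guest_pid: u32,
    guest_comm: &str,
    guest_monotonic_ns: u64,
) -> Vec<u8> {
    let mut out = Vec::new();
    write_head(&mut out, 5, 5); // major 5 = map, 5 key/value pairs
    write_text(&mut out, "content_version");
    write_uint(&mut out, content_version as u64);
    write_text(&mut out, "probe_source");
    write_text(&mut out, probe_source);
    write_text(&mut out, "guest_pid");
    write_uint(&mut out, guest_pid as u64);
    write_text(&mut out, "guest_comm");
    write_text(&mut out, guest_comm);
    write_text(&mut out, "guest_monotonic_ns");
    write_uint(&mut out, guest_monotonic_ns);
    out
}
```

Because the version pair comes first, a reader of this layout can inspect the first key/value pair and stop before touching any probe-specific string.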

### Probe modules

- `probes::process` — `/proc` delta walker for spawn/exit.
- `probes::inotify` — one declared `inotify` watch.
- `probes::capability` — capability-denied stub.
- `probes::net_connect` — connect-attempted stub.

The CBOR + framing core is pure-safe Rust (`#![deny(unsafe_code)]`).
Syscall surfaces under `probes/` opt out per-module — `libc::inotify_init1`,
`socket`, `connect`, `fork` have no safe wrapper at this layer.
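
The "`/proc` delta walker" idea can be sketched with the standard library alone: list the numeric entries under `/proc`, then diff consecutive snapshots. This is an illustrative guess at the technique, not the code in `probes::process`. `scan_pids` and `diff_pids` are hypothetical names, and a polling approach like this one can miss processes that start and exit between two scans.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;

/// Collects every numeric directory name under /proc as a PID (Linux only).
pub fn scan_pids() -> io::Result<BTreeSet<u32>> {
    let mut pids = BTreeSet::new();
    for entry in fs::read_dir("/proc")? {
        let name = entry?.file_name();
        // Non-numeric entries such as "self" or "meminfo" are skipped.
        if let Some(pid) = name.to_str().and_then(|s| s.parse::<u32>().ok()) {
            pids.insert(pid);
        }
    }
    Ok(pids)
}

/// Returns (spawned, exited) between two snapshots, each in ascending order.
pub fn diff_pids(prev: &BTreeSet<u32>, curr: &BTreeSet<u32>) -> (Vec<u32>, Vec<u32>) {
    let spawned: Vec<u32> = curr.difference(prev).copied().collect();
    let exited: Vec<u32> = prev.difference(curr).copied().collect();
    (spawned, exited)
}
```

Under these assumptions, each PID in `spawned` would map to a `process.spawned` event and each PID in `exited` to a `process.exited` event.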

## Architecture

The wire shape from `src/lib.rs`:

```text
u32 LE length || CBOR map(5) {
    "content_version"     => u16  (FIRST — host short-circuits unknown major)
    "probe_source"        => text
    "guest_pid"           => u32
    "guest_comm"          => text
    "guest_monotonic_ns"  => u64
}
```

The agent fills only those five fields. The supervisor host-stamps
`cell_id`, `run_id`, `host_received_at`, `spec_signature_hash`, and
the ADG `output` block on receipt; anything the agent puts in those
fields is overwritten. That asymmetry is intentional — it is what
makes the channel the authenticity primitive.
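
The `u32 LE length || body` framing from the wire shape above can be sketched as follows. The cap value is taken from the constants table; `FrameError`, `sketch_encode_frame`, and `sketch_split_frame` are hypothetical names, and treating an oversized length as an error (not a truncation) is an assumption, not a description of the crate's `WireError`.

```rust
/// Per-frame body cap, mirroring the value documented for the crate.
pub const MAX_FRAME_BODY_BYTES: usize = 4096;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// Fewer bytes available than the prefix or the declared length requires.
    Truncated,
    /// Declared or supplied body length exceeds the cap.
    TooLarge(usize),
}

/// Prepends a little-endian u32 length to `body`.
pub fn sketch_encode_frame(body: &[u8]) -> Result<Vec<u8>, FrameError> {
    if body.len() > MAX_FRAME_BODY_BYTES {
        return Err(FrameError::TooLarge(body.len()));
    }
    let mut frame = Vec::with_capacity(4 + body.len());
    frame.extend_from_slice(&(body.len() as u32).to_le_bytes());
    frame.extend_from_slice(body);
    Ok(frame)
}

/// Splits one frame off the front of `buf`, returning (body, remaining bytes).
pub fn sketch_split_frame(buf: &[u8]) -> Result<(&[u8], &[u8]), FrameError> {
    if buf.len() < 4 {
        return Err(FrameError::Truncated);
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    // Check the cap before trusting the length to slice anything.
    if len > MAX_FRAME_BODY_BYTES {
        return Err(FrameError::TooLarge(len));
    }
    let rest = &buf[4..];
    if rest.len() < len {
        return Err(FrameError::Truncated);
    }
    Ok(rest.split_at(len))
}
```

Checking the declared length against the cap before slicing means a hostile guest cannot make the reader wait for, or allocate, more than one capped body per frame.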

**Back-pressure (ADR-0006 §5.3): drop-with-counter.** The agent
surfaces drops via the `cell.observability.guest.telemetry.dropped`
counter, never by blocking the workload. The workload's progress is
never coupled to the agent's I/O.
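
One way to picture drop-with-counter is a bounded queue whose producer side never blocks: if the queue is full, the frame is discarded and a counter is incremented. The sketch below uses `std::sync::mpsc::sync_channel` and an `AtomicU64` standing in for the `cell.observability.guest.telemetry.dropped` counter. `DroppingSender` and `offer` are hypothetical names, and nothing here is taken from the agent's actual implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Bounded, non-blocking producer handle (hypothetical type).
pub struct DroppingSender {
    tx: SyncSender<Vec<u8>>,
    dropped: AtomicU64,
}

impl DroppingSender {
    /// Creates a sender with room for `capacity` queued frames.
    pub fn new(capacity: usize) -> (Self, Receiver<Vec<u8>>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped: AtomicU64::new(0) }, rx)
    }

    /// Queues the frame if there is room; otherwise counts a drop.
    /// Never blocks. Returns true when the frame was queued.
    pub fn offer(&self, frame: Vec<u8>) -> bool {
        match self.tx.try_send(frame) {
            Ok(()) => true,
            // Full queue or vanished receiver: drop and count either way.
            Err(_) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }

    /// Number of frames dropped so far.
    pub fn dropped(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}
```

With a capacity of 2 and five offers made before anything is received, the first two frames are queued and the counter ends at 3. The producer returns immediately in every case.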

## Configuration

No env vars and no config file. The agent is parameterised entirely
by what `cellos-init` and the supervisor put on the kernel command line
and by the bound vsock channel. `VSOCK_TELEMETRY_PORT` is a compile-time
constant; the host binds the matching `(CID, port)` before the workload
runs.

## Examples

The agent is not invoked directly; it is forked by `cellos-init`. From
inside a supervisor integration test, the host-side counterpart looks
like:

```rust
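// Pseudo-code: the actual host receiver lives in cellos-host-telemetry.
// `vsock_listener` and `cell_cid` stand in for whatever the test harness provides.
use cellos_telemetry::{decode_frame, MAX_FRAME_BODY_BYTES, VSOCK_TELEMETRY_PORT};

// Room for the u32 LE length prefix plus the largest permitted body.
let mut buf = [0u8; 4 + MAX_FRAME_BODY_BYTES];
let n = vsock_listener.accept(VSOCK_TELEMETRY_PORT, cell_cid, &mut buf)?;
// Decode only the bytes actually received.
let event = decode_frame(&buf[..n])?;
// host-stamp cell_id, run_id, host_received_at, ADG output …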
```

## Testing

Unit tests live inline. Public-surface coverage focuses on the codec
round-trip and the probe-id surface:

```sh
cargo test -p cellos-telemetry
```

Integration tests for the in-guest agent live alongside
`cellos-init` and `cellos-host-firecracker` — the agent is exercised
end-to-end through a real (or mocked) vsock channel.

The "no signing primitive" claim is enforced at the workspace level by
`cargo-deny` (`deny.toml`): the `cellos-telemetry` musl target dep set
is asserted to be `libc + cellos-core value types only`. Session 19
ratified this; see ADR-0006 Claim 5a / Claim 7.

## Related crates

- [`cellos-init`](../cellos-init/README.md) — PID-1 init that forks
  this agent as PID 2 before the workload starts.
- [`cellos-core`](../cellos-core/README.md) — `DeclaredAuthoritySurface`
  and the value types projected into `GuestTelemetryDeclaration`.
- `cellos-host-telemetry` — the host-side receiver. It host-stamps
  arriving frames and emits the typed CloudEvents downstream.
- [`cellos-host-firecracker`](../cellos-host-firecracker/) — binds the
  per-cell vsock CID and the `VSOCK_TELEMETRY_PORT` channel before the
  workload runs.

## ADRs

- [ADR-0006](../../docs/adr/0006-in-vm-observability-runner-evidence.md) —
  in-VM observability as the runner-evidence wedge. This crate is the
  concrete implementation. Key claims this crate enforces:
  - **Claim 5** — no signing key in the guest; supervisor signs
    outbound CloudEvents using its host-side key after host-stamping.
  - **Claim 5a** — the `cellos-telemetry` Cargo manifest forbids any
    signing primitive; a CI `cargo-deny` gate enforces it.
  - **§5.2** — agent runs as PID 2, forked by `cellos-init` before
    the workload (PID 3+).
  - **§5.3** — drop-with-counter back-pressure.
  - **§12** — wire-schema versioning (`content_version` first).
- [ADR-0001](../../docs/adr/0001-rust-nats-jetstream-proprietary-host.md) —
  the proprietary host backend decision that puts the guest-observation
  surface inside the cell.