# cellos-host-firecracker
The Firecracker microVM [`CellBackend`] — L2-06 production host backend.
Boots one Firecracker VMM per cell behind a jailer and a per-cell
manifest, runs `spec.run.argv` under `cellos-init`, and reports
authenticated exit codes back to the supervisor over a per-cell vsock
UDS.
## What it is
`cellos-host-firecracker` implements [`cellos_core::ports::CellBackend`]
on top of a real Firecracker VMM. Each cell is one VMM child process,
one chroot under `jailer`, one ext4 rootfs (read-only when a scratch
drive is attached), one writable virtio-blk scratch image (optional),
one TAP interface + nftables egress ruleset (Linux only), and one
vsock-backed Unix Domain Socket family (`_9000` for exit code,
`_9001` for telemetry).
It is the **L2** (host runtime / isolation) production target named in
[LAYERS.md](../../LAYERS.md): "Map an authority bundle to real
isolation". The supervisor selects it with `CELLOS_CELL_BACKEND=firecracker`
(`crates/cellos-supervisor/src/composition.rs:1031`). The backend
constructor takes a [`FirecrackerConfig`] (which can be populated from
env via `FirecrackerConfig::from_env()`); env validation is strict
because every flag here trades isolation for ergonomics and we want
production misconfigurations to fail closed.
The crate is split across three source files:
- `lib.rs` — [`FirecrackerConfig`], [`FirecrackerCellBackend`], the
`CellBackend` impl, the in-VM exit-code bridge, the HMAC verification
helpers, jailer / chroot wiring, manifest verification, vsock UDS
listener (`listen_for_exit_code`), TAP + nftables setup, scratch ext4
provisioning, graceful-shutdown plumbing, FD-leak guards.
- `api_client.rs` — minimal `hyper` client for the Firecracker
Management API (`PUT /machine-config`, `PUT /boot-source`,
`PUT /drives/{...}`, `PUT /network-interfaces/{...}`,
`PUT /vsock`, `PUT /actions`, `PUT /snapshot/create`,
`PUT /snapshot/load`, `PATCH /vm`).
- `pool.rs` — [`FirecrackerPool`], the warm pool state machine (snapshot
→ restore) for fast cell startup. The pool is opt-in via
`CELLOS_FIRECRACKER_POOL_SIZE`; default 0 means a zero-slot no-op pool
that always misses, so wiring is an inert pass-through on the
cold-boot path.
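The Management API is plain HTTP/1.1 spoken over the VMM's API Unix socket. The sketch below builds a `PUT` request by hand using only `std`. It is not the crate's `api_client`, which uses `hyper`, and `put_request` / `send_put` are hypothetical helper names. The JSON bodies in the usage note use Firecracker's documented `vcpu_count` / `mem_size_mib` and `action_type` fields.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Build a minimal HTTP/1.1 `PUT` request for the Firecracker Management API.
/// The API is served on a Unix socket, so the `Host` header is a placeholder.
fn put_request(path: &str, json_body: &str) -> String {
    format!(
        "PUT {} HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nAccept: application/json\r\nContent-Length: {}\r\n\r\n{}",
        path,
        json_body.len(),
        json_body
    )
}

/// Hypothetical helper: send one request and return the raw response text.
/// It does a single read and handles no keep-alive, chunking, or retries;
/// a real client loops until the full response has arrived.
fn send_put(api_sock: &Path, path: &str, json_body: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(api_sock)?;
    stream.write_all(put_request(path, json_body).as_bytes())?;
    let mut buf = [0u8; 4096];
    let n = stream.read(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}
```

A cold boot would then look roughly like `send_put(sock, "/machine-config", r#"{"vcpu_count": 2, "mem_size_mib": 512}"#)` followed, after the boot source, drives and vsock are configured, by `send_put(sock, "/actions", r#"{"action_type": "InstanceStart"}"#)`. Firecracker answers a successful `PUT` with `204 No Content`.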
What it deliberately does **not** do:
- It does **not** run `spec.run.argv` as a host subprocess. The argv is
base64-encoded into the kernel `cmdline` as `cellos.argv=<b64>`;
`cellos-init` reads `/proc/cmdline`, forks/execs inside the guest,
and the supervisor reads the exit code back over vsock. The
host-subprocess fallback in `cellos-supervisor` is shadowed when this
backend reports an in-VM exit (`src/lib.rs:1-31`).
- It does **not** trust the guest's exit code unauthenticated. The
4-byte little-endian `i32` is followed by a 32-byte HMAC-SHA256 tag
over `exit_code_bytes ‖ cell_id_bytes` keyed by a per-cell 32-byte
key read from `/dev/urandom` at `create()` (FC-18, `src/lib.rs:134-218`).
- It does **not** boot non-Linux. The whole backend body is
`#[cfg(target_os = "linux")]`; on Windows/macOS the struct still
constructs (so the supervisor composition root type-checks
everywhere) but every `CellBackend` method returns an
`Unsupported`-shaped `CellosError::Host` (`src/lib.rs:632-639`).
- It does **not** terminate TLS in-VM. Per ADR-0004 / ADR-0005, TLS
termination lives on the host's egress path, not inside the cell.
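The argv hand-off in the first bullet can be sketched as follows. This assumes the `<b64>` payload is a JSON array of strings encoded with standard, padded base64; the exact encoding `cellos-init` expects is not specified in this README. `json_string_array`, `base64_encode`, `argv_cmdline_param` and `find_argv_param` are hypothetical names. Base64 matters here because the kernel command line is whitespace-separated, so an argv containing spaces must be encoded into a single token.

```rust
/// Standard base64 alphabet (RFC 4648) with `=` padding.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Encode argv as a JSON array of strings, escaping quotes, backslashes
/// and control characters.
fn json_string_array(items: &[&str]) -> String {
    let mut out = String::from("[");
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push('"');
        for c in item.chars() {
            match c {
                '"' => out.push_str("\\\""),
                '\\' => out.push_str("\\\\"),
                c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
                c => out.push(c),
            }
        }
        out.push('"');
    }
    out.push(']');
    out
}

/// Host side: the kernel cmdline token carrying the workload argv.
fn argv_cmdline_param(argv: &[&str]) -> String {
    format!("cellos.argv={}", base64_encode(json_string_array(argv).as_bytes()))
}

/// Guest side: pick the payload back out of `/proc/cmdline`.
fn find_argv_param(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|tok| tok.strip_prefix("cellos.argv="))
}
```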
## Public API surface
| Item | Location |
|---|---|
| `pub const VSOCK_EXIT_PORT: u32 = 9000` | `src/lib.rs:134` |
| `pub struct FirecrackerConfig { ... }` (fields listed below) | `src/lib.rs:238` |
| `FirecrackerConfig::from_env() -> Result<Self, CellosError>` | `src/lib.rs:333` |
| `pub struct FirecrackerCellBackend` | `src/lib.rs:640` |
| `FirecrackerCellBackend::new(FirecrackerConfig)` | `src/lib.rs:663` |
| `FirecrackerCellBackend::from_env() -> Result<Self, CellosError>` | `src/lib.rs:687` |
| `FirecrackerCellBackend::with_event_sink(Arc<dyn EventSink>) -> Self` | `src/lib.rs:701` |
| `FirecrackerCellBackend::config(&self) -> &FirecrackerConfig` | `src/lib.rs:706` |
| `FirecrackerCellBackend::pool_size(&self) -> usize` (async) | `src/lib.rs:715` |
| `FirecrackerCellBackend::pool_available(&self) -> usize` (async) | `src/lib.rs:728` |
| `impl CellBackend for FirecrackerCellBackend` | `src/lib.rs:831` (Linux), `src/lib.rs:1334` (non-Linux stub) |
| `pub mod api_client` — `FirecrackerApiClient`, `BootSource`, `Drive`, `NetworkInterface`, `VsockDevice`, `MachineConfig`, `InstanceAction`, `InstanceActionType`, `SnapshotCreate`, `SnapshotLoad`, `MemBackend`, `MemBackendType`, `SnapshotType`, `VmState`, `VmStatePatch` | `src/api_client.rs` |
| `pub mod pool` — `FirecrackerPool`, `PoolSlot::{Available,InUse,Empty}`, `POOL_SIZE_ENV`, `pool_size_from_env()` | `src/pool.rs` |
| `pool::POOL_SIZE_ENV` (`pub const`, value `"CELLOS_FIRECRACKER_POOL_SIZE"`) | `src/pool.rs:62` |
| `pub(crate) fn verify_exit_hmac(key, exit_code_bytes, cell_id, received_tag) -> bool` | `src/lib.rs:187` |
`FirecrackerConfig` (in declaration order, `src/lib.rs:238-330`):

| Field | Meaning |
|---|---|
| `binary_path: PathBuf` | Firecracker VMM binary. |
| `kernel_image_path: PathBuf` | Kernel image (`vmlinux`). |
| `rootfs_image_path: PathBuf` | Rootfs ext4 image. |
| `jailer_binary_path: Option<PathBuf>` | When present, `create()` execs Firecracker via the jailer. |
| `chroot_base_dir: PathBuf` | Jailer chroot base (default `/var/lib/cellos/firecracker`). |
| `socket_dir: PathBuf` | API socket dir when the jailer is *off* (default `/tmp`). |
| `jailer_uid: u32`, `jailer_gid: u32` | Drop-to ids (default 10002 / 10002). |
| `scratch_dir: Option<PathBuf>` | When set, the rootfs is read-only and a writable scratch ext4 is attached as a second virtio-blk. |
| `manifest_path: Option<PathBuf>` | Artifact manifest. `create()` verifies the SHA-256 of each declared artifact before boot. |
| `require_jailer: bool` | Default `true`; `create()` refuses to proceed without the jailer unless explicitly overridden. |
| `allow_no_manifest: bool` | Default `false`; the two-flag opt-out (`*_ALLOW_NO_MANIFEST=1` AND `*_ALLOW_NO_MANIFEST_REALLY=1`) is needed to flip this on. |
| `enable_network: bool` | TAP + nftables egress filter. Default `true` on Linux, `false` elsewhere. |
| `allow_no_vsock: bool` | Default `false`. When `true`, the vsock exit-code wait is bounded by `no_vsock_timeout`, so a misconfigured kernel surfaces as a `forced` termination instead of an indefinite hang. |
| `no_vsock_timeout: Duration` | Wait budget when `allow_no_vsock` is true (default 5 s). |
| `no_seccomp: bool` | Pass `--no-seccomp` to Firecracker (arm64 emulation / Rosetta workaround). Never set in production. |
## Architecture / how it works
**Lifecycle per cell.**
`create(spec)`:
1. Generate the per-cell HMAC key from `/dev/urandom`
(`generate_exit_hmac_key`, `src/lib.rs:161-168`).
2. Verify the artifact manifest if `manifest_path` is set, populating
`kernel_digest_sha256` / `rootfs_digest_sha256` /
`firecracker_digest_sha256` on the returned handle.
3. Consult the warm pool (`pool::checkout`). On a hit, `PUT /snapshot/load`
with the snapshot + mem-file paths and skip the `PUT /machine-config`
and `PUT /boot-source` calls. On a miss (or with the pool disabled),
follow the cold-boot path.
4. Stage the chroot (`<chroot_base>/firecracker/<cell_id>/root/...`),
hard-link or copy the rootfs and optional scratch image into the
chroot, generate the per-cell nftables ruleset from
`spec.authority.egress`, bring up the TAP, spawn `firecracker
--api-sock <socket>` under the jailer (or directly if
`*_ALLOW_NO_JAILER=1`), wait for the API socket to appear (max
`SOCKET_READY_TIMEOUT = 10s`, `src/lib.rs:77`), then configure the VM
through the Management API.
5. Bind the per-cell exit-code UDS (`<vsock_uds_path>_9000`) and
telemetry UDS (`<vsock_uds_path>_9001` — owned by
[`cellos-host-telemetry`](../cellos-host-telemetry) but rendezvoused
in the same socket dir) BEFORE booting the guest.
6. `PUT /actions InstanceStart`. The guest's `cellos-init` reads
`cellos.argv=<base64-json>` from `/proc/cmdline`, forks the workload,
then writes a 36-byte authenticated frame
(`4-byte LE i32 || 32-byte HMAC-SHA256 tag`) back to
`<vsock_uds_path>_9000` before powering off (`src/lib.rs:134-148`).
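The ordering in step 5 matters because Firecracker forwards a guest-initiated vsock connection on port N to the host Unix socket `<uds_path>_N`; if nothing is listening there when `cellos-init` connects, the exit frame has nowhere to go. A minimal, blocking, `std`-only sketch of the host side follows. The helper names are hypothetical, and unlike a production listener it accepts exactly one connection and applies no timeout.

```rust
use std::io::{self, Read};
use std::os::unix::net::UnixListener;
use std::path::{Path, PathBuf};

const VSOCK_EXIT_PORT: u32 = 9000;
/// 4-byte little-endian i32 exit code followed by a 32-byte HMAC-SHA256 tag.
const EXIT_FRAME_LEN: usize = 4 + 32;

/// `<vsock_uds_path>_<port>`: where Firecracker delivers guest-initiated
/// connections for that vsock port.
fn port_uds_path(vsock_uds_path: &Path, port: u32) -> PathBuf {
    let mut s = vsock_uds_path.as_os_str().to_owned();
    s.push(format!("_{}", port));
    PathBuf::from(s)
}

/// Bind the exit-code socket; this must happen before `InstanceStart`.
fn bind_exit_listener(vsock_uds_path: &Path) -> io::Result<UnixListener> {
    let path = port_uds_path(vsock_uds_path, VSOCK_EXIT_PORT);
    // Remove a stale socket file from a previous run; "not found" is fine.
    match std::fs::remove_file(&path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    UnixListener::bind(&path)
}

/// Accept one guest connection and read exactly one frame.
fn read_exit_frame(listener: &UnixListener) -> io::Result<[u8; EXIT_FRAME_LEN]> {
    let (mut stream, _addr) = listener.accept()?;
    let mut frame = [0u8; EXIT_FRAME_LEN];
    stream.read_exact(&mut frame)?;
    Ok(frame)
}
```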
`wait_for_in_vm_exit(cell_id)` reads the 36-byte frame, recomputes the
HMAC with the per-cell key, and rejects a mismatched tag in
constant time via `hmac::Mac::verify_slice` (`src/lib.rs:172-218`).
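The verification step splits the 36-byte frame, rebuilds the MAC input `exit_code_bytes ‖ cell_id_bytes`, and compares tags without exiting early. In the sketch below, HMAC-SHA256 is injected as a `compute_tag` closure so the example needs only `std`, and all names are hypothetical. The hand-written `ct_eq` only illustrates the idea; a vetted routine such as `hmac::Mac::verify_slice`, which the crate uses, is the better choice in real code.

```rust
const EXIT_CODE_LEN: usize = 4;
const TAG_LEN: usize = 32;
const FRAME_LEN: usize = EXIT_CODE_LEN + TAG_LEN;

#[derive(Debug, PartialEq)]
enum FrameError {
    WrongLength(usize),
    BadTag,
}

/// Split a frame into the LE exit-code bytes and the received tag.
fn split_frame(frame: &[u8]) -> Result<([u8; EXIT_CODE_LEN], [u8; TAG_LEN]), FrameError> {
    if frame.len() != FRAME_LEN {
        return Err(FrameError::WrongLength(frame.len()));
    }
    let mut code = [0u8; EXIT_CODE_LEN];
    let mut tag = [0u8; TAG_LEN];
    code.copy_from_slice(&frame[..EXIT_CODE_LEN]);
    tag.copy_from_slice(&frame[EXIT_CODE_LEN..]);
    Ok((code, tag))
}

/// MAC input binds the exit code to this cell: exit_code_bytes || cell_id_bytes.
fn mac_input(code: &[u8; EXIT_CODE_LEN], cell_id: &str) -> Vec<u8> {
    let mut msg = Vec::with_capacity(EXIT_CODE_LEN + cell_id.len());
    msg.extend_from_slice(code);
    msg.extend_from_slice(cell_id.as_bytes());
    msg
}

/// Accumulate differences over every byte instead of returning at the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

/// `compute_tag` stands in for HMAC-SHA256 keyed with the per-cell key.
fn verify_frame<F>(frame: &[u8], cell_id: &str, compute_tag: F) -> Result<i32, FrameError>
where
    F: Fn(&[u8]) -> [u8; TAG_LEN],
{
    let (code, received) = split_frame(frame)?;
    let msg = mac_input(&code, cell_id);
    let expected = compute_tag(msg.as_slice());
    if ct_eq(&expected, &received) {
        Ok(i32::from_le_bytes(code))
    } else {
        Err(FrameError::BadTag)
    }
}
```

Because the cell id is part of the MAC input, a frame replayed from a different cell fails verification even when its exit code is identical.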
`destroy(handle)` sends `SendCtrlAltDel` via the Management API, waits
up to the per-cell graceful-shutdown budget
(`spec.run.limits.graceful_shutdown_seconds`, defaulting to
`GRACEFUL_SHUTDOWN_TIMEOUT = 5s` per `src/lib.rs:84` and
`resolve_graceful_shutdown_timeout` at `src/lib.rs:93`), then sends
SIGKILL to the VMM. The TAP device and nftables table are removed, the
chroot is deleted, and the warm-pool slot transitions to `Empty` so the
background filler can re-snapshot it.
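The stop sequence in `destroy()` is a bounded wait followed by a hard kill. The sketch below models it with `std::process::Child` and assumes the graceful request (`SendCtrlAltDel`) has already been sent. `shutdown_budget` mirrors the documented default, but it is not the crate's `resolve_graceful_shutdown_timeout`, whose handling of edge cases such as a zero budget is not described here.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

const GRACEFUL_SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(5);

/// Per-cell budget: the spec value if present, else the crate-wide default.
fn shutdown_budget(graceful_shutdown_seconds: Option<u64>) -> Duration {
    graceful_shutdown_seconds
        .map(Duration::from_secs)
        .unwrap_or(GRACEFUL_SHUTDOWN_TIMEOUT)
}

#[derive(Debug)]
enum Shutdown {
    Graceful(ExitStatus),
    Killed(ExitStatus),
}

/// Poll the VMM child until it exits or the budget runs out, then kill it.
fn wait_or_kill(child: &mut Child, budget: Duration) -> io::Result<Shutdown> {
    let deadline = Instant::now() + budget;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Shutdown::Graceful(status));
        }
        if Instant::now() >= deadline {
            break;
        }
        thread::sleep(Duration::from_millis(20));
    }
    child.kill()?; // SIGKILL on Unix
    let status = child.wait()?; // reap the child so no zombie is left behind
    Ok(Shutdown::Killed(status))
}
```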
**Warm pool (L2-06-2, `src/pool.rs`).** Cold boot is ~125 ms; restore is
~10 ms. Slot lifecycle is `Empty --fill()--> Available --checkout()-->
InUse --checkin()--> Empty`. `checkin` returns to `Empty` (not
`Available`) by design: a VM that ran a cell is no longer at the
parked-init snapshot state. A background task re-fills slots after
`destroy()`. The fill task is spawned at supervisor startup when the env
var resolves > 0 (`crates/cellos-supervisor/src/composition.rs:1051-1075`).
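The slot lifecycle can be modeled as a small state machine. `WarmPool`, `Slot` and `PoolError` below are hypothetical stand-ins, not the crate's `FirecrackerPool` / `PoolSlot`. They show why a zero-size pool is an always-miss pass-through and why `checkin` can never produce an `Available` slot.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Slot {
    Empty,
    Available,
    InUse,
}

#[derive(Debug, PartialEq, Eq)]
enum PoolError {
    NoSuchSlot(usize),
    InvalidTransition { from: Slot, op: &'static str },
}

struct WarmPool {
    slots: Vec<Slot>,
}

impl WarmPool {
    fn new(size: usize) -> Self {
        Self { slots: vec![Slot::Empty; size] }
    }

    /// fill(): Empty -> Available, once a parked-init snapshot exists.
    fn fill(&mut self, i: usize) -> Result<(), PoolError> {
        self.transition(i, Slot::Empty, Slot::Available, "fill")
    }

    /// checkout(): first Available slot -> InUse; `None` is a miss (cold boot).
    fn checkout(&mut self) -> Option<usize> {
        let i = self.slots.iter().position(|s| *s == Slot::Available)?;
        self.slots[i] = Slot::InUse;
        Some(i)
    }

    /// checkin(): InUse -> Empty. The VM has run a cell, so it is no longer
    /// at the parked-init snapshot state and must be re-filled.
    fn checkin(&mut self, i: usize) -> Result<(), PoolError> {
        self.transition(i, Slot::InUse, Slot::Empty, "checkin")
    }

    fn available(&self) -> usize {
        self.slots.iter().filter(|s| **s == Slot::Available).count()
    }

    fn transition(&mut self, i: usize, from: Slot, to: Slot, op: &'static str) -> Result<(), PoolError> {
        let slot = self.slots.get_mut(i).ok_or(PoolError::NoSuchSlot(i))?;
        if *slot != from {
            return Err(PoolError::InvalidTransition { from: *slot, op });
        }
        *slot = to;
        Ok(())
    }
}
```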
**HMAC exit auth (FC-18).** Without authentication, any process inside the
guest with vsock access could spoof a "successful" exit. The
36-byte frame (`EXIT_AUTHED_FRAME_LEN`, `src/lib.rs:147`) commits the
exit code AND the cell id under the per-cell key generated at
`create()`, and the host checks the tag in constant time. The
verification helper (`src/lib.rs:187-218`) is `pub(crate)`; tests reach
it through a doc-hidden `__fc18` shim.
**Manifest verification (FC-08).** When `CELLOS_FIRECRACKER_MANIFEST`
points to a v1 manifest file, `create()` re-hashes the kernel, rootfs,
and Firecracker binary before boot. Digest mismatch is a hard error.
The opt-out is intentionally two flags — see "Configuration" below.
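The fail-closed shape of FC-08 can be sketched as below. The `ManifestEntry` type is hypothetical, since the v1 manifest format is not described in this README, and the SHA-256 function is injected as a closure so the example stays `std`-only; in practice it would hash the file contents. Any unreadable artifact or digest mismatch aborts before boot, and the collected digests correspond to the `*_digest_sha256` values recorded on the handle.

```rust
use std::collections::BTreeMap;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical manifest entry: artifact path plus expected hex SHA-256.
struct ManifestEntry {
    path: PathBuf,
    sha256_hex: String,
}

#[derive(Debug, PartialEq)]
enum ManifestError {
    Unreadable(PathBuf),
    DigestMismatch { path: PathBuf, expected: String, actual: String },
}

/// Re-hash every declared artifact; the first failure aborts (fail closed).
fn verify_manifest<F>(entries: &[ManifestEntry], hash_file: F) -> Result<BTreeMap<PathBuf, String>, ManifestError>
where
    F: Fn(&Path) -> io::Result<String>,
{
    let mut digests = BTreeMap::new();
    for e in entries {
        let actual = hash_file(e.path.as_path())
            .map_err(|_| ManifestError::Unreadable(e.path.clone()))?;
        // Hex digests may differ only in letter case.
        if !actual.eq_ignore_ascii_case(&e.sha256_hex) {
            return Err(ManifestError::DigestMismatch {
                path: e.path.clone(),
                expected: e.sha256_hex.clone(),
                actual,
            });
        }
        digests.insert(e.path.clone(), actual);
    }
    Ok(digests)
}
```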
**Jailer.** The jailer drops to `jailer_uid` / `jailer_gid` (10002 by
default) and chroots into `<chroot_base>/firecracker/<cell_id>/root`.
The API socket then lives at
`<chroot_base>/firecracker/<cell_id>/root/run/firecracker.socket`
(`src/lib.rs:244-247`). The require-jailer flag is on by default and
flipping it off requires the explicit `*_ALLOW_NO_JAILER=1` opt-out
that emits a loud warning (`src/lib.rs:268-269`).
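The jailer path arithmetic follows Firecracker's jailer layout, `<chroot_base>/<exec-file name>/<id>/root`. The sketch below builds that path and a jailer `Command` using the jailer's `--id`, `--exec-file`, `--uid`, `--gid` and `--chroot-base-dir` flags, with arguments after `--` forwarded to Firecracker. The exact argument list this crate passes (cgroups, network namespace, daemonization and so on) is not documented here, so the helper names and flag set are illustrative only.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// `<chroot_base>/firecracker/<cell_id>/root`: the per-cell jail root.
fn jail_root(chroot_base: &Path, cell_id: &str) -> PathBuf {
    chroot_base.join("firecracker").join(cell_id).join("root")
}

/// Host-visible path of the API socket Firecracker creates at
/// `/run/firecracker.socket` inside the chroot.
fn jailed_api_socket(chroot_base: &Path, cell_id: &str) -> PathBuf {
    jail_root(chroot_base, cell_id).join("run/firecracker.socket")
}

/// Build (but do not spawn) a jailer invocation for one cell.
fn jailer_command(jailer: &Path, firecracker: &Path, chroot_base: &Path, cell_id: &str, uid: u32, gid: u32) -> Command {
    let mut cmd = Command::new(jailer);
    cmd.arg("--id").arg(cell_id)
        .arg("--exec-file").arg(firecracker)
        .arg("--uid").arg(uid.to_string())
        .arg("--gid").arg(gid.to_string())
        .arg("--chroot-base-dir").arg(chroot_base)
        // Everything after `--` is handed to Firecracker itself; the socket
        // path is resolved relative to the chroot.
        .arg("--")
        .arg("--api-sock").arg("/run/firecracker.socket");
    cmd
}
```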
**Non-Linux stub.** Outside Linux, the `CellBackend` impl returns
`CellosError::Host` from every method; tests can still link the crate
on macOS / Windows hosts (`src/lib.rs:1334-...`).
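The compile-time split can be illustrated with paired `#[cfg]` items: both variants share a signature, so callers type-check on every OS, and only the Linux variant does real work. `SketchBackend` and `HostError` are hypothetical; the real stub returns `CellosError::Host`.

```rust
#[derive(Debug, PartialEq)]
enum HostError {
    Unsupported(&'static str),
}

struct SketchBackend;

impl SketchBackend {
    #[cfg(target_os = "linux")]
    fn create(&self, cell_id: &str) -> Result<String, HostError> {
        // A real backend would stage the chroot and boot the VMM here.
        Ok(format!("vm-{}", cell_id))
    }

    #[cfg(not(target_os = "linux"))]
    fn create(&self, _cell_id: &str) -> Result<String, HostError> {
        Err(HostError::Unsupported("firecracker backend requires Linux"))
    }
}
```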
## Configuration
All env vars are read by `FirecrackerConfig::from_env()` (or the
`from_lookup` helper for tests). Required paths must be absolute; a
missing one fails closed at init.
| Variable | Default | Meaning |
|---|---|---|
| `CELLOS_CELL_BACKEND` | unset | Set to `firecracker` to select this backend. |
| `CELLOS_FIRECRACKER_BINARY` | **required** | Absolute path to the Firecracker VMM binary. |
| `CELLOS_FIRECRACKER_KERNEL_IMAGE` | **required** | Absolute path to the kernel image. |
| `CELLOS_FIRECRACKER_ROOTFS_IMAGE` | **required** | Absolute path to the rootfs ext4 image. |
| `CELLOS_FIRECRACKER_JAILER_BINARY` | unset | Absolute path to the jailer; presence enables jailer mode. |
| `CELLOS_FIRECRACKER_CHROOT_BASE` | `/var/lib/cellos/firecracker` | Chroot base for the jailer. |
| `CELLOS_FIRECRACKER_SOCKET_DIR` | `/tmp` | API socket dir when the jailer is OFF. |
| `CELLOS_FIRECRACKER_JAILER_UID` / `_GID` | `10002` / `10002` | Drop-to ids inside the jailer. Must be non-root in production. |
| `CELLOS_FIRECRACKER_SCRATCH_DIR` | unset | When set, rootfs is mounted read-only and a writable scratch ext4 is attached as a second virtio-blk. |
| `CELLOS_FIRECRACKER_MANIFEST` | unset | Path to the v1 artifact manifest. Required unless the two-flag opt-out is set. |
| `CELLOS_FIRECRACKER_REQUIRE_JAILER` | `true` | Explicit override for the require-jailer flag. |
| `CELLOS_FIRECRACKER_ALLOW_NO_JAILER` | `0` | Dev opt-out for the jailer. Emits a loud warning. |
| `CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST` | `0` | First half of the two-flag manifest opt-out. |
| `CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST_REALLY` | `0` | Second half. Both must be `1`; emits a `MANIFEST VERIFICATION DISABLED` warning. Setting only one is rejected. Setting these *and* `CELLOS_FIRECRACKER_MANIFEST` is rejected. |
| `CELLOS_FIRECRACKER_ENABLE_NETWORK` | `1` on Linux, `0` elsewhere | TAP + nftables egress filter. |
| `CELLOS_FIRECRACKER_ALLOW_NO_VSOCK` | `0` | When `1`, bound the vsock exit-code wait instead of waiting indefinitely. |
| `CELLOS_FIRECRACKER_NO_VSOCK_TIMEOUT_SECS` | `5` | Wait budget in seconds when `CELLOS_FIRECRACKER_ALLOW_NO_VSOCK=1`. |
| `CELLOS_FIRECRACKER_NO_SECCOMP` | `0` | Pass `--no-seccomp` to Firecracker. arm64-emulation workaround; never in production. |
| `CELLOS_FIRECRACKER_POOL_SIZE` | `0` | Warm-pool slot count. `0` disables the pool. |
The two-flag manifest opt-out exists because a single env var can be set
in a base image, a Helm chart copied between environments, or an `.env`
file leaking from dev to prod by mistake. Requiring a paired
`_REALLY=1` forces the operator to make the trade-off explicit on the
same line, in the same operation (`src/lib.rs:280-285`).
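The two-flag rule reads naturally as a match over three inputs. This sketch assumes a flag counts as set only when its value is exactly `1`, and that setting *either* opt-out flag together with `CELLOS_FIRECRACKER_MANIFEST` is rejected. The `lookup` closure plays the role the `from_lookup` helper plays in tests, but `resolve_manifest_policy` and its signature are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum ManifestPolicy {
    Verify(String),
    Disabled,
}

/// Assumption: only the exact value "1" turns a flag on.
fn flag_on<F: Fn(&str) -> Option<String>>(lookup: &F, key: &str) -> bool {
    lookup(key).as_deref() == Some("1")
}

fn resolve_manifest_policy<F: Fn(&str) -> Option<String>>(lookup: F) -> Result<ManifestPolicy, String> {
    let manifest = lookup("CELLOS_FIRECRACKER_MANIFEST");
    let allow = flag_on(&lookup, "CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST");
    let really = flag_on(&lookup, "CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST_REALLY");
    match (manifest, allow, really) {
        // A manifest plus any opt-out flag is contradictory: refuse.
        (Some(_), true, _) | (Some(_), _, true) => {
            Err("manifest path and opt-out flags are mutually exclusive".into())
        }
        (Some(path), false, false) => Ok(ManifestPolicy::Verify(path)),
        (None, true, true) => {
            eprintln!("WARNING: MANIFEST VERIFICATION DISABLED");
            Ok(ManifestPolicy::Disabled)
        }
        // Half an opt-out is treated as a mistake, not as consent.
        (None, true, false) | (None, false, true) => Err("both opt-out flags must be 1".into()),
        (None, false, false) => Err("CELLOS_FIRECRACKER_MANIFEST is required".into()),
    }
}
```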
## Examples
```rust
use std::sync::Arc;
use cellos_core::ports::CellBackend;
use cellos_host_firecracker::{FirecrackerCellBackend, FirecrackerConfig};
// From env (validates required paths up front; fails fast on misconfig).
let backend: Arc<dyn CellBackend> = Arc::new(FirecrackerCellBackend::from_env()?);
// Or build the config first (here also from env) and pass it to `new`.
let cfg = FirecrackerConfig::from_env()?;
let backend: Arc<dyn CellBackend> = Arc::new(FirecrackerCellBackend::new(cfg));
// Optional: attach an EventSink so `cell.firecracker.v1.pool_checkout`
// CloudEvents flow alongside the regular lifecycle stream.
// let backend = FirecrackerCellBackend::from_env()?.with_event_sink(sink);
# Ok::<(), cellos_core::CellosError>(())
```
Warm-pool sizing:
```bash
export CELLOS_FIRECRACKER_POOL_SIZE=8
# Supervisor spawns one background fill task at startup.
# pool::pool_size_from_env() resolves the value at backend construction.
```
## Testing
```bash
# Unit tests (always run, no KVM required).
cargo test -p cellos-host-firecracker
# Integration suite — requires Linux + /dev/kvm + opt-in.
cargo test -p cellos-host-firecracker -- --ignored
```
The `tests/` directory contains 41 integration tests. The
`#[ignore]`-gated subset needs Linux, `/dev/kvm`, the jailer binary,
`CAP_NET_ADMIN` for nftables / TAP manipulation, and (in some cases) the
artifact manifest. Highlights:
| Test file(s) | What they check |
|---|---|
| `firecracker_e2e_exit_42.rs` | FC-16 canonical exit-42 e2e canary — a real workload returns 42 over the authenticated vsock frame. |
| `vsock_exit_auth.rs` | FC-18 HMAC-authenticated exit code — rejects forged tags. |
| `vsock_recv_upper_bound.rs` | The 4 + 32 byte frame is the upper bound on what `wait_for_in_vm_exit` accepts. |
| `firecracker_table_ip6_enforcement.rs` / `firecracker_ipv6_isolation.rs` / `nft_egress_drop.rs` | nftables egress drops undeclared destinations (v4 + v6). |
| `jailer_isolation.rs` / `jailer_failure_isolation.rs` / `fc64_jailer_chroot_escape_rejected.rs` | Jailer chroot and capability-bounded execution. |
| `boot_soak_50.rs` | 50 cold boots end-to-end. Heavy — opt-in only via `--ignored`. |
| `cleanup_on_crash.rs` / `cleanup_fault_injection.rs` / `cleanup_leak_check.rs` / `scratch_cleanup.rs` / `tap_device_lifecycle.rs` | Teardown invariants — no leaked TAPs, sockets, scratch images, or VMM processes. |
| `fc51_manifest_failed_emission.rs` / `firecracker_manifest_failed_e2e.rs` | FC-08 manifest digest enforcement. |
| `fc52_oom_enforcement.rs` / `fc53_vcpu_quota.rs` / `fc54_ttl_enforcement.rs` | Resource-limit enforcement (memory OOM, vCPU quota derived from `spec.run.limits.cpu_max`, TTL). |
| `fc55_orphan_vm_reaping.rs` / `fc56_fd_leak_bound.rs` / `fc57_socket_leak_bound.rs` | Reaping + FD / socket leak bounds. |
| `fc59_kernel_panic_handled.rs` / `fc60_init_segfault_handled.rs` / `fc61_rootfs_corruption_handled.rs` / `fc62_vsock_recv_hang.rs` / `fc63_vmm_crash_mid_run.rs` | Fault classes — none of these may produce a silent `Success` terminal state. |
The `cleanup_on_crash.rs` file panics with a stable `FC-37 GAP:` marker
on the `reconcile_orphans` slots so a future implementer can lift the
`#[ignore]` without re-discovering the contract.
A small subset (e.g. `host_capabilities_smoke.rs`) runs on every CI leg
without `--ignored`.
## Related crates
- [`cellos-core`](../cellos-core) — the `CellBackend` trait, `CellHandle`,
`TeardownReport`, `EgressRule`, `ExecutionCellDocument`,
`ExecutionCellSpec`, `CellosError`.
- [`cellos-init`](../cellos-init) — the in-guest PID-1 that reads
`cellos.argv=<b64>` from `/proc/cmdline`, forks the workload, computes
the HMAC tag, and writes the 36-byte frame back over vsock.
- [`cellos-host-telemetry`](../cellos-host-telemetry) — pairs the
`_9000` exit-code UDS this crate owns with the `_9001` telemetry UDS
in the same socket dir.
- [`cellos-host-stub`](../cellos-host-stub) — no-op backend used in
unit tests of the supervisor pipeline.
- [`cellos-host-gvisor`](../cellos-host-gvisor) — `runsc` alternative
for hosts without KVM.
- [`cellos-supervisor`](../cellos-supervisor) — selects this backend
with `CELLOS_CELL_BACKEND=firecracker` and spawns the warm-pool fill
task.
## ADRs
- [ADR-0001 — Rust + NATS/JetStream + proprietary host](../../docs/adr/0001-rust-nats-jetstream-proprietary-host.md)
— the surrounding stack commitments that frame Firecracker as one
isolation primitive among several.
- [ADR-0004 — TLS termination fronting / trust boundary](../../docs/adr/0004-tls-termination-fronting-trust-boundary.md)
— TLS termination is an egress-edge concern, not in-VM, so this
backend does not interpose on TLS.
- [ADR-0005 — TLS termination design](../../docs/adr/0005-tls-termination-design.md)
— the design that backend authors must respect: egress goes through
declared rules; this crate translates them into nftables.
- [ADR-0006 — In-VM observability runner evidence](../../docs/adr/0006-in-vm-observability-runner-evidence.md)
— the receiver side (`cellos-host-telemetry`) is the host half of the
observability system this backend ships the channel for.