cellos-host-firecracker 0.5.1

Firecracker microVM backend for CellOS — jailer integration, warm pool with snapshot/restore, KVM nested-virtualisation aware.

cellos-host-firecracker

The Firecracker microVM [CellBackend] — L2-06 production host backend. Boots one Firecracker VMM per cell behind a jailer and a per-cell manifest, runs spec.run.argv under cellos-init, and reports authenticated exit codes back to the supervisor over a per-cell vsock UDS.

What it is

cellos-host-firecracker implements [cellos_core::ports::CellBackend] on top of a real Firecracker VMM. Each cell is one VMM child process, one chroot under jailer, one ext4 rootfs (read-only when a scratch drive is attached), one writable virtio-blk scratch image (optional), one TAP interface + nftables egress ruleset (Linux only), and one vsock-backed Unix Domain Socket family (_9000 for exit code, _9001 for telemetry).

It is the L2 (host runtime / isolation) production target named in LAYERS.md: "Map an authority bundle to real isolation". The supervisor selects it with CELLOS_CELL_BACKEND=firecracker (crates/cellos-supervisor/src/composition.rs:1031). The backend constructor takes a [FirecrackerConfig] (which can be populated from env via FirecrackerConfig::from_env()); env validation is strict because every flag here trades isolation for ergonomics and we want production misconfigurations to fail closed.

The crate is split across three source files (the crate root plus two public modules):

  • lib.rs — [FirecrackerConfig], [FirecrackerCellBackend], the CellBackend impl, the in-VM exit-code bridge, the HMAC verification helpers, jailer / chroot wiring, manifest verification, vsock UDS listener (listen_for_exit_code), TAP + nftables setup, scratch ext4 provisioning, graceful-shutdown plumbing, FD-leak guards.
  • api_client.rs — minimal hyper client for the Firecracker Management API (PUT /machine-config, PUT /boot-source, PUT /drives/{...}, PUT /network-interfaces/{...}, PUT /vsock, PUT /actions, PUT /snapshot/create, PUT /snapshot/load, PATCH /vm).
  • pool.rs — [FirecrackerPool], the warm pool state machine (snapshot → restore) for fast cell startup. The pool is opt-in via CELLOS_FIRECRACKER_POOL_SIZE; default 0 means a zero-slot no-op pool that always misses, so wiring is an inert pass-through on the cold-boot path.

What it deliberately does not do:

  • It does not run spec.run.argv as a host subprocess. The argv is base64-encoded into the kernel cmdline as cellos.argv=<b64>; cellos-init reads /proc/cmdline, forks/execs inside the guest, and the supervisor reads the exit code back over vsock. The host-subprocess fallback in cellos-supervisor is shadowed when this backend reports an in-VM exit (src/lib.rs:1-31).
  • It does not trust the guest's exit code unauthenticated. The 4-byte little-endian i32 is followed by a 32-byte HMAC-SHA256 tag over exit_code_bytes ‖ cell_id_bytes keyed by a per-cell 32-byte key read from /dev/urandom at create() (FC-18, src/lib.rs:134-218).
  • It does not boot non-Linux. The whole backend body is #[cfg(target_os = "linux")]; on Windows/macOS the struct still constructs (so the supervisor composition root type-checks everywhere) but every CellBackend method returns an Unsupported-shaped CellosError::Host (src/lib.rs:632-639).
  • It does not terminate TLS in-VM. Per ADR-0004 / ADR-0005, TLS termination lives on the host's egress path, not inside the cell.
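As a rough illustration of the guest side of the first point, the sketch below pulls a `key=value` token out of a kernel command line, the way an init process might when looking for `cellos.argv`. The helper name `cmdline_param` is hypothetical and is not taken from cellos-init; base64 and JSON decoding of the value are deliberately left out.

```rust
/// Return the value of the first `key=value` token on a kernel command line.
/// Hypothetical helper for illustration only; not cellos-init's actual API.
fn cmdline_param<'a>(cmdline: &'a str, key: &str) -> Option<&'a str> {
    cmdline
        .split_ascii_whitespace()
        // Require an exact key match followed by '=' so that
        // `cellos.argvx=...` is not mistaken for `cellos.argv=...`.
        .find_map(|tok| tok.strip_prefix(key)?.strip_prefix('='))
}
```

In a guest, the input would be the contents of /proc/cmdline; the returned slice is still base64 and has to be decoded before exec.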

Public API surface

| Item | Where |
|---|---|
| pub const VSOCK_EXIT_PORT: u32 = 9000 | src/lib.rs:134 |
| pub struct FirecrackerConfig { ... 16 fields ... } | src/lib.rs:238 |
| FirecrackerConfig::from_env() -> Result<Self, CellosError> | src/lib.rs:333 |
| pub struct FirecrackerCellBackend | src/lib.rs:640 |
| FirecrackerCellBackend::new(FirecrackerConfig) | src/lib.rs:663 |
| FirecrackerCellBackend::from_env() -> Result<Self, CellosError> | src/lib.rs:687 |
| FirecrackerCellBackend::with_event_sink(Arc<dyn EventSink>) -> Self | src/lib.rs:701 |
| FirecrackerCellBackend::config(&self) -> &FirecrackerConfig | src/lib.rs:706 |
| FirecrackerCellBackend::pool_size(&self) -> usize (async) | src/lib.rs:715 |
| FirecrackerCellBackend::pool_available(&self) -> usize (async) | src/lib.rs:728 |
| impl CellBackend for FirecrackerCellBackend | src/lib.rs:831 (Linux), src/lib.rs:1334 (non-Linux stub) |
| pub mod api_client: FirecrackerApiClient, BootSource, Drive, NetworkInterface, VsockDevice, MachineConfig, InstanceAction, InstanceActionType, SnapshotCreate, SnapshotLoad, MemBackend, MemBackendType, SnapshotType, VmState, VmStatePatch | src/api_client.rs |
| pub mod pool: FirecrackerPool, PoolSlot::{Available,InUse,Empty}, POOL_SIZE_ENV, pool_size_from_env() | src/pool.rs |
| pub const pool::POOL_SIZE_ENV = "CELLOS_FIRECRACKER_POOL_SIZE" | src/pool.rs:62 |
| pub(crate) fn verify_exit_hmac(key, exit_code_bytes, cell_id, received_tag) -> bool | src/lib.rs:187 |

FirecrackerConfig (in declaration order, src/lib.rs:238-330):

| Field | Meaning |
|---|---|
| binary_path: PathBuf | Firecracker VMM binary. |
| kernel_image_path: PathBuf | Kernel image (vmlinux). |
| rootfs_image_path: PathBuf | Rootfs ext4 image. |
| jailer_binary_path: Option<PathBuf> | When present, create() execs via the jailer. |
| chroot_base_dir: PathBuf | Jailer chroot base (default /var/lib/cellos/firecracker). |
| socket_dir: PathBuf | API socket dir when the jailer is off (default /tmp). |
| jailer_uid: u32, jailer_gid: u32 | Drop-to ids (default 10002 / 10002). |
| scratch_dir: Option<PathBuf> | When set, rootfs is read-only and a writable scratch ext4 is attached as a second virtio-blk. |
| manifest_path: Option<PathBuf> | Artifact manifest. create() verifies SHA-256 of each declared artifact before boot. |
| require_jailer: bool | Default true; create() refuses to proceed without the jailer unless explicitly overridden. |
| allow_no_manifest: bool | Default false; the two-flag opt-out (*_ALLOW_NO_MANIFEST=1 AND *_ALLOW_NO_MANIFEST_REALLY=1) is needed to flip this on. |
| enable_network: bool | TAP + nftables egress filter. Default true on Linux, false elsewhere. |
| allow_no_vsock: bool | Bounded-timeout vsock wait (default 5s); surfaces a misconfigured kernel as forced termination instead of an indefinite hang. |
| no_vsock_timeout: Duration | Wait budget when allow_no_vsock is true. |
| no_seccomp: bool | Pass --no-seccomp to Firecracker (arm64 emulation / Rosetta workaround). Never set in production. |

Architecture / how it works

Lifecycle per cell.

create(spec):

  1. Generate the per-cell HMAC key by reading 32 bytes from /dev/urandom (generate_exit_hmac_key, src/lib.rs:161-168).
  2. Verify the artifact manifest if manifest_path is set, populating kernel_digest_sha256 / rootfs_digest_sha256 / firecracker_digest_sha256 on the returned handle.
  3. Consult the warm pool (pool::checkout); on a hit, PUT /snapshot/load with the snapshot + mem-file paths and skip the PUT /machine-config + PUT /boot-source arc. On a miss (or pool disabled), follow the cold-boot path.
  4. Stage the chroot (<chroot_base>/firecracker/<cell_id>/root/...), hard-link / copy the rootfs and optional scratch image into the chroot, generate per-cell nftables ruleset from spec.authority.egress, bring up the TAP, spawn firecracker --api-sock <socket> under the jailer (or directly if *_ALLOW_NO_JAILER=1), wait for the API socket to appear (max SOCKET_READY_TIMEOUT = 10s, src/lib.rs:77), drive the Management API.
  5. Bind the per-cell exit-code UDS (<vsock_uds_path>_9000) and telemetry UDS (<vsock_uds_path>_9001 — owned by cellos-host-telemetry but rendezvoused in the same socket dir) BEFORE booting the guest.
  6. PUT /actions InstanceStart. The guest's cellos-init reads cellos.argv=<base64-json> from /proc/cmdline, forks the workload, then writes a 36-byte authenticated frame (4-byte LE i32 || 32-byte HMAC-SHA256 tag) back to <vsock_uds_path>_9000 before powering off (src/lib.rs:134-148).

wait_for_in_vm_exit(cell_id) reads the 36-byte frame, recomputes the HMAC with the per-cell key, and rejects a mismatched tag in constant time via hmac::Mac::verify_slice (src/lib.rs:172-218).
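The frame layout described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's code: `parse_exit_frame` and `tags_equal` are hypothetical names, the HMAC-SHA256 computation itself is omitted (it needs the hmac and sha2 crates), and `tags_equal` is a hand-rolled stand-in for the constant-time check the crate gets from hmac::Mac::verify_slice.

```rust
/// 4-byte little-endian i32 exit code followed by a 32-byte tag.
const EXIT_AUTHED_FRAME_LEN: usize = 36;

/// Split a frame into (exit_code, tag). Any length other than 36 is rejected.
/// Hypothetical helper; the real parsing lives inside the backend.
fn parse_exit_frame(frame: &[u8]) -> Option<(i32, [u8; 32])> {
    if frame.len() != EXIT_AUTHED_FRAME_LEN {
        return None;
    }
    let code = i32::from_le_bytes([frame[0], frame[1], frame[2], frame[3]]);
    let mut tag = [0u8; 32];
    tag.copy_from_slice(&frame[4..]);
    Some((code, tag))
}

/// Compare two tags without an early exit: OR together the XOR of every
/// byte pair so the loop always touches all 32 positions.
fn tags_equal(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}
```

A caller would recompute the expected tag over the exit-code bytes and the cell id with the per-cell key, then accept the exit code only if the comparison succeeds.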

destroy(handle) sends SendCtrlAltDel via the Management API, waits up to the per-cell graceful-shutdown budget (spec.run.limits.graceful_shutdown_seconds, defaulting to GRACEFUL_SHUTDOWN_TIMEOUT = 5s per src/lib.rs:84 and resolve_graceful_shutdown_timeout at src/lib.rs:93), then SIGKILLs. TAP + nftables table are removed; the chroot is unlinked; the warm-pool slot transitions to Empty and the background filler can re-snapshot.
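A minimal sketch of the "graceful first, then force" budget, assuming the spec value arrives as an optional number of seconds. The signature of `resolve_graceful_shutdown_timeout` shown here is a guess, and `exited_within` is a hypothetical helper; the real teardown talks to the Management API and signals the VMM process, which this sketch replaces with a caller-supplied closure.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Crate default quoted in the text above (5 s).
const GRACEFUL_SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(5);

/// Use the per-cell budget when the spec sets one, else the default.
/// Assumed signature; the real function may take the spec itself.
fn resolve_graceful_shutdown_timeout(spec_seconds: Option<u64>) -> Duration {
    spec_seconds
        .map(Duration::from_secs)
        .unwrap_or(GRACEFUL_SHUTDOWN_TIMEOUT)
}

/// Poll `has_exited` until it reports true or `budget` elapses.
/// Returns false when the caller should escalate to a forced kill.
fn exited_within<F: FnMut() -> bool>(mut has_exited: F, budget: Duration, poll: Duration) -> bool {
    let deadline = Instant::now() + budget;
    loop {
        if has_exited() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```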

Warm pool (L2-06-2, src/pool.rs). Cold boot is ~125 ms; restore is ~10 ms. Slot lifecycle is Empty --fill()--> Available --checkout()--> InUse --checkin()--> Empty. checkin returns to Empty (not Available) by design: a VM that ran a cell is no longer at the parked-init snapshot state. A background task re-fills slots after destroy(). The fill task is spawned at supervisor startup when the env var resolves > 0 (crates/cellos-supervisor/src/composition.rs:1051-1075).
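The slot lifecycle can be modelled as a small state machine. The sketch below is a simplification under assumptions: the real PoolSlot variants presumably carry snapshot paths and VM handles, the real methods are async, and the `Pool` type with its method signatures is hypothetical. It only demonstrates the transitions, in particular that checkin lands on Empty rather than Available.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PoolSlot {
    Empty,
    Available,
    InUse,
}

/// Hypothetical stand-in for FirecrackerPool, tracking slot states only.
struct Pool {
    slots: Vec<PoolSlot>,
}

impl Pool {
    fn new(size: usize) -> Self {
        Pool { slots: vec![PoolSlot::Empty; size] }
    }

    /// Empty -> Available for every empty slot; returns how many were filled.
    /// In the real pool this is where a parked VM would be snapshotted.
    fn fill(&mut self) -> usize {
        let mut filled = 0;
        for slot in self.slots.iter_mut() {
            if *slot == PoolSlot::Empty {
                *slot = PoolSlot::Available;
                filled += 1;
            }
        }
        filled
    }

    /// Available -> InUse. None is a pool miss (cold-boot path).
    fn checkout(&mut self) -> Option<usize> {
        let idx = self.slots.iter().position(|s| *s == PoolSlot::Available)?;
        self.slots[idx] = PoolSlot::InUse;
        Some(idx)
    }

    /// InUse -> Empty. A used VM is never handed out again as-is.
    fn checkin(&mut self, idx: usize) -> bool {
        match self.slots.get_mut(idx) {
            Some(slot) if *slot == PoolSlot::InUse => {
                *slot = PoolSlot::Empty;
                true
            }
            _ => false,
        }
    }

    fn available(&self) -> usize {
        self.slots.iter().filter(|s| **s == PoolSlot::Available).count()
    }
}
```

With a size of 0 the vector is empty, so fill() does nothing and checkout() always misses, which matches the inert default described earlier.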

HMAC exit auth (FC-18). Without authentication, anything inside the guest with vsock access could spoof a "successful" exit. The 36-byte frame (EXIT_AUTHED_FRAME_LEN, src/lib.rs:147) commits the exit code AND the cell id under a key only the host knows. Both ends use constant-time compare. The verification helper (src/lib.rs:187-218) is pub(crate); tests reach it through a doc-hidden __fc18 shim.

Manifest verification (FC-08). When CELLOS_FIRECRACKER_MANIFEST points to a v1 manifest file, create() re-hashes the kernel, rootfs, and Firecracker binary before boot. Digest mismatch is a hard error. The opt-out is intentionally two flags — see "Configuration" below.

Jailer. The jailer drops to jailer_uid / jailer_gid (10002 by default) and chroots into <chroot_base>/firecracker/<cell_id>/root. The API socket then lives at <chroot_base>/firecracker/<cell_id>/root/run/firecracker.socket (src/lib.rs:244-247). The require-jailer flag is on by default and flipping it off requires the explicit *_ALLOW_NO_JAILER=1 opt-out that emits a loud warning (src/lib.rs:268-269).
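The path layout above is easy to get subtly wrong, so here is a small sketch of how the chroot root and the host-side API socket path compose. The function names are hypothetical. The id check mirrors the jailer's documented constraint on its --id argument (alphanumerics and hyphens, at most 64 characters); whether this crate performs the same check is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Accept only ids that are safe to use as a single path component.
fn is_valid_jail_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 64
        && id.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}

/// <chroot_base>/firecracker/<cell_id>/root, or None for an unsafe id.
fn jail_root(chroot_base: &Path, cell_id: &str) -> Option<PathBuf> {
    if !is_valid_jail_id(cell_id) {
        return None;
    }
    Some(chroot_base.join("firecracker").join(cell_id).join("root"))
}

/// Host-side view of the API socket inside the chroot.
fn api_socket_path(chroot_base: &Path, cell_id: &str) -> Option<PathBuf> {
    jail_root(chroot_base, cell_id).map(|root| root.join("run").join("firecracker.socket"))
}
```

Note that with the default chroot base the resulting path contains `firecracker` twice: once from the base directory and once from the per-binary component.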

Non-Linux stub. Outside Linux, the CellBackend impl returns CellosError::Host from every method; tests can still link the crate on macOS / Windows hosts (src/lib.rs:1334-...).
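The stub pattern can be sketched as two cfg-gated bodies behind one method signature. Everything here is illustrative: `Backend`, `HostError`, and the returned strings are placeholders, not the crate's CellBackend impl or CellosError.

```rust
/// Placeholder error type standing in for an Unsupported-shaped host error.
#[derive(Debug, PartialEq)]
enum HostError {
    Unsupported(&'static str),
}

struct Backend;

impl Backend {
    /// Linux build: the real body would stage the chroot and boot the VMM.
    #[cfg(target_os = "linux")]
    fn create(&self, cell_id: &str) -> Result<String, HostError> {
        Ok(format!("would boot a microVM for {}", cell_id))
    }

    /// Every other target: same signature, so callers type-check everywhere,
    /// but the call fails instead of pretending to isolate anything.
    #[cfg(not(target_os = "linux"))]
    fn create(&self, _cell_id: &str) -> Result<String, HostError> {
        Err(HostError::Unsupported("firecracker backend requires Linux"))
    }
}
```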

Configuration

All env vars are read by FirecrackerConfig::from_env() (or the from_lookup helper for tests). Required absolute paths fail-closed at init if absent.

| Env var | Default | Effect |
|---|---|---|
| CELLOS_CELL_BACKEND | unset | Set to firecracker to select this backend. |
| CELLOS_FIRECRACKER_BINARY | required | Absolute path to the Firecracker VMM binary. |
| CELLOS_FIRECRACKER_KERNEL_IMAGE | required | Absolute path to the kernel image. |
| CELLOS_FIRECRACKER_ROOTFS_IMAGE | required | Absolute path to the rootfs ext4 image. |
| CELLOS_FIRECRACKER_JAILER_BINARY | unset | Absolute path to the jailer; presence enables jailer mode. |
| CELLOS_FIRECRACKER_CHROOT_BASE | /var/lib/cellos/firecracker | Chroot base for the jailer. |
| CELLOS_FIRECRACKER_SOCKET_DIR | /tmp | API socket dir when the jailer is OFF. |
| CELLOS_FIRECRACKER_JAILER_UID / _GID | 10002 / 10002 | Drop-to ids inside the jailer. Must be non-root in production. |
| CELLOS_FIRECRACKER_SCRATCH_DIR | unset | When set, rootfs is mounted read-only and a writable scratch ext4 is attached as a second virtio-blk. |
| CELLOS_FIRECRACKER_MANIFEST | unset | Path to the v1 artifact manifest. Required unless the two-flag opt-out is set. |
| CELLOS_FIRECRACKER_REQUIRE_JAILER | true | Explicit override for the require-jailer flag. |
| CELLOS_FIRECRACKER_ALLOW_NO_JAILER | 0 | Dev opt-out for the jailer. Emits a loud warning. |
| CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST | 0 | First half of the two-flag manifest opt-out. |
| CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST_REALLY | 0 | Second half. Both must be 1; emits a MANIFEST VERIFICATION DISABLED warning. Setting only one is rejected. Setting these and CELLOS_FIRECRACKER_MANIFEST is rejected. |
| CELLOS_FIRECRACKER_ENABLE_NETWORK | 1 on Linux, 0 elsewhere | TAP + nftables egress filter. |
| CELLOS_FIRECRACKER_ALLOW_NO_VSOCK | 0 | Bounded-timeout vsock wait. |
| CELLOS_FIRECRACKER_NO_VSOCK_TIMEOUT_SECS | 5 | Wait budget when allow_no_vsock is 1. |
| CELLOS_FIRECRACKER_NO_SECCOMP | 0 | Pass --no-seccomp to Firecracker. arm64-emulation workaround; never in production. |
| CELLOS_FIRECRACKER_POOL_SIZE | 0 | Warm-pool slot count. 0 disables the pool. |
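To make "fail closed" concrete, here is a sketch of lookup-driven parsing in the spirit of the from_lookup helper mentioned above. The helper names, the String error type, and the accepted boolean spellings are assumptions made for illustration; the crate's own validation and its CellosError variants may differ.

```rust
use std::path::PathBuf;

/// Read a required variable as an absolute path.
/// Missing and relative values are both errors, never silent defaults.
fn required_abs_path<F>(lookup: F, key: &str) -> Result<PathBuf, String>
where
    F: Fn(&str) -> Option<String>,
{
    let raw = lookup(key).ok_or_else(|| format!("{} is required", key))?;
    let path = PathBuf::from(raw);
    if path.is_absolute() {
        Ok(path)
    } else {
        Err(format!("{} must be an absolute path", key))
    }
}

/// Read a boolean flag, falling back to `default` only when it is unset.
/// Unrecognised spellings are rejected rather than treated as false.
fn flag<F>(lookup: F, key: &str, default: bool) -> Result<bool, String>
where
    F: Fn(&str) -> Option<String>,
{
    match lookup(key).as_deref() {
        None => Ok(default),
        Some("1") | Some("true") => Ok(true),
        Some("0") | Some("false") => Ok(false),
        Some(other) => Err(format!("{} has unrecognised value {:?}", key, other)),
    }
}
```

Taking the lookup as a closure keeps the parsing testable without touching the process environment; production code would pass something like `|k| std::env::var(k).ok()`.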

The two-flag manifest opt-out exists because a single env var can be set in a base image, a Helm chart copied between environments, or an .env file leaking from dev to prod by mistake. Requiring a paired _REALLY=1 forces the operator to make the trade-off explicit on the same line, in the same operation (src/lib.rs:280-285).
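The rules in the table reduce to a small decision function. The sketch below is one way to encode them, with hypothetical names (`ManifestPolicy`, `manifest_policy`) and plain String errors. Rejecting a single opt-out flag combined with a manifest path is an interpretation of "setting only one is rejected", not something the text spells out.

```rust
#[derive(Debug, PartialEq)]
enum ManifestPolicy {
    /// Verify artifacts against the manifest at this path.
    Verify(String),
    /// Both opt-out flags were set; verification is off.
    Disabled,
}

/// Decide the manifest policy from the three inputs, failing closed on
/// every ambiguous or contradictory combination.
fn manifest_policy(manifest: Option<&str>, allow: bool, really: bool) -> Result<ManifestPolicy, String> {
    match (manifest, allow, really) {
        // A manifest plus any opt-out flag is contradictory.
        (Some(_), true, _) | (Some(_), _, true) => {
            Err("opt-out flags conflict with CELLOS_FIRECRACKER_MANIFEST".to_string())
        }
        (Some(path), false, false) => Ok(ManifestPolicy::Verify(path.to_string())),
        (None, true, true) => Ok(ManifestPolicy::Disabled),
        // Half an opt-out is treated as a mistake, not as consent.
        (None, true, false) | (None, false, true) => {
            Err("both ALLOW_NO_MANIFEST and ALLOW_NO_MANIFEST_REALLY must be 1".to_string())
        }
        (None, false, false) => Err("CELLOS_FIRECRACKER_MANIFEST is required".to_string()),
    }
}
```

Only two of the eight input combinations succeed, which is the point of the design: every other state is an error at init.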

Examples

```rust
use std::sync::Arc;
use cellos_core::ports::CellBackend;
use cellos_host_firecracker::{FirecrackerCellBackend, FirecrackerConfig};

// From env (validates required paths up front; fails fast on misconfig).
let backend: Arc<dyn CellBackend> = Arc::new(FirecrackerCellBackend::from_env()?);

// Or build the config as a separate step and pass it to new().
let cfg = FirecrackerConfig::from_env()?;
let backend: Arc<dyn CellBackend> = Arc::new(FirecrackerCellBackend::new(cfg));

// Optional: attach an EventSink so `cell.firecracker.v1.pool_checkout`
// CloudEvents flow alongside the regular lifecycle stream.
// let backend = FirecrackerCellBackend::from_env()?.with_event_sink(sink);
# Ok::<(), cellos_core::CellosError>(())
```

Warm-pool sizing:

```sh
export CELLOS_FIRECRACKER_POOL_SIZE=8
# Supervisor spawns one background fill task at startup.
# pool::pool_size_from_env() resolves the value at backend construction.
```

Testing

```sh
# Unit tests (always run, no KVM required).
cargo test -p cellos-host-firecracker

# Integration suite: requires Linux + /dev/kvm + opt-in.
cargo test -p cellos-host-firecracker -- --ignored
```

The tests/ directory contains 41 integration tests. The #[ignore]-gated subset needs Linux, /dev/kvm, the jailer binary, CAP_NET_ADMIN for nftables / TAP manipulation, and (in some cases) the artifact manifest. Highlights:

| Test | What it pins |
|---|---|
| firecracker_e2e_exit_42.rs | FC-16 canonical exit-42 e2e canary: a real workload returns 42 over the authenticated vsock frame. |
| vsock_exit_auth.rs | FC-18 HMAC-authenticated exit code; rejects forged tags. |
| vsock_recv_upper_bound.rs | The 4 + 32 byte frame is the upper bound on what wait_for_in_vm_exit accepts. |
| firecracker_table_ip6_enforcement.rs / firecracker_ipv6_isolation.rs / nft_egress_drop.rs | nftables egress drops undeclared destinations (v4 + v6). |
| jailer_isolation.rs / jailer_failure_isolation.rs / fc64_jailer_chroot_escape_rejected.rs | Jailer chroot and capability-bounded execution. |
| boot_soak_50.rs | 50 cold boots end-to-end. Heavy; opt-in only via --ignored. |
| cleanup_on_crash.rs / cleanup_fault_injection.rs / cleanup_leak_check.rs / scratch_cleanup.rs / tap_device_lifecycle.rs | Teardown invariants: no leaked TAPs, sockets, scratch images, or VMM processes. |
| fc51_manifest_failed_emission.rs / firecracker_manifest_failed_e2e.rs | FC-08 manifest digest enforcement. |
| fc52_oom_enforcement.rs / fc53_vcpu_quota.rs / fc54_ttl_enforcement.rs | Resource-limit enforcement (memory OOM, vCPU quota derived from spec.run.limits.cpu_max, TTL). |
| fc55_orphan_vm_reaping.rs / fc56_fd_leak_bound.rs / fc57_socket_leak_bound.rs | Reaping + FD / socket leak bounds. |
| fc59_kernel_panic_handled.rs / fc60_init_segfault_handled.rs / fc61_rootfs_corruption_handled.rs / fc62_vsock_recv_hang.rs / fc63_vmm_crash_mid_run.rs | Fault classes: none of these may produce a silent Success terminal state. |

The cleanup_on_crash.rs file panics with a stable FC-37 GAP: marker on the reconcile_orphans slots so a future implementer can lift the #[ignore] without re-discovering the contract.

A small subset (e.g. host_capabilities_smoke.rs) runs on every CI leg without --ignored.

Related crates

  • cellos-core — the CellBackend trait, CellHandle, TeardownReport, EgressRule, ExecutionCellDocument, ExecutionCellSpec, CellosError.
  • cellos-init — the in-guest PID-1 that reads cellos.argv=<b64> from /proc/cmdline, forks the workload, computes the HMAC tag, and writes the 36-byte frame back over vsock.
  • cellos-host-telemetry — pairs the _9000 exit-code UDS this crate owns with the _9001 telemetry UDS in the same socket dir.
  • cellos-host-stub — no-op backend used in unit tests of the supervisor pipeline.
  • cellos-host-gvisor: runsc alternative for hosts without KVM.
  • cellos-supervisor — selects this backend with CELLOS_CELL_BACKEND=firecracker and spawns the warm-pool fill task.

ADRs