cellos-host-firecracker
The Firecracker microVM [CellBackend] — L2-06 production host backend.
Boots one Firecracker VMM per cell behind a jailer and a per-cell
manifest, runs spec.run.argv under cellos-init, and reports
authenticated exit codes back to the supervisor over a per-cell vsock
UDS.
What it is
cellos-host-firecracker implements [cellos_core::ports::CellBackend]
on top of a real Firecracker VMM. Each cell is one VMM child process,
one chroot under jailer, one ext4 rootfs (read-only when a scratch
drive is attached), one writable virtio-blk scratch image (optional),
one TAP interface + nftables egress ruleset (Linux only), and one
vsock-backed Unix Domain Socket family (_9000 for exit code,
_9001 for telemetry).
It is the L2 (host runtime / isolation) production target named in
LAYERS.md: "Map an authority bundle to real
isolation". The supervisor selects it with CELLOS_CELL_BACKEND=firecracker
(crates/cellos-supervisor/src/composition.rs:1031). The backend
constructor takes a [FirecrackerConfig] (which can be populated from
env via FirecrackerConfig::from_env()); env validation is strict
because every flag here trades isolation for ergonomics and we want
production misconfigurations to fail closed.
The crate has three sub-modules:
- lib.rs: [FirecrackerConfig], [FirecrackerCellBackend], the CellBackend impl, the in-VM exit-code bridge, the HMAC verification helpers, jailer / chroot wiring, manifest verification, vsock UDS listener (listen_for_exit_code), TAP + nftables setup, scratch ext4 provisioning, graceful-shutdown plumbing, FD-leak guards.
- api_client.rs: minimal hyper client for the Firecracker Management API (PUT /machine-config, PUT /boot-source, PUT /drives/{...}, PUT /network-interfaces/{...}, PUT /vsock, PUT /actions, PUT /snapshot/create, PUT /snapshot/load, PATCH /vm).
- pool.rs: [FirecrackerPool], the warm pool state machine (snapshot → restore) for fast cell startup. The pool is opt-in via CELLOS_FIRECRACKER_POOL_SIZE; default 0 means a zero-slot no-op pool that always misses, so wiring is an inert pass-through on the cold-boot path.
What it deliberately does not do:
- It does not run spec.run.argv as a host subprocess. The argv is base64-encoded into the kernel cmdline as cellos.argv=<b64>; cellos-init reads /proc/cmdline, forks/execs inside the guest, and the supervisor reads the exit code back over vsock. The host-subprocess fallback in cellos-supervisor is shadowed when this backend reports an in-VM exit (src/lib.rs:1-31).
- It does not trust an unauthenticated exit code from the guest. The 4-byte little-endian i32 is followed by a 32-byte HMAC-SHA256 tag over exit_code_bytes ‖ cell_id_bytes, keyed by a per-cell 32-byte key read from /dev/urandom at create() (FC-18, src/lib.rs:134-218).
- It does not boot on non-Linux hosts. The whole backend body is #[cfg(target_os = "linux")]; on Windows/macOS the struct still constructs (so the supervisor composition root type-checks everywhere) but every CellBackend method returns an Unsupported-shaped CellosError::Host (src/lib.rs:632-639).
- It does not terminate TLS in-VM. Per ADR-0004 / ADR-0005, TLS termination lives on the host's egress path, not inside the cell.
Public API surface
| Item | Where |
|---|---|
| pub const VSOCK_EXIT_PORT: u32 = 9000 | src/lib.rs:134 |
| pub struct FirecrackerConfig { ... 13 fields ... } | src/lib.rs:238 |
| FirecrackerConfig::from_env() -> Result<Self, CellosError> | src/lib.rs:333 |
| pub struct FirecrackerCellBackend | src/lib.rs:640 |
| FirecrackerCellBackend::new(FirecrackerConfig) | src/lib.rs:663 |
| FirecrackerCellBackend::from_env() -> Result<Self, CellosError> | src/lib.rs:687 |
| FirecrackerCellBackend::with_event_sink(Arc<dyn EventSink>) -> Self | src/lib.rs:701 |
| FirecrackerCellBackend::config(&self) -> &FirecrackerConfig | src/lib.rs:706 |
| FirecrackerCellBackend::pool_size(&self) -> usize (async) | src/lib.rs:715 |
| FirecrackerCellBackend::pool_available(&self) -> usize (async) | src/lib.rs:728 |
| impl CellBackend for FirecrackerCellBackend | src/lib.rs:831 (Linux), src/lib.rs:1334 (non-Linux stub) |
| pub mod api_client: FirecrackerApiClient, BootSource, Drive, NetworkInterface, VsockDevice, MachineConfig, InstanceAction, InstanceActionType, SnapshotCreate, SnapshotLoad, MemBackend, MemBackendType, SnapshotType, VmState, VmStatePatch | src/api_client.rs |
| pub mod pool: FirecrackerPool, PoolSlot::{Available,InUse,Empty}, POOL_SIZE_ENV, pool_size_from_env() | src/pool.rs |
| pub const pool::POOL_SIZE_ENV = "CELLOS_FIRECRACKER_POOL_SIZE" | src/pool.rs:62 |
| pub(crate) fn verify_exit_hmac(key, exit_code_bytes, cell_id, received_tag) -> bool | src/lib.rs:187 |
FirecrackerConfig (in declaration order, src/lib.rs:238-330):
| Field | Meaning |
|---|---|
| binary_path: PathBuf | Firecracker VMM binary. |
| kernel_image_path: PathBuf | Kernel image (vmlinux). |
| rootfs_image_path: PathBuf | Rootfs ext4 image. |
| jailer_binary_path: Option<PathBuf> | When present, create() execs via the jailer. |
| chroot_base_dir: PathBuf | Jailer chroot base (default /var/lib/cellos/firecracker). |
| socket_dir: PathBuf | API socket dir when jailer is off (default /tmp). |
| jailer_uid: u32, jailer_gid: u32 | Drop-to ids (default 10002 / 10002). |
| scratch_dir: Option<PathBuf> | When set, rootfs is read-only and a writable scratch ext4 is attached as a second virtio-blk. |
| manifest_path: Option<PathBuf> | Artifact manifest. create() verifies SHA-256 of each declared artifact before boot. |
| require_jailer: bool | Default true; create() refuses to proceed without the jailer unless explicitly overridden. |
| allow_no_manifest: bool | Default false; the two-flag opt-out (*_ALLOW_NO_MANIFEST=1 AND *_ALLOW_NO_MANIFEST_REALLY=1) is needed to flip this on. |
| enable_network: bool | TAP + nftables egress filter. Default true on Linux, false elsewhere. |
| allow_no_vsock: bool | Bounded-timeout vsock wait (default 5s); surfaces a misconfigured kernel as forced termination instead of an indefinite hang. |
| no_vsock_timeout: Duration | Wait budget when allow_no_vsock is true. |
| no_seccomp: bool | Pass --no-seccomp to Firecracker (arm64 emulation / Rosetta workaround). Never set in production. |
Architecture / how it works
Lifecycle per cell.
create(spec):
- Compute the per-cell HMAC key from /dev/urandom (generate_exit_hmac_key, src/lib.rs:161-168).
- Verify the artifact manifest if manifest_path is set, populating kernel_digest_sha256 / rootfs_digest_sha256 / firecracker_digest_sha256 on the returned handle.
- Consult the warm pool (pool::checkout); on a hit, PUT /snapshot/load with the snapshot + mem-file paths and skip the PUT /machine-config → PUT /boot-source arc. On a miss (or pool disabled), follow the cold-boot path.
- Stage the chroot (<chroot_base>/firecracker/<cell_id>/root/...), hard-link / copy the rootfs and optional scratch image into the chroot, generate per-cell nftables ruleset from spec.authority.egress, bring up the TAP, spawn firecracker --api-sock <socket> under the jailer (or directly if *_ALLOW_NO_JAILER=1), wait for the API socket to appear (max SOCKET_READY_TIMEOUT = 10s, src/lib.rs:77), drive the Management API.
- Bind the per-cell exit-code UDS (<vsock_uds_path>_9000) and telemetry UDS (<vsock_uds_path>_9001, owned by cellos-host-telemetry but rendezvoused in the same socket dir) BEFORE booting the guest.
- PUT /actions InstanceStart. The guest's cellos-init reads cellos.argv=<base64-json> from /proc/cmdline, forks the workload, then writes a 36-byte authenticated frame (4-byte LE i32 || 32-byte HMAC-SHA256 tag) back to <vsock_uds_path>_9000 before powering off (src/lib.rs:134-148).
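The argv hand-off in the last step can be sketched from both sides. This is a std-only illustration, not the crate's code: the helper names b64_encode, build_argv_param and find_argv_param are hypothetical, and the padded standard base64 alphabet is an assumption (the text above only says "base64").

```rust
// Illustrative sketch. Helper names and the padded standard alphabet are
// assumptions; they are not taken from cellos-host-firecracker or cellos-init.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with '=' padding.
fn b64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Missing bytes in the final chunk are treated as zero.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Host side: wrap an already-serialized argv JSON document as a cmdline parameter.
fn build_argv_param(argv_json: &str) -> String {
    format!("cellos.argv={}", b64_encode(argv_json.as_bytes()))
}

/// Guest side: pick the parameter value out of a /proc/cmdline-style string.
fn find_argv_param(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|tok| tok.strip_prefix("cellos.argv="))
}

fn main() {
    let param = build_argv_param(r#"["/bin/true"]"#);
    let cmdline = format!("console=ttyS0 reboot=k {}", param);
    println!("{:?}", find_argv_param(&cmdline));
}
```

Splitting on whitespace is safe here because base64 output never contains spaces, which is presumably one reason to encode the argv before placing it on the kernel command line.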
wait_for_in_vm_exit(cell_id) reads the 36-byte frame, recomputes the
HMAC with the per-cell key, and rejects a mismatched tag in
constant time via hmac::Mac::verify_slice (src/lib.rs:172-218).
destroy(handle) sends SendCtrlAltDel via the Management API, waits
up to the per-cell graceful-shutdown budget
(spec.run.limits.graceful_shutdown_seconds, defaulting to
GRACEFUL_SHUTDOWN_TIMEOUT = 5s per src/lib.rs:84 and
resolve_graceful_shutdown_timeout at src/lib.rs:93), then SIGKILLs.
TAP + nftables table are removed; the chroot is unlinked; the warm-pool
slot transitions to Empty and the background filler can re-snapshot.
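The budget resolution described for destroy() amounts to "per-cell override, else the 5 s default". A minimal sketch, assuming the override arrives as an optional number of seconds; resolve_timeout is a hypothetical stand-in and does not reproduce the real resolve_graceful_shutdown_timeout signature, which is not shown here.

```rust
use std::time::Duration;

// Mirrors the 5 s default named in the text; the real constant's type is an assumption.
const GRACEFUL_SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical helper: the per-cell value wins when present, otherwise the default applies.
fn resolve_timeout(graceful_shutdown_seconds: Option<u64>) -> Duration {
    graceful_shutdown_seconds
        .map(Duration::from_secs)
        .unwrap_or(GRACEFUL_SHUTDOWN_TIMEOUT)
}

fn main() {
    println!("default:  {:?}", resolve_timeout(None));
    println!("override: {:?}", resolve_timeout(Some(30)));
}
```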
Warm pool (L2-06-2, src/pool.rs). Cold boot is ~125 ms; restore is
~10 ms. Slot lifecycle is Empty --fill()--> Available --checkout()--> InUse --checkin()--> Empty. checkin returns to Empty (not
Available) by design: a VM that ran a cell is no longer at the
parked-init snapshot state. A background task re-fills slots after
destroy(). The fill task is spawned at supervisor startup when the env
var resolves > 0 (crates/cellos-supervisor/src/composition.rs:1051-1075).
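The slot lifecycle can be modelled as a three-state machine. The sketch below is illustrative only: SlotState and its methods are hypothetical names, and the real PoolSlot in src/pool.rs may carry data (snapshot paths, handles) that is omitted here.

```rust
/// Hypothetical stand-in for the pool's slot states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SlotState {
    Empty,
    Available,
    InUse,
}

impl SlotState {
    /// Empty -> Available: a snapshot has been parked in the slot.
    fn fill(self) -> Result<SlotState, &'static str> {
        match self {
            SlotState::Empty => Ok(SlotState::Available),
            _ => Err("fill requires an Empty slot"),
        }
    }

    /// Available -> InUse: a cell restored from the snapshot.
    fn checkout(self) -> Result<SlotState, &'static str> {
        match self {
            SlotState::Available => Ok(SlotState::InUse),
            _ => Err("checkout requires an Available slot"),
        }
    }

    /// InUse -> Empty (not Available): the VM is no longer at the parked state.
    fn checkin(self) -> Result<SlotState, &'static str> {
        match self {
            SlotState::InUse => Ok(SlotState::Empty),
            _ => Err("checkin requires an InUse slot"),
        }
    }
}

fn main() {
    let slot = SlotState::Empty;
    let slot = slot.fill().and_then(SlotState::checkout).and_then(SlotState::checkin);
    println!("{:?}", slot);
}
```

Returning an error for out-of-order transitions keeps a double checkout or a checkin of an unused slot from silently corrupting the pool's accounting.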
HMAC exit auth (FC-18). Without authentication, anything inside the
guest with vsock access could spoof a "successful" exit. The
36-byte frame (EXIT_AUTHED_FRAME_LEN, src/lib.rs:147) commits the
exit code AND the cell id under a key only the host knows. Both ends use
constant-time compare. The verification helper
(src/lib.rs:187-218) is pub(crate); tests reach it through a
doc-hidden __fc18 shim.
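The frame layout and the comparison discipline can be sketched without any crypto dependency. This is an assumption-laden illustration: split_frame and ct_eq are hypothetical helpers, ct_eq stands in for hmac::Mac::verify_slice, and the HMAC-SHA256 computation over exit_code_bytes ‖ cell_id_bytes that would produce the expected tag is deliberately left out.

```rust
const EXIT_AUTHED_FRAME_LEN: usize = 36; // 4-byte LE i32 + 32-byte tag

/// Split a frame into (exit code, tag). Anything but exactly 36 bytes is rejected.
fn split_frame(frame: &[u8]) -> Option<(i32, [u8; 32])> {
    if frame.len() != EXIT_AUTHED_FRAME_LEN {
        return None;
    }
    let code = i32::from_le_bytes([frame[0], frame[1], frame[2], frame[3]]);
    let mut tag = [0u8; 32];
    tag.copy_from_slice(&frame[4..]);
    Some((code, tag))
}

/// Constant-time equality: fold all byte differences together, no early exit.
fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    // A made-up frame: exit code 42 followed by a placeholder tag.
    let mut frame = Vec::with_capacity(EXIT_AUTHED_FRAME_LEN);
    frame.extend_from_slice(&42i32.to_le_bytes());
    frame.extend_from_slice(&[7u8; 32]);

    let (code, tag) = split_frame(&frame).expect("36-byte frame");
    // In the real flow the right-hand side would be the host-computed HMAC tag.
    println!("exit code {} accepted: {}", code, ct_eq(&tag, &[7u8; 32]));
}
```

The exit code should only be surfaced after the tag check passes; a caller that reads the code first and verifies later reintroduces the spoofing window the frame exists to close.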
Manifest verification (FC-08). When CELLOS_FIRECRACKER_MANIFEST
points to a v1 manifest file, create() re-hashes the kernel, rootfs,
and Firecracker binary before boot. Digest mismatch is a hard error.
The opt-out is intentionally two flags — see "Configuration" below.
Jailer. The jailer drops to jailer_uid / jailer_gid (10002 by
default) and chroots into <chroot_base>/firecracker/<cell_id>/root.
The API socket then lives at
<chroot_base>/firecracker/<cell_id>/root/run/firecracker.socket
(src/lib.rs:244-247). The require-jailer flag is on by default and
flipping it off requires the explicit *_ALLOW_NO_JAILER=1 opt-out
that emits a loud warning (src/lib.rs:268-269).
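The path layout above is plain path joining. A minimal sketch, assuming the cell id is already validated; chroot_root and api_socket are hypothetical helper names, not functions from this crate.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: <chroot_base>/firecracker/<cell_id>/root
fn chroot_root(chroot_base: &Path, cell_id: &str) -> PathBuf {
    chroot_base.join("firecracker").join(cell_id).join("root")
}

/// Hypothetical helper: the API socket as seen from the host side of the chroot.
fn api_socket(chroot_base: &Path, cell_id: &str) -> PathBuf {
    chroot_root(chroot_base, cell_id)
        .join("run")
        .join("firecracker.socket")
}

fn main() {
    // Default chroot base from the configuration table; "cell-1" is a made-up id.
    let base = Path::new("/var/lib/cellos/firecracker");
    println!("{}", api_socket(base, "cell-1").display());
}
```

Note that Path::join replaces the whole path when given an absolute component, so a real implementation would need to reject cell ids containing path separators or "..". The sketch does not do that.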
Non-Linux stub. Outside Linux, the CellBackend impl returns
CellosError::Host from every method; tests can still link the crate
on macOS / Windows hosts (src/lib.rs:1334-...).
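The stub pattern can be illustrated with a pair of cfg-gated methods. Everything in this sketch is hypothetical (HostError, Backend, the returned strings); it only shows how one type can compile on every platform while doing real work on Linux alone.

```rust
/// Hypothetical error type standing in for CellosError::Host.
#[derive(Debug, PartialEq)]
enum HostError {
    Unsupported(&'static str),
}

struct Backend;

impl Backend {
    /// Linux build: the real work would happen here.
    #[cfg(target_os = "linux")]
    fn create(&self) -> Result<&'static str, HostError> {
        Ok("booted")
    }

    /// Every other target: same signature, always an error.
    #[cfg(not(target_os = "linux"))]
    fn create(&self) -> Result<&'static str, HostError> {
        Err(HostError::Unsupported("firecracker backend requires Linux"))
    }
}

fn main() {
    println!("{:?}", Backend.create());
}
```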
Configuration
All env vars are read by FirecrackerConfig::from_env() (or the
from_lookup helper for tests). Required absolute paths fail-closed at
init if absent.
| Env var | Default | Effect |
|---|---|---|
| CELLOS_CELL_BACKEND | unset | Set to firecracker to select this backend. |
| CELLOS_FIRECRACKER_BINARY | required | Absolute path to the Firecracker VMM binary. |
| CELLOS_FIRECRACKER_KERNEL_IMAGE | required | Absolute path to the kernel image. |
| CELLOS_FIRECRACKER_ROOTFS_IMAGE | required | Absolute path to the rootfs ext4 image. |
| CELLOS_FIRECRACKER_JAILER_BINARY | unset | Absolute path to the jailer; presence enables jailer mode. |
| CELLOS_FIRECRACKER_CHROOT_BASE | /var/lib/cellos/firecracker | Chroot base for the jailer. |
| CELLOS_FIRECRACKER_SOCKET_DIR | /tmp | API socket dir when the jailer is OFF. |
| CELLOS_FIRECRACKER_JAILER_UID / _GID | 10002 / 10002 | Drop-to ids inside the jailer. Must be non-root in production. |
| CELLOS_FIRECRACKER_SCRATCH_DIR | unset | When set, rootfs is mounted read-only and a writable scratch ext4 is attached as a second virtio-blk. |
| CELLOS_FIRECRACKER_MANIFEST | unset | Path to the v1 artifact manifest. Required unless the two-flag opt-out is set. |
| CELLOS_FIRECRACKER_REQUIRE_JAILER | true | Explicit override for the require-jailer flag. |
| CELLOS_FIRECRACKER_ALLOW_NO_JAILER | 0 | Dev opt-out for the jailer. Emits a loud warning. |
| CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST | 0 | First half of the two-flag manifest opt-out. |
| CELLOS_FIRECRACKER_ALLOW_NO_MANIFEST_REALLY | 0 | Second half. Both must be 1; emits a MANIFEST VERIFICATION DISABLED warning. Setting only one is rejected. Setting these and CELLOS_FIRECRACKER_MANIFEST is rejected. |
| CELLOS_FIRECRACKER_ENABLE_NETWORK | 1 on Linux, 0 elsewhere | TAP + nftables egress filter. |
| CELLOS_FIRECRACKER_ALLOW_NO_VSOCK | 0 | Bounded-timeout vsock wait. |
| CELLOS_FIRECRACKER_NO_VSOCK_TIMEOUT_SECS | 5 | Wait budget when allow_no_vsock is 1. |
| CELLOS_FIRECRACKER_NO_SECCOMP | 0 | Pass --no-seccomp to Firecracker. arm64-emulation workaround; never in production. |
| CELLOS_FIRECRACKER_POOL_SIZE | 0 | Warm-pool slot count. 0 disables the pool. |
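Fail-closed handling of the required path variables can be sketched with an injectable lookup, in the spirit of the from_lookup test helper. The function name required_abs_path, its signature, and the error strings are assumptions made for illustration; the example path is made up.

```rust
use std::path::PathBuf;

/// Hypothetical helper: `lookup` stands in for the process environment so
/// tests can supply values without mutating real env vars.
fn required_abs_path<F>(lookup: F, key: &str) -> Result<PathBuf, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Missing variable: refuse to continue rather than guess a default.
    let raw = lookup(key).ok_or_else(|| format!("{} is required", key))?;
    let path = PathBuf::from(raw);
    if path.is_absolute() {
        Ok(path)
    } else {
        Err(format!("{} must be an absolute path", key))
    }
}

fn main() {
    // A fake environment with one illustrative entry.
    let env = |k: &str| match k {
        "CELLOS_FIRECRACKER_BINARY" => Some("/usr/bin/firecracker".to_string()),
        _ => None,
    };
    println!("{:?}", required_abs_path(env, "CELLOS_FIRECRACKER_BINARY"));
    println!("{:?}", required_abs_path(env, "CELLOS_FIRECRACKER_KERNEL_IMAGE"));
}
```

In production wiring the closure would typically be `|k| std::env::var(k).ok()`.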
The two-flag manifest opt-out exists because a single env var can be set
in a base image, a Helm chart copied between environments, or an .env
file leaking from dev to prod by mistake. Requiring a paired
_REALLY=1 forces the operator to make the trade-off explicit on the
same line, in the same operation (src/lib.rs:280-285).
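The rules for the manifest opt-out reduce to a small decision table. The sketch below encodes the combinations stated in the configuration table; ManifestPolicy, resolve_manifest_policy, and the error strings are hypothetical, and the warning emission is omitted.

```rust
/// Hypothetical outcome type for the manifest decision.
#[derive(Debug, PartialEq)]
enum ManifestPolicy {
    Verify(String),
    Disabled,
}

/// `allow` and `really` model *_ALLOW_NO_MANIFEST and *_ALLOW_NO_MANIFEST_REALLY.
fn resolve_manifest_policy(
    manifest: Option<&str>,
    allow: bool,
    really: bool,
) -> Result<ManifestPolicy, &'static str> {
    match (manifest, allow, really) {
        // Opting out while also naming a manifest is contradictory.
        (Some(_), true, true) => Err("opt-out flags conflict with a configured manifest"),
        // Half an opt-out is treated as a mistake, not as consent.
        (_, true, false) | (_, false, true) => Err("both opt-out flags must be set together"),
        (Some(path), false, false) => Ok(ManifestPolicy::Verify(path.to_string())),
        (None, true, true) => Ok(ManifestPolicy::Disabled),
        (None, false, false) => Err("manifest is required"),
    }
}

fn main() {
    println!("{:?}", resolve_manifest_policy(Some("/etc/cellos/manifest.json"), false, false));
    println!("{:?}", resolve_manifest_policy(None, true, false));
}
```

The manifest path in main is a placeholder. Every branch other than "manifest set, no flags" and "no manifest, both flags" is an error, which is what makes the configuration fail closed.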
Examples
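```rust
use std::sync::Arc;
use cellos_core::ports::CellBackend;
use cellos_host_firecracker::{FirecrackerCellBackend, FirecrackerConfig};

// From env (validates required paths up front; fails fast on misconfig).
let backend: Arc<dyn CellBackend> = Arc::new(FirecrackerCellBackend::from_env()?);

// Or pin the config inline.
let cfg = FirecrackerConfig::from_env()?;
let backend: Arc<dyn CellBackend> = Arc::new(FirecrackerCellBackend::new(cfg));

// Optional: attach an EventSink so `cell.firecracker.v1.pool_checkout`
// CloudEvents flow alongside the regular lifecycle stream.
// let backend = FirecrackerCellBackend::from_env()?.with_event_sink(sink);
# Ok::<(), Box<dyn std::error::Error>>(())
```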
Warm-pool sizing: set CELLOS_FIRECRACKER_POOL_SIZE to the desired slot count. The supervisor spawns one background fill task at startup, and pool::pool_size_from_env() resolves the value at backend construction.
Testing
```sh
# Unit tests (always run, no KVM required).
cargo test -p cellos-host-firecracker

# Integration suite: requires Linux + /dev/kvm + opt-in.
cargo test -p cellos-host-firecracker -- --ignored
```
The tests/ directory contains 41 integration tests. The
#[ignore]-gated subset needs Linux, /dev/kvm, the jailer binary,
CAP_NET_ADMIN for nftables / TAP manipulation, and (in some cases) the
artifact manifest. Highlights:
| Test | What it pins |
|---|---|
| firecracker_e2e_exit_42.rs | FC-16 canonical exit-42 e2e canary: a real workload returns 42 over the authenticated vsock frame. |
| vsock_exit_auth.rs | FC-18 HMAC-authenticated exit code: rejects forged tags. |
| vsock_recv_upper_bound.rs | The 4 + 32 byte frame is the upper bound on what wait_for_in_vm_exit accepts. |
| firecracker_table_ip6_enforcement.rs / firecracker_ipv6_isolation.rs / nft_egress_drop.rs | nftables egress drops undeclared destinations (v4 + v6). |
| jailer_isolation.rs / jailer_failure_isolation.rs / fc64_jailer_chroot_escape_rejected.rs | Jailer chroot and capability-bounded execution. |
| boot_soak_50.rs | 50 cold boots end-to-end. Heavy; opt-in only via --ignored. |
| cleanup_on_crash.rs / cleanup_fault_injection.rs / cleanup_leak_check.rs / scratch_cleanup.rs / tap_device_lifecycle.rs | Teardown invariants: no leaked TAPs, sockets, scratch images, or VMM processes. |
| fc51_manifest_failed_emission.rs / firecracker_manifest_failed_e2e.rs | FC-08 manifest digest enforcement. |
| fc52_oom_enforcement.rs / fc53_vcpu_quota.rs / fc54_ttl_enforcement.rs | Resource-limit enforcement (memory OOM, vCPU quota derived from spec.run.limits.cpu_max, TTL). |
| fc55_orphan_vm_reaping.rs / fc56_fd_leak_bound.rs / fc57_socket_leak_bound.rs | Reaping + FD / socket leak bounds. |
| fc59_kernel_panic_handled.rs / fc60_init_segfault_handled.rs / fc61_rootfs_corruption_handled.rs / fc62_vsock_recv_hang.rs / fc63_vmm_crash_mid_run.rs | Fault classes: none of these may produce a silent Success terminal state. |
The cleanup_on_crash.rs file panics with a stable FC-37 GAP: marker
on the reconcile_orphans slots so a future implementer can lift the
#[ignore] without re-discovering the contract.
A small subset (e.g. host_capabilities_smoke.rs) runs on every CI leg
without --ignored.
Related crates
- cellos-core: the CellBackend trait, CellHandle, TeardownReport, EgressRule, ExecutionCellDocument, ExecutionCellSpec, CellosError.
- cellos-init: the in-guest PID-1 that reads cellos.argv=<b64> from /proc/cmdline, forks the workload, computes the HMAC tag, and writes the 36-byte frame back over vsock.
- cellos-host-telemetry: pairs the _9000 exit-code UDS this crate owns with the _9001 telemetry UDS in the same socket dir.
- cellos-host-stub: no-op backend used in unit tests of the supervisor pipeline.
- cellos-host-gvisor: runsc alternative for hosts without KVM.
- cellos-supervisor: selects this backend with CELLOS_CELL_BACKEND=firecracker and spawns the warm-pool fill task.
ADRs
- ADR-0001 (Rust + NATS/JetStream + proprietary host): the surrounding stack commitments that frame Firecracker as one isolation primitive among several.
- ADR-0004 (TLS termination fronting / trust boundary): TLS termination is an egress-edge concern, not in-VM, so this backend does not interpose on TLS.
- ADR-0005 (TLS termination design): the design that backend authors must respect: egress goes through declared rules; this crate translates them into nftables.
- ADR-0006 (In-VM observability runner evidence): the receiver side (cellos-host-telemetry) is the host half of the observability system this backend ships the channel for.