cellos-host-firecracker 0.5.1

Firecracker microVM backend for CellOS — jailer integration, warm pool with snapshot/restore, KVM nested-virtualisation aware.
//! FC-16 — exit-code canary (42).
//!
//! # What this guards
//!
//! `crates/cellos-supervisor/tests/firecracker_e2e.rs` already exercises a
//! full Firecracker round-trip with `/bin/true`. That covers boot, vsock
//! handshake, and clean teardown — but the workload exits **0**, so it does
//! not prove the supervisor's recorded exit code is the *guest workload's*
//! exit code rather than a constant or a teardown-status proxy.
//!
//! FC-16 closes that gap with a non-zero canary: a spec running
//! `/bin/sh -c "exit 42"`. The supervisor MUST report **exactly 42** in its
//! `cell.command.v1.completed` event — not 0 (would mean the code is
//! hard-wired or the field is being defaulted), not 1 (would mean a generic
//! "non-zero" was substituted), not a signal-encoded value such as 128+N.
//!
//! The brief from `Plans/firecracker-release-readiness.md` FC-16:
//!
//! > a canonical e2e spec running `/bin/sh -c "exit 42"` reports exit code 42
//! > to the supervisor via vsock; supervisor's recorded exit code matches.
//!
//! The "via vsock" part is the authentic-capture requirement — the only path
//! `cellos-init` uses to surface a guest workload's exit code is the vsock
//! exit-ACK protocol covered by FC-19. So if 42 round-trips here, vsock IS
//! the carrier; no other channel exists in the production codepath.
//!
//! # Why this lives in the host crate, not the supervisor crate
//!
//! Slot ownership: the FC-16 task explicitly assigns this file to
//! `crates/cellos-host-firecracker/tests/`. The Firecracker backend (this
//! crate) is the component the canary actually exercises end-to-end — the
//! supervisor is the orchestrator, but every byte of the exit code travels
//! through this crate's `listen_for_exit_code` path before it reaches the
//! supervisor's event recorder. Co-locating the gate with the carrier makes
//! it obvious which crate's regression would break it.
//!
//! # Skip-on-no-Firecracker gate
//!
//! Identical preconditions to `firecracker_e2e.rs`:
//!   * `/dev/kvm` exists.
//!   * The full set of `CELLOS_FIRECRACKER_*` env vars is set and points to
//!     real files / a real socket dir.
//!   * The supervisor binary has been built.
//!
//! Any missing precondition prints a `firecracker_e2e: skipping —` line and
//! returns successfully. This matches the existing pattern so the test stays
//! safe to include in any `cargo test` invocation — only the dedicated
//! firecracker-e2e CI lane has the prerequisites and will actually run it.
//!
//! # Why Linux-only
//!
//! Same justification as the rest of this crate's integration tests:
//! Firecracker is Linux-only, the supervisor's Firecracker backend is
//! Linux-only, and the surrounding crate does not compile on other targets.

#![cfg(target_os = "linux")]

use std::ffi::OsString;
use std::fs::{self, File};
use std::io::Write;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// The canary value. Chosen to be non-zero (so a hard-wired 0 trips), not
/// equal to 1 (so a "any non-zero" substitution trips), and not in the
/// signal-encoded range 128–159 (so a signal mis-translation trips).
///
/// 42 is the exact value called out in the FC-16 brief; do not change without
/// updating `Plans/firecracker-release-readiness.md`.
const CANARY_EXIT_CODE: i32 = 42;
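
// A minimal sketch, not part of the FC-16 brief: the failure classes that the
// doc comment above rules out can be spelled out as a pure function. The name
// `classify_recorded_exit_code` and its labels are hypothetical; the 128..=159
// bound is taken from the doc comment on `CANARY_EXIT_CODE`.
#[allow(dead_code)]
fn classify_recorded_exit_code(recorded: i64, expected: i64) -> &'static str {
    match recorded {
        // The only authentic outcome: the guest's own exit value.
        r if r == expected => "authentic",
        // The field was defaulted or the code is hard-wired.
        0 => "defaulted-or-hard-wired",
        // A generic "non-zero" replaced the real value.
        1 => "generic-non-zero",
        // Shell-style 128+N encoding of a signal death.
        128..=159 => "signal-encoded",
        // Anything else: the value travelled but was corrupted.
        _ => "corrupted",
    }
}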

/// CloudEvents `type` value the supervisor emits when a cell command
/// completes. The `exitCode` field of the event's `data` payload is the
/// authoritative supervisor-recorded exit code we are asserting against.
///
/// Mirrors the doc comment on `cellos_core::events::command_completed_data_v1`
/// in `crates/cellos-core/src/events.rs` (~line 281). Locking the literal
/// here means a rename of the event type (which is a wire-breaking change)
/// surfaces as a missing-event failure with a precise message rather than a
/// silent zero-event scan.
const COMMAND_COMPLETED_TYPE: &str = "dev.cellos.events.cell.command.v1.completed";

/// Required Firecracker env vars. Missing any of them is a skip, not a
/// failure — local dev machines won't have them, the firecracker-e2e CI lane
/// does.
///
/// Mirrors the list in `crates/cellos-supervisor/tests/firecracker_e2e.rs`
/// so the two tests skip / activate in lockstep on the same runner.
const REQUIRED_ENV: &[&str] = &[
    "CELLOS_FIRECRACKER_BINARY",
    "CELLOS_FIRECRACKER_KERNEL_IMAGE",
    "CELLOS_FIRECRACKER_ROOTFS_IMAGE",
    "CELLOS_FIRECRACKER_SOCKET_DIR",
];

/// Resolve the supervisor binary. This integration test lives in
/// `cellos-host-firecracker`, so cargo does NOT set
/// `CARGO_BIN_EXE_cellos-supervisor` for us (that env var is only populated
/// for tests in the package owning the `[[bin]]`). Resolution order:
///
///   1. `CELLOS_SUPERVISOR_BIN` — explicit override, matches the standalone
///      smoke script and the supervisor crate's own e2e test.
///   2. Workspace `target/<profile>/cellos-supervisor` — the canonical
///      location after `cargo build -p cellos-supervisor`.
///
/// On a CI runner that has not yet built the supervisor, the `is_file()`
/// check in the caller turns this into a skip with a helpful message.
fn supervisor_exe() -> PathBuf {
    if let Some(p) = std::env::var_os("CELLOS_SUPERVISOR_BIN") {
        return PathBuf::from(p);
    }
    let crate_dir = Path::new(env!("CARGO_MANIFEST_DIR"));
    let workspace = crate_dir
        .parent()
        .and_then(|p| p.parent())
        .expect("cellos-host-firecracker sits two levels under workspace root");
    // The firecracker-e2e CI lane builds in `release`. Local dev usually has
    // `debug`. Try release first (matches CI) then debug (matches local).
    for profile in ["release", "debug"] {
        let candidate = workspace
            .join("target")
            .join(profile)
            .join("cellos-supervisor");
        if candidate.is_file() {
            return candidate;
        }
    }
    // Return the release path so the caller's "missing binary" skip message
    // points at the canonical CI location.
    workspace
        .join("target")
        .join("release")
        .join("cellos-supervisor")
}

/// Some host backends accept `CELLOS_FIRECRACKER_ROOTFS` as shorthand for
/// `CELLOS_FIRECRACKER_ROOTFS_IMAGE`. Mirror the alias bridging that
/// `firecracker_e2e.rs` performs so this test honours the same env contract
/// the CI workflow exports.
fn handle_rootfs_alias() {
    let long = std::env::var_os("CELLOS_FIRECRACKER_ROOTFS_IMAGE");
    let short = std::env::var_os("CELLOS_FIRECRACKER_ROOTFS");
    match (long, short) {
        (Some(_), _) => {}
        (None, Some(s)) => std::env::set_var("CELLOS_FIRECRACKER_ROOTFS_IMAGE", s),
        _ => {}
    }
}

fn skip(reason: &str) {
    eprintln!("firecracker_e2e: skipping FC-16 — {reason}");
}

/// Walk `dir` recursively, collecting every `.jsonl` file. The supervisor's
/// JSONL exporter shards events by run-id under sub-directories, so we
/// cannot rely on a single known path; we read every JSONL the run produced
/// and search the union for the `command.v1.completed` event.
fn collect_jsonl_files(dir: &Path) -> Vec<PathBuf> {
    let mut out = Vec::new();
    let mut walker = vec![dir.to_path_buf()];
    while let Some(current) = walker.pop() {
        let entries = match fs::read_dir(&current) {
            Ok(it) => it,
            Err(_) => continue,
        };
        for entry in entries.flatten() {
            let path = entry.path();
            if path.is_dir() {
                walker.push(path);
            } else if path.extension().and_then(|s| s.to_str()) == Some("jsonl") {
                out.push(path);
            }
        }
    }
    out
}

/// Scan a JSONL file line-by-line for a `cell.command.v1.completed` event
/// and return its `data.exitCode` if present. Returns `None` if no such
/// event is in the file; returns `Err` only on malformed JSON (which is a
/// supervisor regression worth surfacing loudly).
fn find_exit_code_in_jsonl(path: &Path) -> Result<Option<i64>, String> {
    let contents = fs::read_to_string(path).map_err(|e| format!("read {}: {e}", path.display()))?;
    for (line_no, line) in contents.lines().enumerate() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue;
        }
        let value: serde_json::Value = serde_json::from_str(trimmed).map_err(|e| {
            format!(
                "{}:{}: malformed JSONL line: {e}",
                path.display(),
                line_no + 1
            )
        })?;
        let event_type = value.get("type").and_then(|v| v.as_str()).unwrap_or("");
        if event_type != COMMAND_COMPLETED_TYPE {
            continue;
        }
        // Found the event; extract data.exitCode. A missing/non-integer field
        // is a hard error — the schema guarantees it, so absence is a
        // regression we want to surface.
        let exit_code = value
            .get("data")
            .and_then(|d| d.get("exitCode"))
            .and_then(|c| c.as_i64())
            .ok_or_else(|| {
                format!(
                    "{}:{}: `{COMMAND_COMPLETED_TYPE}` event has no integer `data.exitCode`; \
                     full event: {trimmed}",
                    path.display(),
                    line_no + 1,
                )
            })?;
        return Ok(Some(exit_code));
    }
    Ok(None)
}

/// FC-16: drive the supervisor with a `/bin/sh -c "exit 42"` spec on the
/// Firecracker backend and assert the recorded exit code is exactly 42.
///
/// The flow mirrors the `/bin/true` round-trip in
/// `crates/cellos-supervisor/tests/firecracker_e2e.rs`; only the workload
/// argv and the post-run assertion differ. The shared skip-gate keeps this
/// test inert outside the firecracker-e2e CI lane.
#[test]
fn fc16_exit_code_canary_42() {
    // Precondition 1: KVM device.
    if !Path::new("/dev/kvm").exists() {
        skip("/dev/kvm not present (no KVM on this host)");
        return;
    }

    // Bridge ROOTFS aliases before checking required vars.
    handle_rootfs_alias();

    // Precondition 2: required env vars.
    let missing: Vec<&str> = REQUIRED_ENV
        .iter()
        .copied()
        .filter(|k| std::env::var_os(k).is_none())
        .collect();
    if !missing.is_empty() {
        skip(&format!("missing env: {}", missing.join(", ")));
        return;
    }

    // Precondition 3: required files exist on disk. Bad paths in env will
    // produce confusing errors deep inside the VMM; check up front.
    for key in [
        "CELLOS_FIRECRACKER_BINARY",
        "CELLOS_FIRECRACKER_KERNEL_IMAGE",
        "CELLOS_FIRECRACKER_ROOTFS_IMAGE",
    ] {
        let path = std::env::var(key).expect("checked above");
        if !Path::new(&path).exists() {
            skip(&format!("{key}={path} does not exist on disk"));
            return;
        }
    }

    // Precondition 4: socket dir exists (or can be created).
    let sock_dir = std::env::var("CELLOS_FIRECRACKER_SOCKET_DIR").expect("checked");
    if !Path::new(&sock_dir).is_dir() && fs::create_dir_all(&sock_dir).is_err() {
        skip(&format!("socket dir {sock_dir} not creatable"));
        return;
    }

    // Precondition 5: supervisor binary is built.
    let exe = supervisor_exe();
    if !exe.is_file() {
        skip(&format!(
            "supervisor binary missing at {}; \
             run `cargo build -p cellos-supervisor --release` \
             (or set CELLOS_SUPERVISOR_BIN to a built binary)",
            exe.display()
        ));
        return;
    }

    // Build the cell spec: /bin/sh -c "exit 42", 64 MiB RAM, 30 s TTL, no
    // egress. `argv` is a JSON array, so `exit 42` reaches the supervisor's
    // spec parser as a single argument with no shell quoting involved.
    //
    // Memory and TTL match firecracker_e2e.rs's `/bin/true` spec — the
    // canary's resource envelope is identical so any deviation in run-time
    // points at the workload, not the wrapper.
    let tmp = tempfile::tempdir().expect("tempdir");
    let spec_path = tmp.path().join("cell.json");
    let spec_json = r#"{
  "apiVersion": "cellos.io/v1",
  "kind": "ExecutionCell",
  "spec": {
    "id": "fc-e2e-exit42",
    "authority": { "secretRefs": [], "egressRules": [] },
    "lifetime": { "ttlSeconds": 30 },
    "run": {
      "argv": ["/bin/sh", "-c", "exit 42"],
      "timeoutMs": 20000,
      "limits": { "memoryMaxBytes": 67108864 }
    }
  }
}"#;
    File::create(&spec_path)
        .and_then(|mut f| f.write_all(spec_json.as_bytes()))
        .expect("write cell spec");

    // Per-run export dir so we can scan only this run's JSONL output.
    let export_dir = tmp.path().join("events");
    fs::create_dir_all(&export_dir).expect("mkdir export dir");

    // Build the command. Forward all CELLOS_FIRECRACKER_* and the backend
    // selector; the supervisor reads them directly. Same env contract as
    // firecracker_e2e.rs.
    let mut cmd = Command::new(&exe);
    cmd.env("CELL_OS_USE_NOOP_SINK", "1") // disable NATS sink
        .env("CELLOS_CELL_BACKEND", "firecracker")
        .env("CELLOS_EXPORT_DIR", &export_dir)
        .env("RUST_BACKTRACE", "1")
        .arg(&spec_path)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());

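    // Forward every CELLOS_FIRECRACKER_* var the harness set up. `Command`
    // already inherits the parent environment by default, so this loop is
    // redundant, but it makes the env contract explicit.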
    for (k, v) in std::env::vars_os() {
        if k.to_string_lossy().starts_with("CELLOS_FIRECRACKER_") {
            cmd.env(&k, &v);
        }
    }

    eprintln!(
        "fc16_exit_code_canary_42: spawning supervisor {} with /bin/sh -c \"exit 42\"",
        exe.display()
    );
    let mut child = cmd.spawn().expect("spawn supervisor");

    // Drain stdout/stderr on background threads while we wait. Both streams
    // are piped, so a supervisor that writes more than the pipe buffer holds
    // would otherwise block on write and never exit before the deadline.
    fn drain<R: std::io::Read + Send + 'static>(mut r: R) -> std::thread::JoinHandle<String> {
        std::thread::spawn(move || {
            let mut buf = String::new();
            let _ = r.read_to_string(&mut buf);
            buf
        })
    }
    let stderr_reader = child.stderr.take().map(drain);
    let stdout_reader = child.stdout.take().map(drain);

    // Same 30 s wall-clock budget as firecracker_e2e.rs. Boot + sh + teardown
    // is ~5 s on a healthy runner; 30 s is the documented worst case.
    let deadline = Instant::now() + Duration::from_secs(30);
    let _status = loop {
        match child.try_wait().expect("try_wait") {
            Some(status) => break status,
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait();
                panic!("supervisor did not exit within 30s");
            }
            None => std::thread::sleep(Duration::from_millis(200)),
        }
    };

    // Collect stderr/stdout for diagnostics on failure. Note: we deliberately
    // do NOT assert `status.success()` here. The supervisor's process exit
    // status is independent of the cell workload's exit code. A workload
    // that exits 42 cleanly should still drive a clean teardown, but the
    // ACCEPTANCE GATE is the recorded `exitCode` in the emitted event, not
    // the supervisor process's own status. If a future change wires the
    // supervisor's exit status to the workload's, the JSONL assertion below
    // is still the authoritative check.
    let stderr_buf = stderr_reader
        .and_then(|h| h.join().ok())
        .unwrap_or_default();
    let stdout_buf = stdout_reader
        .and_then(|h| h.join().ok())
        .unwrap_or_default();

    // Find the supervisor-emitted `cell.command.v1.completed` event in the
    // exported JSONL files and assert its `exitCode` is exactly 42.
    let jsonl_files = collect_jsonl_files(&export_dir);
    if jsonl_files.is_empty() {
        panic!(
            "no JSONL event files emitted under {}\n--- stderr ---\n{stderr_buf}\n\
             --- stdout ---\n{stdout_buf}",
            export_dir.display()
        );
    }

    let mut found_exit_code: Option<i64> = None;
    let mut scan_errors: Vec<String> = Vec::new();
    for file in &jsonl_files {
        match find_exit_code_in_jsonl(file) {
            Ok(Some(code)) => {
                found_exit_code = Some(code);
                break;
            }
            Ok(None) => continue,
            Err(e) => scan_errors.push(e),
        }
    }

    let recorded = found_exit_code.unwrap_or_else(|| {
        let scanned: Vec<String> = jsonl_files
            .iter()
            .map(|p| p.display().to_string())
            .collect();
        panic!(
            "FC-16: no `{COMMAND_COMPLETED_TYPE}` event found in any JSONL file under {}. \
             The supervisor produced JSONL output but it did not contain a command-completed \
             event — this means the cell never reported a clean exit through the vsock \
             handshake (FC-19), or the supervisor failed to emit the event after receiving it.\n\
             Scanned files: {scanned:?}\n\
             Scan errors: {scan_errors:?}\n\
             --- stderr ---\n{stderr_buf}\n--- stdout ---\n{stdout_buf}",
            export_dir.display(),
        )
    });

    assert_eq!(
        recorded, CANARY_EXIT_CODE as i64,
        "FC-16 violation: supervisor recorded exit code {recorded}, expected {CANARY_EXIT_CODE}.\n\
         The cell ran `/bin/sh -c \"exit 42\"`, so the only authentic value is 42. \
         Common regressions:\n\
         * recorded=0:        exit code is being defaulted (the field is unwired or the cell \
                              succeeded for the wrong reason — `cellos-init` fell through to \
                              power-off without reading the workload's status).\n\
         * recorded=1:        a generic `non-zero` is being substituted for the actual code \
                              somewhere between `cellos-init` and the supervisor's event \
                              recorder.\n\
         * recorded=128+N:    the workload was signal-killed and the i32 carries the kernel's \
                              waitpid(2) signal-encoding rather than the script's `exit` value.\n\
         * recorded=other:    vsock byte-ordering mismatch or buffer aliasing — the value is \
                              traveling but corrupted en route.\n\
         --- stderr ---\n{stderr_buf}\n--- stdout ---\n{stdout_buf}"
    );

    // Drop tmpdir last so any inspected artifacts stay valid until the
    // assertions complete.
    drop(tmp);
    let _ = OsString::new(); // keeps the otherwise-unused `OsString` import from warning
}
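
// A minimal sketch of the poll-until-deadline loop used in
// `fc16_exit_code_canary_42`, factored into a reusable helper. The name
// `wait_with_deadline` and the `poll` parameter are hypothetical and not part
// of this crate; only `std::process::Child` methods are assumed.
//
// Returns `Ok(Some(status))` if the child exited within `budget`, and
// `Ok(None)` if the budget ran out and the child had to be killed.
#[allow(dead_code)]
fn wait_with_deadline(
    child: &mut std::process::Child,
    budget: std::time::Duration,
    poll: std::time::Duration,
) -> std::io::Result<Option<std::process::ExitStatus>> {
    let deadline = std::time::Instant::now() + budget;
    loop {
        // Non-blocking check: `Some` means the child has already exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if std::time::Instant::now() >= deadline {
            // Budget exhausted: kill and reap so no zombie is left behind.
            let _ = child.kill();
            let _ = child.wait();
            return Ok(None);
        }
        std::thread::sleep(poll);
    }
}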