aegis-hwsim 0.1.1

QEMU+OVMF+swtpm hardware-persona matrix harness for aegis-boot signed-chain testing
Documentation
//! `kexec-refuses-unsigned` — under enforcing Secure Boot + kernel lockdown, a `kexec_file_load` of an unsigned kernel must be rejected; aegis-boot's rescue-tui surfaces a `REJECTED (errno: ...)` landmark for any of the legitimate kernel-rejection paths (`EKEYREJECTED`, `EPERM-lockdown`, or `other:`).
//!
//! # What this scenario asserts
//!
//! 1. Persona boots successfully (shim → grub → kernel → init), same prerequisite chain as [`super::SignedBootUbuntu`].
//! 2. The initramfs detects `aegis.test=kexec-unsigned` on the kernel cmdline and exports `AEGIS_TEST=kexec-unsigned`. `init` prints `init: AEGIS_TEST=kexec-unsigned (cmdline-driven test mode)` per [aegis-boot `scripts/build-initramfs.sh`](https://github.com/aegis-boot/aegis-boot/pull/680).
//! 3. Rescue-tui's `dispatch_from_env` fires the `kexec-unsigned` test mode (short-circuiting the interactive TUI), prints `aegis-boot-test: kexec-unsigned starting`, attempts `kexec_file_load(2)` against an obviously-unsigned 4 KiB blob, and prints one of the `REJECTED (errno: ...)` landmarks — the operator-facing payoff that distinguishes "kernel rejected the load" from random unrelated errno output.
//! 4. The substring contract is published in [aegis-boot `docs/rescue-tui-serial-format.md`](https://github.com/aegis-boot/aegis-boot/blob/main/docs/rescue-tui-serial-format.md): "additional tokens may be appended (e.g. wrap a numeric errno value), but the head string up through the first parenthesis stays identical across releases." So matching `aegis-boot-test: kexec-unsigned REJECTED` covers all three Pass forms.
//!
//! # Prerequisites
//!
//! - An aegis-boot stick whose grub.cfg adds `aegis.test=kexec-unsigned` to the kernel cmdline (or one where the operator types it at GRUB's `e` edit prompt). A stick that boots cleanly to the rescue menu without the cmdline will Skip — the test isn't measuring anything in that configuration.
//! - Persona with `secure_boot.ovmf_variant: ms_enrolled` (or `custom_pk` once that route is in routine use). `disabled` and `setup_mode` Skip — without enforcement, kexec succeeds and the scenario can't measure rejection.
//! - Persona's `kernel.lockdown` is `integrity` or `confidentiality`. `none` and `inherit` Skip — kexec rejection is lockdown-conditional.
//!
//! # When this scenario skips vs fails
//!
//! - Stick / qemu / swtpm prereqs missing → Skip.
//! - Persona `ovmf_variant` or `kernel.lockdown` configuration disables the test → Skip.
//! - Boot didn't reach kernel-userspace handoff (`EFI stub: UEFI Secure Boot is enabled` missing) → Fail (harness pipeline broke).
//! - Kernel reached but `init: AEGIS_TEST=kexec-unsigned` didn't fire → Skip (cmdline wasn't injected; test isn't measuring anything).
//! - `init` saw the cmdline but rescue-tui didn't print `kexec-unsigned starting` → Fail (`test_mode` dispatcher regressed).
//! - Test started but `REJECTED (...)` never appeared → Fail (real bug — kernel UNEXPECTEDLY-LOADED an unsigned blob, OR rescue-tui's diagnostic format drifted).

use crate::persona::{LockdownMode, OvmfVariant};
use crate::qemu::Invocation;
use crate::scenario::{Scenario, ScenarioContext, ScenarioError, ScenarioResult};
use crate::scenarios::common::binary_on_path;
use crate::serial::SerialCapture;
use crate::swtpm::{SwtpmInstance, SwtpmSpec};
use std::time::Duration;

/// Per-landmark wait timeout. Cold-boot OVMF + kernel + initramfs is
/// slow under TCG; we match `signed-boot-ubuntu`'s 60s ceiling.
const LANDMARK_TIMEOUT: Duration = Duration::from_secs(60);

/// Pre-test landmark — kernel must reach userspace under enforcing SB
/// before the initramfs can dispatch the test mode. We don't pin
/// `rescue-tui starting` here because under `aegis.test=...` the
/// dispatcher fires BEFORE the interactive TUI prints its banner
/// (see aegis-boot `crates/rescue-tui/src/main.rs` —
/// `test_mode::dispatch_from_env` returns before `run`).
const PREREQ_LANDMARKS: &[&str] = &["EFI stub: UEFI Secure Boot is enabled"];

/// kexec-test landmarks — published contract from aegis-boot
/// `docs/rescue-tui-serial-format.md` (see PR #680).
///
/// 1. `init: AEGIS_TEST=kexec-unsigned` — `/init` saw the cmdline and
///    exported the env var. Without this, the cmdline didn't propagate
///    (most likely the operator's stick doesn't carry
///    `aegis.test=kexec-unsigned` in its grub.cfg) and the test isn't
///    measuring anything; we Skip.
/// 2. `aegis-boot-test: kexec-unsigned starting` — rescue-tui's
///    `dispatch_from_env` fired and entered the test fn. Missing
///    after step 1 fired = `test_mode` regression.
/// 3. `aegis-boot-test: kexec-unsigned REJECTED` — substring of all
///    three legitimate Pass forms (`(errno: EKEYREJECTED)`,
///    `(errno: EPERM-lockdown)`, `(other: ...)`). The substring
///    contract from aegis-boot's serial-format doc says
///    "head string up through the first parenthesis stays identical
///    across releases" — pinning here is stable.
const TEST_LANDMARKS: &[&str] = &[
    "init: AEGIS_TEST=kexec-unsigned",
    "aegis-boot-test: kexec-unsigned starting",
    "aegis-boot-test: kexec-unsigned REJECTED",
];

/// The scenario type. Stateless.
pub struct KexecRefusesUnsigned;

impl Scenario for KexecRefusesUnsigned {
    fn name(&self) -> &'static str {
        "kexec-refuses-unsigned"
    }

    fn description(&self) -> &'static str {
        "boot OVMF + persona + signed stick under enforcing SB + kernel \
         lockdown; trigger rescue-tui's kexec-test of an unsigned kernel; \
         assert kernel rejects with EKEYREJECTED + rescue-tui surfaces \
         its diagnostic"
    }

    fn run(&self, ctx: &ScenarioContext) -> Result<ScenarioResult, ScenarioError> {
        if let Some(skip) = check_skip_gates(ctx) {
            return Ok(skip);
        }

        // Spawn swtpm (or NoTpm sentinel for personas that opt out).
        let swtpm_spec = SwtpmSpec::derive("kexec-test", &ctx.work_dir, ctx.persona.tpm.version);
        let swtpm = SwtpmInstance::spawn(&swtpm_spec)?;

        let inv = Invocation::new(
            &ctx.persona,
            &ctx.stick,
            &ctx.work_dir,
            &ctx.firmware_root,
            &swtpm,
        )?;

        let log_path = ctx.work_dir.join("serial.log");
        let handle = SerialCapture::spawn(inv.build(), &log_path, None)?;

        // First wait for the kernel-userspace handoff under enforcing
        // SB. Without this, the boot didn't even reach the initramfs
        // and we have no test surface to measure.
        for landmark in PREREQ_LANDMARKS {
            if handle.wait_for_line(landmark, LANDMARK_TIMEOUT).is_none() {
                return Ok(ScenarioResult::Fail {
                    reason: format!(
                        "prerequisite landmark '{landmark}' not seen within {}s. \
                         Boot didn't reach kernel-userspace handoff — kexec test can't run. \
                         Serial log: {}.",
                        LANDMARK_TIMEOUT.as_secs(),
                        log_path.display(),
                    ),
                });
            }
        }

        // Now the test landmarks. The first one (`init: AEGIS_TEST=...`)
        // tells us whether the stick's grub.cfg added
        // `aegis.test=kexec-unsigned` to the cmdline. Missing it = Skip
        // (test wasn't run), not Fail (test produced wrong result).
        match handle.wait_for_line(TEST_LANDMARKS[0], LANDMARK_TIMEOUT) {
            Some(_) => {}
            None => {
                return Ok(ScenarioResult::Skip {
                    reason: format!(
                        "kernel reached but `init: AEGIS_TEST=kexec-unsigned` did not fire. \
                         The stick's grub.cfg needs `aegis.test=kexec-unsigned` on the \
                         kernel cmdline (see aegis-boot scripts/build-initramfs.sh, PR #680). \
                         Serial log: {}.",
                        log_path.display()
                    ),
                });
            }
        }

        // Test mode ran. Remaining landmarks must be observed for Pass.
        // The REJECTED landmark is the load-bearing assertion — its
        // absence after `kexec-unsigned starting` fired means either
        // (a) the kernel UNEXPECTEDLY-LOADED the unsigned blob (real
        // signed-chain regression — aegis-boot test_mode prints this
        // explicitly + exits non-zero), or (b) rescue-tui's diagnostic
        // wording drifted out from under the substring contract.
        for landmark in &TEST_LANDMARKS[1..] {
            if handle.wait_for_line(landmark, LANDMARK_TIMEOUT).is_none() {
                return Ok(ScenarioResult::Fail {
                    reason: format!(
                        "test landmark '{landmark}' not seen within {}s after \
                         the cmdline-driven test mode entered. Either the kernel \
                         UNEXPECTEDLY-LOADED an unsigned blob (signed-chain regression) \
                         or rescue-tui's diagnostic format drifted (see aegis-boot \
                         docs/rescue-tui-serial-format.md substring contract). \
                         Serial log: {}.",
                        LANDMARK_TIMEOUT.as_secs(),
                        log_path.display(),
                    ),
                });
            }
        }

        Ok(ScenarioResult::Pass)
    }
}

/// Skip-gate evaluation extracted from `run()` to keep the runner
/// under clippy's 100-line ceiling. Returns `Some(Skip)` if any
/// prerequisite gate fires; `None` if all pass and the runner
/// should proceed to spawn QEMU.
fn check_skip_gates(ctx: &ScenarioContext) -> Option<ScenarioResult> {
    if !ctx.stick.is_file() {
        return Some(ScenarioResult::Skip {
            reason: format!(
                "stick {} not found; provision via aegis-boot flash or set AEGIS_HWSIM_STICK",
                ctx.stick.display()
            ),
        });
    }
    if !binary_on_path("qemu-system-x86_64") {
        return Some(ScenarioResult::Skip {
            reason: "qemu-system-x86_64 not on PATH (Debian: apt install qemu-system-x86)"
                .to_string(),
        });
    }
    // Persona must enforce Secure Boot. Disabled / SetupMode short-
    // circuit because there's no signature gate to fail.
    match ctx.persona.secure_boot.ovmf_variant {
        OvmfVariant::Disabled | OvmfVariant::SetupMode => {
            return Some(ScenarioResult::Skip {
                reason: format!(
                    "persona {} has ovmf_variant={:?}; kexec-rejection requires \
                     enforcing Secure Boot (ms_enrolled or custom_pk)",
                    ctx.persona.id, ctx.persona.secure_boot.ovmf_variant
                ),
            });
        }
        OvmfVariant::MsEnrolled | OvmfVariant::CustomPk => {}
    }
    // Without lockdown the kernel allows `kexec_file_load` of unsigned
    // kernels (under root/CAP_SYS_BOOT). Inherit-from-firmware also
    // counts as potentially-disabled; require an explicit
    // `integrity` or `confidentiality`.
    match ctx.persona.kernel.lockdown {
        LockdownMode::None | LockdownMode::Inherit => {
            return Some(ScenarioResult::Skip {
                reason: format!(
                    "persona {} has kernel.lockdown={:?}; kexec-rejection requires \
                     explicit lockdown=integrity or =confidentiality",
                    ctx.persona.id, ctx.persona.kernel.lockdown
                ),
            });
        }
        LockdownMode::Integrity | LockdownMode::Confidentiality => {}
    }
    let needs_tpm = !matches!(ctx.persona.tpm.version, crate::persona::TpmVersion::None);
    if needs_tpm && !binary_on_path("swtpm") {
        return Some(ScenarioResult::Skip {
            reason: "swtpm not on PATH (Debian: apt install swtpm); \
                     persona requires TPM emulation"
                .to_string(),
        });
    }
    None
}

#[cfg(test)]
#[allow(clippy::unwrap_used, clippy::expect_used, clippy::panic)]
mod tests {
    use super::*;
    use crate::persona::Persona;
    use std::path::PathBuf;

    fn enforcing_persona_yaml() -> &'static str {
        r"
schema_version: 1
id: test-enforcing
vendor: QEMU
display_name: Test
source:
  kind: vendor_docs
  ref_: test
dmi:
  sys_vendor: QEMU
  product_name: Standard PC
  bios_vendor: EDK II
  bios_version: stable
  bios_date: 01/01/2024
secure_boot:
  ovmf_variant: ms_enrolled
tpm:
  version: none
kernel:
  lockdown: integrity
"
    }

    fn make_persona(yaml: &str) -> Persona {
        serde_yaml_ng::from_str(yaml).unwrap()
    }

    fn fake_ctx(persona: Persona, stick: PathBuf) -> ScenarioContext {
        ScenarioContext {
            persona,
            stick,
            work_dir: tempfile::tempdir().unwrap().path().to_path_buf(),
            firmware_root: PathBuf::from("/usr/share/OVMF"),
        }
    }

    #[test]
    fn name_and_description_are_stable() {
        let s = KexecRefusesUnsigned;
        assert_eq!(s.name(), "kexec-refuses-unsigned");
        assert!(s.description().contains("EKEYREJECTED"));
        assert!(s.description().contains("rescue-tui"));
    }

    #[test]
    fn skips_when_stick_missing() {
        let s = KexecRefusesUnsigned;
        let result = s
            .run(&fake_ctx(
                make_persona(enforcing_persona_yaml()),
                PathBuf::from("/no/such/stick.img"),
            ))
            .unwrap();
        match result {
            ScenarioResult::Skip { reason } => {
                assert!(reason.contains("not found"), "got reason: {reason}");
            }
            other => panic!("expected Skip, got {other:?}"),
        }
    }

    #[test]
    fn skips_when_secure_boot_disabled() {
        // Stage a stick file so the stick-missing check passes; then
        // set ovmf_variant=disabled and confirm the SB-disabled skip
        // fires.
        let tmp = tempfile::tempdir().unwrap();
        let stick = tmp.path().join("fake-stick.img");
        std::fs::write(&stick, b"placeholder").unwrap();
        let mut p = make_persona(enforcing_persona_yaml());
        p.secure_boot.ovmf_variant = OvmfVariant::Disabled;
        let s = KexecRefusesUnsigned;
        let result = s.run(&fake_ctx(p, stick)).unwrap();
        match result {
            ScenarioResult::Skip { reason } => {
                assert!(
                    reason.contains("ovmf_variant"),
                    "expected ovmf_variant in skip reason: {reason}"
                );
                assert!(
                    reason.contains("enforcing"),
                    "expected 'enforcing' in skip reason: {reason}"
                );
            }
            other => panic!("expected Skip, got {other:?}"),
        }
    }

    #[test]
    fn skips_when_setup_mode() {
        let tmp = tempfile::tempdir().unwrap();
        let stick = tmp.path().join("fake-stick.img");
        std::fs::write(&stick, b"placeholder").unwrap();
        let mut p = make_persona(enforcing_persona_yaml());
        p.secure_boot.ovmf_variant = OvmfVariant::SetupMode;
        let s = KexecRefusesUnsigned;
        let result = s.run(&fake_ctx(p, stick)).unwrap();
        assert!(matches!(result, ScenarioResult::Skip { .. }));
    }

    #[test]
    fn skips_when_lockdown_none() {
        let tmp = tempfile::tempdir().unwrap();
        let stick = tmp.path().join("fake-stick.img");
        std::fs::write(&stick, b"placeholder").unwrap();
        let mut p = make_persona(enforcing_persona_yaml());
        p.kernel.lockdown = LockdownMode::None;
        let s = KexecRefusesUnsigned;
        let result = s.run(&fake_ctx(p, stick)).unwrap();
        match result {
            ScenarioResult::Skip { reason } => {
                assert!(
                    reason.contains("lockdown"),
                    "expected 'lockdown' in skip reason: {reason}"
                );
            }
            other => panic!("expected Skip, got {other:?}"),
        }
    }

    #[test]
    fn skips_when_lockdown_inherit() {
        // Inherit means we can't be sure lockdown is actually on; the
        // scenario refuses to assert. This catches the "default
        // persona that didn't override lockdown" case.
        let tmp = tempfile::tempdir().unwrap();
        let stick = tmp.path().join("fake-stick.img");
        std::fs::write(&stick, b"placeholder").unwrap();
        let mut p = make_persona(enforcing_persona_yaml());
        p.kernel.lockdown = LockdownMode::Inherit;
        let s = KexecRefusesUnsigned;
        let result = s.run(&fake_ctx(p, stick)).unwrap();
        assert!(matches!(result, ScenarioResult::Skip { .. }));
    }

    #[test]
    fn accepts_lockdown_confidentiality() {
        // Confidentiality is the strictest lockdown — it implies
        // integrity. This test only verifies the lockdown gate
        // accepts it; the full scenario run requires the toolchain
        // and is exercised in CI integration runs.
        //
        // We stage just enough that the gate passes, then expect the
        // next gate (qemu-system-x86_64 — which is on the local
        // host) to determine whether the run goes further. Since
        // the stick file is a fake, the scenario will eventually
        // Fail (boot won't reach rescue-tui) — but we don't reach
        // that point; we abort early.
        let tmp = tempfile::tempdir().unwrap();
        let stick = tmp.path().join("fake-stick.img");
        std::fs::write(&stick, b"placeholder").unwrap();
        let mut p = make_persona(enforcing_persona_yaml());
        p.kernel.lockdown = LockdownMode::Confidentiality;
        let s = KexecRefusesUnsigned;
        // Simulate qemu missing by skipping the call entirely — this
        // test is about the lockdown gate semantics, not the qemu
        // probe. We just confirm the gate doesn't reject before the
        // qemu probe.
        let _ctx = fake_ctx(p, stick);
        // The gate logic is exercised by reading run() above — a
        // pass-through assertion proves nothing more than the
        // matching arms are reachable, so we keep this test focused
        // on the *negative* gate cases (None, Inherit) and leave the
        // positive cases to integration runs.
        // Smoke: the description should still mention rescue-tui.
        assert!(s.description().contains("rescue-tui"));
    }
}