ktstr 0.4.14

Test harness for Linux process schedulers
# Snapshots

A **snapshot** is a frozen record of guest BPF map state and scheduler
globals captured at a specific point in a scenario. The freeze
coordinator pauses every vCPU long enough to walk the kernel's BPF
maps, BTF-render every captured value, and bundle the result into a
`FailureDumpReport` keyed by a name you choose. Test code then reads
it back via the [`Snapshot`](#reading-the-captured-report) accessor for typed traversal.

`Op::snapshot("name")` is the **on-demand** capture trigger. Use it to
ask "what does the scheduler look like *right now*?" at a precise
point in the scenario. For automatic capture on a kernel write to a
specific symbol, see [Watch Snapshots](watch-snapshots.md).

## Issuing a snapshot

`Op::snapshot(name)` is a single op in a [`Step`](../concepts/ops.md)'s op list. The
executor invokes the active [`SnapshotBridge`](#wiring-the-bridge)'s capture callback,
which performs the freeze rendezvous and returns the report; the
bridge stores the report under `name`.

```rust,ignore
use ktstr::prelude::*;

let steps = vec![Step {
    setup: vec![CgroupDef::named("workers").workers(2)].into(),
    ops: vec![
        Op::snapshot("after_spawn"),
        // ... other ops ...
        Op::snapshot("after_workload"),
    ],
    hold: HoldSpec::FULL,
}];
execute_steps(ctx, steps)?;
```

A scenario may issue any number of `Op::snapshot` ops with distinct
names. Reusing a name overwrites the prior capture (and emits a
`tracing::warn!`).

## Wiring the bridge

The bridge is what turns an `Op::snapshot` into stored data. The host
typically wires it before `execute_steps` runs, but a scenario can
install one inline:

```rust,ignore
use ktstr::prelude::*;

let cb: CaptureCallback = std::sync::Arc::new(|_name: &str| {
    // Production: freeze the VM and build a real FailureDumpReport.
    // Tests: return a hand-crafted report so the executor + bridge
    // pipeline runs without booting a guest.
    Some(FailureDumpReport::default())
});
let bridge = SnapshotBridge::new(cb);
let bridge_handle = bridge.clone();
let _guard = bridge.set_thread_local();

execute_steps(ctx, steps)?;

let captured = bridge_handle.drain();
let report = captured.get("after_spawn").expect("snapshot recorded");
```

`set_thread_local` returns a [`BridgeGuard`](#wiring-the-bridge) that restores the prior
bridge on drop, so a nested scenario inside an outer one cannot leak
its bridge into the outer scope. Bind the guard to an
underscore-prefixed identifier such as `_guard` so the binding lives
for the scope of the scenario — a bare `let _ = bridge.set_thread_local()`
drops the guard immediately and clears the bridge before any op runs.
The `#[must_use]` lint warns if the return value is discarded entirely.
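
To see why the binding pattern matters, here is a minimal, self-contained sketch of a restore-on-drop guard. It is not ktstr's implementation: `ACTIVE`, `Guard`, `install`, and `active` are hypothetical names, and a `String` stands in for the bridge.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical stand-in for the thread-local bridge slot.
    static ACTIVE: RefCell<Option<String>> = RefCell::new(None);
}

/// Hypothetical guard: remembers the previously installed value and
/// puts it back when dropped.
#[must_use = "dropping the guard immediately uninstalls the value"]
struct Guard {
    prior: Option<String>,
}

/// Install `name` as the active value and return a guard for it.
fn install(name: &str) -> Guard {
    let prior = ACTIVE.with(|slot| slot.borrow_mut().replace(name.to_string()));
    Guard { prior }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Restore whatever was installed before this guard existed.
        let prior = self.prior.take();
        ACTIVE.with(|slot| *slot.borrow_mut() = prior);
    }
}

/// Read the currently installed value, if any.
fn active() -> Option<String> {
    ACTIVE.with(|slot| slot.borrow().clone())
}

/// Records what is installed at three points of interest.
fn demo() -> Vec<Option<String>> {
    let mut seen = Vec::new();
    let _outer = install("outer");
    {
        let _inner = install("inner");
        seen.push(active()); // the nested value is live here
    }
    seen.push(active()); // nested guard dropped, outer value restored
    let _ = install("discarded"); // guard dropped at the end of this statement
    seen.push(active()); // so the outer value is still the active one
    seen
}
```

In this sketch the `let _ =` binding never leaves its value installed past the statement, which mirrors the pitfall described above.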

If no bridge is installed, `Op::snapshot` is a no-op with a
`tracing::warn!` and the scenario continues. If the capture callback
returns `None` (capture pipeline unavailable), the bridge stays empty
and the scenario continues. Existing scenarios that never declare
snapshot ops keep working unchanged.
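
The outcomes described in this section (no bridge, callback returns `None`, first capture, reused name) can be modelled in a few lines. This is a sketch under stated assumptions, not the executor's code: `Outcome` and `record` are hypothetical, a `HashMap<String, String>` stands in for the bridge's storage, and a `String` stands in for the report.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of one snapshot op, for illustration only.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// No store installed: the op is a no-op and the scenario continues.
    NoBridge,
    /// The callback returned `None`: nothing is stored.
    Unavailable,
    /// First capture under this name.
    Stored,
    /// The name was reused: the prior capture is replaced.
    Overwrote,
}

/// `capture` stands in for the capture callback.
fn record(
    store: Option<&mut HashMap<String, String>>,
    name: &str,
    capture: impl Fn(&str) -> Option<String>,
) -> Outcome {
    let Some(store) = store else {
        return Outcome::NoBridge;
    };
    let Some(report) = capture(name) else {
        return Outcome::Unavailable;
    };
    // `insert` returns the previous value when the key already existed.
    match store.insert(name.to_string(), report) {
        None => Outcome::Stored,
        Some(_) => Outcome::Overwrote,
    }
}
```

None of the four outcomes is an error in this model, matching the "scenario continues" behavior described above.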

## Reading the captured report

[`Snapshot::new(report)`](#reading-the-captured-report) builds a borrowed view over a
`FailureDumpReport`. The view does not copy the report; accessor
methods walk the report in place and return further borrowed views.

### Map-name lookup

```rust,ignore
let snap = Snapshot::new(report);
let map = snap.map("scx_per_task")?;        // SnapshotMap
```

`Snapshot::map(name)` returns `Result<SnapshotMap, SnapshotError>`. A
miss yields `SnapshotError::MapNotFound { requested, available }` —
the `available` list enumerates every captured map name so a typo
surfaces in test output.

### Top-level globals (.bss / .data / .rodata)

```rust,ignore
let nr_cpus = snap.var("nr_cpus_onln").as_u64()?;
```

`Snapshot::var(name)` walks every `*.bss`, `*.data`, and `*.rodata`
global-section map for a top-level member named `name` and returns the
first hit as a [`SnapshotField`](#terminal-accessors). A miss yields
`SnapshotError::VarNotFound { requested, available }` with the union
of every section's top-level member names.
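
The first-hit rule can be illustrated with a small standalone model. Everything here is hypothetical: `Section`, `GLOBAL_SUFFIXES`, and `lookup_var` are illustration-only names, values are plain `u64`s, and the scan order is assumed to be the order in which the sections are listed.

```rust
/// Hypothetical model of a captured map: (map name, top-level members).
type Section<'a> = (&'a str, &'a [(&'a str, u64)]);

/// Suffixes that mark a global-section map in this sketch.
const GLOBAL_SUFFIXES: [&str; 3] = [".bss", ".data", ".rodata"];

/// Return the first member named `name` across the global sections, or
/// the sorted, de-duplicated union of member names on a miss.
fn lookup_var(sections: &[Section<'_>], name: &str) -> Result<u64, Vec<String>> {
    let mut available: Vec<String> = Vec::new();
    for (map_name, members) in sections {
        // Skip maps that are not global sections.
        if !GLOBAL_SUFFIXES.iter().any(|suffix| map_name.ends_with(*suffix)) {
            continue;
        }
        for (member, value) in members.iter() {
            if *member == name {
                return Ok(*value); // first hit wins
            }
            available.push((*member).to_string());
        }
    }
    available.sort();
    available.dedup();
    Err(available)
}
```

Because the first hit wins, a name defined in more than one section resolves to whichever section is scanned first in this model.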

### Entries inside a map

```rust,ignore
let map = snap.map("scx_per_task")?;
let first = map.at(0);                          // by ordinal index
let busy = map.find(|e| e.get("tid").as_i64().unwrap_or(-1) == 1234);
let busiest = map.max_by(|e| e.get("runtime_ns").as_u64().unwrap_or(0));
let all_active = map.filter(|e| e.get("runtime_ns").as_u64().unwrap_or(0) > 0);
```

`SnapshotMap` exposes:

- `at(n)` — entry at ordinal index `n`. Out of range returns
  `SnapshotEntry::Missing(SnapshotError::IndexOutOfRange)`.
- `find(predicate)` — first matching entry. No match returns
  `SnapshotEntry::Missing(SnapshotError::NoMatch { op: "find", ... })`.
- `filter(predicate)` — every matching entry collected into a `Vec`.
- `max_by(key_fn)` — entry whose `key_fn` produces the maximum `u64`.
  Empty map returns `Missing` with `op: "max_by"`.
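
The pattern behind these accessors is that a failed lookup yields a `Missing` entry carrying its error rather than failing immediately, so a chain only needs `?` at the terminal read. A minimal sketch of that shape follows; `LookupError`, `Entry`, and `Map` are hypothetical reductions, not the real `SnapshotEntry` or `SnapshotMap`, and entries are plain `u64`s.

```rust
#[derive(Debug, Clone, PartialEq)]
enum LookupError {
    IndexOutOfRange { index: usize, len: usize },
    NoMatch { op: &'static str },
}

/// Either a value or the reason it could not be found.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Present(u64),
    Missing(LookupError),
}

impl Entry {
    /// Terminal read: the deferred error surfaces only here.
    fn value(&self) -> Result<u64, LookupError> {
        match self {
            Entry::Present(v) => Ok(*v),
            Entry::Missing(e) => Err(e.clone()),
        }
    }
}

struct Map(Vec<u64>);

impl Map {
    fn at(&self, n: usize) -> Entry {
        match self.0.get(n) {
            Some(v) => Entry::Present(*v),
            None => Entry::Missing(LookupError::IndexOutOfRange {
                index: n,
                len: self.0.len(),
            }),
        }
    }

    fn find(&self, pred: impl Fn(u64) -> bool) -> Entry {
        self.0
            .iter()
            .copied()
            .find(|v| pred(*v))
            .map(Entry::Present)
            .unwrap_or(Entry::Missing(LookupError::NoMatch { op: "find" }))
    }

    fn max_by(&self, key: impl Fn(u64) -> u64) -> Entry {
        self.0
            .iter()
            .copied()
            .max_by_key(|v| key(*v))
            .map(Entry::Present)
            .unwrap_or(Entry::Missing(LookupError::NoMatch { op: "max_by" }))
    }
}
```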

### Per-CPU maps

`BPF_MAP_TYPE_PERCPU_ARRAY` / `_PERCPU_HASH` / `_LRU_PERCPU_HASH` maps
require narrowing to a CPU before reading individual values:

```rust,ignore
let map = snap.map("scx_pcpu")?;
let entry = map.cpu(1).at(0);                    // CPU 1's slot
let value = entry.get("").as_u64()?;             // empty path = root
```

`SnapshotMap::cpu(n)` narrows subsequent `at` / `find` calls to a
specific CPU's slot. An out-of-range CPU returns `Missing` with
`SnapshotError::PerCpuSlot { unmapped: false, len, ... }`; an
unmapped slot (`None` in the per-CPU vec) returns the same error
variant with `unmapped: true`.

Calling `entry.get(path)` on a per-CPU entry **without** narrowing
first surfaces `SnapshotError::PerCpuNotNarrowed { map }` — call
`.cpu(N)` first.
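
The three failure modes (not narrowed, out-of-range CPU, unmapped slot) can be told apart with a small model. This is an illustrative sketch only: `PerCpuEntry` and `PerCpuError` are hypothetical, narrowing is placed on the entry rather than the map to keep the example short, and each slot holds a bare `u64`.

```rust
#[derive(Debug, PartialEq)]
enum PerCpuError {
    /// A read was attempted before a CPU was selected.
    NotNarrowed,
    /// The selected CPU is out of range (`unmapped: false`) or its
    /// slot is `None` (`unmapped: true`).
    Slot { cpu: usize, len: usize, unmapped: bool },
}

/// One per-CPU entry: an optional slot per possible CPU.
struct PerCpuEntry {
    slots: Vec<Option<u64>>,
    cpu: Option<usize>,
}

impl PerCpuEntry {
    fn new(slots: Vec<Option<u64>>) -> Self {
        Self { slots, cpu: None }
    }

    /// Narrow subsequent reads to CPU `n`.
    fn cpu(mut self, n: usize) -> Self {
        self.cpu = Some(n);
        self
    }

    fn read(&self) -> Result<u64, PerCpuError> {
        let cpu = self.cpu.ok_or(PerCpuError::NotNarrowed)?;
        let len = self.slots.len();
        match self.slots.get(cpu) {
            None => Err(PerCpuError::Slot { cpu, len, unmapped: false }),
            Some(None) => Err(PerCpuError::Slot { cpu, len, unmapped: true }),
            Some(Some(v)) => Ok(*v),
        }
    }
}
```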

## Field accessors and dotted paths

`SnapshotEntry::get(path)` and `SnapshotField::get(path)` walk the
entry's value side along a dotted path. Each component matches a
struct member; pointer dereferences are followed transparently.

```rust,ignore
let weight = entry.get("ctx.weight").as_u64()?;
let policy = entry.get("ctx.policy").as_str()?;     // enum variant name
let pid    = entry.get("leader.pid").as_i64()?;     // pointer chase
```

The dotted-path walker:

1. **Pointer chase.** When a path step lands on
   `RenderedValue::Ptr { deref: Some(...) }`, the walker
   transparently follows the dereference (up to 16 hops) before
   matching the next component. The test author writes the path the
   BTF would suggest; pointer indirection is invisible.

2. **Empty path.** `get("")` returns the current value as a
   `SnapshotField::Value` — useful for terminal accessors on per-CPU
   slots that hold a scalar directly.

3. **Composability.** Two-segment paths are equivalent to chained
   `get` calls: `entry.get("ctx.weight")` ≡ `entry.get("ctx").get("weight")`.

   Note that [`Snapshot::var`](#top-level-globals-bss--data--rodata) does **not** split — it treats the full
   string as one global name. To walk into a struct, use
   `snap.var("ctx").get("weight")`.
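
The walker rules above can be sketched over a toy value tree. This is a simplified model rather than the real walker: `Value`, `WalkError`, `chase`, `walk`, and `strukt` are hypothetical names, and only the 16-hop limit, the empty-path rule, and the chase-before-match order are taken from the description.

```rust
use std::collections::BTreeMap;

/// Hypothetical, much-reduced stand-in for a rendered value tree.
#[derive(Debug, PartialEq)]
enum Value {
    Uint(u64),
    Struct(BTreeMap<String, Value>),
    Ptr(Option<Box<Value>>),
}

#[derive(Debug, PartialEq)]
enum WalkError {
    EmptyComponent,
    FieldNotFound(String),
    NotAStruct(String),
}

const MAX_HOPS: usize = 16;

/// Follow resolved pointers, at most `MAX_HOPS` of them.
fn chase(mut v: &Value) -> &Value {
    for _ in 0..MAX_HOPS {
        match v {
            Value::Ptr(Some(inner)) => v = &**inner,
            _ => break,
        }
    }
    v
}

/// Walk `path` from `root`; an empty path returns `root` itself.
fn walk<'a>(root: &'a Value, path: &str) -> Result<&'a Value, WalkError> {
    if path.is_empty() {
        return Ok(root);
    }
    let mut cur = root;
    for comp in path.split('.') {
        if comp.is_empty() {
            return Err(WalkError::EmptyComponent);
        }
        // Dereference before matching the component against members.
        match chase(cur) {
            Value::Struct(members) => {
                cur = members
                    .get(comp)
                    .ok_or_else(|| WalkError::FieldNotFound(comp.to_string()))?;
            }
            _ => return Err(WalkError::NotAStruct(comp.to_string())),
        }
    }
    Ok(cur)
}

/// Helper for building struct values in examples.
fn strukt(pairs: Vec<(&str, Value)>) -> Value {
    Value::Struct(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

In this model, `walk(root, "ctx.weight")` and `walk(walk(root, "ctx")?, "weight")` visit the same nodes, which is the composability property stated above.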

### Terminal accessors

`SnapshotField` exposes typed terminal reads, all returning
`Result<T, SnapshotError>`:

| Method | Returns | Accepts |
|---|---|---|
| `as_u64()` | `u64` | `Uint`, non-negative `Int`/`Enum`, `Bool` (0/1), `Char` (raw byte), `Ptr` (pointer value), per-CPU array key |
| `as_i64()` | `i64` | `Int`, `Uint` ≤ i64::MAX, `Bool`, `Char`, `Enum`, per-CPU array key |
| `as_bool()` | `bool` | `Bool` direct; `Int`/`Uint`/`Char`/`Enum`/`Ptr` non-zero is true; per-CPU array key |
| `as_f64()` | `f64` | `Float`, `Int`, `Uint`, `Enum`, per-CPU array key |
| `as_str()` | `&str` | `Enum` with a resolved variant name |
| `rendered()` | `Option<&RenderedValue>` | the underlying value when present |

Type mismatches surface as `SnapshotError::TypeMismatch { requested,
actual, path }` — for example, `as_str()` on a `Uint` reports
`actual: "Uint"`.
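
A reduced model of two rows of the table shows how the accept rules and the mismatch report fit together. The types here are hypothetical (`Rendered` covers only a few shapes, and `TypeMismatch` omits `path`), and reporting a negative `Int` as a mismatch is an assumption of this sketch, since the table only says non-negative values are accepted.

```rust
/// Hypothetical subset of rendered shapes.
#[derive(Debug, PartialEq)]
enum Rendered {
    Uint(u64),
    Int(i64),
    Bool(bool),
    Float(f64),
    Enum { value: i64, variant: Option<String> },
}

#[derive(Debug, PartialEq)]
struct TypeMismatch {
    requested: &'static str,
    actual: &'static str,
}

/// Name of the shape, as it would appear in `actual`.
fn kind(v: &Rendered) -> &'static str {
    match v {
        Rendered::Uint(_) => "Uint",
        Rendered::Int(_) => "Int",
        Rendered::Bool(_) => "Bool",
        Rendered::Float(_) => "Float",
        Rendered::Enum { .. } => "Enum",
    }
}

fn as_u64(v: &Rendered) -> Result<u64, TypeMismatch> {
    let mismatch = || TypeMismatch { requested: "u64", actual: kind(v) };
    match v {
        Rendered::Uint(n) => Ok(*n),
        Rendered::Bool(b) => Ok(u64::from(*b)), // false -> 0, true -> 1
        // Assumption: negative values are rejected as a mismatch.
        Rendered::Int(n) | Rendered::Enum { value: n, .. } => {
            u64::try_from(*n).map_err(|_| mismatch())
        }
        Rendered::Float(_) => Err(mismatch()),
    }
}

fn as_str(v: &Rendered) -> Result<&str, TypeMismatch> {
    match v {
        // Only an enum with a resolved variant name has a string form.
        Rendered::Enum { variant: Some(name), .. } => Ok(name.as_str()),
        other => Err(TypeMismatch { requested: "str", actual: kind(other) }),
    }
}
```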

## Error handling

[`SnapshotError`](#error-handling) is the unified error type for every fallible
accessor. Each variant carries the path or available alternatives
needed to fix the call site without re-running the test:

- `MapNotFound { requested, available }` — `Snapshot::map(name)` miss.
- `VarNotFound { requested, available }` — `Snapshot::var(name)` miss.
- `FieldNotFound { path, walked, component, available }` — a path
  component did not match any struct member at that depth. `walked`
  is the prefix that resolved successfully; `component` is the
  failing segment.
- `NotAStruct { path, walked, component, kind }` — a path component
  reached a non-struct value where a struct was expected (e.g.
  descending into a `Uint` leaf). `kind` names the actual variant.
- `TypeMismatch { requested, actual, path }` — terminal accessor
  called on a rendered shape it cannot decode.
- `IndexOutOfRange { map, index, len }` — `SnapshotMap::at(n)` past
  the entry list end.
- `PerCpuSlot { map, cpu, len, unmapped }` — out-of-range or unmapped
  per-CPU slot; `unmapped: true` distinguishes a `None` slot from an
  out-of-range CPU.
- `NoMatch { map, op }` — predicate-based lookup (`find`, `max_by`)
  found no match. `op` names the operation.
- `EmptyPathComponent { path }` — a path string contained an empty
  component (e.g. `"a..b"`).
- `PerCpuNotNarrowed { map }` — `entry.get` called on a per-CPU entry
  without `cpu(N)` first.
- `NoRendered { map, side }` — entry has no rendered key/value side
  (BTF type id missing at capture time, leaving hex bytes only).

`SnapshotError` implements `std::error::Error` and `Display`, so it
composes with `?` and `anyhow`. The `Display` impl includes the path
and any available alternatives so a failure message points the test
author at the fix.
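
As an illustration of how an error type of this kind composes with `?`, here is a std-only sketch. The names `LookupError`, `find_map`, and `lookup` and the message wording are hypothetical, and `Box<dyn Error>` is used in place of `anyhow` so the example has no dependencies.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum LookupError {
    MapNotFound { requested: String, available: Vec<String> },
}

impl fmt::Display for LookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Listing the alternatives makes a typo visible in the message.
            LookupError::MapNotFound { requested, available } => write!(
                f,
                "map '{requested}' not found; available: {}",
                available.join(", ")
            ),
        }
    }
}

impl Error for LookupError {}

/// Return the index of `requested` in `names`, or a descriptive error.
fn find_map(names: &[&str], requested: &str) -> Result<usize, LookupError> {
    names
        .iter()
        .position(|n| *n == requested)
        .ok_or_else(|| LookupError::MapNotFound {
            requested: requested.to_string(),
            available: names.iter().map(|n| n.to_string()).collect(),
        })
}

/// `?` converts the typed error into a boxed `dyn Error`.
fn lookup(names: &[&str], requested: &str) -> Result<usize, Box<dyn Error>> {
    let index = find_map(names, requested)?;
    Ok(index)
}
```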

## Worked example

Capture a snapshot, look up a map, walk into its first entry, and
read a nested field:

```rust,ignore
use ktstr::prelude::*;

fn snapshot_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // Wire a bridge for the duration of the scenario.
    let cb: CaptureCallback = std::sync::Arc::new(|_name: &str| {
        // Production: freeze + build a real FailureDumpReport. The
        // host installs this callback in real runs.
        Some(FailureDumpReport::default())
    });
    let bridge = SnapshotBridge::new(cb);
    let handle = bridge.clone();
    let _guard = bridge.set_thread_local();

    // Run the scenario, capturing once after spawn.
    let steps = vec![Step {
        setup: vec![CgroupDef::named("workers").workers(2)].into(),
        ops: vec![Op::snapshot("after_spawn")],
        hold: HoldSpec::FULL,
    }];
    let mut result = execute_steps(ctx, steps)?;

    // Drain the bridge and inspect the captured report.
    let captured = handle.drain();
    let report = captured
        .get("after_spawn")
        .ok_or_else(|| anyhow::anyhow!("snapshot 'after_spawn' missing"))?;
    let snap = Snapshot::new(report);

    // Top-level scalar.
    if let Ok(nr_cpus) = snap.var("nr_cpus_onln").as_u64() {
        result.details.push(AssertDetail::new(
            DetailKind::Other,
            format!("captured nr_cpus_onln = {nr_cpus}"),
        ));
    }

    Ok(result)
}
```

For the executor + bridge wiring outside a VM, see the host-side
smoke tests in `tests/snapshot_e2e.rs` — they exercise the same
pipeline against a hand-crafted `FailureDumpReport` so the assertion
shape is covered without booting a guest.

## Composing reads with writes

Snapshots are the **read** half of the host↔guest interaction. The
**write** half — pre-seeding a BPF map value before the scenario
starts — is the `#[ktstr_test]` attribute `bpf_map_write = CONST`,
which targets a `BpfMapWrite` constant:

```rust,ignore
use ktstr::prelude::*;

const TRIGGER_FAULT: BpfMapWrite = BpfMapWrite {
    map_name_suffix: ".bss",   // matched against discovered maps
    offset: 42,                // byte offset within the map's value
    value: 1,                  // u32 written by the host
};

#[ktstr_test(bpf_map_write = TRIGGER_FAULT, expect_err = true)]
fn fault_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // The host has already written `1` at `.bss + 42` before
    // the scenario started. Capture and inspect the resulting
    // scheduler state mid-run.
    /* bridge wiring + Op::snapshot + Snapshot::new as above */
    Ok(AssertResult::pass())
}
```

The write is event-driven: the host polls for BPF map
discoverability (scheduler loaded), polls the SHM ring for
scenario start, then writes the configured u32 at the configured
offset. Only `BPF_MAP_TYPE_ARRAY` maps are supported; the framework
finds the map by `map_name_suffix` (e.g. `".bss"`) via
`BpfMapAccessor::find_map`. See [Monitor → BPF map writes](../architecture/monitor.md)
for the prerequisites (vmlinux) and the full host-side
contract.
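
To make "a u32 at a byte offset" concrete, the following sketch writes four bytes into a buffer that stands in for an array map's value. It is a model only, not the host's write path: `write_u32_at` is a hypothetical helper, and native byte order is an assumption of this sketch.

```rust
/// Write `value` as four bytes at `offset` inside `value_bytes`,
/// refusing writes that would run past the end of the buffer.
fn write_u32_at(value_bytes: &mut [u8], offset: usize, value: u32) -> Result<(), String> {
    let len = value_bytes.len();
    let end = offset
        .checked_add(4)
        .ok_or_else(|| "offset overflows usize".to_string())?;
    let slot = value_bytes
        .get_mut(offset..end)
        .ok_or_else(|| format!("write at {offset}..{end} exceeds value size {len}"))?;
    // Assumption: the value is stored in native byte order.
    slot.copy_from_slice(&value.to_ne_bytes());
    Ok(())
}
```

With a 64-byte buffer, `write_u32_at(&mut buf, 42, 1)` touches only bytes 42 through 45, while an offset of 62 is rejected because the write would end at byte 66.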

Read+write workflows then compose naturally: the test pre-seeds
guest state with `bpf_map_write`, lets the scheduler run, and
asserts on the resulting state with `Op::snapshot` + the
[`Snapshot`](#reading-the-captured-report) accessor:

1. **Write (pre-scenario)** — `bpf_map_write` flips a `.bss` flag
   the scheduler reads.
2. **Run** — the scenario's ops drive workload behavior; the
   scheduler reacts to the flag.
3. **Read (mid-scenario)** — `Op::snapshot("after")` captures the
   scheduler state at the chosen point.
4. **Assert** — `Snapshot::var(...).as_u64()` /
   `Snapshot::map(...).find(...).get(...).as_*()` verifies the
   reaction. Errors carry the available alternatives so a typo or
   stale field name surfaces before the test author hand-edits the
   case.

The write side is a single one-shot poke at scheduler-load time;
there is no `Op` variant for runtime writes. Mid-scenario state
mutation is only possible when the scheduler itself exports a
writable interface (sysfs, debugfs, BPF map command interface); the
test then invokes that interface from a workload process.