# Snapshots
A **snapshot** is a frozen record of guest BPF map state and scheduler
globals captured at a specific point in a scenario. The freeze
coordinator pauses every vCPU long enough to walk the kernel's BPF
maps, BTF-render every captured value, and bundle the result into a
`FailureDumpReport` keyed by a name you choose. Test code then reads
it back via the [`Snapshot`](#reading-the-captured-report) accessor for typed traversal.
`Op::snapshot("name")` is the **on-demand** capture trigger. Use it to
ask "what does the scheduler look like *right now*?" at a precise
point in the scenario. For automatic capture on a kernel write to a
specific symbol, see [Watch Snapshots](watch-snapshots.md). For
**cadenced** capture across the workload window without invoking
`Op::snapshot` from the scenario body, see
[Periodic Capture](periodic-capture.md) — it produces a time-ordered
[`SampleSeries`](temporal-assertions.md#sampleseries) that flows into
the [temporal-assertion](temporal-assertions.md) patterns
(`nondecreasing`, `rate_within`, `steady_within`, `converges_to`,
`always_true`, `ratio_within`).
## Issuing a snapshot
`Op::snapshot(name)` is a single op in a [`Step`](../concepts/ops.md)'s op list. The
executor invokes the active [`SnapshotBridge`](#wiring-the-bridge)'s capture callback,
which performs the freeze rendezvous and returns the report; the
bridge stores the report under `name`.
```rust,ignore
use ktstr::prelude::*;
let steps = vec![Step {
    setup: vec![CgroupDef::named("workers").workers(2)].into(),
    ops: vec![
        Op::snapshot("after_spawn"),
        // ... other ops ...
        Op::snapshot("after_workload"),
    ],
    hold: HoldSpec::FULL,
}];
execute_steps(ctx, steps)?;
```
A scenario may issue any number of `Op::snapshot` ops with distinct
names. Reusing a name overwrites the prior capture (and emits a
`tracing::warn!`).
## Wiring the bridge
The bridge is what turns an `Op::snapshot` into stored data. The host
typically wires it before `execute_steps` runs, but a scenario can
install one inline:
```rust,ignore
use ktstr::prelude::*;
let bridge = SnapshotBridge::new(cb); // cb: CaptureCallback, as in the worked example below
let handle = bridge.clone();          // kept so captures can be drained afterwards
let _guard = bridge.set_thread_local();
// ... execute_steps(ctx, steps)? runs the Op::snapshot ops ...
let captured = handle.drain();
let report = captured.get("after_spawn").expect("snapshot recorded");
```
`set_thread_local` returns a [`BridgeGuard`](#wiring-the-bridge) that restores the prior
bridge on drop, so a nested scenario inside an outer one cannot leak
its bridge into the outer scope. Bind the guard to an
underscore-prefixed identifier such as `_guard` so the binding lives
for the scope of the scenario — a bare `let _ = bridge.set_thread_local()`
drops the guard immediately and clears the bridge before any op runs.
`must_use` will warn if the return value is discarded entirely.
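The drop-to-restore behavior can be sketched with a plain thread-local slot. This is a hypothetical stand-in, not the ktstr types: `Guard` models `BridgeGuard`, and the `&'static str` slot models the active-bridge storage.

```rust
use std::cell::RefCell;

thread_local! {
    // Models the active-bridge slot; the real slot holds a SnapshotBridge.
    static ACTIVE: RefCell<Option<&'static str>> = RefCell::new(None);
}

// Models BridgeGuard: remembers the prior bridge and restores it on drop.
struct Guard(Option<&'static str>);

impl Drop for Guard {
    fn drop(&mut self) {
        ACTIVE.with(|a| *a.borrow_mut() = self.0.take());
    }
}

#[must_use]
fn set_thread_local(bridge: &'static str) -> Guard {
    // `replace` returns the prior value, which the guard keeps for restore.
    ACTIVE.with(|a| Guard(a.borrow_mut().replace(bridge)))
}

fn active() -> Option<&'static str> {
    ACTIVE.with(|a| *a.borrow())
}

fn main() {
    let _guard = set_thread_local("outer");
    {
        let _nested = set_thread_local("inner");
        assert_eq!(active(), Some("inner"));
    } // `_nested` drops here and restores the outer bridge.
    assert_eq!(active(), Some("outer"));

    // A bare `let _ = ...` drops the guard at the end of the statement,
    // so the prior bridge is already back before the next line runs.
    let _ = set_thread_local("ephemeral");
    assert_eq!(active(), Some("outer"));
}
```

The same RAII shape is why a nested scenario cannot leak its bridge: the inner guard restores whatever the outer scope installed.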
If no bridge is installed, `Op::snapshot` is a no-op with a
`tracing::warn!` and the scenario continues. If the capture callback
returns `None` (capture pipeline unavailable), the bridge stays empty
and the scenario continues. Existing scenarios that never declare
snapshot ops keep working unchanged.
## Reading the captured report
[`Snapshot::new(report)`](#reading-the-captured-report) builds a borrowed view over a
`FailureDumpReport`. The view does not copy the report; accessor
methods walk the report in place and return further borrowed views.
### Map-name lookup
```rust,ignore
let snap = Snapshot::new(report);
let map = snap.map("scx_per_task")?; // SnapshotMap
```
`Snapshot::map(name)` returns `Result<SnapshotMap, SnapshotError>`. A
miss yields `SnapshotError::MapNotFound { requested, available }` —
the `available` list enumerates every captured map name so a typo
surfaces in test output.
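The shape of that error can be illustrated with a self-contained sketch. These are hypothetical simplified types, not the ktstr definitions; entries are reduced to `Vec<u64>` to keep the lookup mechanics visible:

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct MapNotFound {
    requested: String,
    available: Vec<String>,
}

struct Snapshot {
    maps: BTreeMap<String, Vec<u64>>, // map name -> entries (simplified)
}

impl Snapshot {
    fn map(&self, name: &str) -> Result<&Vec<u64>, MapNotFound> {
        self.maps.get(name).ok_or_else(|| MapNotFound {
            requested: name.to_string(),
            // Every captured map name, so a typo is visible in the failure.
            available: self.maps.keys().cloned().collect(),
        })
    }
}

fn main() {
    let mut maps = BTreeMap::new();
    maps.insert("scx_per_task".to_string(), vec![1, 2]);
    maps.insert("scx_pcpu".to_string(), vec![3]);
    let snap = Snapshot { maps };

    let err = snap.map("scx_per_tsak").unwrap_err(); // typo on purpose
    assert_eq!(err.requested, "scx_per_tsak");
    // The error enumerates the real names, making the fix obvious.
    assert_eq!(
        err.available,
        vec!["scx_pcpu".to_string(), "scx_per_task".to_string()]
    );
}
```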
### Top-level globals (.bss / .data / .rodata)
```rust,ignore
let nr_cpus = snap.var("nr_cpus_onln").as_u64()?;
```
`Snapshot::var(name)` walks every `*.bss`, `*.data`, and `*.rodata`
global-section map for a top-level member named `name` and returns
the unique match as a [`SnapshotField`](#terminal-accessors).
Multiple matches yield
`SnapshotError::AmbiguousVar { requested, found_in }` —
disambiguate via `Snapshot::map(name)`. A miss yields
`SnapshotError::VarNotFound { requested, available }` with the
union of every section's top-level member names.
### Entries inside a map
```rust,ignore
let map = snap.map("scx_per_task")?;
let first = map.at(0); // by ordinal index
let all_active = map.filter(|e| e.get("runtime_ns").as_u64().unwrap_or(0) > 0);
```
`SnapshotMap` exposes:
- `at(n)` — entry at ordinal index `n`. Out of range returns
`SnapshotEntry::Missing(SnapshotError::IndexOutOfRange)`.
- `find(predicate)` — first matching entry. No match returns
`SnapshotEntry::Missing(SnapshotError::NoMatch { op: "find", ... })`.
- `filter(predicate)` — every matching entry collected into a `Vec`.
- `max_by(key_fn)` — entry whose `key_fn` produces the maximum `u64`.
Empty map returns `Missing` with `op: "max_by"`.
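The semantics of the four lookups can be sketched over a plain slice. This is a hedged model with a hypothetical `Entry` struct; the real entries are BTF-rendered values and misses return `SnapshotEntry::Missing` rather than `None`:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    pid: u64,
    runtime_ns: u64,
}

fn at(entries: &[Entry], n: usize) -> Option<&Entry> {
    entries.get(n) // out of range -> Missing(IndexOutOfRange) in the real API
}

fn find<'a>(entries: &'a [Entry], pred: impl Fn(&Entry) -> bool) -> Option<&'a Entry> {
    entries.iter().find(|e| pred(e)) // no match -> Missing(NoMatch)
}

fn filter<'a>(entries: &'a [Entry], pred: impl Fn(&Entry) -> bool) -> Vec<&'a Entry> {
    entries.iter().filter(|e| pred(e)).collect()
}

fn max_by(entries: &[Entry], key: impl Fn(&Entry) -> u64) -> Option<&Entry> {
    entries.iter().max_by_key(|e| key(e)) // empty -> Missing(NoMatch)
}

fn main() {
    let entries = vec![
        Entry { pid: 10, runtime_ns: 0 },
        Entry { pid: 11, runtime_ns: 500 },
        Entry { pid: 12, runtime_ns: 200 },
    ];
    assert_eq!(at(&entries, 1).unwrap().pid, 11);
    assert!(at(&entries, 9).is_none());
    assert_eq!(find(&entries, |e| e.runtime_ns > 0).unwrap().pid, 11);
    assert_eq!(filter(&entries, |e| e.runtime_ns > 0).len(), 2);
    assert_eq!(max_by(&entries, |e| e.runtime_ns).unwrap().pid, 11);

    let empty: Vec<Entry> = Vec::new();
    assert!(max_by(&empty, |e| e.runtime_ns).is_none());
}
```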
### Per-CPU maps
`BPF_MAP_TYPE_PERCPU_ARRAY` / `_PERCPU_HASH` / `_LRU_PERCPU_HASH` maps
require narrowing to a CPU before reading individual values:
```rust,ignore
let map = snap.map("scx_pcpu")?;
let entry = map.cpu(1).at(0); // CPU 1's slot
let value = entry.get("").as_u64()?; // empty path = root
```
`SnapshotMap::cpu(n)` narrows subsequent `at` / `find` calls to a
specific CPU's slot. An out-of-range CPU returns `Missing` with
`SnapshotError::PerCpuSlot { unmapped: false, len, ... }`; an
unmapped slot (`None` in the per-CPU vec) returns the same error
variant with `unmapped: true`.
Calling `entry.get(path)` on a per-CPU entry **without** narrowing
first surfaces `SnapshotError::PerCpuNotNarrowed { map }` — call
`.cpu(N)` first.
## Field accessors and dotted paths
`SnapshotEntry::get(path)` and `SnapshotField::get(path)` walk the
entry's value side along a dotted path. Each component matches a
struct member; pointer dereferences are followed transparently.
```rust,ignore
let weight = entry.get("ctx.weight").as_u64()?;
let policy = entry.get("ctx.policy").as_str()?; // enum variant name
let pid = entry.get("leader.pid").as_i64()?; // pointer chase
```
The dotted-path walker:
1. **Pointer chase.** When a path step lands on
`RenderedValue::Ptr { deref: Some(...) }`, the walker
transparently follows the dereference (up to 16 hops) before
matching the next component. The test author writes the path the
BTF would suggest; pointer indirection is invisible.
2. **Empty path.** `get("")` returns the current value as a
`SnapshotField::Value` — useful for terminal accessors on per-CPU
slots that hold a scalar directly.
3. **Composability.** Two-segment paths are equivalent to chained
`get` calls: `entry.get("ctx.weight")` ≡
`entry.get("ctx").get("weight")`.
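The three rules above can be sketched as a standalone walker. The `Value` enum here is a hypothetical stand-in for `RenderedValue`, reduced to the three shapes the walk cares about:

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
enum Value {
    Uint(u64),
    Struct(BTreeMap<String, Value>),
    Ptr { deref: Option<Box<Value>> },
}

const MAX_HOPS: usize = 16;

fn get<'a>(mut v: &'a Value, path: &str) -> Option<&'a Value> {
    if path.is_empty() {
        return Some(v); // rule 2: empty path = the current value
    }
    for comp in path.split('.') {
        if comp.is_empty() {
            return None; // EmptyPathComponent in the real API
        }
        // Rule 1: chase pointer derefs (up to 16 hops) before matching.
        let mut hops = 0;
        while let Value::Ptr { deref } = v {
            v = deref.as_deref()?;
            hops += 1;
            if hops > MAX_HOPS {
                return None;
            }
        }
        match v {
            Value::Struct(members) => v = members.get(comp)?,
            _ => return None, // NotAStruct in the real API
        }
    }
    Some(v)
}

fn main() {
    // leader -> Ptr -> Struct { pid: 42 }
    let leader = Value::Ptr {
        deref: Some(Box::new(Value::Struct(BTreeMap::from([(
            "pid".to_string(),
            Value::Uint(42),
        )])))),
    };
    let entry = Value::Struct(BTreeMap::from([("leader".to_string(), leader)]));

    // Rule 3: "leader.pid" walks the same as get("leader") then get("pid");
    // the pointer indirection never appears in the path.
    match get(&entry, "leader.pid") {
        Some(Value::Uint(pid)) => assert_eq!(*pid, 42),
        other => panic!("unexpected: {other:?}"),
    }
}
```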
Note that [`Snapshot::var`](#top-level-globals-bss--data--rodata) does **not** split — it treats the full
string as one global name. To walk into a struct, use
`snap.var("ctx").get("weight")`.
### Terminal accessors
`SnapshotField` exposes typed terminal reads. The `as_*` accessors
return `Result<T, SnapshotError>`; `rendered()` is an infallible `Option`:
| Accessor | Returns | Accepted source shapes |
| --- | --- | --- |
| `as_u64()` | `u64` | `Uint`, non-negative `Int`/`Enum`, `Bool` (0/1), `Char` (raw byte), `Ptr` (pointer value, including cast-recovered pointers — see [Cast-recovered pointers](#cast-recovered-pointers)), per-CPU array key |
| `as_i64()` | `i64` | `Int`, `Uint` ≤ i64::MAX, `Bool`, `Char`, `Enum`, per-CPU array key |
| `as_bool()` | `bool` | `Bool` direct; `Int`/`Uint`/`Char`/`Enum`/`Ptr` non-zero is true; per-CPU array key |
| `as_f64()` | `f64` | `Float`, `Int`, `Uint`, `Enum`, per-CPU array key |
| `as_str()` | `&str` | `Enum` with a resolved variant name |
| `rendered()` | `Option<&RenderedValue>` | the underlying value when present |
Type mismatches surface as `SnapshotError::TypeMismatch { expected,
actual, requested }` — for example, `as_str()` on a `Uint` reports
`expected: "Enum"`, `actual: "Uint"`.
### Cast-recovered pointers
Schedulers stash kernel pointers (`task_struct *`, `cgroup *`, …)
and arena pointers in BPF map fields whose BTF declares them as
`u64` because BTF cannot express a pointer to a per-allocation
type. The host-side
[cast analyzer](../architecture/monitor.md#cast-analysis) walks the
scheduler's `.bpf.o` instruction stream during load, recovers the
target struct for each provable `(source_struct, field_offset) →
target_struct` mapping, and feeds the result into the renderer.
When the renderer encounters a `u64` slot the analyzer flagged, it
emits a [`RenderedValue::Ptr`](#field-accessors-and-dotted-paths)
with `cast_annotation` set and chases the dereference through the
address-space-appropriate reader. The full set of
`cast_annotation` values:
| `cast_annotation` | Meaning |
| --- | --- |
| `"cast→arena"` | Cast analyzer flagged a `u64` field; chase resolved to an arena allocation via the BTF-typed pointee. |
| `"cast→kernel"` | Cast analyzer flagged a `u64` field; chase resolved to a kernel slab / vmalloc / per-cpu allocation. |
| `"sdt_alloc"` | BTF-typed `Type::Ptr` whose pointee was a `BTF_KIND_FWD`; the renderer recovered the real payload struct id via the `sdt_alloc` bridge. No cast-analyzer hit was involved. |
| `"cast→arena (sdt_alloc)"` | Cast analyzer flagged a `u64` field AND the chase target peeled to a Fwd; the bridge recovered the real arena payload struct id. |
| `"cast→kernel (sdt_alloc)"` | Cast analyzer flagged a `u64` field AND the chase target peeled to a Fwd; the bridge recovered the real kernel-side struct id. |
A parallel cross-BTF Fwd resolution path is consulted whenever a
chase target survives the local same-BTF Fwd resolve as a
`BTF_KIND_FWD`: when the body lives in a sibling embedded BPF
object's BTF (the multi-`.bpf.objs` shape), the renderer switches
the recursion to that sibling BTF and renders the full body.
Cross-BTF resolution does NOT add a new annotation — the body is
recovered transparently and the rendered subtree carries whichever
annotation (`"cast→arena"`, `"cast→kernel"`, or `None` for a
BTF-typed `Type::Ptr`) it would have had if the same struct lived
in the entry BTF.
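The table and the cross-BTF rule amount to a small decision function. As a hedged restatement only (a hypothetical helper; the real renderer derives the annotation during the chase, not from a lookup like this):

```rust
#[derive(Clone, Copy)]
enum Space {
    Arena,
    Kernel,
}

/// analyzer_hit: the cast analyzer flagged a u64 field.
/// fwd_bridge: the chase target peeled to a BTF_KIND_FWD and the
/// sdt_alloc bridge recovered the real struct id.
fn annotation(analyzer_hit: bool, space: Space, fwd_bridge: bool) -> Option<&'static str> {
    match (analyzer_hit, space, fwd_bridge) {
        (true, Space::Arena, false) => Some("cast→arena"),
        (true, Space::Kernel, false) => Some("cast→kernel"),
        (true, Space::Arena, true) => Some("cast→arena (sdt_alloc)"),
        (true, Space::Kernel, true) => Some("cast→kernel (sdt_alloc)"),
        // BTF-typed Ptr whose pointee was a Fwd: bridge only.
        (false, _, true) => Some("sdt_alloc"),
        // Plain BTF-typed pointer: no annotation. Cross-BTF resolution
        // never changes this result.
        (false, _, false) => None,
    }
}

fn main() {
    assert_eq!(annotation(true, Space::Arena, false), Some("cast→arena"));
    assert_eq!(annotation(true, Space::Kernel, true), Some("cast→kernel (sdt_alloc)"));
    assert_eq!(annotation(false, Space::Kernel, true), Some("sdt_alloc"));
    assert_eq!(annotation(false, Space::Arena, false), None);
}
```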
From the test author's perspective:
- `as_u64()` returns the raw pointer value (matching pre-analysis
behavior, so existing tests do not need updating).
- `entry.get("ctx.task")` and similar dotted-path walks transparently
follow the cast-recovered chase; nested struct fields appear under
the same path the BTF would suggest for a natively-typed pointer.
- The `cast_annotation` is visible in failure-dump rendering and
diagnostic output so an operator can distinguish cast-recovered
pointers from BTF-typed ones; the test API does not require any
extra calls to consume them.
## Error handling
[`SnapshotError`](#error-handling) is the unified error type for every fallible
accessor. Each variant carries the path or available alternatives
needed to fix the call site without re-running the test:
- `MapNotFound { requested, available }` — `Snapshot::map(name)` miss.
- `VarNotFound { requested, available }` — `Snapshot::var(name)` miss.
- `AmbiguousVar { requested, found_in }` — more than one
`*.bss`/`*.data`/`*.rodata` map exposes a top-level member with the
requested name. `found_in` lists every map (in capture order)
where the name was seen; disambiguate via `Snapshot::map(name)` +
`.at(0).get(...)` against a specific map.
- `FieldNotFound { requested, walked, component, available }` — a
path component did not match any struct member at that depth.
`walked` is the prefix that resolved successfully; `component` is
the failing segment; `requested` is the original user-supplied
path.
- `NotAStruct { requested, walked, component, kind }` — a path
component reached a non-struct value where a struct was expected
(e.g. descending into a `Uint` leaf). `kind` names the actual
variant.
- `TypeMismatch { expected, actual, requested }` — terminal
accessor called on a rendered shape it cannot decode. `expected`
names the scalar type the accessor requires; `actual` names the
rendered variant; `requested` is the user-supplied lookup string
(empty when the accessor was invoked on a leaf without a path
walk).
- `IndexOutOfRange { map, index, len }` — `SnapshotMap::at(n)` past
the entry list end.
- `PerCpuSlot { map, cpu, len, unmapped }` — out-of-range or unmapped
per-CPU slot; `unmapped: true` distinguishes a `None` slot from an
out-of-range CPU.
- `NoMatch { map, op }` — predicate-based lookup (`find`, `max_by`)
found no match. `op` names the operation.
- `EmptyPathComponent { requested }` — a path string contained an
empty component (e.g. `"a..b"`).
- `PerCpuNotNarrowed { map }` — `entry.get` called on a per-CPU entry
without `cpu(N)` first.
- `NoRendered { map, side }` — entry has no rendered key/value side
(BTF type id missing at capture time, leaving hex bytes only).
- `PlaceholderSample { tag, reason }` — a periodic-capture sample's
underlying `FailureDumpReport` is a placeholder produced by the
freeze-rendezvous timeout fallback. Surfaces when projecting via
[`SampleSeries::bpf`](temporal-assertions.md#projecting-from-bpf-state);
temporal patterns route the variant through their skip path so a
placeholder never falsely registers as zero progress against a
monotonicity / rate / steady / ratio band. `reason` carries the
rendezvous-timeout cause text.
- `MissingStats { tag }` — a [`SampleSeries::stats`](temporal-assertions.md#projecting-from-scx_stats-json)
projection ran on a sample whose `stats` slot is `None` (stats
client not wired or per-sample stats request failed). Distinct
from in-JSON path misses (`FieldNotFound` / `TypeMismatch`) so the
assertion site can branch on the cause without re-walking the
source.
`SnapshotError` implements `std::error::Error` and `Display`, so it
composes with `?` and `anyhow`. The `Display` impl includes the path
and any available alternatives so a failure message points the test
author at the fix.
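That composition can be sketched with a single variant. The types here are hypothetical stand-ins (one `VarNotFound`-shaped struct, a toy `lookup`), shown to make the `Display`-carries-alternatives and `?`-composition points concrete:

```rust
use std::error::Error;
use std::fmt;

// Minimal stand-in for one SnapshotError variant.
#[derive(Debug)]
struct VarNotFound {
    requested: String,
    available: Vec<String>,
}

impl fmt::Display for VarNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The message names the miss AND the alternatives, so the
        // failure output points at the fix.
        write!(
            f,
            "var '{}' not found; available: {}",
            self.requested,
            self.available.join(", ")
        )
    }
}

impl Error for VarNotFound {}

fn lookup(name: &str) -> Result<u64, VarNotFound> {
    let globals = [("nr_cpus_onln", 8u64)];
    globals
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, v)| *v)
        .ok_or_else(|| VarNotFound {
            requested: name.to_string(),
            available: globals.iter().map(|(n, _)| n.to_string()).collect(),
        })
}

// Composes with `?` behind a boxed error, as it would behind anyhow.
fn read_cpus() -> Result<u64, Box<dyn Error>> {
    Ok(lookup("nr_cpus_onln")?)
}

fn main() {
    assert_eq!(read_cpus().unwrap(), 8);
    let msg = lookup("nr_cpus").unwrap_err().to_string();
    assert!(msg.contains("available: nr_cpus_onln"));
}
```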
## Worked example
Capture a snapshot, look up a map, walk into its first entry, and
read a nested field:
```rust,ignore
use ktstr::prelude::*;

fn snapshot_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // Wire a bridge for the duration of the scenario.
    let cb: CaptureCallback = std::sync::Arc::new(|_name| {
        // Production: freeze + build a real FailureDumpReport. The
        // host installs this callback in real runs.
        Some(FailureDumpReport::default())
    });
    let bridge = SnapshotBridge::new(cb);
    let handle = bridge.clone();
    let _guard = bridge.set_thread_local();

    // Run the scenario, capturing once after spawn.
    let steps = vec![Step {
        setup: vec![CgroupDef::named("workers").workers(2)].into(),
        ops: vec![Op::snapshot("after_spawn")],
        hold: HoldSpec::FULL,
    }];
    let mut result = execute_steps(ctx, steps)?;

    // Drain the bridge and inspect the captured report.
    let captured = handle.drain();
    let report = captured
        .get("after_spawn")
        .ok_or_else(|| anyhow::anyhow!("snapshot 'after_spawn' missing"))?;
    let snap = Snapshot::new(report);

    // Top-level scalar.
    if let Ok(nr_cpus) = snap.var("nr_cpus_onln").as_u64() {
        result.details.push(AssertDetail::new(
            DetailKind::Other,
            format!("captured nr_cpus_onln = {nr_cpus}"),
        ));
    }
    Ok(result)
}
```
For the executor + bridge wiring outside a VM, see the host-side
smoke tests in `tests/snapshot_e2e.rs` — they exercise the same
pipeline against a hand-crafted `FailureDumpReport` so the assertion
shape is covered without booting a guest.
## Composing reads with writes
Snapshots are the **read** half of the host↔guest interaction. The
**write** half — pre-seeding a BPF map value before the scenario
starts — is the `#[ktstr_test]` attribute `bpf_map_write = CONST`,
which targets a `BpfMapWrite` constant:
```rust,ignore
use ktstr::prelude::*;

const TRIGGER_FAULT: BpfMapWrite = BpfMapWrite {
    map_name_suffix: ".bss", // matched against discovered maps
    offset: 42,              // byte offset within the map's value
    value: 1,                // u32 written by the host
};

#[ktstr_test(bpf_map_write = TRIGGER_FAULT, expect_err = true)]
fn fault_then_inspect(ctx: &Ctx) -> Result<AssertResult> {
    // The host has already written `1` at `.bss + 42` before
    // the scenario started. Capture and inspect the resulting
    // scheduler state mid-run.
    /* bridge wiring + Op::snapshot + Snapshot::new as above */
    Ok(AssertResult::pass())
}
```
The write is event-driven: the host polls for BPF map
discoverability (scheduler loaded), polls the SHM ring for
scenario start, then writes the configured u32 at the configured
offset. Only `BPF_MAP_TYPE_ARRAY` maps are supported; the framework
finds the map by `map_name_suffix` (e.g. `".bss"`) via
`BpfMapAccessor::find_map`. See [Monitor → BPF map writes](../architecture/monitor.md)
for the prerequisites (vmlinux) and the full host-side
contract.
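The write itself is a little-endian u32 copied at a byte offset into the map's value blob. A standalone sketch of that arithmetic (hypothetical `poke` helper; the real write goes through `BpfMapAccessor`):

```rust
// Write a u32 at a byte offset inside a map value blob, little-endian,
// modeling the host's one-shot poke at scheduler-load time.
fn poke(value_blob: &mut [u8], offset: usize, value: u32) -> Result<(), String> {
    let end = offset
        .checked_add(4)
        .ok_or_else(|| "offset overflow".to_string())?;
    if end > value_blob.len() {
        return Err(format!(
            "offset {offset}+4 exceeds value size {}",
            value_blob.len()
        ));
    }
    value_blob[offset..end].copy_from_slice(&value.to_le_bytes());
    Ok(())
}

fn main() {
    let mut bss = vec![0u8; 64]; // pretend the .bss value is 64 bytes
    poke(&mut bss, 42, 1).unwrap();
    assert_eq!(&bss[42..46], 1u32.to_le_bytes());
    assert!(poke(&mut bss, 62, 1).is_err()); // would run past the blob
}
```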
Read+write workflows then compose naturally: the test pre-seeds
guest state with `bpf_map_write`, lets the scheduler run, and
asserts on the resulting state with `Op::snapshot` + the
[`Snapshot`](#reading-the-captured-report) accessor:
1. **Write (pre-scenario)** — `bpf_map_write` flips a `.bss` flag
the scheduler reads.
2. **Run** — the scenario's ops drive workload behavior; the
scheduler reacts to the flag.
3. **Read (mid-scenario)** — `Op::snapshot("after")` captures the
scheduler state at the chosen point.
4. **Assert** — `Snapshot::var(...).as_u64()` /
`Snapshot::map(...).find(...).get(...).as_*()` verifies the
reaction. Errors carry the available alternatives so a typo or
stale field name surfaces before the test author hand-edits the
case.
The write side is a single one-shot poke at scheduler-load time;
there is no `Op` variant for runtime writes. Ergonomic mid-scenario
state mutation is reserved for cases where the scheduler itself
exports a writable interface (sysfs, debugfs, BPF map command
interface) and the test invokes that interface from a workload
process.