//! Ready-made [`Payload`](ktstr::Payload) fixtures for the
//! benchmark binaries that dominate scheduler-regression testing:
//! `fio` (disk IO throughput, emits JSON), `stress-ng`
//! (synthetic CPU/memory stressors, exit-code only), and
//! `schbench` (latency percentiles, routed through LlmExtract).
//!
//! Each fixture is declared via
//! [`#[derive(Payload)]`](ktstr::Payload), the same path downstream
//! test authors use — so this module doubles as an end-to-end
//! exercise of the derive macro. The emitted `const` follows the
//! derive's naming convention: `struct FooPayload` produces
//! `const FOO: Payload`.
//!
//! These fixtures live under `tests/common/` rather than inside the
//! library's `src/` tree because they are TEST SCAFFOLDING, not
//! shipped API. A downstream scheduler author who wants the same
//! `fio` / `stress-ng` / `schbench` shapes should either copy the
//! declarations below into their own crate or write their own via
//! `#[derive(Payload)]`. The library does not ship fio, stress-ng,
//! or schbench binaries — the `kind = PayloadKind::Binary(name)`
//! just declares the name; host-side include-files resolution picks
//! the path up at test time.
//!
//! The fixtures cover all three
//! [`OutputFormat`](ktstr::test_support::OutputFormat) variants
//! — plus the hinted subvariant of `LlmExtract`:
//!
//! - [`FIO`] and [`FIO_JSON`] declare `OutputFormat::Json` with a
//! set of [`MetricHint`](ktstr::test_support::MetricHint)s
//! describing the canonical read/write throughput + latency paths.
//! Extracted metrics land with correct polarity/unit automatically.
//! - [`STRESS_NG`] uses `OutputFormat::ExitCode` with a single
//! `exit_code_eq(0)` default — stress-ng reports via exit code
//! (bogo_ops land in stderr and are not machine-extractable
//! without `--metrics-brief --yaml`).
//! - [`SCHBENCH`] uses `OutputFormat::LlmExtract(None)` — schbench
//! emits human-readable percentile tables, so extraction is
//! routed through the local LLM pipeline rather than the JSON
//! walker.
//! - [`SCHBENCH_HINTED`] declares
//! `OutputFormat::LlmExtract(Some("wakeup latency percentiles"))`
//! — identical to [`SCHBENCH`] in every other field, exercising
//! the derive's `LlmExtract("hint")` call form and the
//! hint-threading path through
//! [`extract_via_llm`](ktstr::test_support::model::extract_via_llm).
//! - [`SCHBENCH_JSON`] declares `OutputFormat::Json` with `--json -`
//! pre-baked into `default_args` and hints pinned to schbench's
//! JSON schema, so no LLM is in the loop.
//!
//! All fixtures use short, stable `name` fields matching their
//! binary names, except FIO_JSON (`"fio_json"`), SCHBENCH_HINTED
//! (`"schbench_hinted"`), and SCHBENCH_JSON (`"schbench_json"`),
//! which use distinct names so they can coexist with FIO and
//! SCHBENCH respectively under the pairwise-dedup rule on
//! `#[ktstr_test(workloads = [...])]`. The binary names themselves
//! (`"fio"`, `"stress-ng"`, `"schbench"`) are what ktstr's
//! include-files infrastructure resolves inside the guest.
//!
//! # Polarity::Unknown downstream
//!
//! Metrics extracted from a hinted payload are matched against the
//! payload's `metrics` table by name in
//! [`PayloadRun`](ktstr::scenario::payload_run::PayloadRun)'s post-exit pipeline;
//! names with no matching hint land with
//! [`Polarity::Unknown`](ktstr::test_support::Polarity::Unknown) and
//! an empty unit. Unknown propagates as follows:
//!
//! - **`MetricCheck` assertion pass** — [`MetricCheck`](ktstr::test_support::MetricCheck)
//! variants (`Min`, `Max`, `Range`, `Exists`, `ExitCodeEq`) compare
//! values to thresholds without consulting polarity. An Unknown
//! metric fails checks the same way a hinted metric does; polarity
//! plays no role at assert time.
//! - **`AssertResult::merge` per-key worst-case** — when multiple
//! cgroups contribute the same ext_metric, the merge consults the
//! crate-internal `MetricDef` from the `METRICS` registry. Names
//! absent from the registry (the case for any Unknown metric not
//! also registered at crate scope) default to `higher_is_worse=true`
//! and merge by taking the max — conservative for regressions, but
//! NOT a declared polarity for the metric.
//! - **`cargo ktstr test-stats` cross-run comparison** — the
//! crate-internal `compare_runs` iterates the `METRICS` registry
//! only, so Unknown metrics extracted purely via `MetricHint`
//! absence are NOT classified as regression or improvement. They
//! are recorded to the sidecar for later manual inspection; to
//! surface them in a comparison verdict, register a `MetricDef` in
//! `src/stats.rs` or add a `MetricHint` on the payload with an
//! explicit polarity.
use ktstr::Payload;
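// Sketch (illustration only, not part of the fixtures): the module docs
// above describe two rules, exact-name hint lookup with a fallback to
// Unknown plus an empty unit, and a worst-case merge that takes the max
// for names absent from the registry. The types and functions below
// (`SketchPolarity`, `SketchHint`, `sketch_resolve_hint`,
// `sketch_merge_worst_case`) are hypothetical stand-ins, not ktstr's
// real API; they only model the behaviour as described.

```rust
/// Stand-in for the polarity enum; only the variants named in this file.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SketchPolarity {
    HigherBetter,
    LowerBetter,
    Unknown,
}

/// Hypothetical hint shape: exact metric name bound to polarity and unit.
pub struct SketchHint {
    pub name: &'static str,
    pub polarity: SketchPolarity,
    pub unit: &'static str,
}

/// Look an extracted metric up by exact name. A miss yields
/// `Unknown` and an empty unit, as the module docs describe.
pub fn sketch_resolve_hint(hints: &[SketchHint], name: &str) -> (SketchPolarity, &'static str) {
    hints
        .iter()
        .find(|hint| hint.name == name)
        .map(|hint| (hint.polarity, hint.unit))
        .unwrap_or((SketchPolarity::Unknown, ""))
}

/// Per-key worst-case merge across cgroups. With `higher_is_worse = true`
/// (the documented default for unregistered names) the max wins. The
/// `false` branch taking the min is an assumed mirror case.
pub fn sketch_merge_worst_case(values: &[f64], higher_is_worse: bool) -> Option<f64> {
    values.iter().copied().reduce(|worst, v| {
        if higher_is_worse {
            worst.max(v)
        } else {
            worst.min(v)
        }
    })
}
```

// In this model a metric named "jobs.0.read.bw" with no matching hint
// resolves to (Unknown, ""), and merging [1.0, 3.0, 2.0] with the default
// flag yields 3.0. Neither result declares a polarity for the metric.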
/// `fio` — flexible IO tester. Canonical workload for disk/IO
/// scheduler regressions.
///
/// Output format: JSON. Supply `--output-format=json` at the call
/// site (via `.arg(...)` on the
/// [`PayloadRun`](ktstr::scenario::payload_run::PayloadRun) builder returned by
/// `ctx.payload(&FIO)`, or via a scheduler default_args entry) or
/// use [`FIO_JSON`] which bakes it into `default_args` for the
/// common "just give me metrics" path.
///
/// **Caveat:** `FIO` leaves `default_args` empty, so invoking it
/// without `--output-format=json` causes `fio` to emit its
/// human-readable output, `extract_metrics` finds no JSON region,
/// and the check pass records every referenced metric as missing
/// without otherwise failing. Prefer [`FIO_JSON`] unless the test
/// author intentionally overrides the output mode.
///
/// Metric hints cover the first-job read/write leaf names. Fio's
/// JSON output is deeply nested (`jobs[N].read.iops`,
/// `.write.iops`, `.read.lat_ns.mean`, etc.); the hints pin the
/// four most-commonly-asserted paths. Unhinted paths land as
/// [`Polarity::Unknown`](ktstr::test_support::Polarity::Unknown)
/// and are still extracted for sidecar regression tracking.
;
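// Sketch (illustration only): how nested fio-style JSON can be flattened
// into dotted metric names that hints then match by exact string. The
// `SketchJson` type and `sketch_walk_leaves` function are hypothetical
// and dependency-free; the real `walk_json_leaves` may differ, and
// spelling array indices as `.N` is an assumption of this sketch.

```rust
/// Minimal JSON value model, just enough for the walk.
#[allow(dead_code)]
pub enum SketchJson {
    Num(f64),
    Text(String),
    Arr(Vec<SketchJson>),
    Obj(Vec<(String, SketchJson)>),
}

/// Join a path prefix and a key with a dot, skipping the dot at the root.
fn sketch_join(prefix: &str, key: &str) -> String {
    if prefix.is_empty() {
        key.to_string()
    } else {
        format!("{}.{}", prefix, key)
    }
}

/// Depth-first walk that records every numeric leaf under its dotted path.
/// Non-numeric leaves are skipped because they cannot become metrics.
pub fn sketch_walk_leaves(node: &SketchJson, prefix: &str, out: &mut Vec<(String, f64)>) {
    match node {
        SketchJson::Num(value) => out.push((prefix.to_string(), *value)),
        SketchJson::Text(_) => {}
        SketchJson::Arr(items) => {
            for (index, item) in items.iter().enumerate() {
                let path = sketch_join(prefix, index.to_string().as_str());
                sketch_walk_leaves(item, path.as_str(), out);
            }
        }
        SketchJson::Obj(fields) => {
            for (key, value) in fields {
                let path = sketch_join(prefix, key.as_str());
                sketch_walk_leaves(value, path.as_str(), out);
            }
        }
    }
}
```

// Under these assumptions a document shaped like
// {"jobs":[{"read":{"iops":1200}}]} yields the single leaf
// "jobs.0.read.iops", which is the kind of name a hint would pin.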
/// `fio` with `--output-format=json` pre-baked into `default_args`.
///
/// Compared to [`FIO`], this fixture differs in exactly two
/// fields:
///
/// 1. **`name`** — `"fio_json"` instead of `"fio"`. Uses a
/// distinct name so sidecar files and log output can
/// disambiguate the two fixtures. The `binary` field (the name
/// resolved by the include-files infrastructure) is still
/// `"fio"` in both.
/// 2. **`default_args`** — `&["--output-format=json"]` instead of
/// `&[]`. Everything else — `kind`, `output`, `default_checks`,
/// `metrics` — is character-for-character identical to [`FIO`].
///
/// **Caveat: simultaneous FIO + FIO_JSON.** Both fixtures have
/// `kind = PayloadKind::Binary("fio")`, so a scenario that lists
/// `#[ktstr_test(workloads = [FIO, FIO_JSON])]` spawns the `fio`
/// binary TWICE — each with its own argv set, inside whatever
/// cgroup the framework places each fixture in. The pairwise-dedup
/// on the `workloads` attribute only rejects identical Payload
/// paths; two distinct Payload constants that happen to share a
/// binary are NOT deduped. Test authors who want the same fio
/// binary once should pick ONE of the two fixtures, and extend it
/// via `ctx.payload(&FIO).arg("--output-format=json")` if the
/// `FIO_JSON` preset's args don't match their scenario.
;
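// Sketch (illustration only): the pairwise-dedup rule described above
// rejects identical Payload paths and nothing else. The function below is
// a hypothetical model that compares the paths as plain strings; the real
// check lives in the `#[ktstr_test]` macro and may be implemented
// differently.

```rust
/// Return the first pair of indices whose Payload paths are identical,
/// or `None` when every listed path is distinct. Sharing a binary is
/// deliberately not considered, matching the caveat above.
pub fn sketch_first_duplicate_path(paths: &[&str]) -> Option<(usize, usize)> {
    for i in 0..paths.len() {
        for j in (i + 1)..paths.len() {
            if paths[i] == paths[j] {
                return Some((i, j));
            }
        }
    }
    None
}
```

// In this model ["FIO", "FIO_JSON"] passes (two spawns of the same
// binary), while ["FIO", "STRESS_NG", "FIO"] is flagged at indices 0 and 2.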
/// `stress-ng` — synthetic load generator (CPU, memory, IO, VM,
/// etc.). Canonical workload for exercising scheduler decisions
/// under configurable contention.
///
/// Output format: `ExitCode`. stress-ng emits human-readable
/// progress lines to stderr; its machine-readable-metrics flags
/// write structured output to a caller-named file, not to stdout
/// or stderr. Since the extraction pipeline only consumes stdout,
/// no default stress-ng invocation feeds `extract_metrics`; the
/// fixture stays in exit-code mode and the happy path is a zero
/// exit.
///
/// **Caveat:** `default_args` is empty, so invoking `STRESS_NG`
/// without at least one stressor flag (e.g. `--cpu 1`, `--vm 1`)
/// causes stress-ng to print usage and exit nonzero on some
/// versions. Always append a stressor via `.arg(...)` on the
/// [`PayloadRun`](ktstr::scenario::payload_run::PayloadRun) builder returned
/// by `ctx.payload(&STRESS_NG)`.
///
/// Tests that want bogo_ops/sec metrics should declare their own
/// custom `Payload` via [`#[derive(Payload)]`](ktstr::Payload) and
/// pair it with a post-hoc stderr-to-stdout bridge, or declare
/// `output = LlmExtract("bogo ops")`. stress-ng emits bogo_ops on
/// stderr by default and its machine-readable-metrics flags write
/// to a caller-named file, not stdout, so a wrapper that captures
/// or redirects structured output onto stdout is still required
/// to give the LLM a non-empty body.
;
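// Sketch (illustration only): a polarity-free check evaluator in the
// spirit of the `MetricCheck` variants named in the module docs (`Min`,
// `Max`, `Range`, `Exists`, `ExitCodeEq`). `SketchCheck`,
// `SketchOutcome`, and `sketch_evaluate` are hypothetical. Reporting a
// missing value as `Missing` rather than `Fail`, and treating `Exists` on
// a missing value as `Fail`, are assumptions of this sketch.

```rust
/// Hypothetical mirror of the check variants named in this file.
#[allow(dead_code)]
pub enum SketchCheck {
    Min(f64),
    Max(f64),
    Range(f64, f64),
    Exists,
    ExitCodeEq(i32),
}

#[derive(Debug, PartialEq)]
pub enum SketchOutcome {
    Pass,
    Fail,
    Missing,
}

/// Evaluate one check. Note the signature takes no polarity argument:
/// thresholds are compared to the raw value only.
pub fn sketch_evaluate(check: &SketchCheck, value: Option<f64>, exit_code: i32) -> SketchOutcome {
    // Exit-code checks never look at extracted metrics.
    if let SketchCheck::ExitCodeEq(want) = check {
        return if exit_code == *want { SketchOutcome::Pass } else { SketchOutcome::Fail };
    }
    let v = match value {
        Some(v) => v,
        None => {
            return match check {
                SketchCheck::Exists => SketchOutcome::Fail,
                _ => SketchOutcome::Missing,
            }
        }
    };
    let ok = match check {
        SketchCheck::Min(lo) => v >= *lo,
        SketchCheck::Max(hi) => v <= *hi,
        SketchCheck::Range(lo, hi) => v >= *lo && v <= *hi,
        SketchCheck::Exists | SketchCheck::ExitCodeEq(_) => true,
    };
    if ok { SketchOutcome::Pass } else { SketchOutcome::Fail }
}
```

// For an exit-code-only fixture such as STRESS_NG, only the
// `ExitCodeEq(0)` arm is exercised: exit 0 passes, anything else fails.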
/// Latency-focused scheduler benchmark. Uses `LlmExtract` to
/// exercise the LLM extraction pipeline (schbench supports `--json`
/// but this fixture intentionally uses the third acquisition path).
///
/// **Contract — no polarity-classified metrics by design.** A bare
/// `ctx.payload(&SCHBENCH)` run is expected to reach the
/// `default_check(exit_code_eq(0))` gate cleanly and produce **no
/// polarity-tagged metrics**. Not a bug, not a pipeline failure:
/// the fixture declares `metrics = []`, so no hint ever binds an
/// extracted metric name to a polarity / unit pair, and whatever
/// the LLM pulls out of schbench's output stays at
/// [`Polarity::Unknown`](ktstr::test_support::Polarity::Unknown).
/// The happy-path assertion is the exit-code gate; anything else
/// is the caller's responsibility. Two orthogonal reasons drive
/// this, each expanded below:
///
/// 1. schbench writes latency tables and summary lines to **stderr**
/// by default — stdout is empty unless `--json -` is passed.
/// [`OutputFormat::LlmExtract`](ktstr::test_support::OutputFormat::LlmExtract)
/// runs stdout-primary with a stderr fallback: extraction first
/// runs against the (empty) stdout, yields zero metrics, and then
/// retries against stderr's latency text. See the stderr-fallback
/// section below for the details.
/// 2. LLM-extracted metric names are non-deterministic, so any hint
/// declared on this fixture would silently fail to match. The
/// fixture ships with `metrics = []` so the absence of polarity
/// classification is visible by construction rather than implied.
///
/// Tests that want actual schbench metrics should use
/// [`SCHBENCH_JSON`] (pre-baked `--json -`, stable dotted-path
/// schema, no LLM in the loop) or append `--json -` to this
/// fixture's builder.
///
/// **No metric hints.** schbench emits canonical latency stats
/// (`Wakeup Latencies`, `Request Latencies`, `RPS`) with standard
/// percentiles that would otherwise be obvious
/// [`Polarity::LowerBetter`](ktstr::test_support::Polarity::LowerBetter)
/// candidates — yet `metrics` is deliberately empty. The reason is
/// upstream: hints in [`Payload`](ktstr::Payload)'s `metrics`
/// field bind a fixed dotted-path name to a polarity/unit pair,
/// and the post-extraction resolver inside
/// [`PayloadRun`](ktstr::scenario::payload_run::PayloadRun) looks
/// each extracted metric up by exact name. The
/// [`OutputFormat::LlmExtract`](ktstr::test_support::OutputFormat::LlmExtract)
/// path produces metric names from whatever JSON the local model
/// emits — not from a stable schema. The model's dotted paths vary
/// with weights, prompt, and the hinted-focus string (even under
/// ArgMax, a different base model produces different keys). A hint
/// declared against e.g. `"wakeup_latency_pct99"` would miss when
/// the model emits `"wakeup.latency.p99"` or
/// `"wakeup_latency.99th_percentile"`. Rather than ship hints that
/// silently fail to apply and leave every metric at
/// [`Polarity::Unknown`](ktstr::test_support::Polarity::Unknown)
/// anyway, the fixture leaves `metrics` empty so the absence of
/// polarity classification is visible by construction. Tests that
/// need strict regression direction for schbench should pipe
/// `--json -` instead and declare an `OutputFormat::Json` fixture
/// with concrete hint paths that match the fixed schbench schema
/// (e.g. `"int.wakeup_latency_pct99.0"` from `write_json_stats`
/// in schbench.c).
///
/// **Stderr-fallback extraction.** schbench writes its percentile
/// tables (`show_latencies` → `fprintf(stderr, ...)`) and summary
/// lines (`avg worker transfer`, `average rps`, `sched delay`) to
/// **stderr** by default; stdout only carries output when
/// `--json -` is passed. Under the stdout-primary / stderr-fallback
/// contract on
/// [`OutputFormat`](ktstr::test_support::OutputFormat) (documented
/// on [`PayloadRun::run`](ktstr::scenario::payload_run::PayloadRun::run)),
/// [`extract_via_llm`](ktstr::test_support::model::extract_via_llm)
/// is invoked against stdout first — for this fixture that yields
/// zero metrics because stdout is empty — and then re-invoked
/// against stderr since the stdout pass produced nothing and
/// stderr is non-empty. The LLM therefore receives schbench's
/// latency text on the fallback leg. The extracted metric names
/// are model-dependent (see "No metric hints." above), so with
/// `metrics = []` none of them pick up a polarity / unit, and the
/// happy-path assertion remains the exit-code gate in
/// `default_checks`. For a stable regression-direction schema,
/// append `--json -` via `.arg("--json").arg("-")` on the runtime
/// builder — the JSON block lands on stdout, the stdout-primary
/// pass consumes it, and the stderr fallback never fires.
///
/// **`--runtime 5` tradeoff.** The 5-second `default_args` runtime
/// is sized for fast CI smoke signal, NOT for tail-latency
/// regression hunting — schbench's sample count scales with
/// runtime (`nr_samples` in `show_latencies`), so 5 s collects
/// roughly a sixth of what `--runtime 30` does and leaves p99.9+
/// estimates dominated by variance. Scheduler authors
/// investigating tail regressions should override via
/// `.arg("--runtime").arg("30")` on the
/// [`PayloadRun`](ktstr::scenario::payload_run::PayloadRun)
/// builder returned by `ctx.payload(&SCHBENCH)`. schbench parses
/// argv with `getopt_long` and each `case 'r'` overwrites
/// `runtime = atoi(optarg)`, so the trailing setting wins on
/// duplicates and the appended override takes effect.
;
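// Sketch (illustration only): the stdout-primary / stderr-fallback
// contract described in the SCHBENCH docs, modeled as a generic wrapper
// around an extractor. `sketch_extract_with_fallback` and the toy
// `key=value` extractor are hypothetical; the real pipeline calls
// `extract_via_llm` and its exact signature is not shown in this file.

```rust
/// Run `extract` on stdout first. Fall back to stderr only when the
/// stdout pass produced nothing and stderr is non-empty.
pub fn sketch_extract_with_fallback<F>(stdout: &str, stderr: &str, extract: F) -> Vec<(String, f64)>
where
    F: Fn(&str) -> Vec<(String, f64)>,
{
    let primary = extract(stdout);
    if !primary.is_empty() || stderr.is_empty() {
        return primary;
    }
    extract(stderr)
}

/// Toy stand-in for an extractor: parses `name=value` lines and ignores
/// everything else. It exists only so the fallback can be exercised.
pub fn sketch_toy_extract(text: &str) -> Vec<(String, f64)> {
    text.lines()
        .filter_map(|line| {
            let mut parts = line.splitn(2, '=');
            let key = parts.next()?.trim();
            let value = parts.next()?.trim().parse::<f64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}
```

// With empty stdout and "p99=120" on stderr the fallback leg fires; once
// stdout carries any metric (the `--json -` case) stderr is never read.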
/// Hint-carrying sibling of [`SCHBENCH`] — identical in every
/// field except `name` (uses a distinct name so sidecar files and
/// log output can disambiguate the two fixtures) and `output`.
///
/// Declares `output = LlmExtract("wakeup latency percentiles")`.
/// The derive macro translates the call form into
/// [`OutputFormat::LlmExtract(Some(...))`](ktstr::test_support::OutputFormat::LlmExtract),
/// and the stored `&'static str` is inserted between the template
/// and the stdout block as a `Focus:` directive by
/// [`extract_via_llm`](ktstr::test_support::model::extract_via_llm)
/// when the fixture runs — steering the model toward the stat the
/// scheduler regression cares about instead of whatever numeric
/// leaf the model picks first.
///
/// Exists as a fixture (rather than only as an ad-hoc
/// `#[derive(Payload)]` inside the test file) so downstream
/// scheduler-author crates have a copy-ready template for the
/// hint-carrying shape — the bare [`SCHBENCH`] covers the
/// no-hint form, this fixture covers the with-hint form.
///
/// **Empty-metrics contract inherited from [`SCHBENCH`].** This
/// fixture declares `metrics = []` just like its sibling, so the
/// "no polarity-classified metrics by design" contract described
/// on [`SCHBENCH`] applies here verbatim — the hint steers which
/// numeric leaf the LLM surfaces, but every extracted metric
/// still lands at [`Polarity::Unknown`](ktstr::test_support::Polarity::Unknown).
/// See the SCHBENCH doc's "Contract — no polarity-classified
/// metrics by design" section for the full rationale and the
/// stdout-vs-stderr extraction fallback.
///
/// **`--runtime 5` tradeoff inherited from [`SCHBENCH`].** See the
/// SCHBENCH doc for the fast-CI vs tail-latency tradeoff and the
/// `.arg("--runtime").arg("30")` override path.
;
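// Sketch (illustration only): where the hint string sits in the prompt.
// The docs above say the hint is inserted between the template and the
// stdout block as a `Focus:` directive. `sketch_compose_prompt` is
// hypothetical, and the newline separators are an assumption; only the
// ordering (template, optional Focus line, body) comes from the docs.

```rust
/// Assemble template, optional `Focus:` directive, and payload output.
pub fn sketch_compose_prompt(template: &str, hint: Option<&str>, body: &str) -> String {
    let mut prompt = String::from(template);
    if let Some(focus) = hint {
        // Only the hinted form adds a directive; `None` leaves the
        // prompt as template followed directly by the body.
        prompt.push_str("\nFocus: ");
        prompt.push_str(focus);
    }
    prompt.push('\n');
    prompt.push_str(body);
    prompt
}
```

// The bare SCHBENCH form corresponds to `hint = None`; SCHBENCH_HINTED
// corresponds to `Some("wakeup latency percentiles")`.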
/// `schbench` with `--json -` pre-baked into `default_args`.
///
/// Schbench writes a JSON summary block to stdout when invoked with
/// `--json -` (the trailing `-` selects stdout instead of a file
/// path). That block is parseable by the
/// [`OutputFormat::Json`](ktstr::test_support::OutputFormat::Json)
/// extraction pipeline — stable dotted-path metric names pinned at
/// schbench's source-level JSON schema (`write_json_stats` in
/// `schbench.c`), no LLM in the loop.
///
/// Compared to [`SCHBENCH`] and [`SCHBENCH_HINTED`], this fixture
/// differs in:
///
/// 1. **`name`** — `"schbench_json"` instead of `"schbench"`. Uses
/// a distinct name so sidecar files and log output can
/// disambiguate the three fixtures. The `binary` field stays
/// `"schbench"` in all three.
/// 2. **`output`** — `OutputFormat::Json` rather than `LlmExtract`,
/// so extraction goes through `find_and_parse_json` +
/// `walk_json_leaves` and bypasses the 2.44 GiB model load
/// altogether. Tests that only want schbench metrics (no
/// scheduler-regression narrative) save minutes of CPU time by
/// picking this over [`SCHBENCH`].
/// 3. **`default_args`** — includes `--json -` so a bare
/// `ctx.payload(&SCHBENCH_JSON)` call immediately produces the
/// JSON body.
/// 4. **`metrics`** — concrete hints on the schbench JSON schema.
/// Polarity annotations follow schbench convention: latency
/// percentiles are `LowerBetter`, requests-per-second is
/// `HigherBetter`.
///
/// **Caveat: simultaneous schbench fixtures.** [`SCHBENCH`],
/// [`SCHBENCH_HINTED`], and this fixture all share
/// `kind = PayloadKind::Binary("schbench")`. A scenario that lists
/// more than one of them as workloads spawns schbench once per
/// fixture, each with its own argv set. The
/// pairwise-dedup on the `workloads` attribute only rejects
/// identical Payload paths; distinct constants that share a binary
/// are NOT deduped. Pick one fixture per scenario.
///
/// Hint paths match the JSON keys emitted by schbench's
/// `write_json_stats` in `schbench.c`. Unhinted paths still land in
/// the extracted metric set with
/// [`Polarity::Unknown`](ktstr::test_support::Polarity::Unknown),
/// so the JSON blob is surfaced in sidecar output for regression
/// tracking even when a specific percentile is not pinned here.
///
/// **`--runtime 5` tradeoff inherited from [`SCHBENCH`].** See the
/// SCHBENCH doc for the fast-CI vs tail-latency tradeoff and the
/// `.arg("--runtime").arg("30")` override path.
;
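// Sketch (illustration only): why appending `--runtime 30` overrides the
// pre-baked `--runtime 5`. The SCHBENCH docs state that each `case 'r'`
// overwrites the previous value, so the last occurrence wins.
// `sketch_effective_runtime` is a hypothetical model that handles only
// the two-token `--runtime N` spelling; it keeps the previous value on a
// parse failure, which is a simplification and not atoi's behaviour.

```rust
/// Scan default args followed by appended args and return the runtime
/// that a last-wins option parser would end up with.
pub fn sketch_effective_runtime(default_args: &[&str], appended: &[&str]) -> Option<u32> {
    let mut runtime = None;
    let mut args = default_args.iter().chain(appended.iter());
    while let Some(arg) = args.next() {
        if *arg == "--runtime" {
            if let Some(value) = args.next() {
                // A later occurrence overwrites an earlier one.
                runtime = value.parse::<u32>().ok().or(runtime);
            }
        }
    }
    runtime
}
```

// With defaults of ["--runtime", "5"] and an appended ["--runtime", "30"]
// the model yields 30; without the override it stays at 5.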