# Criterion baselines
This directory documents how to record and compare Criterion baselines for
the `rustsim` benchmarks. Actual baseline data (JSON under Criterion's
`target/criterion/*/base/`) is **not** checked in — it is machine- and
toolchain-specific. What is checked in here is the procedure and the
interpretation of results.
## Benches
| Bench | Purpose |
| --- | --- |
| `end_to_end_bench.rs` | Umbrella end-to-end scenarios (collects the hot loop). |

Sibling crate benches live under `crates/<crate>/benches/` and are recorded
independently.
## Recording a new baseline
Run the full suite on a quiet machine:
```sh
cargo bench --workspace -- --save-baseline main
```
This stores the data under `target/criterion/<group>/main/`.
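
To check which benchmarks already have a saved baseline, a small script can walk the Criterion output tree. The following is a minimal sketch and not part of this repository: the helper name `benches_with_baseline` is hypothetical, and it assumes that each saved baseline directory contains an `estimates.json` file, which may vary between Criterion versions.

```python
from pathlib import Path


def benches_with_baseline(criterion_dir, baseline="main"):
    """Return benchmark IDs (paths relative to criterion_dir) that have
    a saved baseline directory with the given name."""
    root = Path(criterion_dir)
    found = []
    # Assumption: a saved baseline is a directory named after the baseline
    # that contains an estimates.json file.
    for estimates in root.glob(f"**/{baseline}/estimates.json"):
        bench_dir = estimates.parent.parent
        found.append(bench_dir.relative_to(root).as_posix())
    return sorted(found)
```

Calling `benches_with_baseline("target/criterion")` after recording should list every benchmark that a later `--baseline main` run can compare against.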
## Comparing against a baseline
After making a change:
```sh
cargo bench --workspace -- --baseline main
```
Criterion prints a percentage delta for each measurement. A regression of
more than 5% on a primary bench (or more than 10% on a secondary bench)
should be investigated before merging.
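
The rule of thumb above can be sketched as a small check. This is an illustration only: the function names and the idea of feeding in two mean timings are assumptions, and Criterion's own report applies statistical tests that this sketch does not reproduce.

```python
# Thresholds from the guidance above, in percent.
PRIMARY_THRESHOLD = 5.0
SECONDARY_THRESHOLD = 10.0


def percent_change(baseline_ns, new_ns):
    """Relative change of the new timing against the baseline, in percent.
    Positive means slower than the baseline, negative means faster."""
    return (new_ns - baseline_ns) * 100.0 / baseline_ns


def needs_investigation(baseline_ns, new_ns, primary):
    """True if the slowdown exceeds the threshold for this kind of bench."""
    threshold = PRIMARY_THRESHOLD if primary else SECONDARY_THRESHOLD
    return percent_change(baseline_ns, new_ns) > threshold
```

For example, a bench that moves from 100 ns to 106 ns is 6% slower: that is worth a look on a primary bench but falls under the threshold for a secondary one.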
## Baselines in CI
Two complementary CI jobs cover performance regression visibility:
- **`bench-baseline`** — runs `cargo bench --workspace -- --save-baseline nightly`
on the nightly cron (05:00 UTC) and on `workflow_dispatch`. It uploads
the resulting `target/criterion/` tree as the `criterion-nightly`
artifact. This is the canonical baseline.
- **`bench-compare`** — gated on the `perf` label on pull requests (and
also available via `workflow_dispatch`). It downloads the latest
`criterion-nightly` artifact from `main`, re-runs the benchmarks with
`--baseline nightly`, and pipes Criterion's output through
`tools/bench_compare.py --tolerance 10` to produce a human-readable
regression summary. Both the raw log and the summary are uploaded as
the `criterion-pr-compare` artifact.

Both jobs are `continue-on-error: true`, so the PR-time comparison **warns**
rather than **blocks**; this stays in place until `1.0.0-rc`. The tolerance
band is currently ±10%: in the summary, a change above +10% (slower) is
flagged as a regression and a change below -10% (faster) is flagged as an
improvement.
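
The tolerance band can be illustrated with a short sketch. This is not the code in `tools/bench_compare.py`; the function name `classify` and its return labels are hypothetical and only mirror the behaviour described above.

```python
def classify(change_pct, tolerance=10.0):
    """Classify a percentage change against a symmetric tolerance band.

    change_pct > 0 means the benchmark got slower, < 0 means faster.
    """
    if change_pct > tolerance:
        return "regression"
    if change_pct < -tolerance:
        return "improvement"
    return "within tolerance"
```

With the default band, `classify(12.5)` reports a regression, `classify(-11.0)` an improvement, and `classify(3.0)` stays within tolerance.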
To enable PR-time checking, add the `perf` label to the PR. CPU-bound
benches benefit from a quiet runner, so ad-hoc comparisons for subtle
changes are better run locally against a locally saved `main` baseline
(see "Recording a new baseline" above).
## Rotating baselines
When intentional improvements land, re-save `main`:
```sh
cargo bench --workspace -- --save-baseline main
```
and note the bump in `CHANGELOG.md` under `Performance`.