rustsim 0.0.1

High-performance agent-based modelling engine - top-level orchestration crate
# Criterion baselines


This directory documents how to record and compare Criterion baselines for
the `rustsim` benchmarks. Actual baseline data (JSON under Criterion's
`target/criterion/*/base/`) is **not** checked in — it is machine- and
toolchain-specific. What is checked in here is the procedure and the
interpretation of results.

## Benches


| Bench file | Purpose |
|---|---|
| `end_to_end_bench.rs` | Umbrella end-to-end scenarios (exercises the hot loop). |

Sibling crate benches live under `crates/<crate>/benches/` and are recorded
independently.

## Recording a new baseline


Run the full suite on a quiet machine:

```sh
cargo bench --workspace -- --save-baseline main
```

This stores the data under `target/criterion/<group>/main/`.
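
To see what was recorded, the saved estimates can be read back from disk. The following is a minimal sketch, not part of the repository: it assumes each baseline directory contains an `estimates.json` whose `mean.point_estimate` field holds the mean time in nanoseconds, which is the layout recent Criterion versions write. The function name `load_baseline_means` is hypothetical, and the layout should be verified against the Criterion version in use.

```python
import json
from pathlib import Path


def load_baseline_means(criterion_dir, baseline="main"):
    """Map each benchmark directory to its saved mean time (assumed nanoseconds).

    Assumption: every saved baseline lives at
    <criterion_dir>/<...>/<baseline>/estimates.json and stores the mean under
    estimates["mean"]["point_estimate"].
    """
    root = Path(criterion_dir)
    means = {}
    # A recursive glob copes with both flat and grouped benchmark directories.
    for path in sorted(root.glob(f"**/{baseline}/estimates.json")):
        with path.open() as f:
            estimates = json.load(f)
        # The benchmark id is the directory that contains the baseline folder.
        bench_id = path.parent.parent.relative_to(root).as_posix()
        means[bench_id] = estimates["mean"]["point_estimate"]
    return means
```

Calling `load_baseline_means("target/criterion")` after the command above would return one entry per benchmark that has a `main` baseline.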

## Comparing against a baseline


After making a change:

```sh
cargo bench --workspace -- --baseline main
```

Criterion prints a percentage delta for each measurement. A regression of
more than 5% on a primary bench (or more than 10% on a secondary bench)
should be investigated before merging.
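
The two thresholds can be written down as a small check. This is an illustrative sketch only: the function name `needs_investigation` and the `primary` flag are hypothetical, and which benches count as primary or secondary is not defined here.

```python
def needs_investigation(change_pct: float, primary: bool) -> bool:
    """Return True when a slowdown exceeds the threshold for its bench tier.

    change_pct is Criterion's reported percentage change; positive means slower.
    """
    # 5% for primary benches, 10% for secondary ones, per the rule above.
    threshold = 5.0 if primary else 10.0
    # Only slowdowns count; improvements (negative change) never trigger.
    return change_pct > threshold
```

A 6% slowdown would therefore be flagged on a primary bench but not on a secondary one.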

## Baselines in CI


Two complementary CI jobs provide visibility into performance regressions:

- **`bench-baseline`** — runs `cargo bench --workspace -- --save-baseline nightly`
  on the nightly cron (05:00 UTC) and on `workflow_dispatch`. It uploads
  the resulting `target/criterion/` tree as the `criterion-nightly`
  artifact. This is the canonical baseline.
- **`bench-compare`** — gated on the `perf` label on pull requests (and
  also available via `workflow_dispatch`). It downloads the latest
  `criterion-nightly` artifact from `main`, re-runs the benchmarks with
  `--baseline nightly`, and pipes Criterion's output through
  `tools/bench_compare.py --tolerance 10` to produce a human-readable
  regression summary. Both the raw log and the summary are uploaded as
  the `criterion-pr-compare` artifact.

Both jobs set `continue-on-error: true`, so until `1.0.0-rc` the PR-time
comparison **warns** rather than **blocks**. The tolerance band is currently
±10%: a change above +10% is flagged as a regression and a change below
-10% as an improvement in the summary.

To enable PR-time checking, add the `perf` label to the PR. CPU-bound
benches benefit from a quiet runner, so ad-hoc comparisons for subtle
changes are better run locally against a locally saved `main` baseline
(see "Recording a new baseline" and "Comparing against a baseline" above).

## Rotating baselines


When intentional improvements land, re-save `main`:

```sh
cargo bench --workspace -- --save-baseline main
```

and note the bump in `CHANGELOG.md` under `Performance`.