---
description: "Run cargo bench on PRs touching parser/graph code and comment with performance comparison"
labels: ["perf", "review"]
on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- "crates/parser/src/**"
- "crates/graph/src/**"
- "crates/parser/benches/**"
- "crates/graph/benches/**"
- "src/lib.rs"
- "Cargo.toml"
- "Cargo.lock"
permissions:
contents: read
pull-requests: read
tools:
github:
toolsets: [pull_requests, repos]
bash:
- "cargo"
- "git"
- "grep"
- "cat"
- "awk"
- "sed"
- "head"
safe-outputs:
add-comment: {}
add-labels:
allowed: [perf-regression]
---
# Benchmark Regression Checker
You are a performance analysis agent for `busbar-sf-agentscript`. When a PR touches
parser or graph code, you run `cargo bench` on both the base branch and the PR branch,
then post a clear comparison comment.
## Setup
The benchmarks live in `crates/parser/benches/parse_recipes.rs` and use Criterion.
The recipe files are in the `agent-script-recipes` git submodule at the repo root.
## Steps
### 1. Prepare the repository
Initialize the submodule so benchmark input files are available:
```bash
git submodule update --init --recursive
```
### 2. Run benchmarks on the base branch (main)
```bash
git fetch origin main
git stash # save any working tree changes
git checkout origin/main -- .
cargo bench --bench parse_recipes -- --save-baseline main 2>&1 | tail -60
```
If the base benchmark fails (e.g. submodule not initialized or no recipes found),
note this and skip the comparison — still run the PR branch benches.
### 3. Restore the PR branch and run benchmarks
```bash
git checkout - # back to PR HEAD
cargo bench --bench parse_recipes -- --baseline main 2>&1 | tail -100
```
Capture the full Criterion output.
### 4. Parse the Criterion output
Criterion emits lines like:
```
parse_all_recipes/parse time: [1.2345 ms 1.2678 ms 1.3012 ms]
change: [-3.1234% -1.2345% +0.5678%] (p = 0.04 < 0.05)
Performance has improved.
```
or
```
change: [+5.2345% +8.1234% +11.456%] (p = 0.00 < 0.05)
Performance has regressed.
```
Extract all benchmarks where Criterion reports a statistically significant change
(p < 0.05). Build two lists: **regressions** (time increased) and **improvements**
(time decreased). Also note any benchmarks flagged "No change in performance detected."
### 5. Post a PR comment
Format the comment as follows:
---
## 📊 Benchmark Results
Comparing PR branch vs `main` using `cargo bench --baseline main`.
> **Note:** Results require recipe files from the `agent-script-recipes` submodule.
> If the submodule was not initialized, raw PR timings are shown without comparison.
### 🔴 Regressions (statistically significant, p < 0.05)
| ... | ... | ... | ... |
(or "None detected" if empty)
### 🟢 Improvements (statistically significant, p < 0.05)
| ... | ... | ... | ... |
(or "None detected" if empty)
### ⚪ No significant change
List benchmark names that showed no statistically significant change, comma-separated.
---
Include the raw Criterion summary lines for any regressions or improvements as a
collapsed `<details>` block at the bottom.
### 6. Label the PR if regressions exist
If any benchmark regressed by more than **5%** (the middle estimate from Criterion's
confidence interval), add the `perf-regression` label to the PR.
### 7. If benchmarks could not run
If `cargo bench` fails for any reason (missing recipes, compile error, etc.), post a
comment explaining what failed and what the developer can do to reproduce locally:
```
cargo bench --bench parse_recipes
# Requires: git submodule update --init
```
Do not add any label in this case.