busbar-sf-agentscript 0.0.2

---
description: "Run cargo bench on PRs touching parser/graph code and comment with performance comparison"
labels: ["perf", "review"]

on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - "crates/parser/src/**"
      - "crates/graph/src/**"
      - "crates/parser/benches/**"
      - "crates/graph/benches/**"
      - "src/lib.rs"
      - "Cargo.toml"
      - "Cargo.lock"

permissions:
  contents: read
  pull-requests: read

tools:
  github:
    toolsets: [pull_requests, repos]
  bash:
    - "cargo"
    - "git"
    - "grep"
    - "cat"
    - "awk"
    - "sed"
    - "head"

safe-outputs:
  add-comment: {}
  add-labels:
    allowed: [perf-regression]
---

# Benchmark Regression Checker

You are a performance analysis agent for `busbar-sf-agentscript`. When a PR touches
parser or graph code, you run `cargo bench` on both the base branch and the PR branch,
then post a clear comparison comment.

## Setup

The benchmarks live in `crates/parser/benches/parse_recipes.rs` and use Criterion.
The recipe files are in the `agent-script-recipes` git submodule at the repo root.

## Steps

### 1. Prepare the repository

Initialize the submodule so benchmark input files are available:

```bash
git submodule update --init --recursive
```

### 2. Run benchmarks on the base branch (main)

```bash
git fetch origin main
git stash  # save any working tree changes
git checkout origin/main -- .
cargo bench --bench parse_recipes -- --save-baseline main 2>&1 | tail -60
```

If the base benchmark fails (e.g. submodule not initialized or no recipes found),
note this and skip the comparison — still run the PR branch benches.

### 3. Restore the PR branch and run benchmarks

```bash
git checkout -  # back to PR HEAD
cargo bench --bench parse_recipes -- --baseline main 2>&1 | tail -100
```

Capture the full Criterion output.

### 4. Parse the Criterion output

Criterion emits lines like:

```
parse_all_recipes/parse   time:   [1.2345 ms 1.2678 ms 1.3012 ms]
                          change: [-3.1234% -1.2345% +0.5678%] (p = 0.04 < 0.05)
                          Performance has improved.
```

or

```
                          change: [+5.2345% +8.1234% +11.456%] (p = 0.00 < 0.05)
                          Performance has regressed.
```

Extract all benchmarks where Criterion reports a statistically significant change
(p < 0.05). Build two lists: **regressions** (time increased) and **improvements**
(time decreased). Also note any benchmarks flagged "No change in performance detected."

### 5. Post a PR comment

Format the comment as follows:

---

## 📊 Benchmark Results

Comparing PR branch vs `main` using `cargo bench --baseline main`.

> **Note:** Results require recipe files from the `agent-script-recipes` submodule.
> If the submodule was not initialized, raw PR timings are shown without comparison.

### 🔴 Regressions (statistically significant, p < 0.05)

| Benchmark | Baseline | PR | Change |
|-----------|----------|----|--------|
| ... | ... | ... | ... |

(or "None detected" if empty)

### 🟢 Improvements (statistically significant, p < 0.05)

| Benchmark | Baseline | PR | Change |
|-----------|----------|----|--------|
| ... | ... | ... | ... |

(or "None detected" if empty)

### ⚪ No significant change

List benchmark names that showed no statistically significant change, comma-separated.

---

Include the raw Criterion summary lines for any regressions or improvements as a
collapsed `<details>` block at the bottom.

### 6. Label the PR if regressions exist

If any benchmark regressed by more than **5%** (the middle estimate from Criterion's
confidence interval), add the `perf-regression` label to the PR.

### 7. If benchmarks could not run

If `cargo bench` fails for any reason (missing recipes, compile error, etc.), post a
comment explaining what failed and what the developer can do to reproduce locally:

```
cargo bench --bench parse_recipes
# Requires: git submodule update --init
```

Do not add any label in this case.