# Phase 4 Round 15 — Retrieval Baseline
> Captured `2026-05-08` on the post-Round-15 codebase (HEAD includes
> `score_breakdown` plumbing + term-window excerpt + 5000-note bench
> tier). Hardware: macOS arm64. Configuration: `cargo bench --bench
> retrieval`, criterion defaults except `sample_size = 10` (criterion's
> default is 100).
## Method
The benchmark generates a synthetic vault of N markdown notes split
across `10-Projects/`, `20-Areas/`, `40-Permanent/`. About 1/8 of the
notes carry the routing keywords (`repo_path`, `routing`,
`[[Project Matcher]]`); the rest are filler. The same `RouteInput`
("Refine Repo-Path routing benchmark", `files=[…matcher.rs,
…scanner.rs]`, target `Codex`) drives every iteration.
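
A minimal std-only sketch of such a generator follows. The folder names and keywords come from the description above. The `synthetic_vault` function, the `SyntheticNote` type, the note template, the file naming, and the "every 8th note" rule are hypothetical stand-ins, not the bench's actual code.

```rust
/// Folder layout taken from the report; everything else here is hypothetical.
const FOLDERS: [&str; 3] = ["10-Projects", "20-Areas", "40-Permanent"];

/// One generated note: a vault-relative path plus its markdown body.
struct SyntheticNote {
    path: String,
    body: String,
}

/// Builds `n` notes, spread round-robin over the three folders.
/// Assumption: every 8th note carries the routing keywords (~1/8 overall).
fn synthetic_vault(n: usize) -> Vec<SyntheticNote> {
    (0..n)
        .map(|i| {
            let folder = FOLDERS[i % FOLDERS.len()];
            let body = if i % 8 == 0 {
                // Keyword note: mentions repo_path, routing and the wikilink.
                format!("# Note {i}\n\nTouches repo_path routing via [[Project Matcher]].\n")
            } else {
                // Filler note: never mentions repo_path or the wikilink.
                format!("# Note {i}\n\nFiller paragraph with no routing terms.\n")
            };
            SyntheticNote {
                path: format!("{folder}/note-{i:05}.md"),
                body,
            }
        })
        .collect()
}

fn main() {
    let vault = synthetic_vault(1000);
    let keyword_notes = vault.iter().filter(|n| n.body.contains("repo_path")).count();
    println!(
        "{} notes, {} with routing keywords; sample path: {}",
        vault.len(),
        keyword_notes,
        vault[1].path
    );
}
```

Keeping the generator deterministic (no RNG) means every run walks an identical vault, so run-to-run differences come from the code under test rather than from the input.
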
Three scoped benchmarks measure the layered cost:
- `scan_notes_with_debug` — pure vault walk + parse, no scoring
- `build_context` — scan output → routed + scored bundle
- `build_bundle` — entrypoint that owns scan + routing + render
## Numbers
Bold values are criterion's point estimate per iteration; the range in
parentheses is the lower and upper bound of its confidence interval.

| Benchmark | Notes (N) | Time: estimate (CI lower–upper) |
| --- | ---: | --- |
| `scan_notes_with_debug` | 250 | **4.97 ms** (4.01–5.69) |
| `scan_notes_with_debug` | 1000 | **27.4 ms** (23.2–37.6) |
| `scan_notes_with_debug` | 5000 | **95.3 ms** (88.4–98.9) |
| `build_context` | 250 | **8.57 ms** (8.22–8.97) |
| `build_context` | 1000 | **34.5 ms** (32.4–37.6) |
| `build_context` | 5000 | **168.8 ms** (167.3–172.1) |
| `build_bundle` | 250 | **8.25 ms** (8.08–8.69) |
| `build_bundle` | 1000 | **32.6 ms** (32.3–32.8) |
| `build_bundle` | 5000 | **178.6 ms** (167.6–197.3) |
## Reading
- **Scaling is roughly linear in N**. Going from 250 → 5000 notes
is a 20× increase in input; runtime grows ~19× for `scan_notes_with_debug`
and ~22× for `build_bundle`. No quadratic blow-up has crept in.
- **Scoring overhead is moderate**. `build_context` minus
`scan_notes_with_debug` at N=5000 is ~73 ms, about 43% of
`build_context` (~41% of `build_bundle`). Most of that is the
per-note scoring loop (now also producing a `Vec<ScoreContribution>`
per scored note); the rest is excerpt construction and route metadata.
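- **5000-note vault under 200 ms** is well below the latency budget
for an interactive `spool get` call, so there is no immediate need
for a persistent index. Reconsider only if real vaults exceed
~25 000 notes or if scoring rules grow to push `build_bundle`
past ~500 ms. At the measured ~36 µs per note, linear extrapolation
puts `build_bundle` at 500 ms near 14 000 notes, so the latency
trigger will likely fire before the vault-size one.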
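- **Variance at 250 notes** is wider than at 5000 for
`scan_notes_with_debug` (CI width ~34% vs ~11% of the estimate) and
`build_context` (~9% vs ~3%). That is expected: warmup and timer
noise matter more at sub-10 ms scales. `build_bundle` is the
exception (~7% at 250 vs ~17% at 5000, upper bound 197.3 ms), so
treat the 5000-note `build_context` number as the most trustworthy
capacity signal and re-run before leaning on the `build_bundle`
5000 upper bound.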
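
The ratios, differences, spreads, and extrapolation in these bullets are plain arithmetic over the Numbers table. The std-only sketch below re-derives them so they can be re-checked after the next run. The `Row` type and the helper functions are illustrative names. `notes_at_budget` assumes the per-note cost measured at 5000 notes holds at larger N, which the linear-scaling reading supports but does not guarantee.

```rust
/// One row of the Numbers table: criterion's point estimate and the
/// lower/upper confidence-interval bounds, all in milliseconds.
#[derive(Clone, Copy)]
struct Row {
    bench: &'static str,
    notes: u32,
    point: f64,
    lower: f64,
    upper: f64,
}

const TABLE: [Row; 9] = [
    Row { bench: "scan_notes_with_debug", notes: 250, point: 4.97, lower: 4.01, upper: 5.69 },
    Row { bench: "scan_notes_with_debug", notes: 1000, point: 27.4, lower: 23.2, upper: 37.6 },
    Row { bench: "scan_notes_with_debug", notes: 5000, point: 95.3, lower: 88.4, upper: 98.9 },
    Row { bench: "build_context", notes: 250, point: 8.57, lower: 8.22, upper: 8.97 },
    Row { bench: "build_context", notes: 1000, point: 34.5, lower: 32.4, upper: 37.6 },
    Row { bench: "build_context", notes: 5000, point: 168.8, lower: 167.3, upper: 172.1 },
    Row { bench: "build_bundle", notes: 250, point: 8.25, lower: 8.08, upper: 8.69 },
    Row { bench: "build_bundle", notes: 1000, point: 32.6, lower: 32.3, upper: 32.8 },
    Row { bench: "build_bundle", notes: 5000, point: 178.6, lower: 167.6, upper: 197.3 },
];

/// Looks up a row; panics if the (bench, notes) pair is not in the table.
fn row(bench: &str, notes: u32) -> Row {
    *TABLE
        .iter()
        .find(|r| r.bench == bench && r.notes == notes)
        .expect("row present in table")
}

/// Runtime growth factor between two vault sizes for the same benchmark.
fn growth(bench: &str, from: u32, to: u32) -> f64 {
    row(bench, to).point / row(bench, from).point
}

/// Confidence-interval width relative to the point estimate.
fn relative_spread(r: Row) -> f64 {
    (r.upper - r.lower) / r.point
}

/// Vault size at which `bench` would reach `budget_ms`, assuming the
/// per-note cost measured at `notes` stays constant (linear scaling).
fn notes_at_budget(bench: &str, notes: u32, budget_ms: f64) -> f64 {
    budget_ms / (row(bench, notes).point / notes as f64)
}

fn main() {
    println!("scan growth 250->5000:   {:.1}x", growth("scan_notes_with_debug", 250, 5000));
    println!("bundle growth 250->5000: {:.1}x", growth("build_bundle", 250, 5000));

    // Scoring overhead: routed + scored context minus the bare scan.
    let ctx = row("build_context", 5000).point;
    let scoring = ctx - row("scan_notes_with_debug", 5000).point;
    let bundle = row("build_bundle", 5000).point;
    println!(
        "scoring overhead @5000: {:.1} ms ({:.1}% of build_context, {:.1}% of build_bundle)",
        scoring,
        100.0 * scoring / ctx,
        100.0 * scoring / bundle
    );

    println!(
        "build_bundle reaches 500 ms at ~{:.0} notes",
        notes_at_budget("build_bundle", 5000, 500.0)
    );

    for r in TABLE {
        println!("{:<22} {:>5}: spread {:>4.1}%", r.bench, r.notes, 100.0 * relative_spread(r));
    }
}
```
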
## What Round 16 should monitor
If a future change risks regressing these numbers, re-run the bench
and update this file. Specific watch-outs:
- adding any per-section work that loops across all sections of
every note (e.g. semantic similarity)
- introducing IO inside the scoring path (e.g. ledger lookups per
candidate)
- materializing extra owned `String`s inside the hot loop in
`Accumulator::add`. Each contribution currently allocates one
reason string plus two strings for the field and the term; that is
the linear per-contribution cost we accept for the breakdown surface
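
The real `Accumulator` and `ScoreContribution` are not reproduced in this report. Here is a minimal sketch, assuming a contribution holds the three owned strings described above plus a weight, that shows where the per-contribution allocations come from. The field names, the `f32` weight, and the reason format are hypothetical. Any additional owned `String` field would add one more allocation per call on every scored note.

```rust
/// Hypothetical shape of one entry in the score breakdown.
struct ScoreContribution {
    reason: String, // owned: its own heap buffer per contribution
    field: String,  // owned: its own heap buffer per contribution
    term: String,   // owned: its own heap buffer per contribution
    weight: f32,
}

/// Hypothetical per-note accumulator for score plus breakdown.
#[derive(Default)]
struct Accumulator {
    score: f32,
    contributions: Vec<ScoreContribution>,
}

impl Accumulator {
    /// Adds a weighted match and records why it scored. Per call: three
    /// owned Strings plus an amortized Vec push, so the cost grows
    /// linearly with the number of contributions.
    fn add(&mut self, field: &str, term: &str, weight: f32) {
        self.score += weight;
        self.contributions.push(ScoreContribution {
            reason: format!("{field} matched \"{term}\""),
            field: field.to_owned(),
            term: term.to_owned(),
            weight,
        });
    }
}

fn main() {
    let mut acc = Accumulator::default();
    acc.add("title", "repo_path", 1.0);
    acc.add("body", "routing", 0.5);
    acc.add("links", "Project Matcher", 0.25);
    for c in &acc.contributions {
        println!("{:>5.2}  {:<6} {:<16} {}", c.weight, c.field, c.term, c.reason);
    }
    println!("total score: {}", acc.score);
}
```
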
## Re-run
```bash
cargo bench --bench retrieval
```
Output is written to `target/criterion/` (HTML reports under
`target/criterion/<group>/<size>/report/index.html`). Numbers above
were captured from the `time: [lower estimate upper]` line that
criterion prints to stdout after each benchmark.