spool-memory 0.2.3

# Phase 4 Round 15 — Retrieval Baseline

> Captured `2026-05-08` on the post-Round-15 codebase (HEAD includes
> `score_breakdown` plumbing + term-window excerpt + 5000-note bench
> tier). Hardware: macOS arm64. Configuration: `cargo bench --bench
> retrieval`, criterion defaults except `sample_size = 10`.

## Method

The benchmark generates a synthetic vault of N markdown notes split
across `10-Projects/`, `20-Areas/`, `40-Permanent/`. About 1/8 of the
notes carry the routing keywords (`repo_path`, `routing`,
`[[Project Matcher]]`); the rest are filler. The same `RouteInput`
("Refine Repo-Path routing benchmark", `files=[…matcher.rs,
…scanner.rs]`, target `Codex`) drives every iteration.
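
A minimal sketch of how such a vault could be generated, using only the standard library. The helper names (`note_body`, `write_vault`), the file naming scheme, and the note text are hypothetical and are not taken from the actual bench; only the three folder names, the keywords, and the 1-in-8 ratio come from the description above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Folders the synthetic notes are spread across (from the Method text).
pub const FOLDERS: [&str; 3] = ["10-Projects", "20-Areas", "40-Permanent"];

/// Hypothetical helper: body of note `i`.
/// Every 8th note carries the routing keywords; the rest is filler.
pub fn note_body(i: usize) -> String {
    if i % 8 == 0 {
        format!("# Note {i}\n\nThe repo_path field drives routing. See [[Project Matcher]].\n")
    } else {
        format!("# Note {i}\n\nFiller paragraph with no relevant terms.\n")
    }
}

/// Hypothetical helper: write `n` notes under `root`, round-robin across
/// `FOLDERS`. Returns how many of the written notes carry the keywords.
pub fn write_vault(root: &Path, n: usize) -> io::Result<usize> {
    for folder in FOLDERS {
        fs::create_dir_all(root.join(folder))?;
    }
    let mut keyword_notes = 0;
    for i in 0..n {
        let folder = FOLDERS[i % FOLDERS.len()];
        let body = note_body(i);
        if body.contains("repo_path") {
            keyword_notes += 1;
        }
        fs::write(root.join(folder).join(format!("note-{i:05}.md")), body)?;
    }
    Ok(keyword_notes)
}
```

Calling `write_vault` on a scratch directory with `n` set to 250, 1000, or 5000 would reproduce the three tiers; a deterministic body keeps runs comparable across commits.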

Three scoped benchmarks measure the layered cost:

- `scan_notes_with_debug` — pure vault walk + parse, no scoring
- `build_context` — scan output → routed + scored bundle
- `build_bundle` — entrypoint that owns scan + routing + render

## Numbers

| group                   |    N | criterion mean estimate (lower – upper bound) |
|-------------------------|-----:|-----------------------------------------------|
| `scan_notes_with_debug` |  250 | **4.97 ms** (4.01 – 5.69)                     |
| `scan_notes_with_debug` | 1000 | **27.4 ms** (23.2 – 37.6)                     |
| `scan_notes_with_debug` | 5000 | **95.3 ms** (88.4 – 98.9)                     |
| `build_context`         |  250 | **8.57 ms** (8.22 – 8.97)                     |
| `build_context`         | 1000 | **34.5 ms** (32.4 – 37.6)                     |
| `build_context`         | 5000 | **168.8 ms** (167.3 – 172.1)                  |
| `build_bundle`          |  250 | **8.25 ms** (8.08 – 8.69)                     |
| `build_bundle`          | 1000 | **32.6 ms** (32.3 – 32.8)                     |
| `build_bundle`          | 5000 | **178.6 ms** (167.6 – 197.3)                  |

## Reading

- **Scaling is roughly linear in N**. Going from 250 → 5000 notes
  is a 20× growth in input; runtime grows ~19× for
  `scan_notes_with_debug` and ~22× for `build_bundle`. No quadratic
  blow-up has crept in.
- **Scoring overhead is modest in absolute terms**. `build_context`
  minus `scan_notes_with_debug` at N=5000 is ~73 ms, about 43% of the
  `build_context` total (~41% of `build_bundle`). Most of that is the
  per-note scoring loop (now also producing a
  `Vec<ScoreContribution>` per scored note); the rest is excerpt
  construction and route metadata.
- **5000-note vault under 200 ms** is well below the latency budget
  for an interactive `spool get` call. There is no immediate need
  for a persistent index. Reconsider only if real vaults exceed
  ~25 000 notes or if scoring rules grow to push `build_bundle`
  past ~500 ms.
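- **Relative spread at 250 notes** is wider than at 5000 for
  `scan_notes_with_debug` and `build_context` (`build_bundle` at
  5000 is the exception, with a 167.6 to 197.3 ms range). That is
  expected: warmup and timer noise matter more at sub-10 ms scales.
  Treat the 5000-note figures as the most trustworthy capacity
  signal.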
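
The ratios quoted in this list can be rechecked from the table with a few lines of arithmetic. The sketch below only restates the table's mean estimates as constants; the constant and function names are illustrative and not part of the codebase.

```rust
/// Mean estimates in milliseconds, copied from the Numbers table.
pub const SCAN_250: f64 = 4.97;
pub const SCAN_5000: f64 = 95.3;
pub const CONTEXT_5000: f64 = 168.8;
pub const BUNDLE_250: f64 = 8.25;
pub const BUNDLE_5000: f64 = 178.6;

/// Runtime growth factor between a small and a large tier.
pub fn growth(small_ms: f64, large_ms: f64) -> f64 {
    large_ms / small_ms
}

/// Scoring overhead at N=5000: (milliseconds, share of `build_context`).
pub fn scoring_overhead() -> (f64, f64) {
    let overhead_ms = CONTEXT_5000 - SCAN_5000;
    (overhead_ms, overhead_ms / CONTEXT_5000)
}
```

With these inputs, `growth(SCAN_250, SCAN_5000)` is about 19.2, `growth(BUNDLE_250, BUNDLE_5000)` is about 21.6, and `scoring_overhead()` yields roughly 73.5 ms and a 0.435 share. All three growth factors sit close to the 20× input growth, which is what the linear-scaling reading rests on.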

## What Round 16 should monitor

If a future change risks regressing these numbers, re-run the bench
and update this file. Specific watch-outs:

- adding any per-section work that loops across all sections of
  every note (e.g. semantic similarity)
- introducing IO inside the scoring path (e.g. ledger lookups per
  candidate)
- materializing extra owned `String`s inside the hot loop in
  `Accumulator::add` — currently each contribution allocates one
  reason string and two field/term strings; that is the linear
  per-contribution cost we accept for the breakdown surface
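
To make the last watch-out concrete, here is a hedged sketch of what the per-contribution allocation pattern might look like. The struct fields, the `add` signature, and the reason format are assumptions made for illustration; only the names `Accumulator::add` and `ScoreContribution`, and the count of one reason string plus two field/term strings, come from the text above.

```rust
/// Hypothetical shape of one breakdown entry; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoreContribution {
    pub reason: String,
    pub field: String,
    pub term: String,
    pub weight: f32,
}

/// Hypothetical accumulator: a running total plus the breakdown entries.
#[derive(Debug, Default)]
pub struct Accumulator {
    pub total: f32,
    pub contributions: Vec<ScoreContribution>,
}

impl Accumulator {
    /// Records one contribution. Three owned `String`s are allocated per
    /// call (reason, field, term), so cost is linear in contributions.
    pub fn add(&mut self, field: &str, term: &str, weight: f32) {
        self.total += weight;
        self.contributions.push(ScoreContribution {
            reason: format!("{term} matched in {field}"),
            field: field.to_owned(),
            term: term.to_owned(),
            weight,
        });
    }
}
```

Under this shape, any extra `String` built inside `add` multiplies across every contribution of every scored note, which is why new allocations there deserve a re-run of the bench.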

## Re-run

```bash
cargo bench --bench retrieval
```

Output is written to `target/criterion/` (HTML reports under
`target/criterion/<group>/<size>/report/index.html`). Numbers above
were captured from the criterion summary at the end of stdout.