specters 4.2.2 - Docs.rs

# Specter Test & Build Optimization Shared Plan

Status: implemented for the low-risk test/build optimization phases; CI sharding and deeper H2/H3 timing cleanup remain deferred.

Source plan: `/Users/jaredboynton/.kimi/plans/daken-martian-manhunter-blue-marvel.md`

Created: 2026-05-25

## Purpose

Reduce local and CI validation latency for many concurrent workers without changing product behavior, weakening final validation, or disturbing the in-progress native H3/RFC9220 proof artifacts.

This plan was built from six read-only subagent passes:

- 3x `gpt-5.4-mini` mappers for test waits, nextest/selective testing, and CI/build surfaces.
- 3x `gpt-5.5` medium planners for phase ordering, measurement/validation, and worker coordination.

## Implementation Update — 2026-05-25

Closed work:

- Added `just test-changed` and updated `just test`, `just test-cargo`, `just clippy`, and `just check` to use locked cargo invocations where applicable.
- Added nextest `h3-stateful` and `streaming-heavy` groups plus CI profile tuning in `.config/nextest.toml`.
- Added `profile.fast-test` for inner-loop compile/test iteration in `Cargo.toml`.
- Removed fixed 5-second connection-hold sleeps, H1 startup sleeps, and compression post-response sleeps from the lower-risk test set.
- Removed the RFC 9111 cache expiry wall-clock sleep by retaining validator-backed `max-age=0` responses as immediately stale and revalidatable.
- Added Rust cache/sccache and target-specific BoringSSL cache coverage to cargo-heavy CI, Node release, and Python release jobs.
- Added concurrent-worker test/build guidance to `AGENTS.md`.

Validation completed:

- `cargo nextest run --all-features --locked --no-fail-fast -E 'binary(=timeout_budget) | binary(=rfc9111_caching) | binary(=h1_rfc_compliance) | binary(=error_handling) | binary(=h1_streaming) | binary(=streaming_public_api) | binary(=compression) | binary(=builder_knobs)'` — 53 passed, 0 skipped.
- `cargo nextest run --all-features --locked -E 'binary(=rfc9111_caching)'` — 3 passed, 0 skipped.

Remaining deferred work:

- CI job splitting and nextest archive sharding were not implemented; keep them gated on real cold/warm CI duration evidence.
- H2 frame-timeout centralization and H3 settle-sleep replacement were not implemented; keep them separate from the active native H3/RFC9220 work.
- Full-suite repeated flake-gate runs were not completed locally because other workers were running expensive native H3 benchmark builds in the shared worktree.

## Non-Goals

- Avoid runtime HTTP/H2/H3/WebSocket behavior changes as part of test/build optimization work; the one landed exception is the RFC 9111 cache fix that preserves validator-backed `max-age=0` responses for immediate revalidation instead of weakening the cache test.
- Do not change README benchmark tables unless fresh reproducible benchmark artifacts and `CHANGELOG.md` cause entries support the update.
- Do not edit temporary native H3 proof artifacts unless the native H3/RFC9220 gap set is actually resolved.
- Do not treat `just test-changed` or any selective helper as merge-ready final validation.
- Do not mask flakes with retries, shorter arbitrary sleeps, or polling loops.

## Original Repo Anchors

- These anchors record the pre-implementation snapshot used by the subagents; see the implementation update above for the current state.
- `just test` ran `cargo nextest run --all-features` from `justfile:160`.
- `just test-cargo` ran `cargo test --all-features` from `justfile:176`.
- `just check` ran `fmt-check`, `clippy`, then `test` sequentially at `justfile:211`.
- `.config/nextest.toml:1` defined only minimal default/CI/pre-push profiles; there were no test groups or overrides yet.
- `.config/nextest.toml:3` used `test-threads = "num-cpus"`.
- `.config/nextest.toml:22` had CI `fail-fast = true`.
- `.config/nextest.toml:31` had pre-push `fail-fast = false`.
- `.github/workflows/ci.yml:27` and `.github/workflows/ci.yml:30` already added sccache and Cargo registry/git cache to the macOS test job.
- `.github/workflows/ci.yml:41`, `.github/workflows/ci.yml:54`, and `.github/workflows/ci.yml:63` ran fmt, nextest, and examples sequentially in one job.
- `.github/workflows/ci.yml:73` and `.github/workflows/ci.yml:106` defined Linux and Windows build matrix jobs without equivalent Rust cache/sccache setup.
- `Cargo.toml:105` and `Cargo.toml:112` already tuned `dev` and `test` debug info, but there was no separate `fast-test` profile.
- `scripts/install-boringssl-prebuilt.sh:42` already used `cargo metadata --locked`.
- `scripts/install-boringssl-prebuilt.sh:58` verified SHA256 checksums.
- `scripts/install-boringssl-prebuilt.sh:142` exported `BORING_BSSL_PATH` for CI.
- `scripts/lib-bssl-env.sh:41` resolved repo-local BoringSSL paths after env/user-wide prebuilts.

## Corrected Kimi Plan Claims

- These were the pre-implementation corrections used to scope the work.
- The overall optimization opportunity was real: tests contained many fixed waits/timeouts, and shared nextest/CI controls were minimal.
- `tests/h1_pooling.rs` had mapped sleeps at `tests/h1_pooling.rs:23`, `tests/h1_pooling.rs:54`, `tests/h1_pooling.rs:69`, `tests/h1_pooling.rs:87`, `tests/h1_pooling.rs:188`, and `tests/h1_pooling.rs:226`. Only the first few were startup-style waits.
- `tests/h3_streaming_pool.rs` had 13 mapped settle sleeps, not 15.
- `tests/validation_h2_streaming.rs` had 22 `timeout(Duration::from_secs(3), conn.read_frame())` guards, not roughly 30.
- “CI/build has no caching in most matrix jobs” was overstated: the macOS CI test job already had sccache and Cargo registry/git cache, but Linux/Windows build jobs and release workflows still lacked equivalent Rust caching.
- “Current CI config uses `fail-fast = false`” was stale: CI had `fail-fast = true` at `.config/nextest.toml:22`; pre-push had `false`.
- “No `--locked` usage exists” was stale repo-wide; helper scripts already used locked metadata and some scripts used locked cargo runs, but GitHub workflow cargo commands mostly still omitted `--locked`.

## Hotspot Map

### 5-Second Connection Holds

These are high-confidence P0 fixes because they hold a connection open and can be replaced with `tokio::sync::oneshot` parking:

- `tests/error_handling.rs:84`
- `tests/error_handling.rs:228`
- `tests/h1_streaming.rs:186`
- `tests/streaming_public_api.rs:197`

Implementation rule:

- Replace fixed `tokio::time::sleep(Duration::from_secs(5))` in background tasks with a parked receiver.
- Keep the stream owned by the spawned task so the connection remains open.
- Do not introduce a shorter sleep.

### Startup Sleeps

These should be removed only when readiness is already deterministic or replaced with explicit readiness signaling:

- `tests/error_handling.rs:134`
- `tests/error_handling.rs:186`
- `tests/h1_rfc_compliance.rs:29`
- `tests/h1_rfc_compliance.rs:47`
- `tests/h1_rfc_compliance.rs:81`
- `tests/h1_rfc_compliance.rs:109`
- `tests/h1_rfc_compliance.rs:135`
- `tests/h1_rfc_compliance.rs:166`
- `tests/h1_rfc_compliance.rs:197`
- `tests/h1_rfc_compliance.rs:228`
- `tests/h1_rfc_compliance.rs:257`
- `tests/h1_rfc_compliance.rs:287`
- `tests/h1_rfc_compliance.rs:316`
- `tests/h1_rfc_compliance.rs:344`
- `tests/h1_rfc_compliance.rs:377`
- `tests/h1_pooling.rs:23`
- `tests/h1_pooling.rs:54`
- `tests/h1_pooling.rs:87`

Implementation rule:

- Prefer bound-socket readiness, server-start return guarantees, `oneshot`, or `Notify`.
- If deleting a startup sleep creates connection-refused flakes, restore the test by adding readiness signaling, not by adding another fixed delay.

### H3 Settle Sleeps

These are medium-risk and should be deferred until lower-risk H1/H2 cleanup lands:

- `tests/h3_streaming_correctness.rs:29`
- `tests/h3_streaming_correctness.rs:98`
- `tests/h3_streaming_correctness.rs:185`
- `tests/h3_streaming_correctness.rs:187`
- `tests/h3_streaming_correctness.rs:241`
- `tests/h3_streaming_correctness.rs:243`
- `tests/h3_streaming_correctness.rs:384`
- `tests/h3_streaming_correctness.rs:503`
- `tests/h3_streaming_correctness.rs:551`
- `tests/h3_streaming_pool.rs:97`
- `tests/h3_streaming_pool.rs:163`
- `tests/h3_streaming_pool.rs:213`
- `tests/h3_streaming_pool.rs:343`
- `tests/h3_streaming_pool.rs:410`
- `tests/h3_streaming_pool.rs:458`
- `tests/h3_streaming_pool.rs:477`
- `tests/h3_streaming_pool.rs:507`
- `tests/h3_streaming_pool.rs:545`
- `tests/h3_streaming_pool.rs:564`
- `tests/h3_streaming_pool.rs:592`
- `tests/h3_streaming_pool.rs:630`
- `tests/h3_streaming_pool.rs:652`

Implementation rule:

- Replace settle sleeps with explicit H3 test-local state signaling using `Notify`, `watch`, or protocol-event observation.
- Do not make product-code H3 transport changes in the same ticket unless the test cannot be made deterministic without a real bug fix.
- Do not update native H3 proof docs from this work unless the native H3 gap set is actually closed.

### Compression Sleeps

These are likely safe after confirming the helper returns after listener bind/readiness:

- `tests/compression.rs:94`
- `tests/compression.rs:122`
- `tests/compression.rs:148`
- `tests/compression.rs:174`
- `tests/compression.rs:200`
- `tests/compression.rs:224`

Implementation rule:

- Delete only after the gzip/deflate/brotli/zstd/identity/raw-byte tests pass repeatedly.
- If a race appears, signal server readiness from `start_encoding_server`, not a fixed delay.

### Blocking Cache Sleep — Closed

This was P1 because it was a real wall-clock wait, but it proved cache expiry behavior:

- `tests/rfc9111_caching.rs:83`

Closed implementation:

- Removed the wall-clock sleep from `tests/rfc9111_caching.rs`.
- Updated `HttpCache` to retain `max-age=0` responses when they include `ETag` or `Last-Modified`, so they are stale immediately and return `CacheStatus::Revalidate`.

### H2 Frame Timeout Guards

These are risky to lower in one sweep because they may convert slow CI into flakes:

- `tests/validation_h2_streaming.rs:51`
- `tests/validation_h2_streaming.rs:180`
- `tests/validation_h2_streaming.rs:307`
- `tests/validation_h2_streaming.rs:430`
- `tests/validation_h2_streaming.rs:531`
- `tests/validation_h2_streaming.rs:617`
- `tests/validation_h2_streaming.rs:756`
- `tests/validation_h2_streaming.rs:865`
- `tests/validation_h2_streaming.rs:1101`
- `tests/validation_h2_streaming.rs:1207`
- `tests/validation_h2_streaming.rs:1351`
- `tests/validation_h2_streaming.rs:1503`
- `tests/validation_h2_streaming.rs:1620`
- `tests/validation_h2_streaming.rs:1751`
- `tests/validation_h2_streaming.rs:1851`
- `tests/validation_h2_streaming.rs:1964`
- `tests/validation_h2_streaming.rs:2125`
- `tests/validation_h2_streaming.rs:2245`
- `tests/validation_h2_streaming.rs:2370`
- `tests/validation_h2_streaming.rs:2537`
- `tests/validation_h2_streaming.rs:2692`
- `tests/validation_h2_streaming.rs:3022`

Implementation rule:

- Prefer a shared timeout helper or outer request/test deadline over blanket 500ms frame deadlines.
- Keep frame-level guards only where the test needs a precise protocol-step failure.
- Make timeout values env-tunable if CI variability remains high.

### Timeout Budget Guard

Current guardrails:

- `tests/timeout_budget.rs:14` sets `MAX_TIMEOUT_SECS = 15`.
- `tests/timeout_budget.rs:15` sets `MAX_SLEEP_SECS = 1`.

Implementation rule:

- Tighten only after the sleep removals and timeout-helper work land.
- Lowering this first will create noisy policy failures before the suite has been cleaned.

## Nextest And Selective Testing Plan

### Implemented State

- Nextest config includes `h3-stateful` and `streaming-heavy` test groups in `.config/nextest.toml`.
- Default parallelism remains `num-cpus`; CI uses `test-threads = 4`.
- CI invokes `cargo nextest run --all-features --profile ci --locked`.
- `just test-changed` now provides a conservative changed-file selector; manual exact filters remain useful for focused debugging.

### Design Guidance

- Use nextest `binary()` selectors for integration-test binaries, not unit-test-style `test(/^tests::.../)` filters.
- Use exact binary filters like `binary(=error_handling)` for changed `tests/error_handling.rs`.
- Use prefix binary filters for families only after validating syntax with `cargo nextest list -E`.
- Use `test-group` with `max-threads = 1` for mutual exclusion.
- Use `threads-required` only for tests that need more execution slots, not for exclusivity.
- Validate every new nextest filter with `cargo nextest list --all-features -E '<filter>'` before landing.

### `just test-changed` Requirements

- Print changed files and the selected command before running.
- Compute a safe merge base instead of assuming `main...HEAD`.
- For changed `tests/*.rs`, run the matching integration binary with an exact `binary(=stem)` filter.
- Fall back to the full suite for:
  - `src/**`
  - `Cargo.toml`
  - `Cargo.lock`
  - `tests/helpers/**`
  - `src/lib.rs`
  - `.config/nextest.toml`
  - shared scripts or unknown paths
- Treat `just test-changed` as inner-loop acceleration only.

## CI And Build Plan

### Implemented State

- The macOS test job keeps `CARGO_INCREMENTAL=0`, sccache, Rust cache, and BoringSSL cache coverage.
- Linux and Windows build jobs now use sccache, Rust cache, and target-specific BoringSSL cache coverage.
- Node release and Python release cargo-heavy jobs now use sccache/Rust cache; wheel/develop cargo invocations use `--locked` where supported.
- BoringSSL install steps remain the source of truth and checksum verification remains intact.

### Design Guidance

- Add Rust cache/sccache only where it is missing and useful; do not duplicate or fight the existing macOS test-job cache.
- Add target-specific `lib/boringssl` cache keys if BoringSSL download/install time is material.
- Preserve `scripts/install-boringssl-prebuilt.sh` as the release workflow source of truth.
- Keep checksum verification intact.
- Add `--locked` to workflow cargo commands where supported.
- Split lint/test/examples only after the cache changes are stable.
- Add nextest archive/sharding only after baseline and cache measurements prove it is worth the extra workflow complexity.

## Phase Plan

### Phase 0 — Baseline

Goal: measure current runtime and capture an artifact trail before changing behavior.

Scope:

- No tracked file edits.
- Write local logs under `target/test-optimization/baseline/`.

Commands:

```bash
mkdir -p target/test-optimization/baseline
git rev-parse HEAD | tee target/test-optimization/baseline/commit.txt
git status --short | tee target/test-optimization/baseline/status.txt
rustc --version | tee target/test-optimization/baseline/rustc.txt
cargo --version | tee target/test-optimization/baseline/cargo.txt
cargo nextest --version | tee target/test-optimization/baseline/nextest.txt
cargo nextest list --all-features | tee target/test-optimization/baseline/nextest-list.txt
/usr/bin/time -l just test 2>&1 | tee target/test-optimization/baseline/just-test.log
```

Stop conditions:

- The working tree has unrelated edits in a planned write scope.
- Another worker owns the same file cluster.
- Baseline cannot run because of a repo-wide compile failure unrelated to this plan.

### Phase 1 — Fast Local Test Wins

Goal: remove avoidable fixed waits without changing product behavior.

Owned files:

- `tests/error_handling.rs`
- `tests/streaming_public_api.rs`
- `tests/h1_streaming.rs`
- `tests/h1_rfc_compliance.rs`
- `tests/h1_pooling.rs`
- `tests/compression.rs`
- Optionally `tests/timeout_budget.rs` after cleanup lands.

Work:

- Replace 5-second hold sleeps with `oneshot` parking.
- Remove startup sleeps only where readiness is proven.
- Remove compression sleeps after proving server readiness.
- Defer H3 settle sleeps and H2 blanket timeout reductions to later phases.

Validation:

```bash
cargo nextest run --all-features -E 'binary(=error_handling) | binary(=streaming_public_api) | binary(=h1_streaming) | binary(=h1_rfc_compliance) | binary(=h1_pooling) | binary(=compression)'
```

Final gate:

- Repeat targeted tests enough times to catch timing flakes.
- Run broader test coverage if shared helpers or `tests/timeout_budget.rs` changed.

### Phase 2 — Nextest Concurrency Controls

Goal: improve worker behavior with low-risk config changes.

Owned files:

- `.config/nextest.toml`

Work:

- Add conservative test groups and profile tuning.
- Cap CI concurrency if CI shows CPU/port contention.
- Set CI `fail-fast = false` only if failure reporting needs full visibility.
- Add overrides only after validating each filter with `cargo nextest list -E`.

Validation:

```bash
cargo nextest list --all-features
cargo nextest run --all-features --profile ci
```

Stop conditions:

- Runtime increases on normal local execution.
- Filters do not match intended binaries.
- Retries hide flakes rather than surfacing them.

### Phase 3 — Selective Test Helper

Goal: provide a safe inner-loop shortcut.

Owned files:

- `justfile`
- Optional helper script under `scripts/` if the shell logic becomes too large.

Work:

- Add `just test-changed`.
- Map changed `tests/*.rs` files to exact nextest binary filters.
- Fall back to full suite for shared infrastructure and ambiguous changes.
- Print selected command before running.

Validation:

```bash
just test-changed main
cargo nextest list --all-features -E 'binary(=error_handling)'
```

Stop conditions:

- The helper skips relevant tests for source changes.
- It fails when the base branch is missing.
- It encourages replacing final full-surface validation.

### Phase 4 — CI Cache And Build Reuse

Goal: reduce CI wall time without changing tests.

Owned files:

- `.github/workflows/ci.yml`
- `.github/workflows/node-release.yml`
- `.github/workflows/python-release.yml`

Work:

- Add sccache and Rust cache to cargo-heavy jobs that lack them.
- Add target-specific BoringSSL cache if install/download time is material.
- Add `--locked` to supported cargo commands.
- Preserve release workflow BoringSSL install and SHA256 verification.

Validation:

- Workflow syntax review.
- Cold-cache and warm-cache GitHub Actions duration comparison.
- Release workflows still build expected Node/Python artifacts.

Stop conditions:

- Cache restore masks missing BoringSSL install steps.
- Wrong-target BoringSSL artifacts can be reused.
- Release prebuilt checksum verification is weakened.

### Phase 5 — CI Sharding And Job Split

Goal: scale test execution after cache behavior is stable.

Owned files:

- `.github/workflows/ci.yml`

Work:

- Split lint/test/examples where useful.
- Compile nextest archive once.
- Run sharded nextest partitions from the archive.
- Preserve complete failure output.

Validation:

```bash
cargo nextest archive --all-features --profile ci --archive-file target/test-optimization/phase5/tests.tar.zst
cargo nextest run --archive-file target/test-optimization/phase5/tests.tar.zst --extract-to target/test-optimization/phase5/archive-extract-1 --partition count:1/2 --profile ci
cargo nextest run --archive-file target/test-optimization/phase5/tests.tar.zst --extract-to target/test-optimization/phase5/archive-extract-2 --partition count:2/2 --profile ci
```

Stop conditions:

- Shards recompile instead of consuming the archive.
- Shards omit tests or duplicate unexpected tests.
- Failure reporting becomes harder than the current workflow.

### Phase 6 — Fast Compile Profile

Goal: improve local compile/test iteration after selection and nextest profiles exist.

Owned files:

- `Cargo.toml`
- Optional `justfile` recipe if needed.

Work:

- Benchmark whether a separate `fast-test` profile still adds value on top of current `profile.dev` and `profile.test` tuning.
- If useful, add it as inner-loop only.
- Do not use it for release, benchmark, or superiority claims.

Validation:

```bash
cargo nextest run --all-features --cargo-profile fast-test
cargo nextest run --all-features
```

Stop conditions:

- The profile changes release or benchmark behavior.
- Tests behave differently between `fast-test` and normal profiles.
- Speedup is too small to justify another profile.

### Phase 7 — H2/H3 Deep Timing Cleanup

Goal: remove riskier protocol-test waits after lower-risk cleanup has landed.

Owned files:

- `tests/validation_h2_streaming.rs`
- `tests/h3_streaming_pool.rs`
- `tests/h3_streaming_correctness.rs`
- `tests/rfc9111_caching.rs`

Work:

- Centralize or outer-scope H2 frame-read timeouts.
- Replace H3 settle sleeps with explicit state signals.
- Replace cache wall-clock expiry with mock clock or injectable TTL if practical.

Validation:

```bash
cargo nextest run --all-features -E 'binary(=validation_h2_streaming) | binary(=h3_streaming_pool) | binary(=h3_streaming_correctness) | binary(=rfc9111_caching)'
cargo test --test validation_h2_streaming -- --nocapture
cargo check --benches
```

Stop conditions:

- H3 fixes require product-code changes while native H3 work is active.
- A timeout change creates CI-only flakes.
- Cache semantics are weakened.

### Phase 8 — Shared Conventions Update

Goal: update agent/contributor guidance only after commands and behavior are real.

Owned files:

- `AGENTS.md`

Work:

- Add test/build conventions for concurrent workers.
- Preserve existing README benchmark and temporary native H3 artifact instructions.
- State that `just test-changed` is inner-loop only.
- Add “no fixed sleeps for synchronization” guidance.

Suggested wording:

```markdown
## Test & Build Conventions for Concurrent Workers

- Prefer `just test-changed` for local inner-loop validation when it exists and when shared infrastructure did not change.
- Use targeted `cargo nextest run` filters for changed integration-test files before broader validation.
- Do not add fixed sleeps to tests for synchronization; use `oneshot`, `Notify`, `watch`, readiness probes, or explicit protocol events.
- Bind local test servers to `127.0.0.1:0`; do not introduce fixed ports.
- Use per-test temporary directories for artifacts unless a shared fixture is protected by `OnceLock` or equivalent.
- Treat `justfile`, nextest config, Cargo profiles, and CI workflows as shared coordination files.
- Selective tests are not final merge proof; run validation matching every touched surface before handing off.
```

Stop conditions:

- Commands documented do not exist yet.
- Wording conflicts with benchmark artifact or native H3 artifact instructions.
- Wording could cause agents to skip final validation.

## Ticket Backlog

| ID | Priority | Axis | Scope | Files | Status | Validation |
| --- | --- | --- | --- | --- | --- | --- |
| T1 | P0 | waits | Replace 5-second connection holds | `tests/error_handling.rs`, `tests/streaming_public_api.rs`, `tests/h1_streaming.rs` | closed | targeted suite passed |
| T2 | P0 | waits | Remove proven H1 startup sleeps | `tests/h1_rfc_compliance.rs`, `tests/h1_pooling.rs`, `tests/error_handling.rs` | closed | targeted suite passed |
| T3 | P1 | waits | Remove compression sleeps | `tests/compression.rs` | closed | targeted suite passed |
| T4 | P1 | waits | Replace cache wall-clock sleep | `tests/rfc9111_caching.rs`, `src/cache.rs` | closed | `binary(=rfc9111_caching)` passed |
| T5 | P1 | waits | Centralize H2 streaming timeouts | `tests/validation_h2_streaming.rs` | deferred | not implemented |
| T6 | P1 | waits | Replace H3 settle sleeps | `tests/h3_streaming_pool.rs`, `tests/h3_streaming_correctness.rs` | deferred | not implemented |
| T7 | P0 | nextest | Add groups/profile tuning | `.config/nextest.toml` | closed | filter/list validation and targeted suite passed |
| T8 | P0 | selective | Add `just test-changed` | `justfile` | closed | `just --list`, filter validation |
| T9 | P0 | CI | Add missing Rust cache/sccache | `.github/workflows/*.yml` | closed | YAML parse passed |
| T10 | P1 | CI | Cache BoringSSL prebuilts safely | `.github/workflows/*.yml` | closed | YAML parse passed |
| T11 | P1 | CI | Split lint/test/examples | `.github/workflows/ci.yml` | deferred | not implemented |
| T12 | P1 | CI | Add nextest archive sharding | `.github/workflows/ci.yml` | deferred | not implemented |
| T13 | P1 | build | Evaluate/add `fast-test` profile | `Cargo.toml` | closed | `--cargo-profile fast-test` smoke passed |
| T14 | P2 | docs | Add AGENTS conventions | `AGENTS.md` | closed | reviewed against actual commands |

## Coordination Rules

- Claim a ticket before editing.
- One owner per file cluster.
- Check `git status --short` before editing and stop on unrelated edits in your target files.
- Do not revert or overwrite another worker’s changes.
- Keep tickets narrow; do not combine CI/cache work with test-behavior changes.
- Record exact validation commands and pass/fail evidence in the ticket row or handoff.
- Prefer append-only coordination notes over rewriting another worker’s status.
- If removing a wait reveals a race, mark the ticket blocked with a repro; do not replace it with a shorter fixed delay.

## Measurement Artifacts

Use untracked directories for local proof:

```text
target/test-optimization/baseline/
target/test-optimization/phase1/
target/test-optimization/phase2/
target/test-optimization/phase3/
target/test-optimization/phase4/
target/test-optimization/final/
```

Capture:

- `commit.txt`
- `status.txt`
- `environment.txt`
- `nextest-list.txt`
- targeted command logs
- full-suite command logs
- CI job duration summaries
- cache hit/miss evidence
- `summary.md`
- `summary.json`

Only promote results into `docs/benchmarks/<YYYY-MM-DD>-test-build-optimization/` if the run is reproducible enough to become a durable artifact.

## Flake Gate

Before declaring timing-sensitive changes stable:

```bash
mkdir -p target/test-optimization/flake

for i in 1 2 3 4 5; do
  /usr/bin/time -l cargo nextest run --all-features --profile ci \
    2>&1 | tee "target/test-optimization/flake/full-ci-repeat-${i}.log"
done

for i in 1 2 3 4 5 6 7 8 9 10; do
  /usr/bin/time -l cargo nextest run --all-features \
    -E 'binary(=error_handling) | binary(=h1_rfc_compliance) | binary(=h1_pooling) | binary(=validation_h2_streaming) | binary(=h3_streaming_pool) | binary(=h3_streaming_correctness) | binary(=rfc9111_caching) | binary(=compression)' \
    2>&1 | tee "target/test-optimization/flake/targeted-repeat-${i}.log"
done
```

Acceptance:

- Zero failures across repeated targeted runs for edited sleep/timeout/network tests.
- No retry-only passes accepted as clean proof.
- Failures under high parallelism must be triaged as contention vs logic.
- Full-suite failures must be compared against targeted logs for shared filesystem, dynamic port, or runtime starvation causes.

## Final Validation Matrix

| Touched Surface | Inner Loop | Final Validation |
| --- | --- | --- |
| Individual `tests/*.rs` files | matching nextest binary | all touched binaries, repeated if timing-sensitive |
| Shared test helpers | nearby binaries | full `just test` or equivalent |
| `.config/nextest.toml` | `cargo nextest list` and representative filters | full default and CI-profile nextest runs |
| `justfile` test recipes | recipe scenario tests | recipe plus full touched-surface validation |
| `Cargo.toml` profiles | fast-profile touched binaries | normal-profile touched-surface tests |
| CI workflows | syntax/command review | full relevant GitHub Actions workflow |
| README benchmark table | none | fresh repeated benchmark artifacts and `CHANGELOG.md` cause |
| Native H3 tests | H3-specific binaries | H3 plus affected transport suites |

## Decision Log

- `just test-changed` is useful, but it is not final validation.
- Nextest filters in implementation examples must be validated locally; a syntax check attempted during planning triggered compilation and was stopped because another artifact lock was active.
- `binary()`-based filters are preferred for this repo’s integration-test layout.
- `threads-required` is not a mutual-exclusion mechanism; use `test-group` for exclusive resources.
- H3 settle sleep cleanup is deferred behind lower-risk H1/H2 and config work.
- A separate `fast-test` profile must be benchmarked before adoption because `Cargo.toml` already tunes `profile.dev` and `profile.test`.