specters 4.1.7

Rust HTTP client with browser-like Chrome and Firefox fingerprints across TLS, HTTP/1.1, HTTP/2, HTTP/3, and WebSockets
# Specter Test & Build Optimization Shared Plan

Status: implemented for the low-risk test/build optimization phases; CI sharding and deeper H2/H3 timing cleanup remain deferred.

Source plan: `/Users/jaredboynton/.kimi/plans/daken-martian-manhunter-blue-marvel.md`

Created: 2026-05-25

## Purpose

Reduce local and CI validation latency for many concurrent workers without changing product behavior, weakening final validation, or disturbing the in-progress native H3/RFC9220 proof artifacts.

This plan was built from six read-only subagent passes:

- 3x `gpt-5.4-mini` mappers for test waits, nextest/selective testing, and CI/build surfaces.
- 3x `gpt-5.5` medium planners for phase ordering, measurement/validation, and worker coordination.

## Implementation Update — 2026-05-25

Closed work:

- Added `just test-changed` and updated `just test`, `just test-cargo`, `just clippy`, and `just check` to use locked cargo invocations where applicable.
- Added nextest `h3-stateful` and `streaming-heavy` groups plus CI profile tuning in `.config/nextest.toml`.
- Added `profile.fast-test` for inner-loop compile/test iteration in `Cargo.toml`.
- Removed fixed 5-second connection-hold sleeps, H1 startup sleeps, and compression post-response sleeps from the lower-risk test set.
- Removed the RFC 9111 cache expiry wall-clock sleep by retaining validator-backed `max-age=0` responses as immediately stale and revalidatable.
- Added Rust cache/sccache and target-specific BoringSSL cache coverage to cargo-heavy CI, Node release, and Python release jobs.
- Added concurrent-worker test/build guidance to `AGENTS.md`.

Validation completed:

- `cargo nextest run --all-features --locked --no-fail-fast -E 'binary(=timeout_budget) | binary(=rfc9111_caching) | binary(=h1_rfc_compliance) | binary(=error_handling) | binary(=h1_streaming) | binary(=streaming_public_api) | binary(=compression) | binary(=builder_knobs)'` — 53 passed, 0 skipped.
- `cargo nextest run --all-features --locked -E 'binary(=rfc9111_caching)'` — 3 passed, 0 skipped.

Remaining deferred work:

- CI job splitting and nextest archive sharding were not implemented; keep them gated on real cold/warm CI duration evidence.
- H2 frame-timeout centralization and H3 settle-sleep replacement were not implemented; keep them separate from the active native H3/RFC9220 work.
- Full-suite repeated flake-gate runs were not completed locally because other workers were running expensive native H3 benchmark builds in the shared worktree.

## Non-Goals

- Do not change runtime HTTP/H2/H3/WebSocket behavior as part of test/build optimization work. The one landed exception is the RFC 9111 cache fix, which preserves validator-backed `max-age=0` responses for immediate revalidation instead of weakening the cache test.
- Do not change README benchmark tables unless fresh reproducible benchmark artifacts and `CHANGELOG.md` cause entries support the update.
- Do not edit temporary native H3 proof artifacts unless the native H3/RFC9220 gap set is actually resolved.
- Do not treat `just test-changed` or any selective helper as merge-ready final validation.
- Do not mask flakes with retries, shorter arbitrary sleeps, or polling loops.

## Original Repo Anchors

- These anchors record the pre-implementation snapshot used by the subagents; see the implementation update above for the current state.
- `just test` ran `cargo nextest run --all-features` from `justfile:160`.
- `just test-cargo` ran `cargo test --all-features` from `justfile:176`.
- `just check` ran `fmt-check`, `clippy`, then `test` sequentially at `justfile:211`.
- `.config/nextest.toml:1` defined only minimal default/CI/pre-push profiles; there were no test groups or overrides yet.
- `.config/nextest.toml:3` used `test-threads = "num-cpus"`.
- `.config/nextest.toml:22` had CI `fail-fast = true`.
- `.config/nextest.toml:31` had pre-push `fail-fast = false`.
- `.github/workflows/ci.yml:27` and `.github/workflows/ci.yml:30` already added sccache and Cargo registry/git cache to the macOS test job.
- `.github/workflows/ci.yml:41`, `.github/workflows/ci.yml:54`, and `.github/workflows/ci.yml:63` ran fmt, nextest, and examples sequentially in one job.
- `.github/workflows/ci.yml:73` and `.github/workflows/ci.yml:106` defined Linux and Windows build matrix jobs without equivalent Rust cache/sccache setup.
- `Cargo.toml:105` and `Cargo.toml:112` already tuned `dev` and `test` debug info, but there was no separate `fast-test` profile.
- `scripts/install-boringssl-prebuilt.sh:42` already used `cargo metadata --locked`.
- `scripts/install-boringssl-prebuilt.sh:58` verified SHA256 checksums.
- `scripts/install-boringssl-prebuilt.sh:142` exported `BORING_BSSL_PATH` for CI.
- `scripts/lib-bssl-env.sh:41` resolved repo-local BoringSSL paths after env/user-wide prebuilts.

## Corrected Kimi Plan Claims

- These were the pre-implementation corrections used to scope the work.
- The overall optimization opportunity was real: tests contained many fixed waits/timeouts, and shared nextest/CI controls were minimal.
- `tests/h1_pooling.rs` was not just four startup sleeps; mapped sleeps were at `tests/h1_pooling.rs:23`, `tests/h1_pooling.rs:54`, `tests/h1_pooling.rs:69`, `tests/h1_pooling.rs:87`, `tests/h1_pooling.rs:188`, and `tests/h1_pooling.rs:226`. Only the first few were startup-style waits.
- `tests/h3_streaming_pool.rs` had 13 mapped settle sleeps, not 15.
- `tests/validation_h2_streaming.rs` had 22 `timeout(Duration::from_secs(3), conn.read_frame())` guards, not roughly 30.
- “CI/build has no caching in most matrix jobs” was overstated: the macOS CI test job already had sccache and Cargo registry/git cache, but Linux/Windows build jobs and release workflows still lacked equivalent Rust caching.
- “Current CI config uses `fail-fast = false`” was stale: CI had `fail-fast = true` at `.config/nextest.toml:22`; pre-push had `false`.
- “No `--locked` usage exists” was stale repo-wide; helper scripts already used locked metadata and some scripts used locked cargo runs, but GitHub workflow cargo commands mostly still omitted `--locked`.

## Hotspot Map

### 5-Second Connection Holds

These are high-confidence P0 fixes. Each sleep exists only to hold a connection open, so it can be replaced with `tokio::sync::oneshot` parking:

- `tests/error_handling.rs:84`
- `tests/error_handling.rs:228`
- `tests/h1_streaming.rs:186`
- `tests/streaming_public_api.rs:197`

Implementation rule:

- Replace fixed `tokio::time::sleep(Duration::from_secs(5))` in background tasks with a parked receiver.
- Keep the stream owned by the spawned task so the connection remains open.
- Do not introduce a shorter sleep.
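
The parking rule above can be sketched with the standard library alone. This is a synchronous analogue, not the test suite's code: the real tests are async and would park on a `tokio::sync::oneshot` receiver, while this sketch parks a thread on an `std::sync::mpsc` receiver. `HeldConnection`, `hold_one_connection`, and `peer_closed` are hypothetical names introduced for illustration.

```rust
use std::io::{ErrorKind, Read};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle: the server side stays open until `release` is called.
pub struct HeldConnection {
    pub addr: SocketAddr,
    release: Sender<()>,
    worker: JoinHandle<()>,
}

impl HeldConnection {
    /// Drop the sender to wake the parked thread, then wait for it to close the stream.
    pub fn release(self) {
        drop(self.release);
        self.worker.join().expect("server thread panicked");
    }
}

/// Accept one connection and keep it open with no fixed sleep.
pub fn hold_one_connection() -> std::io::Result<HeldConnection> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let (release, parked) = channel::<()>();
    let worker = thread::spawn(move || {
        // The spawned thread owns the stream, so the connection stays open.
        let (stream, _peer) = listener.accept().expect("accept failed");
        // Park until the sender is dropped; recv() returns Err at that point.
        let _ = parked.recv();
        drop(stream);
    });
    Ok(HeldConnection { addr, release, worker })
}

/// Report whether the peer closed the connection (EOF) within `wait`.
pub fn peer_closed(stream: &mut TcpStream, wait: Duration) -> std::io::Result<bool> {
    stream.set_read_timeout(Some(wait))?;
    let mut buf = [0u8; 1];
    match stream.read(&mut buf) {
        Ok(0) => Ok(true),
        Ok(_) => Ok(false),
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => Ok(false),
        Err(e) => Err(e),
    }
}
```

The test decides when the connection ends by calling `release`, so its runtime no longer depends on a 5-second timer expiring in the background.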

### Startup Sleeps

These should be removed only when readiness is already deterministic or replaced with explicit readiness signaling:

- `tests/error_handling.rs:134`
- `tests/error_handling.rs:186`
- `tests/h1_rfc_compliance.rs:29`
- `tests/h1_rfc_compliance.rs:47`
- `tests/h1_rfc_compliance.rs:81`
- `tests/h1_rfc_compliance.rs:109`
- `tests/h1_rfc_compliance.rs:135`
- `tests/h1_rfc_compliance.rs:166`
- `tests/h1_rfc_compliance.rs:197`
- `tests/h1_rfc_compliance.rs:228`
- `tests/h1_rfc_compliance.rs:257`
- `tests/h1_rfc_compliance.rs:287`
- `tests/h1_rfc_compliance.rs:316`
- `tests/h1_rfc_compliance.rs:344`
- `tests/h1_rfc_compliance.rs:377`
- `tests/h1_pooling.rs:23`
- `tests/h1_pooling.rs:54`
- `tests/h1_pooling.rs:87`

Implementation rule:

- Prefer bound-socket readiness, server-start return guarantees, `oneshot`, or `Notify`.
- If deleting a startup sleep creates connection-refused flakes, restore the test by adding readiness signaling, not by adding another fixed delay.
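
One way to picture explicit readiness signaling, using only the standard library: the server thread binds first and then reports its address over a channel, so the caller cannot connect before the listener exists. `start_ready_server` is a hypothetical helper, not one of the repo's test helpers, and an async test would more likely use `oneshot` or `Notify` as listed above.

```rust
use std::io::Write;
use std::net::{SocketAddr, TcpListener};
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical helper: returns only after the listener is bound.
/// Serves `body` to the first connection and then closes it.
pub fn start_ready_server(body: &'static str) -> SocketAddr {
    let (ready_tx, ready_rx) = sync_channel::<SocketAddr>(1);
    thread::spawn(move || {
        // Bind to an ephemeral port; no fixed ports.
        let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
        let addr = listener.local_addr().expect("no local address");
        // Readiness signal: sent strictly after bind succeeds.
        ready_tx.send(addr).expect("caller went away");
        if let Ok((mut stream, _peer)) = listener.accept() {
            let _ = stream.write_all(body.as_bytes());
        }
    });
    // Blocks until the server thread signals; replaces the startup sleep.
    ready_rx
        .recv()
        .expect("server thread exited before signalling readiness")
}
```

Because the bound socket queues incoming connections, the client may connect as soon as the address is returned, even if `accept` has not run yet.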

### H3 Settle Sleeps

These are medium-risk and should be deferred until lower-risk H1/H2 cleanup lands:

- `tests/h3_streaming_correctness.rs:29`
- `tests/h3_streaming_correctness.rs:98`
- `tests/h3_streaming_correctness.rs:185`
- `tests/h3_streaming_correctness.rs:187`
- `tests/h3_streaming_correctness.rs:241`
- `tests/h3_streaming_correctness.rs:243`
- `tests/h3_streaming_correctness.rs:384`
- `tests/h3_streaming_correctness.rs:503`
- `tests/h3_streaming_correctness.rs:551`
- `tests/h3_streaming_pool.rs:97`
- `tests/h3_streaming_pool.rs:163`
- `tests/h3_streaming_pool.rs:213`
- `tests/h3_streaming_pool.rs:343`
- `tests/h3_streaming_pool.rs:410`
- `tests/h3_streaming_pool.rs:458`
- `tests/h3_streaming_pool.rs:477`
- `tests/h3_streaming_pool.rs:507`
- `tests/h3_streaming_pool.rs:545`
- `tests/h3_streaming_pool.rs:564`
- `tests/h3_streaming_pool.rs:592`
- `tests/h3_streaming_pool.rs:630`
- `tests/h3_streaming_pool.rs:652`

Implementation rule:

- Replace settle sleeps with explicit H3 test-local state signaling using `Notify`, `watch`, or protocol-event observation.
- Do not make product-code H3 transport changes in the same ticket unless the test cannot be made deterministic without a real bug fix.
- Do not update native H3 proof docs from this work unless the native H3 gap set is actually closed.
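
As a rough illustration of test-local state signaling, the sketch below waits for a predicate on shared state instead of sleeping until the state has probably settled. It is a standard-library analogue of a `tokio::sync::watch` channel built from `Mutex` and `Condvar`; `StateSignal` is a hypothetical type, and the integer states stand in for whatever H3 test state a real fixture would publish.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical test-local signal: holds the latest state and wakes waiters on change.
#[derive(Clone)]
pub struct StateSignal<T> {
    inner: Arc<(Mutex<T>, Condvar)>,
}

impl<T: Clone> StateSignal<T> {
    pub fn new(initial: T) -> Self {
        StateSignal {
            inner: Arc::new((Mutex::new(initial), Condvar::new())),
        }
    }

    /// Publish a new state and wake every waiter.
    pub fn set(&self, value: T) {
        let (lock, cv) = &*self.inner;
        *lock.lock().unwrap() = value;
        cv.notify_all();
    }

    /// Wait until `done` holds for the current state, or give up after `deadline`.
    /// Returns the state that satisfied `done`, or `None` on timeout.
    pub fn wait_for(&self, deadline: Duration, mut done: impl FnMut(&T) -> bool) -> Option<T> {
        let (lock, cv) = &*self.inner;
        let guard = lock.lock().unwrap();
        let (guard, result) = cv
            .wait_timeout_while(guard, deadline, |state| !done(&*state))
            .unwrap();
        if result.timed_out() {
            None
        } else {
            Some((*guard).clone())
        }
    }
}
```

The deadline here is an upper bound for failure reporting, not a settle delay: a passing test returns as soon as the state is published.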

### Compression Sleeps

These are likely safe after confirming the helper returns after listener bind/readiness:

- `tests/compression.rs:94`
- `tests/compression.rs:122`
- `tests/compression.rs:148`
- `tests/compression.rs:174`
- `tests/compression.rs:200`
- `tests/compression.rs:224`

Implementation rule:

- Delete only after the gzip/deflate/brotli/zstd/identity/raw-byte tests pass repeatedly.
- If a race appears, signal server readiness from `start_encoding_server`, not a fixed delay.

### Blocking Cache Sleep — Closed

This was P1: the sleep was a real wall-clock wait, but the test relied on it to prove cache expiry behavior, so it could not simply be deleted:

- `tests/rfc9111_caching.rs:83`

Closed implementation:

- Removed the wall-clock sleep from `tests/rfc9111_caching.rs`.
- Updated `HttpCache` to retain `max-age=0` responses when they include `ETag` or `Last-Modified`, so they are stale immediately and return `CacheStatus::Revalidate`.
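
The retention rule can be modelled as a small decision function. This is a sketch of the rule as described above, not the crate's `HttpCache` code: `StoreOutcome` and `store_outcome` are hypothetical names, and real cache logic would consider many more directives than `max-age`.

```rust
/// Hypothetical outcome of storing one response; not the crate's `CacheStatus`.
#[derive(Debug, PartialEq, Eq)]
pub enum StoreOutcome {
    /// Not retained: already stale and nothing to revalidate with.
    NotStored,
    /// Retained and servable without contacting the origin.
    Fresh,
    /// Retained but immediately stale: the next lookup must revalidate.
    Revalidate,
}

/// Decide what happens to a just-received response based on `max-age`
/// and the presence of a validator (`ETag` or `Last-Modified`).
pub fn store_outcome(
    max_age_secs: u64,
    etag: Option<&str>,
    last_modified: Option<&str>,
) -> StoreOutcome {
    let has_validator = etag.is_some() || last_modified.is_some();
    match (max_age_secs, has_validator) {
        // Stale at once, but a conditional request can still reuse the body.
        (0, true) => StoreOutcome::Revalidate,
        // Stale at once and no validator: nothing worth keeping.
        (0, false) => StoreOutcome::NotStored,
        // Any positive lifetime starts out fresh.
        _ => StoreOutcome::Fresh,
    }
}
```

With this shape, a test can assert the revalidation path directly after the first response, with no wall-clock wait for an entry to expire.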

### H2 Frame Timeout Guards

These are risky to lower in one sweep because shorter deadlines may turn slow CI runs into flaky failures:

- `tests/validation_h2_streaming.rs:51`
- `tests/validation_h2_streaming.rs:180`
- `tests/validation_h2_streaming.rs:307`
- `tests/validation_h2_streaming.rs:430`
- `tests/validation_h2_streaming.rs:531`
- `tests/validation_h2_streaming.rs:617`
- `tests/validation_h2_streaming.rs:756`
- `tests/validation_h2_streaming.rs:865`
- `tests/validation_h2_streaming.rs:1101`
- `tests/validation_h2_streaming.rs:1207`
- `tests/validation_h2_streaming.rs:1351`
- `tests/validation_h2_streaming.rs:1503`
- `tests/validation_h2_streaming.rs:1620`
- `tests/validation_h2_streaming.rs:1751`
- `tests/validation_h2_streaming.rs:1851`
- `tests/validation_h2_streaming.rs:1964`
- `tests/validation_h2_streaming.rs:2125`
- `tests/validation_h2_streaming.rs:2245`
- `tests/validation_h2_streaming.rs:2370`
- `tests/validation_h2_streaming.rs:2537`
- `tests/validation_h2_streaming.rs:2692`
- `tests/validation_h2_streaming.rs:3022`

Implementation rule:

- Prefer a shared timeout helper or outer request/test deadline over blanket 500ms frame deadlines.
- Keep frame-level guards only where the test needs a precise protocol-step failure.
- Make timeout values env-tunable if CI variability remains high.
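
A shared, env-tunable timeout helper might look like the sketch below. The environment variable name `SPECTER_TEST_FRAME_TIMEOUT_MS` and both function names are assumptions made for illustration; the 3-second default mirrors the existing `Duration::from_secs(3)` guards.

```rust
use std::time::Duration;

/// Default matches the existing 3-second frame-read guards.
pub const DEFAULT_FRAME_TIMEOUT: Duration = Duration::from_secs(3);

/// Hypothetical override knob for slow CI runners (milliseconds).
pub const FRAME_TIMEOUT_ENV: &str = "SPECTER_TEST_FRAME_TIMEOUT_MS";

/// Parse an optional millisecond override, falling back to `default`
/// when the value is missing, malformed, or zero.
pub fn parse_timeout_ms(raw: Option<&str>, default: Duration) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|ms| *ms > 0)
        .map(Duration::from_millis)
        .unwrap_or(default)
}

/// Single place for tests to ask how long to wait for a frame.
pub fn frame_timeout() -> Duration {
    let raw = std::env::var(FRAME_TIMEOUT_ENV).ok();
    parse_timeout_ms(raw.as_deref(), DEFAULT_FRAME_TIMEOUT)
}
```

Call sites would then wrap `conn.read_frame()` with `frame_timeout()` instead of repeating a literal duration, so a later change to the default touches one line.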

### Timeout Budget Guard

Current guardrails:

- `tests/timeout_budget.rs:14` sets `MAX_TIMEOUT_SECS = 15`.
- `tests/timeout_budget.rs:15` sets `MAX_SLEEP_SECS = 1`.

Implementation rule:

- Tighten only after the sleep removals and timeout-helper work land.
- Lowering this first will create noisy policy failures before the suite has been cleaned.
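
For readers unfamiliar with this kind of guard, here is a minimal sketch of how a sleep-budget check could scan test source text. It is not the contents of `tests/timeout_budget.rs`: `sleep_violations` is a hypothetical function, and it only recognizes the literal pattern `sleep(Duration::from_secs(N))`.

```rust
/// Same limit the guard file currently sets.
pub const MAX_SLEEP_SECS: u64 = 1;

/// Return `(line_number, seconds)` for every `sleep(Duration::from_secs(N))`
/// in `source` where `N` exceeds `max_secs`. Line numbers are 1-based.
pub fn sleep_violations(source: &str, max_secs: u64) -> Vec<(usize, u64)> {
    const NEEDLE: &str = "sleep(Duration::from_secs(";
    let mut found = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let mut rest = line;
        // A line may contain more than one sleep call; check each of them.
        while let Some(pos) = rest.find(NEEDLE) {
            rest = &rest[pos + NEEDLE.len()..];
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            if let Ok(secs) = digits.parse::<u64>() {
                if secs > max_secs {
                    found.push((idx + 1, secs));
                }
            }
        }
    }
    found
}
```

A scan like this fails noisily on every remaining 5-second hold, which is why the budget should be tightened only after the sleep removals land.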

## Nextest And Selective Testing Plan

### Implemented State

- Nextest config includes `h3-stateful` and `streaming-heavy` test groups in `.config/nextest.toml`.
- Default parallelism remains `num-cpus`; CI uses `test-threads = 4`.
- CI invokes `cargo nextest run --all-features --profile ci --locked`.
- `just test-changed` now provides a conservative changed-file selector; manual exact filters remain useful for focused debugging.

### Design Guidance

- Use nextest `binary()` selectors for integration-test binaries, not unit-test-style `test(/^tests::.../)` filters.
- Use exact binary filters like `binary(=error_handling)` for changed `tests/error_handling.rs`.
- Use prefix binary filters for families only after validating syntax with `cargo nextest list -E`.
- Use `test-group` with `max-threads = 1` for mutual exclusion.
- Use `threads-required` only for tests that need more execution slots, not for exclusivity.
- Validate every new nextest filter with `cargo nextest list --all-features -E '<filter>'` before landing.

### `just test-changed` Requirements

- Print changed files and the selected command before running.
- Compute a safe merge base instead of assuming `main...HEAD`.
- For changed `tests/*.rs`, run the matching integration binary with an exact `binary(=stem)` filter.
- Fall back to the full suite for:
  - `src/**` (including `src/lib.rs`)
  - `Cargo.toml`
  - `Cargo.lock`
  - `tests/helpers/**`
  - `.config/nextest.toml`
  - shared scripts or unknown paths
- Treat `just test-changed` as inner-loop acceleration only.
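
The selection rules above amount to a small classifier over changed paths. The sketch below expresses that logic in Rust purely for illustration; the landed helper lives in the `justfile`, and `Selection`, `integration_stem`, and `select_tests` are hypothetical names. Treating an empty change set as a full-suite run is an assumption chosen to stay conservative.

```rust
/// What the helper should run for a given set of changed files.
#[derive(Debug, PartialEq, Eq)]
pub enum Selection {
    /// Shared or unknown paths changed: run everything.
    FullSuite,
    /// Only top-level integration tests changed: run this nextest filter.
    Filter(String),
}

/// Map `tests/<stem>.rs` to `<stem>`; anything else (including
/// `tests/helpers/**`) is not a standalone integration binary.
fn integration_stem(path: &str) -> Option<&str> {
    let file = path.strip_prefix("tests/")?;
    if file.contains('/') {
        return None;
    }
    let stem = file.strip_suffix(".rs")?;
    if stem.is_empty() {
        None
    } else {
        Some(stem)
    }
}

/// Build an exact `binary(=stem)` filter, or fall back to the full suite.
pub fn select_tests(changed: &[&str]) -> Selection {
    let mut stems: Vec<&str> = Vec::new();
    for &path in changed {
        match integration_stem(path) {
            Some(stem) if !stems.contains(&stem) => stems.push(stem),
            Some(_) => {} // duplicate entry, already selected
            None => return Selection::FullSuite,
        }
    }
    if stems.is_empty() {
        // Assumption: nothing recognizable changed, so stay conservative.
        return Selection::FullSuite;
    }
    let clauses: Vec<String> = stems.iter().map(|s| format!("binary(={})", s)).collect();
    Selection::Filter(clauses.join(" | "))
}
```

A single shared or unknown path forces the full suite even when every other change is an isolated test file, which keeps the helper from under-selecting.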

## CI And Build Plan

### Implemented State

- The macOS test job keeps `CARGO_INCREMENTAL=0`, sccache, Rust cache, and BoringSSL cache coverage.
- Linux and Windows build jobs now use sccache, Rust cache, and target-specific BoringSSL cache coverage.
- Node release and Python release cargo-heavy jobs now use sccache/Rust cache; wheel/develop cargo invocations use `--locked` where supported.
- BoringSSL install steps remain the source of truth and checksum verification remains intact.

### Design Guidance

- Add Rust cache/sccache only where it is missing and useful; do not duplicate or fight the existing macOS test-job cache.
- Add target-specific `lib/boringssl` cache keys if BoringSSL download/install time is material.
- Preserve `scripts/install-boringssl-prebuilt.sh` as the release workflow source of truth.
- Keep checksum verification intact.
- Add `--locked` to workflow cargo commands where supported.
- Split lint/test/examples only after the cache changes are stable.
- Add nextest archive/sharding only after baseline and cache measurements prove it is worth the extra workflow complexity.

## Phase Plan

### Phase 0 — Baseline

Goal: measure current runtime and capture an artifact trail before changing behavior.

Scope:

- No tracked file edits.
- Write local logs under `target/test-optimization/baseline/`.

Commands:

```bash
mkdir -p target/test-optimization/baseline
git rev-parse HEAD | tee target/test-optimization/baseline/commit.txt
git status --short | tee target/test-optimization/baseline/status.txt
rustc --version | tee target/test-optimization/baseline/rustc.txt
cargo --version | tee target/test-optimization/baseline/cargo.txt
cargo nextest --version | tee target/test-optimization/baseline/nextest.txt
cargo nextest list --all-features | tee target/test-optimization/baseline/nextest-list.txt
# `-l` is the BSD/macOS resource-usage flag for time(1); GNU time uses `-v` instead.
/usr/bin/time -l just test 2>&1 | tee target/test-optimization/baseline/just-test.log
```

Stop conditions:

- The working tree has unrelated edits in a planned write scope.
- Another worker owns the same file cluster.
- Baseline cannot run because of a repo-wide compile failure unrelated to this plan.

### Phase 1 — Fast Local Test Wins

Goal: remove avoidable fixed waits without changing product behavior.

Owned files:

- `tests/error_handling.rs`
- `tests/streaming_public_api.rs`
- `tests/h1_streaming.rs`
- `tests/h1_rfc_compliance.rs`
- `tests/h1_pooling.rs`
- `tests/compression.rs`
- Optionally `tests/timeout_budget.rs` after cleanup lands.

Work:

- Replace 5-second hold sleeps with `oneshot` parking.
- Remove startup sleeps only where readiness is proven.
- Remove compression sleeps after proving server readiness.
- Defer H3 settle sleeps and H2 blanket timeout reductions to later phases.

Validation:

```bash
cargo nextest run --all-features -E 'binary(=error_handling) | binary(=streaming_public_api) | binary(=h1_streaming) | binary(=h1_rfc_compliance) | binary(=h1_pooling) | binary(=compression)'
```

Final gate:

- Repeat targeted tests enough times to catch timing flakes; the Flake Gate section uses 10 targeted repeats.
- Run broader test coverage if shared helpers or `tests/timeout_budget.rs` changed.

### Phase 2 — Nextest Concurrency Controls

Goal: improve worker behavior with low-risk config changes.

Owned files:

- `.config/nextest.toml`

Work:

- Add conservative test groups and profile tuning.
- Cap CI concurrency if CI shows CPU/port contention.
- Set CI `fail-fast = false` only if failure reporting needs full visibility.
- Add overrides only after validating each filter with `cargo nextest list -E`.

Validation:

```bash
cargo nextest list --all-features
cargo nextest run --all-features --profile ci
```

Stop conditions:

- Runtime increases on normal local execution.
- Filters do not match intended binaries.
- Retries hide flakes rather than surfacing them.

### Phase 3 — Selective Test Helper

Goal: provide a safe inner-loop shortcut.

Owned files:

- `justfile`
- Optional helper script under `scripts/` if the shell logic becomes too large.

Work:

- Add `just test-changed`.
- Map changed `tests/*.rs` files to exact nextest binary filters.
- Fall back to full suite for shared infrastructure and ambiguous changes.
- Print selected command before running.

Validation:

```bash
just test-changed main
cargo nextest list --all-features -E 'binary(=error_handling)'
```

Stop conditions:

- The helper skips relevant tests for source changes.
- It fails when the base branch is missing.
- It encourages replacing final full-surface validation.

### Phase 4 — CI Cache And Build Reuse

Goal: reduce CI wall time without changing tests.

Owned files:

- `.github/workflows/ci.yml`
- `.github/workflows/node-release.yml`
- `.github/workflows/python-release.yml`

Work:

- Add sccache and Rust cache to cargo-heavy jobs that lack them.
- Add target-specific BoringSSL cache if install/download time is material.
- Add `--locked` to supported cargo commands.
- Preserve release workflow BoringSSL install and SHA256 verification.

Validation:

- Workflow syntax review.
- Cold-cache and warm-cache GitHub Actions duration comparison.
- Release workflows still build expected Node/Python artifacts.

Stop conditions:

- Cache restore masks missing BoringSSL install steps.
- Wrong-target BoringSSL artifacts can be reused.
- Release prebuilt checksum verification is weakened.

### Phase 5 — CI Sharding And Job Split

Goal: scale test execution after cache behavior is stable.

Owned files:

- `.github/workflows/ci.yml`

Work:

- Split lint/test/examples where useful.
- Compile nextest archive once.
- Run sharded nextest partitions from the archive.
- Preserve complete failure output.

Validation:

```bash
cargo nextest archive --all-features --profile ci --archive-file target/test-optimization/phase5/tests.tar.zst
cargo nextest run --archive-file target/test-optimization/phase5/tests.tar.zst --extract-to target/test-optimization/phase5/archive-extract-1 --partition count:1/2 --profile ci
cargo nextest run --archive-file target/test-optimization/phase5/tests.tar.zst --extract-to target/test-optimization/phase5/archive-extract-2 --partition count:2/2 --profile ci
```

Stop conditions:

- Shards recompile instead of consuming the archive.
- Shards omit tests or duplicate unexpected tests.
- Failure reporting becomes harder than the current workflow.

### Phase 6 — Fast Compile Profile

Goal: improve local compile/test iteration after selection and nextest profiles exist.

Owned files:

- `Cargo.toml`
- Optional `justfile` recipe if needed.

Work:

- Benchmark whether a separate `fast-test` profile still adds value on top of current `profile.dev` and `profile.test` tuning.
- If useful, add it as inner-loop only.
- Do not use it for release, benchmark, or superiority claims.

Validation:

```bash
cargo nextest run --all-features --cargo-profile fast-test
cargo nextest run --all-features
```

Stop conditions:

- The profile changes release or benchmark behavior.
- Tests behave differently between `fast-test` and normal profiles.
- Speedup is too small to justify another profile.

### Phase 7 — H2/H3 Deep Timing Cleanup

Goal: remove riskier protocol-test waits after lower-risk cleanup has landed.

Owned files:

- `tests/validation_h2_streaming.rs`
- `tests/h3_streaming_pool.rs`
- `tests/h3_streaming_correctness.rs`
- `tests/rfc9111_caching.rs`

Work:

- Centralize or outer-scope H2 frame-read timeouts.
- Replace H3 settle sleeps with explicit state signals.
- Replace cache wall-clock expiry with mock clock or injectable TTL if practical.

Validation:

```bash
cargo nextest run --all-features -E 'binary(=validation_h2_streaming) | binary(=h3_streaming_pool) | binary(=h3_streaming_correctness) | binary(=rfc9111_caching)'
cargo test --test validation_h2_streaming -- --nocapture
cargo check --benches
```

Stop conditions:

- H3 fixes require product-code changes while native H3 work is active.
- A timeout change creates CI-only flakes.
- Cache semantics are weakened.

### Phase 8 — Shared Conventions Update

Goal: update agent/contributor guidance only after commands and behavior are real.

Owned files:

- `AGENTS.md`

Work:

- Add test/build conventions for concurrent workers.
- Preserve existing README benchmark and temporary native H3 artifact instructions.
- State that `just test-changed` is inner-loop only.
- Add “no fixed sleeps for synchronization” guidance.

Suggested wording:

```markdown
## Test & Build Conventions for Concurrent Workers

- Prefer `just test-changed` for local inner-loop validation when it exists and when shared infrastructure did not change.
- Use targeted `cargo nextest run` filters for changed integration-test files before broader validation.
- Do not add fixed sleeps to tests for synchronization; use `oneshot`, `Notify`, `watch`, readiness probes, or explicit protocol events.
- Bind local test servers to `127.0.0.1:0`; do not introduce fixed ports.
- Use per-test temporary directories for artifacts unless a shared fixture is protected by `OnceLock` or equivalent.
- Treat `justfile`, nextest config, Cargo profiles, and CI workflows as shared coordination files.
- Selective tests are not final merge proof; run validation matching every touched surface before handing off.
```

Stop conditions:

- Commands documented do not exist yet.
- Wording conflicts with benchmark artifact or native H3 artifact instructions.
- Wording could cause agents to skip final validation.

## Ticket Backlog

| ID | Priority | Axis | Scope | Files | Status | Validation |
| --- | --- | --- | --- | --- | --- | --- |
| T1 | P0 | waits | Replace 5-second connection holds | `tests/error_handling.rs`, `tests/streaming_public_api.rs`, `tests/h1_streaming.rs` | closed | targeted suite passed |
| T2 | P0 | waits | Remove proven H1 startup sleeps | `tests/h1_rfc_compliance.rs`, `tests/h1_pooling.rs`, `tests/error_handling.rs` | closed | targeted suite passed |
| T3 | P1 | waits | Remove compression sleeps | `tests/compression.rs` | closed | targeted suite passed |
| T4 | P1 | waits | Replace cache wall-clock sleep | `tests/rfc9111_caching.rs`, `src/cache.rs` | closed | `binary(=rfc9111_caching)` passed |
| T5 | P1 | waits | Centralize H2 streaming timeouts | `tests/validation_h2_streaming.rs` | deferred | not implemented |
| T6 | P1 | waits | Replace H3 settle sleeps | `tests/h3_streaming_pool.rs`, `tests/h3_streaming_correctness.rs` | deferred | not implemented |
| T7 | P0 | nextest | Add groups/profile tuning | `.config/nextest.toml` | closed | filter/list validation and targeted suite passed |
| T8 | P0 | selective | Add `just test-changed` | `justfile` | closed | `just --list`, filter validation |
| T9 | P0 | CI | Add missing Rust cache/sccache | `.github/workflows/*.yml` | closed | YAML parse passed |
| T10 | P1 | CI | Cache BoringSSL prebuilts safely | `.github/workflows/*.yml` | closed | YAML parse passed |
| T11 | P1 | CI | Split lint/test/examples | `.github/workflows/ci.yml` | deferred | not implemented |
| T12 | P1 | CI | Add nextest archive sharding | `.github/workflows/ci.yml` | deferred | not implemented |
| T13 | P1 | build | Evaluate/add `fast-test` profile | `Cargo.toml` | closed | `--cargo-profile fast-test` smoke passed |
| T14 | P2 | docs | Add AGENTS conventions | `AGENTS.md` | closed | reviewed against actual commands |

## Coordination Rules

- Claim a ticket before editing.
- One owner per file cluster.
- Check `git status --short` before editing and stop on unrelated edits in your target files.
- Do not revert or overwrite another worker’s changes.
- Keep tickets narrow; do not combine CI/cache work with test-behavior changes.
- Record exact validation commands and pass/fail evidence in the ticket row or handoff.
- Prefer append-only coordination notes over rewriting another worker’s status.
- If removing a wait reveals a race, mark the ticket blocked with a repro; do not replace it with a shorter fixed delay.

## Measurement Artifacts

Use untracked directories for local proof:

```text
target/test-optimization/baseline/
target/test-optimization/phase1/
target/test-optimization/phase2/
target/test-optimization/phase3/
target/test-optimization/phase4/
target/test-optimization/final/
```

Capture:

- `commit.txt`
- `status.txt`
- `environment.txt`
- `nextest-list.txt`
- targeted command logs
- full-suite command logs
- CI job duration summaries
- cache hit/miss evidence
- `summary.md`
- `summary.json`

Only promote results into `docs/benchmarks/<YYYY-MM-DD>-test-build-optimization/` if the run is reproducible enough to become a durable artifact.

## Flake Gate

Before declaring timing-sensitive changes stable:

```bash
mkdir -p target/test-optimization/flake
# `/usr/bin/time -l` below is the BSD/macOS form; GNU time uses `-v` instead.

for i in 1 2 3 4 5; do
  /usr/bin/time -l cargo nextest run --all-features --profile ci \
    2>&1 | tee "target/test-optimization/flake/full-ci-repeat-${i}.log"
done

for i in 1 2 3 4 5 6 7 8 9 10; do
  /usr/bin/time -l cargo nextest run --all-features \
    -E 'binary(=error_handling) | binary(=h1_rfc_compliance) | binary(=h1_pooling) | binary(=validation_h2_streaming) | binary(=h3_streaming_pool) | binary(=h3_streaming_correctness) | binary(=rfc9111_caching) | binary(=compression)' \
    2>&1 | tee "target/test-optimization/flake/targeted-repeat-${i}.log"
done
```

Acceptance:

- Zero failures across repeated targeted runs for edited sleep/timeout/network tests.
- No retry-only passes accepted as clean proof.
- Failures under high parallelism must be triaged as contention vs logic.
- Full-suite failures must be compared against targeted logs for shared filesystem, dynamic port, or runtime starvation causes.

## Final Validation Matrix

| Touched Surface | Inner Loop | Final Validation |
| --- | --- | --- |
| Individual `tests/*.rs` files | matching nextest binary | all touched binaries, repeated if timing-sensitive |
| Shared test helpers | nearby binaries | full `just test` or equivalent |
| `.config/nextest.toml` | `cargo nextest list` and representative filters | full default and CI-profile nextest runs |
| `justfile` test recipes | recipe scenario tests | recipe plus full touched-surface validation |
| `Cargo.toml` profiles | fast-profile touched binaries | normal-profile touched-surface tests |
| CI workflows | syntax/command review | full relevant GitHub Actions workflow |
| README benchmark table | none | fresh repeated benchmark artifacts and `CHANGELOG.md` cause |
| Native H3 tests | H3-specific binaries | H3 plus affected transport suites |

## Decision Log

- `just test-changed` is useful, but it is not final validation.
- Nextest filters in implementation examples must be validated locally; a syntax check attempted during planning triggered compilation and was stopped because another artifact lock was active.
- `binary()`-based filters are preferred for this repo’s integration-test layout.
- `threads-required` is not a mutual-exclusion mechanism; use `test-group` for exclusive resources.
- H3 settle sleep cleanup is deferred behind lower-risk H1/H2 and config work.
- A separate `fast-test` profile must be benchmarked before adoption because `Cargo.toml` already tunes `profile.dev` and `profile.test`.