# dev-flaky — Project Specification (REPS)
> Rust Engineering Project Specification.
> Normative language follows RFC 2119.
## 1. Purpose
`dev-flaky` MUST run a project's test suite repeatedly and emit
per-test reliability as `dev-report::Report`. Output MUST be
machine-readable so AI agents and CI gates can quarantine flaky tests
automatically.
## 2. Scope
This crate MUST provide:
- A `FlakyRun` builder with `iterations`, `in_dir`, `workspace`,
`features`, `test_filter`, `allow`, `allow_all`,
`reliability_threshold`, plus `subject` / `subject_version`
accessors and `execute`.
- A `TestReliability` struct with `reliability()`,
`reliability_pct()`, `is_stable()`, `is_flaky()`, `is_broken()`,
and `classification(threshold)`.
- A `Classification` enum (`Stable`, `Flaky`, `Broken`) with
`severity()` and `label()` methods.
- A `FlakyResult` with `stable_count`, `flaky_count`,
`broken_count`, `total_count`, and `into_report` integration.
- A `FlakyProducer` adapter implementing `dev_report::Producer`.
- `cargo test` repeated-run orchestration with libtest output
parsing.
This crate MAY provide later:
- Test isolation (one test per invocation) for stricter classification
in the presence of test-to-test contamination.
- Historical trend analysis (compare reliability across runs).
- Per-test minimum iteration counts for confidence intervals.
This crate MUST NOT:
- Modify test code.
- Quarantine tests automatically. We report; the user decides.
- Run tests through a custom harness. We use `cargo test`.
## 3. Iteration count
Minimum iterations is `2`. Below `2`, the distinction between stable
and flaky is meaningless. The `iterations(n)` setter MUST clamp to a
minimum of `2`.
## 4. Classification policy
| `> 0` | `0` | Stable |
| `> 0` | `> 0` | Flaky |
| `0` | `> 0` | Broken |
| `0` | `0` | (not possible) |
Verdict mapping in the emitted `CheckResult`:
| Stable | Pass | (none) |
| Flaky | Warn | Warning |
| Broken | Fail | Error |
`FlakyRun::reliability_threshold(pct)` MAY demote a `Stable`
classification to `Flaky` when the reliability percentage drops below
the threshold. It MUST NOT promote `Broken` to anything else.
## 5. Determinism
Flakiness is inherently non-deterministic. We do not claim two runs
of `dev-flaky` produce the same `FlakyResult`.
However:
- Given a fixed `FlakyResult`, `into_report()` MUST produce a
deterministic `Report` (modulo the auto-stamped timestamps).
- Per-test classification rules MUST be a pure function of `passes`,
`failures`, and the optional `threshold_pct`.
- The `tests` vector inside a `FlakyResult` MUST be sorted by
`name` for diff-friendly output.
## 6. Tool dependency
`cargo` MUST be on PATH. The crate detects its absence and emits
`FlakyError::ToolNotInstalled`. Subprocess failures with no
recoverable output across all iterations MUST surface as
`FlakyError::SubprocessFailed(stderr)`. Per-iteration subprocess
failures (e.g. a transient compile error in iteration 3 out of 20)
MUST NOT abort the run — they MUST be recorded as iteration noise
and the run MUST continue.
`cargo test` returns non-zero when any test fails. The crate MUST
parse stdout regardless of exit code: a test failure IS the success
path for `dev-flaky`.
## 7. Producer contract
`FlakyProducer::produce()` MUST always return a `Report`. It MUST
NOT panic on subprocess failure; instead, it MUST emit a single
`CheckResult::fail("flaky::scan", Severity::Critical)` carrying the
error message in `detail` and the tags `flaky` + `subprocess`.
## 8. Target-dir-lock note
Running `FlakyRun::execute()` from inside another `cargo` invocation
that already holds the workspace target-dir lock will deadlock. The
crate's `FlakyProducer` test that exercises the full pipeline is
`#[ignore]`d for this reason. Documentation MUST surface the
`CARGO_TARGET_DIR=/tmp/...` workaround for contributors who want to
exercise the real subprocess path.
## 9. Stability
Through `0.9.x` the public API MAY shift. The `1.0` release pins the
API and the classification policy.