What it does
dev-flaky runs your test suite many times and tracks each test's
pass / fail history. Stable tests pass every iteration. Flaky tests
fail sometimes for no apparent reason. Broken tests fail every time.
After the run, every test gets a reliability score in [0.0, 1.0]
and a classification (Stable / Flaky / Broken), emitted as a
dev-report::Report.
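The section does not spell out how the reliability score is computed. One natural reading is the fraction of iterations in which a test passed. The sketch below assumes that reading; `Outcome` and `reliability` are hypothetical names for illustration, not part of the dev-flaky API.

```rust
/// Outcome of one test in one iteration of the suite.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    Fail,
}

/// Fraction of iterations in which the test passed, in [0.0, 1.0].
/// Returns `None` for an empty history, where no score is defined.
/// Assumption: the score is passes / iterations; the real crate may differ.
pub fn reliability(history: &[Outcome]) -> Option<f64> {
    if history.is_empty() {
        return None;
    }
    let passes = history.iter().filter(|o| **o == Outcome::Pass).count();
    Some(passes as f64 / history.len() as f64)
}
```

Under this assumption a test that passes 3 of 4 iterations scores 0.75, a test that never passes scores 0.0, and a test that always passes scores 1.0.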
Why flaky tests matter
Flaky tests are corrosive. After enough false alarms, developers start ignoring CI failures, and real failures get missed. Detecting flakiness automatically lets you quarantine the worst offenders before they erode trust in the suite.
Quick start
[dependencies]
dev-flaky = "0.9"
use dev_flaky::FlakyRun;
let run = FlakyRun::new(/* … */).iterations(/* … */);
let result = run.execute()?;
println!(/* … */);
let report = result.into_report();
println!(/* … */);
# Ok::<(), Box<dyn std::error::Error>>(())
Builder surface
| Method | What it does |
|---|---|
| iterations(n) | How many times to run cargo test. Clamped to ≥ 2. |
| in_dir(path) | Working dir for cargo test. Default: CWD. |
| workspace() | Pass --workspace. |
| features(list) | Pass --features <list>. |
| test_filter(substring) | Pass the libtest positional filter (cargo test <filter>). |
| allow(name) / allow_all(iter) | Suppress known-flaky tests by full test path. |
| reliability_threshold(pct) | Demote Stable to Flaky below this reliability percentage. |
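To make the table concrete, here is a rough sketch of how such options could translate into a `cargo test` command line. `RunOptions` and its methods are hypothetical stand-ins written for this illustration, not the real `FlakyRun` builder, and the argument order is an assumption.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical stand-in for the builder's settings; not the real `FlakyRun`.
#[derive(Debug, Default, Clone)]
pub struct RunOptions {
    pub iterations: u32,
    pub dir: Option<PathBuf>,
    pub workspace: bool,
    pub features: Option<String>,
    pub test_filter: Option<String>,
}

impl RunOptions {
    /// Iteration count after clamping to the documented minimum of 2.
    pub fn effective_iterations(&self) -> u32 {
        self.iterations.max(2)
    }

    /// Build (but do not spawn) the `cargo test` command for one iteration.
    pub fn cargo_command(&self) -> Command {
        let mut cmd = Command::new("cargo");
        cmd.arg("test");
        if self.workspace {
            cmd.arg("--workspace");
        }
        if let Some(features) = &self.features {
            cmd.arg("--features").arg(features);
        }
        if let Some(filter) = &self.test_filter {
            // libtest's positional filter: `cargo test <filter>`.
            cmd.arg(filter);
        }
        if let Some(dir) = &self.dir {
            // Only override the working directory when one was given;
            // otherwise the child inherits the current directory.
            cmd.current_dir(dir);
        }
        cmd
    }
}
```

Building the `Command` without spawning it keeps the mapping easy to inspect through `get_args()` before any subprocess runs.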
Classification
| Pass count | Fail count | Classification | Verdict | Severity |
|---|---|---|---|---|
| > 0 | 0 | Stable | Pass | (none) |
| > 0 | > 0 | Flaky | Warn | Warning |
| 0 | > 0 | Broken | Fail | Error |
Each finding emits a CheckResult named flaky::<test> tagged
flaky + the classification label (stable / flaky / broken),
with numeric evidence for reliability_pct, passes, and failures.
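The classification rule can be sketched as a small pure function. The types below are local illustrations that mirror the names used in this section; they are assumptions, not the actual dev-flaky or dev-report definitions. The table does not cover a test with zero passes and zero failures, so the sketch returns `None` for that case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Stable,
    Flaky,
    Broken,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Classify a test from its pass and fail counts, following the table.
pub fn classify(passes: u32, failures: u32) -> Option<Classification> {
    match (passes, failures) {
        (0, 0) => None, // never ran; not covered by the table
        (_, 0) => Some(Classification::Stable),
        (0, _) => Some(Classification::Broken),
        _ => Some(Classification::Flaky),
    }
}

impl Classification {
    /// Lowercase label used as a tag.
    pub fn label(self) -> &'static str {
        match self {
            Classification::Stable => "stable",
            Classification::Flaky => "flaky",
            Classification::Broken => "broken",
        }
    }

    pub fn verdict(self) -> Verdict {
        match self {
            Classification::Stable => Verdict::Pass,
            Classification::Flaky => Verdict::Warn,
            Classification::Broken => Verdict::Fail,
        }
    }

    /// Stable findings carry no severity.
    pub fn severity(self) -> Option<Severity> {
        match self {
            Classification::Stable => None,
            Classification::Flaky => Some(Severity::Warning),
            Classification::Broken => Some(Severity::Error),
        }
    }
}

/// Check name in the `flaky::<test>` form described above.
pub fn check_name(test: &str) -> String {
    format!("flaky::{}", test)
}
```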
Allow-list
use dev_flaky::FlakyRun;
let run = FlakyRun::new(/* … */)
    .iterations(/* … */)
    .allow(/* full test path */)
    .allow_all(/* iterator of full test paths */);
Matches the full test path emitted by libtest.
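As an illustration of matching on the full path, the sketch below keeps the allowed paths in a set and compares them exactly. `AllowList` is a hypothetical type for this example, and exact (rather than prefix or substring) matching is an assumption about the behavior.

```rust
use std::collections::HashSet;

/// Hypothetical allow-list keyed by full libtest path, e.g. `net::tests::retries`.
#[derive(Debug, Default, Clone)]
pub struct AllowList {
    paths: HashSet<String>,
}

impl AllowList {
    pub fn new() -> Self {
        Self::default()
    }

    /// Allow a single test by its full path.
    pub fn allow(mut self, name: &str) -> Self {
        self.paths.insert(name.to_string());
        self
    }

    /// Allow every test path yielded by the iterator.
    pub fn allow_all<I, S>(mut self, names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.paths.extend(names.into_iter().map(Into::into));
        self
    }

    /// True only when `full_path` equals an allowed entry exactly.
    pub fn is_allowed(&self, full_path: &str) -> bool {
        self.paths.contains(full_path)
    }
}
```

With exact matching, allowing `net::tests::retries` does not suppress a bare `retries` or a sibling such as `net::tests::retries_twice`.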
Reliability threshold
use dev_flaky::FlakyRun;
let run = FlakyRun::new(/* … */)
    .iterations(/* … */)
    .reliability_threshold(99.0);
With a threshold of 99.0, a test that would otherwise be classified
Stable is demoted to Flaky when its reliability falls below 99%. Under
the pass / fail rule in the Classification table, a Stable test has
zero failures, so the threshold is mainly useful for surfacing tests
that are nearly always stable but occasionally produce intermittent
partial failures (once sub-iteration counters arrive in a future
release).
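One way to read the demotion rule is as a post-processing step on the classification. The sketch below assumes a strict "below" comparison and a percentage in the range 0 to 100; the function and enum are hypothetical and are not taken from the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Stable,
    Flaky,
    Broken,
}

/// Demote `Stable` to `Flaky` when the reliability percentage is strictly
/// below the threshold. Other classifications are left unchanged.
/// Assumption: "below" means `<`, so a test exactly at the threshold stays Stable.
pub fn apply_threshold(
    class: Classification,
    reliability_pct: f64,
    threshold_pct: Option<f64>,
) -> Classification {
    match (class, threshold_pct) {
        (Classification::Stable, Some(t)) if reliability_pct < t => Classification::Flaky,
        _ => class,
    }
}
```

Keeping the threshold optional means a run without `reliability_threshold` behaves exactly as the classification table describes.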
Producer integration
FlakyProducer plugs the run into a multi-producer pipeline driven
by dev-tools:
use dev_flaky::{FlakyProducer, FlakyRun};
use dev_report::Producer;
let producer = FlakyProducer::new(/* … */);
let report = producer.produce();
println!(/* … */);
Subprocess failures map to a single failing CheckResult named
flaky::scan with Severity::Critical.
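The failure mapping can be pictured as follows. This is a sketch with local, hypothetical types (`Check`, `Verdict`, `Severity`, `to_checks`); the real `CheckResult` lives in dev-report and its fields are not shown in this README. The scan input is modeled as a `Result` whose error stands for a subprocess failure.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
    Critical,
}

/// Hypothetical, simplified stand-in for a check result.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Check {
    pub name: String,
    pub verdict: Verdict,
    pub severity: Option<Severity>,
}

/// Turn a scan into checks. Each `Ok` entry is (full test path, passes, failures).
/// An `Err` (for example, cargo could not be spawned) collapses into one
/// failing `flaky::scan` check with critical severity.
pub fn to_checks(scan: Result<Vec<(String, u32, u32)>, String>) -> Vec<Check> {
    match scan {
        Ok(tests) => tests
            .into_iter()
            .map(|(test, passes, failures)| {
                // Simplification: a test with no recorded failures counts as passing.
                let (verdict, severity) = match (passes, failures) {
                    (_, 0) => (Verdict::Pass, None),
                    (0, _) => (Verdict::Fail, Some(Severity::Error)),
                    _ => (Verdict::Warn, Some(Severity::Warning)),
                };
                Check {
                    name: format!("flaky::{}", test),
                    verdict,
                    severity,
                }
            })
            .collect(),
        Err(_) => vec![Check {
            name: "flaky::scan".to_string(),
            verdict: Verdict::Fail,
            severity: Some(Severity::Critical),
        }],
    }
}
```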
Target-dir-lock note
Running FlakyRun::execute() from inside another cargo invocation
that already holds the workspace target-dir lock will deadlock. This
is a property of cargo, not of dev-flaky. When running examples or
the #[ignore]d integration test that exercises the real subprocess
pipeline, set a separate target directory for that invocation:
CARGO_TARGET_DIR=/tmp/flaky-target
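The same idea can be applied when spawning cargo yourself from Rust: give the child its own target directory through the `CARGO_TARGET_DIR` environment variable. The helper below is a hypothetical sketch using only `std::process::Command`; it is not a dev-flaky function, and whether dev-flaky exposes an equivalent option is not stated here.

```rust
use std::process::Command;

/// Build (but do not spawn) a `cargo test` command that uses its own
/// target directory, so it does not wait on a lock held by a parent cargo.
pub fn cargo_test_with_target_dir(target_dir: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("test").env("CARGO_TARGET_DIR", target_dir);
    cmd
}
```

Calling `cargo_test_with_target_dir("/tmp/flaky-target").status()` would then run the tests against that directory, at the cost of a separate build cache.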
Wire format
FlakyResult, TestReliability, and Classification are all
serde-derived. JSON uses snake_case field names and lowercase
enum variants.
Examples
| File | What it shows |
|---|---|
| examples/basic.rs | Default 5-iteration run; graceful tool-missing handling. |
| examples/iterations_high.rs | 50 iterations + test filter for low-rate flakiness. |
| examples/threshold.rs | reliability_threshold + allow-list. |
| examples/producer.rs | FlakyProducer (gated by DEV_FLAKY_EXAMPLE_RUN). |
The dev-* collection
dev-flaky ships independently and is also re-exported by the
dev-tools umbrella crate as
the flaky feature. Sister crates cover the other verification
dimensions:
- dev-report: report schema everything emits
- dev-fixtures: deterministic test fixtures
- dev-bench: performance and regression detection
- dev-async: async runtime verification
- dev-stress: stress and soak workloads
- dev-chaos: fault injection and recovery testing
- dev-coverage: code coverage with regression gates
- dev-security: CVE / license / banned-crate audit
- dev-deps: unused / outdated dep detection
- dev-ci: GitHub Actions workflow generator
- dev-fuzz: fuzz testing workflow
- dev-mutate: mutation testing
Status
v0.9.x is the pre-1.0 stabilization line. Feature-complete for
repeated-run flakiness detection, classification, threshold, allow-
list, and producer integration. 1.0 will pin the public API and
the classification policy.
Minimum supported Rust version
1.85 — pinned in Cargo.toml via rust-version and verified by
the MSRV job in CI.
License
Apache-2.0. See LICENSE.