dev-flaky 0.9.0

Flaky-test detection for Rust. Repeated-run reliability tracking with per-test confidence scoring. Part of the dev-* verification suite.

What it does

dev-flaky runs your test suite many times and tracks each test's pass / fail history. Stable tests pass every iteration. Flaky tests fail sometimes for no apparent reason. Broken tests fail every time.
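To make the pass / fail history concrete, here is a minimal sketch of folding libtest's per-test result lines (`test <path> ... ok` and `test <path> ... FAILED`) into a running tally. The `Tally` type and `record_iteration` function are hypothetical names for this illustration, not part of dev-flaky's API, and the crate's own parsing may differ.

```rust
use std::collections::BTreeMap;

/// Pass/fail tally for one test across all iterations (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Tally {
    pub passes: u32,
    pub failures: u32,
}

/// Fold one iteration's libtest output into the running history.
/// Only lines of the form `test <path> ... ok` or `test <path> ... FAILED`
/// are counted; ignored tests and every other line are skipped.
pub fn record_iteration(history: &mut BTreeMap<String, Tally>, output: &str) {
    for line in output.lines() {
        let Some(rest) = line.trim().strip_prefix("test ") else { continue };
        let Some((name, outcome)) = rest.rsplit_once(" ... ") else { continue };
        match outcome.trim() {
            "ok" => history.entry(name.to_string()).or_default().passes += 1,
            "FAILED" => history.entry(name.to_string()).or_default().failures += 1,
            _ => {} // e.g. "ignored"
        }
    }
}
```

Calling `record_iteration` once per captured `cargo test` run leaves one `Tally` per test path. A test whose tally has both passes and failures is a flaky candidate.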

After the run, every test gets a reliability score in [0.0, 1.0] and a classification (Stable / Flaky / Broken). The results are emitted as a dev_report::Report.
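As an illustration of the score, the sketch below assumes reliability is simply the fraction of iterations that passed. That formula is an assumption made for this example; the crate's exact definition is not spelled out here. The `reliability` function is a hypothetical helper.

```rust
/// Fraction of iterations that passed, in [0.0, 1.0].
/// Assumption: score = passes / (passes + failures). A test with no
/// recorded runs scores 0.0, an arbitrary choice made for this sketch.
pub fn reliability(passes: u32, failures: u32) -> f64 {
    let total = u64::from(passes) + u64::from(failures);
    if total == 0 {
        return 0.0;
    }
    f64::from(passes) / total as f64
}
```

Under this assumption, a test that passes 17 of 20 iterations scores 0.85, which corresponds to a reliability_pct of 85.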

Why flaky tests matter

Flaky tests are corrosive. After enough false alarms, developers start ignoring CI failures, and real failures get missed. Detecting flakiness automatically lets you quarantine the worst offenders before they erode trust in the suite.

Quick start

[dependencies]
dev-flaky = "0.9"

use dev_flaky::FlakyRun;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Run the suite 20 times and tally per-test outcomes.
    let run = FlakyRun::new("my-crate", "0.1.0").iterations(20);
    let result = run.execute()?;
    println!("flaky: {}, broken: {}", result.flaky_count(), result.broken_count());

    // Convert the result into a report and print it as JSON.
    let report = result.into_report();
    println!("{}", report.to_json()?);
    Ok(())
}

Builder surface

| Method | What it does |
|---|---|
| iterations(n) | How many times to run cargo test. Clamped to ≥ 2. |
| in_dir(path) | Working dir for cargo test. Default: CWD. |
| workspace() | Pass --workspace. |
| features(list) | Pass --features <list>. |
| test_filter(substring) | Pass the libtest positional filter (cargo test <filter>). |
| allow(name) / allow_all(iter) | Suppress known-flaky tests by full test path. |
| reliability_threshold(pct) | Demote Stable to Flaky below this reliability percentage. |
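The sketch below shows how the options in this table could translate into a single `cargo test` argument list. `RunOptions` is a hypothetical stand-in written for this example; it is not dev-flaky's builder, and the crate may assemble its command differently. Only the flags named in the table are used.

```rust
/// Hypothetical mirror of some builder options; not dev-flaky's own type.
#[derive(Debug, Default, Clone)]
pub struct RunOptions {
    pub iterations: u32,
    pub workspace: bool,
    pub features: Option<String>,
    pub test_filter: Option<String>,
}

impl RunOptions {
    /// Iteration count with the documented lower bound of 2 applied.
    pub fn effective_iterations(&self) -> u32 {
        self.iterations.max(2)
    }

    /// Arguments for one `cargo test` invocation.
    pub fn cargo_args(&self) -> Vec<String> {
        let mut args = vec!["test".to_string()];
        if self.workspace {
            args.push("--workspace".to_string());
        }
        if let Some(features) = &self.features {
            args.push("--features".to_string());
            args.push(features.clone());
        }
        // The libtest filter is a positional argument after the flags.
        if let Some(filter) = &self.test_filter {
            args.push(filter.clone());
        }
        args
    }
}
```

With `workspace` set, features `net,tls`, and filter `integration`, this yields `test --workspace --features net,tls integration`, which would be run `effective_iterations()` times.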

Classification

| Pass count | Fail count | Classification | Verdict | Severity |
|---|---|---|---|---|
| > 0 | 0 | Stable | Pass | (none) |
| > 0 | > 0 | Flaky | Warn | Warning |
| 0 | > 0 | Broken | Fail | Error |

Each finding emits a CheckResult named flaky::<test>, tagged with flaky and the classification label (stable / flaky / broken), and carrying numeric evidence for reliability_pct, passes, and failures.
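The classification rule can be written out as a small, self-contained sketch. The enums below are stand-ins defined for this example; they reuse the names from the table but are not the dev-flaky or dev-report types. The handling of a test with zero recorded runs is an assumption, since the table does not cover that case.

```rust
/// Stand-in for the three classifications.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Stable,
    Flaky,
    Broken,
}

/// Stand-in for the verdict column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

/// Stand-in for the severity column; Stable carries no severity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Apply the pass/fail rule. Returns None for a test with no recorded
/// runs (an assumption of this sketch).
pub fn classify(passes: u32, failures: u32) -> Option<Classification> {
    match (passes, failures) {
        (0, 0) => None,
        (_, 0) => Some(Classification::Stable),
        (0, _) => Some(Classification::Broken),
        _ => Some(Classification::Flaky),
    }
}

impl Classification {
    pub fn verdict(self) -> Verdict {
        match self {
            Classification::Stable => Verdict::Pass,
            Classification::Flaky => Verdict::Warn,
            Classification::Broken => Verdict::Fail,
        }
    }

    pub fn severity(self) -> Option<Severity> {
        match self {
            Classification::Stable => None,
            Classification::Flaky => Some(Severity::Warning),
            Classification::Broken => Some(Severity::Error),
        }
    }

    /// Lowercase label, as used for the tag.
    pub fn label(self) -> &'static str {
        match self {
            Classification::Stable => "stable",
            Classification::Flaky => "flaky",
            Classification::Broken => "broken",
        }
    }
}
```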

Allow-list

use dev_flaky::FlakyRun;

let run = FlakyRun::new("my-crate", "0.1.0")
    .iterations(20)
    .allow("known_flaky::flaky_under_load")
    .allow_all(["integration::slow_test", "net::flaky_endpoint"]);

Allow-list entries match the full test path emitted by libtest.
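A minimal sketch of full-path matching, assuming an entry must equal the libtest path exactly (no substring or prefix matching). `AllowList` is a hypothetical type for this illustration. What dev-flaky does with a suppressed test in the final report is not shown here.

```rust
use std::collections::HashSet;

/// Hypothetical allow-list keyed by full libtest test path.
pub struct AllowList {
    entries: HashSet<String>,
}

impl AllowList {
    /// Build the list from any iterable of names.
    pub fn new<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        AllowList {
            entries: names.into_iter().map(Into::into).collect(),
        }
    }

    /// True only when `test_path` equals an entry exactly.
    pub fn is_allowed(&self, test_path: &str) -> bool {
        self.entries.contains(test_path)
    }
}
```

Under that assumption, an entry of `net::flaky_endpoint` covers that one test only; the bare function name `flaky_endpoint` would not match.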

Reliability threshold

use dev_flaky::FlakyRun;

let run = FlakyRun::new("my-crate", "0.1.0")
    .iterations(100)
    .reliability_threshold(99.0);

With threshold 99.0, a test classified Stable whose reliability falls below 99% is reported as Flaky instead. A test that passes 99 of 100 iterations already has one failure, so the classification rule marks it Flaky regardless of the threshold. The threshold is meant to surface tests that are nearly always stable but occasionally produce intermittent partial failures, which becomes possible once sub-iteration counters are added in a future release.
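The demotion step can be sketched as follows. This assumes the comparison is strict ("below" the threshold) and that only Stable is affected, as the builder table describes. The enum and the `apply_threshold` function are stand-ins for this example, not the crate's API.

```rust
/// Stand-in for the crate's classification enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Stable,
    Flaky,
    Broken,
}

/// Demote Stable to Flaky when reliability is strictly below the
/// threshold (both in percent). Flaky and Broken pass through unchanged.
pub fn apply_threshold(
    base: Classification,
    reliability_pct: f64,
    threshold_pct: f64,
) -> Classification {
    match base {
        Classification::Stable if reliability_pct < threshold_pct => Classification::Flaky,
        other => other,
    }
}
```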

Producer integration

FlakyProducer plugs the run into a multi-producer pipeline driven by dev-tools:

use dev_flaky::{FlakyProducer, FlakyRun};
use dev_report::Producer;

let producer = FlakyProducer::new(FlakyRun::new("my-crate", "0.1.0").iterations(20));
let report = producer.produce();
println!("{}", report.to_json().unwrap());

Subprocess failures map to a single failing CheckResult named flaky::scan with Severity::Critical.

Target-dir-lock note

Running FlakyRun::execute() from inside another cargo invocation that already holds the workspace target-dir lock will deadlock. This is a property of cargo, not of dev-flaky. When running examples or the #[ignore]d integration test that exercises the real subprocess pipeline:

CARGO_TARGET_DIR=/tmp/flaky-target cargo run --example basic
CARGO_TARGET_DIR=/tmp/flaky-target cargo test -- --ignored
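The same workaround can be applied when spawning cargo from Rust code. The sketch below uses only `std::process::Command`; the helper names `cargo_with_target_dir` and `target_dir_of` are hypothetical, and the command is built but not run.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `cargo` command that uses its own target
/// directory, so it does not wait on the outer invocation's lock.
pub fn cargo_with_target_dir(target_dir: &Path, args: &[&str]) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(args).env("CARGO_TARGET_DIR", target_dir);
    cmd
}

/// Read back the CARGO_TARGET_DIR override set on a command, if any.
pub fn target_dir_of(cmd: &Command) -> Option<&OsStr> {
    cmd.get_envs()
        .find(|(key, _)| *key == OsStr::new("CARGO_TARGET_DIR"))
        .and_then(|(_, value)| value)
}
```

Calling `.status()` on the returned command would start cargo with the separate target directory, at the cost of a second full build cache.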

Wire format

FlakyResult, TestReliability, and Classification are all serde-derived. JSON uses snake_case field names and lowercase enum variants:

{
  "name": "my-crate",
  "version": "0.1.0",
  "iterations": 20,
  "tests": [
    { "name": "integration::flaky_one", "passes": 17, "failures": 3 }
  ]
}

Examples

| File | What it shows |
|---|---|
| examples/basic.rs | Default 5-iteration run; graceful tool-missing handling. |
| examples/iterations_high.rs | 50 iterations + test filter for low-rate flakiness. |
| examples/threshold.rs | reliability_threshold + allow-list. |
| examples/producer.rs | FlakyProducer (gated by DEV_FLAKY_EXAMPLE_RUN). |

The dev-* suite

See dev-tools for the umbrella crate covering the full suite.

Status

v0.9.x is the pre-1.0 stabilization line. Feature-complete for repeated-run flakiness detection, classification, threshold, allow-list, and producer integration. 1.0 will pin the public API and the classification policy.

Minimum supported Rust version

1.85 — pinned in Cargo.toml via rust-version and verified by the MSRV job in CI.

License

Apache-2.0. See LICENSE.