shuttle 0.9.0 - Docs.rs

Shuttle is a library for testing concurrent Rust code, heavily inspired by Loom.

Shuttle focuses on randomized testing, rather than the exhaustive testing that Loom offers. This is a trade-off between soundness and scalability: Shuttle is not sound (a passing Shuttle test does not prove the code is correct), but it scales to much larger test cases than Loom. Empirically, randomized testing is successful at finding most concurrency bugs, which tend not to be adversarial.

Testing concurrent code

Consider this simple piece of concurrent code:

use std::sync::{Arc, Mutex};
use std::thread;

let lock = Arc::new(Mutex::new(0u64));
let lock2 = lock.clone();

thread::spawn(move || {
    *lock.lock().unwrap() = 1;
});

assert_eq!(0, *lock2.lock().unwrap());

There is an obvious race condition here: if the spawned thread runs before the assertion, the assertion will fail. But writing a unit test that finds this execution is tricky. We could run the test many times and try to "get lucky" by finding a failing execution, but that's not a very reliable testing approach. Even if the test does fail, it will be difficult to debug: we won't be able to easily catch the failure in a debugger, and every time we make a change, we will need to run the test many times to decide whether we fixed the issue.
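
As a rough illustration of the "get lucky" approach, the sketch below repeats the racy body and counts how often the main thread still observes 0. It uses only std; the helper names main_thread_saw_zero and count_outcomes are hypothetical, and the split between passes and failures depends entirely on the OS scheduler, so no particular ratio should be expected.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs the racy body once and reports whether the main thread
/// acquired the lock before the spawned thread wrote to it.
/// (Hypothetical helper, for illustration only.)
fn main_thread_saw_zero() -> bool {
    let lock = Arc::new(Mutex::new(0u64));
    let lock2 = lock.clone();

    let handle = thread::spawn(move || {
        *lock.lock().unwrap() = 1;
    });

    // Read the value instead of asserting, so one run cannot abort the loop.
    let observed = *lock2.lock().unwrap();
    handle.join().unwrap();
    observed == 0
}

/// Repeats the racy body `runs` times and returns (passes, failures).
fn count_outcomes(runs: usize) -> (usize, usize) {
    let passes = (0..runs).filter(|_| main_thread_saw_zero()).count();
    (passes, runs - passes)
}
```

Calling count_outcomes(1_000) may well report few or no failures on a given machine, which is why repeated runs alone are a weak way to find this bug.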

Randomly testing concurrent code with Shuttle

Shuttle avoids this issue by controlling the scheduling of each thread in the program, and scheduling those threads randomly. By controlling the scheduling, Shuttle allows us to reproduce failing tests deterministically. By using random scheduling, with appropriate heuristics, Shuttle can still catch most (non-adversarial) concurrency bugs even though it is not an exhaustive checker.

A Shuttle version of the above test just wraps the test body in a call to Shuttle's [check_random] function, and replaces the concurrency-related imports from std with imports from shuttle:

use shuttle::sync::{Arc, Mutex};
use shuttle::thread;

shuttle::check_random(|| {
    let lock = Arc::new(Mutex::new(0u64));
    let lock2 = lock.clone();

    thread::spawn(move || {
        *lock.lock().unwrap() = 1;
    });

    assert_eq!(0, *lock2.lock().unwrap());
}, 100);

This test detects the assertion failure with extremely high probability (over 99.9999%).

Testing non-deterministic code

Shuttle supports testing code that uses data non-determinism (random number generation). For example, this test uses the rand crate to generate a random number:

use rand::{thread_rng, Rng};

let x = thread_rng().gen::<u64>();
assert_ne!(x % 10, 7);

Shuttle provides its own implementation of [rand] that is a drop-in replacement:

use shuttle::rand::{thread_rng, Rng};

shuttle::check_random(|| {
    let x = thread_rng().gen::<u64>();
    assert_ne!(x % 10, 7);
}, 100);

This test will run the body 100 times, and fail if any of those executions fails; the test therefore fails with probability 1-(9/10)^100, or 99.997%. We can increase the 100 parameter to run more executions and increase the probability of finding the failure. Note that Shuttle isn't doing anything special to increase the probability of this test failing other than running the body multiple times.
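
The arithmetic here generalizes: if each execution fails independently with probability p, then n executions hit at least one failure with probability 1-(1-p)^n. The sketch below works this out in plain Rust. The function names are hypothetical, and the independence of executions is an assumption of the sketch rather than something Shuttle guarantees for arbitrary tests.

```rust
/// Probability that at least one of `executions` independent runs fails,
/// given that a single run fails with probability `p_fail`.
fn failure_probability(p_fail: f64, executions: u32) -> f64 {
    1.0 - (1.0 - p_fail).powi(executions as i32)
}

/// Smallest number of independent runs needed so that the probability of
/// seeing at least one failure reaches `target`.
/// Expects 0 < p_fail < 1 and 0 < target < 1.
fn executions_needed(p_fail: f64, target: f64) -> u32 {
    // Solve 1 - (1 - p)^n >= target for n.
    ((1.0 - target).ln() / (1.0 - p_fail).ln()).ceil() as u32
}
```

With p_fail = 0.1, failure_probability(0.1, 100) comes out at roughly 0.99997, matching the 99.997% figure, and executions_needed(0.1, 0.999) gives 66.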

When this test fails, Shuttle provides output that can be used to deterministically reproduce the failure:

test panicked in task "task-0" with schedule: "910102ccdedf9592aba2afd70104"
pass that schedule string into `shuttle::replay` to reproduce the failure

We can use Shuttle's [replay] function to replay the execution that causes the failure:

use shuttle::rand::{thread_rng, Rng};

shuttle::replay(|| {
    let x = thread_rng().gen::<u64>();
    assert_ne!(x % 10, 7);
}, "910102ccdedf9592aba2afd70104");

This runs the test only once, and is guaranteed to reproduce the failure.
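
To see why controlling the source of randomness makes a failure reproducible, here is a std-only analogy. It is not Shuttle's implementation and assumes nothing about how Shuttle encodes its schedule string: a hypothetical Xorshift64 generator stands in for thread_rng, and an explicit seed stands in for the recorded schedule.

```rust
/// Hypothetical minimal PRNG (xorshift64); not part of Shuttle or rand.
struct Xorshift64(u64);

impl Xorshift64 {
    fn new(seed: u64) -> Self {
        // xorshift needs a non-zero state, so substitute a constant for 0.
        Xorshift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// The "test body": passes unless the drawn value ends in 7.
fn body_passes(seed: u64) -> bool {
    let x = Xorshift64::new(seed).next_u64();
    x % 10 != 7
}

/// Tries seeds 1..=max_seed and returns the first failing one, if any.
fn find_failing_seed(max_seed: u64) -> Option<u64> {
    (1..=max_seed).find(|&seed| !body_passes(seed))
}
```

Any seed returned by find_failing_seed can be passed back to body_passes to reproduce the same failure on every run, which is the property that replay provides for real Shuttle tests.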

Support for data non-determinism is most useful when combined with support for schedule non-determinism (i.e., concurrency). For example, an integration test might spawn several threads, and within each thread perform a random sequence of actions determined by thread_rng (this style of testing is often referred to as a "stress test"). By using Shuttle to implement the stress test, we can both increase the coverage of the test by exploring more thread interleavings and allow test failures to be deterministically reproducible for debugging.

Writing Shuttle tests

To test concurrent code with Shuttle, all uses of synchronization primitives from std must be replaced by their Shuttle equivalents. The simplest way to do this is via cfg flags: import all synchronization primitives from a single sync module in your code, and implement that module like this:

// Re-export so the rest of the crate can import from this one module.
#[cfg(all(feature = "shuttle", test))]
pub(crate) use shuttle::{sync::*, thread};
#[cfg(not(all(feature = "shuttle", test)))]
pub(crate) use std::{sync::*, thread};

Then a Shuttle test can be written like this:

#[cfg(feature = "shuttle")]
#[test]
fn concurrency_test_shuttle() {
    use my_crate::*;
    // ...
}

and be executed by running cargo test --features shuttle.
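
As a sketch of what code written against such a sync module might look like, the block below defines the module inline and a hypothetical concurrent_count function that only ever names primitives through it. The pub use re-exports and the function itself are assumptions made for illustration. When the shuttle feature is off, the std branch is selected and the code behaves like ordinary threaded Rust.

```rust
mod sync {
    // Shuttle primitives when testing with the feature enabled...
    #[cfg(all(feature = "shuttle", test))]
    pub use shuttle::{sync::*, thread};
    // ...and the standard library otherwise.
    #[cfg(not(all(feature = "shuttle", test)))]
    pub use std::{sync::*, thread};
}

/// Spawns `workers` threads that each increment a shared counter once,
/// then returns the final count. Every primitive is named through `sync`,
/// so the same code runs against either std or Shuttle.
pub fn concurrent_count(workers: usize) -> u64 {
    let counter = sync::Arc::new(sync::Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = sync::Arc::clone(&counter);
            sync::thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Bind the result first so the lock guard is released before returning.
    let total = *counter.lock().unwrap();
    total
}
```

A Shuttle test would then call concurrent_count from inside the closure passed to check_random, so that the spawned threads are scheduled by Shuttle rather than by the OS.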

Choosing a scheduler and running a test

Shuttle tests need to choose a scheduler to use to direct the execution. The scheduler determines the order in which threads are scheduled. Different scheduling policies can increase the probability of detecting certain classes of bugs (e.g., race conditions), but at the cost of needing to test more executions.

Shuttle has a number of built-in schedulers, which implement the Scheduler trait. They are most easily accessed via convenience methods:

[check_random] runs a test using a random scheduler for a chosen number of executions.
[check_pct] runs a test using the Probabilistic Concurrency Testing (PCT) algorithm. PCT bounds the number of preemptions a test explores; empirically, most concurrency bugs can be detected with very few preemptions, and so PCT increases the probability of finding such bugs. The PCT scheduler can be configured with a "bug depth" (the number of preemptions) and a number of executions.
[check_dfs] runs a test with an exhaustive scheduler using depth-first search. Exhaustive testing is intractable for all but the very simplest programs, and so using this scheduler is not recommended, but it can be useful to thoroughly test small concurrency primitives. The DFS scheduler can be configured with a bound on the depth of schedules to explore.
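
To get a feel for why exhaustive exploration becomes intractable, consider a deliberately simplified model: two threads that never block, with m and n atomic steps respectively, can be interleaved in C(m+n, n) ways. The helper below is hypothetical and only counts schedules in that model; real programs have blocking operations and more threads, so it is a back-of-the-envelope estimate rather than a description of what the DFS scheduler enumerates.

```rust
/// Number of distinct interleavings of two non-blocking threads with
/// `m` and `n` atomic steps, i.e. the binomial coefficient C(m + n, n).
fn interleavings(m: u64, n: u64) -> u128 {
    let k = m.min(n) as u128;
    let total = (m + n) as u128;
    // Multiplicative formula: after step i the accumulator holds
    // C(total - k + i, i), so every division is exact.
    (1..=k).fold(1u128, |acc, i| acc * (total - k + i) / i)
}
```

In this model, two threads of one step each have 2 interleavings, while two threads of ten steps each already have 184756, which suggests why randomized or preemption-bounded schedulers are the practical choice for larger tests.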

When these convenience methods do not provide enough control, Shuttle provides a [Runner] object for executing a test. A runner is constructed from a chosen [scheduler], and then invoked with the [Runner::run] method. Shuttle also provides a [PortfolioRunner] object for running multiple schedulers, using parallelism to increase the number of test executions explored.