Crate shuttle

Shuttle is a library for testing concurrent Rust code, heavily inspired by Loom.

Shuttle focuses on randomized testing, rather than the exhaustive testing that Loom offers. This is a trade-off between soundness and scalability: Shuttle is not sound (a passing Shuttle test does not prove the code is correct), but it scales to much larger test cases than Loom. Empirically, randomized testing is successful at finding most concurrency bugs, which tend not to be adversarial.

Testing concurrent code

Consider this simple piece of concurrent code:

use std::sync::{Arc, Mutex};
use std::thread;

let lock = Arc::new(Mutex::new(0u64));
let lock2 = lock.clone();

thread::spawn(move || {
    *lock.lock().unwrap() = 1;
});

assert_eq!(0, *lock2.lock().unwrap());

There is an obvious race condition here: if the spawned thread runs before the assertion, the assertion will fail. But writing a unit test that finds this execution is tricky. We could run the test many times and try to “get lucky” by finding a failing execution, but that’s not a very reliable testing approach. Even if the test does fail, it will be difficult to debug: we won’t be able to easily catch the failure in a debugger, and every time we make a change, we will need to run the test many times to decide whether we fixed the issue.

Randomly testing concurrent code with Shuttle

Shuttle avoids this issue by controlling the scheduling of each thread in the program, and scheduling those threads randomly. By controlling the scheduling, Shuttle allows us to reproduce failing tests deterministically. By using random scheduling, with appropriate heuristics, Shuttle can still catch most (non-adversarial) concurrency bugs even though it is not an exhaustive checker.

A Shuttle version of the above test just wraps the test body in a call to Shuttle’s check_random function, and replaces the concurrency-related imports from std with imports from shuttle:

use shuttle::sync::{Arc, Mutex};
use shuttle::thread;

shuttle::check_random(|| {
    let lock = Arc::new(Mutex::new(0u64));
    let lock2 = lock.clone();

    thread::spawn(move || {
        *lock.lock().unwrap() = 1;
    });

    assert_eq!(0, *lock2.lock().unwrap());
}, 100);

This test detects the assertion failure with extremely high probability (over 99.9999%).
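
The link between controlled scheduling and reproducibility can be illustrated without Shuttle. The sketch below is a toy model, not Shuttle's implementation: two simulated tasks each perform a non-atomic increment (a read step followed by a write step) on a shared counter, and a seeded generator decides which task steps next. The names `SplitMix64`, `run_with_seed`, and `find_failing_seed` are invented for this illustration.

```rust
/// Small deterministic generator (SplitMix64), standing in for a scheduler's seeded RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Runs two simulated tasks that each read the counter and then write back
/// the value they read plus one. The seed alone decides the interleaving.
/// Returns the final counter and the schedule (the task id chosen at each step).
fn run_with_seed(seed: u64) -> (u64, Vec<usize>) {
    let mut rng = SplitMix64(seed);
    let mut counter = 0u64;
    let mut local = [0u64; 2]; // value each task last read
    let mut pc = [0u8; 2]; // 0 = about to read, 1 = about to write, 2 = finished
    let mut schedule = Vec::new();

    loop {
        let runnable: Vec<usize> = (0..2).filter(|&t| pc[t] < 2).collect();
        if runnable.is_empty() {
            break;
        }
        // The only source of non-determinism: which runnable task goes next.
        let t = runnable[(rng.next_u64() % runnable.len() as u64) as usize];
        schedule.push(t);
        if pc[t] == 0 {
            local[t] = counter;
        } else {
            counter = local[t] + 1;
        }
        pc[t] += 1;
    }
    (counter, schedule)
}

/// Searches seeds 0..max_seeds for one whose schedule loses an update.
fn find_failing_seed(max_seeds: u64) -> Option<u64> {
    (0..max_seeds).find(|&seed| run_with_seed(seed).0 != 2)
}
```

Calling `run_with_seed` twice with the same seed yields the same schedule and the same final counter, so any seed returned by `find_failing_seed` can be rerun to reproduce the lost update. This is the same idea, in miniature, as replaying a recorded schedule.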

Testing non-deterministic code

Shuttle supports testing code that uses data non-determinism (random number generation). For example, this test uses the rand crate to generate a random number:

use rand::{thread_rng, Rng};

let x = thread_rng().gen::<u64>();
assert_ne!(x % 10, 7);

Shuttle provides its own implementation of rand that is a drop-in replacement:

use shuttle::rand::{thread_rng, Rng};

shuttle::check_random(|| {
    let x = thread_rng().gen::<u64>();
    assert_ne!(x % 10, 7);
}, 100);

This test will run the body 100 times, and fail if any of those executions fails. Each execution fails independently with probability 1/10, so the test fails with probability 1-(9/10)^100, or about 99.997%. We can increase the 100 parameter to run more executions and increase the probability of finding the failure. Note that Shuttle isn’t doing anything special to increase the probability of this test failing other than running the body multiple times.
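
The arithmetic behind that figure generalizes to any per-execution failure rate, assuming the executions are independent. A small sketch follows; the function names `detection_probability` and `iterations_needed` are made up for illustration and are not part of Shuttle.

```rust
/// Probability that at least one of `iterations` independent executions fails,
/// given that a single execution fails with probability `p_fail`.
fn detection_probability(p_fail: f64, iterations: u32) -> f64 {
    1.0 - (1.0 - p_fail).powi(iterations as i32)
}

/// Smallest number of independent executions needed to reach the `target`
/// detection probability. Assumes 0 < p_fail < 1 and 0 < target < 1.
fn iterations_needed(p_fail: f64, target: f64) -> u32 {
    ((1.0 - target).ln() / (1.0 - p_fail).ln()).ceil() as u32
}
```

With `p_fail = 0.1`, 100 executions give roughly 0.99997, matching the figure above, and 44 executions are already enough to exceed 99%.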

When this test fails, Shuttle provides output that can be used to deterministically reproduce the failure:

test panicked in task "task-0" with schedule: "910102ccdedf9592aba2afd70104"
pass that schedule string into `shuttle::replay` to reproduce the failure

We can use Shuttle’s replay function to replay the execution that causes the failure:

use shuttle::rand::{thread_rng, Rng};

shuttle::replay(|| {
    let x = thread_rng().gen::<u64>();
    assert_ne!(x % 10, 7);
}, "910102ccdedf9592aba2afd70104");

This runs the test only once, and is guaranteed to reproduce the failure.

Support for data non-determinism is most useful when combined with support for schedule non-determinism (i.e., concurrency). For example, an integration test might spawn several threads, and within each thread perform a random sequence of actions determined by thread_rng (this style of testing is often referred to as a “stress test”). By using Shuttle to implement the stress test, we can both increase the coverage of the test by exploring more thread interleavings and allow test failures to be deterministically reproducible for debugging.
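
A stress test in this style might look like the sketch below. It is written against std so that it stands alone, with a hypothetical `XorShift` generator standing in for `thread_rng`; under Shuttle, the std imports would be swapped for their `shuttle` counterparts and the call wrapped in `check_random`. The `stress` function and the invariant it reports (net pushes equals final stack length) are chosen only for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for `thread_rng`: a small seeded xorshift generator.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Spawns `threads` workers. Each performs `ops` randomly chosen pushes or pops
/// on a shared stack and tracks its own net change in stack length.
/// Returns the summed net change and the final stack length.
fn stress(seed: u64, threads: u64, ops: usize) -> (i64, usize) {
    let stack = Arc::new(Mutex::new(Vec::<u64>::new()));
    let handles: Vec<_> = (0..threads)
        .map(|id| {
            let stack = Arc::clone(&stack);
            thread::spawn(move || {
                // Derive a distinct, nonzero per-thread state from the seed.
                let state = (seed ^ (id + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15)) | 1;
                let mut rng = XorShift(state);
                let mut net = 0i64;
                for _ in 0..ops {
                    let r = rng.next_u64();
                    let mut guard = stack.lock().unwrap();
                    if r % 2 == 0 {
                        guard.push(r);
                        net += 1;
                    } else if guard.pop().is_some() {
                        net -= 1;
                    }
                }
                net
            })
        })
        .collect();
    let net: i64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    let len = stack.lock().unwrap().len();
    (net, len)
}
```

A test would assert that the two returned values agree. Run under std, the interleavings explored depend on the operating system's scheduler; the point of porting such a test to Shuttle is that both the random actions and the interleaving become replayable.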

Writing Shuttle tests

To test concurrent code with Shuttle, all uses of synchronization primitives from std must be replaced by their Shuttle equivalents. The simplest way to do this is via cfg flags. Specifically, route every import of a synchronization primitive through a single sync module in your code, and implement that module like this:

#[cfg(all(feature = "shuttle", test))]
use shuttle::{sync::*, thread};
#[cfg(not(all(feature = "shuttle", test)))]
use std::{sync::*, thread};

Then a Shuttle test can be written like this:

#[cfg(feature = "shuttle")]
#[test]
fn concurrency_test_shuttle() {
    use my_crate::*;
    // ...
}

and be executed by running cargo test --features shuttle.
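
As a rough sketch of how the pieces fit together, the module below re-exports the primitives with `pub use` so that the rest of the crate can import them from one place. It assumes a Cargo feature named `shuttle` that enables Shuttle as an optional dependency; the function `parallel_increment` is a hypothetical piece of code under test. Without the feature, everything resolves to std.

```rust
/// The single place that decides where concurrency primitives come from.
pub mod sync {
    #[cfg(all(feature = "shuttle", test))]
    pub use shuttle::{sync::*, thread};
    #[cfg(not(all(feature = "shuttle", test)))]
    pub use std::{sync::*, thread};
}

// Code under test imports only from the facade, never from std directly.
use sync::{thread, Arc, Mutex};

/// Hypothetical code under test: each worker increments a shared counter once.
pub fn parallel_increment(workers: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

In a normal build, `parallel_increment(8)` runs on std threads. When the feature is enabled for tests, the same function would be called from inside the closure passed to `check_random`, since Shuttle's primitives are meant to run under one of its schedulers.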

Choosing a scheduler and running a test

Shuttle tests need to choose a scheduler to use to direct the execution. The scheduler determines the order in which threads are scheduled. Different scheduling policies can increase the probability of detecting certain classes of bugs (e.g., race conditions), but at the cost of needing to test more executions.

Shuttle has a number of built-in schedulers, which implement the Scheduler trait. They are most easily accessed via convenience methods:

  • check_random runs a test using a random scheduler for a chosen number of executions.
  • check_pct runs a test using the Probabilistic Concurrency Testing (PCT) algorithm. PCT bounds the number of preemptions a test explores; empirically, most concurrency bugs can be detected with very few preemptions, and so PCT increases the probability of finding such bugs. The PCT scheduler can be configured with a “bug depth” (the number of preemptions) and a number of executions.
  • check_dfs runs a test with an exhaustive scheduler using depth-first search. Exhaustive testing is intractable for all but the very simplest programs, and so using this scheduler is not recommended, but it can be useful to thoroughly test small concurrency primitives. The DFS scheduler can be configured with a bound on the depth of schedules to explore.
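
To get a feel for why exhaustive search stops scaling, one can count interleavings in a simplified model where each thread runs a fixed number of atomic steps and any step may be followed by a context switch. The count is the multinomial coefficient (n·k)! / (k!)^n for n threads of k steps each. The helper names below are invented for this sketch, and the model ignores blocking, so it is an upper-bound style estimate rather than a description of what the DFS scheduler actually enumerates.

```rust
/// Binomial coefficient C(n, k), assuming k <= n. Each intermediate value is
/// itself a binomial coefficient, so the integer division is always exact.
fn binomial(n: u128, k: u128) -> u128 {
    let k = k.min(n - k);
    let mut result = 1u128;
    for i in 0..k {
        result = result * (n - i) / (i + 1);
    }
    result
}

/// Number of distinct interleavings of `threads` threads that each execute
/// `steps` atomic steps. Intended for small inputs; large ones overflow u128.
fn interleavings(threads: u128, steps: u128) -> u128 {
    let mut total = 1u128;
    let mut placed = 0u128;
    for _ in 0..threads {
        // Choose which of the slots filled so far belong to the newest thread.
        placed += steps;
        total *= binomial(placed, steps);
    }
    total
}
```

In this model, two threads of two steps each have 6 interleavings, three threads of two steps have 90, and two threads of ten steps already have 184,756.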

When these convenience methods do not provide enough control, Shuttle provides a Runner object for executing a test. A runner is constructed from a chosen scheduler, and then invoked with the Runner::run method. Shuttle also provides a PortfolioRunner object for running multiple schedulers, using parallelism to increase the number of test executions explored.

Modules

asynch

Shuttle’s implementation of an async executor, roughly equivalent to futures::executor.

rand

Shuttle’s implementation of the rand crate, v0.7.

scheduler

Implementations of different scheduling strategies for concurrency testing.

sync

Shuttle’s implementation of std::sync.

thread

Shuttle’s implementation of std::thread.

Structs

Config

Configuration parameters for Shuttle.

PortfolioRunner

A PortfolioRunner is the same as a Runner, except that it can run multiple different schedulers (a “portfolio” of schedulers) in parallel. If any of the schedulers finds a failing execution of the test, the entire run fails.

Runner

A Runner is the entry-point for testing concurrent code.

Enums

FailurePersistence

Specifies how to persist schedules when a Shuttle test fails.

MaxSteps

Specifies an upper bound on the number of steps a single iteration of a Shuttle test can take, and how to react when the bound is reached.

Functions

check_dfs

Run the given function under a depth-first-search scheduler until all interleavings have been explored (but if the max_iterations bound is provided, stop after that many iterations).

check_pct

Run the given function under a PCT concurrency scheduler for some number of iterations at the given depth. Each iteration will run a (potentially) different randomized schedule.

check_random

Run the given function under a randomized concurrency scheduler for some number of iterations. Each iteration will run a (potentially) different randomized schedule.

replay

Run the given function according to a given encoded schedule, usually produced as the output of a failing Shuttle test case.

replay_from_file

Run the given function according to a schedule saved in the given file, usually produced as the output of a failing Shuttle test case.