Crate fenic

source ·
Expand description

Fenic is a tool for testing concurrent programs.

At a high level, it runs tests many times, permuting the possible concurrent executions of each test according to what constitutes valid executions under the C11 memory model. It then uses state reduction techniques to avoid combinatorial explosion of the number of possible executions.

Background

Testing concurrent programs is challenging; concurrent strands of execution can interleave in all sorts of ways, and each such interleaving might expose a concurrency bug in the program. Some bugs may be so rare that they only occur under a very small set of possible executions, and may not surface even if you run the code millions or billions of times.

Fenic provides a way to deterministically explore the various possible execution permutations without relying on random executions. This allows you to write tests that verify that your concurrent code is correct under all executions, not just “most of the time”.

Consider a simple example:

use std::sync::Arc;
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::SeqCst;
use std::thread;

#[test]
fn test_concurrent_logic() {
    let v1 = Arc::new(AtomicUsize::new(0));
    let v2 = v1.clone();

    thread::spawn(move || {
        v1.store(1, SeqCst);
    });

    assert_eq!(0, v2.load(SeqCst));
}

This program is incorrect: the main thread might yield between spawning the thread that stores to v1 and loading from v2, during which time the spawned thread may get to run and store 1 into v1. Most of the time, the main thread will get to the assertion before the spawned thread executes, so the assertion will succeed. But, every once in a while, the spawned thread manages to run just in time and the assertion will fail! This is obviously a contrived example, but in practice many concurrent programs exhibit similar behavior – they operate correctly under most executions, but some executions end up producing buggy behavior.

Historically, the strategy for testing concurrent code has been to run tests in loops and hope that an execution fails. Or to place the testing host under load while running the test suite in an attempt to produce less frequently exercised executions. However, this kind of testing is not reliable, and, in the event an iteration should fail, debugging the cause is exceedingly difficult.

The problem is compounded when other memory orderings than SeqCst are considered, where bugs may only occur on hardware with particular memory characteristics, and thus no amount of iteration will demonstrate the bug on different hardware!

Solution

Fenic fixes the problem by simulating the operating system’s scheduler and Rust’s memory model such that all possible valid behaviors are explored and tested. To see how this works out in practice, the above example can be rewritten to use fenic’s concurrency types as:

use fenic::sync::atomic::AtomicUsize;
use fenic::thread;

use std::sync::Arc;
use std::sync::atomic::Ordering::SeqCst;

#[test]
fn test_concurrent_logic() {
    fenic::model(|| {
        let v1 = Arc::new(AtomicUsize::new(0));
        let v2 = v1.clone();

        thread::spawn(move || {
            v1.store(1, SeqCst);
        });

        assert_eq!(0, v2.load(SeqCst));
    });
}

Fenic will run the closure provided to fenic::model many times over, and each time a different thread scheduling will be used. That is, one execution will have the spawned thread run after the load from v2, and another will have the spawned thread run before the store to v2. Thus, the test is guaranteed to fail.

Writing tests

Test cases using fenic must be fully deterministic. All sources of non-determism must be via fenic types so that fenic can expose different possible values on each execution of the test closure. Other sources of non-determinism like random number generation or system calls cannot be modeled directly by fenic, and must be mocked to be testable by fenic.

To model synchronization non-determinism, tests must use the fenic synchronization types, such as Atomic*, Mutex, RwLock, Condvar, as well as other concurrency primitives like thread::spawn, UnsafeCell, and lazy_static!. However, when not running fenic tests, the std should be used, since the fenic runtime won’t be active. This means that library code will need to use conditional compilation to decide which types to use.

It is recommended to use a fenic cfg flag to signal using the fenic types. You can do this by passing RUSTFLAGS="--cfg fenic" as part of the command when you want to run the fenic tests. Then modify your Cargo.toml to include fenic like this:

[target.'cfg(fenic)'.dependencies]
fenic = "0.7"

One common strategy to use the right types with and without fenic is to create a module in your crate named sync or any other name of your choosing. In this module, list out the types that need to be toggled between fenic and std:

#[cfg(fenic)]
pub(crate) use fenic::sync::atomic::AtomicUsize;

#[cfg(not(fenic))]
pub(crate) use std::sync::atomic::AtomicUsize;

Then, elsewhere in the library:

use crate::sync::AtomicUsize;

Handling Fenic API differences.

Most of fenic’s type are drop-in replacements for their counterpart in std, but sometimes there are minor API differences that you must work around. If your library must use Fenic APIs that differ from std types, then the library will be required to implement those APIs for std. For example, for UnsafeCell, in the library’s source, add the following:

#![cfg(not(fenic))]

#[derive(Debug)]
pub(crate) struct UnsafeCell<T>(std::cell::UnsafeCell<T>);

impl<T> UnsafeCell<T> {
    pub(crate) fn new(data: T) -> UnsafeCell<T> {
        UnsafeCell(std::cell::UnsafeCell::new(data))
    }

    pub(crate) fn with<R>(&self, f: impl FnOnce(*const T) -> R) -> R {
        f(self.0.get())
    }

    pub(crate) fn with_mut<R>(&self, f: impl FnOnce(*mut T) -> R) -> R {
        f(self.0.get())
    }
}

Yielding

Some concurrent algorithms assume a fair scheduler. For example, a spin lock assumes that, at some point, another thread will make enough progress for the lock to become available. This presents a challenge for fenic as its scheduler is, by design, not fair. It is specifically trying to emulate every possible execution, which may mean that another thread does not get to run for a very long time (see also Spinlocks Considered Harmful). In such cases, loops must include calls to fenic::thread::yield_now. This tells fenic that another thread needs to be scheduled in order for the current one to make progress.

Running Fenic Tests

Fenic tests must be run separately, with RUSTFLAGS="--cfg fenic" specified (assuming you went with the cfg approach suggested above). For example, if the library includes a test file: tests/fenic_my_struct.rs that includes tests with fenic::model, then run the following command:

RUSTFLAGS="--cfg fenic" cargo test --test fenic_my_struct --release

Note that you will generally want to run fenic tests with --release since fenic must execute each test closure a large number of times, at which point the speed win from optimized code makes a big difference.

Debugging Fenic Failures

Fenic’s deterministic execution allows the specific chain of events leading to a test failure can be isolated for debugging. When a fenic test fails, the first step is to isolate the exact execution path that resulted in the failure. To do this, Fenic is able to output the execution path to a file. Two environment variables are useful for this process:

  • LOOM_CHECKPOINT_FILE
  • LOOM_CHECKPOINT_INTERVAL

The first specifies the file to write to and read from. The second specifies how often to write to the file. If the execution fails on the 10,000,000th permutation, it is faster to write to a file every 10,000 iterations instead of every single one.

To isolate the exact failing path, first run the following command to generate the checkpoint for the failing scenario:

LOOM_CHECKPOINT_FILE=my_test.json [other env vars] \
    cargo test --test fenic_my_struct --release [failing test]

Then this to check that the next permutation indeed triggers the fault:

LOOM_CHECKPOINT_INTERVAL=1 LOOM_CHECKPOINT_FILE=my_test.json [other env vars] \
    cargo test --test fenic_my_struct --release [failing test]

The test should fail on the first permutation, effectively isolating the failure scenario.

The next step is to enable additional log output for just the failing permutation. Again, there are some environment variables for this:

  • LOOM_LOG
  • LOOM_LOCATION

The first environment variable, LOOM_LOG, outputs a marker on every thread switch. This helps with tracing the exact steps in a threaded environment that results in the test failure.

The second, LOOM_LOCATION, enables location tracking. This includes additional information in panic messages that helps identify which specific field resulted in the error.

Put together, the command becomes (yes, we know this is not great… but it works):

LOOM_LOG=trace \
    LOOM_LOCATION=1 \
    LOOM_CHECKPOINT_INTERVAL=1 \
    LOOM_CHECKPOINT_FILE=my_test.json \
    RUSTFLAGS="--cfg fenic" \
    [other env vars] \
    cargo test --test fenic_my_struct --release [failing test]

This should provide you with a trace of all the concurrency events leading up to the failure, which should allow you to identify how the bug is triggered.

Limitations and Caveats

Intrusive Implementation

Fenic works by intercepting all loads, stores, and other concurrency-sensitive operations (like spawning threads) that may trigger concurrency bugs in an applications. But this interception is not automatic – it requires that the code being tested specifically uses the fenic replacement types. Any code that does not use fenic’s replacement types is invisible to fenic, and thus won’t be subject to the fenic model’s permutation.

While it is relatively simple to utilize fenic’s types in a single crate through the root-level #[cfg(fenic)] mod sync approach suggested earlier, more complex use-cases may require the use of a library that itself uses concurrent constructs like locks and channels. In such cases, that library must also be augmented to support fenic to achieve complete execution coverage.

Note that fenic still works if some concurrent operations are hidden from it (for example, if you use std::sync::Arc instead of fenic::sync::Arc). It just means that fenic won’t be able to reason about the interaction between those operations and the other concurrent operations in your program, and thus certain executions that are possible in the real world won’t be modeled.

Large Models

By default, fenic runs an exhaustive check of your program’s possible concurrent executions where all possible interleavings are checked. Fenic’s state reduction algorithms (see “Implementation” below) significantly reduce the state space that must be explored, but complex models can still take significant time to complete.

To handle such large models in a more reasonable amount of time, you may need to not run an exhaustive check, and instead tell fenic to prune out interleavings that are unlikely to reveal additional bugs. You do this by providing fenic with a thread pre-emption bound. If you set such a bound, fenic will check all possible executions that include at most n thread pre-emptions (where one thread is forcibly stopped and another one runs in its place. In practice, setting the thread pre-emption bound to 2 or 3 is enough to catch most bugs while significantly reducing the number of possible executions.

To set the thread pre-emption bound, set the LOOM_MAX_PREEMPTIONS environment variable when running tests (or set Builder::preemption_bound). For example:

LOOM_MAX_PREEMPTIONS=3 RUSTFLAGS="--cfg fenic" cargo test --test fenic_my_struct --release

Relaxed Memory Ordering

The Relaxed memory ordering allows particularly strange executions. For example, in the following code snippet, it is completely legal for r1 == r2 == 42!

thread::spawn(move || {
  let r1 = y.load(Ordering::Relaxed); // A
  x.store(r1, Ordering::Relaxed);     // B
});
thread::spawn(move || {
  let r2 = x.load(Ordering::Relaxed); // C
  y.store(42, Ordering::Relaxed);     // D
});

Unfortunately, it is not possible for fenic to completely model all the interleavings that relaxed memory ordering allows. This is because the relaxed memory ordering allows memory operations to be re-ordered within a single thread – B can run before A – which fenic cannot emulate. The same restriction applies to certain reorderings that are possible across different atomic variables with other memory orderings, and means that there are certain concurrency bugs that fenic cannot catch.

Combinatorial Explosion with Many Threads

The number of possible execution interleavings grows exponentially with the number of threads, as each possible execution of each additional thread must be taken into account for each possible execution of the current threads. Fenic mitigates this to an extent by reducing the state space (see “Implementation” below) through equivalent execution elimination. For example, if two threads read from the same atomic variable, fenic does not attempt another execution given that the order in which two threads read from the same atomic cannot impact the execution.

However, even with equivalent execution elimination, the number of possible executions grows significantly with each new thread, to the point where checking becomes infeasible. Fenic therefore specifically limits the number of threads it will model (see MAX_THREADS), and tailors its implementation to that limit.

Implementation

Fenic is an implementation of techniques described in CDSChecker: Checking Concurrent Data Structures Written with C/C++ Atomics. Please see the paper for much more detail on equivalent execution elimination and the other techniques fenic uses to accurately model the C11 memory model.

Modules

  • Memory allocation APIs
  • Shareable mutable containers.
  • futurefutures
    Future related synchronization primitives.
  • Mocked versions of std::hint functions.
  • Mock implementation of the lazy_static crate.
  • Model concurrent programs.
  • Mock implementation of std::sync.
  • Mock implementation of std::thread.

Macros

Constants

  • Maximum number of threads that can be included in a model.

Functions

  • Tells fenic to explore possible concurrent executions starting at this point.
  • Run all concurrent permutations of the provided closure.
  • Tells fenic to stop exploring possible concurrent execution starting at this point.
  • Tells fenic to stop exploring possible concurrent executions starting at this point.