Expand description
Fenic is a tool for testing concurrent programs.
At a high level, it runs tests many times, permuting the possible concurrent executions of each test according to what constitutes valid executions under the C11 memory model. It then uses state reduction techniques to avoid combinatorial explosion of the number of possible executions.
Background
Testing concurrent programs is challenging; concurrent strands of execution can interleave in all sorts of ways, and each such interleaving might expose a concurrency bug in the program. Some bugs may be so rare that they only occur under a very small set of possible executions, and may not surface even if you run the code millions or billions of times.
Fenic provides a way to deterministically explore the various possible execution permutations without relying on random executions. This allows you to write tests that verify that your concurrent code is correct under all executions, not just “most of the time”.
Consider a simple example:
use std::sync::Arc;
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::SeqCst;
use std::thread;
#[test]
fn test_concurrent_logic() {
let v1 = Arc::new(AtomicUsize::new(0));
let v2 = v1.clone();
thread::spawn(move || {
v1.store(1, SeqCst);
});
assert_eq!(0, v2.load(SeqCst));
}
This program is incorrect: the main thread might yield between spawning the thread that stores
to v1
and loading from v2
, during which time the spawned thread may get to run and store 1
into v1
. Most of the time, the main thread will get to the assertion before the spawned
thread executes, so the assertion will succeed. But, every once in a while, the spawned thread
manages to run just in time and the assertion will fail! This is obviously a contrived example,
but in practice many concurrent programs exhibit similar behavior – they operate correctly
under most executions, but some executions end up producing buggy behavior.
Historically, the strategy for testing concurrent code has been to run tests in loops and hope that an execution fails. Or to place the testing host under load while running the test suite in an attempt to produce less frequently exercised executions. However, this kind of testing is not reliable, and, in the event an iteration should fail, debugging the cause is exceedingly difficult.
The problem is compounded when other memory orderings than SeqCst
are considered, where bugs
may only occur on hardware with particular memory characteristics, and thus no amount of
iteration will demonstrate the bug on different hardware!
Solution
Fenic fixes the problem by simulating the operating system’s scheduler and Rust’s memory model such that all possible valid behaviors are explored and tested. To see how this works out in practice, the above example can be rewritten to use fenic’s concurrency types as:
use fenic::sync::atomic::AtomicUsize;
use fenic::thread;
use std::sync::Arc;
use std::sync::atomic::Ordering::SeqCst;
#[test]
fn test_concurrent_logic() {
fenic::model(|| {
let v1 = Arc::new(AtomicUsize::new(0));
let v2 = v1.clone();
thread::spawn(move || {
v1.store(1, SeqCst);
});
assert_eq!(0, v2.load(SeqCst));
});
}
Fenic will run the closure provided to fenic::model
many times over, and each time a different
thread scheduling will be used. That is, one execution will have the spawned thread run after
the load from v2
, and another will have the spawned thread run before the store to v2
.
Thus, the test is guaranteed to fail.
Writing tests
Test cases using fenic must be fully deterministic. All sources of non-determism must be via fenic types so that fenic can expose different possible values on each execution of the test closure. Other sources of non-determinism like random number generation or system calls cannot be modeled directly by fenic, and must be mocked to be testable by fenic.
To model synchronization non-determinism, tests must use the fenic synchronization types, such
as Atomic*
, Mutex
, RwLock
,
Condvar
, as well as other concurrency primitives like thread::spawn
,
UnsafeCell
, and lazy_static!
. However, when not running fenic
tests, the std
should be used, since the fenic runtime won’t be active. This means that
library code will need to use conditional compilation to decide which types to use.
It is recommended to use a fenic
cfg flag to signal using the fenic types. You can do this by
passing RUSTFLAGS="--cfg fenic"
as part of the command when you want to run the fenic tests.
Then modify your Cargo.toml
to include fenic like this:
[target.'cfg(fenic)'.dependencies]
fenic = "0.7"
One common strategy to use the right types with and without fenic is to create a module in your
crate named sync
or any other name of your choosing. In this module, list out the types that
need to be toggled between fenic and std
:
#[cfg(fenic)]
pub(crate) use fenic::sync::atomic::AtomicUsize;
#[cfg(not(fenic))]
pub(crate) use std::sync::atomic::AtomicUsize;
Then, elsewhere in the library:
use crate::sync::AtomicUsize;
Handling Fenic API differences.
Most of fenic’s type are drop-in replacements for their counterpart in std
, but sometimes
there are minor API differences that you must work around. If your library must use Fenic APIs
that differ from std
types, then the library will be required to implement those APIs for
std
. For example, for UnsafeCell
, in the library’s source, add the following:
#![cfg(not(fenic))]
#[derive(Debug)]
pub(crate) struct UnsafeCell<T>(std::cell::UnsafeCell<T>);
impl<T> UnsafeCell<T> {
pub(crate) fn new(data: T) -> UnsafeCell<T> {
UnsafeCell(std::cell::UnsafeCell::new(data))
}
pub(crate) fn with<R>(&self, f: impl FnOnce(*const T) -> R) -> R {
f(self.0.get())
}
pub(crate) fn with_mut<R>(&self, f: impl FnOnce(*mut T) -> R) -> R {
f(self.0.get())
}
}
Yielding
Some concurrent algorithms assume a fair scheduler. For example, a spin lock assumes that, at
some point, another thread will make enough progress for the lock to become available. This
presents a challenge for fenic as its scheduler is, by design, not fair. It is specifically
trying to emulate every possible execution, which may mean that another thread does not get
to run for a very long time (see also Spinlocks Considered Harmful). In such cases, loops
must include calls to fenic::thread::yield_now
. This tells fenic that
another thread needs to be scheduled in order for the current one to make progress.
Running Fenic Tests
Fenic tests must be run separately, with RUSTFLAGS="--cfg fenic"
specified (assuming you went
with the cfg
approach suggested above). For example, if the library includes a test file:
tests/fenic_my_struct.rs
that includes tests with fenic::model
, then run the
following command:
RUSTFLAGS="--cfg fenic" cargo test --test fenic_my_struct --release
Note that you will generally want to run fenic tests with --release
since fenic must execute
each test closure a large number of times, at which point the speed win from optimized code
makes a big difference.
Debugging Fenic Failures
Fenic’s deterministic execution allows the specific chain of events leading to a test failure can be isolated for debugging. When a fenic test fails, the first step is to isolate the exact execution path that resulted in the failure. To do this, Fenic is able to output the execution path to a file. Two environment variables are useful for this process:
LOOM_CHECKPOINT_FILE
LOOM_CHECKPOINT_INTERVAL
The first specifies the file to write to and read from. The second specifies how often to write to the file. If the execution fails on the 10,000,000th permutation, it is faster to write to a file every 10,000 iterations instead of every single one.
To isolate the exact failing path, first run the following command to generate the checkpoint for the failing scenario:
LOOM_CHECKPOINT_FILE=my_test.json [other env vars] \
cargo test --test fenic_my_struct --release [failing test]
Then this to check that the next permutation indeed triggers the fault:
LOOM_CHECKPOINT_INTERVAL=1 LOOM_CHECKPOINT_FILE=my_test.json [other env vars] \
cargo test --test fenic_my_struct --release [failing test]
The test should fail on the first permutation, effectively isolating the failure scenario.
The next step is to enable additional log output for just the failing permutation. Again, there are some environment variables for this:
LOOM_LOG
LOOM_LOCATION
The first environment variable, LOOM_LOG
, outputs a marker on every thread switch. This helps
with tracing the exact steps in a threaded environment that results in the test failure.
The second, LOOM_LOCATION
, enables location tracking. This includes additional information in
panic messages that helps identify which specific field resulted in the error.
Put together, the command becomes (yes, we know this is not great… but it works):
LOOM_LOG=trace \
LOOM_LOCATION=1 \
LOOM_CHECKPOINT_INTERVAL=1 \
LOOM_CHECKPOINT_FILE=my_test.json \
RUSTFLAGS="--cfg fenic" \
[other env vars] \
cargo test --test fenic_my_struct --release [failing test]
This should provide you with a trace of all the concurrency events leading up to the failure, which should allow you to identify how the bug is triggered.
Limitations and Caveats
Intrusive Implementation
Fenic works by intercepting all loads, stores, and other concurrency-sensitive operations (like spawning threads) that may trigger concurrency bugs in an applications. But this interception is not automatic – it requires that the code being tested specifically uses the fenic replacement types. Any code that does not use fenic’s replacement types is invisible to fenic, and thus won’t be subject to the fenic model’s permutation.
While it is relatively simple to utilize fenic’s types in a single crate through the root-level
#[cfg(fenic)] mod sync
approach suggested earlier, more complex use-cases may require the use
of a library that itself uses concurrent constructs like locks and channels. In such cases,
that library must also be augmented to support fenic to achieve complete execution coverage.
Note that fenic still works if some concurrent operations are hidden from it (for example, if
you use std::sync::Arc
instead of fenic::sync::Arc
). It just means that fenic won’t be able
to reason about the interaction between those operations and the other concurrent operations in
your program, and thus certain executions that are possible in the real world won’t be modeled.
Large Models
By default, fenic runs an exhaustive check of your program’s possible concurrent executions where all possible interleavings are checked. Fenic’s state reduction algorithms (see “Implementation” below) significantly reduce the state space that must be explored, but complex models can still take significant time to complete.
To handle such large models in a more reasonable amount of time, you may need to not run
an exhaustive check, and instead tell fenic to prune out interleavings that are unlikely to
reveal additional bugs. You do this by providing fenic with a thread pre-emption bound. If you
set such a bound, fenic will check all possible executions that include at most n
thread
pre-emptions (where one thread is forcibly stopped and another one runs in its place. In
practice, setting the thread pre-emption bound to 2 or 3 is enough to catch most bugs while
significantly reducing the number of possible executions.
To set the thread pre-emption bound, set the LOOM_MAX_PREEMPTIONS
environment
variable when running tests (or set
Builder::preemption_bound
). For example:
LOOM_MAX_PREEMPTIONS=3 RUSTFLAGS="--cfg fenic" cargo test --test fenic_my_struct --release
Relaxed Memory Ordering
The Relaxed
memory ordering allows particularly
strange executions. For example, in the following code snippet, it is completely
legal for r1 == r2 == 42
!
thread::spawn(move || {
let r1 = y.load(Ordering::Relaxed); // A
x.store(r1, Ordering::Relaxed); // B
});
thread::spawn(move || {
let r2 = x.load(Ordering::Relaxed); // C
y.store(42, Ordering::Relaxed); // D
});
Unfortunately, it is not possible for fenic to completely model all the interleavings that relaxed memory ordering allows. This is because the relaxed memory ordering allows memory operations to be re-ordered within a single thread – B can run before A – which fenic cannot emulate. The same restriction applies to certain reorderings that are possible across different atomic variables with other memory orderings, and means that there are certain concurrency bugs that fenic cannot catch.
Combinatorial Explosion with Many Threads
The number of possible execution interleavings grows exponentially with the number of threads, as each possible execution of each additional thread must be taken into account for each possible execution of the current threads. Fenic mitigates this to an extent by reducing the state space (see “Implementation” below) through equivalent execution elimination. For example, if two threads read from the same atomic variable, fenic does not attempt another execution given that the order in which two threads read from the same atomic cannot impact the execution.
However, even with equivalent execution elimination, the number of possible executions grows
significantly with each new thread, to the point where checking becomes infeasible. Fenic
therefore specifically limits the number of threads it will model (see MAX_THREADS
), and
tailors its implementation to that limit.
Implementation
Fenic is an implementation of techniques described in CDSChecker: Checking Concurrent Data Structures Written with C/C++ Atomics. Please see the paper for much more detail on equivalent execution elimination and the other techniques fenic uses to accurately model the C11 memory model.
Modules
- Memory allocation APIs
- Shareable mutable containers.
- future
futures
Future related synchronization primitives. - Mocked versions of
std::hint
functions. - Mock implementation of the
lazy_static
crate. - Model concurrent programs.
- Mock implementation of
std::sync
. - Mock implementation of
std::thread
.
Macros
- Mock version of
lazy_static::lazy_static!
. - Mock version of
std::thread_local!
.
Constants
- Maximum number of threads that can be included in a model.
Functions
- Tells fenic to explore possible concurrent executions starting at this point.
- Run all concurrent permutations of the provided closure.
- Tells fenic to stop exploring possible concurrent execution starting at this point.
- Tells fenic to stop exploring possible concurrent executions starting at this point.