A fail point implementation for Rust.

Fail points are code instrumentations that allow errors and other behavior to be injected dynamically at runtime, primarily for testing purposes. Fail points are flexible and can be configured to exhibit a variety of behavior, including panics, early returns, and sleeping. They can be controlled both programmatically and via the environment, and can be triggered conditionally and probabilistically.

This crate is inspired by FreeBSD’s failpoints.

Usage

First, add this to your Cargo.toml:

[dependencies]
failpoints = "0.1"

Now you can import the failpoint! macro from the failpoints crate and use it to inject dynamic failures.

As an example, here’s a simple program that uses a fail point to simulate an I/O panic:

use failpoints::{failpoint, FailScenario};

fn do_fallible_work() {
    failpoint!("read-dir");
    let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
    // ... do some work on the directory ...
}

let scenario = FailScenario::setup();
do_fallible_work();
scenario.teardown();
println!("done");

Here, the program calls unwrap on the result of read_dir, a function that returns a Result. In other words, this particular program expects this call to read_dir to always succeed. And in practice it almost always will, which makes the behavior of this program when read_dir fails difficult to test. By instrumenting the program with a fail point we can pretend that read_dir failed, causing the subsequent unwrap to panic, and allowing us to observe the program’s behavior under failure conditions.

When the program is run normally it just prints “done”:

$ cargo run --features fail/failpoints
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/failpointtest`
done

But now, by setting the FAILPOINTS environment variable, we can see what happens if read_dir fails:

$ FAILPOINTS=read-dir=panic cargo run --features fail/failpoints
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/failpointtest`
thread 'main' panicked at 'failpoint read-dir panic', /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/fail-0.2.0/src/lib.rs:286:25
note: Run with `RUST_BACKTRACE=1` for a backtrace.
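
To make the FAILPOINTS value less opaque, here is a minimal, std-only sketch of how a string such as read-dir=panic could be split into fail point names and actions. It is an illustration, not this crate’s parser: the parse_failpoints function is hypothetical, and the use of ; to separate multiple entries is an assumption that the single-entry example above does not confirm.

```rust
/// Hypothetical helper (not part of the crate): split a FAILPOINTS-style
/// string into (name, action) pairs.
/// Assumption: entries are separated by `;` and each has the form `name=action`.
fn parse_failpoints(spec: &str) -> Result<Vec<(String, String)>, String> {
    let mut pairs = Vec::new();
    for entry in spec.split(';') {
        let entry = entry.trim();
        if entry.is_empty() {
            // Tolerate empty input and trailing separators.
            continue;
        }
        // Split on the first `=` only, so the action itself may contain `=`.
        match entry.split_once('=') {
            Some((name, action)) if !name.is_empty() && !action.is_empty() => {
                pairs.push((name.to_string(), action.to_string()));
            }
            _ => return Err(format!("invalid fail point entry: {:?}", entry)),
        }
    }
    Ok(pairs)
}
```

Calling parse_failpoints("read-dir=panic") yields a single pair, while an entry without an = sign is rejected with an error instead of being silently ignored.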

Usage in tests

The previous example triggers a fail point by setting the FAILPOINTS environment variable. In practice, you’ll often want to trigger fail points programmatically, in unit tests. Fail points are global resources, and Rust tests run in parallel, so tests that exercise fail points generally need to hold a lock to avoid interfering with each other. This is accomplished by FailScenario.

Here’s a basic pattern for writing unit tests with fail points:

use failpoints::{failpoint, FailScenario};

fn do_fallible_work() {
    failpoint!("read-dir");
    let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
    // ... do some work on the directory ...
}

#[test]
#[should_panic]
fn test_fallible_work() {
    let scenario = FailScenario::setup();
    failpoints::cfg("read-dir", "panic").unwrap();

    do_fallible_work();

    scenario.teardown();
}
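
The locking described above can be pictured with a small std-only sketch. It is not how FailScenario is implemented. It only shows one way a guard object can serialize tests that share global fail point state. All names here (SketchScenario, sketch_cfg, sketch_action, and the two statics) are hypothetical.

```rust
use std::sync::{Mutex, MutexGuard, PoisonError};

// Hypothetical global state: one lock that serializes scenarios and one
// registry of (fail point name, action) pairs.
static SCENARIO_LOCK: Mutex<()> = Mutex::new(());
static REGISTRY: Mutex<Vec<(String, String)>> = Mutex::new(Vec::new());

/// While a `SketchScenario` is alive, no other scenario can be set up, so
/// tests cannot observe each other's fail point configuration.
struct SketchScenario {
    _guard: MutexGuard<'static, ()>,
}

impl SketchScenario {
    fn setup() -> SketchScenario {
        // A test that panicked while holding the lock poisons it; recover the
        // guard anyway so that later tests can still run.
        let guard = SCENARIO_LOCK.lock().unwrap_or_else(PoisonError::into_inner);
        clear_registry();
        SketchScenario { _guard: guard }
    }

    /// Consumes the scenario. Dropping `self` runs `Drop` below, which clears
    /// the registry, and then the lock guard is released.
    fn teardown(self) {}
}

impl Drop for SketchScenario {
    fn drop(&mut self) {
        // Also runs during unwinding, so a panicking test still cleans up.
        clear_registry();
    }
}

fn clear_registry() {
    REGISTRY.lock().unwrap_or_else(PoisonError::into_inner).clear();
}

/// Set (or replace) the action for a named fail point.
fn sketch_cfg(name: &str, action: &str) {
    let mut reg = REGISTRY.lock().unwrap_or_else(PoisonError::into_inner);
    reg.retain(|(n, _)| n.as_str() != name);
    reg.push((name.to_string(), action.to_string()));
}

/// Look up the action currently configured for a named fail point.
fn sketch_action(name: &str) -> Option<String> {
    let reg = REGISTRY.lock().unwrap_or_else(PoisonError::into_inner);
    reg.iter()
        .find(|(n, _)| n.as_str() == name)
        .map(|(_, a)| a.clone())
}
```

Because cleanup lives in Drop, a test marked #[should_panic] still leaves the registry empty for the next test, even though its explicit teardown call is never reached.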

Even if a test does not itself turn on any fail points, code that it runs could trigger a fail point that was configured by another thread. Because of this it is a best practice to put all fail point unit tests into their own binary. Here’s an example of a snippet from Cargo.toml that creates a fail-point-specific test binary:

[[test]]
name = "failpoints"
path = "tests/failpoints/mod.rs"
required-features = ["fail/failpoints"]

Early return

The previous examples illustrate injecting panics via fail points, but panics aren’t the only — or even the most common — error pattern in Rust. The more common type of error is propagated by Result return values, and fail points can inject those as well with “early returns”. That is, when configuring a fail point as “return” (as opposed to “panic”), the fail point will immediately return from the function, optionally with a configurable value.

The setup for early return requires a slightly different invocation of the failpoint! macro. To illustrate this, let’s modify the do_fallible_work function we used earlier to return a Result:

use failpoints::{failpoint, FailScenario};
use std::io;

fn do_fallible_work() -> io::Result<()> {
    failpoint!("read-dir");
    let _dir: Vec<_> = std::fs::read_dir(".")?.collect();
    // ... do some work on the directory ...
    Ok(())
}

fn main() -> io::Result<()> {
    let scenario = FailScenario::setup();
    do_fallible_work()?;
    scenario.teardown();
    println!("done");
    Ok(())
}

This version handles errors more idiomatically: it contains no unwraps and instead uses ? to propagate errors through Result return values, as most real Rust code does.

The “read-dir” fail point, though, is not yet defined in a way that supports early return, so if we attempt to configure it to “return”, we’ll see an error like this:

$ FAILPOINTS=read-dir=return cargo run --features fail/failpoints
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/failpointtest`
thread 'main' panicked at 'Return is not supported for the fail point "read-dir"', src/main.rs:7:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

This error tells us that the “read-dir” fail point is not defined correctly to support early return, and gives us the line number of that fail point. What we’re missing in the fail point definition is code describing how to return an error value, and the way we do this is by passing failpoint! a closure that returns the same type as the enclosing function.

Here’s a variation that does so:

fn do_fallible_work() -> io::Result<()> {
    failpoints::failpoint!("read-dir", |_| {
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "error"))
    });
    let _dir: Vec<_> = std::fs::read_dir(".")?.collect();
    // ... do some work on the directory ...
    Ok(())
}

And now if the “read-dir” fail point is configured to “return” we get a different result:

$ FAILPOINTS=read-dir=return cargo run --features fail/failpoints
   Compiling failpointtest v0.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 2.38s
     Running `target/debug/failpointtest`
Error: Custom { kind: PermissionDenied, error: StringError("error") }

This time, do_fallible_work returned the error defined in our closure, which propagated all the way up and out of main.
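
To see why the closure must return the same type as the enclosing function, it can help to look at a hand-written stand-in for the mechanism. The sketch below is std-only and is not the crate’s failpoint! macro: sketch_failpoint!, early_return_arg, and the hard-coded “injected” value are all hypothetical, and passing the closure an Option<String> is an assumption about how a configurable return value might be delivered.

```rust
use std::io;

/// Hypothetical lookup: returns `Some(arg)` when the named fail point is
/// configured to return early, where `arg` is the optional configured value.
/// A real implementation would consult global configuration; this one is
/// hard-coded for illustration.
fn early_return_arg(name: &str) -> Option<Option<String>> {
    match name {
        "read-dir" => Some(Some("injected".to_string())),
        _ => None,
    }
}

/// Hypothetical macro, not the crate's `failpoint!`.
macro_rules! sketch_failpoint {
    ($name:expr, $handler:expr) => {
        if let Some(arg) = early_return_arg($name) {
            let handler = $handler;
            // `return` leaves the *enclosing function*, which is why the
            // handler must produce that function's return type.
            return handler(arg);
        }
    };
}

fn read_dir_work() -> io::Result<usize> {
    sketch_failpoint!("read-dir", |msg: Option<String>| {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            msg.unwrap_or_default(),
        ))
    });
    // Not reached while the fail point above is active.
    Ok(std::fs::read_dir(".")?.count())
}

fn other_work() -> io::Result<u32> {
    // This fail point is not configured, so execution falls through.
    sketch_failpoint!("not-configured", |_: Option<String>| Ok(0));
    Ok(42)
}
```

Since the macro expands to a return statement inside the caller, a closure whose result type differs from the function’s return type fails to compile, which surfaces a badly defined fail point at build time rather than at run time.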

Advanced usage

That’s the basics of fail points: defining them with failpoint!, configuring them with FAILPOINTS and failpoints::cfg, and configuring them to panic and return early. But that’s not all they can do. To learn more see the documentation for cfg, cfg_callback and failpoint!.

Usage considerations

For most effective fail point usage, keep in mind the following:

  • Fail points are disabled by default and can be enabled via the failpoints feature. When failpoints are disabled, no code is generated by the macro.
  • Carefully consider complex, concurrent, non-deterministic combinations of fail points. Put test cases exercising fail points into their own test crate.
  • Fail points might have the same name, in which case they take the same actions. Be careful about duplicating fail point names, either within a single crate, or across multiple crates.
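
The first point above, that a disabled fail point generates no code, can be illustrated with a feature-gated macro. This is a std-only sketch of the general technique, not the crate’s implementation: the sketch-failpoints feature name, the gated_failpoint! macro, and the hit counter are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often an instrumented site was actually evaluated.
static HITS: AtomicUsize = AtomicUsize::new(0);

fn record_hit(_name: &str) {
    HITS.fetch_add(1, Ordering::SeqCst);
}

fn hits() -> usize {
    HITS.load(Ordering::SeqCst)
}

// With the (hypothetical) `sketch-failpoints` feature enabled, the macro
// expands to a real call.
#[cfg(feature = "sketch-failpoints")]
macro_rules! gated_failpoint {
    ($name:expr) => {
        record_hit($name)
    };
}

// With the feature disabled, the macro expands to the unit value: the call
// site does no work and the name expression is never evaluated.
#[cfg(not(feature = "sketch-failpoints"))]
macro_rules! gated_failpoint {
    ($name:expr) => {
        ()
    };
}

fn instrumented_sum(values: &[u32]) -> u32 {
    gated_failpoint!("sum-start");
    values.iter().sum()
}
```

When this is built without the feature, instrumented_sum behaves exactly like an uninstrumented function and the hit counter stays at zero. The compiler may warn that record_hit is unused, which is expected in that configuration.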

Macros

Define a fail point (requires failpoints feature).

Structs

Test scenario with configured fail points.

Functions

Configure the actions for a fail point at runtime.

Configure the actions for a fail point at runtime.

Returns whether code generation for failpoints is enabled.

Get all registered fail points.

Remove a fail point.