fuzzcheck 0.3.0

A coverage-guided, structure-aware fuzzing engine for Rust functions
Documentation
# fuzzcheck

Fuzzcheck is a structure-aware, in-process, coverage-guided, evolutionary 
fuzzing engine for Rust functions. 

Its main aim is to be used as the input generator of property-based tests.

Given a function `test: (T) -> bool`, it tries to find a value of type `T` that
fails the test or leads to a crash.

Fuzzcheck works by maintaining a pool of test inputs and ranking them using
the uniqueness of the code coverage caused by running `test(input)`. 
From that pool, it selects a high-ranking input, mutates it, and runs the test
function again. If the new mutated input has an interesting code coverage then
it is added to the pool, otherwise, fuzzcheck tries again with a different 
input and mutation.

In pseudocode:

```rust
loop {
    let input = pool.select();
    mutate(&mut input);

    let analysis = analyze(test, &input);

    match analysis {
        Failed => reportFailure(input),
        Interesting(score) => pool.add(input, score),
        NotInteresting => continue
    }
}
```

Unlike other coverage-guided fuzzing engines, it doesn't work with bitstrings 
but instead works with values of any type `T` directly. The complexity of the 
inputs and the way to mutate them is given by functions defined by the user.

## Usage

The first step is to install the `cargo-fuzzcheck` executable using cargo nightly. 

```bash
cargo +nightly install cargo-fuzzcheck
```

Then, somewhere else, create a new cargo crate. It will contain the
library code that you want to fuzz-test. Also do not forget to set the rust
version to nightly.

```bash
cargo new --lib my_library
cd my_library
rustup override set nightly
```

Then, run `cargo fuzzcheck init` to initialize a `fuzz` folder that will
contain all future fuzz tests.

```
cargo fuzzcheck init
```

A sample test function was created at `fuzz/instrumented/src/lib.rs`.
The generated file contains an example of a custom data type being
fuzzed. The derive proc_macro `HasDefaultMutator` automatically generates a
mutator called `SampleDataMutator`. Calling 
`SampleData<A, B, C>::default_mutator()` returns an instance of that 
mutator if `A`, `Vec<B>`, and `C` all implement `HasDefaultMutator`. For now,
only mutators for structs whose fields do not use private types can be 
automatically generated.

```rust
#[derive(Clone, HasDefaultMutator, ...)]
pub struct SampleData<A, B, C> {
    a: A,
    b: Vec<B>,
    c: C
}
```

The file also contains a sample fuzz test that can be given to fuzzcheck
to find an input of type `Vec<SampleData<u8, Option<u8>, u8>>` that fails the
the test.

```rust
pub fn test(input: &[SampleData<u8, Option<u8>, u8>]) -> bool {
    if 
        input.len() > 5 &&
        input[0].a == 0 &&
        input[0].b == vec![Some(2), None, Some(187)] &&
        input[0].c == 9 &&
        input[1].a == 189 &&
        input[1].b.len() > 5 &&
        input[1].b[0] == Some(89) &&
        input[1].b[1] == None &&
        input[1].b[2] == Some(213) &&
        // etc.
    {
        false
    }
    else {
        true
    }
}
```

And an executable script was created at 
`fuzz/non_instrumented/fuzz_targets/target1.rs`. It launches the fuzzing engine
on the above test function using the default mutator.

```rust
/* Various import statements not included in this example */

fn main() {
    // the default mutator generated by the derive proc_macro
    let mutator = Vec::<SampleData<u8, Option<u8>, u8>>::default_mutator();
    // serde is used to serialize interesting inputs to the file system
    let serializer = SerdeSerializer::default();
    // launch fuzzcheck on the test function
    let _ = fuzzcheck::launch(test, mutator, serializer);
}
```

You can already try launching this test: 

```
cargo fuzzcheck run target1 fuzz
```

This starts a loop that will stop when a failing test has been found.

A line will be printed whenever a newsworthy event happened, along with some
statistics. For example:

```
NEW	2220741	score: 29.45	pool: 30	exec/s: 236072	cplx: 128.94
```

* `NEW` means that a new input was added to the pool of interesting inputs
* `2220741` is the number of iterations that were performed so far
* `score: 29.45` is a measure of the total code coverage caused by all inputs
in the pool
* `pool: 30` is the number of inputs in the pool
* `exec/s: 236072` is the average number of iterations performed every second
* `cplx: 128.94` is the average complexity of the inputs in the pool

When a failing test has been found, the following is printed:
```
================ TEST FAILED ================
8162617	score: 55.15	pool: 52	exec/s: 142532	cplx: 151.88	
Saving at "fuzz/artifacts/target1/2a71aa786f3e552b.json"
```

Here, the path to the artifact file is 
`fuzz/artifacts/target1/2a71aa786f3e552b.json`. 
It contains the JSON-encoded input that failed the test.

```json
[
  {
    "a": 0,
    "b": [2, null, 187],
    "c": 9
  },
  {
    "a": 189,
    "b": [89, null, 213, 189, null, 32],
    "c": 19
  },
  ...
]
```

Moreover, the fuzzer can maintain a copy of its input pool in the file system,
which is located by default at `fuzz/corpora/<target>`. Fuzzing corpora 
are useful to kick-start a fuzzing process by providing a list of known interesting inputs.
If you try to run the fuzzer again, you will see that it finds the problematic input much 
quicker. This is because it first read the values written inside the fuzz corpus and used 
them as starting points.

## Structure of the fuzz folder

The fuzz folder is a bit difficult to understand, because fuzzcheck needs to 
compile the crate and the fuzz test in two different ways. This is why it 
contains an `instrumented` and a `non-instrumented` folder. 

The `instrumented` folder contains all the test functions and their helper 
functions. It can use your library as a dependency but not `fuzzcheck` 
or `non_instrumented`. Every piece of code written there will be instrumented
such that its code coverage can be recorded.

The `non-instrumented` folder contains the code that launches the fuzzer 
(called the `fuzz_targets`). It uses your library, `fuzzcheck`, and `instrumented` as 
dependencies. The code there is not instrumented.

```
.
├── Cargo.toml
├── fuzz                     # everything inside `fuzz` is to be used by fuzzcheck
│  ├── fuzzcheck.toml        # fuzzcheck configuration file
│  ├── instrumented          # a crate that contains the test functions
│  │  ├── Cargo.toml
│  │  └── src
│  │     └── lib.rs
│  └── non_instrumented      # a crate to compile fuzzcheck and launch the fuzz tests 
│     ├── build.rs
│     ├── Cargo.toml
│     └── fuzz_targets
│        ├── target1.rs      # a fuzz-test
│        └── target2.rs      # another fuzz-test
└── src
   └── lib.rs                # your library code
```

Note that if `instrumented` and `non_instrumented` both depend on a common 
crate `A`, then that crate will be compiled twice and the two versions of it
will live in the resulting binary. These two versions will have different,
incompatible versions of the types and traits defined by `A`.

## Minifying failing test inputs

Fuzzcheck can also be used to *minify* a large input that fails a test.

Let's say you have a file `crash.json` containing an input that you would like
to minify:

```json
[
  {
    "a": 0,
    "b": [2, null, 187],
    "c": 9
  },
  {
    "a": 189,
    "b": [89, null, 213, 189, null, 32],
    "c": 19
  },
  ...
]
```

Launch `cargo-fuzzcheck run` on your target with the `tmin` command and an
`--input-file` option.

```bash
cargo fuzzcheck run target1 tmin --input-file "artifacts/crash.json"
```

This will repeatedly launch the fuzzer in “minify” mode and save the
artifacts in the folder `artifacts/crash.minified`. The name of each artifact 
will be prefixed with the complexity of its input. For example,
`crash.minified/800--fe958d4f003bd4f5.json` has a complexity of `8.00`.

You can stop the minifying fuzzer at any point and look for the least complex
input in the `crash.minified` folder.

## Creating a Mutator

If you would like to fuzz-test your own custom type without the `HasDefaultMutator` derive attribute, you will have to create
a `Mutator` for it. You can do so by creating a type that conforms to
the `Mutator` trait.

```rust
pub trait Mutator: Sized {
    type Value: Clone;
    type Cache: Clone;
    type MutationStep: Clone;
    type UnmutateToken;

    /// Compute the cache for the given value
    fn cache_from_value(&self, value: &Self::Value) -> Self::Cache;
    /// Compute the initial mutation step for the given value
    fn mutation_step_from_value(&self, value: &Self::Value) -> Self::MutationStep;

    /// The maximum complexity of an input of this type
    fn max_complexity(&self) -> f64;
    /// The minimum complexity of an input of this type
    fn min_complexity(&self) -> f64;
    /// The complexity of the current input
    fn complexity(&self, value: &Self::Value, cache: &Self::Cache) -> f64;

    /// Create an arbitrary value
    fn arbitrary(&mut self, seed: usize, max_cplx: f64) -> (Self::Value, Self::Cache);

    /// Mutate the given value in-place and return a token describing how to reverse the mutation
    fn mutate(&mut self, value: &mut Self::Value, cache: &mut Self::Cache, step: &mut Self::MutationStep, max_cplx: f64) -> Self::UnmutateToken;

    /// Reverse a mutation
    fn unmutate(&self, value: &mut Self::Value, cache: &mut Self::Cache, t: Self::UnmutateToken);
}

```

This trait can be a bit difficult to implement, but it is very powerful and it
is possible to write efficient and composable mutators with it. For 
example, fuzzcheck implements `U8Mutator` (u8), `OptionMutator` (Option), and
`VecMutator` (Vec). They compose such that it possible to use a 
`VecMutator<VecMutator<OptionMutator<U8Mutator>>>` to fuzz values of type 
`Vec<Vec<Option<u8>>>`.

Furthermore, the `fuzzcheck_mutators_derive` crate offers a proc_macro that
can generate a mutator for user-defined structs automatically. In the future,
enumerations and unions will also be supported.

I would like to write a guide to fuzzcheck to explain the trait and how to work
with it. But in the meantime, if you have questions, please send me an email or
create an issue on GitHub. You can also look at the documentation of the trait
that explains some of the design decisions behind it.

My goal is to write more mutators for common types and building blocks for 
composability such that a custom implementation of `Mutator` is rarely 
needed.

## Previous work on fuzzing engines

As far as I know, evolutionary, coverage-guided fuzzing engines were
popularized by [American Fuzzy Lop (AFL)](http://lcamtuf.coredump.cx/afl/).  
Fuzzcheck is also evolutionary and coverage-guided.

Later on, LLVM released its own fuzzing engine, 
[libFuzzer](https://www.llvm.org/docs/LibFuzzer.html), which is based on the
same ideas as AFL, but it uses Clang’s 
[SanitizerCoverage](https://clang.llvm.org/docs/SanitizerCoverage.html) and is
in-process (it lives in the same process as the program being fuzz-tested.  
Fuzzcheck is also in-process and also uses SanitizerCoverage.

Both AFL and libFuzzer work by manipulating bitstrings (e.g. `1011101011`).
However, many programs work on structured data, and mutations at the
bitstring level may not map to meaningful mutations at the level of the
structured data. This problem can be partially addresses by using a compact
binary encoding such as protobuf and providing custom mutation functions to
libFuzzer that work on the structured data itself. This is a way to perform
“structure-aware fuzzing” ([talk](https://www.youtube.com/watch?v=U60hC16HEDY),
[tutorial](https://github.com/google/fuzzer-test-suite/blob/master/tutorial/structure-aware-fuzzing.md)).

An alternative way to deal with structured data is to use generators just like
QuickCheck’s `Arbitrary` trait. And then to “treat the raw byte buffer input 
provided by the coverage-guided fuzzer as a sequence of random values and
implement a “random” number generator around it.” 
([cited blog post by @fitzgen](https://fitzgeraldnick.com/2019/09/04/combining-coverage-guided-and-generation-based-fuzzing.html)). 
The tool `cargo-fuzz` has
[recently](https://fitzgeraldnick.com/2020/01/16/better-support-for-fuzzing-structured-inputs-in-rust.html) 
implemented that approach.

Fuzzcheck is also structure-aware, but unlike previous attempts at
structure-aware fuzzing, it doesn't use an intermediary binary encoding such as
protobuf nor does it use Quickcheck-like generators.
Instead, it directly mutates the typed values in-process.
This is better many ways. First, it is faster because there is no
need to encode and decode inputs at each iteration. Second, the complexity of
the input is given by a user-defined function, which will be more accurate than
counting the bytes of the protobuf encoding.
Finally, and most importantly, the mutations are faster and more meaningful 
than those done on protobuf or `Arbitrary`’s byte buffer-based RNG.
A detail that I particularly like about fuzzcheck, and that is possible only 
because it mutates typed values, is that every mutation is done **in-place**
and is reversable. That means that generating a new test case is super fast, 
and can often even be done with zero allocations.

As I was developing Fuzzcheck for Swift, a few researchers developed Fuzzchick
for Coq ([paper](https://www.cs.umd.edu/~mwh/papers/fuzzchick-draft.pdf)). It 
is a coverage-guided property-based testing tool implemented as an extension to
Quickchick. As far as I know, it is the only other tool with the same philosophy
as fuzzcheck. The similarity between the names `fuzzcheck` and `Fuzzchick` is a 
coincidence.