/*!
A lightweight micro-benchmarking library which:
* uses linear regression to screen off sources of constant error;
* handles benchmarks which must mutate some state;
* `cloc`s in at under 100 lines of code with no dependencies;
* has an eponymously simple API!
Easybench is optimised for benchmarks with a running time of more than a nanosecond but less than a
millisecond. It's inspired by [criterion], but doesn't do the sophisticated analysis (no HTML
output).
[criterion]: https://hackage.haskell.org/package/criterion
```
use easybench::{bench,bench_env};
# fn fib(_: usize) -> usize { 0 }
#
// Simple benchmarks are performed with `bench`.
println!("fib 200: {}", bench(|| fib(200) ));
println!("fib 500: {}", bench(|| fib(500) ));
// If a function needs to mutate some state, use `bench_env`.
println!("reverse: {}", bench_env(vec![0;100], |xs| xs.reverse() ));
println!("sort: {}", bench_env(vec![0;100], |xs| xs.sort() ));
```
Running the above yields the following results:
```none
fib 200: 38 ns (R²=1.000, 26053497 iterations in 154 samples)
fib 500: 110 ns (R²=1.000, 9131584 iterations in 143 samples)
reverse: 54 ns (R²=0.999, 5669992 iterations in 138 samples)
sort: 93 ns (R²=1.000, 4685942 iterations in 136 samples)
```
Easy! However, please read the [caveats](#caveats) below before using.
# Benchmarking algorithm
An *iteration* is a single execution of your code. A *sample* is a measurement, during which your
code may be run many times. In other words: taking a sample means performing some number of
iterations and measuring the total time.
The first sample we take performs only 1 iteration, but as we continue taking samples we increase
the number of iterations exponentially. We stop when a time limit is reached (currently 1 second).
Next, we perform an OLS linear regression on the resulting data. The gradient of the regression line
tells us how long it takes to perform a single iteration of the benchmark; any constant per-sample
overhead (such as reading the clock) lands in the line's intercept, where it can't contaminate the
result. The R² value is a measure of how much noise there is in the data.
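For a concrete sense of the schedule, here's roughly how the per-sample iteration counts grow with
the scale factor used in the source (1.1); the numbers are illustrative and rounded:
```none
sample no:    0    1    2   ...    50   ...     100
iterations:   1    1    1   ...  ~117   ...  ~13,781    (1.1^sample_no, rounded)
```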
# Caveats
## Caveat 1: Harness overhead
**TL;DR: compile with `--release`, and expect worst-case overhead of a few nanoseconds**
How much overhead does easybench itself introduce? As explained above, we use linear regression to
eliminate any constant error involved in taking a sample. However, this technique doesn't prevent
linear error from showing up: if there's some work which easybench does every iteration, it will be
included in the results.
If your benchmark takes more than a nanosecond to run, and you compile with `--release`, this work
will typically be negligible:
* In the case of `bench` it amounts to incrementing the loop counter, plus the [cost of
`black_box`](#bonus-caveat-black-box).
* In the case of `bench_env`, we also do a lookup into a big vector in order to get the environment
for that iteration. If you're really unlucky, this could amount to a systematic cache miss, which
would affect the results by a few nanoseconds. This is an unlikely worst-case, but it's best to
be aware of the possibility.
If you're concerned at all about overhead, please take a look at [the inner loop of
`bench_env`][source]. The code is not scary.
[source]: src/easybench/lib.rs.html#197-208
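For orientation, the timed part of each sample looks roughly like this (an illustrative sketch, not
the exact source; `envs` and the variable names are placeholders):
```ignore
let start = Instant::now();
for env in envs.iter_mut() {   // one pre-cloned environment per iteration
    black_box(f(env));         // run the closure; black_box defeats dead-code elimination
}
let elapsed = start.elapsed(); // total time for this sample's iterations
```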
## Caveat 2: Sufficient data
**TL;DR: measurements of code which takes more than a millisecond to run are not reliable**
Each benchmark collects data for 1 second. This means that in order to collect a statistically
significant amount of data, your code should run much faster than this. In fact, if we don't manage
to take at least 3 samples, easybench will actually panic!
When inspecting the results, make sure things look statistically significant. In particular:
* Make sure the number of samples is big enough. More than 100 is probably OK.
* Make sure the R² isn't suspiciously low. It's easy to get a high R² value when the number of
samples is small, so unfortunately the definition of "suspiciously low" depends on how many
samples were taken. As a rule of thumb, expect values greater than 0.99.
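For example, reading the `sort` line from the output above:
```none
sort: 93 ns (R²=1.000, 4685942 iterations in 136 samples)
             ^^^^^^^^ not suspiciously low   ^^^ comfortably over 100
```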
## Caveat 3: Pure functions
**TL;DR: return enough information to prevent the optimiser from eliminating code from your
benchmark**
Benchmarking pure functions involves a nasty gotcha which users should be aware of. Consider the
following benchmarks:
```
# use easybench::{bench,bench_env};
#
# fn fib(_: usize) -> usize { 0 }
#
let fib_1 = bench(|| fib(500) ); // fine
let fib_2 = bench(|| { fib(500); } ); // spoiler: NOT fine
let fib_3 = bench_env(0, |x| { *x = fib(500); } ); // also fine, but ugly
# let _ = (fib_1, fib_2, fib_3);
```
The results are a little surprising:
```none
fib_1: 110 ns (R²=1.000, 9131585 iterations in 144 samples)
fib_2: 0 ns (R²=1.000, 413289203 iterations in 184 samples)
fib_3: 109 ns (R²=1.000, 9131585 iterations in 144 samples)
```
Oh, `fib_2`, why do you lie? The answer is: `fib(500)` is pure, and its return value is immediately
thrown away, so the optimiser replaces the call with a no-op (which clocks in at 0 ns).
What about the other two? `fib_1` looks very similar, with one exception: the closure which we're
benchmarking returns the result of the `fib(500)` call. When it runs your code, easybench takes the
return value and tricks the optimiser into thinking it's going to use it for something, before
throwing it away. This is why `fib_1` is safe from having code accidentally eliminated.
In the case of `fib_3`, we actually *do* use the return value: each iteration we take the result of
`fib(500)` and store it in the iteration's environment. This has the desired effect, but looks a
bit weird.
## Bonus caveat: Black box
The function which easybench uses to trick the optimiser (`black_box`) is stolen from [bencher],
which [states]:
[bencher]: https://docs.rs/bencher/
[states]: https://docs.rs/bencher/0.1.2/bencher/fn.black_box.html
> NOTE: We don't have a proper black box in stable Rust. This is a workaround implementation, that
> may have a too big performance overhead, depending on operation, or it may fail to properly avoid
> having code optimized out. It is good enough that it is used by default.
However, it works well enough in my experience.
*/
use std::fmt::{self, Display, Formatter};
use std::time::{Duration, Instant};
// Each time we take a sample we increase the number of iterations:
// iters = ITER_SCALE_FACTOR ^ sample_no
const ITER_SCALE_FACTOR: f64 = 1.1;
// We try to spend this many seconds (roughly) in total on each benchmark.
const BENCH_TIME_LIMIT_SECS: u64 = 1;
/// Statistics for a benchmark run.
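// The body below is a reconstruction: the field names are inferred from the Display output shown
// in the module docs, and may differ from the crate's exact API.
pub struct Stats {
    /// The gradient of the regression line: nanoseconds per iteration.
    pub ns_per_iter: f64,
    /// The R² of the regression: how well the data fits a straight line.
    pub goodness_of_fit: f64,
    /// The total number of iterations performed, across all samples.
    pub iterations: usize,
    /// The number of samples taken.
    pub samples: usize,
}

impl Display for Stats {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Mirrors the format shown in the module docs, e.g.
        // "38 ns (R²=1.000, 26053497 iterations in 154 samples)"
        write!(
            f,
            "{:.0} ns (R²={:.3}, {} iterations in {} samples)",
            self.ns_per_iter, self.goodness_of_fit, self.iterations, self.samples
        )
    }
}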
/// Run a benchmark.
///
/// The return value of `f` is not used, but we trick the optimiser into thinking we're going to
/// use it. Make sure to return enough information to prevent the optimiser from eliminating code
/// from your benchmark! (See the module docs for more.)
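// A sketch of `bench`, assuming the reconstructed `Stats` above: it delegates to `bench_env` with
// a unit environment, so there's no extra per-iteration work beyond `bench_env`'s own.
pub fn bench<F, O>(f: F) -> Stats
where
    F: Fn() -> O,
{
    bench_env((), |_| f())
}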
/// Run a benchmark with an environment.
///
/// The value `env` is a clonable prototype for the "benchmark environment". Each iteration
/// receives a freshly-cloned mutable copy of this environment. The time taken to clone the
/// environment is not included in the results.
///
/// Nb: it's very possible that we will end up allocating many (>10,000) copies of `env` at the
/// same time. Probably best to keep it small.
///
/// See `bench` and the module docs for more.
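// A sketch of `bench_env`, reconstructed from the module docs; variable names and details are
// assumptions, not the crate's exact source.
pub fn bench_env<F, I, O>(env: I, f: F) -> Stats
where
    F: Fn(&mut I) -> O,
    I: Clone,
{
    let mut data: Vec<(usize, Duration)> = Vec::new(); // (iterations, elapsed) per sample
    let bench_start = Instant::now();
    let mut sample_no: i32 = 0;
    while bench_start.elapsed() < Duration::from_secs(BENCH_TIME_LIMIT_SECS) {
        // iters = ITER_SCALE_FACTOR ^ sample_no, so the first sample performs 1 iteration.
        let iters = ITER_SCALE_FACTOR.powi(sample_no).round() as usize;
        // Clone the environments up front, outside the timed region.
        let mut xss: Vec<I> = (0..iters).map(|_| env.clone()).collect();
        let start = Instant::now();
        for xs in xss.iter_mut() {
            // Pretend to use the return value so the call can't be optimised away.
            black_box(f(xs));
        }
        data.push((iters, start.elapsed()));
        sample_no += 1;
    }
    // The module docs promise a panic if we took fewer than 3 samples.
    assert!(data.len() >= 3, "benchmark too slow: took fewer than 3 samples");
    let (ns_per_iter, goodness_of_fit) = regression(&data);
    Stats {
        ns_per_iter,
        goodness_of_fit,
        iterations: data.iter().map(|&(n, _)| n).sum(),
        samples: data.len(),
    }
}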
/// Compute the OLS linear regression line for the given data set, returning the line's gradient
/// and R². Requires at least 2 samples.
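// A sketch using the standard OLS formulas: gradient = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)², and
// R² = 1 − SS_res / SS_tot. Here x is iterations per sample and y is elapsed nanoseconds.
fn regression(data: &[(usize, Duration)]) -> (f64, f64) {
    assert!(data.len() >= 2, "regression requires at least 2 samples");
    let xy: Vec<(f64, f64)> = data.iter().map(|&(x, y)| (x as f64, as_ns(y) as f64)).collect();
    let n = xy.len() as f64;
    let x_mean = xy.iter().map(|&(x, _)| x).sum::<f64>() / n;
    let y_mean = xy.iter().map(|&(_, y)| y).sum::<f64>() / n;
    let covar: f64 = xy.iter().map(|&(x, y)| (x - x_mean) * (y - y_mean)).sum();
    let x_var: f64 = xy.iter().map(|&(x, _)| (x - x_mean).powi(2)).sum();
    let gradient = covar / x_var;
    // Constant per-sample overhead lands in the intercept, not the gradient.
    let intercept = y_mean - gradient * x_mean;
    let ss_res: f64 = xy.iter().map(|&(x, y)| (y - (gradient * x + intercept)).powi(2)).sum();
    let ss_tot: f64 = xy.iter().map(|&(_, y)| (y - y_mean).powi(2)).sum();
    (gradient, 1.0 - ss_res / ss_tot)
}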
// Panics if x is longer than 584 years (1.8e10 seconds)
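// A sketch matching the comment above: u64 nanoseconds overflow at ~1.8e19 ns ≈ 584 years, and
// the checked arithmetic turns that overflow into a panic.
fn as_ns(x: Duration) -> u64 {
    x.as_secs()
        .checked_mul(1_000_000_000)
        .and_then(|ns| ns.checked_add(u64::from(x.subsec_nanos())))
        .expect("Duration too long to express in u64 nanoseconds")
}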
// Stolen from `bencher`.
//
// NOTE: We don't have a proper black box in stable Rust. This is
// a workaround implementation, that may have a too big performance overhead,
// depending on operation, or it may fail to properly avoid having code
// optimized out. It is good enough that it is used by default.
//
// A function that is opaque to the optimizer, to allow benchmarks to
// pretend to use outputs to assist in avoiding dead-code
// elimination.
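pub fn black_box<T>(dummy: T) -> T {
    // bencher's workaround: a volatile read convinces the optimiser the value is observed,
    // and `forget` prevents a double-drop of the original.
    unsafe {
        let ret = std::ptr::read_volatile(&dummy);
        std::mem::forget(dummy);
        ret
    }
}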