/*!
A lightweight benchmarking library which:
* uses linear regression to screen off sources of constant error;
* handles benchmarks which must mutate some state;
* has a very simple API!
It's inspired by [criterion], but doesn't do as much sophisticated analysis. Perhaps some day it
will!
[criterion]: https://hackage.haskell.org/package/criterion
```
use easybench::{bench,bench_env};
# fn fib(n: usize) -> usize { 0 }
#
// Simple benchmarks are performed with `bench`.
println!("fib 200: {}", bench(|| fib(200) ));
println!("fib 500: {}", bench(|| fib(500) ));
// If a function needs to mutate some state, use `bench_env`.
println!("reverse: {}", bench_env(vec![1,2,3], |xs| xs.reverse()));
println!("sort: {}", bench_env(vec![1,2,3], |xs| xs.sort()));
```
Running the above yields the following results (make sure you compile with `--release`):
```none
fib 200: 38 ns (R²=1.000, 26053498 iterations in 155 samples)
fib 500: 109 ns (R²=1.000, 9131585 iterations in 144 samples)
reverse: 3 ns (R²=0.998, 23684997 iterations in 154 samples)
sort: 3 ns (R²=0.999, 23684997 iterations in 154 samples)
```
## Caveat: pure functions
Benchmarking pure functions involves a nasty gotcha which users should be aware of. Consider the
following benchmarks:
```
# use easybench::{bench,bench_env};
#
# fn fib(n: usize) -> usize { 0 }
#
let fib_1 = bench(|| fib(500) ); // fine
let fib_2 = bench(|| { fib(500); } ); // spoiler: NOT fine
let fib_3 = bench_env(0, |x| { *x = fib(500); } ); // also fine, but ugly
# let _ = (fib_1, fib_2, fib_3);
```
The results are a little surprising:
```none
fib_1: 110 ns (R²=1.000, 9131585 iterations in 144 samples)
fib_2: 0 ns (R²=1.000, 413289203 iterations in 184 samples)
fib_3: 109 ns (R²=1.000, 9131585 iterations in 144 samples)
```
Oh, `fib_2`, why do you lie? The answer is: because `fib(500)` is pure, and its return value is
immediately thrown away, so the optimiser replaces the call with a no-op (which clocks in at 0 ns).
What about the other two? `fib_1` works because the closure passed to `bench` returns the result of
the `fib(500)` call. Easybench takes whatever your code returns and tricks the optimiser into
thinking it's going to do something with it. In `fib_3`, we actually *do* use the return value - we
store it in the benchmark's private mutable state. This works fine but looks a bit weird.
The moral of the story: when benchmarking pure functions, make sure to return enough information to
prevent the optimiser from eliminating code from your benchmark!
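For instance, if the result is too large to hand back wholesale, returning a summary of it is a
common trick (the `make_table` and `checksum` names below are hypothetical stand-ins, not part of
this crate):
```
# use easybench::bench;
# fn make_table(n: usize) -> Vec<usize> { (0..n).collect() }
# fn checksum(xs: &[usize]) -> usize { xs.iter().sum() }
// Return a checksum of the result so the optimiser can't discard the work.
println!("table: {}", bench(|| checksum(&make_table(100)) ));
```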
## The benchmarking algorithm
First, let me define "sample" and "iteration". An *iteration* is a single execution of your code. A
*sample* is a measurement, during which your code may be run many times. That is: taking a sample
means performing some number of iterations and measuring the total time.
We begin by taking a sample and throwing away the results, in an effort to warm up some caches.
Now we start collecting data. The first sample performs only 1 iteration, but as we continue taking
samples we increase the number of iterations exponentially. We stop when a time limit is reached
(currently 1 second).
Next, we perform OLS regression on the resulting data. The gradient of the regression line is our
measure of the time it takes to perform a single iteration of the benchmark. The R² value is a
measure of how much noise there is in the data.
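For the curious, here is a rough sketch of that regression step. It is illustrative only (the
crate's actual implementation may differ), but it shows how the gradient and R² fall out of the
(iterations, time) samples:
```
// Sketch only: `data` holds (iterations, total nanoseconds) pairs.
fn ols(data: &[(f64, f64)]) -> (f64, f64) {
    let n = data.len() as f64;
    let mean_x = data.iter().map(|&(x, _)| x).sum::<f64>() / n;
    let mean_y = data.iter().map(|&(_, y)| y).sum::<f64>() / n;
    let (mut sxx, mut sxy, mut syy) = (0.0, 0.0, 0.0);
    for &(x, y) in data {
        sxx += (x - mean_x) * (x - mean_x);
        sxy += (x - mean_x) * (y - mean_y);
        syy += (y - mean_y) * (y - mean_y);
    }
    (sxy / sxx, (sxy * sxy) / (sxx * syy)) // (ns per iteration, R²)
}
let (ns_per_iter, r2) = ols(&[(1.0, 110.0), (2.0, 215.0), (4.0, 430.0)]);
assert!(ns_per_iter > 100.0 && r2 > 0.99);
```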
*/
// (Assumed imports: timing via `std::time`, and `std::fmt` for the `Display`
// impl that the examples above rely on when printing results.)
use std::fmt::{self, Display, Formatter};
use std::time::*;
/// Statistics for a benchmark run.
// Each time we take a sample we increase the number of iterations:
// iters = ITER_SCALE_FACTOR ^ sample_no
const ITER_SCALE_FACTOR: f64 = 1.1;
// We try to spend this many seconds (roughly) in total on each benchmark.
const BENCH_TIME_LIMIT_SECS: u64 = 1;
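// A rough sketch of how the two constants above drive the sampling loop
// (`run_iterations` and `record_sample` are hypothetical helpers, shown only
// to illustrate the shape of the loop):
//
//     let mut iters = 1.0_f64;
//     let deadline = Instant::now() + Duration::from_secs(BENCH_TIME_LIMIT_SECS);
//     while Instant::now() < deadline {
//         let n = iters.round() as u64;
//         let elapsed = run_iterations(n); // time `n` back-to-back iterations
//         record_sample(n, elapsed);       // one (iterations, time) data point
//         iters *= ITER_SCALE_FACTOR;      // exponential growth in iterations
//     }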
/// Run a benchmark.
///
/// The return value of `f` is not used, but we trick the optimiser into thinking we're going to
/// use it. Make sure to return enough information to prevent the optimiser from eliminating code
/// from your benchmark! (See the module docs for more.)
/// Run a benchmark with an environment.
///
/// The value `env` is a cloneable prototype for the "benchmark environment". Each iteration
/// receives a freshly-cloned mutable copy of this environment. The time taken to clone the
/// environment is not included in the results.
///
/// Nb: it's very possible that we will end up allocating many (>10,000) copies of `env` at the
/// same time. Probably best to keep it small.
///
/// See `bench` and the module docs for more.
/// Compute the OLS linear regression line for the given data set, returning the line's gradient
/// and R².
// Warning: overflows possible. TODO: check for them.
// Stolen from `bencher`.
//
// NOTE: We don't have a proper black box in stable Rust. This is a
// workaround implementation that may impose too much performance overhead,
// depending on the operation, or may fail to prevent code from being
// optimized out. It is good enough that it is used by default.
//
// A function that is opaque to the optimizer, to allow benchmarks to
// pretend to use outputs to assist in avoiding dead-code
// elimination.
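// A sketch of one such workaround (the volatile-read trick used by the
// `bencher` crate); the function name and exact body here are assumptions,
// since they are not given above.
pub fn pretend_to_use<T>(dummy: T) -> T {
    unsafe {
        // A volatile read is opaque to the optimiser, so `dummy` looks "used".
        let ret = std::ptr::read_volatile(&dummy);
        std::mem::forget(dummy);
        ret
    }
}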