easybench_wasm/lib.rs
/*!
This is a wasm32 browser-friendly fork of [easybench].

A lightweight micro-benchmarking library which:

* uses linear regression to screen off constant error;
* handles benchmarks which mutate state;
* is very easy to use!

Easybench-wasm is designed for benchmarks with a running time in the range `1 ns < x < 1 ms` - results
may be unreliable if benchmarks are very quick or very slow. It's inspired by [criterion], but
doesn't do as much sophisticated analysis (no outlier detection, no HTML output).

[easybench]: https://docs.rs/easybench
[criterion]: https://hackage.haskell.org/package/criterion

``` ignore
// example marked as "not tested" because doc tests don't run in wasm32
use easybench_wasm::{bench,bench_env};
use web_sys::console;

# fn fib(_: usize) -> usize { 0 }
#
// Simple benchmarks are performed with `bench`.
console::log_1(&format!("fib 200: {}", bench(|| fib(200) )).into());
console::log_1(&format!("fib 500: {}", bench(|| fib(500) )).into());

// If a function needs to mutate some state, use `bench_env`.
console::log_1(&format!("reverse: {}", bench_env(vec![0;100], |xs| xs.reverse() )).into());
console::log_1(&format!("sort: {}", bench_env(vec![0;100], |xs| xs.sort() )).into());
```

Running the above yields the following results:

```none
fib 200: 38 ns (R²=1.000, 26053497 iterations in 154 samples)
fib 500: 110 ns (R²=1.000, 9131584 iterations in 143 samples)
reverse: 54 ns (R²=0.999, 5669992 iterations in 138 samples)
sort: 93 ns (R²=1.000, 4685942 iterations in 136 samples)
```

Easy! However, please read the [caveats](#caveats) below before using.

# Benchmarking algorithm

An *iteration* is a single execution of your code. A *sample* is a measurement, during which your
code may be run many times. In other words: taking a sample means performing some number of
iterations and measuring the total time.

The first sample we take performs only 1 iteration, but as we continue taking samples we increase
the number of iterations exponentially. We stop when a global time limit is reached (currently 1
second).
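
For illustration, the iteration counts follow a geometric schedule. This is only a sketch (the
closure name is made up), using the crate's current scale factor of 1.1:

``` ignore
// Sketch only: sample number k performs round(1.1^k) iterations, i.e.
// 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, ... until the time budget is spent.
let iters_for_sample = |k: u32| 1.1_f64.powi(k as i32).round() as usize;
```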

If a benchmark must mutate some state while running, before taking a sample `n` copies of the
initial state are prepared, where `n` is the number of iterations in that sample.

Once we have the data, we perform OLS linear regression to find out how the sample time varies with
the number of iterations in the sample. The gradient of the regression line tells us how long it
takes to perform a single iteration of the benchmark. The R² value is a measure of how much noise
there is in the data.
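
Concretely, writing `x` for the number of iterations in a sample and `y` for that sample's total
time, the reported figures are the standard OLS quantities (shown here for reference):

```none
ns_per_iter     = Σ (xᵢ - mean(x)) (yᵢ - mean(y)) / Σ (xᵢ - mean(x))²
goodness_of_fit = [Σ (xᵢ - mean(x)) (yᵢ - mean(y))]² / [Σ (xᵢ - mean(x))² · Σ (yᵢ - mean(y))²]
```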

# Caveats

## Caveat 1: Harness overhead

**TL;DR: Compile with `--release`; the overhead is likely to be within the noise of your
benchmark.**

Any work which easybench-wasm does once-per-sample is ignored (this is the purpose of the linear
regression technique described above). However, work which is done once-per-iteration *will* be
counted in the final times.

* In the case of [`bench`] this amounts to incrementing the loop counter and [copying the return
  value](#bonus-caveat-black-box).
* In the case of [`bench_env`], we also do a lookup into a big vector in order to get the
  environment for that iteration.
* If you compile your program unoptimised, there may be additional overhead.

[`bench`]: fn.bench.html
[`bench_env`]: fn.bench_env.html

The cost of the above operations depends on the details of your benchmark; namely: (1) how large is
the return value? and (2) does the benchmark evict the environment vector from the CPU cache? In
practice, these criteria are only satisfied by longer-running benchmarks, making these effects hard
to measure.

If you have concerns about the results you're seeing, please take a look at [the inner loop of
`bench_env`][source]. The whole library `cloc`s in at under 100 lines of code, so it's pretty easy
to read.

[source]: ../src/easybench-wasm/lib.rs.html#280-285

## Caveat 2: Sufficient data

**TL;DR: Measurements are unreliable when code takes too long (> 1 ms) to run.**

Each benchmark collects data for 1 second. This means that in order to collect a statistically
significant amount of data, your code should run much faster than this.

When inspecting the results, make sure things look statistically significant. In particular:

* Make sure the number of samples is big enough. More than 100 is probably OK.
* Make sure the R² isn't suspiciously low. It's easy to achieve a high R² value when the number of
  samples is small, so unfortunately the definition of "suspiciously low" depends on how many
  samples were taken. As a rule of thumb, expect values greater than 0.99.

## Caveat 3: Pure functions

**TL;DR: Return enough information to prevent the optimiser from eliminating code from your
benchmark.**

Benchmarking pure functions involves a nasty gotcha which users should be aware of. Consider the
following benchmarks:

``` ignore
// example marked as "not tested" because doc tests don't run in wasm32
# use easybench_wasm::{bench,bench_env};
#
# fn fib(_: usize) -> usize { 0 }
#
let fib_1 = bench(|| fib(500) ); // fine
let fib_2 = bench(|| { fib(500); } ); // spoiler: NOT fine
let fib_3 = bench_env(0, |x| { *x = fib(500); } ); // also fine, but ugly
# let _ = (fib_1, fib_2, fib_3);
```

The results are a little surprising:

```none
fib_1: 110 ns (R²=1.000, 9131585 iterations in 144 samples)
fib_2: 0 ns (R²=1.000, 413289203 iterations in 184 samples)
fib_3: 109 ns (R²=1.000, 9131585 iterations in 144 samples)
```

Oh, `fib_2`, why do you lie? The answer is: `fib(500)` is pure, and its return value is immediately
thrown away, so the optimiser replaces the call with a no-op (which clocks in at 0 ns).

What about the other two? `fib_1` looks very similar, with one exception: the closure which we're
benchmarking returns the result of the `fib(500)` call. When it runs your code, easybench-wasm takes the
return value and tricks the optimiser into thinking it's going to use it for something, before
throwing it away. This is why `fib_1` is safe from having code accidentally eliminated.

In the case of `fib_3`, we actually *do* use the return value: each iteration we take the result of
`fib(500)` and store it in the iteration's environment. This has the desired effect, but looks a
bit weird.

## Bonus caveat: Black box

The function which easybench-wasm uses to trick the optimiser (`black_box`) is stolen from [bencher],
which [states]:

[bencher]: https://docs.rs/bencher/
[states]: https://docs.rs/bencher/0.1.2/bencher/fn.black_box.html

> NOTE: We don't have a proper black box in stable Rust. This is a workaround implementation, that
> may have a too big performance overhead, depending on operation, or it may fail to properly avoid
> having code optimized out. It is good enough that it is used by default.
*/

extern crate web_sys;
extern crate humantime;

use std::f64;
use std::fmt::{self,Display,Formatter};
use std::time::Duration;

// Each time we take a sample we increase the number of iterations:
//   iters = ITER_SCALE_FACTOR ^ sample_no
const ITER_SCALE_FACTOR: f64 = 1.1;

// We try to spend this many seconds (roughly) in total on each benchmark.
const BENCH_TIME_LIMIT_SECS: f64 = 1.0;

/// Statistics for a benchmark run.
#[derive(Debug, PartialEq, Clone)]
pub struct Stats {
    /// The time, in nanoseconds, per iteration. If the benchmark generated fewer than 2 samples in
    /// the allotted time then this will be NaN.
    pub ns_per_iter: f64,
    /// The coefficient of determination, R².
    ///
    /// This is an indication of how noisy the benchmark was, where 1 is good and 0 is bad. Be
    /// suspicious of values below 0.9.
    pub goodness_of_fit: f64,
    /// How many times the benchmarked code was actually run.
    pub iterations: usize,
    /// How many samples were taken (ie. how many times we allocated the environment and measured
    /// the time).
    pub samples: usize,
}

impl Display for Stats {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        if self.ns_per_iter.is_nan() {
            write!(f, "Only generated {} sample(s) - we can't fit a regression line to that! \
                       Try making your benchmark faster.", self.samples)
        } else {
            let per_iter: humantime::Duration = Duration::from_nanos(self.ns_per_iter as u64).into();
            let per_iter = format!("{}", per_iter);
            write!(f, "{:>11} (R²={:.3}, {} iterations in {} samples)",
                   per_iter, self.goodness_of_fit, self.iterations, self.samples)
        }
    }
}

/// Run a benchmark.
///
/// The return value of `f` is not used, but we trick the optimiser into thinking we're going to
/// use it. Make sure to return enough information to prevent the optimiser from eliminating code
/// from your benchmark! (See the module docs for more.)
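///
/// For illustration, a minimal sketch (marked `ignore` because doc tests don't run in wasm32; the
/// closure and the logged label are made up):
///
/// ``` ignore
/// use easybench_wasm::bench;
/// use web_sys::console;
///
/// // `Stats` implements `Display`, but its fields are also public if you want
/// // to inspect the numbers yourself.
/// let stats = bench(|| (0..1000u64).sum::<u64>());
/// console::log_1(&format!("sum 0..1000: {}", stats).into());
/// assert!(stats.goodness_of_fit > 0.9);
/// ```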
pub fn bench<F, O>(f: F) -> Stats where F: Fn() -> O {
    bench_env_limit(BENCH_TIME_LIMIT_SECS, (), |_| f())
}

/// Run a benchmark with an environment.
///
/// The value `env` is a clonable prototype for the "benchmark environment". Each iteration
/// receives a freshly-cloned mutable copy of this environment. The time taken to clone the
/// environment is not included in the results.
///
/// Nb: it's very possible that we will end up allocating many (>10,000) copies of `env` at the
/// same time. Probably best to keep it small.
///
/// See [bench](fn.bench.html) and the module docs for more.
///
/// ## Overhead
///
/// Every iteration, `bench_env` performs a lookup into a big vector in order to get the
/// environment for that iteration. If your benchmark is memory-intensive then this could, in the
/// worst case, amount to a systematic cache-miss (ie. this vector would have to be fetched from
/// DRAM at the start of every iteration). In this case the results could be affected by a hundred
/// nanoseconds. This is a worst-case scenario however, and I haven't actually been able to trigger
/// it in practice... but it's good to be aware of the possibility.
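///
/// For illustration, a minimal sketch (marked `ignore` because doc tests don't run in wasm32; the
/// vector contents and the logged label are made up):
///
/// ``` ignore
/// use easybench_wasm::bench_env;
/// use web_sys::console;
///
/// // Each iteration gets its own freshly-cloned copy of the vector, so the
/// // sort always starts from the same unsorted state.
/// let stats = bench_env(vec![3, 1, 4, 1, 5, 9, 2, 6], |xs| xs.sort());
/// console::log_1(&format!("sort 8: {}", stats).into());
/// ```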
pub fn bench_env<F, I, O>(env: I, f: F) -> Stats where F: Fn(&mut I) -> O, I: Clone {
    bench_env_limit_ref(BENCH_TIME_LIMIT_SECS, env, f)
}

/// Run a benchmark, specifying the run time limit.
///
/// See [bench](fn.bench.html).
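///
/// For illustration, a minimal sketch (marked `ignore` because doc tests don't run in wasm32; the
/// closure and the 0.25-second budget are made up):
///
/// ``` ignore
/// use easybench_wasm::bench_limit;
/// use web_sys::console;
///
/// // Spend roughly a quarter of a second on this benchmark instead of the
/// // default one second.
/// let stats = bench_limit(0.25, || (0..1000u64).fold(0, |acc, x| acc ^ x));
/// console::log_1(&format!("xor fold: {}", stats).into());
/// ```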
pub fn bench_limit<F, O>(time_limit_secs: f64, f: F) -> Stats where F: Fn() -> O {
    bench_env_limit(time_limit_secs, (), |_| f())
}

/// Run a benchmark with an environment, specifying the run time limit.
///
/// See [bench_env](fn.bench_env.html).
pub fn bench_env_limit<F, I, O>(time_limit_secs: f64, env: I, f: F) -> Stats where F: Fn(I) -> O, I: Clone {
    run_bench(time_limit_secs, env, |xs| {
        for i in xs.into_iter() {
            pretend_to_use(f(i)); // Run the code and pretend to use the output
        }
    })
}

/// Run a benchmark with an environment, specifying the run time limit. Unlike
/// [bench_env_limit](fn.bench_env_limit.html), the benchmarked function takes a mutable reference
/// to its copy of `env` rather than taking it by value.
///
/// See [bench_env](fn.bench_env.html).
pub fn bench_env_limit_ref<F, I, O>(time_limit_secs: f64, env: I, f: F) -> Stats where F: Fn(&mut I) -> O, I: Clone {
    run_bench(time_limit_secs, env, |mut xs| {
        for i in xs.iter_mut() {
            pretend_to_use(f(i)); // Run the code and pretend to use the output
        }
    })
}

fn run_bench<F, I>(time_limit_secs: f64, env: I, f: F) -> Stats where F: Fn(Vec<I>), I: Clone {
    let mut data = Vec::new();
    let performance = web_sys::window()
        .expect("'Window' not available. This package should only be used in wasm32 inside a browser or headless browser")
        .performance()
        .expect("'Performance' not available. This package should only be used in wasm32 inside a browser or headless browser");

    let bench_start = performance.now(); // The time we started the benchmark (not used in results)

    // Collect data until the time limit is reached. `performance.now()` returns milliseconds.
    while (performance.now() - bench_start) < (time_limit_secs * 1000_f64) {
        let iters = ITER_SCALE_FACTOR.powi(data.len() as i32).round() as usize;
        let xs = vec![env.clone(); iters];  // Prepare the environments - one per iteration
        let iter_start = performance.now(); // Start the clock
        f(xs);                              // Run the benchmarked code `iters` times
        let time = performance.now() - iter_start;
        data.push((iters, time));
    }

    // If the first iter in a sample is consistently slow, that's fine - that's why we do the
    // linear regression. If the first sample is slower than the rest, however, that's not fine.
    // Therefore, we discard the first sample as a cache-warming exercise.
    data.remove(0);

    // Compute some stats
    let (grad, r2) = regression(&data[..]);
    Stats {
        ns_per_iter: grad,
        goodness_of_fit: r2,
        iterations: data.iter().map(|&(x, _)| x).sum(),
        samples: data.len(),
    }
}

/// Compute the OLS linear regression line for the given data set, returning the line's gradient
/// and R². Requires at least 2 samples.
//
// Overflows:
//
// * sum(x * x): num_samples <= 0.5 * log_k (1 + 2 ^ 64 (FACTOR - 1))
fn regression(data: &[(usize, f64)]) -> (f64, f64) {
    if data.len() < 2 {
        return (f64::NAN, f64::NAN);
    }

    let data: Vec<(u64, u64)> = data.iter().map(|&(x, y)| (x as u64, as_nanos(y))).collect();
    let n = data.len() as f64;
    let nxbar  = data.iter().map(|&(x, _)| x    ).sum::<u64>(); // iter_time > 5e-11 ns
    let nybar  = data.iter().map(|&(_, y)| y    ).sum::<u64>(); // TIME_LIMIT < 2 ^ 64 ns
    let nxxbar = data.iter().map(|&(x, _)| x * x).sum::<u64>(); // num_iters < 13_000_000_000
    let nyybar = data.iter().map(|&(_, y)| y * y).sum::<u64>(); // TIME_LIMIT < 4.3 e9 ns
    let nxybar = data.iter().map(|&(x, y)| x * y).sum::<u64>();
    let ncovar = nxybar as f64 - ((nxbar * nybar) as f64 / n);
    let nxvar  = nxxbar as f64 - ((nxbar * nxbar) as f64 / n);
    let nyvar  = nyybar as f64 - ((nybar * nybar) as f64 / n);
    let gradient = ncovar / nxvar;
    let r2 = (ncovar * ncovar) / (nxvar * nyvar);
    (gradient, r2)
}

// `performance.now()` returns a timestamp in milliseconds; convert it to whole nanoseconds.
fn as_nanos(x: f64) -> u64 {
    (x * 1_000_000_f64).round() as u64
}

// Stolen from `bencher`, where it's known as `black_box`.
//
// NOTE: We don't have a proper black box in stable Rust. This is
// a workaround implementation, that may have a too big performance overhead,
// depending on operation, or it may fail to properly avoid having code
// optimized out. It is good enough that it is used by default.
//
// A function that is opaque to the optimizer, to allow benchmarks to
// pretend to use outputs to assist in avoiding dead-code
// elimination.
fn pretend_to_use<T>(dummy: T) -> T {
    unsafe {
        let ret = ::std::ptr::read_volatile(&dummy);
        ::std::mem::forget(dummy);
        ret
    }
}
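
// A minimal sketch of a sanity check for `regression`, not part of the original crate. It assumes
// the crate also compiles for a native test target (wasm-bindgen/web-sys provide non-wasm stubs);
// otherwise it would need to run under something like `wasm-bindgen-test`. It also assumes
// `as_nanos` converts the millisecond sample times to nanoseconds.
#[cfg(test)]
mod tests {
    use super::regression;

    // Perfectly linear data: sample k performs 100*k iterations and takes 0.2*k ms,
    // i.e. exactly 2000 ns per iteration, so the fit should be exact.
    #[test]
    fn regression_recovers_the_slope() {
        let data: Vec<(usize, f64)> = (1usize..=10).map(|k| (100 * k, 0.2 * k as f64)).collect();
        let (ns_per_iter, r2) = regression(&data);
        assert!((ns_per_iter - 2000.0).abs() < 1e-6);
        assert!((r2 - 1.0).abs() < 1e-9);
    }
}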