scaling/lib.rs
/*!
A lightweight micro-benchmarking library which:

* uses linear regression to screen off constant error;
* handles benchmarks which mutate state;
* can measure simple polynomial or exponential scaling behavior;
* is very easy to use!

`scaling` is designed to work with either slow or fast functions.
It's forked from [easybench], which is itself inspired by [criterion],
but doesn't do as much sophisticated analysis (no outlier detection,
no HTML output).

[easybench]: https://crates.io/crates/easybench
[criterion]: https://crates.io/crates/criterion

```
use scaling::{bench,bench_env,bench_scaling};

# fn fib(_: usize) -> usize { 0 }
#
// Simple benchmarks are performed with `bench` or `bench_scaling`.
println!("fib 200: {}", bench(|| fib(200) ));
println!("fib 500: {}", bench(|| fib(500) ));
println!("fib scaling: {}", bench_scaling(|n| fib(n), 0));

// If a function needs to mutate some state, use `bench_env`.
println!("reverse: {}", bench_env(vec![0;100], |xs| xs.reverse() ));
println!("sort: {}", bench_env(vec![0;100], |xs| xs.sort() ));
```

Running the above yields the following results:

```none
fib 200: 50ns (R²=0.995, 20435 iterations in 68 samples)
fib 500: 144ns (R²=0.999, 7235 iterations in 57 samples)
fib scaling: 0.30ns/N (R²=0.999, 8645 iterations in 59 samples)
reverse: 46ns (R²=0.990, 30550 iterations in 72 samples)
sort: 137ns (R²=0.991, 187129 iterations in 91 samples)
```

Easy! However, please read the [caveats](#caveats) below before using.

# Benchmarking algorithm

An *iteration* is a single execution of your code. A *sample* is a measurement,
during which your code may be run many times. In other words: taking a sample
means performing some number of iterations and measuring the total time.

The first sample we take performs only 1 iteration, but as we continue
taking samples we increase the number of iterations roughly geometrically.
We stop either when a global time limit is reached (currently 10 seconds),
or when we have collected sufficient statistics (but have run for at
least a millisecond).

If a benchmark requires some state to run, `n` copies of the initial state are
prepared before the sample is taken.

Once we have the data, we perform OLS linear regression to find out how
the sample time varies with the number of iterations in the sample. The
gradient of the regression line tells us how long it takes to perform a
single iteration of the benchmark. The R² value is a measure of how much
noise there is in the data.

If the function is too slow for us to collect enough samples within the
time limit, the linear regression is skipped and a simple average of the
timings is used instead. For such slow functions, any overhead will be
negligible.

# Caveats

## Caveat 1: Harness overhead

**TL;DR: Compile with `--release`; the overhead is likely to be within the
noise of your benchmark.**

Any work which `scaling` does once-per-sample is ignored (this is the purpose of the linear
regression technique described above). However, work which is done once-per-iteration *will* be
counted in the final times.

* In the case of [`bench()`] this amounts to incrementing the loop counter and
  [copying the return value](#bonus-caveat-black-box).
* In the case of [`bench_env`] and [`bench_gen_env`], we also do a lookup into a big vector in
  order to get the environment for that iteration.
* If you compile your program unoptimised, there may be additional overhead.

The cost of the above operations depends on the details of your benchmark;
namely: (1) how large is the return value? and (2) does the benchmark evict
the environment vector from the CPU cache? In practice, these criteria are only
satisfied by longer-running benchmarks, making these effects hard to measure.

## Caveat 2: Pure functions

**TL;DR: Return enough information to prevent the optimiser from eliminating
code from your benchmark.**

Benchmarking pure functions involves a nasty gotcha which users should be
aware of. Consider the following benchmarks:

```
# use scaling::{bench,bench_env};
#
# fn fib(_: usize) -> usize { 0 }
#
let fib_1 = bench(|| fib(500) ); // fine
let fib_2 = bench(|| { fib(500); } ); // spoiler: NOT fine
let fib_3 = bench_env(0, |x| { *x = fib(500); } ); // also fine, but ugly
# let _ = (fib_1, fib_2, fib_3);
```

The results are a little surprising:

```none
fib_1: 110 ns (R²=1.000, 9131585 iterations in 144 samples)
fib_2: 0 ns (R²=1.000, 413289203 iterations in 184 samples)
fib_3: 109 ns (R²=1.000, 9131585 iterations in 144 samples)
```

Oh, `fib_2`, why do you lie? The answer is: `fib(500)` is pure, and its
return value is immediately thrown away, so the optimiser replaces the call
with a no-op (which clocks in at 0 ns).

What about the other two? `fib_1` looks very similar, with one exception:
the closure which we're benchmarking returns the result of the `fib(500)`
call. When it runs your code, `scaling` takes the return value and tricks the
optimiser into thinking it's going to use it for something, before throwing
it away. This is why `fib_1` is safe from having code accidentally eliminated.

In the case of `fib_3`, we actually *do* use the return value: each
iteration we take the result of `fib(500)` and store it in the iteration's
environment. This has the desired effect, but looks a bit weird.

## Bonus caveat: Black box

The function which `scaling` uses to trick the optimiser (`black_box`)
is stolen from [bencher], which [states]:

[bencher]: https://docs.rs/bencher/
[states]: https://docs.rs/bencher/0.1.2/bencher/fn.black_box.html

> NOTE: We don't have a proper black box in stable Rust. This is a workaround
> implementation, that may have a too big performance overhead, depending on
> operation, or it may fail to properly avoid having code optimized out. It
> is good enough that it is used by default.
*/

use std::f64;
use std::fmt::{self, Display, Formatter};
use std::time::*;

// We give up and return whatever we have after (roughly) this many
// seconds in total on each benchmark, even if the fit is still poor.
const BENCH_TIME_MAX_DESPERATION: Duration = Duration::from_secs(120);
// We try to spend at most this many seconds (roughly) in total on
// each benchmark.
const BENCH_TIME_MAX: Duration = Duration::from_secs(10);
// We try to spend at least this many seconds in total on each
// benchmark.
const BENCH_TIME_MIN: Duration = Duration::from_millis(1);

/// Statistics for a benchmark run.
#[derive(Debug, PartialEq, Clone)]
pub struct Stats {
    /// The time, in nanoseconds, per iteration. If the benchmark generated
    /// fewer than 2 samples in the allotted time then this will be NaN.
    pub ns_per_iter: f64,
    /// The coefficient of determination, R².
    ///
    /// This is an indication of how noisy the benchmark was, where 1 is
    /// good and 0 is bad. Be suspicious of values below 0.9.
    pub goodness_of_fit: f64,
    /// How many times the benchmarked code was actually run.
    pub iterations: usize,
    /// How many samples were taken (i.e. how many times we allocated the
    /// environment and measured the time).
    pub samples: usize,
}

impl Display for Stats {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        if self.ns_per_iter.is_nan() {
            write!(
                f,
                "Only generated {} sample(s) - we can't fit a regression line to that! \
                 Try making your benchmark faster.",
                self.samples
            )
        } else {
            let per_iter = Duration::from_nanos(self.ns_per_iter as u64);
            let per_iter = format!("{:?}", per_iter);
            write!(
                f,
                "{:>11} (R²={:.3}, {} iterations in {} samples)",
                per_iter, self.goodness_of_fit, self.iterations, self.samples
            )
        }
    }
}

/// Run a benchmark.
///
/// The return value of `f` is not used, but we trick the optimiser into
/// thinking we're going to use it. Make sure to return enough information
/// to prevent the optimiser from eliminating code from your benchmark! (See
/// the module docs for more.)
pub fn bench<F, O>(mut f: F) -> Stats
where
    F: FnMut() -> O,
{
    bench_env((), |_| f())
}

/// Run a benchmark with an environment.
///
/// The value `env` is a clonable prototype for the "benchmark
/// environment". Each iteration receives a freshly-cloned mutable copy of
/// this environment. The time taken to clone the environment is not included
/// in the results.
///
/// NB: it's very possible that we will end up allocating many (>10,000)
/// copies of `env` at the same time. Probably best to keep it small.
///
/// See `bench` and the module docs for more.
///
/// ## Overhead
///
/// Every iteration, `bench_env` performs a lookup into a big vector in
/// order to get the environment for that iteration. If your benchmark
/// is memory-intensive then this could, in the worst case, amount to a
/// systematic cache-miss (i.e. this vector would have to be fetched from
/// DRAM at the start of every iteration). In this case the results could be
/// affected by a hundred nanoseconds. This is a worst-case scenario however,
/// and I haven't actually been able to trigger it in practice... but it's
/// good to be aware of the possibility.
pub fn bench_env<F, I, O>(env: I, f: F) -> Stats
where
    F: FnMut(&mut I) -> O,
    I: Clone,
{
    bench_gen_env(move || env.clone(), f)
}

/// Run a benchmark with a generated environment.
///
/// The function `gen_env` creates the "benchmark environment" for the
/// computation. Each iteration receives a freshly-created environment. The
/// time taken to create the environment is not included in the results.
///
/// NB: it's very possible that we will end up generating many (>10,000)
/// copies of `env` at the same time. Probably best to keep it small.
///
/// See `bench` and the module docs for more.
///
/// ## Overhead
///
/// Every iteration, `bench_gen_env` performs a lookup into a big vector
/// in order to get the environment for that iteration. If your benchmark
/// is memory-intensive then this could, in the worst case, amount to a
/// systematic cache-miss (i.e. this vector would have to be fetched from
/// DRAM at the start of every iteration). In this case the results could be
/// affected by a hundred nanoseconds. This is a worst-case scenario however,
/// and I haven't actually been able to trigger it in practice... but it's
/// good to be aware of the possibility.
pub fn bench_gen_env<G, F, I, O>(mut gen_env: G, mut f: F) -> Stats
where
    G: FnMut() -> I,
    F: FnMut(&mut I) -> O,
{
    let mut data = Vec::new();
    // The time we started the benchmark (not used in results)
    let bench_start = Instant::now();

    // Collect data until BENCH_TIME_MAX is reached.
    for iters in slow_fib(BENCH_SCALE_TIME) {
        // Prepare the environments - one per iteration
        let mut xs = std::iter::repeat_with(&mut gen_env)
            .take(iters)
            .collect::<Vec<I>>();
        // Start the clock
        let iter_start = Instant::now();
        // We iterate over `&mut xs` rather than draining it, because we
        // don't want to drop the env values until after the clock has stopped.
        for x in &mut xs {
            // Run the code and pretend to use the output
            pretend_to_use(f(x));
        }
        let time = iter_start.elapsed();
        data.push((iters, time));

        let elapsed = bench_start.elapsed();
        if elapsed > BENCH_TIME_MIN && data.len() > 3 {
            // If the first iter in a sample is consistently slow, that's fine -
            // that's why we do the linear regression. If the first sample is slower
            // than the rest, however, that's not fine. Therefore, we discard the
            // first sample as a cache-warming exercise.

            // Compute some stats
            let (grad, r2) = regression(&data[1..]);
            let stats = Stats {
                ns_per_iter: grad,
                goodness_of_fit: r2,
                iterations: data[1..].iter().map(|&(x, _)| x).sum(),
                samples: data[1..].len(),
            };
            if elapsed > BENCH_TIME_MAX || r2 > 0.99 {
                return stats;
            }
        } else if elapsed > BENCH_TIME_MAX {
            // Too few samples for a regression: fall back to a simple
            // average over everything we measured.
            let total_time: f64 = data.iter().map(|(_, t)| t.as_nanos() as f64).sum();
            let iterations = data.iter().map(|&(x, _)| x).sum();
            return Stats {
                ns_per_iter: total_time / iterations as f64,
                iterations,
                goodness_of_fit: 0.0,
                samples: data.len(),
            };
        }
    }
    unreachable!()
}

/// Statistics for a benchmark run determining the scaling of a function.
#[derive(Debug, PartialEq, Clone)]
pub struct ScalingStats {
    pub scaling: Scaling,
    pub goodness_of_fit: f64,
    /// How many times the benchmarked code was actually run.
    pub iterations: usize,
    /// How many samples were taken (i.e. how many times we allocated the
    /// environment and measured the time).
    pub samples: usize,
}

/// The timing and scaling results (without statistics) for a benchmark.
#[derive(Debug, PartialEq, Clone)]
pub struct Scaling {
    /// The scaling power. If this is 2, for instance, you have an O(N²)
    /// algorithm.
    pub power: usize,
    /// The base of any exponential behavior, e.g. 2 for O(2ᴺ). A base of
    /// 1 means there is no exponential factor.
    pub exponential: usize,
    /// The time, in nanoseconds, per scaled size of the problem. If
    /// the problem scales as O(N²) for instance, this is the number
    /// of nanoseconds per N².
    pub ns_per_scale: f64,
}

impl Display for ScalingStats {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        write!(
            f,
            "{} (R²={:.3}, {} iterations in {} samples)",
            self.scaling, self.goodness_of_fit, self.iterations, self.samples
        )
    }
}

impl Display for Scaling {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        let per_iter = Duration::from_nanos(self.ns_per_scale as u64);
        let per_iter = if self.ns_per_scale < 1.0 {
            format!("{:.2}ns", self.ns_per_scale)
        } else if self.ns_per_scale < 10.0 {
            format!("{:.1}ns", self.ns_per_scale)
        } else {
            format!("{:?}", per_iter)
        };
        if self.exponential == 1 {
            match self.power {
                0 => write!(f, "{:>8}/iter", per_iter),
                1 => write!(f, "{:>8}/N   ", per_iter),
                2 => write!(f, "{:>8}/N²  ", per_iter),
                3 => write!(f, "{:>8}/N³  ", per_iter),
                4 => write!(f, "{:>8}/N⁴  ", per_iter),
                5 => write!(f, "{:>8}/N⁵  ", per_iter),
                6 => write!(f, "{:>8}/N⁶  ", per_iter),
                7 => write!(f, "{:>8}/N⁷  ", per_iter),
                8 => write!(f, "{:>8}/N⁸  ", per_iter),
                9 => write!(f, "{:>8}/N⁹  ", per_iter),
                _ => write!(f, "{:>8}/N^{}", per_iter, self.power),
            }
        } else {
            match self.power {
                0 => write!(f, "{:>8}/{}ᴺ", per_iter, self.exponential),
                1 => write!(f, "{:>8}/(N{}ᴺ)  ", per_iter, self.exponential),
                2 => write!(f, "{:>8}/(N²{}ᴺ) ", per_iter, self.exponential),
                3 => write!(f, "{:>8}/(N³{}ᴺ) ", per_iter, self.exponential),
                4 => write!(f, "{:>8}/(N⁴{}ᴺ) ", per_iter, self.exponential),
                5 => write!(f, "{:>8}/(N⁵{}ᴺ) ", per_iter, self.exponential),
                6 => write!(f, "{:>8}/(N⁶{}ᴺ) ", per_iter, self.exponential),
                7 => write!(f, "{:>8}/(N⁷{}ᴺ) ", per_iter, self.exponential),
                8 => write!(f, "{:>8}/(N⁸{}ᴺ) ", per_iter, self.exponential),
                9 => write!(f, "{:>8}/(N⁹{}ᴺ) ", per_iter, self.exponential),
                _ => write!(f, "{:>8}/(N^{}{}ᴺ)", per_iter, self.power, self.exponential),
            }
        }
    }
}

/// Benchmark the power-law scaling of the function
///
/// This function assumes that the function scales as 𝑶(𝑁ᴾ𝐸ᴺ).
/// It considers higher powers for faster functions, and tries to
/// keep the measuring time around 10s. It measures the power ᴾ and the
/// exponential base 𝐸 using an R² goodness-of-fit parameter.
pub fn bench_scaling<F, O>(f: F, nmin: usize) -> ScalingStats
where
    F: Fn(usize) -> O,
{
    let mut data = Vec::new();
    // The time we started the benchmark (not used in results)
    let bench_start = Instant::now();

    // Collect data until BENCH_TIME_MAX is reached.
    for iters in slow_fib(BENCH_SCALE_TIME) {
        // Pick the problem size for this sample.
        let n = if nmin > 0 { iters * nmin } else { iters };
        // Generate a Vec holding n's to hopefully keep the optimizer
        // from lifting the function out of the loop, as it could if
        // we had `f(n)` in there, and `f` were inlined or `const`.
        let xs = vec![n; iters];
        // Start the clock
        let iter_start = Instant::now();
        for x in xs.into_iter() {
            // Run the code and pretend to use the output
            pretend_to_use(f(x));
        }
        let time = iter_start.elapsed();
        data.push((n, iters, time));

        let elapsed = bench_start.elapsed();
        if elapsed > BENCH_TIME_MIN {
            let stats = compute_scaling_gen(&data);
            if elapsed > BENCH_TIME_MAX_DESPERATION
                || (elapsed > BENCH_TIME_MAX && stats.goodness_of_fit > 0.0)
                || stats.goodness_of_fit > 0.99
            {
                return stats;
            }
        }
    }
    unreachable!()
}

/// Benchmark the power-law scaling of the function with generated input
///
/// This function is like [`bench_scaling`], but uses a generating function
/// to construct the input to your benchmarked function.
///
/// This function assumes that the function scales as 𝑶(𝑁ᴾ𝐸ᴺ).
/// It considers higher powers for faster functions, and tries to
/// keep the measuring time around 10s. It measures the power ᴾ and the
/// exponential base 𝐸 using an R² goodness-of-fit parameter.
///
/// # Example
/// ```
/// use scaling::bench_scaling_gen;
///
/// let summation = bench_scaling_gen(|n| vec![3.0; n], |v| v.iter().cloned().sum::<f64>(),0);
/// println!("summation: {}", summation);
/// assert_eq!(1, summation.scaling.power); // summation must run in linear time.
/// ```
/// which gives output
/// ```none
/// summation: 43ns/N (R²=0.996, 445 iterations in 29 samples)
/// ```
pub fn bench_scaling_gen<G, F, I, O>(mut gen_env: G, f: F, nmin: usize) -> ScalingStats
where
    G: FnMut(usize) -> I,
    F: Fn(&mut I) -> O,
{
    let mut data = Vec::new();
    // The time we started the benchmark (not used in results)
    let bench_start = Instant::now();

    let mut am_slow = false;
    // Collect data until BENCH_TIME_MAX is reached.
    for iters in slow_fib(BENCH_SCALE_TIME) {
        // Pick the problem size for this sample.
        let n = if nmin > 0 { iters * nmin } else { iters };
        // For slow functions, cap the iteration count, alternating between
        // 1 and 2 so the regression still sees two different sample sizes.
        let iters = if am_slow { 1 + (iters & 1) } else { iters };
        let mut xs = std::iter::repeat_with(|| gen_env(n))
            .take(iters)
            .collect::<Vec<I>>();
        // Start the clock
        let iter_start = Instant::now();
        // We iterate over `&mut xs` rather than draining it, because we
        // don't want to drop the env values until after the clock has stopped.
        for x in &mut xs {
            // Run the code and pretend to use the output
            pretend_to_use(f(x));
        }
        let time = iter_start.elapsed();
        if !am_slow && iters == 1 && time > Duration::from_micros(1) {
            am_slow = true;
        }
        data.push((n, iters, time));

        let elapsed = bench_start.elapsed();
        if elapsed > BENCH_TIME_MIN {
            let stats = compute_scaling_gen(&data);
            if elapsed > BENCH_TIME_MAX_DESPERATION
                || (elapsed > BENCH_TIME_MAX && stats.goodness_of_fit > 0.0)
                || stats.goodness_of_fit > 0.99
            {
                return stats;
            }
        }
    }
    unreachable!()
}

/// This function assumes that the function scales as 𝑶(𝑁ᴾ𝐸ᴺ). It measures the
/// scaling using an R² goodness-of-fit parameter, and returns the best fit.
/// If it believes itself clueless, the goodness_of_fit is set to zero.
fn compute_scaling_gen(data: &[(usize, usize, Duration)]) -> ScalingStats {
    let num_n = {
        let mut ns = data.iter().map(|(n, _, _)| *n).collect::<Vec<_>>();
        ns.dedup();
        ns.len()
    };

    // If the first iter in a sample is consistently slow, that's fine -
    // that's why we do the linear regression. If the first sample is slower
    // than the rest, however, that's not fine. Therefore, we discard the
    // first sample as a cache-warming exercise.

    // Compute some stats for each of several different
    // powers, to see which seems most accurate.
    let mut stats = Vec::new();
    let mut best = 0;
    let mut second_best = 0;
    for i in 1..num_n / 2 + 2 {
        for power in 0..i {
            let exponential = i - power;
            let pdata: Vec<_> = data[1..]
                .iter()
                .map(|&(n, i, t)| {
                    (
                        (exponential as f64).powi(n as i32)
                            * (n as f64).powi(power as i32)
                            * (i as f64),
                        t,
                    )
                })
                .collect();
            let (grad, r2) = fregression(&pdata);
            stats.push(ScalingStats {
                scaling: Scaling {
                    power,
                    exponential,
                    ns_per_scale: grad,
                },
                goodness_of_fit: r2,
                iterations: data[1..].iter().map(|&(_, iters, _)| iters).sum(),
                samples: data[1..].len(),
            });
            if r2 > stats[best].goodness_of_fit || stats[best].goodness_of_fit.is_nan() {
                second_best = best;
                best = stats.len() - 1;
            }
        }
    }

    // If we have too little data, or the best fit is no better than the
    // runner-up, admit that we are clueless.
    if num_n < 10 || stats[second_best].goodness_of_fit == stats[best].goodness_of_fit {
        stats[best].goodness_of_fit = 0.0;
    }
    stats[best].clone()
}

/// Compute the OLS linear regression line for the given data set, returning
/// the line's gradient and R². Requires at least 2 samples.
//
// Overflows:
//
// * sum(x * x): num_samples <= 0.5 * log_k (1 + 2 ^ 64 (FACTOR - 1))
fn regression(data: &[(usize, Duration)]) -> (f64, f64) {
    if data.len() < 2 {
        return (f64::NAN, f64::NAN);
    }
    // Do all the arithmetic using f64, because the squared values can
    // overflow integer arithmetic if the tests are too fast (so we run
    // too many iterations).
    let data: Vec<_> = data
        .iter()
        .map(|&(x, y)| (x as f64, y.as_nanos() as f64))
        .collect();
    let n = data.len() as f64;
    let nxbar = data.iter().map(|&(x, _)| x).sum::<f64>(); // iter_time > 5e-11 ns
    let nybar = data.iter().map(|&(_, y)| y).sum::<f64>(); // TIME_LIMIT < 2 ^ 64 ns
    let nxxbar = data.iter().map(|&(x, _)| x * x).sum::<f64>(); // num_iters < 13_000_000_000
    let nyybar = data.iter().map(|&(_, y)| y * y).sum::<f64>(); // TIME_LIMIT < 4.3 e9 ns
    let nxybar = data.iter().map(|&(x, y)| x * y).sum::<f64>();
    let ncovar = nxybar - ((nxbar * nybar) / n);
    let nxvar = nxxbar - ((nxbar * nxbar) / n);
    let nyvar = nyybar - ((nybar * nybar) / n);
    let gradient = ncovar / nxvar;
    let r2 = (ncovar * ncovar) / (nxvar * nyvar);
    assert!(r2.is_nan() || r2 <= 1.0);
    (gradient, r2)
}

/// Compute the OLS linear regression line for the given data set, returning
/// the line's gradient and R². Requires at least 2 samples.
fn fregression(data: &[(f64, Duration)]) -> (f64, f64) {
    if data.len() < 2 {
        return (f64::NAN, f64::NAN);
    }
    // Convert the timings to f64 nanoseconds and work with means, which
    // keeps the intermediate values from growing too large.
    let data: Vec<_> = data
        .iter()
        .map(|&(x, y)| (x, y.as_nanos() as f64))
        .collect();
    let n = data.len() as f64;
    let xbar = data.iter().map(|&(x, _)| x).sum::<f64>() / n;
    let xvar = data.iter().map(|&(x, _)| (x - xbar).powi(2)).sum::<f64>() / n;
    let ybar = data.iter().map(|&(_, y)| y).sum::<f64>() / n;
    let yvar = data.iter().map(|&(_, y)| (y - ybar).powi(2)).sum::<f64>() / n;
    let covar = data
        .iter()
        .map(|&(x, y)| (x - xbar) * (y - ybar))
        .sum::<f64>()
        / n;
    let gradient = covar / xvar;
    let r2 = covar.powi(2) / (xvar * yvar);
    assert!(r2.is_nan() || r2 <= 1.0);
    (gradient, r2)
}

// Stolen from `bencher`, where it's known as `black_box`.
//
// NOTE: We don't have a proper black box in stable Rust. This is a workaround
// implementation, that may have a too big performance overhead, depending
// on operation, or it may fail to properly avoid having code optimized
// out. It is good enough that it is used by default.
//
// A function that is opaque to the optimizer, to allow benchmarks to pretend
// to use outputs to assist in avoiding dead-code elimination.
fn pretend_to_use<T>(dummy: T) -> T {
    unsafe {
        let ret = ::std::ptr::read_volatile(&dummy);
        ::std::mem::forget(dummy);
        ret
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::thread;
    use std::time::Duration;

    fn fib(n: usize) -> usize {
        let mut i = 0;
        let mut sum = 0;
        let mut last = 0;
        let mut curr = 1usize;
        // `i + 1 < n` rather than `i < n - 1` avoids underflow when n == 0.
        while i + 1 < n {
            sum = curr.wrapping_add(last);
            last = curr;
            curr = sum;
            i += 1;
        }
        sum
    }

    // This is only here because doctests don't work with `--nocapture`.
    #[test]
    #[ignore]
    fn doctests_again() {
        println!();
        println!("fib 200: {}", bench(|| fib(200)));
        println!("fib 500: {}", bench(|| fib(500)));
        println!("fib scaling: {}", bench_scaling(|n| fib(n), 0));
        println!("reverse: {}", bench_env(vec![0; 100], |xs| xs.reverse()));
        println!("sort: {}", bench_env(vec![0; 100], |xs| xs.sort()));

        // This is fine:
        println!("fib 1: {}", bench(|| fib(500)));
        // This is NOT fine:
        println!(
            "fib 2: {}",
            bench(|| {
                fib(500);
            })
        );
        // This is also fine, but a bit weird:
        println!(
            "fib 3: {}",
            bench_env(0, |x| {
                *x = fib(500);
            })
        );
    }

    #[test]
    fn scales_o_one() {
        println!();
        let stats = bench_scaling(|_| thread::sleep(Duration::from_millis(10)), 1);
        println!("O(1): {}", stats);
        assert_eq!(stats.scaling.power, 0);
        println!(" error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert!((stats.scaling.ns_per_scale - 1e7).abs() < 1e6);
        assert!(format!("{}", stats).contains("samples"));
    }

    #[test]
    fn scales_o_n() {
        println!();
        let stats = bench_scaling(|n| thread::sleep(Duration::from_millis(10 * n as u64)), 1);
        println!("O(N): {}", stats);
        assert_eq!(stats.scaling.power, 1);
        println!(" error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert!((stats.scaling.ns_per_scale - 1e7).abs() < 1e5);

        println!("Summing integers");
        let stats = bench_scaling_gen(
            |n| (0..n as u64).collect::<Vec<_>>(),
            |v| v.iter().cloned().sum::<u64>(),
            1,
        );
        println!("O(N): {}", stats);
        println!(" error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert_eq!(stats.scaling.power, 1);
    }

    #[test]
    fn scales_o_n_log_n_looks_like_n() {
        println!("Sorting integers");
        let stats = bench_scaling_gen(
            |n| {
                (0..n as u64)
                    .map(|i| (i * 13 + 5) % 137)
                    .collect::<Vec<_>>()
            },
            |v| v.sort(),
            1,
        );
        println!("O(N log N): {}", stats);
        println!(" error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert_eq!(stats.scaling.power, 1);
    }

    #[test]
    fn scales_o_2_to_the_n() {
        println!();
        let stats = bench_scaling(|n| thread::sleep(Duration::from_nanos((1 << n) as u64)), 1);
        println!("O(2ᴺ): {}", stats);
        assert_eq!(stats.scaling.power, 0);
        assert_eq!(stats.scaling.exponential, 2);
        println!(" error: {:e}", stats.scaling.ns_per_scale - 1.0);
        assert!((stats.scaling.ns_per_scale - 1.0).abs() < 0.2);
    }

    #[test]
    fn scales_o_n_square() {
        println!();
        let stats = bench_scaling(
            |n| thread::sleep(Duration::from_millis(10 * (n * n) as u64)),
            1,
        );
        println!("O(N²): {}", stats);
        assert_eq!(stats.scaling.power, 2);
        println!(" error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert!((stats.scaling.ns_per_scale - 1e7).abs() < 1e5);
    }

    #[test]
    fn very_quick() {
        println!();
        println!("very quick: {}", bench(|| {}));
    }

    #[test]
    fn very_slow() {
        println!();
        let stats = bench(|| thread::sleep(Duration::from_millis(400)));
        println!("very slow: {}", stats);
        assert!(stats.ns_per_iter > 399.0e6);
        assert_eq!(3, stats.samples);
    }

    #[test]
    fn painfully_slow() {
        println!();
        let stats = bench(|| thread::sleep(Duration::from_secs(11)));
        println!("painfully slow: {}", stats);
        println!("ns {}", stats.ns_per_iter);
        assert!(stats.ns_per_iter > 11.0e9);
        assert_eq!(1, stats.iterations);
    }

    #[test]
    fn sadly_slow() {
        println!();
        let stats = bench(|| thread::sleep(Duration::from_secs(6)));
        println!("sadly slow: {}", stats);
        println!("ns {}", stats.ns_per_iter);
        assert!(stats.ns_per_iter > 6.0e9);
        assert_eq!(2, stats.iterations);
    }

    #[test]
    fn test_sleep() {
        println!();
        println!(
            "sleep 1 ms: {}",
            bench(|| thread::sleep(Duration::from_millis(1)))
        );
    }

    #[test]
    fn noop() {
        println!();
        println!("noop base: {}", bench(|| {}));
        println!("noop 0: {}", bench_env(vec![0u64; 0], |_| {}));
        println!("noop 16: {}", bench_env(vec![0u64; 16], |_| {}));
        println!("noop 64: {}", bench_env(vec![0u64; 64], |_| {}));
        println!("noop 256: {}", bench_env(vec![0u64; 256], |_| {}));
        println!("noop 512: {}", bench_env(vec![0u64; 512], |_| {}));
    }

    #[test]
    fn ret_value() {
        println!();
        println!(
            "no ret 32: {}",
            bench_env(vec![0u64; 32], |x| { x.clone() })
        );
        println!("return 32: {}", bench_env(vec![0u64; 32], |x| x.clone()));
        println!(
            "no ret 256: {}",
            bench_env(vec![0u64; 256], |x| { x.clone() })
        );
        println!(
            "return 256: {}",
            bench_env(vec![0u64; 256], |x| x.clone())
        );
        println!(
            "no ret 1024: {}",
            bench_env(vec![0u64; 1024], |x| { x.clone() })
        );
        println!(
            "return 1024: {}",
            bench_env(vec![0u64; 1024], |x| x.clone())
        );
        println!(
            "no ret 4096: {}",
            bench_env(vec![0u64; 4096], |x| { x.clone() })
        );
        println!(
            "return 4096: {}",
            bench_env(vec![0u64; 4096], |x| x.clone())
        );
        println!(
            "no ret 50000: {}",
            bench_env(vec![0u64; 50000], |x| { x.clone() })
        );
        println!(
            "return 50000: {}",
            bench_env(vec![0u64; 50000], |x| x.clone())
        );
    }
}

// Each time we take a sample we increase the number of iterations
// using a slow version of the Fibonacci sequence, which
// asymptotically grows exponentially, but also gives us a different
// value each time (except for repeating 1 twice, once for warmup).
//
// For our standard `bench_*` we use slow_fib(25), which was chosen to
// asymptotically match the prior behavior of the library, which grew
// by an exponential factor of 1.1.
const BENCH_SCALE_TIME: usize = 25;

fn slow_fib(scale_time: usize) -> impl Iterator<Item = usize> {
    #[derive(Debug)]
    struct SlowFib {
        which: usize,
        buffer: Vec<usize>,
    }
    impl Iterator for SlowFib {
        type Item = usize;
        fn next(&mut self) -> Option<usize> {
            let oldwhich = self.which;
            self.which = (self.which + 1) % self.buffer.len();
            self.buffer[self.which] = self.buffer[oldwhich] + self.buffer[self.which];
            Some(self.buffer[self.which])
        }
    }
    assert!(scale_time > 3);
    let mut buffer = vec![1; scale_time];
    // The buffer needs just these two zeros to make the sequence start
    // with two 1 values. The rest should be 1s.
    buffer[1] = 0;
    buffer[2] = 0;
    SlowFib { which: 0, buffer }
}

#[test]
fn test_fib() {
    // The following code was used to demonstrate that asymptotically
    // the SlowFib grows as the 1.1 power, just as the old code. It
    // differs in that it increases linearly at the beginning, which
    // leads to larger numbers earlier in the sequence. It also
    // differs in that it does not repeat any numbers in the sequence,
    // which hopefully leads to better linear regression, particularly
    // if we can only run a few iterations.
    let mut prev = 1;
    for x in slow_fib(25).take(200) {
        let rat = x as f64 / prev as f64;
        println!("ratio: {}/{} = {}", prev, x, rat);
        prev = x;
    }
    let five: Vec<_> = slow_fib(25).take(5).collect();
    assert_eq!(&five, &[1, 1, 2, 3, 4]);
    let more: Vec<_> = slow_fib(25).take(32).collect();
    assert_eq!(
        &more,
        &[
            1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
            24, 25, 26, 28, 31, 35, 40, 46,
        ]
    );
    let previous_sequence: Vec<_> = (0..32).map(|n| (1.1f64).powi(n).round() as usize).collect();
    assert_eq!(
        &previous_sequence,
        &[
            1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 11, 12,
            13, 14, 16, 17, 19,
        ]
    );
    let previous_sequence: Vec<_> = (20..40)
        .map(|n| (1.1f64).powi(n).round() as usize)
        .collect();
    assert_eq!(
        &previous_sequence,
        &[7, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 26, 28, 31, 34, 37, 41,]
    );
}