scaling/
lib.rs

/*!
A lightweight micro-benchmarking library which:

* uses linear regression to screen off constant error;
* handles benchmarks which mutate state;
* can measure simple polynomial or exponential scaling behavior;
* is very easy to use!

`scaling` is designed to work with either slow or fast functions.
It's a fork of [easybench], which was itself inspired by [criterion],
though like easybench it does much less sophisticated analysis
(no outlier detection, no HTML output).

[easybench]: https://crates.io/crates/easybench
[criterion]: https://crates.io/crates/criterion

```
use scaling::{bench,bench_env,bench_scaling};

# fn fib(_: usize) -> usize { 0 }
#
// Simple benchmarks are performed with `bench` or `bench_scaling`.
println!("fib 200: {}", bench(|| fib(200) ));
println!("fib 500: {}", bench(|| fib(500) ));
println!("fib scaling: {}", bench_scaling(|n| fib(n), 0));

// If a function needs to mutate some state, use `bench_env`.
println!("reverse: {}", bench_env(vec![0;100], |xs| xs.reverse() ));
println!("sort:    {}", bench_env(vec![0;100], |xs| xs.sort()    ));
```

Running the above yields the following results:

```none
fib 200:        50ns (R²=0.995, 20435 iterations in 68 samples)
fib 500:       144ns (R²=0.999, 7235 iterations in 57 samples)
fib scaling:   0.30ns/N    (R²=0.999, 8645 iterations in 59 samples)
reverse:        46ns (R²=0.990, 30550 iterations in 72 samples)
sort:          137ns (R²=0.991, 187129 iterations in 91 samples)
```

Easy! However, please read the [caveats](#caveats) below before using.

# Benchmarking algorithm

An *iteration* is a single execution of your code. A *sample* is a measurement,
during which your code may be run many times. In other words: taking a sample
means performing some number of iterations and measuring the total time.

The first sample we take performs only 1 iteration, but as we continue
taking samples we increase the number of iterations with increasing
rapidity. We stop either when a global time limit is reached (currently
10 seconds), or when we have collected sufficient statistics (but have
run for at least a millisecond).
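
With the default schedule, the per-sample iteration counts follow a slowed
Fibonacci-like sequence: 1, 1, 2, 3, 4, 5, …, 25, 26, 28, 31, 35, 40, 46, …
(see `slow_fib` at the bottom of this file).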

If a benchmark requires some state to run, a fresh copy of the initial
state is prepared for each iteration before the sample is taken.

Once we have the data, we perform OLS linear regression to find out how
the sample time varies with the number of iterations in the sample. The
gradient of the regression line tells us how long it takes to perform a
single iteration of the benchmark. The R² value is a measure of how much
noise there is in the data.
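
Concretely, if a sample performs n iterations and takes t nanoseconds, we
fit the line t = α + β·n; the reported time per iteration is the gradient
β = cov(n, t) / var(n), and R² = cov(n, t)² / (var(n) · var(t)).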

If the function is slow enough that only a few samples fit within the time
limit (each iteration taking on the order of seconds), the linear regression
is skipped and a simple average of the timings is used instead. For such slow
functions, any overhead will be negligible.

# Caveats

## Caveat 1: Harness overhead

**TL;DR: Compile with `--release`; the overhead is likely to be within the
noise of your benchmark.**

Any work which `scaling` does once-per-sample is ignored (this is the purpose of the linear
regression technique described above). However, work which is done once-per-iteration *will* be
counted in the final times.

* In the case of [`bench()`] this amounts to incrementing the loop counter and
  [copying the return value](#bonus-caveat-black-box).
* In the case of [`bench_env`] and [`bench_gen_env`], we also do a lookup into a big vector in
  order to get the environment for that iteration.
* If you compile your program unoptimised, there may be additional overhead.

The cost of the above operations depends on the details of your benchmark;
namely: (1) how large is the return value? and (2) does the benchmark evict
the environment vector from the CPU cache? In practice, these criteria are only
satisfied by longer-running benchmarks, making these effects hard to measure.

## Caveat 2: Pure functions

**TL;DR: Return enough information to prevent the optimiser from eliminating
code from your benchmark.**

Benchmarking pure functions involves a nasty gotcha which users should be
aware of. Consider the following benchmarks:

```
# use scaling::{bench,bench_env};
#
# fn fib(_: usize) -> usize { 0 }
#
let fib_1 = bench(|| fib(500) );                     // fine
let fib_2 = bench(|| { fib(500); } );                // spoiler: NOT fine
let fib_3 = bench_env(0, |x| { *x = fib(500); } );   // also fine, but ugly
# let _ = (fib_1, fib_2, fib_3);
```

The results are a little surprising:

```none
fib_1:        110 ns   (R²=1.000, 9131585 iterations in 144 samples)
fib_2:          0 ns   (R²=1.000, 413289203 iterations in 184 samples)
fib_3:        109 ns   (R²=1.000, 9131585 iterations in 144 samples)
```

Oh, `fib_2`, why do you lie? The answer is: `fib(500)` is pure, and its
return value is immediately thrown away, so the optimiser replaces the call
with a no-op (which clocks in at 0 ns).

What about the other two? `fib_1` looks very similar, with one exception:
the closure which we're benchmarking returns the result of the `fib(500)`
call. When it runs your code, `scaling` takes the return value and tricks the
optimiser into thinking it's going to use it for something, before throwing
it away. This is why `fib_1` is safe from having code accidentally eliminated.

In the case of `fib_3`, we actually *do* use the return value: each
iteration we take the result of `fib(500)` and store it in the iteration's
environment. This has the desired effect, but looks a bit weird.

## Bonus caveat: Black box

The function which `scaling` uses to trick the optimiser (`black_box`)
is stolen from [bencher], which [states]:

[bencher]: https://docs.rs/bencher/
[states]: https://docs.rs/bencher/0.1.2/bencher/fn.black_box.html

> NOTE: We don't have a proper black box in stable Rust. This is a workaround
> implementation, that may have a too big performance overhead, depending on
> operation, or it may fail to properly avoid having code optimized out. It
> is good enough that it is used by default.
*/

use std::f64;
use std::fmt::{self, Display, Formatter};
use std::time::*;

// The absolute most time we are willing to spend (roughly) in total on
// a benchmark, even if the fit is still poor.
const BENCH_TIME_MAX_DESPERATION: Duration = Duration::from_secs(120);
// We try to spend at most this many seconds (roughly) in total on
// each benchmark.
const BENCH_TIME_MAX: Duration = Duration::from_secs(10);
// We try to spend at least this many seconds in total on each
// benchmark.
const BENCH_TIME_MIN: Duration = Duration::from_millis(1);

/// Statistics for a benchmark run.
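///
/// A minimal sketch of reading the fields (the closure here is just an
/// illustration):
///
/// ```
/// use scaling::bench;
///
/// let stats = bench(|| (0..100u32).sum::<u32>());
/// // `ns_per_iter` is NaN if fewer than 2 samples were collected.
/// if !stats.ns_per_iter.is_nan() {
///     println!("{:.1} ns/iter over {} iterations", stats.ns_per_iter, stats.iterations);
/// }
/// ```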
#[derive(Debug, PartialEq, Clone)]
pub struct Stats {
    /// The time, in nanoseconds, per iteration. If the benchmark generated
    /// fewer than 2 samples in the allotted time then this will be NaN.
    pub ns_per_iter: f64,
    /// The coefficient of determination, R².
    ///
    /// This is an indication of how noisy the benchmark was, where 1 is
    /// good and 0 is bad. Be suspicious of values below 0.9.
    pub goodness_of_fit: f64,
    /// How many times the benchmarked code was actually run.
    pub iterations: usize,
    /// How many samples were taken (ie. how many times we allocated the
    /// environment and measured the time).
    pub samples: usize,
}

impl Display for Stats {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        if self.ns_per_iter.is_nan() {
            write!(
                f,
                "Only generated {} sample(s) - we can't fit a regression line to that! \
                 Try making your benchmark faster.",
                self.samples
            )
        } else {
            let per_iter = Duration::from_nanos(self.ns_per_iter as u64);
            let per_iter = format!("{:?}", per_iter);
            write!(
                f,
                "{:>11} (R²={:.3}, {} iterations in {} samples)",
                per_iter, self.goodness_of_fit, self.iterations, self.samples
            )
        }
    }
}

/// Run a benchmark.
///
/// The return value of `f` is not used, but we trick the optimiser into
/// thinking we're going to use it. Make sure to return enough information
/// to prevent the optimiser from eliminating code from your benchmark! (See
/// the module docs for more.)
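///
/// # Example
///
/// A minimal sketch (the closure is an arbitrary stand-in for your code):
///
/// ```
/// use scaling::bench;
///
/// // Returning the sum keeps the optimiser from discarding the work.
/// let stats = bench(|| (0..1000u64).sum::<u64>());
/// println!("sum 0..1000: {}", stats);
/// ```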
pub fn bench<F, O>(mut f: F) -> Stats
where
    F: FnMut() -> O,
{
    bench_env((), |_| f())
}

/// Run a benchmark with an environment.
///
/// The value `env` is a cloneable prototype for the "benchmark
/// environment". Each iteration receives a freshly-cloned mutable copy of
/// this environment. The time taken to clone the environment is not included
/// in the results.
///
/// NB: it's very possible that we will end up allocating many (>10,000)
/// copies of `env` at the same time. Probably best to keep it small.
///
/// See `bench` and the module docs for more.
///
/// ## Overhead
///
/// Every iteration, `bench_env` performs a lookup into a big vector in
/// order to get the environment for that iteration. If your benchmark
/// is memory-intensive then this could, in the worst case, amount to a
/// systematic cache-miss (ie. this vector would have to be fetched from
/// DRAM at the start of every iteration). In this case the results could be
/// affected by a hundred nanoseconds. This is a worst-case scenario however,
/// and I haven't actually been able to trigger it in practice... but it's
/// good to be aware of the possibility.
pub fn bench_env<F, I, O>(env: I, f: F) -> Stats
where
    F: FnMut(&mut I) -> O,
    I: Clone,
{
    bench_gen_env(move || env.clone(), f)
}

/// Run a benchmark with a generated environment.
///
/// The function `gen_env` creates the "benchmark environment" for the
/// computation. Each iteration receives a freshly-created environment. The
/// time taken to create the environment is not included in the results.
///
/// NB: it's very possible that we will end up generating many (>10,000)
/// environments at the same time. Probably best to keep them small.
///
/// See `bench` and the module docs for more.
///
/// ## Overhead
///
/// Every iteration, `bench_gen_env` performs a lookup into a big vector
/// in order to get the environment for that iteration. If your benchmark
/// is memory-intensive then this could, in the worst case, amount to a
/// systematic cache-miss (ie. this vector would have to be fetched from
/// DRAM at the start of every iteration). In this case the results could be
/// affected by a hundred nanoseconds. This is a worst-case scenario however,
/// and I haven't actually been able to trigger it in practice... but it's
/// good to be aware of the possibility.
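///
/// # Example
///
/// A minimal sketch, using a generator so each iteration gets an
/// independently constructed environment:
///
/// ```
/// use scaling::bench_gen_env;
///
/// // The generator runs off the clock; only `pop` is timed.
/// let stats = bench_gen_env(|| vec![1u64; 64], |xs| xs.pop());
/// println!("pop: {}", stats);
/// ```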
pub fn bench_gen_env<G, F, I, O>(mut gen_env: G, mut f: F) -> Stats
where
    G: FnMut() -> I,
    F: FnMut(&mut I) -> O,
{
    let mut data = Vec::new();
    // The time we started the benchmark (not used in results)
    let bench_start = Instant::now();

    // Collect data until BENCH_TIME_MAX is reached.
    for iters in slow_fib(BENCH_SCALE_TIME) {
        // Prepare the environments - one per iteration
        let mut xs = std::iter::repeat_with(&mut gen_env)
            .take(iters)
            .collect::<Vec<I>>();
        // Start the clock
        let iter_start = Instant::now();
        // We iterate over `&mut xs` rather than draining it, because we
        // don't want to drop the env values until after the clock has stopped.
        for x in &mut xs {
            // Run the code and pretend to use the output
            pretend_to_use(f(x));
        }
        let time = iter_start.elapsed();
        data.push((iters, time));

        let elapsed = bench_start.elapsed();
        if elapsed > BENCH_TIME_MIN && data.len() > 3 {
            // If the first iter in a sample is consistently slow, that's fine -
            // that's why we do the linear regression. If the first sample is slower
            // than the rest, however, that's not fine. Therefore, we discard the
            // first sample as a cache-warming exercise.

            // Compute some stats
            let (grad, r2) = regression(&data[1..]);
            let stats = Stats {
                ns_per_iter: grad,
                goodness_of_fit: r2,
                iterations: data[1..].iter().map(|&(x, _)| x).sum(),
                samples: data[1..].len(),
            };
            if elapsed > BENCH_TIME_MAX || r2 > 0.99 {
                return stats;
            }
        } else if elapsed > BENCH_TIME_MAX {
            // Too slow to collect enough samples for a regression: fall
            // back to a simple average (overhead is negligible here).
            let total_time: f64 = data.iter().map(|(_, t)| t.as_nanos() as f64).sum();
            let iterations = data.iter().map(|&(x, _)| x).sum();
            return Stats {
                ns_per_iter: total_time / iterations as f64,
                iterations,
                goodness_of_fit: 0.0,
                samples: data.len(),
            };
        }
    }
    unreachable!() // slow_fib is an infinite iterator
}

/// Statistics for a benchmark run determining the scaling of a function.
#[derive(Debug, PartialEq, Clone)]
pub struct ScalingStats {
    pub scaling: Scaling,
    pub goodness_of_fit: f64,
    /// How many times the benchmarked code was actually run.
    pub iterations: usize,
    /// How many samples were taken (ie. how many times we allocated the
    /// environment and measured the time).
    pub samples: usize,
}
/// The timing and scaling results (without statistics) for a benchmark.
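///
/// A small sketch of how a `Scaling` value is displayed (the numbers are
/// made up for illustration):
///
/// ```
/// use scaling::Scaling;
///
/// let s = Scaling { power: 2, exponential: 1, ns_per_scale: 123.0 };
/// // An exponential base of 1 means the scaling is purely polynomial.
/// assert!(format!("{}", s).contains("/N²"));
/// ```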
#[derive(Debug, PartialEq, Clone)]
pub struct Scaling {
    /// The power 𝑃 of the polynomial factor 𝑁ᴾ.
    /// If this is 2, for instance, you have an O(N²) algorithm.
    pub power: usize,
    /// The base 𝐸 of the exponential factor 𝐸ᴺ. If this is 1, the
    /// scaling is purely polynomial.
    pub exponential: usize,
    /// The time, in nanoseconds, per scaled size of the problem. If
    /// the problem scales as O(N²) for instance, this is the number
    /// of nanoseconds per N².
    pub ns_per_scale: f64,
}

impl Display for ScalingStats {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        write!(
            f,
            "{} (R²={:.3}, {} iterations in {} samples)",
            self.scaling, self.goodness_of_fit, self.iterations, self.samples
        )
    }
}
impl Display for Scaling {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        let per_iter = Duration::from_nanos(self.ns_per_scale as u64);
        let per_iter = if self.ns_per_scale < 1.0 {
            format!("{:.2}ns", self.ns_per_scale)
        } else if self.ns_per_scale < 10.0 {
            format!("{:.1}ns", self.ns_per_scale)
        } else {
            format!("{:?}", per_iter)
        };
        if self.exponential == 1 {
            match self.power {
                0 => write!(f, "{:>8}/iter", per_iter),
                1 => write!(f, "{:>8}/N   ", per_iter),
                2 => write!(f, "{:>8}/N²  ", per_iter),
                3 => write!(f, "{:>8}/N³  ", per_iter),
                4 => write!(f, "{:>8}/N⁴  ", per_iter),
                5 => write!(f, "{:>8}/N⁵  ", per_iter),
                6 => write!(f, "{:>8}/N⁶  ", per_iter),
                7 => write!(f, "{:>8}/N⁷  ", per_iter),
                8 => write!(f, "{:>8}/N⁸  ", per_iter),
                9 => write!(f, "{:>8}/N⁹  ", per_iter),
                _ => write!(f, "{:>8}/N^{}", per_iter, self.power),
            }
        } else {
            match self.power {
                0 => write!(f, "{:>8}/{}ᴺ", per_iter, self.exponential),
                1 => write!(f, "{:>8}/(N{}ᴺ)   ", per_iter, self.exponential),
                2 => write!(f, "{:>8}/(N²{}ᴺ)  ", per_iter, self.exponential),
                3 => write!(f, "{:>8}/(N³{}ᴺ)  ", per_iter, self.exponential),
                4 => write!(f, "{:>8}/(N⁴{}ᴺ)  ", per_iter, self.exponential),
                5 => write!(f, "{:>8}/(N⁵{}ᴺ)  ", per_iter, self.exponential),
                6 => write!(f, "{:>8}/(N⁶{}ᴺ)  ", per_iter, self.exponential),
                7 => write!(f, "{:>8}/(N⁷{}ᴺ)  ", per_iter, self.exponential),
                8 => write!(f, "{:>8}/(N⁸{}ᴺ)  ", per_iter, self.exponential),
                9 => write!(f, "{:>8}/(N⁹{}ᴺ)  ", per_iter, self.exponential),
                _ => write!(f, "{:>8}/(N^{}{}ᴺ)", per_iter, self.power, self.exponential),
            }
        }
    }
}

/// Benchmark the power-law scaling of the function.
///
/// This function assumes that the function scales as 𝑶(𝑁ᴾ𝐸ᴺ).
/// It considers higher powers for faster functions, and tries to
/// keep the measuring time around 10s. It determines the power 𝑃 and
/// exponential base 𝐸 by choosing the values with the best R²
/// goodness-of-fit parameter.
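///
/// # Example
///
/// A minimal sketch, mirroring the `scales_o_n` test below (sleeping n
/// milliseconds is linear in n, so the measured power should come out as 1):
///
/// ```
/// use scaling::bench_scaling;
/// use std::thread;
/// use std::time::Duration;
///
/// let stats = bench_scaling(|n| thread::sleep(Duration::from_millis(n as u64)), 1);
/// println!("sleep: {}", stats);
/// ```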
pub fn bench_scaling<F, O>(f: F, nmin: usize) -> ScalingStats
where
    F: Fn(usize) -> O,
{
    let mut data = Vec::new();
    // The time we started the benchmark (not used in results)
    let bench_start = Instant::now();

    // Collect data until BENCH_TIME_MAX is reached.
    for iters in slow_fib(BENCH_SCALE_TIME) {
        // Pick the problem size for this sample.
        let n = if nmin > 0 { iters * nmin } else { iters };
        // Generate a Vec holding n's to hopefully keep the optimizer
        // from lifting the function out of the loop, as it could if
        // we had `f(n)` in there, and `f` were inlined or `const`.
        let xs = vec![n; iters];
        // Start the clock
        let iter_start = Instant::now();
        for x in xs.into_iter() {
            // Run the code and pretend to use the output
            pretend_to_use(f(x));
        }
        let time = iter_start.elapsed();
        data.push((n, iters, time));

        let elapsed = bench_start.elapsed();
        if elapsed > BENCH_TIME_MIN {
            let stats = compute_scaling_gen(&data);
            if elapsed > BENCH_TIME_MAX_DESPERATION
                || (elapsed > BENCH_TIME_MAX && stats.goodness_of_fit > 0.0)
                || stats.goodness_of_fit > 0.99
            {
                return stats;
            }
        }
    }
    unreachable!() // slow_fib is an infinite iterator
}

/// Benchmark the power-law scaling of the function with generated input
///
/// This function is like [`bench_scaling`], but uses a generating function
/// to construct the input to your benchmarked function.
///
/// This function assumes that the function scales as 𝑶(𝑁ᴾ𝐸ᴺ).
/// It considers higher powers for faster functions, and tries to
/// keep the measuring time around 10s. It determines the power 𝑃 and
/// exponential base 𝐸 by choosing the values with the best R²
/// goodness-of-fit parameter.
///
/// # Example
/// ```
/// use scaling::bench_scaling_gen;
///
/// let summation = bench_scaling_gen(|n| vec![3.0; n], |v| v.iter().cloned().sum::<f64>(),0);
/// println!("summation: {}", summation);
/// assert_eq!(1, summation.scaling.power); // summation must run in linear time.
/// ```
/// which gives output
/// ```none
/// summation:     43ns/N    (R²=0.996, 445 iterations in 29 samples)
/// ```
pub fn bench_scaling_gen<G, F, I, O>(mut gen_env: G, f: F, nmin: usize) -> ScalingStats
where
    G: FnMut(usize) -> I,
    F: Fn(&mut I) -> O,
{
    let mut data = Vec::new();
    // The time we started the benchmark (not used in results)
    let bench_start = Instant::now();

    let mut am_slow = false;
    // Collect data until BENCH_TIME_MAX is reached.
    for iters in slow_fib(BENCH_SCALE_TIME) {
        // Pick the problem size for this sample.
        let n = if nmin > 0 { iters * nmin } else { iters };
        // For slow functions, run only 1 or 2 iterations per sample.
        let iters = if am_slow { 1 + (iters & 1) } else { iters };
        // Prepare the environments - one per iteration
        let mut xs = std::iter::repeat_with(|| gen_env(n))
            .take(iters)
            .collect::<Vec<I>>();
        // Start the clock
        let iter_start = Instant::now();
        // We iterate over `&mut xs` rather than draining it, because we
        // don't want to drop the env values until after the clock has stopped.
        for x in &mut xs {
            // Run the code and pretend to use the output
            pretend_to_use(f(x));
        }
        let time = iter_start.elapsed();
        if !am_slow && iters == 1 && time > Duration::from_micros(1) {
            am_slow = true;
        }
        data.push((n, iters, time));

        let elapsed = bench_start.elapsed();
        if elapsed > BENCH_TIME_MIN {
            let stats = compute_scaling_gen(&data);
            if elapsed > BENCH_TIME_MAX_DESPERATION
                || (elapsed > BENCH_TIME_MAX && stats.goodness_of_fit > 0.0)
                || stats.goodness_of_fit > 0.99
            {
                return stats;
            }
        }
    }
    unreachable!() // slow_fib is an infinite iterator
}

/// This function assumes that the function scales as 𝑶(𝑁ᴾ𝐸ᴺ). It measures the scaling
/// by fitting each candidate (𝑃, 𝐸) and picking the best R² goodness-of-fit parameter.
/// If it believes itself clueless, the goodness_of_fit is set to zero.
fn compute_scaling_gen(data: &[(usize, usize, Duration)]) -> ScalingStats {
    let num_n = {
        let mut ns = data.iter().map(|(n, _, _)| *n).collect::<Vec<_>>();
        ns.dedup();
        ns.len()
    };

    // If the first iter in a sample is consistently slow, that's fine -
    // that's why we do the linear regression. If the first sample is slower
    // than the rest, however, that's not fine. Therefore, we discard the
    // first sample as a cache-warming exercise.

    // Compute some stats for each of several different
    // powers, to see which seems most accurate.
    let mut stats = Vec::new();
    let mut best = 0;
    let mut second_best = 0;
    for i in 1..num_n / 2 + 2 {
        for power in 0..i {
            let exponential = i - power;
            // Scale each sample to N^P * E^N * iters, so that a linear fit
            // against time directly yields ns per scaled unit.
            let pdata: Vec<_> = data[1..]
                .iter()
                .map(|&(n, i, t)| {
                    (
                        (exponential as f64).powi(n as i32)
                            * (n as f64).powi(power as i32)
                            * (i as f64),
                        t,
                    )
                })
                .collect();
            let (grad, r2) = fregression(&pdata);
            stats.push(ScalingStats {
                scaling: Scaling {
                    power,
                    exponential,
                    ns_per_scale: grad,
                },
                goodness_of_fit: r2,
                iterations: data[1..].iter().map(|&(x, _, _)| x).sum(),
                samples: data[1..].len(),
            });
            if r2 > stats[best].goodness_of_fit || stats[best].goodness_of_fit.is_nan() {
                second_best = best;
                best = stats.len() - 1;
            }
        }
    }

    // With too few distinct problem sizes, or a tie between the two best
    // fits, we can't distinguish the candidates: report a zero fit.
    if num_n < 10 || stats[second_best].goodness_of_fit == stats[best].goodness_of_fit {
        stats[best].goodness_of_fit = 0.0;
    }
    stats[best].clone()
}

/// Compute the OLS linear regression line for the given data set, returning
/// the line's gradient and R². Requires at least 2 samples.
fn regression(data: &[(usize, Duration)]) -> (f64, f64) {
    if data.len() < 2 {
        return (f64::NAN, f64::NAN);
    }
    // Do all the arithmetic using f64, because the squared numbers can
    // overflow integer arithmetic if the tests are too fast (so we run
    // too many iterations).
    let data: Vec<_> = data
        .iter()
        .map(|&(x, y)| (x as f64, y.as_nanos() as f64))
        .collect();
    let n = data.len() as f64;
    let nxbar = data.iter().map(|&(x, _)| x).sum::<f64>();
    let nybar = data.iter().map(|&(_, y)| y).sum::<f64>();
    let nxxbar = data.iter().map(|&(x, _)| x * x).sum::<f64>();
    let nyybar = data.iter().map(|&(_, y)| y * y).sum::<f64>();
    let nxybar = data.iter().map(|&(x, y)| x * y).sum::<f64>();
    let ncovar = nxybar - ((nxbar * nybar) / n);
    let nxvar = nxxbar - ((nxbar * nxbar) / n);
    let nyvar = nyybar - ((nybar * nybar) / n);
    let gradient = ncovar / nxvar;
    let r2 = (ncovar * ncovar) / (nxvar * nyvar);
    assert!(r2.is_nan() || r2 <= 1.0);
    (gradient, r2)
}

/// Compute the OLS linear regression line for the given data set, returning
/// the line's gradient and R². Requires at least 2 samples.
fn fregression(data: &[(f64, Duration)]) -> (f64, f64) {
    if data.len() < 2 {
        return (f64::NAN, f64::NAN);
    }
    // Do all the arithmetic using f64, because the squared numbers can
    // overflow integer arithmetic if the tests are too fast (so we run
    // too many iterations).
    let data: Vec<_> = data
        .iter()
        .map(|&(x, y)| (x, y.as_nanos() as f64))
        .collect();
    let n = data.len() as f64;
    let xbar = data.iter().map(|&(x, _)| x).sum::<f64>() / n;
    let xvar = data.iter().map(|&(x, _)| (x - xbar).powi(2)).sum::<f64>() / n;
    let ybar = data.iter().map(|&(_, y)| y).sum::<f64>() / n;
    let yvar = data.iter().map(|&(_, y)| (y - ybar).powi(2)).sum::<f64>() / n;
    let covar = data
        .iter()
        .map(|&(x, y)| (x - xbar) * (y - ybar))
        .sum::<f64>()
        / n;
    let gradient = covar / xvar;
    let r2 = covar.powi(2) / (xvar * yvar);
    assert!(r2.is_nan() || r2 <= 1.0);
    (gradient, r2)
}

// Stolen from `bencher`, where it's known as `black_box`.
//
// NOTE: We don't have a proper black box in stable Rust. This is a workaround
// implementation, that may have a too big performance overhead, depending
// on operation, or it may fail to properly avoid having code optimized
// out. It is good enough that it is used by default.
//
// A function that is opaque to the optimizer, to allow benchmarks to pretend
// to use outputs to assist in avoiding dead-code elimination.
fn pretend_to_use<T>(dummy: T) -> T {
    unsafe {
        let ret = ::std::ptr::read_volatile(&dummy);
        ::std::mem::forget(dummy);
        ret
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::thread;
    use std::time::Duration;

    fn fib(n: usize) -> usize {
        let mut i = 0;
        let mut sum = 0;
        let mut last = 0;
        let mut curr = 1usize;
        while i < n - 1 {
            sum = curr.wrapping_add(last);
            last = curr;
            curr = sum;
            i += 1;
        }
        sum
    }

    // This is only here because doctests don't work with `--nocapture`.
    #[test]
    #[ignore]
    fn doctests_again() {
        println!();
        println!("fib 200: {}", bench(|| fib(200)));
        println!("fib 500: {}", bench(|| fib(500)));
        println!("fib scaling: {}", bench_scaling(|n| fib(n), 0));
        println!("reverse: {}", bench_env(vec![0; 100], |xs| xs.reverse()));
        println!("sort:    {}", bench_env(vec![0; 100], |xs| xs.sort()));

        // This is fine:
        println!("fib 1:   {}", bench(|| fib(500)));
        // This is NOT fine:
        println!(
            "fib 2:   {}",
            bench(|| {
                fib(500);
            })
        );
        // This is also fine, but a bit weird:
        println!(
            "fib 3:   {}",
            bench_env(0, |x| {
                *x = fib(500);
            })
        );
    }

    #[test]
    fn scales_o_one() {
        println!();
        let stats = bench_scaling(|_| thread::sleep(Duration::from_millis(10)), 1);
        println!("O(1): {}", stats);
        assert_eq!(stats.scaling.power, 0);
        println!("   error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert!((stats.scaling.ns_per_scale - 1e7).abs() < 1e6);
        assert!(format!("{}", stats).contains("samples"));
    }

    #[test]
    fn scales_o_n() {
        println!();
        let stats = bench_scaling(|n| thread::sleep(Duration::from_millis(10 * n as u64)), 1);
        println!("O(N): {}", stats);
        assert_eq!(stats.scaling.power, 1);
        println!("   error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert!((stats.scaling.ns_per_scale - 1e7).abs() < 1e5);

        println!("Summing integers");
        let stats = bench_scaling_gen(
            |n| (0..n as u64).collect::<Vec<_>>(),
            |v| v.iter().cloned().sum::<u64>(),
            1,
        );
        println!("O(N): {}", stats);
        println!("   error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert_eq!(stats.scaling.power, 1);
    }

    #[test]
    fn scales_o_n_log_n_looks_like_n() {
        println!("Sorting integers");
        let stats = bench_scaling_gen(
            |n| {
                (0..n as u64)
                    .map(|i| (i * 13 + 5) % 137)
                    .collect::<Vec<_>>()
            },
            |v| v.sort(),
            1,
        );
        println!("O(N log N): {}", stats);
        println!("   error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert_eq!(stats.scaling.power, 1);
    }

    #[test]
    fn scales_o_2_to_the_n() {
        println!();
        let stats = bench_scaling(|n| thread::sleep(Duration::from_nanos((1 << n) as u64)), 1);
        println!("O(2ᴺ): {}", stats);
        assert_eq!(stats.scaling.power, 0);
        assert_eq!(stats.scaling.exponential, 2);
        println!("   error: {:e}", stats.scaling.ns_per_scale - 1.0);
        assert!((stats.scaling.ns_per_scale - 1.0).abs() < 0.2);
    }

    #[test]
    fn scales_o_n_square() {
        println!();
        let stats = bench_scaling(
            |n| thread::sleep(Duration::from_millis(10 * (n * n) as u64)),
            1,
        );
        println!("O(N²): {}", stats);
        assert_eq!(stats.scaling.power, 2);
        println!("   error: {:e}", stats.scaling.ns_per_scale - 1e7);
        assert!((stats.scaling.ns_per_scale - 1e7).abs() < 1e5);
    }

    #[test]
    fn very_quick() {
        println!();
        println!("very quick: {}", bench(|| {}));
    }

    #[test]
    fn very_slow() {
        println!();
        let stats = bench(|| thread::sleep(Duration::from_millis(400)));
        println!("very slow: {}", stats);
        assert!(stats.ns_per_iter > 399.0e6);
        assert_eq!(3, stats.samples);
    }

    #[test]
    fn painfully_slow() {
        println!();
        let stats = bench(|| thread::sleep(Duration::from_secs(11)));
        println!("painfully slow: {}", stats);
        println!("ns {}", stats.ns_per_iter);
        assert!(stats.ns_per_iter > 11.0e9);
        assert_eq!(1, stats.iterations);
    }

    #[test]
    fn sadly_slow() {
        println!();
        let stats = bench(|| thread::sleep(Duration::from_secs(6)));
        println!("sadly slow: {}", stats);
        println!("ns {}", stats.ns_per_iter);
        assert!(stats.ns_per_iter > 6.0e9);
        assert_eq!(2, stats.iterations);
    }

    #[test]
    fn test_sleep() {
        println!();
        println!(
            "sleep 1 ms: {}",
            bench(|| thread::sleep(Duration::from_millis(1)))
        );
    }

    #[test]
    fn noop() {
        println!();
        println!("noop base: {}", bench(|| {}));
        println!("noop 0:    {}", bench_env(vec![0u64; 0], |_| {}));
        println!("noop 16:   {}", bench_env(vec![0u64; 16], |_| {}));
        println!("noop 64:   {}", bench_env(vec![0u64; 64], |_| {}));
        println!("noop 256:  {}", bench_env(vec![0u64; 256], |_| {}));
        println!("noop 512:  {}", bench_env(vec![0u64; 512], |_| {}));
    }

    #[test]
    fn ret_value() {
        println!();
        println!(
            "no ret 32:    {}",
            bench_env(vec![0u64; 32], |x| { x.clone() })
        );
        println!("return 32:    {}", bench_env(vec![0u64; 32], |x| x.clone()));
        println!(
            "no ret 256:   {}",
            bench_env(vec![0u64; 256], |x| { x.clone() })
        );
        println!(
            "return 256:   {}",
            bench_env(vec![0u64; 256], |x| x.clone())
        );
        println!(
            "no ret 1024:  {}",
            bench_env(vec![0u64; 1024], |x| { x.clone() })
        );
        println!(
            "return 1024:  {}",
            bench_env(vec![0u64; 1024], |x| x.clone())
        );
        println!(
            "no ret 4096:  {}",
            bench_env(vec![0u64; 4096], |x| { x.clone() })
        );
        println!(
            "return 4096:  {}",
            bench_env(vec![0u64; 4096], |x| x.clone())
        );
        println!(
            "no ret 50000: {}",
            bench_env(vec![0u64; 50000], |x| { x.clone() })
        );
        println!(
            "return 50000: {}",
            bench_env(vec![0u64; 50000], |x| x.clone())
        );
    }
}

// Each time we take a sample we increase the number of iterations
// using a slowed version of the Fibonacci sequence, which
// asymptotically grows exponentially, but also gives us a different
// value each time (except for repeating 1 twice, once for warmup).

// For our standard `bench_*` we use slow_fib(25), which was chosen to
// asymptotically match the prior behavior of the library, which grew
// the iteration count by a factor of 1.1 per sample.
const BENCH_SCALE_TIME: usize = 25;

fn slow_fib(scale_time: usize) -> impl Iterator<Item = usize> {
    #[derive(Debug)]
    struct SlowFib {
        which: usize,
        buffer: Vec<usize>,
    }
    impl Iterator for SlowFib {
        type Item = usize;
        fn next(&mut self) -> Option<usize> {
            let oldwhich = self.which;
            self.which = (self.which + 1) % self.buffer.len();
            self.buffer[self.which] = self.buffer[oldwhich] + self.buffer[self.which];
            Some(self.buffer[self.which])
        }
    }
    assert!(scale_time > 3);
    let mut buffer = vec![1; scale_time];
    // The buffer needs just these two zeros to make the sequence start
    // with two 1 values; the rest should be 1s.
    buffer[1] = 0;
    buffer[2] = 0;
    SlowFib { which: 0, buffer }
}

#[test]
fn test_fib() {
    // The following code demonstrates that asymptotically the SlowFib
    // grows by a factor of 1.1 per step, just as the old code did. It
    // differs in that it increases linearly at the beginning, which
    // leads to larger numbers earlier in the sequence. It also
    // differs in that it does not repeat any numbers in the sequence,
    // which hopefully leads to better linear regression, particularly
    // if we can only run a few iterations.
    let mut prev = 1;
    for x in slow_fib(25).take(200) {
        let rat = x as f64 / prev as f64;
        println!("ratio: {}/{} = {}", prev, x, rat);
        prev = x;
    }
    let five: Vec<_> = slow_fib(25).take(5).collect();
    assert_eq!(&five, &[1, 1, 2, 3, 4]);
    let more: Vec<_> = slow_fib(25).take(32).collect();
    assert_eq!(
        &more,
        &[
            1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
            24, 25, 26, 28, 31, 35, 40, 46,
        ]
    );
    let previous_sequence: Vec<_> = (0..32).map(|n| (1.1f64).powi(n).round() as usize).collect();
    assert_eq!(
        &previous_sequence,
        &[
            1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 11, 12, 13,
            14, 16, 17, 19,
        ]
    );
    let previous_sequence: Vec<_> = (20..40)
        .map(|n| (1.1f64).powi(n).round() as usize)
        .collect();
    assert_eq!(
        &previous_sequence,
        &[7, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 26, 28, 31, 34, 37, 41,]
    );
}