Crate canbench_rs


canbench is a tool for benchmarking canisters on the Internet Computer.

§Quickstart

This example is also available to tinker with in the examples directory. See the fibonacci example.

§1. Install the `canbench` binary.

The canbench binary is what runs your canister’s benchmarks.

cargo install canbench

§2. Add optional dependency to `Cargo.toml`

Typically you do not want your benchmarks to be part of your canister when deploying it to the Internet Computer. Therefore, we add canbench-rs as an optional dependency so that it’s only included when running benchmarks. For more information, see the Cargo documentation on optional dependencies.

canbench-rs = { version = "x.y.z", optional = true }

§3. Add a configuration to `canbench.yml`

The canbench.yml configuration file tells canbench how to build and run your canister. Below is a typical configuration. Note that we’re compiling the canister with the canbench-rs feature so that the benchmarking logic is included in the Wasm.

build_cmd:
  cargo build --release --target wasm32-unknown-unknown --locked --features canbench-rs

wasm_path:
  ./target/wasm32-unknown-unknown/release/<YOUR_CANISTER>.wasm

§Init Args

Init args can be specified using the init_args key in the configuration file:

init_args:
  hex: 4449444c0001710568656c6c6f
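
For orientation, the hex string above decodes as the Candid binary encoding of a single text argument, "hello". The std-only sketch below unpacks it by hand to show the layout. The helper names `decode_hex` and `single_text_arg` are hypothetical and not part of canbench, and the parser only handles this one shape (a single `text` value shorter than 128 bytes); for real use, a Candid library is the better tool.

```rust
/// The example value from the `init_args` configuration above.
const INIT_ARGS_HEX: &str = "4449444c0001710568656c6c6f";

/// Hypothetical helper: decodes a hex string into bytes.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(s.get(i..i + 2)?, 16).ok())
        .collect()
}

/// Hypothetical helper: parses a Candid message that carries exactly one
/// `text` argument whose length fits in a single LEB128 byte.
fn single_text_arg(bytes: &[u8]) -> Option<String> {
    // Magic prefix "DIDL".
    let rest = bytes.strip_prefix(b"DIDL")?;
    // 0x00: empty type table, 0x01: one argument, 0x71: primitive type `text`.
    let rest = rest.strip_prefix(&[0x00u8, 0x01, 0x71])?;
    // One length byte followed by the UTF-8 payload.
    let (&len, payload) = rest.split_first()?;
    if len >= 0x80 || payload.len() != len as usize {
        return None;
    }
    String::from_utf8(payload.to_vec()).ok()
}
```

Calling `single_text_arg(&decode_hex(INIT_ARGS_HEX)?)` recovers the string that the canister's init method would receive.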

§Stable Memory

A file can be specified to be loaded in the canister’s stable memory after initialization.

stable_memory:
  file:
    stable_memory.bin

Contents of the stable memory file are loaded after the call to the canister's init method. Therefore, changes made to stable memory in the init method would be overwritten.

§4. Start benching! 🏋🏽

Let’s say we have a canister that exposes a query computing the n-th Fibonacci number. Here’s what that query can look like:

#[ic_cdk::query]
fn fibonacci(n: u32) -> u32 {
    if n == 0 {
        return 0;
    } else if n == 1 {
        return 1;
    }

    let mut a = 0;
    let mut b = 1;
    let mut result = 0;

    for _ in 2..=n {
        result = a + b;
        a = b;
        b = result;
    }

    result
}
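
One detail worth knowing before benchmarking larger inputs: the result is a u32, so the addition overflows once the Fibonacci number no longer fits. The sketch below is a plain-Rust variant without the `#[ic_cdk::query]` attribute so that it can be tried outside a canister; `fibonacci_checked` is a hypothetical name, not part of the example canister.

```rust
/// Hypothetical overflow-aware variant of the iterative implementation.
/// Returns `None` when the n-th Fibonacci number does not fit in a `u32`.
fn fibonacci_checked(n: u32) -> Option<u32> {
    if n == 0 {
        return Some(0);
    }

    let (mut a, mut b) = (0u32, 1u32);
    for _ in 2..=n {
        // `checked_add` returns `None` on overflow instead of wrapping.
        let next = a.checked_add(b)?;
        a = b;
        b = next;
    }
    Some(b)
}
```

Under this sketch, the inputs used in the benchmarks (20 and 45) are within range; the first input that overflows is 48.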

Now, let’s add some benchmarks to this query:

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;


    #[bench]
    fn fibonacci_20() {
        // Prevent the compiler from optimizing the call and propagating constants.
        std::hint::black_box(fibonacci(std::hint::black_box(20)));
    }

    #[bench]
    fn fibonacci_45() {
        // Prevent the compiler from optimizing the call and propagating constants.
        std::hint::black_box(fibonacci(std::hint::black_box(45)));
    }
}

Run canbench. You’ll see an output that looks similar to this:

$ canbench

---------------------------------------------------

Benchmark: fibonacci_20 (new)
  total:
    instructions: 2301 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 0 pages (new)

---------------------------------------------------

Benchmark: fibonacci_45 (new)
  total:
    instructions: 3088 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 0 pages (new)

---------------------------------------------------

Executed 2 of 2 benchmarks.
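
The heap_increase and stable_memory_increase figures are reported in pages. Assuming these are standard 64 KiB WebAssembly pages (an assumption; the page size isn't stated above), the conversion to and from bytes can be sketched as follows. The constant and function names are hypothetical.

```rust
/// Assumed page size: 64 KiB, the standard WebAssembly page size.
const PAGE_SIZE_BYTES: u64 = 64 * 1024;

/// Converts a page count, as shown in the report, to bytes.
fn pages_to_bytes(pages: u64) -> u64 {
    pages * PAGE_SIZE_BYTES
}

/// Number of whole pages needed to hold `bytes` bytes (rounded up).
fn bytes_to_pages(bytes: u64) -> u64 {
    (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES
}
```

Because memory grows in whole pages, a benchmark that allocates only a few bytes can still report an increase of 0 pages if the memory that is already allocated has room for them.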

§5. Track performance regressions

Notice that canbench reported the above benchmarks as “new”. canbench allows you to persist the results of these benchmarks. In subsequent runs, canbench reports the performance relative to the last persisted run.

Let’s first persist the results above by running canbench again, but with the persist flag:

$ canbench --persist
# optionally add `--csv` to generate a CSV report
$ canbench --persist --csv
...
---------------------------------------------------

Executed 2 of 2 benchmarks.
Successfully persisted results to canbench_results.yml

Now, if we run canbench again, it runs the benchmarks and additionally reports that no changes in performance were detected.

$ canbench
    Finished release [optimized] target(s) in 0.34s

---------------------------------------------------

Benchmark: fibonacci_20
  total:
    instructions: 2301 (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: fibonacci_45
  total:
    instructions: 3088 (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Executed 2 of 2 benchmarks.

Let’s try swapping out our implementation of fibonacci with an implementation that’s miserably inefficient. Replace the fibonacci function defined previously with the following:

#[ic_cdk::query]
fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

Running canbench again, we see that it detects and reports a regression.

$ canbench

---------------------------------------------------

Benchmark: fibonacci_20
  total:
    instructions: 337.93 K (regressed by 14586.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: fibonacci_45
  total:
    instructions: 56.39 B (regressed by 1826095830.76%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Executed 2 of 2 benchmarks.
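
The percentages in this report are consistent with a plain relative change against the persisted value. The sketch below is an assumption about the arithmetic, not canbench's actual code, and `percent_change` is a hypothetical name.

```rust
/// Relative change of `current` against `previous`, in percent.
/// Returns `None` when there is no meaningful baseline (previous == 0).
fn percent_change(previous: u64, current: u64) -> Option<f64> {
    if previous == 0 {
        return None;
    }
    Some((current as f64 - previous as f64) / previous as f64 * 100.0)
}
```

Plugging in the fibonacci_20 figures (2301 before, roughly 337,930 after) gives a value close to the reported 14586.14%. The match is only approximate because the report rounds the instruction count to 337.93 K.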

Apparently, the recursive implementation is many orders of magnitude more expensive than the iterative implementation 😱 Good thing we found out before deploying this implementation to production.
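
One way to see where the blow-up comes from is to count how many times the recursive version calls itself. The std-only sketch below does that without actually recursing; `naive_call_count` is a hypothetical helper written for illustration.

```rust
/// Number of calls the naive recursive `fibonacci(n)` makes in total,
/// including the initial call. Each call with n >= 2 makes two further
/// calls, so calls(n) = 1 + calls(n - 1) + calls(n - 2).
fn naive_call_count(n: u32) -> u64 {
    if n < 2 {
        return 1;
    }

    // `prev` and `curr` hold calls(i - 2) and calls(i - 1).
    let (mut prev, mut curr) = (1u64, 1u64);
    for _ in 2..=n {
        let next = 1 + prev + curr;
        prev = curr;
        curr = next;
    }
    curr
}
```

For n = 20 this gives 21,891 calls, and for n = 45 about 3.67 billion, whereas the iterative version runs its loop body n - 1 times. The call count grows exponentially with n, which matches the jump in the instruction counts above.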

Notice that fibonacci_45 took more than 50 billion instructions, which is substantially more than the instruction limit for a single message execution on the Internet Computer. canbench runs benchmarks in an environment that gives them up to 10 trillion instructions.

§Additional Examples

The examples below use the following canister code, which you can also find in the examples directory. This canister defines a simple state as well as a pre_upgrade function that stores that state into stable memory.

use candid::{CandidType, Encode};
use ic_cdk_macros::pre_upgrade;
use std::cell::RefCell;

#[derive(CandidType)]
struct User {
    name: String,
}

#[derive(Default, CandidType)]
struct State {
    users: std::collections::BTreeMap<u64, User>,
}

thread_local! {
    static STATE: RefCell<State> = RefCell::new(State::default());
}

#[pre_upgrade]
fn pre_upgrade() {
    // Serialize state.
    let bytes = STATE.with(|s| Encode!(s).unwrap());

    // Write to stable memory.
    ic_cdk::api::stable::StableWriter::default()
        .write(&bytes)
        .unwrap();
}

§Excluding setup code

Let’s say we want to benchmark how long it takes to run the pre_upgrade function. We can define the following benchmark:

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;


    #[bench]
    fn pre_upgrade_bench() {
        // Some function that fills the state with lots of data.
        initialize_state();

        pre_upgrade();
    }
}

The problem with the above benchmark is that it’s benchmarking both the pre_upgrade call and the initialization of the state. What if we’re only interested in benchmarking the pre_upgrade call? To address this, we can use the #[bench(raw)] macro to specify exactly which code we’d like to benchmark.

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;


    #[bench(raw)]
    fn pre_upgrade_bench() -> canbench_rs::BenchResult {
        // Some function that fills the state with lots of data.
        initialize_state();

        // Only benchmark the pre_upgrade. Initializing the state isn't
        // included in the results of our benchmark.
        canbench_rs::bench_fn(pre_upgrade)
    }
}

Running canbench on the example above will benchmark only the code wrapped in canbench_rs::bench_fn, which in this case is the call to pre_upgrade.

$ canbench pre_upgrade_bench

---------------------------------------------------

Benchmark: pre_upgrade_bench (new)
  total:
    instructions: 717.10 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 184 pages (new)

---------------------------------------------------

Executed 1 of 1 benchmarks.
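
The benchmarks above call initialize_state, which is described only as "some function that fills the state with lots of data". A minimal sketch of what such a helper might look like is given below. It is std-only, so the Candid derives are omitted; the `initialize_state_with` helper, the user count, and the naming scheme for users are all assumptions made for illustration.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

struct User {
    name: String,
}

#[derive(Default)]
struct State {
    users: BTreeMap<u64, User>,
}

thread_local! {
    static STATE: RefCell<State> = RefCell::new(State::default());
}

/// Hypothetical helper: replaces the state with `num_users` generated users.
fn initialize_state_with(num_users: u64) {
    STATE.with(|s| {
        let mut state = s.borrow_mut();
        state.users.clear();
        for id in 0..num_users {
            state.users.insert(id, User { name: format!("user-{id}") });
        }
    });
}

/// Fills the state with lots of data. The count of one million users is
/// an arbitrary choice for this sketch.
fn initialize_state() {
    initialize_state_with(1_000_000);
}
```

Keeping the size as a parameter makes it easy to benchmark the same code at several state sizes.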

§Granular Benchmarking

Building on the example above, the pre_upgrade function does two steps:

1. Serialize the state
2. Write to stable memory

Suppose we’re interested in understanding, within pre_upgrade, the resources spent in each of these steps. canbench allows you to do more granular benchmarking using the canbench_rs::bench_scope function. Here’s how we can modify our pre_upgrade function:


#[pre_upgrade]
fn pre_upgrade() {
    // Serialize state.
    let bytes = {
        #[cfg(feature = "canbench-rs")]
        let _p = canbench_rs::bench_scope("serialize_state");
        STATE.with(|s| Encode!(s).unwrap())
    };

    // Write to stable memory.
    #[cfg(feature = "canbench-rs")]
    let _p = canbench_rs::bench_scope("writing_to_stable_memory");
    ic_cdk::api::stable::StableWriter::default()
        .write(&bytes)
        .unwrap();
}

In the code above, we’ve asked canbench to profile each of these steps separately. Running canbench now, each of these steps is reported.

$ canbench pre_upgrade_bench

---------------------------------------------------

Benchmark: pre_upgrade_bench (new)
  total:
    instructions: 717.11 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 184 pages (new)

  serialize_state (profiling):
    instructions: 717.10 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 0 pages (new)

  writing_to_stable_memory (profiling):
    instructions: 502 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 184 pages (new)

---------------------------------------------------

Executed 1 of 1 benchmarks.
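
The value returned by bench_scope is bound to `_p` in the code above, which suggests a guard that covers a scope for as long as it stays alive. The std-only sketch below is not canbench's implementation; it uses a hypothetical `ScopeGuard` type to illustrate how such guards behave in Rust, and why binding to a named variable like `_p` differs from binding to a bare `_`.

```rust
use std::cell::RefCell;

thread_local! {
    static EVENTS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

fn record(event: String) {
    EVENTS.with(|e| e.borrow_mut().push(event));
}

/// Hypothetical stand-in for a scope guard; NOT canbench's BenchScope.
struct ScopeGuard {
    name: &'static str,
}

/// Records entry into the scope and returns a guard that records the exit
/// when it is dropped.
fn scope(name: &'static str) -> ScopeGuard {
    record(format!("enter {name}"));
    ScopeGuard { name }
}

impl Drop for ScopeGuard {
    fn drop(&mut self) {
        record(format!("exit {}", self.name));
    }
}

fn run_scopes() -> Vec<String> {
    EVENTS.with(|e| e.borrow_mut().clear());
    {
        // A named binding keeps the guard alive until the end of the block.
        let _p = scope("named");
        record("work inside named".to_string());
    }
    {
        // A bare `_` does not bind the value, so the guard is dropped at the
        // end of this statement, before the work below runs.
        let _ = scope("discarded");
        record("work after discarded".to_string());
    }
    EVENTS.with(|e| e.borrow().clone())
}
```

In the modified pre_upgrade, the first guard lives inside a block and ends with it, while the second is declared at function level and therefore lasts until the function returns.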

§Debugging

The ic_cdk::eprintln!() macro facilitates tracing canister and benchmark execution. Output is displayed on the console when canbench is executed with the --show-canister-output option.

    #[bench]
    fn bench_with_debug_print() {
        // Run `canbench --show-canister-output` to see the output.
        ic_cdk::eprintln!("Hello from {}!", env!("CARGO_PKG_NAME"));
    }

Example output:

$ canbench bench_with_debug_print --show-canister-output
[...]
2021-05-06 19:17:10.000000003 UTC: [Canister lxzze-o7777-77777-aaaaa-cai] Hello from example!
[...]

Refer to the Internet Computer specification for more details.

§Preventing Compiler Optimizations

If benchmark results appear suspiciously low and stay the same even as the benchmarked function grows more complex, the compiler may be optimizing the benchmarked code away. The std::hint::black_box function helps prevent such optimizations.

    #[bench]
    fn fibonacci_20() {
        // Prevent the compiler from optimizing the call and propagating constants.
        std::hint::black_box(fibonacci(std::hint::black_box(20)));
    }

Note that passing constant values as function arguments can also trigger compiler optimizations. If the actual code uses variables (not constants), both the arguments and the result of the benchmarked function must be wrapped in black_box calls.
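
The same pattern applies when the input is a value built at runtime rather than a literal. The sketch below is std-only, so the `#[bench]` attribute is left out; `sum_of_squares` and `bench_body` are hypothetical names used for illustration.

```rust
use std::hint::black_box;

/// Hypothetical function under test.
fn sum_of_squares(values: &[u64]) -> u64 {
    values.iter().map(|v| v * v).sum()
}

/// What the body of a benchmark might look like.
fn bench_body() -> u64 {
    let inputs: Vec<u64> = (1..=100).collect();

    // Wrapping the argument discourages the compiler from specializing the
    // call for these particular inputs; wrapping the result keeps the
    // computation from being treated as unused.
    black_box(sum_of_squares(black_box(&inputs)))
}
```

black_box returns its argument unchanged, so wrapping values this way does not alter what the code computes.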

Refer to the Rust documentation for more details.

Structs§

BenchResult: The results of a benchmark. This type is in a public API.
BenchScope: An object used for benchmarking a specific scope.
Measurement: A benchmark measurement containing various stats. This type is in a public API.

Functions§

bench_fn: Benchmarks the given function.
bench_scope: Benchmarks the scope this function is declared in.
get_traces

Attribute Macros§

bench: A macro for declaring a benchmark where only some part of the function is benchmarked.
