Crate canbench_rs

source ·
Expand description

canbench is a tool for benchmarking canisters on the Internet Computer.

§Quickstart

This example is also available to tinker with in the examples directory. See the fibonacci example.

§1. Install the canbench binary.

The canbench is what runs your canister’s benchmarks.

cargo install canbench

§2. Add optional dependency to Cargo.toml

Typically you do not want your benchmarks to be part of your canister when deploying it to the Internet Computer. Therefore, we include canbench only as an optional dependency so that it’s only included when running benchmarks. For more information about optional dependencies, you can read more about them here.

canbench_rs = { version = "x.y.z", optional = true }

§3. Add a configuration to canbench.yml

The canbench.yml configuration file tells canbench how to build and run you canister. Below is a typical configuration. Note that we’re compiling the canister with the canbench feature so that the benchmarking logic is included in the Wasm.

build_cmd:
  cargo build --release --target wasm32-unknown-unknown --features canbench-rs

wasm_path:
  ./target/wasm32-unknown-unknown/release/<YOUR_CANISTER>.wasm

§4. Start benching! 🏋🏽

Let’s say we have a canister that exposes a query computing the fibonacci sequence of a given number. Here’s what that query can look like:

#[ic_cdk::query]
fn fibonacci(n: u32) -> u32 {
    if n == 0 {
        return 0;
    } else if n == 1 {
        return 1;
    }

    let mut a = 0;
    let mut b = 1;
    let mut result = 0;

    for _ in 2..=n {
        result = a + b;
        a = b;
        b = result;
    }

    result
}

Now, let’s add some benchmarks to this query:

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;


    #[bench]
    fn fibonacci_20() {
        // NOTE: the result is printed to prevent the compiler from optimizing the call away.
        println!("{:?}", fibonacci(20));
    }

    #[bench]
    fn fibonacci_45() {
        // NOTE: the result is printed to prevent the compiler from optimizing the call away.
        println!("{:?}", fibonacci(45));
    }
}

Run canbench. You’ll see an output that looks similar to this:

$ canbench

---------------------------------------------------

Benchmark: fibonacci_20 (new)
  total:
    instructions: 2301 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 0 pages (new)

---------------------------------------------------

Benchmark: fibonacci_45 (new)
  total:
    instructions: 3088 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 0 pages (new)

---------------------------------------------------

Executed 2 of 2 benchmarks.

§5. Track performance regressions

Notice that canbench reported the above benchmarks as “new”. canbench allows you to persist the results of these benchmarks. In subsequent runs, canbench reports the performance relative to the last persisted run.

Let’s first persist the results above by running canbench again, but with the persist flag:

$ canbench --persist
...
---------------------------------------------------

Executed 2 of 2 benchmarks.
Successfully persisted results to canbench_results.yml

Now, if we run canbench again, canbench will run the benchmarks, and will additionally report that there were no changes detected in performance.

$ canbench
    Finished release [optimized] target(s) in 0.34s

---------------------------------------------------

Benchmark: fibonacci_20
  total:
    instructions: 2301 (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: fibonacci_45
  total:
    instructions: 3088 (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Executed 2 of 2 benchmarks.

Let’s try swapping out our implementation of fibonacci with an implementation that’s miserably inefficient. Replace the fibonacci function defined previously with the following:

#[ic_cdk::query]
fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 1,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

And running canbench again, we see that it detects and reports a regression.

$ canbench

---------------------------------------------------

Benchmark: fibonacci_20
  total:
    instructions: 337.93 K (regressed by 14586.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: fibonacci_45
  total:
    instructions: 56.39 B (regressed by 1826095830.76%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Executed 2 of 2 benchmarks.

Apparently, the recursive implementation is many orders of magnitude more expensive than the iterative implementation 😱 Good thing we found out before deploying this implementation to production.

Notice that fibonacci_45 took > 50B instructions, which is substantially more than the instruction limit given for a single message execution on the Internet Computer. canbench runs benchmarks in an environment that gives them up to 10T instructions.

§Additional Examples

For the following examples, we’ll be using the following canister code, which you can also find in the examples directory. This canister defines a simple state as well as a pre_upgrade function that stores that state into stable memory.

use candid::{CandidType, Encode};
use ic_cdk_macros::pre_upgrade;
use std::cell::RefCell;

#[derive(CandidType)]
struct User {
    name: String,
}

#[derive(Default, CandidType)]
struct State {
    users: std::collections::BTreeMap<u64, User>,
}

thread_local! {
    static STATE: RefCell<State> = RefCell::new(State::default());
}

#[pre_upgrade]
fn pre_upgrade() {
    // Serialize state.
    let bytes = STATE.with(|s| Encode!(s).unwrap());

    // Write to stable memory.
    ic_cdk::api::stable::StableWriter::default()
        .write(&bytes)
        .unwrap();
}

§Excluding setup code

Let’s say we want to benchmark how long it takes to run the pre_upgrade function. We can define the following benchmark:

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;


    #[bench]
    fn pre_upgrade_bench() {
        // Some function that fills the state with lots of data.
        initialize_state();

        pre_upgrade();
    }
}

The problem with the above benchmark is that it’s benchmarking both the pre_upgrade call and the initialization of the state. What if we’re only interested in benchmarking the pre_upgrade call? To address this, we can use the #[bench(raw)] macro to specify exactly which code we’d like to benchmark.

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;


    #[bench(raw)]
    fn pre_upgrade_bench() -> canbench_rs::BenchResult {
        // Some function that fills the state with lots of data.
        initialize_state();

        // Only benchmark the pre_upgrade. Initializing the state isn't
        // included in the results of our benchmark.
        canbench_rs::bench_fn(pre_upgrade)
    }
}

Running canbench on the example above will benchmark only the code wrapped in canbench_rs::bench_fn, which in this case is the call to pre_upgrade.

$ canbench pre_upgrade_bench

---------------------------------------------------

Benchmark: pre_upgrade_bench (new)
  total:
    instructions: 717.10 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 184 pages (new)

---------------------------------------------------

Executed 1 of 1 benchmarks.

§Granular Benchmarking

Building on the example above, the pre_upgrade function does two steps:

  1. Serialize the state
  2. Write to stable memory

Suppose we’re interested in understanding, within pre_upgrade, the resources spent in each of these steps. canbench allows you to do more granular benchmarking using the canbench_rs::bench_scope function. Here’s how we can modify our pre_upgrade function:


#[pre_upgrade]
fn pre_upgrade() {
    // Serialize state.
    let bytes = {
        #[cfg(feature = "canbench-rs")]
        let _p = canbench_rs::bench_scope("serialize_state");
        STATE.with(|s| Encode!(s).unwrap())
    };

    // Write to stable memory.
    #[cfg(feature = "canbench-rs")]
    let _p = canbench_rs::bench_scope("writing_to_stable_memory");
    ic_cdk::api::stable::StableWriter::default()
        .write(&bytes)
        .unwrap();
}

In the code above, we’ve asked canbench to profile each of these steps separately. Running canbench now, each of these steps are reported.

$ canbench pre_upgrade_bench

---------------------------------------------------

Benchmark: pre_upgrade_bench (new)
  total:
    instructions: 717.11 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 184 pages (new)

  serialize_state (profiling):
    instructions: 717.10 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 0 pages (new)

  writing_to_stable_memory (profiling):
    instructions: 502 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 184 pages (new)

---------------------------------------------------

Executed 1 of 1 benchmarks.

Structs§

Functions§

  • Benchmarks the given function.
  • Benchmarks the scope this function is declared in.

Attribute Macros§

  • A macro for declaring a benchmark where only some part of the function is benchmarked.