Crate canbench_rs
canbench is a tool for benchmarking canisters on the Internet Computer.
§Quickstart
This example is also available to tinker with in the examples directory. See the fibonacci example.
§1. Install the canbench binary.
The canbench binary is what runs your canister’s benchmarks.
cargo install canbench
§2. Add optional dependency to Cargo.toml
Typically you do not want your benchmarks to be part of your canister when deploying it to the Internet Computer. Therefore, we include canbench_rs only as an optional dependency so that it’s included only when running benchmarks. You can read more about optional dependencies here.
canbench_rs = { version = "x.y.z", optional = true }
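Because the build command in the next step enables a feature named canbench-rs (note the hyphen) and the benchmark code is gated behind #[cfg(feature = "canbench-rs")], the optional dependency above is typically wired up through an explicit feature. A minimal sketch of the relevant Cargo.toml sections, assuming a Cargo version recent enough to support the dep: syntax:

[dependencies]
canbench_rs = { version = "x.y.z", optional = true }

[features]
canbench-rs = ["dep:canbench_rs"]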
§3. Add a configuration to canbench.yml
The canbench.yml configuration file tells canbench how to build and run your canister. Below is a typical configuration. Note that we’re compiling the canister with the canbench-rs feature so that the benchmarking logic is included in the Wasm.
build_cmd:
  cargo build --release --target wasm32-unknown-unknown --features canbench-rs

wasm_path:
  ./target/wasm32-unknown-unknown/release/<YOUR_CANISTER>.wasm
§Init Args
Init args can be specified using the init_args key in the configuration file:
init_args:
  hex: 4449444c0001710568656c6c6f
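The hex string is a Candid-encoded argument (the one above decodes to the text "hello"). One way to produce such a string is with the candid crate’s Encode! macro plus a hex encoder; a small sketch, assuming the hex crate is available for the encoding step:

use candid::Encode;

fn main() {
    // Candid-encode the init argument ("hello" : text) and print it as hex.
    let bytes = Encode!(&"hello").unwrap();
    // Prints: 4449444c0001710568656c6c6f
    println!("{}", hex::encode(bytes));
}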
§4. Start benching! 🏋🏽
Let’s say we have a canister that exposes a query computing the fibonacci sequence of a given number. Here’s what that query can look like:
#[ic_cdk::query]
fn fibonacci(n: u32) -> u32 {
    if n == 0 {
        return 0;
    } else if n == 1 {
        return 1;
    }

    let mut a = 0;
    let mut b = 1;
    let mut result = 0;

    for _ in 2..=n {
        result = a + b;
        a = b;
        b = result;
    }

    result
}
Now, let’s add some benchmarks to this query:
#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;

    #[bench]
    fn fibonacci_20() {
        // NOTE: the result is printed to prevent the compiler from optimizing the call away.
        println!("{:?}", fibonacci(20));
    }

    #[bench]
    fn fibonacci_45() {
        // NOTE: the result is printed to prevent the compiler from optimizing the call away.
        println!("{:?}", fibonacci(45));
    }
}
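Printing the result is one way to keep the compiler from optimizing the call away; std::hint::black_box is a lighter-weight alternative that avoids the cost of formatting and printing inside the benchmark. A sketch of the first benchmark using it instead:

#[bench]
fn fibonacci_20() {
    // black_box keeps the call from being optimized away without
    // the formatting/printing overhead of println!.
    std::hint::black_box(fibonacci(20));
}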
Run canbench. You’ll see output that looks similar to this:
$ canbench
---------------------------------------------------
Benchmark: fibonacci_20 (new)
  total:
    instructions: 2301 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 0 pages (new)
---------------------------------------------------
Benchmark: fibonacci_45 (new)
  total:
    instructions: 3088 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 0 pages (new)
---------------------------------------------------
Executed 2 of 2 benchmarks.
§5. Track performance regressions
Notice that canbench reported the above benchmarks as “new”. canbench allows you to persist the results of these benchmarks. In subsequent runs, canbench reports the performance relative to the last persisted run.
Let’s first persist the results above by running canbench again, but with the --persist flag:
$ canbench --persist
...
---------------------------------------------------
Executed 2 of 2 benchmarks.
Successfully persisted results to canbench_results.yml
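The persisted file is plain YAML, so it can be committed alongside your code and reviewed in diffs. For the two benchmarks above it would look roughly like the sketch below (the exact schema is an assumption and may differ between canbench versions):

benches:
  fibonacci_20:
    total:
      instructions: 2301
      heap_increase: 0
      stable_memory_increase: 0
    scopes: {}
  fibonacci_45:
    total:
      instructions: 3088
      heap_increase: 0
      stable_memory_increase: 0
    scopes: {}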
Now, if we run canbench again, it will run the benchmarks and additionally report that no changes in performance were detected.
$ canbench
    Finished release [optimized] target(s) in 0.34s
---------------------------------------------------
Benchmark: fibonacci_20
  total:
    instructions: 2301 (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)
---------------------------------------------------
Benchmark: fibonacci_45
  total:
    instructions: 3088 (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)
---------------------------------------------------
Executed 2 of 2 benchmarks.
Let’s try swapping out our implementation of fibonacci with an implementation that’s miserably inefficient. Replace the fibonacci function defined previously with the following:
#[ic_cdk::query]
fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}
Running canbench again, we see that it detects and reports a regression.
$ canbench
---------------------------------------------------
Benchmark: fibonacci_20
  total:
    instructions: 337.93 K (regressed by 14586.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)
---------------------------------------------------
Benchmark: fibonacci_45
  total:
    instructions: 56.39 B (regressed by 1826095830.76%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)
---------------------------------------------------
Executed 2 of 2 benchmarks.
Apparently, the recursive implementation is many orders of magnitude more expensive than the iterative implementation. 😱 Good thing we found out before deploying it to production.
Notice that fibonacci_45 took > 50B instructions, which is substantially more than the instruction limit given for a single message execution on the Internet Computer. canbench runs benchmarks in an environment that gives them up to 10T instructions.
§Additional Examples
The following examples use the canister code below, which you can also find in the examples directory. This canister defines a simple state as well as a pre_upgrade function that stores that state into stable memory.
use candid::{CandidType, Encode};
use ic_cdk_macros::pre_upgrade;
use std::cell::RefCell;

#[derive(CandidType)]
struct User {
    name: String,
}

#[derive(Default, CandidType)]
struct State {
    users: std::collections::BTreeMap<u64, User>,
}

thread_local! {
    static STATE: RefCell<State> = RefCell::new(State::default());
}

#[pre_upgrade]
fn pre_upgrade() {
    // Serialize state.
    let bytes = STATE.with(|s| Encode!(s).unwrap());

    // Write to stable memory.
    ic_cdk::api::stable::StableWriter::default()
        .write(&bytes)
        .unwrap();
}
§Excluding setup code
Let’s say we want to benchmark how long it takes to run the pre_upgrade function. We can define the following benchmark:
#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;

    #[bench]
    fn pre_upgrade_bench() {
        // Some function that fills the state with lots of data.
        initialize_state();

        pre_upgrade();
    }
}
The problem with the above benchmark is that it’s benchmarking both the pre_upgrade call and the initialization of the state. What if we’re only interested in benchmarking the pre_upgrade call?
To address this, we can use the #[bench(raw)] macro to specify exactly which code we’d like to benchmark.
#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;

    #[bench(raw)]
    fn pre_upgrade_bench() -> canbench_rs::BenchResult {
        // Some function that fills the state with lots of data.
        initialize_state();

        // Only benchmark the pre_upgrade. Initializing the state isn't
        // included in the results of our benchmark.
        canbench_rs::bench_fn(pre_upgrade)
    }
}
Running canbench on the example above will benchmark only the code wrapped in canbench_rs::bench_fn, which in this case is the call to pre_upgrade.
$ canbench pre_upgrade_bench
---------------------------------------------------
Benchmark: pre_upgrade_bench (new)
  total:
    instructions: 717.10 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 184 pages (new)
---------------------------------------------------
Executed 1 of 1 benchmarks.
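bench_fn is passed a plain function item above; assuming it accepts any argument-less closure, the same pattern can also capture setup values or pass arguments to the measured code. A hedged sketch, using a hypothetical initialize_state_with_users helper:

#[cfg(feature = "canbench-rs")]
mod benches {
    use super::*;
    use canbench_rs::bench;

    #[bench(raw)]
    fn pre_upgrade_with_large_state() -> canbench_rs::BenchResult {
        // Setup runs outside the measured region.
        // `initialize_state_with_users` is a hypothetical helper that
        // fills the state with the given number of users.
        initialize_state_with_users(100_000);

        // Only the closure passed to bench_fn is measured.
        canbench_rs::bench_fn(|| pre_upgrade())
    }
}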
§Granular Benchmarking
Building on the example above, the pre_upgrade function does two steps:
- Serialize the state
- Write to stable memory
Suppose we’re interested in understanding, within pre_upgrade, the resources spent in each of these steps.
canbench allows you to do more granular benchmarking using the canbench_rs::bench_scope function. Here’s how we can modify our pre_upgrade function:
#[pre_upgrade]
fn pre_upgrade() {
    // Serialize state.
    let bytes = {
        #[cfg(feature = "canbench-rs")]
        let _p = canbench_rs::bench_scope("serialize_state");
        STATE.with(|s| Encode!(s).unwrap())
    };

    // Write to stable memory.
    #[cfg(feature = "canbench-rs")]
    let _p = canbench_rs::bench_scope("writing_to_stable_memory");
    ic_cdk::api::stable::StableWriter::default()
        .write(&bytes)
        .unwrap();
}
In the code above, we’ve asked canbench to profile each of these steps separately. Running canbench now, each of these steps is reported.
$ canbench pre_upgrade_bench
---------------------------------------------------
Benchmark: pre_upgrade_bench (new)
  total:
    instructions: 717.11 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 184 pages (new)

  serialize_state (profiling):
    instructions: 717.10 M (new)
    heap_increase: 519 pages (new)
    stable_memory_increase: 0 pages (new)

  writing_to_stable_memory (profiling):
    instructions: 502 (new)
    heap_increase: 0 pages (new)
    stable_memory_increase: 184 pages (new)
---------------------------------------------------
Executed 1 of 1 benchmarks.
Structs§
- BenchResult: The results of a benchmark.
- BenchScope: An object used for benchmarking a specific scope.
- Measurement: A benchmark measurement containing various stats.
Functions§
- bench_fn: Benchmarks the given function.
- bench_scope: Benchmarks the scope this function is declared in.
Attribute Macros§
- bench: A macro for declaring a benchmark where only some part of the function is benchmarked.