//! `canbench` is a tool for benchmarking canisters on the Internet Computer.
//!
//! ## Quickstart
//!
//! This example is also available to tinker with in the examples directory. See the [fibonacci example](https://github.com/dfinity/bench/tree/main/examples/fibonacci).
//!
//! ### 1. Install the `canbench` binary.
//!
//! The `canbench` binary is what runs your canister's benchmarks.
//!
//! ```bash
//! cargo install canbench
//! ```
//!
//! ### 2. Add optional dependency to `Cargo.toml`
//!
//! Typically you do not want your benchmarks to be part of your canister when deploying it to the Internet Computer.
//! Therefore, we include `canbench-rs` only as an optional dependency so that it's only included when running benchmarks.
//! For more information about optional dependencies, see the [Cargo reference](https://doc.rust-lang.org/cargo/reference/features.html#optional-dependencies).
//!
//! ```toml
//! canbench-rs = { version = "x.y.z", optional = true }
//! ```
//!
//! ### 3. Add a configuration to `canbench.yml`
//!
//! The `canbench.yml` configuration file tells `canbench` how to build and run your canister.
//! Below is a typical configuration.
//! Note that we're compiling the canister with the `canbench-rs` feature so that the benchmarking logic is included in the Wasm.
//!
//! ```yml
//! build_cmd:
//!   cargo build --release --target wasm32-unknown-unknown --features canbench-rs
//!
//! wasm_path:
//!   ./target/wasm32-unknown-unknown/release/<YOUR_CANISTER>.wasm
//! ```
//!
//! #### Init Args
//!
//! Init args can be specified using the `init_args` key in the configuration file:
//!
//! ```yml
//! init_args:
//!   hex: 4449444c0001710568656c6c6f
//! ```
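//!
//! To see what such a hex string contains, it can help to decode it by hand.
//! The sketch below assumes the value is a Candid-encoded argument list; `decode_hex` is a
//! hypothetical helper written for this illustration and is not part of `canbench`.
//! Under that assumption, the example value above is the magic bytes `DIDL`, an empty type table,
//! and a single `text` argument containing `hello`.
//!
//! ```rust
//! // Decodes a hex string into bytes. Returns `None` on malformed input.
//! fn decode_hex(s: &str) -> Option<Vec<u8>> {
//!     if s.len() % 2 != 0 {
//!         return None;
//!     }
//!     (0..s.len())
//!         .step_by(2)
//!         .map(|i| u8::from_str_radix(s.get(i..i + 2)?, 16).ok())
//!         .collect()
//! }
//!
//! fn main() {
//!     let bytes = decode_hex("4449444c0001710568656c6c6f").unwrap();
//!     // Candid magic number.
//!     assert_eq!(&bytes[..4], b"DIDL");
//!     // Empty type table, one argument, type `text` (0x71), length 5.
//!     assert_eq!(&bytes[4..8], &[0x00, 0x01, 0x71, 0x05]);
//!     // The text payload itself.
//!     assert_eq!(&bytes[8..], b"hello");
//! }
//! ```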
//!
//! #### Stable Memory
//!
//! A file can be specified to be loaded in the canister's stable memory _after_ initialization.
//!
//! ```yml
//! stable_memory:
//!   file:
//!     stable_memory.bin
//! ```
//!
//! <div class="warning">Contents of the stable memory file are loaded <i>after</i> the call to the canister's init method.
//! Therefore, changes made to stable memory in the init method would be overwritten.</div>
//!
//! ### 4. Start benching! 🏋🏽
//!
//! Let's say we have a canister that exposes a `query` computing the n-th Fibonacci number.
//! Here's what that query can look like:
//!
//! ```rust
//! #[ic_cdk::query]
//! fn fibonacci(n: u32) -> u32 {
//!     if n == 0 {
//!         return 0;
//!     } else if n == 1 {
//!         return 1;
//!     }
//!
//!     let mut a = 0;
//!     let mut b = 1;
//!     let mut result = 0;
//!
//!     for _ in 2..=n {
//!         result = a + b;
//!         a = b;
//!         b = result;
//!     }
//!
//!     result
//! }
//! ```
//!
//! Now, let's add some benchmarks to this query:
//!
//! ```rust
//! #[cfg(feature = "canbench-rs")]
//! mod benches {
//!     use super::*;
//!     use canbench_rs::bench;
//!
//!     # fn fibonacci(_: u32) -> u32 { 0 }
//!
//!     #[bench]
//!     fn fibonacci_20() {
//!         // NOTE: the result is printed to prevent the compiler from optimizing the call away.
//!         println!("{:?}", fibonacci(20));
//!     }
//!
//!     #[bench]
//!     fn fibonacci_45() {
//!         // NOTE: the result is printed to prevent the compiler from optimizing the call away.
//!         println!("{:?}", fibonacci(45));
//!     }
//! }
//! ```
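//!
//! Printing is one way to keep the compiler from discarding the call. Another option that may work
//! is `std::hint::black_box` from the standard library. The sketch below is an assumption rather
//! than something `canbench` prescribes: `black_box` is documented as best-effort, so its effect
//! on a given Wasm build is not guaranteed. The `#[bench]` attribute is omitted so the snippet
//! compiles on its own.
//!
//! ```rust
//! use std::hint::black_box;
//!
//! fn fibonacci(n: u32) -> u32 {
//!     let (mut a, mut b) = (0u32, 1u32);
//!     for _ in 0..n {
//!         let next = a + b;
//!         a = b;
//!         b = next;
//!     }
//!     a
//! }
//!
//! // In a canister this function would carry the `#[bench]` attribute.
//! fn fibonacci_20() {
//!     // `black_box` hides both the input and the result from the optimizer.
//!     black_box(fibonacci(black_box(20)));
//! }
//!
//! fn main() {
//!     fibonacci_20();
//!     assert_eq!(fibonacci(20), 6765);
//! }
//! ```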
//!
//! Run `canbench`. You'll see an output that looks similar to this:
//!
//! ```txt
//! $ canbench
//!
//! ---------------------------------------------------
//!
//! Benchmark: fibonacci_20 (new)
//!   total:
//!     instructions: 2301 (new)
//!     heap_increase: 0 pages (new)
//!     stable_memory_increase: 0 pages (new)
//!
//! ---------------------------------------------------
//!
//! Benchmark: fibonacci_45 (new)
//!   total:
//!     instructions: 3088 (new)
//!     heap_increase: 0 pages (new)
//!     stable_memory_increase: 0 pages (new)
//!
//! ---------------------------------------------------
//!
//! Executed 2 of 2 benchmarks.
//! ```
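//!
//! The `heap_increase` and `stable_memory_increase` figures are reported in pages.
//! Assuming the usual 64 KiB Wasm page size, the conversion to bytes can be sketched as follows;
//! `pages_to_bytes` is a hypothetical helper for illustration only.
//!
//! ```rust
//! // Size of one Wasm memory page in bytes (64 KiB).
//! const WASM_PAGE_SIZE_BYTES: u64 = 64 * 1024;
//!
//! fn pages_to_bytes(pages: u64) -> u64 {
//!     pages * WASM_PAGE_SIZE_BYTES
//! }
//!
//! fn main() {
//!     assert_eq!(pages_to_bytes(0), 0);
//!     assert_eq!(pages_to_bytes(1), 65_536);
//!     // 16 pages are exactly 1 MiB.
//!     assert_eq!(pages_to_bytes(16), 1_048_576);
//! }
//! ```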
//!
//! ### 5. Track performance regressions
//!
//! Notice that `canbench` reported the above benchmarks as "new".
//! `canbench` allows you to persist the results of these benchmarks.
//! In subsequent runs, `canbench` reports the performance relative to the last persisted run.
//!
//! Let's first persist the results above by running `canbench` again, but with the `persist` flag:
//!
//! ```txt
//! $ canbench --persist
//! ...
//! ---------------------------------------------------
//!
//! Executed 2 of 2 benchmarks.
//! Successfully persisted results to canbench_results.yml
//! ```
//!
//! Now, if we run `canbench` again, it runs the benchmarks and additionally reports that no changes in performance were detected.
//!
//! ```txt
//! $ canbench
//!     Finished release [optimized] target(s) in 0.34s
//!
//! ---------------------------------------------------
//!
//! Benchmark: fibonacci_20
//!   total:
//!     instructions: 2301 (no change)
//!     heap_increase: 0 pages (no change)
//!     stable_memory_increase: 0 pages (no change)
//!
//! ---------------------------------------------------
//!
//! Benchmark: fibonacci_45
//!   total:
//!     instructions: 3088 (no change)
//!     heap_increase: 0 pages (no change)
//!     stable_memory_increase: 0 pages (no change)
//!
//! ---------------------------------------------------
//!
//! Executed 2 of 2 benchmarks.
//! ```
//!
//! Let's try swapping out our implementation of `fibonacci` with an implementation that's miserably inefficient.
//! Replace the `fibonacci` function defined previously with the following:
//!
//! ```rust
//! #[ic_cdk::query]
//! fn fibonacci(n: u32) -> u32 {
//!     match n {
//!         // Same base cases as the iterative version: fibonacci(0) = 0, fibonacci(1) = 1.
//!         0 => 0,
//!         1 => 1,
//!         _ => fibonacci(n - 1) + fibonacci(n - 2),
//!     }
//! }
//! ```
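//!
//! Before comparing performance, it is worth confirming that two implementations return the same
//! values. The sketch below does that with plain functions; the names `fibonacci_iterative` and
//! `fibonacci_recursive` are hypothetical, and the recursive variant uses the base case
//! `fibonacci(0) = 0` so that it matches the iterative one.
//!
//! ```rust
//! fn fibonacci_iterative(n: u32) -> u32 {
//!     let (mut a, mut b) = (0u32, 1u32);
//!     for _ in 0..n {
//!         let next = a + b;
//!         a = b;
//!         b = next;
//!     }
//!     a
//! }
//!
//! fn fibonacci_recursive(n: u32) -> u32 {
//!     match n {
//!         0 => 0,
//!         1 => 1,
//!         _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
//!     }
//! }
//!
//! fn main() {
//!     // Small inputs only: the recursive version is exponential in `n`.
//!     for n in 0..=20 {
//!         assert_eq!(fibonacci_iterative(n), fibonacci_recursive(n));
//!     }
//! }
//! ```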
//!
//! Running `canbench` again, we see that it detects and reports a regression.
//!
//! ```txt
//! $ canbench
//!
//! ---------------------------------------------------
//!
//! Benchmark: fibonacci_20
//!   total:
//!     instructions: 337.93 K (regressed by 14586.14%)
//!     heap_increase: 0 pages (no change)
//!     stable_memory_increase: 0 pages (no change)
//!
//! ---------------------------------------------------
//!
//! Benchmark: fibonacci_45
//!   total:
//!     instructions: 56.39 B (regressed by 1826095830.76%)
//!     heap_increase: 0 pages (no change)
//!     stable_memory_increase: 0 pages (no change)
//!
//! ---------------------------------------------------
//!
//! Executed 2 of 2 benchmarks.
//! ```
//!
//! Apparently, the recursive implementation is many orders of magnitude more expensive than the iterative implementation 😱
//! Good thing we found out before deploying this implementation to production.
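//!
//! The regression percentages can be read as a relative change against the persisted value.
//! The sketch below shows that arithmetic; it is an assumption about how such a figure is
//! typically derived, not a copy of `canbench`'s implementation, and `percent_change` is a
//! hypothetical helper. Plugging in the rounded figures from the output above (2301 and about
//! 337,930 instructions) gives roughly 14,586%, in line with the report.
//!
//! ```rust
//! // Relative change of `current` against `baseline`, in percent.
//! // A zero baseline is not handled here and would yield infinity or NaN.
//! fn percent_change(baseline: u64, current: u64) -> f64 {
//!     (current as f64 - baseline as f64) / baseline as f64 * 100.0
//! }
//!
//! fn main() {
//!     // Twice as many instructions is a 100% regression.
//!     assert_eq!(percent_change(2301, 4602), 100.0);
//!     // Fewer instructions yields a negative value, i.e. an improvement.
//!     assert_eq!(percent_change(200, 150), -25.0);
//! }
//! ```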
//!
//! Notice that `fibonacci_45` took more than 50B instructions, which is substantially more than the instruction limit for a single message execution on the Internet Computer. `canbench` runs benchmarks in an environment that gives them up to 10T instructions.
//!
//! ## Additional Examples
//!
//! The examples below use the following canister code, which you can also find in the [examples](./examples/btreemap_vs_hashmap) directory.
//! This canister defines a simple state as well as a `pre_upgrade` function that stores that state into stable memory.
//!
//! ```rust
//! use candid::{CandidType, Encode};
//! use ic_cdk_macros::pre_upgrade;
//! use std::cell::RefCell;
//!
//! #[derive(CandidType)]
//! struct User {
//!     name: String,
//! }
//!
//! #[derive(Default, CandidType)]
//! struct State {
//!     users: std::collections::BTreeMap<u64, User>,
//! }
//!
//! thread_local! {
//!     static STATE: RefCell<State> = RefCell::new(State::default());
//! }
//!
//! #[pre_upgrade]
//! fn pre_upgrade() {
//!     // Serialize state.
//!     let bytes = STATE.with(|s| Encode!(s).unwrap());
//!
//!     // Write to stable memory.
//!     ic_cdk::api::stable::StableWriter::default()
//!         .write(&bytes)
//!         .unwrap();
//! }
//! ```
//!
//! ### Excluding setup code
//!
//! Let's say we want to benchmark how expensive it is to run the `pre_upgrade` function. We can define the following benchmark:
//!
//! ```rust
//! #[cfg(feature = "canbench-rs")]
//! mod benches {
//!     use super::*;
//!     use canbench_rs::bench;
//!
//!     # fn initialize_state() {}
//!     # fn pre_upgrade() {}
//!
//!     #[bench]
//!     fn pre_upgrade_bench() {
//!         // Some function that fills the state with lots of data.
//!         initialize_state();
//!
//!         pre_upgrade();
//!     }
//! }
//! ```
//!
//! The problem with the above benchmark is that it's benchmarking both the `pre_upgrade` call _and_ the initialization of the state.
//! What if we're only interested in benchmarking the `pre_upgrade` call?
//! To address this, we can use the `#[bench(raw)]` macro to specify exactly which code we'd like to benchmark.
//!
//! ```rust
//! #[cfg(feature = "canbench-rs")]
//! mod benches {
//!     use super::*;
//!     use canbench_rs::bench;
//!
//!     # fn initialize_state() {}
//!     # fn pre_upgrade() {}
//!
//!     #[bench(raw)]
//!     fn pre_upgrade_bench() -> canbench_rs::BenchResult {
//!         // Some function that fills the state with lots of data.
//!         initialize_state();
//!
//!         // Only benchmark the pre_upgrade. Initializing the state isn't
//!         // included in the results of our benchmark.
//!         canbench_rs::bench_fn(pre_upgrade)
//!     }
//! }
//! ```
//!
//! Running `canbench` on the example above will benchmark only the code wrapped in `canbench_rs::bench_fn`, which in this case is the call to `pre_upgrade`.
//!
//! ```txt
//! $ canbench pre_upgrade_bench
//!
//! ---------------------------------------------------
//!
//! Benchmark: pre_upgrade_bench (new)
//!   total:
//!     instructions: 717.10 M (new)
//!     heap_increase: 519 pages (new)
//!     stable_memory_increase: 184 pages (new)
//!
//! ---------------------------------------------------
//!
//! Executed 1 of 1 benchmarks.
//! ```
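//!
//! The reason setup code is excluded is that `bench_fn` reads the counters immediately before and
//! after the closure and reports only the difference. The sketch below mimics that idea outside a
//! canister; `FAKE_INSTRUCTIONS`, `spend`, `counter`, and `measure` are hypothetical stand-ins,
//! not `canbench_rs` APIs.
//!
//! ```rust
//! use std::cell::Cell;
//!
//! thread_local! {
//!     // Stand-in for the canister's instruction counter.
//!     static FAKE_INSTRUCTIONS: Cell<u64> = Cell::new(0);
//! }
//!
//! // Pretends to execute `n` instructions.
//! fn spend(n: u64) {
//!     FAKE_INSTRUCTIONS.with(|c| c.set(c.get() + n));
//! }
//!
//! fn counter() -> u64 {
//!     FAKE_INSTRUCTIONS.with(|c| c.get())
//! }
//!
//! // Reports only the instructions spent inside `f`.
//! fn measure<R>(f: impl FnOnce() -> R) -> u64 {
//!     let start = counter();
//!     f();
//!     counter() - start
//! }
//!
//! fn main() {
//!     spend(1_000); // Setup work, happens before the measurement starts.
//!     let measured = measure(|| spend(42));
//!     assert_eq!(measured, 42);
//!     assert_eq!(counter(), 1_042);
//! }
//! ```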
//!
//! ### Granular Benchmarking
//!
//! Building on the example above, the `pre_upgrade` function performs two steps:
//!
//! 1. Serialize the state
//! 2. Write to stable memory
//!
//! Suppose we're interested in understanding, within `pre_upgrade`, the resources spent in each of these steps.
//! `canbench` allows you to do more granular benchmarking using the `canbench_rs::bench_scope` function.
//! Here's how we can modify our `pre_upgrade` function:
//!
//! ```rust
//! # use candid::{Encode, CandidType};
//! # use ic_cdk_macros::pre_upgrade;
//! # use std::cell::RefCell;
//! #
//! # #[derive(CandidType)]
//! # struct User {
//! #     name: String,
//! # }
//! #
//! # #[derive(Default, CandidType)]
//! # struct State {
//! #     users: std::collections::BTreeMap<u64, User>,
//! # }
//! #
//! # thread_local! {
//! #     static STATE: RefCell<State> = RefCell::new(State::default());
//! # }
//!
//! #[pre_upgrade]
//! fn pre_upgrade() {
//!     // Serialize state.
//!     let bytes = {
//!         #[cfg(feature = "canbench-rs")]
//!         let _p = canbench_rs::bench_scope("serialize_state");
//!         STATE.with(|s| Encode!(s).unwrap())
//!     };
//!
//!     // Write to stable memory.
//!     #[cfg(feature = "canbench-rs")]
//!     let _p = canbench_rs::bench_scope("writing_to_stable_memory");
//!     ic_cdk::api::stable::StableWriter::default()
//!         .write(&bytes)
//!         .unwrap();
//! }
//! ```
//!
//! In the code above, we've asked `canbench` to profile each of these steps separately.
//! Running `canbench` now, each of these steps is reported.
//!
//! ```txt
//! $ canbench pre_upgrade_bench
//!
//! ---------------------------------------------------
//!
//! Benchmark: pre_upgrade_bench (new)
//!   total:
//!     instructions: 717.11 M (new)
//!     heap_increase: 519 pages (new)
//!     stable_memory_increase: 184 pages (new)
//!
//!   serialize_state (profiling):
//!     instructions: 717.10 M (new)
//!     heap_increase: 519 pages (new)
//!     stable_memory_increase: 0 pages (new)
//!
//!   writing_to_stable_memory (profiling):
//!     instructions: 502 (new)
//!     heap_increase: 0 pages (new)
//!     stable_memory_increase: 184 pages (new)
//!
//! ---------------------------------------------------
//!
//! Executed 1 of 1 benchmarks.
//! ```
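//!
//! One thing to keep in mind is that each scope name can be recorded only once per benchmark:
//! when a scope ends, the library asserts that no measurement with the same name exists yet.
//! The sketch below mimics that bookkeeping with a plain `BTreeMap`. `record_scope` is a
//! hypothetical helper, and unlike the library, which panics, it returns an error.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! // Records a measurement under `name`, rejecting names that were already used.
//! fn record_scope(
//!     scopes: &mut BTreeMap<&'static str, u64>,
//!     name: &'static str,
//!     instructions: u64,
//! ) -> Result<(), String> {
//!     match scopes.insert(name, instructions) {
//!         None => Ok(()),
//!         Some(_) => Err(format!("scope {} cannot be specified multiple times.", name)),
//!     }
//! }
//!
//! fn main() {
//!     let mut scopes = BTreeMap::new();
//!     assert!(record_scope(&mut scopes, "serialize_state", 100).is_ok());
//!     assert!(record_scope(&mut scopes, "writing_to_stable_memory", 5).is_ok());
//!     // Reusing a name is rejected.
//!     assert!(record_scope(&mut scopes, "serialize_state", 7).is_err());
//! }
//! ```
//!
//! In practice this means a scope placed inside a loop, or in a function that a benchmark calls
//! more than once, would trip the assertion.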
pub use canbench_rs_macros::bench;
use candid::CandidType;
use serde::{Deserialize, Serialize};
use std::cell::RefCell;
use std::collections::BTreeMap;

thread_local! {
    static SCOPES: RefCell<BTreeMap<&'static str, Measurement>> =
        const { RefCell::new(BTreeMap::new()) };
}

/// The results of a benchmark.
#[derive(Debug, PartialEq, Serialize, Deserialize, CandidType)]
pub struct BenchResult {
    /// A measurement for the entire duration of the benchmark.
    pub total: Measurement,

    /// Measurements for scopes.
    #[serde(default)]
    pub scopes: BTreeMap<String, Measurement>,
}

/// A benchmark measurement containing various stats.
#[derive(Debug, PartialEq, Serialize, Deserialize, CandidType, Clone)]
pub struct Measurement {
    /// The number of instructions.
    #[serde(default)]
    pub instructions: u64,

    /// The increase in heap (measured in pages).
    #[serde(default)]
    pub heap_increase: u64,

    /// The increase in stable memory (measured in pages).
    #[serde(default)]
    pub stable_memory_increase: u64,
}

/// Benchmarks the given function.
pub fn bench_fn<R>(f: impl FnOnce() -> R) -> BenchResult {
    reset();
    let start_heap = heap_size();
    let start_stable_memory = ic_cdk::api::stable::stable64_size();
    let start_instructions = instruction_count();
    f();
    let instructions = instruction_count() - start_instructions;
    let stable_memory_increase = ic_cdk::api::stable::stable64_size() - start_stable_memory;
    let heap_increase = heap_size() - start_heap;

    let total = Measurement {
        instructions,
        heap_increase,
        stable_memory_increase,
    };

    let scopes: std::collections::BTreeMap<_, _> = get_scopes_measurements()
        .into_iter()
        .map(|(k, v)| (k.to_string(), v))
        .collect();

    BenchResult { total, scopes }
}

/// Benchmarks the scope this function is called in.
///
/// NOTE: It's important to assign the returned value to a named variable (e.g. `_p`), otherwise benchmarking won't work correctly.
///
/// # Correct Usage
///
/// ```
/// fn my_func() {
///   let _p = canbench_rs::bench_scope("my_scope");
///   // Do something.
/// }
/// ```
///
/// # Incorrect Usages
///
/// ```
/// fn my_func() {
///   let _ = canbench_rs::bench_scope("my_scope"); // Doesn't capture the scope.
///   // Do something.
/// }
/// ```
///
/// ```
/// fn my_func() {
///   canbench_rs::bench_scope("my_scope"); // Doesn't capture the scope.
///   // Do something.
/// }
/// ```
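///
/// The difference between these usages comes down to when Rust drops the returned value:
/// a named binding such as `_p` lives until the end of the enclosing block, while `let _ = ...`
/// and a bare expression statement drop it immediately. The sketch below demonstrates this with
/// a hypothetical `Guard` type that logs when it is dropped; it does not use `canbench_rs` itself.
///
/// ```rust
/// use std::cell::RefCell;
///
/// thread_local! {
///     static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
/// }
///
/// fn log(event: &'static str) {
///     LOG.with(|l| l.borrow_mut().push(event));
/// }
///
/// // Returns the events logged so far and clears the log.
/// fn take_log() -> Vec<&'static str> {
///     LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
/// }
///
/// struct Guard;
///
/// impl Drop for Guard {
///     fn drop(&mut self) {
///         log("guard dropped");
///     }
/// }
///
/// fn named_binding() {
///     let _p = Guard; // Lives until the end of the function.
///     log("work");
/// }
///
/// fn underscore_pattern() {
///     let _ = Guard; // Dropped right away, before any work happens.
///     log("work");
/// }
///
/// fn main() {
///     named_binding();
///     assert_eq!(take_log(), ["work", "guard dropped"]);
///
///     underscore_pattern();
///     assert_eq!(take_log(), ["guard dropped", "work"]);
/// }
/// ```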
#[must_use]
pub fn bench_scope(name: &'static str) -> BenchScope {
    BenchScope::new(name)
}

/// An object used for benchmarking a specific scope.
pub struct BenchScope {
    name: &'static str,
    start_instructions: u64,
    start_stable_memory: u64,
    start_heap: u64,
}

impl BenchScope {
    fn new(name: &'static str) -> Self {
        let start_heap = heap_size();
        let start_stable_memory = ic_cdk::api::stable::stable64_size();
        let start_instructions = instruction_count();

        Self {
            name,
            start_instructions,
            start_stable_memory,
            start_heap,
        }
    }
}

impl Drop for BenchScope {
    fn drop(&mut self) {
        let instructions = instruction_count() - self.start_instructions;
        let stable_memory_increase =
            ic_cdk::api::stable::stable64_size() - self.start_stable_memory;
        let heap_increase = heap_size() - self.start_heap;

        SCOPES.with(|p| {
            let mut p = p.borrow_mut();
            let prev_scope = p.insert(
                self.name,
                Measurement {
                    instructions,
                    heap_increase,
                    stable_memory_increase,
                },
            );

            assert!(
                prev_scope.is_none(),
                "scope {} cannot be specified multiple times.",
                self.name
            );
        });
    }
}

// Clears all scope data.
fn reset() {
    SCOPES.with(|p| p.borrow_mut().clear());
}

// Returns the measurements for any declared scopes.
fn get_scopes_measurements() -> std::collections::BTreeMap<&'static str, Measurement> {
    SCOPES.with(|p| p.borrow().clone())
}

fn instruction_count() -> u64 {
    #[cfg(target_arch = "wasm32")]
    {
        ic_cdk::api::performance_counter(0)
    }

    #[cfg(not(target_arch = "wasm32"))]
    {
        // Consider using cpu time here.
        0
    }
}

fn heap_size() -> u64 {
    #[cfg(target_arch = "wasm32")]
    {
        core::arch::wasm32::memory_size(0) as u64
    }

    #[cfg(not(target_arch = "wasm32"))]
    {
        0
    }
}