Struct Dhat

pub struct Dhat(/* private fields */);
Available on crate feature default only.

The configuration for Dhat

Can be specified in crate::LibraryBenchmarkConfig::tool or crate::BinaryBenchmarkConfig::tool.

§Example

use iai_callgrind::{LibraryBenchmarkConfig, main, Dhat};

main!(
    config = LibraryBenchmarkConfig::default()
        .tool(Dhat::default());
    library_benchmark_groups = some_group
);

Implementations§

impl Dhat

pub fn with_args<I, T>(args: T) -> Self
where I: AsRef<str>, T: IntoIterator<Item = I>,

Create a new Dhat configuration with initial command-line arguments

See also Callgrind::args and Dhat::args

§Examples
use iai_callgrind::Dhat;

let config = Dhat::with_args(["mode=ad-hoc"]);

pub fn args<I, T>(&mut self, args: T) -> &mut Self
where I: AsRef<str>, T: IntoIterator<Item = I>,

Add command-line arguments to the Dhat configuration

Valid arguments are the DHAT command-line options (https://valgrind.org/docs/manual/dh-manual.html#dh-manual.options) and the core valgrind command-line arguments (https://valgrind.org/docs/manual/manual-core.html#manual-core.options).

See also Callgrind::args

§Examples
use iai_callgrind::Dhat;

let config = Dhat::default().args(["interval-size=10000"]);
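
The examples on this page pass arguments both without the leading dashes ("mode=ad-hoc") and with them ("--mode=ad-hoc"). The following is a minimal sketch of how such input could be normalized, assuming the dashes are optional. `normalize_arg` is a hypothetical helper written for illustration and is not part of the iai_callgrind API.

```rust
/// Hypothetical helper (not part of iai_callgrind): prefix an argument
/// with `--` unless it already starts with it.
fn normalize_arg(arg: &str) -> String {
    if arg.starts_with("--") {
        arg.to_owned()
    } else {
        format!("--{arg}")
    }
}

fn main() {
    // Both spellings end up in the form valgrind expects on the command line.
    let args = ["mode=ad-hoc", "--num-callers=24"];
    let normalized: Vec<String> = args.iter().map(|a| normalize_arg(a)).collect();
    assert_eq!(normalized, ["--mode=ad-hoc", "--num-callers=24"]);
}
```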

pub fn enable(&mut self, value: bool) -> &mut Self

Enable this tool. This is the default.

See also Callgrind::enable

§Examples

use iai_callgrind::Dhat;

let config = Dhat::default().enable(false);

pub fn format<I, T>(&mut self, kinds: T) -> &mut Self
where I: Into<DhatMetric>, T: IntoIterator<Item = I>,

Customize the format of the dhat output

See also Callgrind::format for more details and DhatMetric for valid metrics.

§Examples
use iai_callgrind::{Dhat, DhatMetric};

let config = Dhat::default().format([DhatMetric::TotalBytes, DhatMetric::AtTGmaxBytes]);

pub fn entry_point(&mut self, entry_point: EntryPoint) -> &mut Self

Set or unset the entry point for DHAT

The basic concept of this EntryPoint is almost the same as for Callgrind::entry_point; see there for additional details. For library benchmarks the default entry point is EntryPoint::Default, and for binary benchmarks it’s EntryPoint::None.

Note that the default entry point tries to match the benchmark function, so it doesn’t make much sense to use EntryPoint::Default in binary benchmarks. The result of an incorrect entry point is usually that all metrics are 0, which is an indicator that something has gone wrong.

§Details

There are subtle differences from the entry point in callgrind, and the final metrics shown in the DHAT output can only be calculated on a best-effort basis. In contrast to callgrind, the default entry point EntryPoint::Default is applied after the benchmark run, based on the output files, because DHAT does not have a command-line argument like --toggle-collect. The DHAT output files, however, can’t be used to reliably exclude the setup and teardown of the benchmark function. As a consequence, allocations and deallocations in the setup and teardown functions are included in the final metrics.

All other (de-)allocations in the benchmark file that prepare the benchmark run (around 2000 - 2500 bytes) are not included. This stabilizes the metrics enough to specify limits with Dhat::soft_limits or Dhat::hard_limits for regression checks and focuses the metrics on the benchmark function.

Since there is no --toggle-collect argument, additional frames (the Iai-Callgrind specific DHAT equivalent of callgrind toggles) can be defined with the Dhat::frames method.

The EntryPoint::Default matches the benchmark function, and an EntryPoint::Custom is a convenience for specifying EntryPoint::None together with a frame in Dhat::frames.
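
To make the idea of applying an entry point after the run more concrete, here is a simplified model. It is a sketch under stated assumptions, not Iai-Callgrind's implementation: `ProgramPoint` and `bytes_below` are hypothetical names, and the real DHAT output file contains far more data than a byte count and a list of function names.

```rust
/// Hypothetical, simplified model of one DHAT program point.
struct ProgramPoint {
    total_bytes: u64,
    /// Function names of the call stack, innermost frame first.
    frames: Vec<&'static str>,
}

/// Sum the bytes of all program points whose call stack contains `entry_point`.
fn bytes_below(points: &[ProgramPoint], entry_point: &str) -> u64 {
    points
        .iter()
        .filter(|pp| pp.frames.iter().any(|f| *f == entry_point))
        .map(|pp| pp.total_bytes)
        .sum()
}

fn main() {
    let points = [
        // Allocated while preparing the run: `some_bench` is not in the stack.
        ProgramPoint { total_bytes: 2_048, frames: vec!["malloc", "std::rt::lang_start"] },
        // Allocated below the benchmark function: counted.
        ProgramPoint { total_bytes: 400, frames: vec!["malloc", "my_lib::to_be_benchmarked", "some_bench"] },
        ProgramPoint { total_bytes: 64, frames: vec!["malloc", "my_lib::helper", "some_bench"] },
    ];
    assert_eq!(bytes_below(&points, "some_bench"), 464);
    // An entry point that matches nothing yields 0 for the metric.
    assert_eq!(bytes_below(&points, "no_such_function"), 0);
}
```

In this model an entry point that matches no frame produces a total of 0, which mirrors the symptom described above for an incorrect entry point.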

§Examples

Specifying no entry point in library benchmarks is the same as specifying EntryPoint::Default. It is used here nonetheless for demonstration purposes:

use iai_callgrind::{
    main, LibraryBenchmarkConfig, library_benchmark, library_benchmark_group, Dhat,
    EntryPoint
};
use std::hint::black_box;
use my_lib::to_be_benchmarked;

#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .tool(Dhat::default().entry_point(EntryPoint::Default))
)]
fn some_bench() -> Vec<i32> { // <-- DEFAULT ENTRY POINT
    black_box(to_be_benchmarked())
}

library_benchmark_group!(name = some_group; benchmarks = some_bench);
main!(library_benchmark_groups = some_group);

You most likely want to disable the entry point with EntryPoint::None if you’re using DHAT ad-hoc profiling.

use iai_callgrind::{
    main, LibraryBenchmarkConfig, library_benchmark, library_benchmark_group,
    EntryPoint, Dhat
};
use std::hint::black_box;

fn to_be_benchmarked() -> Vec<i32> {
    iai_callgrind::client_requests::dhat::ad_hoc_event(20);
    // allocations worth a weight of `20`
    // (placeholder return value so that the example compiles)
    vec![1, 2, 3]
}

#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .tool(Dhat::with_args(["--mode=ad-hoc"])
            .entry_point(EntryPoint::None)
        )
)]
fn some_bench() -> Vec<i32> {
    black_box(to_be_benchmarked())
}

library_benchmark_group!(name = some_group; benchmarks = some_bench);
main!(library_benchmark_groups = some_group);

pub fn frames<I, T>(&mut self, frames: T) -> &mut Self
where I: Into<String>, T: IntoIterator<Item = I>,

Add one or multiple frames which will be included in the benchmark metrics

Frames are special to Iai-Callgrind and are the DHAT equivalent of callgrind toggles (--toggle-collect). Like --toggle-collect, this method accepts simple glob patterns with * and ? wildcards. A frame describes an entry in the call stack (see the example). Sometimes the Dhat::entry_point is not enough and additional frames have to be specified. This is especially true in multi-threaded/multi-process applications. As in callgrind, each thread/subprocess in DHAT is treated as a separate unit and thus requires frames in addition to the default entry point to include the interesting ones in the measurements.
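
As an illustration of the wildcard semantics, the following sketch implements a small glob matcher in which * matches any run of characters (including none) and ? matches exactly one character. It is not Iai-Callgrind's matcher: `glob_match` and `match_chars` are hypothetical names, and the assumption that a pattern has to match the whole function name is made for this example only.

```rust
/// Match `text` against `pattern`. `*` matches any run of characters
/// (including none) and `?` matches exactly one character.
/// Assumption for this sketch: the pattern must cover the whole text.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    match_chars(&p, &t)
}

/// Recursive helper working on character slices.
fn match_chars(p: &[char], t: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: it is a match only if the text is exhausted too.
        None => t.is_empty(),
        // `*`: try every possible split point of the remaining text.
        Some((&'*', rest)) => (0..=t.len()).any(|i| match_chars(rest, &t[i..])),
        // `?`: consume exactly one character.
        Some((&'?', rest)) => !t.is_empty() && match_chars(rest, &t[1..]),
        // Literal character: must be equal to the next character of the text.
        Some((c, rest)) => t.first() == Some(c) && match_chars(rest, &t[1..]),
    }
}

fn main() {
    let frame = "benchmark_tests::find_primes";
    assert!(glob_match("benchmark_tests::find_primes", frame));
    assert!(glob_match("benchmark_tests::find_*", frame));
    assert!(glob_match("benchmark_tests::find_prime?", frame));
    assert!(!glob_match("benchmark_tests::find_primes", "malloc"));
}
```

The recursive approach is exponential in the worst case, which is acceptable for short function names in a sketch but would be replaced by an iterative matcher in production code.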

§Example

To demonstrate a general workflow, below is a sanitized example output of dh_view.html of a benchmark of a multi-threaded program. Most of the program points, including the default entry point, are not shown here to save some space. The spawned thread (std::sys::pal::unix::thread::Thread::new::thread_start) with the function call benchmark_tests::find_primes is the interesting one.

▼ PP 1/1 (3 children) {
    Total:     156,372 bytes (100%, 14,948.32/Minstr) in 76 blocks (100%, 7.27/Minstr), avg size 2,057.53 bytes, avg lifetime 2,907,942.57 instrs (27.8% of program duration)
    At t-gmax: 52,351 bytes (100%) in 20 blocks (100%), avg size 2,617.55 bytes
    At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
    Reads:     117,583 bytes (100%, 11,240.3/Minstr), 0.75/byte
    Writes:    135,680 bytes (100%, 12,970.28/Minstr), 0.87/byte
    Allocated at {
      #0: [root]
    }
  }
  ├─▼ PP 1.1/3 (12 children) {
  │     Total:     154,468 bytes (98.78%, 14,766.31/Minstr) in 57 blocks (75%, 5.45/Minstr), avg size 2,709.96 bytes, avg lifetime 2,937,398.7 instrs (28.08% of program duration)
  │     At t-gmax: 51,375 bytes (98.14%) in 15 blocks (75%), avg size 3,425 bytes
  │     At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  │     Reads:     116,367 bytes (98.97%, 11,124.06/Minstr), 0.75/byte
  │     Writes:    134,872 bytes (99.4%, 12,893.03/Minstr), 0.87/byte
  │     Allocated at {
  │       #1: 0x48CC7A8: malloc (in /usr/lib/valgrind/vgpreload_dhat-amd64-linux.so)
  │     }
  │   }
  │   ├── PP 1.1.1/12 {
  │   │     Total:     81,824 bytes (52.33%, 7,821.93/Minstr) in 29 blocks (38.16%, 2.77/Minstr), avg size 2,821.52 bytes, avg lifetime 785,423.83 instrs (7.51% of program duration)
  │   │     Max:       40,960 bytes in 3 blocks, avg size 13,653.33 bytes
  │   │     At t-gmax: 40,960 bytes (78.24%) in 3 blocks (15%), avg size 13,653.33 bytes
  │   │     At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  │   │     Reads:     66,824 bytes (56.83%, 6,388.01/Minstr), 0.82/byte
  │   │     Writes:    66,824 bytes (49.25%, 6,388.01/Minstr), 0.82/byte
  │   │     Allocated at {
  │   │       ^1: 0x48CC7A8: malloc (in /usr/lib/valgrind/vgpreload_dhat-amd64-linux.so)
  │   │       #2: 0x40197C7: UnknownInlinedFun (alloc.rs:93)
  │   │       #3: 0x40197C7: UnknownInlinedFun (alloc.rs:188)
  │   │       #4: 0x40197C7: UnknownInlinedFun (alloc.rs:249)
  │   │       #5: 0x40197C7: UnknownInlinedFun (mod.rs:476)
  │   │       #6: 0x40197C7: with_capacity_in<alloc::alloc::Global> (mod.rs:422)
  │   │       #7: 0x40197C7: with_capacity_in<u64, alloc::alloc::Global> (mod.rs:190)
  │   │       #8: 0x40197C7: with_capacity_in<u64, alloc::alloc::Global> (mod.rs:815)
  │   │       #9: 0x40197C7: with_capacity<u64> (mod.rs:495)
  │   │       #10: 0x40197C7: from_iter<u64, core::iter::adapters::filter::Filter<core::ops::range::RangeInclusive<u64>, benchmark_tests::find_primes::{closure_env#0}>> (spec_from_iter_nested.rs:31)
  │   │       #11: 0x40197C7: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter (spec_from_iter.rs:34)
  │   │       #12: 0x4016B97: from_iter<u64, core::iter::adapters::filter::Filter<core::ops::range::RangeInclusive<u64>, benchmark_tests::find_primes::{closure_env#0}>> (mod.rs:3438)
  │   │       #13: 0x4016B97: collect<core::iter::adapters::filter::Filter<core::ops::range::RangeInclusive<u64>, benchmark_tests::find_primes::{closure_env#0}>, alloc::vec::Vec<u64, alloc::alloc::Global>> (iterator.rs:2001)
  │   │       #14: 0x4016B97: benchmark_tests::find_primes (lib.rs:25)
  │   │       #15: 0x4019DA0: {closure#0} (lib.rs:32)
  │   │       #16: 0x4019DA0: std::sys::backtrace::__rust_begin_short_backtrace (backtrace.rs:152)
  │   │       #17: 0x4018BB4: {closure#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>> (mod.rs:559)
  │   │       #18: 0x4018BB4: call_once<alloc::vec::Vec<u64, alloc::alloc::Global>, std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>> (unwind_safe.rs:272)
  │   │       #19: 0x4018BB4: do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>>, alloc::vec::Vec<u64, alloc::alloc::Global>> (panicking.rs:589)
  │   │       #20: 0x4018BB4: try<alloc::vec::Vec<u64, alloc::alloc::Global>, core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>>> (panicking.rs:552)
  │   │       #21: 0x4018BB4: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>>, alloc::vec::Vec<u64, alloc::alloc::Global>> (panic.rs:359)
  │   │       #22: 0x4018BB4: {closure#1}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>> (mod.rs:557)
  │   │       #23: 0x4018BB4: core::ops::function::FnOnce::call_once{{vtable.shim}} (function.rs:250)
  │   │       #24: 0x404A2BA: call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> (boxed.rs:1966)
  │   │       #25: 0x404A2BA: call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> (boxed.rs:1966)
  │   │       #26: 0x404A2BA: std::sys::pal::unix::thread::Thread::new::thread_start (thread.rs:97)
  │   │       #27: 0x49C27EA: ??? (in /usr/lib/libc.so.6)
  │   │       #28: 0x4A45FB3: clone (in /usr/lib/libc.so.6)
  │   │     }
  │   │   }

  ...

As can be seen, the call stack of the program point PP 1.1.1/12 does not include a main function, a benchmark function, and so forth, because a thread is a completely separate unit. This makes it possible to exclude uninteresting threads by simply not specifying them and to include the interesting ones, for example with:

use iai_callgrind::Dhat;

Dhat::default().frames(["benchmark_tests::find_primes"]);

pub fn soft_limits<K, T>(&mut self, soft_limits: T) -> &mut Self
where K: Into<DhatMetrics>, T: IntoIterator<Item = (K, f64)>,

Configure the percentage limits above/below which a performance regression can be assumed

Same as Callgrind::soft_limits but for DhatMetrics.

§Examples
use iai_callgrind::{Dhat, DhatMetric};

let config = Dhat::default().soft_limits([(DhatMetric::TotalBytes, 5f64)]);
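
The arithmetic behind a percentage limit can be sketched as follows. This is an illustration under assumptions, not the library's code: `percent_diff` and `exceeds_soft_limit` are hypothetical names, and the sketch assumes that a positive limit flags growth beyond the limit while a negative limit flags a decrease beyond it.

```rust
/// Percentage change from `old` to `new`; positive means the metric grew.
/// Assumes `old > 0` to avoid a division by zero.
fn percent_diff(old: u64, new: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// Assumed semantics: a non-negative limit is exceeded when the metric grows
/// by more than `limit` percent, a negative limit when it shrinks by more
/// than `|limit|` percent.
fn exceeds_soft_limit(old: u64, new: u64, limit: f64) -> bool {
    let diff = percent_diff(old, new);
    if limit >= 0.0 {
        diff > limit
    } else {
        diff < limit
    }
}

fn main() {
    // With a limit of 5%, growing from 10_000 to 10_600 bytes (about +6%) is flagged.
    assert!(exceeds_soft_limit(10_000, 10_600, 5.0));
    // Growing to 10_400 bytes (about +4%) stays within the limit.
    assert!(!exceeds_soft_limit(10_000, 10_400, 5.0));
    // With a limit of -5%, shrinking to 9_000 bytes (-10%) is flagged.
    assert!(exceeds_soft_limit(10_000, 9_000, -5.0));
}
```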

pub fn hard_limits<K, L, T>(&mut self, hard_limits: T) -> &mut Self
where K: Into<DhatMetrics>, L: Into<Limit>, T: IntoIterator<Item = (K, L)>,

Set hard limits above which a performance regression can be assumed

Same as Callgrind::hard_limits but for DhatMetrics.

§Examples

If a benchmark configured as shown below allocates more than 10_000 bytes in total, a performance regression is registered and the benchmark run fails.

use iai_callgrind::{Dhat, DhatMetric};

let config = Dhat::default().hard_limits([(DhatMetric::TotalBytes, 10_000)]);

or for a group of metrics but with a special value for TotalBytes:

use iai_callgrind::{Dhat, DhatMetric, DhatMetrics};

let config = Dhat::default().hard_limits([
    (DhatMetrics::Default, 10_000),
    (DhatMetric::TotalBytes.into(), 5_000),
]);
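
One way to picture a group limit with a more specific override is a map in which later entries replace earlier ones. The sketch below models only that idea. `Metric`, `Selector`, `resolve` and `is_regression` are hypothetical names, the content of the group is chosen for illustration, and the assumption that later entries win is not taken from the library's documentation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for two of the DHAT metrics.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Metric {
    TotalBytes,
    AtTGmaxBytes,
}

/// Hypothetical selector: a group of metrics or a single metric.
enum Selector {
    Group,
    One(Metric),
}

/// Expand the selectors in order. Assumption: later entries override earlier ones.
fn resolve(limits: &[(Selector, u64)]) -> HashMap<Metric, u64> {
    let mut resolved = HashMap::new();
    for (selector, limit) in limits {
        match selector {
            Selector::Group => {
                // For illustration the group consists of both metrics.
                for metric in [Metric::TotalBytes, Metric::AtTGmaxBytes] {
                    resolved.insert(metric, *limit);
                }
            }
            Selector::One(metric) => {
                resolved.insert(*metric, *limit);
            }
        }
    }
    resolved
}

/// A hard limit is violated only if the value is strictly above it.
fn is_regression(value: u64, limit: u64) -> bool {
    value > limit
}

fn main() {
    let limits = [
        (Selector::Group, 10_000),
        (Selector::One(Metric::TotalBytes), 5_000),
    ];
    let resolved = resolve(&limits);
    assert_eq!(resolved[&Metric::TotalBytes], 5_000);
    assert_eq!(resolved[&Metric::AtTGmaxBytes], 10_000);
    assert!(is_regression(5_001, resolved[&Metric::TotalBytes]));
    assert!(!is_regression(5_000, resolved[&Metric::TotalBytes]));
}
```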

pub fn fail_fast(&mut self, value: bool) -> &mut Self

If set to true, then the benchmarks fail on the first encountered regression

The default is false, in which case the whole benchmark run fails with a regression error only after all benchmarks have been run.

§Examples
use iai_callgrind::Dhat;

let config = Dhat::default().fail_fast(true);
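
The difference between the two modes can be modeled with a small loop. This is a sketch of the control flow described above and not Iai-Callgrind's runner. `run_all` is a hypothetical function, and each benchmark is reduced to a name and a flag that says whether it regressed.

```rust
/// Run all benchmarks in order and report the regressed ones.
/// With `fail_fast`, stop at the first regression. Otherwise keep going and
/// fail at the end with every regression that was found.
fn run_all(benchmarks: &[(&str, bool)], fail_fast: bool) -> Result<(), Vec<String>> {
    let mut regressed = Vec::new();
    for (name, has_regression) in benchmarks {
        if *has_regression {
            regressed.push((*name).to_owned());
            if fail_fast {
                break;
            }
        }
    }
    if regressed.is_empty() {
        Ok(())
    } else {
        Err(regressed)
    }
}

fn main() {
    let benchmarks = [("a", false), ("b", true), ("c", true)];
    // fail_fast = true: the run stops at `b`, so `c` is never reported.
    assert_eq!(run_all(&benchmarks, true), Err(vec!["b".to_owned()]));
    // fail_fast = false: all benchmarks run and both regressions are reported.
    assert_eq!(
        run_all(&benchmarks, false),
        Err(vec!["b".to_owned(), "c".to_owned()])
    );
}
```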

Trait Implementations§

impl AsRef<Tool> for Dhat

fn as_ref(&self) -> &InternalTool

Converts this type into a shared reference of the (usually inferred) input type.

impl Clone for Dhat

fn clone(&self) -> Dhat

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Dhat

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Dhat

fn default() -> Self

Returns the “default value” for a type.

impl From<&Dhat> for InternalTool

fn from(value: &Dhat) -> Self

Converts to this type from the input type.

impl From<&mut Dhat> for InternalTool

fn from(value: &mut Dhat) -> Self

Converts to this type from the input type.

impl From<Dhat> for InternalTool

fn from(value: Dhat) -> Self

Converts to this type from the input type.

Auto Trait Implementations§

impl Freeze for Dhat

impl RefUnwindSafe for Dhat

impl Send for Dhat

impl Sync for Dhat

impl Unpin for Dhat

impl UnwindSafe for Dhat

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.