pub struct Dhat(/* private fields */);
Available on crate feature default only.
The configuration for Dhat
Can be specified in crate::LibraryBenchmarkConfig::tool or crate::BinaryBenchmarkConfig::tool.
§Example
use iai_callgrind::{LibraryBenchmarkConfig, main, Dhat};
main!(
    config = LibraryBenchmarkConfig::default()
        .tool(Dhat::default());
    library_benchmark_groups = some_group
);
Implementations§
impl Dhat
pub fn with_args<I, T>(args: T) -> Self
Create a new Dhat configuration with initial command-line arguments
See also Callgrind::args and Dhat::args
§Examples
use iai_callgrind::Dhat;
let config = Dhat::with_args(["mode=ad-hoc"]);
pub fn args<I, T>(&mut self, args: T) -> &mut Self
Add command-line arguments to the Dhat configuration
Valid arguments are the DHAT command-line options listed at https://valgrind.org/docs/manual/dh-manual.html#dh-manual.options and the core valgrind command-line arguments listed at https://valgrind.org/docs/manual/manual-core.html#manual-core.options.
See also Callgrind::args
§Examples
use iai_callgrind::Dhat;
let config = Dhat::default().args(["interval-size=10000"]);
pub fn enable(&mut self, value: bool) -> &mut Self
Enable or disable this tool. The tool is enabled by default.
See also Callgrind::enable
use iai_callgrind::Dhat;
let config = Dhat::default().enable(false);
pub fn format<I, T>(&mut self, kinds: T) -> &mut Self
Customize the format of the dhat output
See also Callgrind::format for more details and DhatMetric for valid metrics.
§Examples
use iai_callgrind::{Dhat, DhatMetric};
let config = Dhat::default().format([DhatMetric::TotalBytes, DhatMetric::AtTGmaxBytes]);
pub fn entry_point(&mut self, entry_point: EntryPoint) -> &mut Self
Set or unset the entry point for DHAT
The basic concept of this EntryPoint is almost the same as for Callgrind::entry_point; see there for additional details. For library benchmarks the default entry point is EntryPoint::Default and for binary benchmarks it’s EntryPoint::None.
Note that the default entry point tries to match the benchmark function, so it doesn’t make much sense to use EntryPoint::Default in binary benchmarks. An incorrect entry point usually results in all metrics being 0, which indicates that something has gone wrong.
§Details
There are subtle differences to the entry point in callgrind, and the final metrics shown in the DHAT output can only be calculated on a best-effort basis. As opposed to callgrind, the default entry point EntryPoint::Default is applied after the benchmark run based on the output files because DHAT does not have a command-line argument like --toggle-collect. The DHAT output files, however, can’t be used to reliably exclude the setup and teardown of the benchmark function. As a consequence, allocations and deallocations in the setup and teardown functions are included in the final metrics. All other (de-)allocations in the benchmark file (around 2000 - 2500 bytes) to prepare the benchmark run are not included, which stabilizes the metrics enough to specify limits with Dhat::soft_limits or Dhat::hard_limits for regression checks and focuses the metrics on the benchmark function.
Since there is no --toggle-collect argument, it’s possible to define additional frames (the Iai-Callgrind specific DHAT equivalent of callgrind toggles) in the Dhat::frames method.
The EntryPoint::Default matches the benchmark function, and an EntryPoint::Custom is a convenience for specifying EntryPoint::None and a frame in Dhat::frames.
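The relationship between the entry point kinds and frames described here can be pictured with a small stand-alone model. This is only a sketch: EntryPointModel and effective_frames are hypothetical names invented for illustration, they are not part of Iai-Callgrind, and the real resolution logic may differ.

```rust
/// Hypothetical stand-in for the three entry point kinds described in the
/// text. This is NOT the `EntryPoint` type of Iai-Callgrind.
#[derive(Debug, Clone, PartialEq)]
enum EntryPointModel {
    None,
    Default,
    Custom(String),
}

/// Returns the frame patterns used to filter the DHAT output, assuming that
/// `Default` resolves to the benchmark function and that `Custom` behaves
/// like `None` plus one additional frame. An empty result stands for
/// "no filtering at all".
fn effective_frames(
    entry_point: &EntryPointModel,
    benchmark_function: &str,
    extra_frames: &[&str],
) -> Vec<String> {
    let mut frames: Vec<String> = match entry_point {
        EntryPointModel::None => Vec::new(),
        EntryPointModel::Default => vec![benchmark_function.to_owned()],
        EntryPointModel::Custom(pattern) => vec![pattern.clone()],
    };
    // Frames given separately are added on top of the entry point.
    frames.extend(extra_frames.iter().map(|f| (*f).to_owned()));
    frames
}

fn main() {
    let frames = effective_frames(
        &EntryPointModel::Custom("my_lib::parse*".to_owned()),
        "my_bench::some_bench",
        &["my_lib::worker"],
    );
    println!("{frames:?}");
}
```

In this model, a custom entry point and an explicit frame end up in the same list, which is the sense in which the custom variant is a convenience.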
§Examples
Specifying no entry point in library benchmarks is the same as specifying EntryPoint::Default. It is used here nonetheless for demonstration purposes:
use iai_callgrind::{
    main, LibraryBenchmarkConfig, library_benchmark, library_benchmark_group, Dhat,
    EntryPoint
};
use std::hint::black_box;
use my_lib::to_be_benchmarked;
#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .tool(Dhat::default().entry_point(EntryPoint::Default))
)]
fn some_bench() -> Vec<i32> { // <-- DEFAULT ENTRY POINT
    black_box(to_be_benchmarked())
}
library_benchmark_group!(name = some_group; benchmarks = some_bench);
main!(library_benchmark_groups = some_group);
You most likely want to disable the entry point with EntryPoint::None if you’re using DHAT ad-hoc profiling.
use iai_callgrind::{
    main, LibraryBenchmarkConfig, library_benchmark, library_benchmark_group,
    EntryPoint, Dhat
};
use std::hint::black_box;
fn to_be_benchmarked() -> Vec<i32> {
    iai_callgrind::client_requests::dhat::ad_hoc_event(20);
    // allocations worth a weight of `20`
    vec![1, 2, 3] // placeholder return value so that the function compiles
}
#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .tool(Dhat::with_args(["--mode=ad-hoc"])
            .entry_point(EntryPoint::None)
        )
)]
fn some_bench() -> Vec<i32> {
    black_box(to_be_benchmarked())
}
library_benchmark_group!(name = some_group; benchmarks = some_bench);
main!(library_benchmark_groups = some_group);
pub fn frames<I, T>(&mut self, frames: T) -> &mut Self
Add one or multiple frames which will be included in the benchmark metrics
Frames are special to Iai-Callgrind and the DHAT equivalent of callgrind toggles (--toggle-collect). Like --toggle-collect, this method accepts simple glob patterns with * and ? wildcards. A Frame describes an entry in the call stack (see the example). Sometimes the Dhat::entry_point is not enough and additional frames have to be specified. This is especially true in multi-threaded/multi-process applications. Like in callgrind, each thread/subprocess in DHAT is treated as a separate unit and thus requires frames in addition to the default entry point to include the interesting ones in the measurements.
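To make the wildcard semantics concrete, here is a minimal stand-alone sketch of simple glob matching with * and ?. The function glob_match is a hypothetical helper written for this illustration; it is not Iai-Callgrind's implementation, and it assumes that a pattern has to match the whole function name.

```rust
/// Matches `text` against a simple glob `pattern` where `*` stands for any
/// run of characters (including none) and `?` for exactly one character.
/// The whole text has to match (an assumption of this sketch).
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Pattern index of the last `*` and the text index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // First try to let `*` match nothing.
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            pi = star_pi + 1;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*` may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

fn main() {
    let name = "benchmark_tests::find_primes";
    println!("{}", glob_match("benchmark_tests::find_*", name));
    println!("{}", glob_match("benchmark_tests::find_prime?", name));
}
```

Under these assumptions, a pattern such as benchmark_tests::find_* would cover every function in benchmark_tests whose name starts with find_.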
§Example
To demonstrate a general workflow, below is a sanitized example output of dh_view.html of a benchmark of a multi-threaded program. Most of the program points, including the default entry point, are not shown here to save some space. The spawned thread (std::sys::pal::unix::thread::Thread::new::thread_start) with the function call benchmark_tests::find_primes is the interesting one.
▼ PP 1/1 (3 children) {
Total: 156,372 bytes (100%, 14,948.32/Minstr) in 76 blocks (100%, 7.27/Minstr), avg size 2,057.53 bytes, avg lifetime 2,907,942.57 instrs (27.8% of program duration)
At t-gmax: 52,351 bytes (100%) in 20 blocks (100%), avg size 2,617.55 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Reads: 117,583 bytes (100%, 11,240.3/Minstr), 0.75/byte
Writes: 135,680 bytes (100%, 12,970.28/Minstr), 0.87/byte
Allocated at {
#0: [root]
}
}
├─▼ PP 1.1/3 (12 children) {
│ Total: 154,468 bytes (98.78%, 14,766.31/Minstr) in 57 blocks (75%, 5.45/Minstr), avg size 2,709.96 bytes, avg lifetime 2,937,398.7 instrs (28.08% of program duration)
│ At t-gmax: 51,375 bytes (98.14%) in 15 blocks (75%), avg size 3,425 bytes
│ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
│ Reads: 116,367 bytes (98.97%, 11,124.06/Minstr), 0.75/byte
│ Writes: 134,872 bytes (99.4%, 12,893.03/Minstr), 0.87/byte
│ Allocated at {
│ #1: 0x48CC7A8: malloc (in /usr/lib/valgrind/vgpreload_dhat-amd64-linux.so)
│ }
│ }
│ ├── PP 1.1.1/12 {
│ │ Total: 81,824 bytes (52.33%, 7,821.93/Minstr) in 29 blocks (38.16%, 2.77/Minstr), avg size 2,821.52 bytes, avg lifetime 785,423.83 instrs (7.51% of program duration)
│ │ Max: 40,960 bytes in 3 blocks, avg size 13,653.33 bytes
│ │ At t-gmax: 40,960 bytes (78.24%) in 3 blocks (15%), avg size 13,653.33 bytes
│ │ At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
│ │ Reads: 66,824 bytes (56.83%, 6,388.01/Minstr), 0.82/byte
│ │ Writes: 66,824 bytes (49.25%, 6,388.01/Minstr), 0.82/byte
│ │ Allocated at {
│ │ ^1: 0x48CC7A8: malloc (in /usr/lib/valgrind/vgpreload_dhat-amd64-linux.so)
│ │ #2: 0x40197C7: UnknownInlinedFun (alloc.rs:93)
│ │ #3: 0x40197C7: UnknownInlinedFun (alloc.rs:188)
│ │ #4: 0x40197C7: UnknownInlinedFun (alloc.rs:249)
│ │ #5: 0x40197C7: UnknownInlinedFun (mod.rs:476)
│ │ #6: 0x40197C7: with_capacity_in<alloc::alloc::Global> (mod.rs:422)
│ │ #7: 0x40197C7: with_capacity_in<u64, alloc::alloc::Global> (mod.rs:190)
│ │ #8: 0x40197C7: with_capacity_in<u64, alloc::alloc::Global> (mod.rs:815)
│ │ #9: 0x40197C7: with_capacity<u64> (mod.rs:495)
│ │ #10: 0x40197C7: from_iter<u64, core::iter::adapters::filter::Filter<core::ops::range::RangeInclusive<u64>, benchmark_tests::find_primes::{closure_env#0}>> (spec_from_iter_nested.rs:31)
│ │ #11: 0x40197C7: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter (spec_from_iter.rs:34)
│ │ #12: 0x4016B97: from_iter<u64, core::iter::adapters::filter::Filter<core::ops::range::RangeInclusive<u64>, benchmark_tests::find_primes::{closure_env#0}>> (mod.rs:3438)
│ │ #13: 0x4016B97: collect<core::iter::adapters::filter::Filter<core::ops::range::RangeInclusive<u64>, benchmark_tests::find_primes::{closure_env#0}>, alloc::vec::Vec<u64, alloc::alloc::Global>> (iterator.rs:2001)
│ │ #14: 0x4016B97: benchmark_tests::find_primes (lib.rs:25)
│ │ #15: 0x4019DA0: {closure#0} (lib.rs:32)
│ │ #16: 0x4019DA0: std::sys::backtrace::__rust_begin_short_backtrace (backtrace.rs:152)
│ │ #17: 0x4018BB4: {closure#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>> (mod.rs:559)
│ │ #18: 0x4018BB4: call_once<alloc::vec::Vec<u64, alloc::alloc::Global>, std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>> (unwind_safe.rs:272)
│ │ #19: 0x4018BB4: do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>>, alloc::vec::Vec<u64, alloc::alloc::Global>> (panicking.rs:589)
│ │ #20: 0x4018BB4: try<alloc::vec::Vec<u64, alloc::alloc::Global>, core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>>> (panicking.rs:552)
│ │ #21: 0x4018BB4: catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>>>, alloc::vec::Vec<u64, alloc::alloc::Global>> (panic.rs:359)
│ │ #22: 0x4018BB4: {closure#1}<benchmark_tests::find_primes_multi_thread::{closure_env#0}, alloc::vec::Vec<u64, alloc::alloc::Global>> (mod.rs:557)
│ │ #23: 0x4018BB4: core::ops::function::FnOnce::call_once{{vtable.shim}} (function.rs:250)
│ │ #24: 0x404A2BA: call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> (boxed.rs:1966)
│ │ #25: 0x404A2BA: call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> (boxed.rs:1966)
│ │ #26: 0x404A2BA: std::sys::pal::unix::thread::Thread::new::thread_start (thread.rs:97)
│ │ #27: 0x49C27EA: ??? (in /usr/lib/libc.so.6)
│ │ #28: 0x4A45FB3: clone (in /usr/lib/libc.so.6)
│ │ }
│ │ }
...
As can be seen, the call stack of the program point PP 1.1.1/12 does not include a main function, benchmark function, and so forth because a thread is a completely separate unit. This enables us to exclude uninteresting threads by simply not specifying them here and include the interesting ones for example with:
use iai_callgrind::Dhat;
Dhat::default().frames(["benchmark_tests::find_primes"]);
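The effect of such a frame on the metrics can be sketched as a filter over program points. The code below is a simplified model and not how Iai-Callgrind processes DHAT files: ProgramPoint and included_total_bytes are hypothetical names, the second program point is invented data, and exact name comparison replaces glob matching to keep the sketch short.

```rust
/// A simplified program point: the bytes it allocated in total and the
/// function names found in its call stack (innermost first).
struct ProgramPoint {
    total_bytes: u64,
    stack: Vec<&'static str>,
}

/// Sums the total bytes of all program points whose call stack contains at
/// least one of the given frames. Glob patterns are not handled here.
fn included_total_bytes(points: &[ProgramPoint], frames: &[&str]) -> u64 {
    points
        .iter()
        .filter(|pp| pp.stack.iter().any(|func| frames.contains(func)))
        .map(|pp| pp.total_bytes)
        .sum()
}

fn main() {
    let points = vec![
        // Abbreviated call stack of PP 1.1.1/12 from the output above.
        ProgramPoint {
            total_bytes: 81_824,
            stack: vec![
                "malloc",
                "benchmark_tests::find_primes",
                "std::sys::pal::unix::thread::Thread::new::thread_start",
                "clone",
            ],
        },
        // Invented program point of a thread that is of no interest.
        ProgramPoint {
            total_bytes: 1_024,
            stack: vec!["malloc", "some_crate::logger_thread", "clone"],
        },
    ];
    let total = included_total_bytes(&points, &["benchmark_tests::find_primes"]);
    println!("{total}");
}
```

Only the program point with benchmark_tests::find_primes in its stack contributes to the sum; the thread that was not named is left out.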
pub fn soft_limits<K, T>(&mut self, soft_limits: T) -> &mut Self
Configure the percentage limits above or below which a performance regression can be assumed
Same as Callgrind::soft_limits but for DhatMetric values.
§Examples
use iai_callgrind::{Dhat, DhatMetric};
let config = Dhat::default().soft_limits([(DhatMetric::TotalBytes, 5f64)]);
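How a percentage limit of this kind could be evaluated is sketched below. The helper names percent_change and exceeds_soft_limit are hypothetical, and the formula and the handling of negative limits are assumptions made for this illustration, not the confirmed behavior of Iai-Callgrind.

```rust
/// Percentage change from `old` to `new`. Returns `None` if `old` is zero
/// because the relative change is undefined in that case.
fn percent_change(old: u64, new: u64) -> Option<f64> {
    if old == 0 {
        return None;
    }
    Some((new as f64 - old as f64) * 100.0 / old as f64)
}

/// Assumed semantics: a non-negative limit flags increases above the limit,
/// a negative limit flags decreases below the limit.
fn exceeds_soft_limit(old: u64, new: u64, limit_percent: f64) -> bool {
    match percent_change(old, new) {
        Some(change) if limit_percent >= 0.0 => change > limit_percent,
        Some(change) => change < limit_percent,
        None => false,
    }
}

fn main() {
    // Total bytes grew from 1000 to 1060, checked against a limit of 5%.
    println!("{}", exceeds_soft_limit(1000, 1060, 5.0));
}
```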
pub fn hard_limits<K, L, T>(&mut self, hard_limits: T) -> &mut Self
Set hard limits above which a performance regression can be assumed
Same as Callgrind::hard_limits but for DhatMetric values.
§Examples
If, in a benchmark configured like below, more than a total of 10_000 bytes are allocated, a performance regression is registered, failing the benchmark run.
use iai_callgrind::{Dhat, DhatMetric};
let config = Dhat::default().hard_limits([(DhatMetric::TotalBytes, 10_000)]);
or for a group of metrics but with a special value for TotalBytes:
use iai_callgrind::{Dhat, DhatMetric, DhatMetrics};
let config = Dhat::default().hard_limits([
    (DhatMetrics::Default, 10_000),
    (DhatMetric::TotalBytes.into(), 5_000),
]);
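The combination of a group limit with a special value for a single metric can be modeled with a plain lookup table. This is a sketch under stated assumptions: the Metric enum, build_limits and violations are hypothetical names, the members of the group are invented, and the rule that a later entry replaces an earlier one for the same metric is an assumption about the intended behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a few DHAT metrics.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Metric {
    TotalBytes,
    TotalBlocks,
}

/// Builds a per-metric limit table. Each entry applies one limit to a group
/// of metrics; later entries replace earlier ones for the same metric.
fn build_limits(entries: &[(Vec<Metric>, u64)]) -> HashMap<Metric, u64> {
    let mut limits = HashMap::new();
    for (metrics, limit) in entries {
        for metric in metrics {
            limits.insert(*metric, *limit);
        }
    }
    limits
}

/// Returns the metrics whose measured value is strictly above their limit.
fn violations(limits: &HashMap<Metric, u64>, measured: &[(Metric, u64)]) -> Vec<Metric> {
    let mut result = Vec::new();
    for (metric, value) in measured {
        if let Some(limit) = limits.get(metric) {
            if value > limit {
                result.push(*metric);
            }
        }
    }
    result
}

fn main() {
    let limits = build_limits(&[
        // Invented group standing in for a default set of metrics.
        (vec![Metric::TotalBytes, Metric::TotalBlocks], 10_000),
        // Special, stricter value for a single metric.
        (vec![Metric::TotalBytes], 5_000),
    ]);
    let measured = [(Metric::TotalBytes, 6_000), (Metric::TotalBlocks, 76)];
    println!("{:?}", violations(&limits, &measured));
}
```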
pub fn fail_fast(&mut self, value: bool) -> &mut Self
If set to true, the benchmarks fail on the first encountered regression
The default is false, in which case the whole benchmark run fails with a regression error only after all benchmarks have been run.
§Examples
use iai_callgrind::Dhat;
let config = Dhat::default().fail_fast(true);
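The difference between the two modes can be illustrated with a small stand-alone model. The function run_checks and its input format are hypothetical and only mirror the behavior described here; they are not part of Iai-Callgrind.

```rust
/// Walks over the benchmark results in order and returns the names of the
/// benchmarks that were seen with a regression. With `fail_fast` set, the
/// walk stops at the first regression; otherwise all results are inspected.
fn run_checks(results: &[(&str, bool)], fail_fast: bool) -> Vec<String> {
    let mut regressed = Vec::new();
    for (name, has_regression) in results {
        if *has_regression {
            regressed.push((*name).to_owned());
            if fail_fast {
                break;
            }
        }
    }
    regressed
}

fn main() {
    let results = [("bench_a", false), ("bench_b", true), ("bench_c", true)];
    println!("{:?}", run_checks(&results, true));
    println!("{:?}", run_checks(&results, false));
}
```

In both modes the run counts as failed once the returned list is non-empty; the setting only decides how many benchmarks are executed before the failure is reported.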