Crate dhat

This crate provides heap profiling and ad hoc profiling capabilities to Rust programs, similar to those provided by DHAT.

Heap profiling works by using a global allocator that wraps the system allocator, tracks all heap allocations, and on program exit writes data to a file so it can be viewed with DHAT's viewer. This corresponds to DHAT's --mode=heap mode.
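To make the wrapping idea concrete, here is a minimal sketch using only the standard library. It is not this crate's implementation: CountingAlloc, TOTAL_BYTES, TOTAL_BLOCKS and totals are hypothetical names chosen for illustration, and the sketch only counts bytes and blocks rather than recording backtraces or writing a file.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical allocator for illustration only; this is not dhat's DhatAlloc.
struct CountingAlloc;

// Running totals of everything allocated so far.
static TOTAL_BYTES: AtomicUsize = AtomicUsize::new(0);
static TOTAL_BLOCKS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Delegate the real work to the system allocator.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Record the successful allocation.
            TOTAL_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
            TOTAL_BLOCKS.fetch_add(1, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Nothing is tracked on free in this sketch; just forward the call.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOCATOR: CountingAlloc = CountingAlloc;

/// Returns (bytes, blocks) allocated since program start.
fn totals() -> (usize, usize) {
    (
        TOTAL_BYTES.load(Ordering::Relaxed),
        TOTAL_BLOCKS.load(Ordering::Relaxed),
    )
}
```

Calling totals() before and after a piece of code gives the bytes and blocks that code allocated. A real profiler would additionally attribute each allocation to a backtrace, which is where most of the extra cost comes from.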

Ad hoc profiling is a second mode of operation, in which ad hoc events are manually inserted into a Rust program for aggregation and viewing. This corresponds to DHAT's --mode=ad-hoc mode.

Motivation

DHAT is a powerful heap profiler that comes with Valgrind. This crate is a related but alternative choice for heap profiling Rust programs. DHAT and this crate have the following differences.

  • This crate works on any platform, while DHAT only works on some platforms (Linux, mostly). (Note that DHAT's viewer is just HTML+JS+CSS and should work in any modern web browser on any platform.)
  • This crate causes a much smaller slowdown than DHAT.
  • This crate requires some modifications to a program's source code and recompilation, while DHAT does not.
  • This crate cannot track memory accesses the way DHAT does, because it does not instrument all memory loads and stores.
  • This crate does not provide profiling of copy functions such as memcpy and strcpy, unlike DHAT.
  • The backtraces produced by this crate may be better than those produced by DHAT.

Configuration

In your Cargo.toml file, as well as specifying dhat as a dependency, you should enable source line debug info:

[profile.release]
debug = 1

Usage (heap profiling)

For heap profiling, enable the global allocator by adding this code to your program:

use dhat::{Dhat, DhatAlloc};

#[global_allocator]
static ALLOCATOR: DhatAlloc = DhatAlloc;

Then add the following code to the very start of your main function:

let _dhat = Dhat::start_heap_profiling();

DhatAlloc is slower than the system allocator, so it should only be enabled while profiling.

Usage (ad hoc profiling)

Ad hoc profiling involves manually annotating hot code points and then aggregating the executed annotations in some fashion.

To do this, add the following code to the very start of your main function:

let _dhat = Dhat::start_ad_hoc_profiling();

Then insert calls like this at points of interest:

dhat::ad_hoc_event(100);

For example, imagine you have a hot function that is called from many call sites. You might want to know how often it is called and which other functions called it the most. In that case, you would add an ad_hoc_event call to that function, and the data collected by this crate and viewed with DHAT's viewer would show you exactly what you want to know.

The meaning of the integer argument to ad_hoc_event will depend on exactly what you are measuring. If there is no meaningful weight to give to an event, you can just use 1.
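The aggregation described above can be pictured with a small standard-library sketch. It is an illustration under stated assumptions, not how this crate works internally: AdHocStats, event, hot, caller_a, caller_b and run_demo are hypothetical names, and events are keyed by the caller's source location (via #[track_caller]) rather than by a full backtrace.

```rust
use std::collections::HashMap;
use std::panic::Location;

/// Hypothetical aggregator: per call site, (number of events, total weight).
#[derive(Default)]
struct AdHocStats {
    by_site: HashMap<String, (u64, u64)>,
}

impl AdHocStats {
    /// Stand-in for an ad hoc event: adds `weight` to the caller's entry.
    #[track_caller]
    fn event(&mut self, weight: u64) {
        let loc = Location::caller();
        let key = format!("{}:{}", loc.file(), loc.line());
        let entry = self.by_site.entry(key).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += weight;
    }
}

/// A "hot" function. Because it is also `#[track_caller]`, the event
/// inside it is attributed to whoever called `hot`.
#[track_caller]
fn hot(stats: &mut AdHocStats, input: &[u8]) {
    // Weight the event by the amount of work, here the input length.
    stats.event(input.len() as u64);
}

fn caller_a(stats: &mut AdHocStats) {
    for _ in 0..3 {
        hot(stats, b"abcd");
    }
}

fn caller_b(stats: &mut AdHocStats) {
    hot(stats, b"0123456789");
}

/// Runs both callers and returns the aggregated statistics.
fn run_demo() -> AdHocStats {
    let mut stats = AdHocStats::default();
    caller_a(&mut stats);
    caller_b(&mut stats);
    stats
}
```

After run_demo, the map holds two entries: the call site in caller_a with 3 events and a total weight of 12, and the call site in caller_b with 1 event and a total weight of 10. Passing 1 as the weight instead would turn the totals into plain call counts.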

Running

For both heap profiling and ad hoc profiling, the program will run normally. When the Dhat value is dropped at the end of main, some basic information will be printed to stderr, like so:

dhat: Total:     1,256 bytes in 6 blocks
dhat: At t-gmax: 1,256 bytes in 6 blocks
dhat: At t-end:  1,256 bytes in 6 blocks
dhat: The data in dhat-heap.json is viewable with dhat/dh_view.html

A file called dhat-heap.json (for heap profiling) or dhat-ad-hoc.json (for ad hoc profiling) will be written. It can be viewed in DHAT's viewer.
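Because the output is produced when the Dhat value is dropped, the way that value is bound matters. The sketch below uses a hypothetical Guard type (not this crate's Dhat) to show a general Rust rule: a value bound to a name such as _dhat lives until the end of its scope, while a value bound to the bare pattern _ is dropped immediately.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical stand-in for a profiling guard; not dhat's Dhat type.
struct Guard {
    log: Log,
}

impl Guard {
    fn start(log: &Log) -> Guard {
        log.borrow_mut().push("start");
        Guard { log: Rc::clone(log) }
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // A real profiler would print its summary and write its file here.
        self.log.borrow_mut().push("stop");
    }
}

/// Binds the guard to a name: it is dropped at the end of the block.
fn named_binding() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _guard = Guard::start(&log);
        log.borrow_mut().push("work");
    }
    let out = log.borrow().clone();
    out
}

/// Binds the guard to `_`: it is dropped at the end of the statement.
fn underscore_binding() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _ = Guard::start(&log);
        log.borrow_mut().push("work");
    }
    let out = log.borrow().clone();
    out
}
```

Here named_binding yields start, work, stop, whereas underscore_binding yields start, stop, work. This is why the usage snippets bind the profiler to _dhat rather than to _.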

Viewing

Open DHAT's viewer (dhat/dh_view.html) in a web browser, and click on the "Load…" button to load dhat-heap.json or dhat-ad-hoc.json. Note that you must be using DHAT from Valgrind 3.17 or later. At the time of writing, Valgrind 3.17 is unreleased and must be obtained from the Valgrind repository.

DHAT's viewer shows a tree with nodes that look like this.

PP 1.1/6 {
  Total:     1,024 bytes (81.53%, 3,335,504.89/s) in 1 blocks (16.67%, 3,257.33/s), avg size 1,024 bytes, avg lifetime 61 µs (19.87% of program duration)
  Max:       1,024 bytes in 1 blocks, avg size 1,024 bytes
  At t-gmax: 1,024 bytes (81.53%) in 1 blocks (16.67%), avg size 1,024 bytes
  At t-end:  1,024 bytes (81.53%) in 1 blocks (16.67%), avg size 1,024 bytes
  Allocated at {
    #1: 0x10c1e4108: <alloc::alloc::Global as core::alloc::AllocRef>::alloc (alloc.rs:203:9)
    #2: 0x10c1e4108: alloc::raw_vec::RawVec<T,A>::allocate_in (raw_vec.rs:186:45)
    #3: 0x10c1e4108: alloc::raw_vec::RawVec<T,A>::with_capacity_in (raw_vec.rs:161:9)
    #4: 0x10c1e4108: alloc::raw_vec::RawVec<T>::with_capacity (raw_vec.rs:92:9)
    #5: 0x10c1e4108: alloc::vec::Vec<T>::with_capacity (vec.rs:355:20)
    #6: 0x10c1e4108: std::io::buffered::BufWriter<W>::with_capacity (buffered.rs:517:46)
    #7: 0x10c1e4108: std::io::buffered::LineWriter<W>::with_capacity (buffered.rs:925:29)
    #8: 0x10c1e4108: std::io::buffered::LineWriter<W>::new (buffered.rs:905:9)
    #9: 0x10c1e4108: std::io::stdio::stdout::stdout_init (stdio.rs:543:65)
    #10: 0x10c1e4108: std::io::lazy::Lazy<T>::init (lazy.rs:57:19)
    #11: 0x10c1e4108: std::io::lazy::Lazy<T>::get (lazy.rs:33:18)
    #12: 0x10c1e4108: std::io::stdio::stdout (stdio.rs:536:25)
    #13: 0x10c1e4ccb: std::io::stdio::print_to::{{closure}} (stdio.rs:890:13)
    #14: 0x10c1e4ccb: std::thread::local::LocalKey<T>::try_with (local.rs:265:16)
    #15: 0x10c1e4ccb: std::io::stdio::print_to (stdio.rs:879:18)
    #16: 0x10c1e4ccb: std::io::stdio::_print (stdio.rs:907:5)
    #17: 0x10c0d6826: heap::main (heap.rs:9:5)
    #18: 0x10c0d6a3e: core::ops::function::FnOnce::call_once (function.rs:227:5)
    #19: 0x10c0d65e1: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:137:18)
    #20: 0x10c0d6674: std::rt::lang_start::{{closure}} (rt.rs:66:18)
    #21: 0x10c1ea1f0: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once (function.rs:259:13)
    #22: 0x10c1ea1f0: std::panicking::try::do_call (panicking.rs:373:40)
    #23: 0x10c1ea1f0: std::panicking::try (panicking.rs:337:19)
    #24: 0x10c1ea1f0: std::panic::catch_unwind (panic.rs:379:14)
    #25: 0x10c1ea1f0: std::rt::lang_start_internal (rt.rs:51:25)
    #26: 0x10c0d6651: std::rt::lang_start (rt.rs:65:5)
    #27: 0x10c0d69c2: _main (???:0:0)
  }
}
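The percentages and rates in the "Total" line appear to be simple ratios. The sketch below is a reading of the sample figures, not a description of the viewer's code. It assumes the program-wide totals are the 1,256 bytes in 6 blocks from the stderr summary shown earlier, and the function names are hypothetical.

```rust
/// Share of `part` in `whole`, as a percentage.
fn percent(part: f64, whole: f64) -> f64 {
    100.0 * part / whole
}

/// Formats this node's share of the program's bytes and blocks.
fn node_shares() -> (String, String) {
    // Program-wide totals, from the stderr summary: 1,256 bytes in 6 blocks.
    let (total_bytes, total_blocks) = (1256.0, 6.0);
    // This node's totals: 1,024 bytes in 1 block.
    let (node_bytes, node_blocks) = (1024.0, 1.0);
    (
        format!("{:.2}%", percent(node_bytes, total_bytes)),
        format!("{:.2}%", percent(node_blocks, total_blocks)),
    )
}

/// Program duration implied by the byte rate, in microseconds:
/// bytes / (bytes per second) = seconds.
fn implied_duration_micros() -> f64 {
    1024.0 / 3_335_504.89 * 1e6
}
```

With these inputs, node_shares returns "81.53%" and "16.67%", matching the sample node, and implied_duration_micros comes out at roughly 307 µs. The block rate gives about the same duration (1 block divided by 3,257.33 blocks/s).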

Full details about the output are in the DHAT documentation.

Note that DHAT uses the word "block" rather than "allocation" to refer to the memory allocated by a single heap allocation operation.

When heap profiling, this crate doesn't track memory accesses (unlike DHAT) and so the "reads" and "writes" measurements are not shown within DHAT's viewer, and "sort metric" views involving reads, writes, or accesses are not available.

The backtraces produced by this crate are trimmed at the front to reduce output file sizes and improve readability in DHAT's viewer. Only one allocation-related frame will be shown at the top of the backtrace. That frame may be a function within alloc::alloc, a function within this crate, or a global allocation function like __rg_alloc.

Structs

Dhat

A type whose scope dictates the start and end of profiling.

DhatAlloc

A global allocator that tracks allocations and deallocations on behalf of the Dhat type.

Functions

ad_hoc_event

Register an event during ad hoc profiling. Has no effect unless a Dhat value that was created with Dhat::start_ad_hoc_profiling is in scope. The meaning of the weight argument is determined by the user.