Crate zer_prof

Host-side NVTX profiling annotations for zer, consumed by nsys.

Provides macros that wrap a block with RAII NVTX ranges visible in the Nsight Systems (nsys) timeline:

| Macro | NVTX name | Active when | Use for |
|---|---|---|---|
| trace! | {name} | any feature | CPU and GPU host regions |
| trace_cuda! | "CUDA: {name}" | cuda feature only | CUDA kernel dispatch sites |
| trace_vulkan! | "VULKAN: {shader}" | vulkan feature only | Vulkan shader dispatch sites |

trace_cuda! lets ncu filter to CUDA-specific regions:

  • ncu --nvtx --nvtx-include "regex:^CUDA:.*" ./your_binary

trace_vulkan! lets ncu filter to Vulkan shader regions:

  • ncu --nvtx --nvtx-include "regex:^VULKAN:.*" ./your_binary

All three macros are zero-cost no-ops when no feature is compiled in. trace_cuda! and trace_vulkan! are also no-ops whenever their own feature is disabled.
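
The RAII wrapping described above can be illustrated with a minimal sketch. This is not the crate's actual implementation: `RangeGuard`, `trace_sketch!`, `take_events`, and `fallible` are hypothetical names, and the guard records events in a thread-local list where a real one would presumably call the NVTX C functions `nvtxRangePushA` and `nvtxRangePop`.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for the profiler: records push/pop events instead of calling NVTX.
    static EVENTS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Hypothetical RAII guard: opens a range on construction, closes it on drop.
/// A real guard would call nvtxRangePushA in `new` and nvtxRangePop in `drop`.
pub struct RangeGuard;

impl RangeGuard {
    pub fn new(name: &str) -> Self {
        EVENTS.with(|e| e.borrow_mut().push(format!("push {name}")));
        RangeGuard
    }
}

impl Drop for RangeGuard {
    fn drop(&mut self) {
        EVENTS.with(|e| e.borrow_mut().push("pop".to_string()));
    }
}

/// Hypothetical stand-in for `trace!`: the guard lives exactly as long as the block.
macro_rules! trace_sketch {
    ($name:expr, $body:block) => {{
        let _guard = RangeGuard::new($name);
        $body
    }};
}

/// Returns the recorded events and clears the log.
pub fn take_events() -> Vec<String> {
    EVENTS.with(|e| std::mem::take(&mut *e.borrow_mut()))
}

/// The range still closes when the block exits early with an error.
pub fn fallible(fail: bool) -> Result<u32, String> {
    trace_sketch!("fallible", {
        if fail {
            return Err("boom".to_string());
        }
        Ok(7)
    })
}
```

Because the guard is dropped when the enclosing block ends, the range closes on every exit path, including an early `return` or a `?` inside the block, so no explicit "end range" call is needed at the call site.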

§Feature flags

| Feature | Effect |
|---|---|
| nvtx | Activates NVTX standalone, without any compute backend |
| cuda | Activates NVTX; trace_cuda! active; trace_vulkan! is a no-op |
| vulkan | Activates NVTX; trace_vulkan! active; trace_cuda! is a no-op |
| avx2 | Activates NVTX; trace_cuda! and trace_vulkan! are no-ops |
| cpu | Activates NVTX; trace_cuda! and trace_vulkan! are no-ops |
| (none) | All macros expand to bare blocks, zero overhead, no link dep |
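
One plausible way to get this feature-dependent behaviour is to define each macro twice under opposite `cfg` attributes. The sketch below assumes that approach and is not taken from the crate source: `Range`, `trace_cuda_sketch!`, and `dispatch` are hypothetical, and the guard prints where a real one would call into NVTX.

```rust
// Hypothetical guard, compiled only when the `cuda` feature is enabled.
#[cfg(feature = "cuda")]
pub struct Range(String);

#[cfg(feature = "cuda")]
impl Range {
    pub fn new(name: String) -> Self {
        println!("push {name}"); // a real guard would open an NVTX range here
        Range(name)
    }
}

#[cfg(feature = "cuda")]
impl Drop for Range {
    fn drop(&mut self) {
        println!("pop {}", self.0); // a real guard would close the range here
    }
}

// Feature on: prefix the name with "CUDA: " and hold a guard for the block.
#[cfg(feature = "cuda")]
macro_rules! trace_cuda_sketch {
    ($name:expr, $body:block) => {{
        let _guard = Range::new(format!("CUDA: {}", $name));
        $body
    }};
}

// Feature off: the name is discarded and only the bare block remains.
#[cfg(not(feature = "cuda"))]
macro_rules! trace_cuda_sketch {
    ($name:expr, $body:block) => {
        $body
    };
}

/// The macro is an expression in both configurations, so `?` applies to the
/// value the block produces.
pub fn dispatch(input: u32) -> Result<u32, String> {
    let out = trace_cuda_sketch!("em_reduce_mstep", {
        input.checked_mul(2).ok_or_else(|| "overflow".to_string())
    })?;
    Ok(out + 1)
}
```

In the disabled arm the name expression is never evaluated and no profiler symbol is referenced, which is consistent with the "zero overhead, no link dep" row in the table.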

§Usage

zer_prof::init();  // call once at the start of main()

// Host-side region, visible in nsys timeline for all backends.
let vectors = zer_prof::trace!("compare_batch", {
    comparator.compare_batch(&pairs, &schema)
});

// CUDA kernel dispatch, filtered by ncu --nvtx-include "regex:^CUDA:.*".
let out = zer_prof::trace_cuda!("em_reduce_mstep", {
    backend.run::<EmReduce>(input)
})?;

// Vulkan shader dispatch, filtered by ncu --nvtx-include "regex:^VULKAN:.*".
let out = zer_prof::trace_vulkan!("compare_fields", {
    backend.run::<CompareFields>(input)
})?;

§Macros

trace
trace_cuda
trace_vulkan

§Functions

init
Initialise profiling state.