Host-side NVTX profiling annotations for zer, consumed by nsys.
Provides macros that wrap a block with RAII NVTX ranges visible in the
Nsight Systems (nsys) timeline:
| Macro | NVTX name | Active when | Use for |
|---|---|---|---|
| trace! | {name} | any feature | CPU and GPU host regions |
| trace_cuda! | "CUDA: {name}" | cuda feature only | CUDA kernel dispatch sites |
| trace_vulkan! | "VULKAN: {shader}" | vulkan feature only | Vulkan shader dispatch sites |
trace_cuda! lets ncu filter to CUDA-specific regions:
ncu --nvtx --nvtx-include "regex:^CUDA:.*" ./your_binary
trace_vulkan! lets ncu filter to Vulkan shader regions:
ncu --nvtx --nvtx-include "regex:^VULKAN:.*" ./your_binary
All three macros are zero-cost no-ops when no feature is compiled in.
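To illustrate what wrapping a block in an RAII range means, here is a minimal sketch. It is not the zer_prof implementation. RangeGuard, sketch_trace!, sketch_trace_cuda!, and take_events are hypothetical names, and an in-memory event log stands in for the real NVTX push and pop calls so the example runs without a GPU or the NVTX library.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for the profiler: records events instead of calling NVTX.
    static EVENTS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Hypothetical RAII guard: opens a range on creation, closes it on drop.
pub struct RangeGuard;

impl RangeGuard {
    pub fn new(name: &str) -> Self {
        // A real guard would push an NVTX range here.
        EVENTS.with(|e| e.borrow_mut().push(format!("push {}", name)));
        RangeGuard
    }
}

impl Drop for RangeGuard {
    fn drop(&mut self) {
        // A real guard would pop the NVTX range here.
        EVENTS.with(|e| e.borrow_mut().push("pop".to_string()));
    }
}

/// Hypothetical macro: runs the block inside a range and yields its value.
macro_rules! sketch_trace {
    ($name:expr, $body:block) => {{
        let _guard = RangeGuard::new($name);
        $body
    }};
}

/// Same idea, with the "CUDA: " prefix that an NVTX name filter can match.
macro_rules! sketch_trace_cuda {
    ($name:expr, $body:block) => {{
        let _guard = RangeGuard::new(&format!("CUDA: {}", $name));
        $body
    }};
}

/// Returns the recorded events and clears the log.
pub fn take_events() -> Vec<String> {
    EVENTS.with(|e| std::mem::take(&mut *e.borrow_mut()))
}
```

Because the guard is dropped when the wrapping block ends, the range also closes on an early return or a panic inside the body. That is the usual reason to prefer a guard over paired push and pop calls.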
§Feature flags
| Feature | Effect |
|---|---|
| nvtx | Activates NVTX standalone, without any compute backend |
| cuda | Activates NVTX; trace_cuda! active; trace_vulkan! is a no-op |
| vulkan | Activates NVTX; trace_vulkan! active; trace_cuda! is a no-op |
| avx2 | Activates NVTX; trace_cuda! and trace_vulkan! are no-ops |
| cpu | Activates NVTX; trace_cuda! and trace_vulkan! are no-ops |
| (none) | All macros expand to bare blocks, zero overhead, no link dep |
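One common way to get the "expands to a bare block" behaviour is to define the same macro twice behind opposite cfg conditions. The sketch below assumes that approach and may not match how zer_prof is written. gated_trace_cuda! and sum_squares are hypothetical names, and println! stands in for opening and closing a real range.

```rust
// Feature enabled: wrap the block in a (placeholder) range.
#[cfg(feature = "cuda")]
macro_rules! gated_trace_cuda {
    ($name:expr, $body:block) => {{
        // Placeholder for a real range guard; prints instead of calling NVTX.
        println!("range open: CUDA: {}", $name);
        let out = $body;
        println!("range close");
        out
    }};
}

// Feature disabled: expand to the bare block. The name expression is
// discarded at expansion time, so nothing extra is compiled in.
#[cfg(not(feature = "cuda"))]
macro_rules! gated_trace_cuda {
    ($name:expr, $body:block) => {
        $body
    };
}

/// Call sites look the same whichever definition is compiled in.
pub fn sum_squares(xs: &[i32]) -> i32 {
    gated_trace_cuda!("sum_squares", { xs.iter().map(|x| x * x).sum() })
}
```

With this layout the call site never needs its own cfg attribute, and a build without the feature has no reference to the profiling library.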
§Usage
zer_prof::init(); // call once at the start of main()

// Host-side region, visible in nsys timeline for all backends.
let vectors = zer_prof::trace!("compare_batch", {
    comparator.compare_batch(&pairs, &schema)
});

// CUDA kernel dispatch, filtered by ncu --nvtx-include "regex:^CUDA:.*".
let out = zer_prof::trace_cuda!("em_reduce_mstep", {
    backend.run::<EmReduce>(input)
})?;
// Vulkan shader dispatch, filtered by ncu --nvtx-include "regex:^VULKAN:.*".
let out = zer_prof::trace_vulkan!("compare_fields", {
    backend.run::<CompareFields>(input)
})?;

Macros§
Functions§
- init: Initialise profiling state.