Module profilers

Backend profilers for all compute modalities. Wraps ncu, nsys, CUPTI, perf stat, wgpu timestamps, wasmtime, and more.
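Because these backends wrap external command-line tools, a caller usually wants to know whether a tool is installed before trying to profile with it. The following is a minimal sketch of such a probe using only the standard library. `ToolStatus` and `probe_tool` are hypothetical names chosen for illustration and are not part of this module's API; the version flag is passed in by the caller because each tool may spell it differently.

```rust
use std::io::ErrorKind;
use std::process::Command;

/// Outcome of probing for an external profiling tool.
/// Hypothetical type for illustration; not part of this crate.
#[derive(Debug, PartialEq)]
pub enum ToolStatus {
    /// The binary ran and exited successfully; holds the first line of its stdout.
    Available(String),
    /// The binary could not be found on PATH.
    Missing,
    /// The binary was found but could not be run, or exited with a failure status.
    Failed(String),
}

/// Runs `binary version_flag` and classifies the result instead of panicking.
/// Hypothetical helper for illustration; not part of this crate.
pub fn probe_tool(binary: &str, version_flag: &str) -> ToolStatus {
    match Command::new(binary).arg(version_flag).output() {
        Ok(out) if out.status.success() => {
            let text = String::from_utf8_lossy(&out.stdout);
            let first_line = text.lines().next().unwrap_or("").trim().to_string();
            ToolStatus::Available(first_line)
        }
        Ok(out) => ToolStatus::Failed(format!("exit status: {}", out.status)),
        // A missing executable surfaces as an I/O error of kind NotFound.
        Err(e) if e.kind() == ErrorKind::NotFound => ToolStatus::Missing,
        Err(e) => ToolStatus::Failed(e.to_string()),
    }
}
```

A caller could match on the result of `probe_tool("perf", "--version")` and skip the corresponding backend with a readable message when the status is `Missing`, rather than failing the whole run.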

Modules

cuda
CUDA profiler: wraps ncu, nsys, and CUPTI. See spec sections 4.1.1 (ncu), 4.1.2 (nsys), 4.1.3 (CUPTI).
neon
ARM NEON SIMD profiling. Spec section 4.5. Uses perf stat with ARM PMU counters on aarch64 hosts. On x86 hosts, reports a graceful error per FALSIFY-CGP-071.
quant
Quantized kernel profiler (Q4K/Q6K CPU). Spec section 4.7. Profiles trueno’s fused dequantization + GEMV CPU kernels.
rayon_parallel
Rayon parallel profiling. Spec section 4.9. Measures parallel efficiency, work stealing, and load balance (Heijunka score).
scalar
Scalar baseline profiling via criterion + perf stat. Spec section 4.4. Establishes the baseline for all speedup calculations.
simd
CPU SIMD profiling via perf stat + renacer + trueno-explain. Spec section 4.2.
system
System health and VRAM collection via nvidia-smi and /proc. Spec sections 9.8 (VRAM), 9.10 (System Health), 9.11 (Energy).
wasm
WASM SIMD128 profiling via wasmtime. Spec section 4.6. Uses wasmtime’s fuel metering for deterministic instruction counting.
wgpu_profiler
Cross-platform GPU profiling via wgpu timestamp queries. Spec section 4.3. Supports Vulkan, Metal, DX12, and WebGPU.
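
The `scalar` and `rayon_parallel` descriptions above mention speedup and parallel efficiency. As a rough illustration, the sketch below uses the conventional definitions: speedup is the baseline time divided by the candidate time, and parallel efficiency is that speedup divided by the thread count. Whether the crate computes these quantities the same way is an assumption, and the Heijunka score is not modeled here because its formula is not given on this page. The function names are hypothetical.

```rust
use std::time::Duration;

/// Speedup of `candidate` relative to `baseline` (baseline / candidate).
/// Returns None when the candidate time is zero, to avoid dividing by zero.
/// Hypothetical helper for illustration; not part of this crate.
pub fn speedup(baseline: Duration, candidate: Duration) -> Option<f64> {
    let candidate_secs = candidate.as_secs_f64();
    if candidate_secs == 0.0 {
        return None;
    }
    Some(baseline.as_secs_f64() / candidate_secs)
}

/// Parallel efficiency: speedup divided by the number of threads used.
/// A value of 1.0 means perfect linear scaling; lower values indicate overhead.
/// Hypothetical helper for illustration; not part of this crate.
pub fn parallel_efficiency(
    baseline: Duration,
    parallel: Duration,
    threads: usize,
) -> Option<f64> {
    if threads == 0 {
        return None;
    }
    speedup(baseline, parallel).map(|s| s / threads as f64)
}
```

For example, a kernel that takes 8 s in the scalar baseline and 2 s on 8 threads has a speedup of 4 and a parallel efficiency of 0.5 under these definitions.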