Expand description
Backend profilers for all compute modalities. Wraps ncu, nsys, CUPTI, perf stat, wgpu timestamps, wasmtime, and more.
Modules§
- cuda
- CUDA profiler: wraps ncu, nsys, and CUPTI. See spec sections 4.1.1 (ncu), 4.1.2 (nsys), 4.1.3 (CUPTI).
- neon
- ARM NEON SIMD profiling. Spec section 4.5. Uses perf stat with ARM PMU counters on aarch64 hosts. On x86 hosts, reports graceful error per FALSIFY-CGP-071.
- quant
- Quantized kernel profiler (Q4K/Q6K CPU). Spec section 4.7. Profiles trueno’s fused dequantization + GEMV CPU kernels.
- rayon_
parallel - Rayon parallel profiling. Spec section 4.9. Measures parallel efficiency, work stealing, and load balance (Heijunka score).
- scalar
- Scalar baseline profiling via criterion + perf stat. Spec section 4.4. Establishes the baseline for all speedup calculations.
- simd
- CPU SIMD profiling via perf stat + renacer + trueno-explain. Spec section 4.2.
- system
- System health and VRAM collection via nvidia-smi and /proc. Spec sections 9.8 (VRAM), 9.10 (System Health), 9.11 (Energy).
- wasm
- WASM SIMD128 profiling via wasmtime. Spec section 4.6. Uses wasmtime’s fuel metering for deterministic instruction counting.
- wgpu_
profiler - Cross-platform GPU profiling via wgpu timestamp queries. Spec section 4.3. Supports Vulkan, Metal, DX12, and WebGPU.