Skip to main content

Module analysis

Module analysis 

Source

Modules§

baseline
cgp baseline — Save and load performance baselines. Baselines stored in .cgp-baselines/ directory.
bench
cgp bench — Enhanced criterion benchmarking with hardware counters. Spec section 2.3: run cargo bench, capture criterion output, optionally overlay perf stat counters, check regression.
compare
cgp profile compare — Cross-backend comparison. Spec section 2.2: run the same workload across multiple backends and produce a comparison table with TFLOP/s, bandwidth, and speedup ratios.
compete
cgp compete — Head-to-head competitor comparison. Spec section 2.5: run two or more commands, measure wall time, compute TFLOP/s, and produce a comparison table.
contracts
Performance contract verification (CI/CD gate). Extends provable-contracts framework to performance bounds. See spec section 3.4 and 7.1.
diff
Profile diff: compare two CGP profiles and detect regressions. Spec section 2.6: cgp diff. Must complete in <100ms for two saved JSONs (FALSIFY-CGP-062).
explain
cgp explain — Static code analysis for PTX, SIMD assembly, and WGSL shaders. Spec section 2.7: wraps trueno-explain or performs inline analysis. Detects register pressure, instruction mix, and common performance pitfalls.
muda
Muda (Waste) Detection Engine — Seven categories of GPU compute waste. Mapped from Toyota Production System (Ohno, 1988) [7].
regression
Regression detection using bootstrap confidence intervals. Methodology from Hoefler & Belli (2015) [8]: “Scientific Benchmarking of Parallel Computing Systems.” Also supports PELT changepoint detection [43].
roofline
Roofline model implementation per Williams, Waterman & Patterson (2009) [4]. Supports hierarchical GPU roofline per Yang et al. (2020) [13]. Uses Empirical Roofline Toolkit (ERT) methodology [6].