Module analysis

Modules

baseline: cgp baseline — Save and load performance baselines. Baselines stored in .cgp-baselines/ directory.
bench: cgp bench — Enhanced criterion benchmarking with hardware counters. Spec section 2.3: run cargo bench, capture criterion output, optionally overlay perf stat counters, check regression.
compare: cgp profile compare — Cross-backend comparison. Spec section 2.2: run the same workload across multiple backends and produce a comparison table with TFLOP/s, bandwidth, and speedup ratios.
compete: cgp compete — Head-to-head competitor comparison. Spec section 2.5: run two or more commands, measure wall time, compute TFLOP/s, and produce a comparison table.
contracts: Performance contract verification (CI/CD gate). Extends the provable-contracts framework to performance bounds. See spec sections 3.4 and 7.1.
diff: Profile diff: compare two CGP profiles and detect regressions. Spec section 2.6: cgp diff. Must complete in <100ms for two saved JSONs (FALSIFY-CGP-062).
explain: cgp explain — Static code analysis for PTX, SIMD assembly, and WGSL shaders. Spec section 2.7: wraps trueno-explain or performs inline analysis. Detects register pressure, instruction mix, and common performance pitfalls.
muda: Muda (Waste) Detection Engine — Seven categories of GPU compute waste. Mapped from Toyota Production System (Ohno, 1988) [7].
regression: Regression detection using bootstrap confidence intervals. Methodology from Hoefler & Belli (2015) [8]: “Scientific Benchmarking of Parallel Computing Systems.” Also supports PELT changepoint detection [43].
roofline: Roofline model implementation per Williams, Waterman & Patterson (2009) [4]. Supports hierarchical GPU roofline per Yang et al. (2020) [13]. Uses Empirical Roofline Toolkit (ERT) methodology [6].
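
To illustrate the idea behind the roofline module, here is a minimal sketch of the basic single-level roofline bound from Williams, Waterman & Patterson: attainable throughput is the smaller of peak compute and arithmetic intensity times peak memory bandwidth. The names `Ceilings`, `Bound`, `attainable_gflops`, `ridge_point`, and `classify` are hypothetical and chosen for this example only; they are not taken from the crate's API, and the hierarchical GPU roofline and ERT measurement steps are not shown.

```rust
/// Hardware ceilings for a single-level roofline.
/// Hypothetical type for illustration; not the crate's actual API.
#[derive(Debug, Clone, Copy)]
pub struct Ceilings {
    /// Peak compute throughput in GFLOP/s.
    pub peak_gflops: f64,
    /// Peak memory bandwidth in GB/s.
    pub peak_bandwidth_gbs: f64,
}

/// Which ceiling limits a kernel at a given arithmetic intensity.
#[derive(Debug, PartialEq)]
pub enum Bound {
    Memory,
    Compute,
}

/// Arithmetic intensity in FLOP/byte: work done per byte moved.
pub fn arithmetic_intensity(flops: f64, bytes: f64) -> f64 {
    flops / bytes
}

/// Roofline bound: min(peak compute, intensity * peak bandwidth).
pub fn attainable_gflops(c: Ceilings, intensity: f64) -> f64 {
    (intensity * c.peak_bandwidth_gbs).min(c.peak_gflops)
}

/// Intensity at which the bandwidth slope meets the compute ceiling.
pub fn ridge_point(c: Ceilings) -> f64 {
    c.peak_gflops / c.peak_bandwidth_gbs
}

/// Kernels left of the ridge point are memory-bound, the rest compute-bound.
pub fn classify(c: Ceilings, intensity: f64) -> Bound {
    if intensity < ridge_point(c) {
        Bound::Memory
    } else {
        Bound::Compute
    }
}
```

With made-up ceilings of 1000 GFLOP/s and 100 GB/s, the ridge point sits at 10 FLOP/byte, so a kernel at 2 FLOP/byte is memory-bound and capped at 200 GFLOP/s.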
