`dsfb-gpu-debug bench-gpu-scale`: R.7 money-table headline benchmark + R.8 bottleneck profiler.
Two distinct entry modes share this subcommand:
- Default mode (R.7) drives the panel-locked scale sweep that produces the headline `reports/money_table.txt`. Each row pairs one GPU dispatch path (Layer A device evidence fabric / Layer B throughput verdict summary / Layer C full audit court) against the same fixture's CPU Layer B baseline, so the speedup column is reproducible.
- `--detail-stage` mode (R.8) skips the money-table sweep and instead runs the per-stage bottleneck profiler at three K=1 scale points (canonical 16×128, 64×512 mid-scale, 256×4096 full-scale). Each scale point gets its own `reports/r8_bottleneck_<grid>_K1.txt` with a table of `(stage, median µs, % of wall)` and the top 3 stages by absolute time.

Honest scope note: K>1 batched per-stage timings need a separate `_timed` batched FFI (deferred). R.8 uses the K=1 percentage breakdown as the proxy for the K=64 row in R.7, because the same kernels run with `blockIdx.z = K` at batched scale.
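The per-stage table that R.8 writes can be sketched as follows. This is a minimal illustration, not the profiler's code: `StageSamples`, `median`, `profile_rows`, and `top_n` are hypothetical names, and the sketch assumes the caller passes the measured wall time in µs as the denominator of the `% of wall` column.

```rust
/// Raw per-stage timing samples in microseconds (hypothetical type).
pub struct StageSamples {
    pub name: &'static str,
    pub samples_us: Vec<f64>,
}

/// Median of a sample set; `None` when no samples were recorded.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut v = samples.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).expect("NaN timing sample"));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 0 { (v[mid - 1] + v[mid]) / 2.0 } else { v[mid] })
}

/// Build `(stage, median µs, % of wall)` rows. `wall_us` is assumed to be a
/// separately measured wall time; stages with no samples are skipped.
pub fn profile_rows(stages: &[StageSamples], wall_us: f64) -> Vec<(&'static str, f64, f64)> {
    stages
        .iter()
        .filter_map(|s| median(&s.samples_us).map(|m| (s.name, m)))
        .map(|(name, m)| {
            let pct = if wall_us > 0.0 { 100.0 * m / wall_us } else { 0.0 };
            (name, m, pct)
        })
        .collect()
}

/// The `n` stages with the largest absolute median time, largest first.
pub fn top_n(rows: &[(&'static str, f64, f64)], n: usize) -> Vec<&'static str> {
    let mut sorted = rows.to_vec();
    sorted.sort_by(|a, b| b.1.partial_cmp(&a.1).expect("NaN median"));
    sorted.into_iter().take(n).map(|(name, _, _)| name).collect()
}
```

Ranking by absolute median rather than by percentage gives the same order here, but keeps the top-3 list meaningful even if the wall denominator is later measured differently.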
R.7 rows (panel-locked):
- Canonical 16×128, K=32: Layer A, Layer B, Layer C CPU, Layer C GPU
- Scale-large 256×4096, K ∈ {1, 16, 64, 128}: CPU Layer B, GPU Layer A, GPU Layer B, and Layer C if feasible. K=128 runs only if the `BatchedGpuWorkspace` allocation succeeds; otherwise the row is marked “not run: alloc refused” and the rest of the sweep continues.
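A minimal sketch of the scale-large K loop, showing how a refused allocation affects only its own row. `RowOutcome`, `run_scale_large`, and `try_alloc_batched_workspace` are hypothetical; the stand-in allocator simply refuses any K above a caller-supplied limit, whereas the real bench attempts a `BatchedGpuWorkspace` allocation.

```rust
/// Outcome of one planned sweep row (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
pub enum RowOutcome {
    Ran,
    NotRun(&'static str),
}

/// Hypothetical stand-in for the batched workspace allocation: it refuses
/// any batch larger than `max_k` instead of touching a real GPU.
pub fn try_alloc_batched_workspace(k: u32, max_k: u32) -> Result<(), &'static str> {
    if k <= max_k { Ok(()) } else { Err("alloc refused") }
}

/// Walk the scale-large K values. A refused allocation marks only that row;
/// the loop keeps going instead of aborting the sweep.
pub fn run_scale_large(max_k: u32) -> Vec<(u32, RowOutcome)> {
    [1u32, 16, 64, 128]
        .iter()
        .map(|&k| {
            let outcome = match try_alloc_batched_workspace(k, max_k) {
                Ok(()) => RowOutcome::Ran,
                Err(_) => RowOutcome::NotRun("not run: alloc refused"),
            };
            (k, outcome)
        })
        .collect()
}
```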
Session-level fields recorded once at the top of the R.7 report:
- `graph_status`: outcome of an opt-in `build_gpu_throughput_graph_or_demote` call at canonical scale, either `captured` or `demoted` with a short reason. The graph itself does not drive the bench rows (the rows go through the pre-existing layer dispatch paths); the status is recorded so the case file's launch-plan provenance can be audited later.
- `graph_plan_hash`: the captured topology's canonical hash, present only when capture succeeds. Reported as 64 hex chars; absent when demoted.
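The two session-level fields could be rendered as sketched below, assuming the canonical hash is a 32-byte digest (which is what 64 hex chars implies). `GraphStatus`, `to_hex`, `header_lines`, and the exact header text are hypothetical; only the field names and the `captured`/`demoted` outcomes come from the description above.

```rust
/// Session-level graph outcome (hypothetical shape). The 32-byte hash is an
/// assumption derived from the documented 64-hex-char width.
pub enum GraphStatus {
    Captured { plan_hash: [u8; 32] },
    Demoted { reason: String },
}

/// Lowercase hex encoding: each byte becomes two characters.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Header lines for the report; `graph_plan_hash` is omitted on demotion.
pub fn header_lines(status: &GraphStatus) -> Vec<String> {
    match status {
        GraphStatus::Captured { plan_hash } => vec![
            "graph_status: captured".to_string(),
            format!("graph_plan_hash: {}", to_hex(plan_hash)),
        ],
        GraphStatus::Demoted { reason } => {
            vec![format!("graph_status: demoted ({})", reason)]
        }
    }
}
```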
Output:
- Console: R.7 prints a `=== R.7 Money Table ===` block per row plus a final summary table; R.8 prints a `=== R.8 Bottleneck Profile ===` block per scale point.
- Files: R.7 writes `reports/money_table.txt`; R.8 writes `reports/r8_bottleneck_<grid>_K<K>.txt` per scale point.
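The mode split and the report file names can be sketched like this. Only the `--detail-stage` flag and the path patterns come from the description; `Mode`, `mode_from_args`, `report_paths`, and rendering `<grid>` as `16x128` are assumptions for illustration.

```rust
/// The two entry modes (hypothetical enum; only the flag name is documented).
#[derive(Debug, PartialEq)]
pub enum Mode {
    MoneyTable,
    DetailStage,
}

/// Pick the mode from the argument list: `--detail-stage` selects R.8.
pub fn mode_from_args(args: &[&str]) -> Mode {
    if args.contains(&"--detail-stage") {
        Mode::DetailStage
    } else {
        Mode::MoneyTable
    }
}

/// Report files for a mode. Rendering `<grid>` as `AxB` is an assumption;
/// the docs only show the placeholder.
pub fn report_paths(mode: &Mode, scale_points: &[(u32, u32)], k: u32) -> Vec<String> {
    match mode {
        Mode::MoneyTable => vec!["reports/money_table.txt".to_string()],
        Mode::DetailStage => scale_points
            .iter()
            .map(|(a, b)| format!("reports/r8_bottleneck_{}x{}_K{}.txt", a, b, k))
            .collect(),
    }
}
```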
Honest reporting: every number printed is measured. Rows that fail to run print n/a in the speedup column and a short reason in the same row. The R doctrine forbids fabricated numbers; this file enforces that by writing measured values only for rows the bench actually completed.
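One way to keep the speedup column honest is to compute a ratio only from two measured timings and otherwise print `n/a` with the reason. This sketch assumes speedup means CPU Layer B time divided by GPU time; `Measured` and `speedup_cell` are hypothetical names and the cell format is illustrative.

```rust
/// Outcome of one money-table row (hypothetical representation).
pub enum Measured {
    Done { cpu_us: f64, gpu_us: f64 },
    Failed { reason: String },
}

/// Speedup cell: a ratio only when both timings were measured and the GPU
/// time is positive; otherwise `n/a` plus a reason. Nothing is estimated.
pub fn speedup_cell(row: &Measured) -> String {
    match row {
        Measured::Done { cpu_us, gpu_us } if *gpu_us > 0.0 => {
            format!("{:.1}x", cpu_us / gpu_us)
        }
        Measured::Done { .. } => "n/a (zero GPU time)".to_string(),
        Measured::Failed { reason } => format!("n/a ({})", reason),
    }
}
```

Making failure a separate variant means there is no numeric field to fill in for a row that did not complete, which is the property the doctrine asks for.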
Functions
- `parse_and_run`: Run R.7 with the user-supplied CLI flags. Supported flags: