GPU program fusion: collapses multiple sequential vyre Program dispatches
into a single fused program that executes in one GPU dispatch.
keyhog currently dispatches the AC literal-set program, decode programs,
and MoE scoring programs sequentially. vyre’s fuse_programs /
fuse_programs_vec merge compatible programs into one fused Program,
eliminating per-dispatch overhead (encoder record, submit, poll) and
enabling cross-program data reuse on-chip.
§Design
At scanner compile time, this module attempts to fuse the AC literal-set
program with any active decode programs into a single vyre::Program.
The fused program is cached alongside individual programs in the same
on-disk cache directory (~/.cache/keyhog/programs/), keyed by a
SHA-256 of the constituent program IR hashes.
If fusion fails (incompatible buffer layouts, over-dispatch geometry, self-aliasing), the module logs the failure and the scanner falls back to sequential dispatch. This is a pure optimization - correctness is never compromised.
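The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `Program`, `FusionError`, `DispatchPlan`, `try_fuse`, and `plan_dispatch` are all hypothetical names, and `try_fuse` only stands in for vyre's fusion entry points, whose real signatures are not shown here.

```rust
/// Hypothetical stand-in for `vyre::Program`.
#[derive(Debug, Clone, PartialEq)]
pub struct Program {
    pub name: String,
}

/// Hypothetical failure reasons, mirroring the ones listed in the text.
#[derive(Debug, PartialEq)]
pub enum FusionError {
    IncompatibleBufferLayouts,
    OverDispatchGeometry,
    SelfAliasing,
}

/// What the dispatch path ends up executing.
#[derive(Debug, PartialEq)]
pub enum DispatchPlan {
    Fused(Program),
    Sequential(Vec<Program>),
}

/// Toy fusion step (assumption): rejects a program that appears twice,
/// as a placeholder for the real compatibility checks.
pub fn try_fuse(programs: &[Program]) -> Result<Program, FusionError> {
    for (i, p) in programs.iter().enumerate() {
        if programs[..i].contains(p) {
            return Err(FusionError::SelfAliasing);
        }
    }
    let name = programs
        .iter()
        .map(|p| p.name.as_str())
        .collect::<Vec<_>>()
        .join("+");
    Ok(Program { name })
}

/// Prefer the fused program; on failure, log and keep the original
/// programs for sequential dispatch so scan results are unaffected.
pub fn plan_dispatch(programs: Vec<Program>) -> DispatchPlan {
    match try_fuse(&programs) {
        Ok(fused) => DispatchPlan::Fused(fused),
        Err(err) => {
            eprintln!("program fusion failed ({err:?}); using sequential dispatch");
            DispatchPlan::Sequential(programs)
        }
    }
}
```

The point of the shape is that a fusion error never escapes to the caller: it is logged and converted into the sequential plan.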
§Usage
The fused program is lazily initialized via OnceLock on first access.
CompiledScanner::fused_program() returns Option<&vyre::Program>.
The dispatch path in gpu_dispatch.rs checks for the fused program
first and uses it in preference to sequential individual dispatches.
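As a rough sketch of the access pattern described above, the lazy initialization and the dispatch-side check might look like this. Everything except `OnceLock` and the `fused_program()` return shape is an assumption: the `Program` stand-in, the `fusable` flag, the constructor, and the `dispatch` method are hypothetical and exist only to make the example self-contained.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for `vyre::Program`.
#[derive(Debug, PartialEq)]
pub struct Program {
    pub label: &'static str,
}

pub struct CompiledScanner {
    /// Toy switch for this sketch only: whether fusion would succeed.
    fusable: bool,
    /// `None` inside the lock records a failed fusion attempt, so the
    /// attempt is made at most once.
    fused: OnceLock<Option<Program>>,
}

impl CompiledScanner {
    pub fn new(fusable: bool) -> Self {
        Self { fusable, fused: OnceLock::new() }
    }

    /// Runs the (stand-in) fusion step on first access, then returns the
    /// cached outcome on every later call.
    pub fn fused_program(&self) -> Option<&Program> {
        self.fused
            .get_or_init(|| {
                if self.fusable {
                    Some(Program { label: "fused" })
                } else {
                    None
                }
            })
            .as_ref()
    }

    /// Dispatch-side check: use the fused program when present,
    /// otherwise fall back to sequential dispatch.
    pub fn dispatch(&self) -> &'static str {
        match self.fused_program() {
            Some(_) => "fused",
            None => "sequential",
        }
    }
}
```

Storing `Option<Program>` inside the `OnceLock` means a failed fusion is also cached, so a scanner that cannot fuse does not retry on every dispatch.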