pub fn build_program_sharded(
    workgroup_size_x: u32,
    opcodes: &[OpcodeHandler],
) -> Program
Build the megakernel IR with a custom workgroup size and optional custom opcodes.
Buffers are declared with concrete with_count(...) sizes so that the
backend readback layer allocates the right static staging size. With
the default count=0, readback returns 4 bytes regardless of how much
the kernel wrote.
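
The sizing pitfall described above can be illustrated with a small, self-contained model. This is only a sketch: BufferDecl, staging_bytes, and the 4-byte element size are hypothetical stand-ins invented for illustration, not this crate's actual types or readback logic. Only the name with_count and the "count=0 reads back 4 bytes" behavior come from the description above.

```rust
/// Hypothetical stand-in for a buffer declaration (NOT the crate's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BufferDecl {
    /// Declared element count; 0 models "no count given".
    count: u32,
}

/// Assumption for this sketch: every element is 4 bytes wide (e.g. a u32).
const ELEM_BYTES: usize = 4;

impl BufferDecl {
    /// A declaration with the default count of 0.
    pub fn new() -> Self {
        BufferDecl { count: 0 }
    }

    /// Builder-style setter mirroring the with_count(...) call named above.
    pub fn with_count(mut self, count: u32) -> Self {
        self.count = count;
        self
    }

    /// Size of the static staging area in this model. It depends only on
    /// the declared count, never on how much the kernel wrote, and a count
    /// of 0 falls back to a single element (4 bytes).
    pub fn staging_bytes(&self) -> usize {
        (self.count.max(1) as usize) * ELEM_BYTES
    }
}
```

In this model, a default declaration yields 4 bytes of readback even if the kernel wrote far more, while BufferDecl::new().with_count(256) yields 1024 bytes, which is why declaring a concrete count up front matters.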