avx512_microkernel!() { /* proc-macro */ }
Generate an AVX-512 row-major C microkernel.
Layout: A is MR×K packed column-major, B is K×NR packed row-major,
C is MR×NR row-major with stride ldc.
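The indexing implied by this layout can be sketched with a scalar reference kernel. This is not the code the macro generates. The function name `reference_microkernel`, the `f32` element type, and the choice to accumulate into C (rather than overwrite it) are assumptions made for illustration only.

```rust
/// Scalar reference for the layout described above (illustrative sketch).
/// Assumptions: f32 elements, and C is accumulated into (C += A * B).
/// - `a`: MR x K, packed column-major, so element (i, p) is a[p * mr + i]
/// - `b`: K x NR, packed row-major, so element (p, j) is b[p * nr + j]
/// - `c`: MR x NR, row-major with row stride `ldc`, so (i, j) is c[i * ldc + j]
pub fn reference_microkernel(
    mr: usize,
    nr: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    ldc: usize,
) {
    assert!(a.len() >= mr * k, "A panel too short");
    assert!(b.len() >= k * nr, "B panel too short");
    assert!(ldc >= nr, "ldc must be at least NR");
    assert!(mr == 0 || c.len() >= (mr - 1) * ldc + nr, "C tile too short");

    for p in 0..k {
        for i in 0..mr {
            // One scalar of A per row; a vector kernel would broadcast this value.
            let a_ip = a[p * mr + i];
            for j in 0..nr {
                // One row of B is contiguous, matching the row-major C row.
                c[i * ldc + j] += a_ip * b[p * nr + j];
            }
        }
    }
}
```

Because the MR values of A for a given K step are adjacent in memory and each row of B is contiguous, the inner loop over `j` walks unit-stride memory in both B and C.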
Strategy for row-major C (MR rows, NR columns):
- Each C row spans ceil(NR/16) zmm registers
- Total accumulators = MR * ceil(NR/16)
- Per K step: load ceil(NR/16) zmm registers of B, broadcast MR scalars of A, and issue MR * ceil(NR/16) FMAs
Register budget check (C-CODEGEN-004): accumulators + B loads + headroom <= 32 zmm
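The counts above can be checked with a few lines of arithmetic. The sketch below assumes `f32` elements (16 lanes per zmm register, which is what the divisor 16 suggests) and treats the headroom as a caller-supplied parameter, since its value is not stated here. The function names are hypothetical and are not part of the macro's API.

```rust
/// Lanes per zmm register, assuming f32 elements (512 bits / 32 bits).
const LANES: usize = 16;
/// AVX-512 exposes 32 zmm registers (zmm0 through zmm31).
const ZMM_COUNT: usize = 32;

/// Number of zmm registers needed to hold one row of NR elements: ceil(NR/16).
pub fn zmm_per_row(nr: usize) -> usize {
    (nr + LANES - 1) / LANES
}

/// Total accumulator registers: MR * ceil(NR/16).
pub fn accumulators(mr: usize, nr: usize) -> usize {
    mr * zmm_per_row(nr)
}

/// Budget check: accumulators + B loads + headroom <= 32 zmm.
/// `headroom` is an assumed parameter (for example, registers reserved for
/// the broadcast A values); the actual value used by the macro is not given.
pub fn fits_register_budget(mr: usize, nr: usize, headroom: usize) -> bool {
    accumulators(mr, nr) + zmm_per_row(nr) + headroom <= ZMM_COUNT
}
```

For example, with a headroom of 2, a 6×64 tile needs 24 accumulators plus 4 B registers and fits, while an 8×64 tile already needs 32 accumulators on its own and does not.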