Macro avx512_microkernel 

avx512_microkernel!() { /* proc-macro */ }

Generates an AVX-512 microkernel for a row-major C tile.

Layout: A is MR×K packed column-major, B is K×NR packed row-major, C is MR×NR row-major with stride ldc.
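
The indexing implied by this layout can be illustrated with a scalar reference loop. This is only a sketch, not the code the macro generates: the function name `reference_microkernel`, the `f32` element type, and the choice to accumulate into C (rather than overwrite it) are assumptions made for illustration.

```rust
/// Scalar reference for the packed layout (illustrative, not the generated kernel).
/// Assumptions: f32 elements, and the kernel accumulates (C += A * B).
///
/// - `a`: MR x K, packed column-major, so element (i, p) is `a[p * mr + i]`
/// - `b`: K x NR, packed row-major, so element (p, j) is `b[p * nr + j]`
/// - `c`: MR x NR, row-major with row stride `ldc`, so (i, j) is `c[i * ldc + j]`
pub fn reference_microkernel(
    mr: usize,
    nr: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    ldc: usize,
) {
    assert!(a.len() >= mr * k, "A panel too short");
    assert!(b.len() >= k * nr, "B panel too short");
    assert!(ldc >= nr, "ldc must be at least NR");
    assert!(mr == 0 || c.len() >= (mr - 1) * ldc + nr, "C tile too short");

    for p in 0..k {
        for i in 0..mr {
            // One A scalar per (row, K step); the SIMD kernel would broadcast it.
            let a_ip = a[p * mr + i];
            for j in 0..nr {
                // Contiguous in j, which is what makes a row-major C tile
                // map onto whole vector registers.
                c[i * ldc + j] += a_ip * b[p * nr + j];
            }
        }
    }
}
```

Because consecutive `j` values are contiguous in both B and C, each inner loop corresponds to a run of vector loads and FMAs along one C row.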

Strategy for row-major C (MR rows, NR columns):

  • Each C row spans ceil(NR/16) zmm registers (16 lanes per register)
  • Total accumulators = MR * ceil(NR/16)
  • Per K step: load ceil(NR/16) zmm vectors of B, broadcast MR scalars of A, and issue MR * ceil(NR/16) FMAs
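
The counts in the list above can be written out as a small helper. This is a sketch for reasoning about tile shapes; `StepCost`, `step_cost`, and `LANES` are hypothetical names, not part of the macro's API.

```rust
/// Lanes per zmm register, taken from the ceil(NR/16) terms above.
pub const LANES: usize = 16;

/// Per-tile register and instruction counts (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct StepCost {
    /// zmm registers needed to hold one row of C: ceil(NR/16).
    pub zmm_per_row: usize,
    /// Accumulator registers for the whole tile: MR * ceil(NR/16).
    pub accumulators: usize,
    /// B vector loads per K step: ceil(NR/16).
    pub b_loads: usize,
    /// A scalar broadcasts per K step: MR.
    pub a_broadcasts: usize,
    /// FMAs per K step: MR * ceil(NR/16).
    pub fmas: usize,
}

/// Compute the counts for an MR x NR tile.
pub fn step_cost(mr: usize, nr: usize) -> StepCost {
    // Ceiling division without relying on newer std helpers.
    let zmm_per_row = (nr + LANES - 1) / LANES;
    StepCost {
        zmm_per_row,
        accumulators: mr * zmm_per_row,
        b_loads: zmm_per_row,
        a_broadcasts: mr,
        fmas: mr * zmm_per_row,
    }
}
```

For example, a 6×32 tile gives 2 zmm per row, 12 accumulators, and 2 B loads, 6 broadcasts, and 12 FMAs per K step.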

Register budget check (C-CODEGEN-004): accumulators + B loads + headroom <= 32 zmm
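
A minimal sketch of how such a check could be expressed follows. The source does not state how much headroom is reserved, so it is a caller-supplied parameter here; `fits_register_budget` and `ZMM_REGISTERS` are hypothetical names, and the real check may differ in detail.

```rust
/// Architectural zmm register count on AVX-512 in 64-bit mode.
pub const ZMM_REGISTERS: usize = 32;

/// Check accumulators + B loads + headroom <= 32 for an MR x NR tile.
/// `headroom` is an assumption: the documentation does not give its value.
/// Returns the number of registers used on success.
pub fn fits_register_budget(mr: usize, nr: usize, headroom: usize) -> Result<usize, String> {
    let zmm_per_row = (nr + 15) / 16; // ceil(NR/16)
    let accumulators = mr * zmm_per_row;
    let b_loads = zmm_per_row;
    let used = accumulators + b_loads + headroom;
    if used <= ZMM_REGISTERS {
        Ok(used)
    } else {
        Err(format!(
            "MR={mr}, NR={nr}: {accumulators} accumulators + {b_loads} B loads \
             + {headroom} headroom = {used} zmm, exceeds {ZMM_REGISTERS}"
        ))
    }
}
```

With an assumed headroom of 2, a 6×32 tile uses 16 registers and a 14×32 tile uses exactly 32, while an 8×64 tile would need 38 and is rejected.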