Module kernels

SIMD microkernels for the inner loop of matrix multiplication.

These kernels compute small tiles of C += A × B using AVX2 or AVX-512 intrinsics. They’re called by the blocked GEMM implementations after packing the input matrices into cache-friendly layouts.

Available kernels:

  • kernel_4x4: 4×4 tile, AVX2 (4 accumulator registers)
  • kernel_12x4: 12×4 tile, AVX2 (12 accumulator registers, higher throughput than kernel_4x4)
  • kernel_8x8: 8×8 tile, AVX-512 (8 accumulator registers, 64 outputs per iteration)
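
To make the tile update concrete, here is a minimal sketch of a 4×4 microkernel computing C += A × B over packed panels. It is an illustration only, not this module's actual code: the function names (`microkernel_4x4_ref`, `microkernel_4x4_avx2`, `microkernel_4x4`), the `f64` element type, the packed layouts (4 values of A per `k`, 4 values of B per `k`), and the row-major C with stride `ldc` are all assumptions. The AVX2 path uses separate multiply and add rather than FMA, and falls back to the scalar reference when AVX2 is not detected at run time.

```rust
// Hypothetical sketch: names, element type (f64), and packed layouts are
// assumptions, not taken from this crate.

/// Checks the assumed layout: 4 values of A and of B per k step,
/// and a row-major C tile with row stride `ldc`.
fn check_dims(kc: usize, a: &[f64], b: &[f64], c: &[f64], ldc: usize) {
    assert!(a.len() >= 4 * kc && b.len() >= 4 * kc);
    assert!(ldc >= 4 && c.len() >= 3 * ldc + 4);
}

/// Scalar reference: accumulate kc rank-1 updates into a 4x4 tile of C.
fn microkernel_4x4_ref(kc: usize, a: &[f64], b: &[f64], c: &mut [f64], ldc: usize) {
    check_dims(kc, a, b, c, ldc);
    let mut acc = [[0.0f64; 4]; 4];
    for k in 0..kc {
        for i in 0..4 {
            for j in 0..4 {
                // a[4k + i] is A[i][k]; b[4k + j] is B[k][j] in the packed panels.
                acc[i][j] += a[4 * k + i] * b[4 * k + j];
            }
        }
    }
    for i in 0..4 {
        for j in 0..4 {
            c[i * ldc + j] += acc[i][j];
        }
    }
}

/// AVX2 variant: one 256-bit accumulator register per row of the tile.
/// Callers must have validated the slice lengths (see `check_dims`).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn microkernel_4x4_avx2(kc: usize, a: &[f64], b: &[f64], c: &mut [f64], ldc: usize) {
    use std::arch::x86_64::*;
    let mut acc = [_mm256_setzero_pd(); 4];
    for k in 0..kc {
        // Load one packed row of B (4 doubles), then broadcast each A value.
        let b_row = _mm256_loadu_pd(b.as_ptr().add(4 * k));
        for i in 0..4 {
            let a_i = _mm256_set1_pd(a[4 * k + i]);
            acc[i] = _mm256_add_pd(acc[i], _mm256_mul_pd(a_i, b_row));
        }
    }
    for i in 0..4 {
        // Add the accumulated row into the existing row of C.
        let p = c.as_mut_ptr().add(i * ldc);
        _mm256_storeu_pd(p, _mm256_add_pd(_mm256_loadu_pd(p), acc[i]));
    }
}

/// Dispatches to the AVX2 path when available, otherwise to the scalar one.
fn microkernel_4x4(kc: usize, a: &[f64], b: &[f64], c: &mut [f64], ldc: usize) {
    check_dims(kc, a, b, c, ldc);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 was detected and the dimensions were checked.
            unsafe { microkernel_4x4_avx2(kc, a, b, c, ldc) };
            return;
        }
    }
    microkernel_4x4_ref(kc, a, b, c, ldc);
}
```

Under these assumptions, a blocked GEMM driver would call `microkernel_4x4` once per 4×4 tile of C, passing the matching packed panels of A and B and the shared depth `kc`. Holding the whole tile in registers across the `k` loop means C is read and written only once per tile.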

Modules

kernel_4x4
4×4 AVX2 microkernel for matrix multiplication.
kernel_8x8
8×8 AVX-512 microkernel for matrix multiplication.
kernel_12x4
12×4 AVX2 microkernel for matrix multiplication.