Module kernels

SIMD microkernels for the inner loop of matrix multiplication.

These kernels compute small tiles of C += A × B using AVX2 or AVX-512 intrinsics. They’re called by the blocked GEMM implementations after packing the input matrices into cache-friendly layouts.
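As a rough illustration of what "packing into a cache-friendly layout" can mean, the sketch below copies a 4-row strip of a row-major matrix into a contiguous buffer so that the 4 values needed at each step of the inner loop sit next to each other. The function name `pack_a_strip_4`, the `f64` element type, and the exact layout are assumptions made for this example, not this crate's actual packing routine.

```rust
/// Hypothetical helper (not part of this crate's documented API).
/// Packs rows `row0..row0 + 4` of a row-major matrix `a` with leading
/// dimension `lda` so that `out[p * 4 + i] == a[(row0 + i) * lda + p]`.
pub fn pack_a_strip_4(a: &[f64], lda: usize, row0: usize, k: usize) -> Vec<f64> {
    assert!(k <= lda, "strip is wider than a matrix row");
    assert!((row0 + 4) * lda <= a.len(), "strip runs past the end of the matrix");

    let mut out = Vec::with_capacity(4 * k);
    for p in 0..k {
        // One group of 4 per step p: the p-th column of the strip.
        for i in 0..4 {
            out.push(a[(row0 + i) * lda + p]);
        }
    }
    out
}
```

With this layout a microkernel reads the packed buffer strictly front to back, one group of 4 per step, instead of striding through memory by `lda`.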

Available kernels:

kernel_4x4: 4×4 tile, AVX2 (4 accumulator registers)
kernel_12x4: 12×4 tile, AVX2 (12 accumulator registers, higher throughput than kernel_4x4)
kernel_8x8: 8×8 tile, AVX-512 (8 accumulator registers, 64 outputs per iteration)
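To make the tile update concrete, here is a minimal sketch of a 4×4 `C += A × B` microkernel in the spirit of the 4×4 entry above. It is not this crate's implementation. It assumes `f64` elements, a packed A panel with `a[p * 4 + i] = A[i][p]`, a packed B panel with `b[p * 4 + j] = B[p][j]`, and a row-major C tile with leading dimension `ldc`. All function names are hypothetical. The AVX2 path keeps the whole tile in 4 vector registers (one per row of C) and falls back to a scalar loop when AVX2 or FMA is unavailable.

```rust
/// Scalar reference: C[i][j] += sum over p of A[i][p] * B[p][j] for a 4x4 tile.
pub fn tile_4x4_scalar(k: usize, a: &[f64], b: &[f64], c: &mut [f64], ldc: usize) {
    assert!(a.len() >= 4 * k && b.len() >= 4 * k);
    assert!(ldc >= 4 && c.len() >= 3 * ldc + 4);
    for p in 0..k {
        for i in 0..4 {
            for j in 0..4 {
                c[i * ldc + j] += a[p * 4 + i] * b[p * 4 + j];
            }
        }
    }
}

/// AVX2 + FMA variant. The caller must have checked slice lengths and
/// confirmed that the CPU supports both features.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn tile_4x4_avx2(k: usize, a: &[f64], b: &[f64], c: &mut [f64], ldc: usize) {
    use std::arch::x86_64::*;

    // Four accumulators, one 256-bit register (4 doubles) per row of the tile.
    let mut acc = [_mm256_setzero_pd(); 4];
    for p in 0..k {
        // Row p of the packed B panel: 4 contiguous doubles.
        let b_row = _mm256_loadu_pd(b.as_ptr().add(p * 4));
        for i in 0..4 {
            // Broadcast A[i][p] to all lanes, then acc[i] += a_ip * b_row.
            let a_ip = _mm256_set1_pd(a[p * 4 + i]);
            acc[i] = _mm256_fmadd_pd(a_ip, b_row, acc[i]);
        }
    }
    // Add the accumulated tile into C, one row at a time.
    for i in 0..4 {
        let dst = c.as_mut_ptr().add(i * ldc);
        _mm256_storeu_pd(dst, _mm256_add_pd(_mm256_loadu_pd(dst), acc[i]));
    }
}

/// Dispatches to the AVX2 path when available, else to the scalar reference.
pub fn tile_4x4(k: usize, a: &[f64], b: &[f64], c: &mut [f64], ldc: usize) {
    assert!(a.len() >= 4 * k && b.len() >= 4 * k);
    assert!(ldc >= 4 && c.len() >= 3 * ldc + 4);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // Safety: lengths are asserted above and the features were detected.
            unsafe { tile_4x4_avx2(k, a, b, c, ldc) };
            return;
        }
    }
    tile_4x4_scalar(k, a, b, c, ldc);
}
```

Accumulating in registers and touching C only once at the end is the usual reason a microkernel's tile size is tied to the number of vector registers available. Because the FMA path rounds once per multiply-add, its results can differ from the scalar loop in the last bits for general inputs.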

Modules

kernel_4x4: 4×4 AVX2 microkernel for matrix multiplication.
kernel_8x8: 8×8 AVX-512 microkernel for matrix multiplication.
kernel_12x4: 12×4 AVX2 microkernel for matrix multiplication.