Module matmul

High-performance CPU Matrix Multiplication (GEMM) kernel.

This kernel uses the matrixmultiply crate, which implements a BLIS-style macro-kernel/microkernel approach with cache-aware blocking and operand packing, SIMD vectorization (AVX, FMA, SSE2, NEON), and optional multithreading.
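To illustrate the blocking idea only, here is a minimal sketch in plain Rust, assuming contiguous row-major inputs and a single hypothetical tile size `BLOCK`. It is not the matrixmultiply crate's code: the real BLIS-style kernels also pack panels into contiguous scratch buffers, tune separate block sizes per cache level, and dispatch to SIMD microkernels. The names `blocked_gemm` and `BLOCK` are illustrative.

```rust
/// Hypothetical tile size used for all three loop dimensions.
/// Real BLIS-style kernels choose separate, CPU-tuned block sizes.
const BLOCK: usize = 4;

/// Computes C += A * B using loop tiling.
/// A is m x k, B is k x n, C is m x n; all row-major and contiguous.
fn blocked_gemm(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i0 in (0..m).step_by(BLOCK) {
        for p0 in (0..k).step_by(BLOCK) {
            for j0 in (0..n).step_by(BLOCK) {
                // Process one tile; `min` handles ragged edges when a
                // dimension is not a multiple of BLOCK.
                for i in i0..(i0 + BLOCK).min(m) {
                    for p in p0..(p0 + BLOCK).min(k) {
                        let a_ip = a[i * k + p];
                        for j in j0..(j0 + BLOCK).min(n) {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    let (m, k, n) = (5, 6, 7); // deliberately not multiples of BLOCK
    let a: Vec<f32> = (0..m * k).map(|x| x as f32).collect();
    let b: Vec<f32> = (0..k * n).map(|x| (x % 3) as f32).collect();
    let mut c = vec![0.0f32; m * n];
    blocked_gemm(m, k, n, &a, &b, &mut c);

    // Reference result from a plain triple loop.
    let mut expected = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                expected[i * n + j] += a[i * k + p] * b[p * n + j];
            }
        }
    }
    assert_eq!(c, expected);
    println!("blocked result matches the naive triple loop");
}
```

Tiling changes only the order in which memory is visited, so the result matches the naive loop while each tile's working set stays small enough to remain in cache.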

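The kernel honors arbitrary row and column strides. If a user transposes a tensor (a zero-copy, O(1) operation that only swaps its shape and strides), the kernel reads the memory in transposed order directly, without first materializing a contiguous copy of the transposed data. The only scratch memory involved is the small, cache-sized panels that the BLIS-style algorithm packs internally.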
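The following sketch shows why stride-awareness makes transposition free. It assumes a hypothetical `MatView` type (a borrowed buffer plus row and column strides in elements) and a naive `strided_matmul`; neither is part of this crate's API. matrixmultiply's gemm functions likewise take a row stride and a column stride for each operand, which is what lets a transposed view be passed straight through.

```rust
/// Hypothetical read-only 2D view over a flat buffer.
/// Strides are in elements; this sketch assumes they are non-negative.
#[derive(Clone, Copy)]
struct MatView<'a> {
    data: &'a [f32],
    rows: usize,
    cols: usize,
    row_stride: isize,
    col_stride: isize,
}

impl<'a> MatView<'a> {
    /// Reads element (i, j) purely through the strides.
    fn at(&self, i: usize, j: usize) -> f32 {
        let idx = i as isize * self.row_stride + j as isize * self.col_stride;
        self.data[idx as usize]
    }

    /// O(1) transpose: swap shape and strides, share the same buffer.
    fn t(self) -> MatView<'a> {
        MatView {
            data: self.data,
            rows: self.cols,
            cols: self.rows,
            row_stride: self.col_stride,
            col_stride: self.row_stride,
        }
    }
}

/// Naive C = A * B that accesses A and B only via their strides.
/// The output is a fresh row-major buffer of shape (a.rows, b.cols).
fn strided_matmul(a: MatView, b: MatView) -> Vec<f32> {
    assert_eq!(a.cols, b.rows, "inner dimensions must match");
    let mut c = vec![0.0f32; a.rows * b.cols];
    for i in 0..a.rows {
        for j in 0..b.cols {
            let mut acc = 0.0f32;
            for p in 0..a.cols {
                acc += a.at(i, p) * b.at(p, j);
            }
            c[i * b.cols + j] = acc;
        }
    }
    c
}

fn main() {
    // 2x3 matrix stored row-major: [[1, 2, 3], [4, 5, 6]]
    let buf = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let a = MatView { data: &buf, rows: 2, cols: 3, row_stride: 3, col_stride: 1 };
    // A^T is a 3x2 view of the same `buf`; no element is copied.
    let at = a.t();
    let c = strided_matmul(a, at); // A * A^T, shape 2x2
    println!("{:?}", c); // prints [14.0, 32.0, 32.0, 77.0]
}
```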

Functions

matmul_forward
Executes the physical forward pass for 2D Matrix Multiplication: C = A @ B
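The shape contract of C = A @ B can be sketched as follows: A has shape (m, k), B has shape (k, n), the inner dimensions must agree, and the result is a freshly allocated (m, n) matrix. The function `matmul_forward_sketch` and the `MatmulError` type are hypothetical and assume contiguous row-major inputs; the actual signature of matmul_forward is not shown here.

```rust
/// Hypothetical error type for invalid inputs.
#[derive(Debug, PartialEq)]
enum MatmulError {
    InnerDimMismatch { a_cols: usize, b_rows: usize },
    BadBufferLen,
}

/// Sketch of a 2D forward pass, C = A @ B, for contiguous row-major data.
/// Returns the output buffer together with its (m, n) shape.
fn matmul_forward_sketch(
    a: &[f32],
    a_shape: (usize, usize),
    b: &[f32],
    b_shape: (usize, usize),
) -> Result<(Vec<f32>, (usize, usize)), MatmulError> {
    let (m, k) = a_shape;
    let (k2, n) = b_shape;
    if k != k2 {
        return Err(MatmulError::InnerDimMismatch { a_cols: k, b_rows: k2 });
    }
    if a.len() != m * k || b.len() != k * n {
        return Err(MatmulError::BadBufferLen);
    }
    // The output starts at zero (GEMM with alpha = 1, beta = 0).
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    Ok((c, (m, n)))
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]; // shape (2, 3)
    let b = [7.0f32, 8.0, 9.0, 10.0, 11.0, 12.0]; // shape (3, 2)
    let (c, shape) = matmul_forward_sketch(&a, (2, 3), &b, (3, 2)).unwrap();
    println!("{:?} {:?}", shape, c); // prints (2, 2) [58.0, 64.0, 139.0, 154.0]
}
```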