Module matmul

High-performance CPU Matrix Multiplication (GEMM) kernel.

This kernel uses the matrixmultiply crate, which implements a BLIS-style macro/microkernel approach with cache-aware blocking, SIMD vectorization (AVX/FMA/SSE2/NEON), and optional multithreading.

The kernel strictly respects memory strides. If a user transposes a tensor (a zero-copy O(1) operation), the kernel reads the original buffer in transposed order and never allocates a duplicate buffer.
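
The stride behaviour described above can be illustrated with a small sketch. This is not the kernel's implementation: `View2`, `matmul_naive`, and `demo` are hypothetical names invented for illustration, and the triple loop stands in for the optimized GEMM purely to show how a stride swap yields a transposed read without copying.

```rust
/// Hypothetical 2D view (not part of this module): a borrowed buffer
/// plus a shape and strides measured in elements.
#[derive(Clone, Copy)]
struct View2<'a> {
    data: &'a [f32],
    rows: usize,
    cols: usize,
    row_stride: usize,
    col_stride: usize,
}

impl<'a> View2<'a> {
    /// Wrap a contiguous row-major buffer.
    fn row_major(data: &'a [f32], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols);
        View2 { data, rows, cols, row_stride: cols, col_stride: 1 }
    }

    /// O(1) transpose: swap shape and strides; the buffer is untouched.
    fn t(self) -> Self {
        View2 {
            data: self.data,
            rows: self.cols,
            cols: self.rows,
            row_stride: self.col_stride,
            col_stride: self.row_stride,
        }
    }

    /// Read element (i, j) through the strides.
    fn at(&self, i: usize, j: usize) -> f32 {
        self.data[i * self.row_stride + j * self.col_stride]
    }
}

/// Naive stride-aware C = A @ B, returned as a row-major Vec.
/// A stand-in for an optimized GEMM, used only to show the indexing.
fn matmul_naive(a: View2, b: View2) -> Vec<f32> {
    assert_eq!(a.cols, b.rows, "inner dimensions must match");
    let mut c = vec![0.0f32; a.rows * b.cols];
    for i in 0..a.rows {
        for j in 0..b.cols {
            let mut acc = 0.0f32;
            for p in 0..a.cols {
                acc += a.at(i, p) * b.at(p, j);
            }
            c[i * b.cols + j] = acc;
        }
    }
    c
}

/// Computes A @ A^T for a 2x3 matrix A, reusing the same buffer for both operands.
fn demo() -> Vec<f32> {
    let a_buf: [f32; 6] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2x3, row-major
    let a = View2::row_major(&a_buf, 2, 3);
    matmul_naive(a, a.t())
}
```

In `demo`, both operands borrow the same six-element buffer; the transposed operand differs only in its shape and strides, so no second copy of the data exists at any point.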

Functions§

matmul_forward: Executes the physical forward pass for 2D Matrix Multiplication: C = A @ B
