matmul 0.1.0

Fast matrix multiplication in Rust with AVX2/AVX-512 SIMD, built from scratch
//! Cache-blocked GEMM implementations.
//!
//! These functions break the matrix multiplication into tiles that fit in
//! L1/L2 cache, pack the data for sequential access, then call the SIMD
//! microkernels for the inner computation.
//!
//! Available implementations:
//! - `gemm_4x4`: uses the 4×4 AVX2 microkernel
//! - `gemm_12x4`: uses the 12×4 AVX2 microkernel (better throughput than `gemm_4x4`)
//! - `gemm_8x8`: uses the 8×8 AVX-512 microkernel

pub mod gemm_12x4;
pub mod gemm_4x4;
pub mod gemm_8x8;
pub mod simple_simd;
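
// Illustrative sketch only, not this crate's implementation. It shows the
// tile / pack / microkernel structure described in the module docs above.
// Assumptions (none of these come from the crate itself): row-major `f32`
// matrices, the hypothetical tile sizes `MC`, `KC`, `NC`, the hypothetical
// name `blocked_gemm_sketch`, and a scalar inner loop standing in for the
// AVX2/AVX-512 microkernels. It computes C += A * B.
const MC: usize = 64; // rows of A per tile (assumed value)
const KC: usize = 64; // shared dimension per tile (assumed value)
const NC: usize = 64; // columns of B per tile (assumed value)

#[allow(dead_code)]
fn blocked_gemm_sketch(m: usize, n: usize, k: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    // Reusable packing buffers, sized for the largest possible tile.
    let mut a_pack = vec![0.0f32; MC * KC];
    let mut b_pack = vec![0.0f32; KC * NC];

    for jc in (0..n).step_by(NC) {
        let nc = NC.min(n - jc);
        for pc in (0..k).step_by(KC) {
            let kc = KC.min(k - pc);

            // Pack the kc x nc block of B into contiguous rows.
            for p in 0..kc {
                let src = (pc + p) * n + jc;
                b_pack[p * nc..(p + 1) * nc].copy_from_slice(&b[src..src + nc]);
            }

            for ic in (0..m).step_by(MC) {
                let mc = MC.min(m - ic);

                // Pack the mc x kc block of A into contiguous rows.
                for i in 0..mc {
                    let src = (ic + i) * k + pc;
                    a_pack[i * kc..(i + 1) * kc].copy_from_slice(&a[src..src + kc]);
                }

                // Scalar stand-in for the SIMD microkernel: accumulate the
                // packed A tile times the packed B tile into the C tile.
                for i in 0..mc {
                    let c_start = (ic + i) * n + jc;
                    let c_row = &mut c[c_start..c_start + nc];
                    for p in 0..kc {
                        let a_ip = a_pack[i * kc + p];
                        let b_row = &b_pack[p * nc..(p + 1) * nc];
                        for (c_ij, b_pj) in c_row.iter_mut().zip(b_row) {
                            *c_ij += a_ip * *b_pj;
                        }
                    }
                }
            }
        }
    }
}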