High-performance CPU Operator Fusion kernels.
Operator fusion combines multiple mathematical operations into a single loop, so intermediate results are never written to temporary buffers. This avoids extra memory allocations and reduces pressure on RAM bandwidth. The fused linear kernel uses the matrixmultiply crate for BLIS-style GEMM.
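As a rough illustration of the idea, the sketch below contrasts an unfused add-then-ReLU with a fused version. The function names and slice-based signatures are hypothetical and chosen for the example only; they are not this module's API.

```rust
/// Unfused reference (hypothetical helper): two passes over memory
/// and one temporary allocation for the intermediate sum.
fn add_relu_unfused(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    // Pass 1: materialize the intermediate A + B.
    let sum: Vec<f32> = a.iter().zip(b).map(|(x, y)| x + y).collect();
    // Pass 2: read the temporary back and apply ReLU.
    sum.iter().map(|&v| v.max(0.0)).collect()
}

/// Fused version (hypothetical helper): a single pass that computes
/// max(0, a + b) per element, with no intermediate buffer.
fn add_relu_fused(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    a.iter().zip(b).map(|(x, y)| (x + y).max(0.0)).collect()
}
```

Both functions return the same values. The fused one reads each input element once and writes each output element once, which is where the bandwidth saving would come from.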
Functions
- add_relu_forward - Fused element-wise addition and ReLU: f(A, B) = max(0, A + B)
- linear_forward - Fused linear layer (MatMul + bias): y = X @ W + b. Uses BLIS-style cache-tiled GEMM for the matmul portion, then adds the bias in a single pass.
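To make the shape conventions of y = X @ W + b concrete, here is a minimal sketch that assumes row-major buffers with X of shape m x k, W of shape k x n, and b of length n. The name `linear_forward_sketch` and its signature are hypothetical, and a plain triple loop stands in for the BLIS-style GEMM that the real kernel delegates to matrixmultiply.

```rust
/// Hypothetical sketch of a linear layer: y = X @ W + b.
/// All matrices are row-major; `x` is m x k, `w` is k x n, `bias` has n entries.
fn linear_forward_sketch(
    x: &[f32],
    w: &[f32],
    bias: &[f32],
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    assert_eq!(x.len(), m * k, "x must be m x k");
    assert_eq!(w.len(), k * n, "w must be k x n");
    assert_eq!(bias.len(), n, "bias must have n entries");

    let mut y = vec![0.0f32; m * n];

    // Matmul portion. This naive i-p-j loop is only a stand-in for a
    // cache-tiled GEMM; it walks rows of `w` and `y` contiguously.
    for i in 0..m {
        for p in 0..k {
            let xv = x[i * k + p];
            for j in 0..n {
                y[i * n + j] += xv * w[p * n + j];
            }
        }
    }

    // Bias portion: one pass over the output, adding b to every row.
    if n > 0 {
        for row in y.chunks_exact_mut(n) {
            for (out, bv) in row.iter_mut().zip(bias) {
                *out += *bv;
            }
        }
    }
    y
}
```

For example, with X = [[1, 2]], W = [[1, 2, 3], [4, 5, 6]] and b = [0.5, 0.5, 0.5], the result is [9.5, 12.5, 15.5].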