General matrix multiplication for f32 and f64 matrices. Operates on matrices with general layout: they can use arbitrary row and column strides.
This crate uses the same macro/microkernel approach to matrix multiplication as the BLIS project.
We presently provide a few good microkernels, portable and for x86-64, and only one operation: the general matrix-matrix multiplication (“gemm”).
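As a point of reference, the operation conventionally called gemm can be written as three nested loops. The sketch below is not the crate's kernel; it is a naive illustration that assumes contiguous row major slices and the usual BLAS-style scaling factors alpha and beta. The function name gemm_reference is hypothetical.

```rust
/// Naive reference for C <- alpha * A * B + beta * C.
/// A is m by k, B is k by n, C is m by n, all contiguous and row major.
/// Hypothetical helper for illustration only; not part of any crate API.
fn gemm_reference(
    m: usize,
    k: usize,
    n: usize,
    alpha: f32,
    a: &[f32],
    b: &[f32],
    beta: f32,
    c: &mut [f32],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of A with column j of B.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

An optimized implementation computes the same result but blocks the loops so that small tiles of the output are handled by a microkernel.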
Matrix Representation
matrixmultiply supports matrices with general stride, so a matrix is passed using a pointer and four integers:
a: *const f32, pointer to the first element in the matrix
m: usize, number of rows
k: usize, number of columns
rsa: isize, row stride
csa: isize, column stride
In this example, A is an m by k matrix. The pointer a refers to the element at index 0, 0.
The row stride is the pointer offset (in number of elements) to the element on the next row. It’s the distance from element i, j to i + 1, j.
The column stride is the pointer offset (in number of elements) to the element in the next column. It’s the distance from element i, j to i, j + 1.
For example, for a contiguous matrix, the row major strides are rsa=k, csa=1 and the column major strides are rsa=1, csa=m.
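The stride rules above can be checked with a small sketch. It stores the same 2 by 3 matrix in row major and column major order and reads each element through the offset i * rs + j * cs. The helpers offset, get and layouts are hypothetical names introduced for illustration; they use slices rather than raw pointers, so they only cover non-negative offsets.

```rust
/// Offset (in elements) from element (0, 0) to element (i, j)
/// for a matrix with row stride `rs` and column stride `cs`.
fn offset(i: usize, j: usize, rs: isize, cs: isize) -> isize {
    i as isize * rs + j as isize * cs
}

/// Read element (i, j) from a slice whose first entry is element (0, 0).
/// Only valid when the offset is non-negative and inside `data`.
fn get(data: &[f32], i: usize, j: usize, rs: isize, cs: isize) -> f32 {
    data[offset(i, j, rs, cs) as usize]
}

/// The 2 by 3 matrix [[1, 2, 3], [4, 5, 6]] in both contiguous layouts.
fn layouts() -> ([f32; 6], [f32; 6]) {
    let row_major = [1., 2., 3., 4., 5., 6.]; // rsa = k = 3, csa = 1
    let col_major = [1., 4., 2., 5., 3., 6.]; // rsa = 1, csa = m = 2
    (row_major, col_major)
}
```

With m = 2 and k = 3, get(&row_major, i, j, 3, 1) and get(&col_major, i, j, 1, 2) return the same value for every valid i, j.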
Strides can be negative or even zero, but the elements of a mutable matrix must not alias each other.
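To illustrate negative and zero strides, the following sketch reads a 2 by 3 row major buffer in two unusual ways: with a negative row stride starting from the last row (which flips the rows), and with a zero row stride (which repeats the first row). Both views are read-only; a zero stride makes rows alias, so it would not be acceptable for a destination matrix. The function names read and flipped_and_broadcast are hypothetical.

```rust
/// Read element (i, j) through a raw pointer to element (0, 0).
///
/// Safety: the computed offset must stay inside the allocation `a` points into.
unsafe fn read(a: *const f32, i: usize, j: usize, rs: isize, cs: isize) -> f32 {
    unsafe { *a.offset(i as isize * rs + j as isize * cs) }
}

/// Returns the row-flipped view and the broadcast view, both flattened row by row.
fn flipped_and_broadcast() -> (Vec<f32>, Vec<f32>) {
    let data = [1.0f32, 2., 3., 4., 5., 6.]; // 2 by 3, row major
    let (m, k) = (2usize, 3usize);

    // Negative row stride: element (0, 0) of the view is the start of the last row.
    let last_row = unsafe { data.as_ptr().add((m - 1) * k) };
    // Zero row stride: every row index maps to the same storage row.
    let first_row = data.as_ptr();

    let mut flipped = Vec::new();
    let mut broadcast = Vec::new();
    for i in 0..m {
        for j in 0..k {
            unsafe {
                flipped.push(read(last_row, i, j, -(k as isize), 1));
                broadcast.push(read(first_row, i, j, 0, 1));
            }
        }
    }
    (flipped, broadcast)
}
```

The flipped view yields the rows in reverse order, and the broadcast view yields the first row twice, without copying any data.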
Portability and Performance

The default kernels are written in portable Rust and available on all targets. These may depend on autovectorization to perform well.

x86 and x86-64 features can be detected at runtime (the default) or at compile time (if enabled), and the following kernel variants are implemented:

fma

avx

sse2
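Runtime detection of this kind can be sketched with the standard library macro is_x86_feature_detected!. This is not the crate's dispatch code: the function select_kernel is hypothetical, and the priority order (fma, then avx, then sse2, then the portable fallback) is an assumption that simply follows the list above.

```rust
/// Pick the name of the most capable kernel family the running CPU supports.
/// Hypothetical helper; the priority order is an assumption.
fn select_kernel() -> &'static str {
    // The detection macro only exists on x86 and x86-64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("fma") {
            return "fma";
        }
        if std::is_x86_feature_detected!("avx") {
            return "avx";
        }
        if std::is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    // All other targets, and x86 CPUs without these features.
    "portable"
}
```

On targets other than x86 and x86-64 the cfg block is compiled out and the function always returns "portable".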
Other Notes
The functions in this crate are thread safe, as long as each concurrent call writes to a distinct destination matrix.
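One way to satisfy this condition is to split the destination into disjoint row blocks and give each thread its own block. The sketch below assumes contiguous row major storage and uses a naive stand-in kernel, multiply_rows, in place of a real gemm call; both function names are hypothetical, and block, k and n are assumed to be nonzero.

```rust
use std::thread;

/// Naive row major kernel standing in for a real gemm call: C <- A * B.
/// A is rows by k, B is k by n, C is rows by n.
fn multiply_rows(rows: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    for i in 0..rows {
        for j in 0..n {
            c[i * n + j] = (0..k).map(|p| a[i * k + p] * b[p * n + j]).sum();
        }
    }
}

/// Compute C = A * B using one thread per block of `block` rows.
fn parallel_multiply(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], block: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; m * n];
    thread::scope(|s| {
        // chunks_mut hands each thread a disjoint slice of C;
        // A and B are only read, so they can be shared freely.
        for (a_rows, c_rows) in a.chunks(block * k).zip(c.chunks_mut(block * n)) {
            s.spawn(move || {
                multiply_rows(c_rows.len() / n, k, n, a_rows, b, c_rows);
            });
        }
    });
    c
}
```

Because every thread writes through its own mutable slice, the borrow checker enforces that the destinations are distinct, while the inputs are shared without synchronization.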