General matrix multiplication for f32 and f64 matrices.
Supports matrices with arbitrary row and column strides.
Uses the same microkernel algorithm as BLIS, but in a much simpler and less featureful implementation. See their multithreading page for a very good diagram of how the algorithm partitions the matrix. (Note: this crate does not implement multithreading.)
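The partitioning mentioned above can be illustrated with a much-simplified sketch of the BLIS loop order: column panels of B, then slices of the shared k dimension, then row panels of A, with an inner update standing in for the microkernel. This is not the crate's code. It skips packing and the register-tiled microkernel entirely, works only on contiguous row-major f64 data, and the block sizes and function names (`MC`, `KC`, `NC`, `blocked_gemm`, `naive_gemm`) are hypothetical, chosen only for illustration.

```rust
// Illustrative block sizes (hypothetical; real implementations tune these
// to the cache hierarchy and to the microkernel's register tile).
const MC: usize = 4;
const KC: usize = 3;
const NC: usize = 5;

/// C += A * B for contiguous row-major matrices (A is m x k, B is k x n,
/// C is m x n), visiting blocks in the BLIS loop order: column panels of B
/// (jc), slices of the shared k dimension (pc), then row panels of A (ic).
fn blocked_gemm(m: usize, k: usize, n: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    for jc in (0..n).step_by(NC) {
        let nc = NC.min(n - jc);
        for pc in (0..k).step_by(KC) {
            let kc = KC.min(k - pc);
            for ic in (0..m).step_by(MC) {
                let mc = MC.min(m - ic);
                // Stand-in for the microkernel: update the mc x nc block of C
                // with an (mc x kc) * (kc x nc) product.
                for i in ic..ic + mc {
                    for p in pc..pc + kc {
                        let aip = a[i * k + p];
                        for j in jc..jc + nc {
                            c[i * n + j] += aip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}

/// Unblocked triple loop, used as a reference.
fn naive_gemm(m: usize, k: usize, n: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] += acc;
        }
    }
}

fn main() {
    // Sizes deliberately not multiples of the block sizes.
    let (m, k, n) = (7, 6, 9);
    let a: Vec<f64> = (0..m * k).map(|x| x as f64).collect();
    let b: Vec<f64> = (0..k * n).map(|x| (x % 5) as f64 - 2.0).collect();
    let mut c_blocked = vec![0.0; m * n];
    let mut c_naive = vec![0.0; m * n];
    blocked_gemm(m, k, n, &a, &b, &mut c_blocked);
    naive_gemm(m, k, n, &a, &b, &mut c_naive);
    assert_eq!(c_blocked, c_naive);
    println!("blocked and naive results agree");
}
```

The point of the blocking is that each KC-deep slice of a row panel of A and of a column panel of B is reused many times while it is still in cache; BLIS additionally copies ("packs") those blocks into contiguous buffers so that the microkernel can stream through them.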
matrixmultiply supports matrices with general stride, so a matrix is passed using a pointer and four integers:
a: *const f32, pointer to the first element in the matrix
m: usize, number of rows
k: usize, number of columns
rsa: isize, row stride
csa: isize, column stride
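To make the pointer-plus-four-integers convention concrete, here is a minimal reference routine that computes C <- alpha * A * B + beta * C with every matrix passed that way. The name `gemm_ref` and its argument order are hypothetical, used only for this sketch, and it is a naive triple loop, not the crate's blocked algorithm. As in BLAS, beta == 0 is assumed to mean C is overwritten without being read.

```rust
/// Hypothetical reference routine: C <- alpha * A * B + beta * C, where
/// A is m x k, B is k x n and C is m x n. Each matrix is a pointer to
/// element (0, 0) plus a row stride and a column stride, in elements.
///
/// # Safety
/// Every element addressed through the pointers and strides must lie inside
/// a valid allocation, and the elements of C must not alias each other or
/// any element of A or B.
unsafe fn gemm_ref(
    m: usize, k: usize, n: usize,
    alpha: f32,
    a: *const f32, rsa: isize, csa: isize,
    b: *const f32, rsb: isize, csb: isize,
    beta: f32,
    c: *mut f32, rsc: isize, csc: isize,
) {
    for i in 0..m as isize {
        for j in 0..n as isize {
            let mut acc = 0.0f32;
            for p in 0..k as isize {
                // Element (i, p) of A times element (p, j) of B.
                acc += unsafe { *a.offset(i * rsa + p * csa) * *b.offset(p * rsb + j * csb) };
            }
            let cij = unsafe { &mut *c.offset(i * rsc + j * csc) };
            // With beta == 0, C is overwritten without being read (assumption).
            *cij = if beta == 0.0 { alpha * acc } else { alpha * acc + beta * *cij };
        }
    }
}

fn main() {
    // A: 2 x 3, row major (rsa = 3, csa = 1).
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    // B: 3 x 2, column major (rsb = 1, csb = 3), logically [[7, 8], [9, 10], [11, 12]].
    let b = [7.0f32, 9.0, 11.0, 8.0, 10.0, 12.0];
    // C: 2 x 2, row major (rsc = 2, csc = 1).
    let mut c = [0.0f32; 4];
    unsafe {
        gemm_ref(2, 3, 2, 1.0, a.as_ptr(), 3, 1, b.as_ptr(), 1, 3, 0.0, c.as_mut_ptr(), 2, 1);
    }
    assert_eq!(c, [58.0, 64.0, 139.0, 154.0]);
    println!("{:?}", c);
}
```

Because layout is carried entirely by the strides, A, B and C can each be row major, column major, or some other strided view, and the routine itself never needs to know which.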
In this example, A is an m by k matrix, and a is a pointer to the element at index 0, 0.
The row stride is the pointer offset (in number of elements) to the element on the next row. It’s the distance from element i, j to i + 1, j.
The column stride is the pointer offset (in number of elements) to the element in the next column. It’s the distance from element i, j to i, j + 1.
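Put together, the two definitions above mean that element i, j sits at offset i * rsa + j * csa from element 0, 0. A minimal sketch of that address computation follows; the helper names `elem_offset` and `get` are hypothetical and not part of the crate.

```rust
/// Offset, in elements, of element (i, j) from element (0, 0),
/// given row stride `rs` and column stride `cs`.
fn elem_offset(i: usize, j: usize, rs: isize, cs: isize) -> isize {
    i as isize * rs + j as isize * cs
}

/// Reads element (i, j) of a general-strided matrix whose element (0, 0)
/// is at `a`.
///
/// # Safety
/// The computed offset must stay inside the allocation `a` points into.
unsafe fn get(a: *const f32, rs: isize, cs: isize, i: usize, j: usize) -> f32 {
    unsafe { *a.offset(elem_offset(i, j, rs, cs)) }
}

fn main() {
    // A 3 x 4 row-major matrix holding 0.0, 1.0, ..., 11.0.
    let data: Vec<f32> = (0..12).map(|x| x as f32).collect();
    let (rs, cs) = (4isize, 1isize);

    // Row stride: distance from (i, j) to (i + 1, j).
    assert_eq!(elem_offset(2, 1, rs, cs) - elem_offset(1, 1, rs, cs), rs);
    // Column stride: distance from (i, j) to (i, j + 1).
    assert_eq!(elem_offset(1, 2, rs, cs) - elem_offset(1, 1, rs, cs), cs);

    // Row 1, column 2: offset 1 * 4 + 2 * 1 = 6.
    let x = unsafe { get(data.as_ptr(), rs, cs, 1, 2) };
    assert_eq!(x, 6.0);
    println!("A[1, 2] = {}", x);
}
```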
For example, for a contiguous matrix, the row-major strides are rsa=k, csa=1, and the column-major strides are rsa=1, csa=m.
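As a quick check of those two stride pairs, the sketch below stores the same logical 2 x 3 matrix in both layouts and confirms that every element i, j is found at i * rsa + j * csa in each buffer. The helper names (`row_major_strides`, `col_major_strides`, `index`) are hypothetical, and `index` assumes non-negative strides.

```rust
/// Strides (rsa, csa) for a contiguous m x k matrix in row-major order.
fn row_major_strides(_m: usize, k: usize) -> (isize, isize) {
    (k as isize, 1)
}

/// Strides (rsa, csa) for a contiguous m x k matrix in column-major order.
fn col_major_strides(m: usize, _k: usize) -> (isize, isize) {
    (1, m as isize)
}

/// Buffer index of element (i, j); assumes non-negative strides.
fn index(i: usize, j: usize, (rs, cs): (isize, isize)) -> usize {
    (i as isize * rs + j as isize * cs) as usize
}

fn main() {
    let (m, k) = (2, 3);
    // The same logical matrix [[1, 2, 3], [4, 5, 6]] in both layouts.
    let row_major = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let col_major = [1.0f32, 4.0, 2.0, 5.0, 3.0, 6.0];
    let rm = row_major_strides(m, k); // (3, 1)
    let cm = col_major_strides(m, k); // (1, 2)
    for i in 0..m {
        for j in 0..k {
            assert_eq!(row_major[index(i, j, rm)], col_major[index(i, j, cm)]);
        }
    }
    println!("row major {:?}, column major {:?}", rm, cm);
}
```

A consequence of this convention is that swapping the two strides of a row-major m by k buffer (rsa=1, csa=k) describes its k by m transpose without copying any data.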
Strides can be negative or even zero, but for a mutable matrix, elements may not alias each other.
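The sketch below shows what negative and zero strides mean in practice, using hypothetical helpers (`get`, `to_rows`) that only read. A negative row stride, starting from the last row, views a matrix with its rows reversed. A zero column stride makes every column the same vector, so distinct elements i, j share one memory location; that is harmless for an input matrix but is exactly the aliasing that a mutable output matrix must avoid.

```rust
/// Reads element (i, j) of a general-strided matrix.
///
/// # Safety
/// The offset i * rs + j * cs from `a` must stay inside the allocation.
unsafe fn get(a: *const f32, rs: isize, cs: isize, i: usize, j: usize) -> f32 {
    unsafe { *a.offset(i as isize * rs + j as isize * cs) }
}

/// Copies the rows x cols strided view starting at `a` into nested Vecs.
///
/// # Safety
/// Every addressed element must lie inside the allocation.
unsafe fn to_rows(a: *const f32, rows: usize, cols: usize, rs: isize, cs: isize) -> Vec<Vec<f32>> {
    (0..rows)
        .map(|i| (0..cols).map(|j| unsafe { get(a, rs, cs, i, j) }).collect::<Vec<f32>>())
        .collect()
}

fn main() {
    // A 3 x 2 row-major matrix [[1, 2], [3, 4], [5, 6]].
    let data = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let (m, k) = (3usize, 2usize);

    // Negative row stride: point at the last row and step backwards.
    let last_row = unsafe { data.as_ptr().add((m - 1) * k) };
    let flipped = unsafe { to_rows(last_row, m, k, -(k as isize), 1) };
    assert_eq!(flipped, vec![vec![5.0f32, 6.0], vec![3.0, 4.0], vec![1.0, 2.0]]);

    // Zero column stride: a length-3 vector viewed as a 3 x 4 matrix whose
    // columns all share the same storage (read-only use only).
    let v = [10.0f32, 20.0, 30.0];
    let broadcast = unsafe { to_rows(v.as_ptr(), 3, 4, 1, 0) };
    assert_eq!(broadcast[1], vec![20.0f32; 4]);
    println!("{:?}\n{:?}", flipped, broadcast);
}
```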