# tract-linalg
linalg stands for "linear algebra". This is a misnamer. This crates contains
low-level, architecture dependant optimisations used by tract-core.
# Functions
* MatMatMul: Extended matrix*matrix product:
* inspired by Gotoblass and BLIS micro kernel approach
* extended for convolution friendly addressing (fused img2col)
* fused output pipeline (min, max, and a few more simple, fast ops)
* f32*f32 -> f32 (à la sgemm)
* i8*i8 -> i32 accumulator -> i32 storage
* i8*i8 -> i32 accumulator -> i8 (with channel zeropoint and scale, and re-quantization pipeline)
* f32 sigmoid and f32 tanh: at f32 precision, by a rationale function (no exponentiation)
* byte-to-byte lookup table
# Implementations
| MatMatMul f32 | | 4x4 | 8x4 | 8x8 | 16x6
| MatMatMul i8->i8 | | | 8x4 | | 8x8
| MatMatMul i8->i32 | | | | | 8x8
| sigmoid f32 | | | 4n | 4n |
| tanh f32 | | | 4n | 4n |
| byte lookup | | | | |