tract-linalg 0.12.0

Tiny, no-nonsense, self contained, TensorFlow and ONNX inference
Documentation
# tract-linalg

linalg stands for "linear algebra". This is a misnamer. This crates contains
low-level, architecture dependant optimisations used by tract-core.

# Functions

* MatMatMul: Extended matrix*matrix product:
    * inspired by Gotoblass and BLIS micro kernel approach
    * extended for convolution friendly addressing (fused img2col)
    * fused output pipeline (min, max, and a few more simple, fast ops)
    * f32*f32 -> f32 (à la sgemm)
    * i8*i8 -> i32 accumulator -> i32 storage
    * i8*i8 -> i32 accumulator -> i8 (with channel zeropoint and scale, and re-quantization pipeline)
* f32 sigmoid and f32 tanh: at f32 precision, by a rationale function (no exponentiation)
* byte-to-byte lookup table

# Implementations

|                   |  generic fallback  |   armv6, vfp  |     armv7 neon    |    armv8 simd     |     x64 FMA
|-------------------|--------------------|---------------|-------------------|-------------------|-----------------
| MatMatMul f32     |                    |      4x4      |         8x4       |       8x8         |       16x6
| MatMatMul i8->i8  |                    |               |         8x4       |                   |        8x8
| MatMatMul i8->i32 |                    |               |                   |                   |        8x8
| sigmoid f32       |                    |               |         4n        |        4n         |
| tanh f32          |                    |               |         4n        |        4n         |
| byte lookup       |                    |               |                   |                   |