/*!
# microgemm
[![github]](https://github.com/cospectrum/microgemm)
[![latest_version]][crates.io]
[![docs.rs]](https://docs.rs/microgemm)

[github]: https://img.shields.io/badge/github-cospectrum/microgemm-8da0cb?logo=github
[latest_version]: https://img.shields.io/crates/v/microgemm.svg?logo=rust
[crates.io]: https://crates.io/crates/microgemm
[docs.rs]: https://img.shields.io/badge/docs.rs-microgemm-66c2a5?logo=docs.rs

General matrix multiplication with custom configuration in Rust. <br>
Supports `no_std` and `no_alloc` environments.

The implementation is based on the [BLIS](https://github.com/flame/blis) microkernel approach.

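The sketch below shows what the BLIS microkernel approach looks like in its simplest form. It is the classic five-loop blocking around a small "microkernel" that updates one `mr x nr` tile of `c`. The sketch makes several simplifying assumptions:

- It uses only `std`.
- All matrices are row-major `f64`.
- It skips the packing step.
- `blocked_gemm` and `micro_tile` are hypothetical names, not part of `microgemm`'s API.

```rust
use std::ops::Range;

/// Hypothetical illustration: c <- alpha * a * b + beta * c, all row-major,
/// with a: m x k, b: k x n, c: m x n. Block sizes (mc, kc, nc) play the role
/// of PackSizes and (mr, nr) the micro-tile. Assumes k > 0 and all sizes > 0.
fn blocked_gemm(
    (m, k, n): (usize, usize, usize),
    alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64],
    (mc, kc, nc): (usize, usize, usize),
    (mr, nr): (usize, usize),
) {
    for jc in (0..n).step_by(nc) {                  // loop 5: column panels of b and c
        let nb = nc.min(n - jc);
        for pc in (0..k).step_by(kc) {              // loop 4: slices of the shared k dimension
            let kb = kc.min(k - pc);
            // apply beta only on the first k-slice; later slices accumulate into c
            let beta_eff = if pc == 0 { beta } else { 1.0 };
            for ic in (0..m).step_by(mc) {          // loop 3: row panels of a and c
                let mb = mc.min(m - ic);
                for jr in (0..nb).step_by(nr) {     // loop 2: micro-tile columns
                    for ir in (0..mb).step_by(mr) { // loop 1: micro-tile rows
                        let rows = ic + ir..ic + ir + mr.min(mb - ir);
                        let cols = jc + jr..jc + jr + nr.min(nb - jr);
                        micro_tile(k, n, alpha, a, b, beta_eff, c, rows, cols, pc..pc + kb);
                    }
                }
            }
        }
    }
}

/// The "microkernel": updates one small tile of c over one k-slice.
fn micro_tile(
    k: usize, n: usize, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64],
    rows: Range<usize>, cols: Range<usize>, ks: Range<usize>,
) {
    for i in rows {
        for j in cols.clone() {
            let mut acc = 0.0;
            for p in ks.clone() {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

BLIS-style implementations additionally copy each `mc x kc` block of `a` and each `kc x nc` block of `b` into contiguous buffers before the inner loops run. This is the role of `PackSizes` and the packing buffer. Because of it, the microkernel streams through memory instead of striding across the full matrices.
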
## Install
```sh
cargo add microgemm
```

## Usage

The [`Kernel`] trait is the main abstraction of `microgemm`.
You can implement it yourself or use one of the [`kernels`] provided out of the box.

[`Kernel`]: crate::Kernel
[`kernels`]: crate::kernels

### gemm

```rust
use microgemm::{kernels::GenericKernel8x8, Kernel as _, MatMut, MatRef, PackSizes};

let kernel = GenericKernel8x8::<f32>::new();
assert_eq!(kernel.mr(), 8);
assert_eq!(kernel.nr(), 8);

let pack_sizes = PackSizes {
    mc: 5 * kernel.mr(), // MC must be divisible by MR
    kc: 190,
    nc: 9 * kernel.nr(), // NC must be divisible by NR
};
let mut packing_buf = vec![0.0; pack_sizes.buf_len()];

let (alpha, beta) = (2.0, -3.0);
let (m, k, n) = (100, 380, 250);

let a = vec![2.0; m * k];
let b = vec![3.0; k * n];
let mut c = vec![4.0; m * n];

let a = MatRef::row_major(m, k, &a);
let b = MatRef::row_major(k, n, &b);
let mut c = MatMut::row_major(m, n, &mut c);

// c <- alpha a b + beta c
kernel.gemm(alpha, a, b, beta, &mut c, pack_sizes, &mut packing_buf);
println!("{:?}", c.as_slice());
```

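To check the semantics of `c <- alpha a b + beta c`, here is a naive, std-only reference implementation. The name `naive_gemm` is hypothetical and not part of `microgemm`.

For the inputs above, each entry of `c` should equal `alpha * k * (2 * 3) + beta * 4`. With `alpha = 2`, `k = 380` and `beta = -3`, that is `2 * 380 * 6 - 12 = 4548`.

```rust
/// Hypothetical reference: c <- alpha * a * b + beta * c,
/// with a (m x k), b (k x n) and c (m x n) stored row-major.
fn naive_gemm(
    (m, k, n): (usize, usize, usize),
    alpha: f32, a: &[f32], b: &[f32], beta: f32, c: &mut [f32],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            // inner product of row i of a with column j of b
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

You can compare `c.as_slice()` element-wise against such a reference. For general inputs, allow a small tolerance, since a blocked kernel sums in a different order.
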
### Implemented Kernels

| Name | Scalar Types | Target |
| ---- | ------------ | ------ |
| GenericKernelNxN <br> (N: 2, 4, 8, 16, 32) | T: Copy + Zero + One + Mul + Add | Any |
| [`NeonKernel4x4`] | f32 | aarch64 with target feature `neon` |
| [`NeonKernel8x8`] | f32 | aarch64 with target feature `neon` |

[`NeonKernel4x4`]: crate::kernels::NeonKernel4x4
[`NeonKernel8x8`]: crate::kernels::NeonKernel8x8

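The "Target" column maps onto Rust's conditional compilation. The sketch below picks a preferred `f32` kernel name at compile time using the same `target_arch`/`target_feature` condition.

`PREFERRED_F32_KERNEL` is a hypothetical constant. Real code would gate the construction of the kernel itself rather than return a string.

```rust
/// Hypothetical: the Neon kernels require aarch64 with the `neon` target feature.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
const PREFERRED_F32_KERNEL: &str = "NeonKernel8x8";

/// Hypothetical fallback: the generic kernels run on any target.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
const PREFERRED_F32_KERNEL: &str = "GenericKernel8x8";
```
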
### Custom Kernel Implementation

```rust
use microgemm::{typenum::U4, Kernel, MatMut, MatRef};

struct CustomKernel;

impl Kernel for CustomKernel {
    type Scalar = f64;
    type Mr = U4;
    type Nr = U4;

    // dst <- alpha lhs rhs + beta dst
    fn microkernel(
        &self,
        alpha: f64,
        lhs: MatRef<f64>,
        rhs: MatRef<f64>,
        beta: f64,
        dst: &mut MatMut<f64>,
    ) {
        // lhs is col-major
        assert_eq!(lhs.row_stride(), 1);
        assert_eq!(lhs.nrows(), Self::MR);

        // rhs is row-major
        assert_eq!(rhs.col_stride(), 1);
        assert_eq!(rhs.ncols(), Self::NR);

        // dst is col-major
        assert_eq!(dst.row_stride(), 1);
        assert_eq!(dst.nrows(), Self::MR);
        assert_eq!(dst.ncols(), Self::NR);

        // your microkernel implementation...
    }
}
```

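One straightforward body for a 4x4 microkernel is a sequence of rank-1 updates over the shared dimension. The sketch below writes it on plain slices instead of `MatRef`/`MatMut`, because it assumes a specific layout that you would need to confirm against the strides those views report:

- `lhs` is a contiguous col-major `MR x kc` panel with column stride `MR`.
- `rhs` is a contiguous row-major `kc x NR` panel with row stride `NR`.
- `dst` is col-major with a column stride `ld`. The asserts above fix only `dst`'s row stride, so the sketch does not assume `ld == MR`.

`microkernel_4x4` and `ld` are hypothetical names.

```rust
/// Hypothetical slice-based microkernel: dst <- alpha * lhs * rhs + beta * dst.
/// - lhs: MR x kc, col-major, element (i, p) at lhs[p * MR + i]
/// - rhs: kc x NR, row-major, element (p, j) at rhs[p * NR + j]
/// - dst: MR x NR, col-major with column stride ld, element (i, j) at dst[j * ld + i]
fn microkernel_4x4(alpha: f64, lhs: &[f64], rhs: &[f64], beta: f64, dst: &mut [f64], ld: usize) {
    const MR: usize = 4;
    const NR: usize = 4;
    let kc = lhs.len() / MR;
    assert_eq!(lhs.len(), MR * kc);
    assert_eq!(rhs.len(), kc * NR);
    assert!(ld >= MR && dst.len() >= (NR - 1) * ld + MR);

    // Accumulate the whole MR x NR tile locally; acc[j][i] holds element (i, j).
    let mut acc = [[0.0f64; MR]; NR];
    for p in 0..kc {
        let a_col = &lhs[p * MR..p * MR + MR]; // column p of lhs
        let b_row = &rhs[p * NR..p * NR + NR]; // row p of rhs
        for j in 0..NR {
            for i in 0..MR {
                acc[j][i] += a_col[i] * b_row[j]; // rank-1 update
            }
        }
    }
    // Write back once, applying alpha and beta only here.
    for j in 0..NR {
        for i in 0..MR {
            let d = &mut dst[j * ld + i];
            *d = alpha * acc[j][i] + beta * *d;
        }
    }
}
```

The sketch keeps `alpha` and `beta` out of the inner loop and touches `dst` only once per element. An optimized kernel would typically hold `acc` in SIMD registers instead of a local array.
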
## Benchmarks

All benchmarks run on a single thread with square matrices of dimension `n`.

### f32
`PackSizes { mc: n, kc: n, nc: n }`

#### aarch64 (M1)
```notrust
   n  NeonKernel8x8           faer matrixmultiply
 128         75.5µs        242.6µs         46.2µs
 256        466.3µs          3.2ms        518.2µs
 512            3ms         15.9ms          2.7ms
1024         23.9ms        128.4ms           22ms
2048          191ms             1s        182.8ms
```
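
To read the timings as throughput, note that a square gemm performs about `2 n^3` floating-point operations: one multiply and one add per inner-product term. Dividing by the time gives GFLOP/s. For example, `NeonKernel8x8` at `n = 1024` (23.9ms) works out to roughly 90 GFLOP/s, and at `n = 2048` (191ms) to a similar figure. `gflops` is a hypothetical helper that ignores the extra `alpha`/`beta` scaling work.

```rust
/// Hypothetical helper: approximate throughput of an n x n x n gemm that took
/// `seconds`, counting 2 * n^3 flops (one multiply and one add per term).
fn gflops(n: u64, seconds: f64) -> f64 {
    let flops = 2.0 * (n as f64).powi(3);
    flops / seconds / 1e9
}
```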
*/

#![no_std]

#[cfg(test)]
#[macro_use]
extern crate approx;

#[cfg(test)]
#[macro_use]
extern crate std;

#[cfg(test)]
mod std_prelude {
    pub use std::prelude::rust_2021::*;
}

#[cfg(test)]
use allocator_api2::alloc::Global as GlobalAllocator;

mod gemm;
mod kernel;

pub(crate) mod packing;
#[cfg(test)]
pub(crate) mod utils;

pub mod kernels;
pub mod mat;

pub use generic_array::typenum;
pub use num_traits::{One, Zero};

pub(crate) use gemm::gemm_with_kernel;

pub use kernel::Kernel;
pub use mat::{MatMut, MatRef};
pub use packing::PackSizes;