microgemm/lib.rs
/*!
# microgemm
[![github]](https://github.com/cospectrum/microgemm)
[![latest_version]][crates.io]
[![docs.rs]](https://docs.rs/microgemm)

[github]: https://img.shields.io/badge/github-cospectrum/microgemm-8da0cb?logo=github
[latest_version]: https://img.shields.io/crates/v/microgemm.svg?logo=rust
[crates.io]: https://crates.io/crates/microgemm
[docs.rs]: https://img.shields.io/badge/docs.rs-microgemm-66c2a5?logo=docs.rs

General matrix multiplication with custom configuration in Rust. <br>
Supports `no_std` and `no_alloc` environments.

The implementation is based on the [BLIS](https://github.com/flame/blis) microkernel approach.

## Install
```sh
cargo add microgemm
```

## Usage

The [`Kernel`] trait is the main abstraction of `microgemm`.
You can implement it yourself or use one of the [`kernels`] provided out of the box.

[`Kernel`]: crate::Kernel
[`kernels`]: crate::kernels

### gemm

```rust
use microgemm::{kernels::GenericKernel8x8, Kernel as _, MatMut, MatRef, PackSizes};

let kernel = GenericKernel8x8::<f32>::new();
assert_eq!(kernel.mr(), 8);
assert_eq!(kernel.nr(), 8);

let pack_sizes = PackSizes {
    mc: 5 * kernel.mr(), // MC must be divisible by MR
    kc: 190,
    nc: 9 * kernel.nr(), // NC must be divisible by NR
};
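// scratch space used by `gemm` to pack blocks of `a` and `b`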
let mut packing_buf = vec![0.0; pack_sizes.buf_len()];

let (alpha, beta) = (2.0, -3.0);
let (m, k, n) = (100, 380, 250);

let a = vec![2.0; m * k];
let b = vec![3.0; k * n];
let mut c = vec![4.0; m * n];

let a = MatRef::row_major(m, k, &a);
let b = MatRef::row_major(k, n, &b);
let mut c = MatMut::row_major(m, n, &mut c);

// c <- alpha a b + beta c
kernel.gemm(alpha, a, b, beta, &mut c, pack_sizes, &mut packing_buf);
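// every entry of `a b` is 380 * (2.0 * 3.0) = 2280.0,
// so every entry of `c` becomes 2.0 * 2280.0 + (-3.0) * 4.0 = 4548.0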
println!("{:?}", c.as_slice());
```

### Implemented Kernels

| Name | Scalar Types | Target |
| ---- | ------------ | ------ |
| GenericKernelNxN <br> (N: 2, 4, 8, 16, 32) | T: Copy + Zero + One + Mul + Add | Any |
| [`NeonKernel4x4`] | f32 | aarch64 with the `neon` target feature |
| [`NeonKernel8x8`] | f32 | aarch64 with the `neon` target feature |

[`NeonKernel4x4`]: crate::kernels::NeonKernel4x4
[`NeonKernel8x8`]: crate::kernels::NeonKernel8x8
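
A target-specific kernel can be chosen at compile time with `cfg`. The sketch below is only an
illustration and makes assumptions not stated above: that the NEON kernels are compiled in exactly
when `target_arch = "aarch64"` and the `neon` target feature are enabled, and that `NeonKernel8x8`
takes no type parameters.

```rust
// Compile-time kernel selection (sketch): prefer the NEON kernel when the
// aarch64 `neon` target feature is enabled at build time, otherwise fall
// back to the portable generic kernel.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
type SelectedKernel = microgemm::kernels::NeonKernel8x8;

#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
type SelectedKernel = microgemm::kernels::GenericKernel8x8<f32>;
```

A value of the selected type is then used with `gemm` exactly as in the example above
(see the [`kernels`] docs for each kernel's constructor).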

### Custom Kernel Implementation

```rust
use microgemm::{typenum::U4, Kernel, MatMut, MatRef};

struct CustomKernel;

impl Kernel for CustomKernel {
    type Scalar = f64;
    type Mr = U4;
    type Nr = U4;

    // dst <- alpha lhs rhs + beta dst
    fn microkernel(
        &self,
        alpha: f64,
        lhs: MatRef<f64>,
        rhs: MatRef<f64>,
        beta: f64,
        dst: &mut MatMut<f64>,
    ) {
        // lhs is col-major
        assert_eq!(lhs.row_stride(), 1);
        assert_eq!(lhs.nrows(), Self::MR);

        // rhs is row-major
        assert_eq!(rhs.col_stride(), 1);
        assert_eq!(rhs.ncols(), Self::NR);

        // dst is col-major
        assert_eq!(dst.row_stride(), 1);
        assert_eq!(dst.nrows(), Self::MR);
        assert_eq!(dst.ncols(), Self::NR);

        // your microkernel implementation...
    }
}
```
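
For reference, a plain scalar loop nest is enough to make the kernel above work; vectorizing it for
the target architecture is what the provided NEON kernels do. The sketch below is illustrative only:
it assumes `MatMut` exposes `col_stride` and `as_mut_slice` accessors (not shown above), so check
the `mat` module for the exact API.

```rust,ignore
use microgemm::{MatMut, MatRef};

// Naive reference microkernel: dst <- alpha * lhs * rhs + beta * dst.
// Layouts follow the asserts above: lhs is col-major (MR x kc),
// rhs is row-major (kc x NR), dst is col-major (MR x NR).
fn naive_microkernel(
    alpha: f64,
    lhs: MatRef<f64>,
    rhs: MatRef<f64>,
    beta: f64,
    dst: &mut MatMut<f64>,
) {
    let (mr, nr, kc) = (lhs.nrows(), rhs.ncols(), lhs.ncols());
    let (lhs_cs, rhs_rs) = (lhs.col_stride(), rhs.row_stride());
    // `col_stride` and `as_mut_slice` on MatMut are assumed accessors
    let dst_cs = dst.col_stride();
    let (a, b) = (lhs.as_slice(), rhs.as_slice());
    let d = dst.as_mut_slice();

    for j in 0..nr {
        for i in 0..mr {
            // dot product of row i of lhs with column j of rhs
            let mut acc = 0.0;
            for p in 0..kc {
                acc += a[i + p * lhs_cs] * b[p * rhs_rs + j];
            }
            // dst is col-major: element (i, j) lives at index i + j * dst_cs
            let idx = i + j * dst_cs;
            d[idx] = alpha * acc + beta * d[idx];
        }
    }
}
```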

## Benchmarks

All benchmarks are performed in a single thread on square matrices of dimension `n`.

### f32
`PackSizes { mc: n, kc: n, nc: n }`

#### aarch64 (M1)
```notrust
   n    NeonKernel8x8    faer       matrixmultiply
 128    75.5µs           242.6µs    46.2µs
 256    466.3µs          3.2ms      518.2µs
 512    3ms              15.9ms     2.7ms
1024    23.9ms           128.4ms    22ms
2048    191ms            1s         182.8ms
```
*/

#![no_std]

#[cfg(test)]
#[macro_use]
extern crate approx;

#[cfg(test)]
#[macro_use]
extern crate std;

#[cfg(test)]
mod std_prelude {
    pub use std::prelude::rust_2021::*;
}

#[cfg(test)]
use allocator_api2::alloc::Global as GlobalAllocator;

mod gemm;
mod kernel;

pub(crate) mod packing;
#[cfg(test)]
pub(crate) mod utils;

pub mod kernels;
pub mod mat;

pub use generic_array::typenum;
pub use num_traits::{One, Zero};

pub(crate) use gemm::gemm_with_kernel;

pub use kernel::Kernel;
pub use mat::{MatMut, MatRef};
pub use packing::PackSizes;