/*!
# microgemm
[![github]](https://github.com/cospectrum/microgemm)
[![latest_version]][crates.io]
[![docs.rs]](https://docs.rs/microgemm)

[github]: https://img.shields.io/badge/github-cospectrum/microgemm-8da0cb?logo=github
[latest_version]: https://img.shields.io/crates/v/microgemm.svg?logo=rust
[crates.io]: https://crates.io/crates/microgemm
[docs.rs]: https://img.shields.io/badge/docs.rs-microgemm-66c2a5?logo=docs.rs

General matrix multiplication with custom configuration in Rust. <br>
Supports `no_std` and `no_alloc` environments.
The implementation is based on the [BLIS](https://github.com/flame/blis) microkernel approach.
## Install
```sh
cargo add microgemm
```
## Usage
The [`Kernel`] trait is the main abstraction of `microgemm`.
You can implement it yourself or use one of the [`kernels`] provided out of the box.

[`Kernel`]: crate::Kernel
[`kernels`]: crate::kernels
### gemm
```rust
use microgemm::{kernels::GenericKernel8x8, Kernel as _, MatMut, MatRef, PackSizes};
let kernel = GenericKernel8x8::<f32>::new();
assert_eq!(kernel.mr(), 8);
assert_eq!(kernel.nr(), 8);
let pack_sizes = PackSizes {
    mc: 5 * kernel.mr(), // MC must be divisible by MR
    kc: 190,
    nc: 9 * kernel.nr(), // NC must be divisible by NR
};
let mut packing_buf = vec![0.0; pack_sizes.buf_len()];
let (alpha, beta) = (2.0, -3.0);
let (m, k, n) = (100, 380, 250);
let a = vec![2.0; m * k];
let b = vec![3.0; k * n];
let mut c = vec![4.0; m * n];
let a = MatRef::row_major(m, k, &a);
let b = MatRef::row_major(k, n, &b);
let mut c = MatMut::row_major(m, n, &mut c);
// c <- alpha a b + beta c
kernel.gemm(alpha, a, b, beta, &mut c, pack_sizes, &mut packing_buf);
println!("{:?}", c.as_slice());
```
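Since `microgemm` supports `no_alloc` environments, the packing buffer does not
have to be heap-allocated: any `&mut [T]` of length `buf_len()` works. A minimal
sketch with a fixed-size stack buffer (the capacity `1024` is an arbitrary
choice that must be at least `buf_len()`):
```rust
use microgemm::{kernels::GenericKernel8x8, Kernel as _, PackSizes};

let kernel = GenericKernel8x8::<f32>::new();
let pack_sizes = PackSizes {
    mc: 2 * kernel.mr(),
    kc: 16,
    nc: 2 * kernel.nr(),
};

// Fixed-size stack buffer instead of `vec!`; no allocator required.
let mut storage = [0.0_f32; 1024];
assert!(pack_sizes.buf_len() <= storage.len());
let packing_buf = &mut storage[..pack_sizes.buf_len()];
assert_eq!(packing_buf.len(), pack_sizes.buf_len());
```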
### Implemented Kernels
| Name | Scalar Types | Target |
| ---- | ------------ | ------ |
| GenericKernelNxN <br> (N: 2, 4, 8, 16, 32) | T: Copy + Zero + One + Mul + Add | Any |
| [`NeonKernel4x4`] | f32 | aarch64 and target feature neon |
| [`NeonKernel8x8`] | f32 | aarch64 and target feature neon |

[`NeonKernel4x4`]: crate::kernels::NeonKernel4x4
[`NeonKernel8x8`]: crate::kernels::NeonKernel8x8
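The generic kernels are portable and can be instantiated for any scalar type
satisfying the trait bounds above. A brief sketch, assuming `GenericKernel4x4`
follows the same API as the `GenericKernel8x8` used earlier:
```rust
use microgemm::{kernels::GenericKernel4x4, Kernel as _};

// A 4x4 generic kernel over f64 instead of f32.
let kernel = GenericKernel4x4::<f64>::new();
assert_eq!(kernel.mr(), 4);
assert_eq!(kernel.nr(), 4);
```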
### Custom Kernel Implementation
```rust
use microgemm::{typenum::U4, Kernel, MatMut, MatRef};

struct CustomKernel;

impl Kernel for CustomKernel {
    type Scalar = f64;
    type Mr = U4;
    type Nr = U4;

    // dst <- alpha lhs rhs + beta dst
    fn microkernel(
        &self,
        alpha: f64,
        lhs: MatRef<f64>,
        rhs: MatRef<f64>,
        beta: f64,
        dst: &mut MatMut<f64>,
    ) {
        // lhs is col-major
        assert_eq!(lhs.row_stride(), 1);
        assert_eq!(lhs.nrows(), Self::MR);

        // rhs is row-major
        assert_eq!(rhs.col_stride(), 1);
        assert_eq!(rhs.ncols(), Self::NR);

        // dst is col-major
        assert_eq!(dst.row_stride(), 1);
        assert_eq!(dst.nrows(), Self::MR);
        assert_eq!(dst.ncols(), Self::NR);

        // your microkernel implementation...
    }
}
```
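To make the microkernel contract concrete, below is a naive scalar body for
such a kernel. It is only a sketch, not a tuned kernel, and it assumes slice
accessors (`MatRef::as_slice` and `MatMut::as_mut_slice`) analogous to the
`as_slice` call used in the gemm example:
```rust
use microgemm::{typenum::U4, Kernel, MatMut, MatRef};

struct NaiveKernel;

impl Kernel for NaiveKernel {
    type Scalar = f64;
    type Mr = U4;
    type Nr = U4;

    // dst <- alpha lhs rhs + beta dst, one scalar at a time.
    fn microkernel(
        &self,
        alpha: f64,
        lhs: MatRef<f64>,
        rhs: MatRef<f64>,
        beta: f64,
        dst: &mut MatMut<f64>,
    ) {
        let kc = lhs.ncols();
        let (lhs_cs, rhs_rs, dst_cs) = (lhs.col_stride(), rhs.row_stride(), dst.col_stride());
        for j in 0..Self::NR {
            for i in 0..Self::MR {
                // Dot product of row i of lhs (col-major, row_stride == 1)
                // with column j of rhs (row-major, col_stride == 1).
                let mut acc = 0.0;
                for p in 0..kc {
                    // NOTE: `as_slice` on `MatRef` is an assumed accessor.
                    acc += lhs.as_slice()[i + p * lhs_cs] * rhs.as_slice()[p * rhs_rs + j];
                }
                // dst is col-major: element (i, j) lives at i + j * dst_cs.
                // NOTE: `as_mut_slice` is an assumed accessor mirroring `as_slice`.
                let d = &mut dst.as_mut_slice()[i + j * dst_cs];
                *d = alpha * acc + beta * *d;
            }
        }
    }
}
```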
## Benchmarks
All benchmarks were run in a single thread on square matrices of dimension `n`.
### f32
`PackSizes { mc: n, kc: n, nc: n }`
#### aarch64 (M1)
```text
n     NeonKernel8x8  faer     matrixmultiply
128   75.5µs         242.6µs  46.2µs
256   466.3µs        3.2ms    518.2µs
512   3ms            15.9ms   2.7ms
1024  23.9ms         128.4ms  22ms
2048  191ms          1s       182.8ms
```
*/
extern crate approx;
extern crate std;
use std::alloc::Global as GlobalAllocator; // assumed path; the original import was truncated
pub mod kernels;
pub mod utils; // assumed module name; the original declaration was truncated

pub use typenum;

// The internal module names below are assumed; the re-exported items match
// the public API documented above.
mod gemm;
mod kernel;
mod mat;
mod pack_sizes;

pub use gemm::gemm_with_kernel;
pub use kernel::Kernel;
pub use mat::{MatMut, MatRef};
pub use pack_sizes::PackSizes;