BLAS implementation in Rust
Only Level 1 functions and microkernels are optimized with platform-specific code.
Level 3 functions are parallelized with rayon.
Optimized implementations are available for the following CPUs:
- x86_64 CPUs with FMA support
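
As a rough illustration of what "optimized for x86_64 with FMA" can look like for a Level 1 routine, here is a minimal sketch of a single-precision `axpy` (`y += alpha * x`) that checks for AVX and FMA at runtime and otherwise falls back to portable code. The function names `axpy`, `axpy_fma`, and `axpy_scalar` are hypothetical and are not taken from this crate; only `std::arch` intrinsics and `is_x86_feature_detected!` from the standard library are used.

```rust
/// Portable fallback: y[i] += alpha * x[i] (hypothetical helper).
fn axpy_scalar(alpha: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * *xi;
    }
}

/// AVX + FMA path: processes 8 floats per iteration (hypothetical helper).
/// The caller must ensure the CPU supports both features.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn axpy_fma(alpha: f32, x: &[f32], y: &mut [f32]) {
    use std::arch::x86_64::*;

    let n = x.len().min(y.len());
    let mut i = 0;
    unsafe {
        // Broadcast alpha into all 8 lanes.
        let va = _mm256_set1_ps(alpha);
        while i + 8 <= n {
            let vx = _mm256_loadu_ps(x.as_ptr().add(i));
            let vy = _mm256_loadu_ps(y.as_ptr().add(i));
            // Fused multiply-add: va * vx + vy.
            let vr = _mm256_fmadd_ps(va, vx, vy);
            _mm256_storeu_ps(y.as_mut_ptr().add(i), vr);
            i += 8;
        }
    }
    // Handle the remaining (fewer than 8) elements with the scalar code.
    axpy_scalar(alpha, &x[i..n], &mut y[i..n]);
}

/// Public entry point: picks the fastest available implementation at runtime.
pub fn axpy(alpha: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");

    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // SAFETY: both required CPU features were detected just above.
            unsafe { axpy_fma(alpha, x, y) };
            return;
        }
    }

    axpy_scalar(alpha, x, y);
}
```

Calling `axpy(2.0, &x, &mut y)` updates `y` in place. Because a fused multiply-add rounds once instead of twice, the FMA path may differ from the scalar path in the last bit for inputs that are not exactly representable.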
Contributions of any kind are welcome. Just don't forget to run
cargo clippy and
cargo fmt before committing.
Architecture and algorithms are heavily inspired by: