ArgMinMax
Efficient argmin & argmax (in 1 function) with SIMD (SSE, AVX(2), AVX512, NEON) for
f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64.
🚀 The function is generic over the type of the array, so it can be used on &[T] or Vec<T> where T can be f16¹, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64.
🤝 The trait is implemented for slice, Vec, 1D ndarray::ArrayBase², and Apache arrow::PrimitiveArray³.
⚡ Runtime CPU feature detection is used to select the most efficient implementation for the current CPU. This means that the same binary can be used on different CPUs without recompilation.
👀 The SIMD implementation contains no if checks, ensuring that the runtime of the function is independent of the input data and its order (best-case = worst-case = average-case).
🪄 Efficient support for f16 and uints: through (bijective aka symmetric) bitwise operations, f16 (optional¹) and uints are converted to ordered integers, allowing the use of integer SIMD instructions.
¹ for f16 you should enable the "half" feature.
² for ndarray::ArrayBase you should enable the "ndarray" feature.
³ for arrow::PrimitiveArray you should enable the "arrow" feature.
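The "ordered integers" idea mentioned above can be sketched in plain scalar Rust. This is an illustration of the general bit trick, not the crate's actual code: the function names are hypothetical, the f16 value is represented by its raw IEEE 754 binary16 bits (a u16) so that no extra crate is needed, and NaN bit patterns are assumed to be absent.

```rust
/// Map a `u32` to an `i32` with the same relative order:
/// flipping the top bit shifts the range [0, 2^32) onto [-2^31, 2^31).
fn u32_to_ordered(x: u32) -> i32 {
    (x ^ 0x8000_0000) as i32
}

/// Inverse of `u32_to_ordered` (the same XOR, applied to the bit pattern).
fn ordered_to_u32(x: i32) -> u32 {
    (x as u32) ^ 0x8000_0000
}

/// Map the raw binary16 bits of a non-NaN f16 to an `i16` with the same
/// relative order as the float values.
/// Non-negative floats already sort correctly as signed integers; for negative
/// floats the 15 magnitude bits are inverted, so larger magnitudes map to
/// smaller integers. Applying the function twice returns the input bits.
fn f16_bits_to_ordered(bits: u16) -> i16 {
    let v = bits as i16;
    // `v >> 15` is an arithmetic shift: 0 for non-negative, -1 (all ones) for negative.
    v ^ ((v >> 15) & 0x7FFF)
}

/// Scalar argmin over non-empty `u32` data that only compares signed integers.
fn argmin_u32_via_ordered(data: &[u32]) -> usize {
    let mut best = 0;
    for i in 1..data.len() {
        if u32_to_ordered(data[i]) < u32_to_ordered(data[best]) {
            best = i;
        }
    }
    best
}
```

Because both mappings are order-preserving and invertible, the index found on the transformed values is also the index of the extreme value in the original data. Note that in this sketch -0.0 maps just below +0.0.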
Installing
Add the following to your Cargo.toml:
[dependencies]
argminmax = "0.4"
Example usage
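use argminmax::ArgMinMax; // import trait
let arr: Vec<i32> = (0..200_000).collect(); // create a vector
let (min, max) = arr.argminmax(); // apply extension
println!("min: {}, max: {}", min, max);
println!("arr[min]: {}, arr[max]: {}", arr[min], arr[max]);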
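To make the result of the call concrete, here is a minimal scalar reference that returns a pair of indices (position of the minimum, position of the maximum) in a single pass. It is only a sketch for comparison: argminmax_scalar is a hypothetical helper, not part of the crate, and its tie-breaking (first occurrence wins) and panic on empty input are assumptions of this sketch.

```rust
/// Return (index of the minimum, index of the maximum) in one pass.
/// Panics on empty input; on ties the first occurrence is kept.
fn argminmax_scalar<T: PartialOrd + Copy>(data: &[T]) -> (usize, usize) {
    assert!(!data.is_empty(), "input must not be empty");
    let (mut min_idx, mut max_idx) = (0usize, 0usize);
    let (mut min_val, mut max_val) = (data[0], data[0]);
    for (i, &v) in data.iter().enumerate().skip(1) {
        if v < min_val {
            min_val = v;
            min_idx = i;
        }
        if v > max_val {
            max_val = v;
            max_idx = i;
        }
    }
    (min_idx, max_idx)
}
```

For a sorted vector built from the range 0..200_000, this returns (0, 199_999), which is also what one would expect from a single-pass argmin and argmax on that data.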
Features
- "half": support f16 argminmax (through using the half crate).
- "ndarray": add the ArgMinMax trait to ndarray's Array1 & ArrayView1.
Benchmarks
Benchmarks on my laptop (AMD Ryzen 7 4800U, 1.8 GHz, 16GB RAM) using criterion show that the function is 3-20x faster than the scalar implementation (depending on the data type).
See /benches/results.
Run the benchmarks yourself with cargo bench.
Tests
To run the tests, use cargo test (pass --all-features to include the optional features).
Limitations
❗ Does not support NaNs.
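Why NaNs are a problem for any comparison-based search can be shown with a small scalar sketch: every ordered comparison against NaN is false, so the answer depends on where the NaN sits. Both helpers below are hypothetical and say nothing about how the crate itself behaves on NaN input; checking with contains_nan before calling is just one possible guard.

```rust
/// Returns true if any element is NaN, so the caller can reject or clean the data.
fn contains_nan(data: &[f32]) -> bool {
    data.iter().any(|x| x.is_nan())
}

/// Naive comparison-based argmin over non-empty data, used to show the effect of NaN.
fn naive_argmin(data: &[f32]) -> usize {
    let mut best = 0;
    for i in 1..data.len() {
        // Any comparison involving NaN evaluates to false.
        if data[i] < data[best] {
            best = i;
        }
    }
    best
}
```

With the input [NaN, 1.0, -2.0] the naive loop never leaves index 0, while [1.0, NaN, -2.0] yields index 2: the same values give different answers depending on their order.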
Acknowledgements
Some parts of this library are inspired by the great work of minimalrust's argmm project.