A library that abstracts over SIMD instruction sets, including ones with differing widths.
SIMDeez is designed to allow you to write a function one time and produce SSE2, SSE41, and AVX2 versions of the function.
You can either have the version you want chosen at compile time with `cfg` attributes, or at runtime with `target_feature` attributes and the built-in `is_x86_feature_detected!` macro.
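To make the two selection modes concrete, here is a minimal sketch that uses only the Rust standard library, not SIMDeez itself. The function names `compile_time_level` and `runtime_level` are hypothetical and exist only for illustration. The first reports what the build was compiled for; the second asks the CPU the program is actually running on.

```rust
/// Instruction set fixed at compile time, based on the target features the
/// crate is built with (for example via `RUSTFLAGS="-C target-feature=+avx2"`).
/// Hypothetical helper, not part of SIMDeez.
pub fn compile_time_level() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse4.1") {
        "sse4.1"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// Instruction set chosen at runtime by querying the running CPU.
/// Hypothetical helper, not part of SIMDeez.
pub fn runtime_level() -> &'static str {
    // The detection macro only exists on x86 targets, so guard it with cfg.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse4.1") {
            return "sse4.1";
        }
        if is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    "scalar"
}
```

Compile-time selection costs nothing at runtime but ties the binary to one feature level, while runtime selection lets a single binary use the best instruction set available on each machine.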
SIMDeez is currently in beta. If there are intrinsics you need that are not yet implemented, create an issue and I'll add them. PRs to add more intrinsics are welcome. Currently things are well fleshed out for the i32, i64, f32, and f64 types.
As Rust stabilizes support for Neon and AVX-512, I plan to add those as well.
Refer to the excellent Intel Intrinsics Guide for documentation on these functions.
Features
- SSE2, SSE41, and AVX2
- Can be used with compile time or runtime selection
- No runtime overhead
- Uses familiar Intel intrinsic naming conventions, easy to port.
  - `_mm_add_ps(a,b)` becomes `add_ps(a,b)`
- Fills in missing intrinsics in older APIs with fast SIMD workarounds.
  - ceil, floor, round, blend, etc.
- Can be used by `#[no_std]` projects
- Operator overloading: `let sum = va + vb` or `s *= s`
- Extract or set a single lane with the index operator: `let v1 = v[1];`
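The width abstraction, operator overloading, and lane indexing listed above can be pictured with a small standalone sketch. `F32xN` and `add_slices` are hypothetical names for illustration, not SIMDeez types, and the lanes are a plain array rather than a hardware register. The sketch shows only the shape of such an API, not its performance.

```rust
use std::ops::{Add, Index, IndexMut, MulAssign};

/// Hypothetical stand-in for a SIMD register holding N f32 lanes
/// (N = 4 would correspond to SSE, N = 8 to AVX2). N must be nonzero.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32xN<const N: usize>(pub [f32; N]);

impl<const N: usize> F32xN<N> {
    /// Load the first N values of `src` into a vector.
    pub fn load(src: &[f32]) -> Self {
        let mut lanes = [0.0f32; N];
        lanes.copy_from_slice(&src[..N]);
        F32xN(lanes)
    }
}

impl<const N: usize> Add for F32xN<N> {
    type Output = Self;
    fn add(mut self, rhs: Self) -> Self {
        for i in 0..N {
            self.0[i] += rhs.0[i];
        }
        self
    }
}

impl<const N: usize> MulAssign for F32xN<N> {
    fn mul_assign(&mut self, rhs: Self) {
        for i in 0..N {
            self.0[i] *= rhs.0[i];
        }
    }
}

impl<const N: usize> Index<usize> for F32xN<N> {
    type Output = f32;
    fn index(&self, lane: usize) -> &f32 {
        &self.0[lane]
    }
}

impl<const N: usize> IndexMut<usize> for F32xN<N> {
    fn index_mut(&mut self, lane: usize) -> &mut f32 {
        &mut self.0[lane]
    }
}

/// Written once, usable at any width: adds two slices N lanes at a time.
pub fn add_slices<const N: usize>(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    let full = a.len() / N * N; // length covered by whole vectors
    let mut out = Vec::with_capacity(a.len());
    for i in (0..full).step_by(N) {
        let va = F32xN::<N>::load(&a[i..]);
        let vb = F32xN::<N>::load(&b[i..]);
        let sum = va + vb; // operator overloading
        out.extend_from_slice(&sum.0);
    }
    // Scalar tail for elements that do not fill a whole vector.
    for i in full..a.len() {
        out.push(a[i] + b[i]);
    }
    out
}
```

Calling `add_slices::<4>` and `add_slices::<8>` on the same input gives the same result, which is the point of abstracting over width: the caller picks the lane count, and the function body does not change.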
Compared to stdsimd
- SIMDeez can abstract over differing SIMD widths; stdsimd does not.
- SIMDeez builds on stable Rust now; stdsimd does not.
Compared to Faster
- SIMDeez can be used with runtime selection, Faster cannot.
- SIMDeez has faster fallbacks for some functions
- SIMDeez does not currently work with iterators, Faster does.
- SIMDeez uses more idiomatic intrinsic syntax, while Faster uses more idiomatic Rust syntax
- SIMDeez can be used by `#[no_std]` projects
- SIMDeez builds on stable Rust now, Faster does not.
All of the above could change! Faster seems to generally have the same performance as long as you don't run into some of the slower fallback functions.
Example
// If using runtime feature detection, you will want to be sure this inlines
// so you can leverage target_feature attributes
unsafe
//Call distance as an SSE2 function
unsafe
//Call distance as an SSE41 function
unsafe
//Call distance as an AVX2 function
unsafe
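
As a rough, standalone illustration of the pattern the comments above describe (one inlined kernel, called from SSE2, SSE4.1, and AVX2 wrappers), the sketch below uses only the standard library. It is not SIMDeez's API: `distance_kernel`, `distance_sse2`, `distance_sse41`, `distance_avx2`, and `distance` are hypothetical names, and the kernel is plain scalar code that relies on the compiler to vectorize it inside each wrapper.

```rust
/// Hypothetical shared kernel: Euclidean distance between the points
/// (x1[i], y1[i]) and (x2[i], y2[i]). It is marked `inline(always)` so that
/// it is compiled inside each wrapper with that wrapper's target features.
#[inline(always)]
fn distance_kernel(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    let n = x1.len().min(y1.len()).min(x2.len()).min(y2.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let dx = x1[i] - x2[i];
        let dy = y1[i] - y2[i];
        out.push((dx * dx + dy * dy).sqrt());
    }
    out
}

// Call distance as an SSE2 function
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn distance_sse2(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    distance_kernel(x1, y1, x2, y2)
}

// Call distance as an SSE4.1 function
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn distance_sse41(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    distance_kernel(x1, y1, x2, y2)
}

// Call distance as an AVX2 function
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn distance_avx2(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    distance_kernel(x1, y1, x2, y2)
}

/// Runtime selection: pick the widest instruction set the CPU reports.
pub fn distance(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Each call is sound because the matching feature was just detected.
        if is_x86_feature_detected!("avx2") {
            return unsafe { distance_avx2(x1, y1, x2, y2) };
        }
        if is_x86_feature_detected!("sse4.1") {
            return unsafe { distance_sse41(x1, y1, x2, y2) };
        }
        if is_x86_feature_detected!("sse2") {
            return unsafe { distance_sse2(x1, y1, x2, y2) };
        }
    }
    // Portable fallback for other architectures.
    distance_kernel(x1, y1, x2, y2)
}
```

The wrappers are `unsafe` because calling a `target_feature` function on a CPU that lacks the feature is undefined behavior; the safe `distance` entry point guards each call with a detection check.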