A library that abstracts over SIMD instruction sets, including ones with differing widths.
SIMDeez is designed to allow you to write a function one time and produce SSE2, SSE41, and AVX2 versions of the function.
You can either have the version you want chosen at compile time with `cfg` attributes, or at runtime with `target_feature` attributes and the built-in `is_x86_feature_detected!` macro.
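The two selection modes can be illustrated with the standard library alone. The sketch below is not SIMDeez code, and the function names `compile_time_level` and `runtime_level` are hypothetical. It contrasts `cfg!(target_feature = ...)`, which is resolved when the crate is built, with `is_x86_feature_detected!`, which queries the CPU while the program runs.

```rust
/// Best x86 SIMD level the compiler was allowed to assume.
/// Fixed at build time by `-C target-feature` / `-C target-cpu`.
pub fn compile_time_level() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse4.1") {
        "sse4.1"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// Best level the running CPU reports. Decided at run time.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn runtime_level() -> &'static str {
    if is_x86_feature_detected!("avx2") {
        "avx2"
    } else if is_x86_feature_detected!("sse4.1") {
        "sse4.1"
    } else if is_x86_feature_detected!("sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// On non-x86 targets the detection macro does not exist, so report scalar.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn runtime_level() -> &'static str {
    "scalar"
}
```

A default x86_64 build typically reports `sse2` at compile time, while the runtime answer depends on the machine the binary ends up running on.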
SIMDeez is currently in beta. If there are intrinsics you need that are not yet implemented, create an issue and I'll add them. PRs to add more intrinsics are welcome. Currently things are well fleshed out for the i32, i64, f32, and f64 types.
As Rust stabilizes support for Neon and AVX-512 I plan to add those as well.
Refer to the excellent Intel Intrinsics Guide for documentation on these functions.
Features
- SSE2, SSE41, and AVX2
- Can be used with compile time or run time selection
- No runtime overhead
- Uses familiar Intel intrinsic naming conventions, easy to port.
  - `_mm_add_ps(a,b)` becomes `add_ps(a,b)`
- Fills in missing intrinsics in older APIs with fast SIMD workarounds.
  - ceil, floor, round, blend, etc.
- Can be used by `#[no_std]` projects
- Operator overloading: `let sum = va + vb` or `s *= s`
- Extract or set a single lane with the index operator: `let v1 = v[1];`
- Falls all the way back to scalar code for platforms with no SIMD or unsupported SIMD
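To make the operator-overloading, lane-indexing, and width-abstraction bullets concrete, here is a minimal sketch built on a hypothetical `Lanes<N>` array wrapper. It is a scalar stand-in, not a SIMDeez type, and `square_plus` is an illustrative name. The point is only that one generic function can serve several vector widths.

```rust
use std::ops::{Add, Index, IndexMut, MulAssign};

/// Hypothetical stand-in for a SIMD vector of `N` f32 lanes.
/// Not a SIMDeez type: a real backend would wrap a hardware register.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Lanes<const N: usize>(pub [f32; N]);

impl<const N: usize> Add for Lanes<N> {
    type Output = Self;
    /// Lane-wise addition.
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for i in 0..N {
            out[i] += rhs.0[i];
        }
        Lanes(out)
    }
}

impl<const N: usize> MulAssign for Lanes<N> {
    /// Lane-wise in-place multiplication.
    fn mul_assign(&mut self, rhs: Self) {
        for i in 0..N {
            self.0[i] *= rhs.0[i];
        }
    }
}

impl<const N: usize> Index<usize> for Lanes<N> {
    type Output = f32;
    /// Read a single lane.
    fn index(&self, i: usize) -> &f32 {
        &self.0[i]
    }
}

impl<const N: usize> IndexMut<usize> for Lanes<N> {
    /// Write a single lane.
    fn index_mut(&mut self, i: usize) -> &mut f32 {
        &mut self.0[i]
    }
}

/// Written once, usable at any width: squares each lane, then adds `b`.
pub fn square_plus<const N: usize>(mut a: Lanes<N>, b: Lanes<N>) -> Lanes<N> {
    a *= a; // the `s *= s` form
    a + b // the `va + vb` form
}
```

Calling `square_plus` with `Lanes<4>` or `Lanes<8>` arguments compiles the same source for two widths, and `v[1]` reads or writes one lane of the result.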
Compared to stdsimd
- SIMDeez can abstract over differing SIMD widths; stdsimd does not.
- SIMDeez builds on stable Rust now; stdsimd does not.
Compared to Faster
- SIMDeez can be used with runtime selection, Faster cannot.
- SIMDeez has faster fallbacks for some functions
- SIMDeez does not currently work with iterators, Faster does.
- SIMDeez uses more idiomatic intrinsic syntax, while Faster uses more idiomatic Rust syntax.
- SIMDeez builds on stable Rust now, Faster does not.
All of the above could change! Faster seems to generally have the same performance as long as you don't run into some of the slower fallback functions.
Example
// When using runtime feature detection, we need to be sure this inlines into each
// specific function that uses a given `target_feature`, or intrinsics will get downgraded.
// All intrinsics are unsafe, so functions using them must be unsafe, or
// you must wrap all calls in unsafe blocks.
unsafe
//Call distance as an SSE2 function
unsafe
//Call distance as an SSE41 function
unsafe
//Call distance as an AVX2 function
unsafe
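The pattern these comments describe can be sketched with the standard library alone. This is not SIMDeez's API: the body below is plain scalar code, and the names `distance_impl`, `distance_sse2`, `distance_sse41`, `distance_avx2`, and `distance` are assumptions made for illustration. It shows one `#[inline(always)]` body shared by per-feature `#[target_feature]` wrappers, with `is_x86_feature_detected!` choosing among them at run time.

```rust
/// Shared body, written once. `#[inline(always)]` lets it be inlined into each
/// `#[target_feature]` wrapper, so it is compiled with that wrapper's features.
/// Processes as many elements as the shortest input slice holds.
#[inline(always)]
fn distance_impl(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    let n = x1.len().min(y1.len()).min(x2.len()).min(y2.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let dx = x1[i] - x2[i];
        let dy = y1[i] - y2[i];
        out.push((dx * dx + dy * dy).sqrt());
    }
    out
}

// Call distance as an SSE2 function
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn distance_sse2(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    distance_impl(x1, y1, x2, y2)
}

// Call distance as an SSE4.1 function
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn distance_sse41(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    distance_impl(x1, y1, x2, y2)
}

// Call distance as an AVX2 function
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn distance_avx2(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    distance_impl(x1, y1, x2, y2)
}

/// Safe entry point: picks the widest version the running CPU supports,
/// falling back to the plain build of the shared body.
pub fn distance(x1: &[f32], y1: &[f32], x2: &[f32], y2: &[f32]) -> Vec<f32> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Safety: each wrapper is only called after its feature was detected.
        if is_x86_feature_detected!("avx2") {
            return unsafe { distance_avx2(x1, y1, x2, y2) };
        }
        if is_x86_feature_detected!("sse4.1") {
            return unsafe { distance_sse41(x1, y1, x2, y2) };
        }
        if is_x86_feature_detected!("sse2") {
            return unsafe { distance_sse2(x1, y1, x2, y2) };
        }
    }
    distance_impl(x1, y1, x2, y2)
}
```

Keeping the `unsafe` confined to the dispatcher means callers of `distance` never need an unsafe block themselves. Note that Rust rejects `#[inline(always)]` on a `#[target_feature]` function, which is why the attribute sits on the shared body rather than on the wrappers.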