A library that abstracts over SIMD instruction sets, including ones with differing widths. SIMDeez is designed to allow you to write a function one time and produce SSE2, SSE41, AVX2, AVX-512, Neon and WebAssembly SIMD versions of the function. You can either have the version you want chosen at compile time or automatically at runtime.
Originally developed by @jackmott. Active maintenance and releases now happen from this repository.
If there are intrinsics you need that are not currently implemented, create an issue and I'll add them. PRs to add more intrinsics are welcome. Currently things are well fleshed out for i32, i64, f32, and f64 types.
AVX-512 support is now included for x86/x86_64 targets with avx512f, avx512bw, and avx512dq. Runtime dispatch will select it ahead of AVX2 when those features are available.
Refer to the excellent Intel Intrinsics Guide for documentation on these functions:
Features
- SSE2, SSE41, AVX2, AVX-512, Neon, WebAssembly SIMD and scalar fallback
- Can be used with compile time or run time selection
- No runtime overhead
- Uses familiar Intel intrinsic naming conventions, easy to port.
  - `_mm_add_ps(a,b)` becomes `add_ps(a,b)`
- Fills in missing intrinsics in older APIs with fast SIMD workarounds.
  - ceil, floor, round, blend, etc.
- Can be used by `#[no_std]` projects
- Operator overloading: `let sum = va + vb` or `s *= s`
- Extract or set a single lane with the index operator: `let v1 = v[1];`
- Falls all the way back to scalar code for platforms with no SIMD or unsupported SIMD
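To make the "write once, run at several widths" idea concrete, here is a minimal, std-only sketch. Everything in it is a hypothetical stand-in invented for illustration: `ToySimd`, `Lanes` and `Backend` are not SIMDeez's real trait or vector types, and `Lanes` is a plain array rather than a hardware register. It only shows the shape of the abstraction: one generic function, instantiated at different lane counts, with operator overloading and lane indexing on the vector type.

```rust
use std::ops::{Add, Index, Mul};

/// Hypothetical stand-in for a SIMD backend with a fixed number of f32 lanes.
trait ToySimd {
    const WIDTH: usize;
    type Vf32: Copy
        + Add<Output = Self::Vf32>
        + Mul<Output = Self::Vf32>
        + Index<usize, Output = f32>;
    fn load(src: &[f32]) -> Self::Vf32;
    fn store(v: Self::Vf32, dst: &mut [f32]);
}

/// A fake "register": N lanes held in an ordinary array.
#[derive(Clone, Copy)]
struct Lanes<const N: usize>([f32; N]);

impl<const N: usize> Add for Lanes<N> {
    type Output = Self;
    fn add(mut self, rhs: Self) -> Self {
        for (l, r) in self.0.iter_mut().zip(rhs.0) {
            *l += r;
        }
        self
    }
}

impl<const N: usize> Mul for Lanes<N> {
    type Output = Self;
    fn mul(mut self, rhs: Self) -> Self {
        for (l, r) in self.0.iter_mut().zip(rhs.0) {
            *l *= r;
        }
        self
    }
}

impl<const N: usize> Index<usize> for Lanes<N> {
    type Output = f32;
    fn index(&self, lane: usize) -> &f32 {
        &self.0[lane]
    }
}

/// One hypothetical backend per lane count: Backend<1> is "scalar", Backend<4> is "128-bit".
struct Backend<const N: usize>;

impl<const N: usize> ToySimd for Backend<N> {
    const WIDTH: usize = N;
    type Vf32 = Lanes<N>;
    fn load(src: &[f32]) -> Lanes<N> {
        let mut lanes = [0.0f32; N];
        lanes.copy_from_slice(&src[..N]);
        Lanes(lanes)
    }
    fn store(v: Lanes<N>, dst: &mut [f32]) {
        dst[..N].copy_from_slice(&v.0);
    }
}

/// Written once, generic over the backend: out[i] = a[i] * a[i] + b[i].
fn square_plus<S: ToySimd>(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    let mut out = vec![0.0f32; a.len()];
    let mut i = 0;
    // Process full vectors of S::WIDTH lanes at a time.
    while i + S::WIDTH <= a.len() {
        let va = S::load(&a[i..]);
        let vb = S::load(&b[i..]);
        S::store(va * va + vb, &mut out[i..]);
        i += S::WIDTH;
    }
    // Handle the leftover elements one by one.
    for j in i..a.len() {
        out[j] = a[j] * a[j] + b[j];
    }
    out
}
```

Calling `square_plus::<Backend<1>>` and `square_plus::<Backend<4>>` on the same inputs gives the same result; only the number of elements handled per loop iteration changes.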
SIMD math
SIMDeez includes a native, pure-Rust SIMD math surface exposed through simdeez::math and re-exported in simdeez::prelude.
Covered families include:
- core log/exp: `log2_u35`, `exp2_u35`, `ln_u35`, `exp_u35`
- trig and inverse trig: `sin_u35`, `cos_u35`, `tan_u35`, `asin_u35`, `acos_u35`, `atan_u35`, `atan2_u35`
- hyperbolic and inverse hyperbolic: `sinh_u35`, `cosh_u35`, `tanh_u35`, `asinh_u35`, `acosh_u35`, `atanh_u35`
- binary misc: `log10_u35`, `hypot_u35`, `fmod`
For implementation notes and benchmark guidance, see SIMD_MATH.md.
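This README does not spell out what the `_u35` suffix means. If it follows the convention used by some other SIMD math libraries, where `u35` denotes a maximum error of 3.5 ULP (units in the last place), then accuracy can be checked by measuring the ULP distance between a result and a reference value. That reading is an assumption; SIMD_MATH.md is the authority. The helpers below, `ordered_bits` and `ulp_distance`, are hypothetical names and use only the standard library:

```rust
/// Map an f32 to an integer so that integer order matches float order.
/// Negative floats are mirrored below zero; +0.0 and -0.0 both map to 0.
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    if bits < 0 {
        i64::from(i32::MIN) - i64::from(bits)
    } else {
        i64::from(bits)
    }
}

/// Number of representable f32 steps between two finite values.
fn ulp_distance(a: f32, b: f32) -> u64 {
    (ordered_bits(a) - ordered_bits(b)).unsigned_abs()
}
```

A typical use would be to compute a reference in `f64`, round it to `f32`, and compare it against the SIMD result lane by lane with `ulp_distance`.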
Example:
use *;
Compared to packed_simd
- SIMDeez can abstract over differing SIMD widths; packed_simd does not.
- SIMDeez builds on stable Rust now; packed_simd does not.
Compared to Faster
- SIMDeez can be used with runtime selection; Faster cannot.
- SIMDeez has faster fallbacks for some functions.
- SIMDeez does not currently work with iterators; Faster does.
- SIMDeez uses more idiomatic intrinsic syntax, while Faster uses more idiomatic Rust syntax.
- SIMDeez builds on stable Rust now; Faster does not.
All of the above could change! Faster seems to generally have the same performance as long as you don't run into some of the slower fallback functions.
Example
use ;
use *;
// If you want your SIMD function to use runtime feature detection to call
// the fastest available version, use the simd_runtime_generate macro:
simd_runtime_generate!;
const SIZE: usize = 200;
This will generate the following functions for you:
- `distance<S:Simd>`: the generic version of your function
- `distance_scalar`: a scalar fallback
- `distance_sse2`: SSE2 version
- `distance_sse41`: SSE41 version
- `distance_avx2`: AVX2 version
- `distance_avx512`: AVX-512 version
- `distance_neon`: Neon version
- `distance_wasm`: WebAssembly SIMD version
- `distance_runtime_select`: picks the fastest of the above at runtime
You can use any of these you wish, though typically you would use the runtime_select version
unless you want to force an older instruction set to avoid throttling or for other arcane
reasons.
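What a `_runtime_select` entry point has to decide can be sketched with the standard library's feature-detection macros. This is a simplified stand-in, not the macro's actual expansion: the `Backend` enum, `detect_backend` and `f32_lanes` are hypothetical names, and the detection order is an assumption that mirrors the priority described in this README (AVX-512, gated on avx512f, avx512bw and avx512dq, ahead of AVX2, then the older instruction sets, then scalar). WebAssembly SIMD is left out because it is fixed at compile time rather than detected at runtime.

```rust
/// Hypothetical label for the instruction set chosen at runtime.
#[allow(dead_code)] // not every variant is constructed on every target
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Scalar,
    Sse2,
    Sse41,
    Avx2,
    Avx512,
    Neon,
}

/// Probe the CPU, widest instruction set first.
fn detect_backend() -> Backend {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx512f")
            && is_x86_feature_detected!("avx512bw")
            && is_x86_feature_detected!("avx512dq")
        {
            return Backend::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if is_x86_feature_detected!("sse4.1") {
            return Backend::Sse41;
        }
        if is_x86_feature_detected!("sse2") {
            return Backend::Sse2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    Backend::Scalar
}

/// Number of f32 lanes per vector register for each backend.
fn f32_lanes(backend: Backend) -> usize {
    match backend {
        Backend::Scalar => 1,
        Backend::Sse2 | Backend::Sse41 | Backend::Neon => 4, // 128-bit registers
        Backend::Avx2 => 8,                                  // 256-bit registers
        Backend::Avx512 => 16,                               // 512-bit registers
    }
}
```

Forcing an older instruction set, as mentioned above, corresponds to skipping this probe and calling one specific version directly.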
Optionally you can use the simd_compiletime_generate! macro in the same way. This will
produce 2 active functions via the cfg attribute feature:
- `distance<S:Simd>`: the generic version of your function
- `distance_compiletime`: the fastest instruction set available for the given compile time feature set
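Compile-time selection can be pictured as several `cfg`-gated definitions of the same function, of which exactly one survives for a given build. The sketch below shows only that mechanism; `compiletime_backend` is a hypothetical function that reports which branch was compiled, and the real macro's expansion may differ.

```rust
// Exactly one of these definitions is compiled, depending on the
// target features enabled for the build.

#[cfg(target_feature = "avx2")]
fn compiletime_backend() -> &'static str {
    "avx2"
}

#[cfg(all(target_feature = "sse4.1", not(target_feature = "avx2")))]
fn compiletime_backend() -> &'static str {
    "sse4.1"
}

#[cfg(all(
    target_feature = "sse2",
    not(any(target_feature = "sse4.1", target_feature = "avx2"))
))]
fn compiletime_backend() -> &'static str {
    "sse2"
}

#[cfg(all(
    target_feature = "neon",
    not(any(
        target_feature = "sse2",
        target_feature = "sse4.1",
        target_feature = "avx2"
    ))
))]
fn compiletime_backend() -> &'static str {
    "neon"
}

// Fallback when none of the features above are enabled.
#[cfg(not(any(
    target_feature = "avx2",
    target_feature = "sse4.1",
    target_feature = "sse2",
    target_feature = "neon"
)))]
fn compiletime_backend() -> &'static str {
    "scalar"
}
```

Which definition is kept depends on compiler flags such as `-C target-cpu=native` or `-C target-feature=+avx2` passed through `RUSTFLAGS`.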
You may also forgo the macros if you know what you are doing. Keep in mind that there are many arcane subtleties with inlining and target_features that must be managed. See how the macros expand for more detail.
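As a rough illustration of the kind of subtlety involved, here is a hand-written pattern using only the standard library. It is a sketch under stated assumptions, not what the macros generate: all function names are hypothetical, and plain scalar code stands in for a generic SIMD body. The shared body is marked `#[inline(always)]` so that it is compiled inside each wrapper, and the wrapper carries `#[target_feature]` so the compiler may use AVX2 instructions there. If the body were not inlined, it would be compiled without those features.

```rust
/// Shared body. Forced inline so it is compiled with the target
/// features of whichever wrapper calls it.
#[inline(always)]
fn sum_of_squares_body(xs: &[f32]) -> f32 {
    xs.iter().map(|x| x * x).sum()
}

/// AVX2 wrapper: only sound to call on CPUs that support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_of_squares_avx2(xs: &[f32]) -> f32 {
    sum_of_squares_body(xs)
}

/// Scalar wrapper: compiled with the baseline features of the build.
fn sum_of_squares_scalar(xs: &[f32]) -> f32 {
    sum_of_squares_body(xs)
}

/// Safe entry point that checks the CPU before calling the AVX2 wrapper.
fn sum_of_squares(xs: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_of_squares_avx2(xs) };
        }
    }
    sum_of_squares_scalar(xs)
}
```

Note that `#[inline(always)]` cannot be combined with `#[target_feature]` on the same function, which is one reason the body and the wrappers are kept separate.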