You want to use
std::simd but realized there is no simple, safe and fast way to align your
f32x8 (and friends) in memory and treat them as regular
f32 slices for easy loading and manipulation;
simd_aligned to the rescue.
- built on top of `std::simd` for easy data handling
- supports everything from `u8x2` to `f64x8`
- think in flat slices (`&[f32]`), but get the performance of properly aligned SIMD vectors (`&[f32x16]`)
- defines `u8s`, ..., `f32s` as a "best guess" for the current platform (WIP)
- provides N-dimensional `VectorD` and NxM-dimensional `MatrixD`
Note: Right now this is an experimental crate. Features might be added or removed depending on how
std::simd evolves. At the end of the day it's just about being able to load and manipulate data without much fuzz.
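For intuition, here is a minimal, std-only sketch of the underlying idea: storage is over-aligned so SIMD loads are fast, yet it is exposed as a plain flat slice. The `AlignedF32x8` type and `flat_view` helper are illustrative stand-ins, not part of simd_aligned's API.

```rust
/// Hypothetical stand-in for the aligned storage `simd_aligned` manages:
/// 8 f32 values, over-aligned to 32 bytes as an AVX `f32x8` load wants.
#[repr(C, align(32))]
#[derive(Clone, Copy, Default)]
struct AlignedF32x8([f32; 8]);

/// A "flat" `&[f32]` view over the aligned blocks. Safe here because the
/// struct is `repr(C)` over `[f32; 8]` and its size (32) equals its
/// alignment (32), so consecutive blocks are contiguous with no padding.
fn flat_view(blocks: &[AlignedF32x8]) -> &[f32] {
    unsafe { std::slice::from_raw_parts(blocks.as_ptr() as *const f32, blocks.len() * 8) }
}

fn main() {
    let mut blocks = vec![AlignedF32x8::default(); 2]; // 16 f32 total
    blocks[1].0[3] = 42.0;

    // Every block starts on a 32-byte boundary.
    assert_eq!(blocks.as_ptr() as usize % 32, 0);

    let flat = flat_view(&blocks);
    assert_eq!(flat.len(), 16);
    assert_eq!(flat[11], 42.0); // block 1, lane 3 == flat index 8 + 3
    println!("flat view of {} aligned f32 elements", flat.len());
}
```

The crate's job is to make this kind of view available without any `unsafe` in user code.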
Produces a vector that can hold 10 elements of type `f64`. Might internally allocate 5 elements of type `f64x2`, or 3 of type `f64x4`, depending on the platform. All elements are guaranteed to be properly aligned for fast access.
```rust
use packed_simd::*;
use simd_aligned::*;

// Create vectors of `10` f64 elements with value `0.0`.
let mut v1 = VectorD::<f64s>::with(0.0, 10);
let mut v2 = VectorD::<f64s>::with(0.0, 10);

// Get "flat", mutable view of the vector, and set individual elements:
let v1_m = v1.flat_mut();
let v2_m = v2.flat_mut();

// Set some elements on v1
v1_m[0] = 0.0;
v1_m[4] = 4.0;
v1_m[8] = 8.0;

// Set some others on v2
v2_m[0] = 0.0;
v2_m[5] = 5.0;
v2_m[9] = 9.0;

let mut sum = f64s::splat(0.0);

// Eventually, do something with the actual SIMD types. Does
// `std::simd` vector math, e.g., f64x8 + f64x8 in one operation:
sum = v1[0] + v2[0];
```
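The block counts above follow from ceiling division: for n scalar elements at SIMD width w, ceil(n / w) blocks are allocated, with the last block padded if n does not divide evenly. A small sketch of that arithmetic (the helper name is illustrative, not crate API):

```rust
/// Number of SIMD blocks needed to hold `n` scalar elements at width `w`.
/// Illustrative helper, not part of simd_aligned's API.
fn blocks_needed(n: usize, w: usize) -> usize {
    (n + w - 1) / w // ceiling division
}

fn main() {
    assert_eq!(blocks_needed(10, 2), 5); // 5 f64x2 blocks, exact fit
    assert_eq!(blocks_needed(10, 4), 3); // 3 f64x4 blocks, last one padded
    println!("{} {}", blocks_needed(10, 2), blocks_needed(10, 4));
}
```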
Using `simd_aligned` incurs no performance penalty, while retaining all the simplicity of handling flat arrays.
```
test vectors::packed       ... bench:          77 ns/iter (+/- 4)
test vectors::scalar       ... bench:       1,177 ns/iter (+/- 464)
test vectors::simd_aligned ... bench:          71 ns/iter (+/- 5)
```
`simd_aligned` builds on top of `std::simd`. It aims to provide common, SIMD-aligned data structures that support simple and safe scalar access patterns.
`faster` (as of today) is really good if you already have existing flat slices in your code and want to operate on them "full SIMD ahead". However, in particular when dealing with multiple slices at the same time (e.g., kernel computations), the performance impact of unaligned arrays can become a bit more noticeable (e.g., in the case of `ffsvm`, up to 10% - 20%).
Contains vector definitions with a fixed bit width.
Unified views on SIMD types.
A dynamic (heap allocated) matrix with one axis aligned for fast and safe SIMD access that also provides a flat view on its data.
A dynamic (heap allocated) vector aligned for fast and safe SIMD access that also provides a flat view on its data.
Converts a slice of SIMD vectors into a flat slice of elements.
Converts a mutable slice of SIMD vectors into a flat slice of elements.
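The two conversions above follow a common pattern; here is a std-only sketch of what such a conversion looks like. The `Simd4` alias and the function names are illustrative stand-ins, not the crate's actual API.

```rust
// Stand-in for a SIMD vector type; real code would use e.g. `f32x4`.
type Simd4 = [f32; 4];

/// Converts a slice of SIMD vectors into a flat slice of elements
/// (the shape of the first conversion documented above).
fn simd_as_flat(packed: &[Simd4]) -> &[f32] {
    unsafe { std::slice::from_raw_parts(packed.as_ptr() as *const f32, packed.len() * 4) }
}

/// Mutable variant (the shape of the second conversion).
fn simd_as_flat_mut(packed: &mut [Simd4]) -> &mut [f32] {
    unsafe { std::slice::from_raw_parts_mut(packed.as_mut_ptr() as *mut f32, packed.len() * 4) }
}

fn main() {
    let mut packed: Vec<Simd4> = vec![[0.0; 4]; 3];
    simd_as_flat_mut(&mut packed)[5] = 1.5; // writes lane 1 of vector 1
    assert_eq!(packed[1][1], 1.5);
    assert_eq!(simd_as_flat(&packed).len(), 12);
    println!("ok");
}
```

The crate wraps this pointer arithmetic so callers only ever see safe flat slices.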