Crate simd_aligned


NOTE - Do not use this crate for now. It has been reactivated to make FFSVM compile again, but needs some architectural work.

§In One Sentence

You want to use safe SIMD datatypes from wide but realized there is no simple, safe and fast way to align your f32x4 (and friends) in memory and treat them as regular f32 slices for easy loading and manipulation; simd_aligned to the rescue.

§Highlights

built on top of wide for easy data handling
supports everything from u8x16 to f64x4
think in flat slices (&[f32]), but get performance of properly aligned SIMD vectors (&[f32x4])
provides N-dimensional VecD and NxM-dimensional MatD.

§Examples

Produces a vector that can hold 10 elements of type f64. All elements are guaranteed to be properly aligned for fast access.

use simd_aligned::*;

// Create vectors of `10` f64 elements with value `0.0`.
let mut v1 = VecD::<f64x4>::with(0.0, 10);
let mut v2 = VecD::<f64x4>::with(0.0, 10);

// Get a "flat", mutable view of each vector to set individual elements:
let v1_m = v1.flat_mut();
let v2_m = v2.flat_mut();

// Set some elements on v1
v1_m[0] = 0.0;
v1_m[4] = 4.0;
v1_m[8] = 8.0;

// Set some others on v2
v2_m[1] = 0.0;
v2_m[5] = 5.0;
v2_m[9] = 9.0;

// Eventually, do something with the actual SIMD types. This performs
// SIMD vector math via `wide`, e.g., f64x4 + f64x4 in one operation:
let sum = v1[0] + v2[0];
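The padding arithmetic behind the example can be sketched in plain std Rust, without simd_aligned or wide. This is a minimal illustration under stated assumptions: `PaddedVec`, `set`, and `add` are hypothetical names, a `[f64; 4]` array stands in for `f64x4`, and alignment is not modelled at all. In this sketch, 10 elements occupy 3 four-lane vectors (12 slots), and element `i` lives in lane `i % 4` of vector `i / 4`.

```rust
// Hypothetical, std-only stand-in for `VecD::<f64x4>::with(value, len)`.
// A plain `[f64; 4]` plays the role of `f64x4`; alignment is not modelled here.
const LANES: usize = 4;

struct PaddedVec {
    len: usize,              // number of logical f64 elements
    data: Vec<[f64; LANES]>, // ceil(len / LANES) four-lane vectors
}

impl PaddedVec {
    fn with(value: f64, len: usize) -> Self {
        // Round up so every element gets a slot; unused trailing lanes are padding.
        let vectors = (len + LANES - 1) / LANES;
        PaddedVec { len, data: vec![[value; LANES]; vectors] }
    }

    // Element `i` lives in vector `i / LANES`, lane `i % LANES`.
    fn set(&mut self, i: usize, x: f64) {
        assert!(i < self.len, "index {} out of bounds for length {}", i, self.len);
        self.data[i / LANES][i % LANES] = x;
    }
}

// Lane-wise addition: the scalar equivalent of one `f64x4 + f64x4` operation.
fn add(a: [f64; LANES], b: [f64; LANES]) -> [f64; LANES] {
    std::array::from_fn(|k| a[k] + b[k])
}

fn main() {
    let mut v1 = PaddedVec::with(0.0, 10);
    let mut v2 = PaddedVec::with(0.0, 10);
    v1.set(4, 4.0);
    v2.set(5, 5.0);

    // 10 elements need 3 vectors (12 slots); the last 2 slots are padding.
    println!("{}", v1.data.len()); // 3
    // Vector 1 holds elements 4..=7 of each PaddedVec.
    println!("{:?}", add(v1.data[1], v2.data[1])); // [4.0, 5.0, 0.0, 0.0]
}
```

The real crate additionally guarantees that each packed vector is aligned for SIMD loads, which this sketch deliberately leaves out.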

§Benchmarks

In the benchmark below, simd_aligned is on par with the packed variant (71 ns vs. 77 ns per iteration) and roughly 16 times faster than the scalar one, while retaining all the simplicity of handling flat arrays.

test vectors::packed       ... bench:          77 ns/iter (+/- 4)
test vectors::scalar       ... bench:       1,177 ns/iter (+/- 464)
test vectors::simd_aligned ... bench:          71 ns/iter (+/- 5)

§FAQ

§How does it relate to faster and `std::simd`?

simd_aligned currently builds on top of wide rather than std::simd directly. It aims to provide common SIMD-aligned data structures that support simple and safe scalar access patterns.
faster (as of today) is really good if you already have existing flat slices in your code and want to operate on them “full SIMD ahead”. However, particularly when dealing with multiple slices at the same time (e.g., kernel computations), the performance impact of unaligned arrays becomes more noticeable (e.g., up to 10% to 20% in the case of ffsvm).
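The alignment argument can be illustrated with std-only Rust. `AlignedF32x4` and `is_aligned_to` are hypothetical names, not simd_aligned APIs, and the sketch only checks alignment guarantees; it measures no performance. A plain `Vec<f32>` only promises 4-byte alignment, whereas a wrapper type with `#[repr(align(16))]` carries a 16-byte guarantee into every element of any slice of it.

```rust
use std::mem::align_of;

// Hypothetical 4-lane f32 type; `align(16)` matches the width of a 128-bit SIMD register.
#[derive(Clone, Copy, Debug, Default)]
#[repr(C, align(16))]
struct AlignedF32x4([f32; 4]);

// True if `ptr` is a multiple of `align` bytes.
fn is_aligned_to<T>(ptr: *const T, align: usize) -> bool {
    (ptr as usize) % align == 0
}

fn main() {
    // A plain f32 buffer only promises 4-byte alignment.
    println!("{}", align_of::<f32>()); // 4
    // The wrapper type promises 16 bytes for every element.
    println!("{}", align_of::<AlignedF32x4>()); // 16

    let packed = vec![AlignedF32x4::default(); 3];
    for v in &packed {
        assert!(is_aligned_to(v as *const AlignedF32x4, 16));
    }

    // Sub-slicing a flat f32 buffer shifts the start by 4 bytes, so the base
    // and the shifted pointer can never both be 16-byte aligned.
    let flat = vec![0.0f32; 16];
    let base = is_aligned_to(flat.as_ptr(), 16);
    let shifted = is_aligned_to(flat[1..].as_ptr(), 16);
    assert!(!(base && shifted));
}
```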

Re-exports§

pub use crate::mat::AccessStrategy;
pub use crate::mat::Columns;
pub use crate::mat::Rows;

Modules§

traits
Unified views on SIMD types.

Structs§

MatD
A dynamic (heap allocated) matrix with one axis aligned for fast and safe SIMD access that also provides a flat view on its data.
MatrixFlat
Produced by MatD::flat; allows flat matrix access.
MatrixFlatMut
Produced by MatD::flat_mut; allows flat, mutable matrix access.
VecD
A dynamic (heap allocated) vector aligned for fast and safe SIMD access that also provides a flat view on its data.
f32x4
f32x8
f64x2
f64x4
i8x16
i8x32
i16x8
i16x16
i32x4
i32x8
i64x2
i64x4
u8x16
u16x8
u16x16
u32x4
u32x8
u64x2
u64x4
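MatD is described as a matrix with one axis aligned for SIMD access that also offers a flat view. The following std-only sketch shows one way such a layout can work, assuming row-wise alignment: each row is padded to a whole number of 4-lane vectors, so every row starts on a 16-byte boundary, and a flat per-row view is trimmed back to the logical column count. `RowAlignedMatrix`, `Lanes`, `row`, and `row_flat_mut` are hypothetical; the crate re-exports `Rows` and `Columns` access strategies, but this sketch only models one fixed orientation and is not the crate's implementation.

```rust
// Hypothetical, std-only sketch of a matrix whose rows are padded to whole
// 4-lane vectors, loosely modelled on the `MatD` description.
const LANES: usize = 4;

#[derive(Clone, Copy, Default)]
#[repr(C, align(16))]
struct Lanes([f32; LANES]);

struct RowAlignedMatrix {
    rows: usize,
    cols: usize,
    vectors_per_row: usize,
    data: Vec<Lanes>, // rows * vectors_per_row packed vectors
}

impl RowAlignedMatrix {
    fn new(rows: usize, cols: usize) -> Self {
        let vectors_per_row = (cols + LANES - 1) / LANES;
        let data = vec![Lanes::default(); rows * vectors_per_row];
        RowAlignedMatrix { rows, cols, vectors_per_row, data }
    }

    // SIMD view of one row; every row starts on a 16-byte boundary.
    fn row(&self, r: usize) -> &[Lanes] {
        assert!(r < self.rows, "row out of bounds");
        let start = r * self.vectors_per_row;
        &self.data[start..start + self.vectors_per_row]
    }

    // Flat, scalar view of one row, trimmed to the logical column count.
    fn row_flat_mut(&mut self, r: usize) -> &mut [f32] {
        assert!(r < self.rows, "row out of bounds");
        let (start, n, cols) = (r * self.vectors_per_row, self.vectors_per_row, self.cols);
        let packed = &mut self.data[start..start + n];
        // SAFETY: `Lanes` is `repr(C)` over `[f32; 4]` (size 16, no padding),
        // so `n` packed vectors are exactly `n * LANES` contiguous f32 values.
        let flat = unsafe {
            std::slice::from_raw_parts_mut(packed.as_mut_ptr() as *mut f32, n * LANES)
        };
        &mut flat[..cols]
    }
}

fn main() {
    // 3 rows x 6 columns: each row needs 2 vectors (8 slots), 2 of which are padding.
    let mut m = RowAlignedMatrix::new(3, 6);
    m.row_flat_mut(1)[5] = 7.0; // column 5 -> vector 1, lane 1
    println!("{}", m.row(1).len()); // 2
    println!("{:?}", m.row(1)[1].0); // [0.0, 7.0, 0.0, 0.0]
}
```

Padding each row, rather than the matrix as a whole, is what keeps the start of every row aligned; the cost is up to three wasted lanes per row.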

Functions§

packed_as_flat
Converts a slice of SIMD vectors into a flat slice of elements.
packed_as_flat_mut
Converts a mutable slice of SIMD vectors into a flat slice of elements.
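The conversion these two functions describe can be sketched for a single std-only type. `Packed4`, `flatten_packed`, and `flatten_packed_mut` are hypothetical analogues, not the crate's signatures (the real functions work with the crate's SIMD types). The key property is that `n` packed vectors reinterpret as exactly `4 * n` contiguous elements, with flat index `i` mapping to vector `i / 4`, lane `i % 4`.

```rust
// Hypothetical analogues of `packed_as_flat` / `packed_as_flat_mut` for a
// std-only 4-lane type.
const LANES: usize = 4;

#[derive(Clone, Copy, Debug, Default, PartialEq)]
#[repr(C, align(16))]
struct Packed4([f32; LANES]);

fn flatten_packed(packed: &[Packed4]) -> &[f32] {
    // SAFETY: `Packed4` is `repr(C)` over `[f32; 4]` with no padding (size 16),
    // so `n` vectors are `n * 4` contiguous, initialized f32 values; the
    // returned slice borrows from `packed`.
    unsafe { std::slice::from_raw_parts(packed.as_ptr() as *const f32, packed.len() * LANES) }
}

fn flatten_packed_mut(packed: &mut [Packed4]) -> &mut [f32] {
    // SAFETY: as above; the exclusive borrow of `packed` is held by the result.
    unsafe { std::slice::from_raw_parts_mut(packed.as_mut_ptr() as *mut f32, packed.len() * LANES) }
}

fn main() {
    let mut packed = vec![Packed4::default(); 2];
    flatten_packed_mut(&mut packed)[5] = 1.0; // flat index 5 -> vector 1, lane 1
    println!("{:?}", packed[1].0); // [0.0, 1.0, 0.0, 0.0]
    println!("{}", flatten_packed(&packed).len()); // 8
}
```

Going in this direction (packed to flat) is always sound for such a type; the reverse cast from an arbitrary `&[f32]` would not be, because a flat slice carries no 16-byte alignment guarantee.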
