Crate wide

A crate to help you go wide.

Specifically, this crate has data types for blocks of primitives packed together and used as a single unit. This works very well with the SIMD/vector hardware of various targets, both in terms of explicit SIMD usage and in terms of allowing LLVM's auto-vectorizer to do its job.

All SIMD usage is on a best effort basis. Results will vary based on target, optimization level, method, and whether or not you're using a Nightly compiler. Where explicit SIMD isn't available you get a "fallback" implementation, which just does the normal computation on each lane individually.

  • Note: The crate will auto-detect if you're using Nightly and take advantage of it. You don't need to do anything on your part. Activate the always_use_stable feature if you'd like to suppress this effect, such as for testing purposes.
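To make the "fallback" idea concrete, here is a minimal sketch of what computing on each lane individually can look like. The type `FallbackF32x4` and its methods are hypothetical stand-ins written for this illustration; they are not the crate's actual definitions.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a four-lane f32 type (not the crate's real type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FallbackF32x4(pub [f32; 4]);

impl Add for FallbackF32x4 {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        // Plain scalar addition, applied to each lane individually.
        let (a, b) = (self.0, rhs.0);
        FallbackF32x4([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]])
    }
}

impl FallbackF32x4 {
    /// Lane-wise square root using the ordinary scalar `f32::sqrt`.
    pub fn sqrt(self) -> Self {
        let a = self.0;
        FallbackF32x4([a[0].sqrt(), a[1].sqrt(), a[2].sqrt(), a[3].sqrt()])
    }
}
```

Code in this shape has no explicit SIMD in it at all, but straight-line lane-by-lane arithmetic is the kind of pattern an auto-vectorizer may be able to recognize.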

What About packed_simd?

Compared to the packed_simd RFC efforts, this crate is less concerned with complete coverage of all possible intrinsics and being totally generic across all widths. Instead, I focus on having a very simple, easy to understand setup that avoids generics and tries to just be plain and obvious at all times. The goal is that using a wide type should be as close as possible to using the scalar version of the same type. A function designed for f32 inputs and outputs should "just work" when you change it to f32x4 inputs and outputs.
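As a rough illustration of that "just work" goal, the sketch below writes the same function body twice, once for `f32` and once for a four-lane type. The type `Lanes4` and the `lerp_*` functions are hypothetical names invented for this example so that it compiles without the crate; with the real crate, the assumption is that you would use its f32x4 type in place of `Lanes4`.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical four-lane stand-in, used only so this sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Lanes4(pub [f32; 4]);

impl Lanes4 {
    /// Applies a scalar operation to each pair of matching lanes.
    fn zip(self, rhs: Self, f: impl Fn(f32, f32) -> f32) -> Self {
        let (a, b) = (self.0, rhs.0);
        Lanes4([f(a[0], b[0]), f(a[1], b[1]), f(a[2], b[2]), f(a[3], b[3])])
    }
}

impl Add for Lanes4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self { self.zip(rhs, |x, y| x + y) }
}

impl Sub for Lanes4 {
    type Output = Self;
    fn sub(self, rhs: Self) -> Self { self.zip(rhs, |x, y| x - y) }
}

impl Mul for Lanes4 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self { self.zip(rhs, |x, y| x * y) }
}

/// Linear interpolation written for scalars.
pub fn lerp_scalar(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// The same body with only the types in the signature changed.
pub fn lerp_wide(a: Lanes4, b: Lanes4, t: Lanes4) -> Lanes4 {
    a + (b - a) * t
}
```

The only difference between the two functions is the signature, which is the property the paragraph above is aiming for.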

Also, packed_simd is Nightly-only, whereas this crate works on Stable. Even on Stable this crate will give you reasonable levels of SIMD just from LLVM's auto-vectorizer being pretty good at its job when you give it code that it recognizes.

When packed_simd eventually makes it into Stable it might make this crate obsolete. However, in September of 2019 I asked the packed_simd folks if there was any kind of ETA, 6 months, 12 months, or more, and they just said "no ETA". So I'm not gonna wait around for packed_simd.



Architecture specific functionality.



Declares an f32x4 const identifier.


Declares an i32x4 const identifier.


Shuffles around some f32 lanes into a new m128.


Shuffles around some f64 lanes into a new m128d.



Four f32 values packed together.


Four i32 values packed together.



A cos for just one f32.


A sin for just one f32.


A sqrt for just one f32.


A tan for just one f32.



Lets us declare f32x4 values in a const context. Otherwise useless.


Allows us to declare i32x4 values in a const context. Uninteresting otherwise.
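One way a helper type can make wide values declarable in a const context is to overlay a plain array with the wide type in a union and read the wide field back out. The sketch below shows that general trick only; `WideF32x4`, `ConstWideF32x4`, and `ONE_TWO_THREE_FOUR` are hypothetical names, and this is an assumption about the approach rather than a copy of the crate's own definitions.

```rust
/// Hypothetical 16-byte-aligned four-lane type (stand-in for a real wide type).
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C, align(16))]
pub struct WideF32x4([f32; 4]);

impl WideF32x4 {
    /// Copies the lanes out as a plain array.
    pub fn to_array(self) -> [f32; 4] {
        self.0
    }
}

/// Hypothetical union overlaying a plain array and the wide type.
/// Both fields are 16 bytes and every bit pattern is valid for both.
#[repr(C)]
pub union ConstWideF32x4 {
    pub array: [f32; 4],
    pub wide: WideF32x4,
}

/// A wide constant built from an array literal, entirely at compile time.
pub const ONE_TWO_THREE_FOUR: WideF32x4 =
    // Reading `wide` after writing `array` is sound here because the two
    // fields have the same size and f32 has no invalid bit patterns.
    unsafe { ConstWideF32x4 { array: [1.0, 2.0, 3.0, 4.0] }.wide };
```

Outside of const declarations there is little reason to reach for a union like this, which matches the "otherwise useless" remark above.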