Crate wide

A crate to help you go wide.

Specifically, this crate has data types for blocks of primitives packed together and used as a single unit. This works very well with the SIMD/vector hardware of various targets, both in terms of explicit SIMD usage and in terms of allowing LLVM's auto-vectorizer to do its job.

All SIMD usage is on a best-effort basis. Results will vary based on target, optimization level, method, and whether or not you're using a Nightly compiler. Where explicit SIMD isn't available, you get a "fallback" implementation, which just does the normal computation on each lane individually.
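To make the "fallback" idea concrete, here is a minimal sketch of what a lane-by-lane implementation can look like. `Lanes4` and its methods are hypothetical stand-ins written for this illustration, not the crate's actual `f32x4` source. The point is only that a fixed-size loop over four lanes is the kind of code an auto-vectorizer has a chance of recognizing.

```rust
/// Hypothetical stand-in for a four-lane `f32` block (not the crate's real type).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Lanes4(pub [f32; 4]);

impl Lanes4 {
    /// Fallback addition: do the normal scalar computation on each lane.
    pub fn add(self, rhs: Self) -> Self {
        let mut out = [0.0_f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] + rhs.0[i];
        }
        Lanes4(out)
    }

    /// Fallback square root, again one lane at a time.
    pub fn sqrt(self) -> Self {
        let mut out = self.0;
        for lane in out.iter_mut() {
            *lane = (*lane).sqrt();
        }
        Lanes4(out)
    }
}
```

Whether a loop like this actually compiles down to vector instructions depends on the target and optimization level, which is why the results are described as best effort.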

Note: The crate will auto-detect if you're using Nightly and take advantage of it. You don't need to do anything on your part. Activate the always_use_stable feature if you'd like to suppress this effect, such as for testing purposes.

What About `packed_simd`?

Compared to the packed_simd RFC efforts, this crate is less concerned with complete coverage of all possible intrinsics and being totally generic across all widths. Instead, I focus on having a very simple, easy-to-understand setup that avoids generics and tries to just be plain and obvious at all times. The goal is that using a wide type should be as close as possible to using the scalar version of the same type. A function designed for f32 inputs and outputs should "just work" when you change it to f32x4 inputs and outputs.
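As a rough illustration of that goal, the sketch below writes a small function once for `f32` and once for a wide type, with an identical body. `Wide4` is a hypothetical stand-in defined here so the example is self-contained. It is not the crate's `f32x4`, and I'm only assuming that the real type offers the same arithmetic operators.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical stand-in for a four-lane `f32` type (illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Wide4(pub [f32; 4]);

impl Add for Wide4 {
    type Output = Wide4;
    fn add(self, rhs: Wide4) -> Wide4 {
        let (a, b) = (self.0, rhs.0);
        Wide4([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]])
    }
}

impl Sub for Wide4 {
    type Output = Wide4;
    fn sub(self, rhs: Wide4) -> Wide4 {
        let (a, b) = (self.0, rhs.0);
        Wide4([a[0] - b[0], a[1] - b[1], a[2] - b[2], a[3] - b[3]])
    }
}

impl Mul for Wide4 {
    type Output = Wide4;
    fn mul(self, rhs: Wide4) -> Wide4 {
        let (a, b) = (self.0, rhs.0);
        Wide4([a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]])
    }
}

/// Linear interpolation written for plain `f32`.
pub fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// The same body, with only the types changed to the wide stand-in.
pub fn lerp_wide(a: Wide4, b: Wide4, t: Wide4) -> Wide4 {
    a + (b - a) * t
}
```

Because the operators are overloaded per type rather than through a generic trait bound, the scalar and wide versions stay two plain functions, which matches the "avoid generics" stance described above.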

Also, packed_simd is Nightly-only, whereas this crate works on Stable. Even on Stable this crate will give you reasonable levels of SIMD just from LLVM's auto-vectorizer being pretty good at its job when you give it code that it recognizes.

When packed_simd eventually makes it into Stable it might make this crate obsolete. However, in September of 2019 I asked the packed_simd folks if there was any kind of ETA, 6 months, 12 months, or more, and they just said "no ETA". So I'm not gonna wait around for packed_simd.

Modules

arch	Architecture specific functionality.

Macros

const_f32_as_f32x4	Declares an `f32x4` const identifier.
const_i32_as_i32x4	Declares an `i32x4` const identifier.
shuffle128	Shuffles around some `f32` lanes into a new `m128`.
shuffle128d	Shuffles around some `f64` lanes into a new `m128d`.

Structs

f32x4	Four `f32` values packed together.
i32x4	Four `i32` values packed together.

Functions

cos_f32	A `cos` for just one `f32`.
sin_f32	A `sin` for just one `f32`.
sqrt_f32	A `sqrt` for just one `f32`.
tan_f32	A `tan` for just one `f32`.

Unions

ConstUnionHack_f32x4	Lets us declare `f32x4` values in a `const` context. Otherwise useless.
ConstUnionHack_i32x4	Allows us to declare `i32x4` values in a `const` context. Uninteresting otherwise.
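For readers wondering how a union helps in a `const` context, here is a minimal sketch of the general trick. Everything in it is hypothetical: `Wide4`, `ConstHack`, the field names, and `ONES` are invented for this illustration, and I'm assuming (not quoting) that the crate's unions pair a plain array with the wide type in a similar way. A plain aligned array stands in for a wide type that would normally wrap a platform vector with no `const` constructor.

```rust
/// Hypothetical wide type; in a real crate this might wrap a platform
/// vector type that cannot be built directly in a `const`.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct Wide4([f32; 4]);

impl Wide4 {
    /// Copies the four lanes out as a plain array.
    pub fn to_array(self) -> [f32; 4] {
        self.0
    }
}

/// Both fields are 16 bytes and share the same storage.
#[repr(C)]
pub union ConstHack {
    pub array: [f32; 4],
    pub wide: Wide4,
}

/// Write the array field, read the wide field: the result is a `const` value.
pub const ONES: Wide4 = unsafe {
    ConstHack { array: [1.0, 1.0, 1.0, 1.0] }.wide
};
```

The read is `unsafe` because the compiler cannot check that the bytes written through one field are valid for the other. It is sound in this sketch only because both fields have the same size and every bit pattern is a valid `f32`.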