Crate wide

A crate to help you go wide.

Specifically, this crate has data types for blocks of primitives packed together and used as a single unit. This works very well with the SIMD/vector hardware of various targets, both through explicit SIMD usage and by allowing LLVM's auto-vectorizer to do its job.

All SIMD usage is on a best-effort basis. Results will vary based on target, optimization level, method, and whether or not you're using a Nightly compiler. Where explicit SIMD isn't available, you get a "fallback" implementation, which just does the normal computation on each lane individually.

  • Note: The crate will auto-detect if you're using Nightly and take advantage of it. You don't need to do anything on your part. Activate the always_use_stable feature if you'd like to suppress this effect, such as for testing purposes.
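
To make the "fallback" idea concrete, here is a minimal sketch of what computing on each lane individually can look like. `FallbackF32x4` and its methods are hypothetical names invented for this illustration; they are not the crate's actual `f32x4` definition.

```rust
/// Hypothetical stand-in for a four-lane f32 type.
/// (An assumption for illustration, not the crate's real `f32x4`.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FallbackF32x4 {
    lanes: [f32; 4],
}

impl FallbackF32x4 {
    /// Packs four scalars into one value.
    pub const fn new(a: f32, b: f32, c: f32, d: f32) -> Self {
        Self { lanes: [a, b, c, d] }
    }

    /// Unpacks the four lanes again.
    pub fn to_array(self) -> [f32; 4] {
        self.lanes
    }

    /// Fallback square root: the ordinary scalar `f32::sqrt`,
    /// applied to each lane in turn.
    pub fn sqrt(self) -> Self {
        let mut out = self.lanes;
        for lane in out.iter_mut() {
            *lane = lane.sqrt();
        }
        Self { lanes: out }
    }
}
```

A short fixed-length loop like this is also the kind of code an auto-vectorizer may recognize, though whether it actually does depends on the target and optimization level.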

What About packed_simd?

Compared to the packed_simd RFC efforts, this crate is less concerned with complete coverage of all possible intrinsics and being totally generic across all widths. Instead, I focus on having a very simple, easy-to-understand setup that avoids generics and tries to just be plain and obvious at all times. The goal is that using a wide type should be as close as possible to using the scalar version of the same type. A function designed for f32 inputs and outputs should "just work" when you change it to f32x4 inputs and outputs.
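
As a rough illustration of that "just work" goal, the sketch below writes the same function body twice: once for `f32` and once for a four-lane type. `SketchF32x4` and both function names are hypothetical, defined here only so the example is self-contained; the real `f32x4` may differ in how it is constructed and which operators it provides.

```rust
use core::ops::{Add, Mul};

/// Hypothetical four-lane f32 type (an assumption, not the crate's `f32x4`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SketchF32x4(pub [f32; 4]);

impl Add for SketchF32x4 {
    type Output = Self;
    /// Lane-wise addition.
    fn add(self, rhs: Self) -> Self {
        let (a, b) = (self.0, rhs.0);
        SketchF32x4([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]])
    }
}

impl Mul for SketchF32x4 {
    type Output = Self;
    /// Lane-wise multiplication.
    fn mul(self, rhs: Self) -> Self {
        let (a, b) = (self.0, rhs.0);
        SketchF32x4([a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]])
    }
}

/// Scalar version: computes `a * x + b` for one value.
pub fn scale_and_offset(x: f32, a: f32, b: f32) -> f32 {
    a * x + b
}

/// Wide version: only the types changed, the body is identical.
pub fn scale_and_offset_x4(x: SketchF32x4, a: SketchF32x4, b: SketchF32x4) -> SketchF32x4 {
    a * x + b
}
```

Because the operators are overloaded lane-wise, moving from the scalar function to the wide one is a change of signature only.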

Also, packed_simd is Nightly-only, whereas this crate works on Stable. Even on Stable this crate will give you reasonable levels of SIMD just from LLVM's auto-vectorizer being pretty good at its job when you give it code that it recognizes.

When packed_simd eventually makes it into Stable it might make this crate obsolete. However, in September of 2019 I asked the packed_simd folks if there was any kind of ETA, 6 months, 12 months, or more, and they just said "no ETA". So I'm not gonna wait around for packed_simd.

Modules

arch

Architecture specific functionality.

Macros

const_f32_as_f32x4

Declares an f32x4 const identifier.

const_i32_as_i32x4

Declares an i32x4 const identifier.

shuffle128

Shuffles around some f32 lanes into a new m128.

shuffle128d

Shuffles around some f64 lanes into a new m128d.

Structs

f32x4

Four f32 values packed together.

i32x4

Four i32 values packed together.

Functions

cos_f32

A cos for just one f32.

sin_f32

A sin for just one f32.

sqrt_f32

A sqrt for just one f32.

tan_f32

A tan for just one f32.

Unions

ConstUnionHack_f32x4

Lets us declare f32x4 values in a const context. Otherwise useless.

ConstUnionHack_i32x4

Allows us to declare i32x4 values in a const context. Uninteresting otherwise.
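
The docs above don't show how these unions are laid out, so the following is only a guess at the general technique: a union lets a `const` be written as a plain array and read back as an opaque wide type. `Opaque128`, `ConstUnionSketch`, `ONE_TO_FOUR`, and `to_array` are all hypothetical names for this sketch.

```rust
/// Hypothetical opaque 128-bit type, standing in for a SIMD register type
/// that cannot be built field by field in a const context.
#[derive(Clone, Copy)]
#[repr(C, align(16))]
pub struct Opaque128 {
    _bits: u128,
}

/// Hypothetical union: write the `array` field, read the `wide` field.
#[repr(C)]
pub union ConstUnionSketch {
    array: [f32; 4],
    wide: Opaque128,
}

/// A wide constant declared from four scalar values.
// SAFETY: both fields are 16 bytes, and every bit pattern is valid for both.
pub const ONE_TO_FOUR: Opaque128 =
    unsafe { ConstUnionSketch { array: [1.0, 2.0, 3.0, 4.0] }.wide };

/// Reads the lanes back out by going through the union the other way.
pub fn to_array(v: Opaque128) -> [f32; 4] {
    // SAFETY: same layout argument as above.
    unsafe { ConstUnionSketch { wide: v }.array }
}
```

Outside of const declarations a union like this has no real purpose, which matches the "otherwise useless" description.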