# Crate packed_simd_2

## Portable packed SIMD vectors

This crate is proposed for stabilization as `std::packed_simd` in RFC 2366: `std::simd`.

The examples available in the `examples/` sub-directory of the crate showcase how to use the library in practice.

### Introduction

This crate exports `Simd<[T; N]>`, a packed vector of `N` elements of type `T`, as well as many type aliases for it. For example, `f32x4` is an alias for `Simd<[f32; 4]>`.

The operations on packed vectors are, by default, “vertical”: they are applied to each vector lane independently of the others:

```rust
use packed_simd_2::*;

let a = i32x4::new(1, 2, 3, 4);
let b = i32x4::new(5, 6, 7, 8);
assert_eq!(a + b, i32x4::new(6, 8, 10, 12));
```

Many “horizontal” operations are also provided:

```rust
// `a` is the vector from the previous example: [1, 2, 3, 4].
assert_eq!(a.wrapping_sum(), 10);
```

On virtually all architectures, vertical operations are fast, while horizontal operations are, by comparison, much slower. Consequently, the most portably efficient way of performing a reduction over a slice is to accumulate partial results in a vector using vertical operations and to perform a single horizontal operation at the end:

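```rust
use packed_simd_2::*;

fn reduce(x: &[i32]) -> i32 {
    // This version only handles slices whose length is a multiple of 4.
    assert_eq!(x.len() % 4, 0);
    let mut sum = i32x4::splat(0); // [0, 0, 0, 0]
    for i in (0..x.len()).step_by(4) {
        // Vertical add: reads the 4 elements starting at index `i`.
        sum += i32x4::from_slice_unaligned(&x[i..]);
    }
    // Single horizontal operation at the end.
    sum.wrapping_sum()
}

let x = [0, 1, 2, 3, 4, 5, 6, 7];
assert_eq!(reduce(&x), 28);
```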
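The `reduce` example above only accepts slices whose length is a multiple of 4. The following is a minimal sketch of one way to handle arbitrary lengths. It models `i32x4` with a plain `[i32; 4]` array so that it compiles without the crate; `reduce_any_len` and `LANES` are hypothetical names introduced for illustration. With the crate, the chunk loop would use `i32x4::from_slice_unaligned` and `wrapping_sum` as shown above.

```rust
/// Number of lanes in the modeled vector (stands in for `i32x4`).
const LANES: usize = 4;

/// Sums a slice of any length: vertical adds over full chunks,
/// one horizontal sum at the end, then a scalar loop for the tail.
fn reduce_any_len(x: &[i32]) -> i32 {
    let mut acc = [0i32; LANES]; // models `i32x4::splat(0)`
    let chunks = x.chunks_exact(LANES);
    let tail = chunks.remainder(); // elements that do not fill a whole vector
    for chunk in chunks {
        // Vertical operation: each lane is updated independently.
        for (lane, value) in acc.iter_mut().zip(chunk) {
            *lane = lane.wrapping_add(*value);
        }
    }
    // Single horizontal operation, modeling `wrapping_sum`.
    let mut sum = acc.iter().fold(0i32, |s, lane| s.wrapping_add(*lane));
    // Scalar fallback for the leftover elements.
    for value in tail {
        sum = sum.wrapping_add(*value);
    }
    sum
}
```

For instance, `reduce_any_len(&[1, 2, 3, 4, 5])` processes one full chunk vertically and adds the remaining `5` in the scalar tail loop.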

### Vector types

The vector type aliases are named according to the following scheme:

`{element_type}x{number_of_lanes} == Simd<[element_type; number_of_lanes]>`

where the following element types are supported:

• `i{element_width}`: signed integer
• `u{element_width}`: unsigned integer
• `f{element_width}`: float
• `m{element_width}`: mask (see below)
• `*{const,mut} T`: `const` and `mut` pointers

### Basic operations

```rust
use packed_simd_2::*;

// Sets all elements to `0`:
let a = i32x4::splat(0);

// Reads a vector from the first 4 elements of a slice:
let mut arr = [0, 0, 0, 1, 2, 3, 4, 5];
let b = i32x4::from_slice_unaligned(&arr);

// Reads the 4th element (index 3) of a vector:
assert_eq!(b.extract(3), 1);

// Returns a new vector where the 4th element is replaced with `1`:
let a = a.replace(3, 1);
assert_eq!(a, b);

// Writes a vector to a slice:
let a = a.replace(2, 1);
a.write_to_slice_unaligned(&mut arr[4..]);
assert_eq!(arr, [0, 0, 0, 1, 0, 0, 1, 1]);
```

### Conditional operations

One often needs to perform an operation on only some lanes of a vector. Vector masks, like `m32x4`, select the lanes on which an operation is performed:

```rust
use packed_simd_2::*;

let a = i32x4::new(1, 1, 2, 2);

// Add `1` to the first two lanes of the vector.
let m = m16x4::new(true, true, false, false);
let a = m.select(a + 1, a);
assert_eq!(a, i32x4::splat(2));
```

The elements of a vector mask are either `true` or `false`. Here `true` means that a lane is “selected”, while `false` means that a lane is not selected.

All vector masks implement a `mask.select(a: T, b: T) -> T` method that works on all vectors that have the same number of lanes as the mask. The resulting vector contains the elements of `a` for those lanes for which the mask is `true`, and the elements of `b` otherwise.

The example constructs a mask with the first two lanes set to `true` and the last two lanes set to `false`. This selects the first two lanes of `a + 1` and the last two lanes of `a`, producing a vector where the first two lanes have been incremented by `1`.

Note: mask `select` can be used on any vector type that has the same number of lanes as the mask. The example shows this by using `m16x4` instead of `m32x4`. It is typically more performant to use a mask element width equal to the element width of the vectors being operated upon. This is, however, not true for 512-bit wide vectors when targeting AVX-512, where the most efficient masks use only 1 bit per element.
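The `select` semantics described above can be sketched with a scalar model. This is only an illustration using plain arrays, not the crate's implementation; `select4` and `increment_first_two` are hypothetical names.

```rust
/// Scalar model of `mask.select(a, b)` for 4-lane vectors:
/// lane `i` of the result is `a[i]` where `mask[i]` is true, else `b[i]`.
fn select4(mask: [bool; 4], a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = b;
    for i in 0..4 {
        if mask[i] {
            out[i] = a[i];
        }
    }
    out
}

/// Reproduces the example above: add 1 to the first two lanes only.
fn increment_first_two(a: [i32; 4]) -> [i32; 4] {
    // Models the vertical `a + 1`, computed for every lane.
    let a_plus_one = a.map(|x| x + 1);
    select4([true, true, false, false], a_plus_one, a)
}
```

Note that both inputs are fully computed before the selection: `a + 1` is evaluated for all four lanes, and the mask only decides which result each lane keeps.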

All vertical comparison operations return masks:

```rust
use packed_simd_2::*;

let a = i32x4::new(1, 1, 3, 3);
let b = i32x4::new(2, 2, 0, 0);

// ge: >= (Greater or Equal; see also lt, le, gt, eq, ne).
let m = a.ge(i32x4::splat(2));

if m.any() {
    // all / any / none allow coherent control flow
    let d = m.select(a, b);
    assert_eq!(d, i32x4::new(2, 2, 3, 3));
}
```
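To make the comparison and mask reductions above concrete, here is a scalar model using plain arrays. It is a sketch of the semantics only; `ge4`, `any_lane`, `all_lanes`, and `no_lane` are hypothetical names standing in for `ge`, `any`, `all`, and `none`.

```rust
/// Scalar model of a lane-wise `a.ge(b)` comparison producing a mask.
fn ge4(a: [i32; 4], b: [i32; 4]) -> [bool; 4] {
    let mut m = [false; 4];
    for i in 0..4 {
        m[i] = a[i] >= b[i];
    }
    m
}

/// Models `mask.any()`: true if at least one lane is selected.
fn any_lane(m: [bool; 4]) -> bool {
    m.iter().any(|&lane| lane)
}

/// Models `mask.all()`: true if every lane is selected.
fn all_lanes(m: [bool; 4]) -> bool {
    m.iter().all(|&lane| lane)
}

/// Models `mask.none()`: true if no lane is selected.
fn no_lane(m: [bool; 4]) -> bool {
    !any_lane(m)
}
```

With `a = [1, 1, 3, 3]` compared against `[2, 2, 2, 2]`, the modeled mask is `[false, false, true, true]`, so `any_lane` is true while `all_lanes` and `no_lane` are false.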

### Conversions

• lossless widening conversions: `From`/`Into` are implemented for vectors with the same number of lanes when the conversion is value preserving (same as in `std`).

• safe bitwise conversions: The cargo feature `into_bits` provides the `IntoBits`/`FromBits` traits (`x.into_bits()`). These perform safe bitwise `transmute`s when all bit patterns of the source type are valid bit patterns of the target type, and they are also implemented for the architecture-specific vector types of `std::arch`. For example, `let x: u8x8 = m8x8::splat(true).into_bits();` is provided because all `m8x8` bit patterns are valid `u8x8` bit patterns. The opposite is not true: not all `u8x8` bit patterns are valid `m8x8` bit patterns, so this conversion cannot be performed safely using `x.into_bits()`. One needs to use `unsafe { core::mem::transmute(x) }` instead, making sure that the value in the `u8x8` is a valid bit pattern of `m8x8`.

• numeric casts: performed using `FromCast`/`Cast` (`x.cast()`), with the same semantics as `as`:

  • casting integer vectors whose lane types have the same size (e.g. `i32xN` -> `u32xN`) is a no-op,

  • casting from a larger integer to a smaller integer (e.g. `u32xN` -> `u8xN`) will truncate,

  • casting from a smaller integer to a larger integer (e.g. `u8xN` -> `u32xN`) will:
    • zero-extend if the source is unsigned, or
    • sign-extend if the source is signed,

  • casting from a float to an integer will round the float towards zero,

  • casting from an integer to a float will produce the floating-point representation of the integer, rounding to nearest, ties to even,

  • casting from an `f32` to an `f64` is perfect and lossless,

  • casting from an `f64` to an `f32` rounds to nearest, ties to even.

Numeric casts are not very “precise”: depending on the source and target types, a cast may be lossy or value-preserving.
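Since numeric casts are described as behaving like `as`, the rules listed above can be checked lane by lane with scalar `as` casts. The sketch below models 4-lane vectors with plain arrays; the function names are hypothetical and it only exercises in-range values. It makes no claim about how the crate treats out-of-range float-to-integer casts.

```rust
/// Same-size integer cast: the bits are kept, only the interpretation changes.
fn i32_to_u32(x: [i32; 4]) -> [u32; 4] {
    x.map(|v| v as u32)
}

/// Larger-to-smaller integer cast: truncates to the low 8 bits.
fn u32_to_u8(x: [u32; 4]) -> [u8; 4] {
    x.map(|v| v as u8)
}

/// Smaller-to-larger cast from an unsigned source: zero-extends.
fn u8_to_u32(x: [u8; 4]) -> [u32; 4] {
    x.map(|v| v as u32)
}

/// Smaller-to-larger cast from a signed source: sign-extends.
fn i8_to_i32(x: [i8; 4]) -> [i32; 4] {
    x.map(|v| v as i32)
}

/// Float-to-integer cast: rounds towards zero.
fn f32_to_i32(x: [f32; 4]) -> [i32; 4] {
    x.map(|v| v as i32)
}

/// Integer-to-float cast: rounds to nearest, ties to even.
fn i32_to_f32(x: [i32; 4]) -> [f32; 4] {
    x.map(|v| v as f32)
}
```

For example, `u32_to_u8([0, 255, 256, 0x123])` keeps only the low byte of each lane, and `i32_to_f32` maps `16_777_217` (which has no exact `f32` representation) to `16_777_216.0`.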

### Hardware Features

This crate can use different hardware features based on your configured `RUSTFLAGS`. For example, with no configured `RUSTFLAGS`, `u64x8` on x86_64 will use SSE2 operations like `PCMPEQD`. If you configure `RUSTFLAGS='-C target-feature=+avx2,+avx'` on supported x86_64 hardware, the same `u64x8` may use wider AVX2 operations like `VPCMPEQQ`. For both performance and hardware support requirements, it is important to choose an appropriate set of `target-feature` and `target-cpu` options during builds. For more information, see the Performance guide.
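One way to confirm which features a build was actually compiled with is the standard `cfg!(target_feature = "...")` macro, which reflects the `-C target-feature` and `-C target-cpu` options passed through `RUSTFLAGS`. The sketch below is independent of this crate, and `compiled_simd_features` is a hypothetical helper name.

```rust
/// Reports which of a few x86 SIMD features were enabled at compile time.
/// Each flag is a compile-time constant; on non-x86 targets all are `false`.
fn compiled_simd_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", cfg!(target_feature = "sse2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}

/// Formats the report as one `name=enabled` entry per feature.
fn format_simd_features() -> String {
    compiled_simd_features()
        .iter()
        .map(|(name, enabled)| format!("{name}={enabled}"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Printing `format_simd_features()` from a binary built with and without `RUSTFLAGS='-C target-feature=+avx2,+avx'` should show the `avx` and `avx2` entries change. Because these are compile-time checks, they describe the build configuration and not the machine the binary later runs on.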

## Macros

Shuffles vector elements.

## Structs

Wrapper over `T` implementing a lexicographical order via the `PartialOrd` and/or `Ord` traits.

Packed SIMD vector type.

## Traits

Numeric cast from `Self` to `T`.

Numeric cast from `T` to `Self`.

This trait is implemented by all mask types.

Trait implemented by arrays that can be SIMD types.

This trait is implemented by all SIMD vector types.

## Type Definitions

A vector with 2 `*const T` lanes.

A vector with 4 `*const T` lanes.

A vector with 8 `*const T` lanes.

A 64-bit vector with 2 `f32` lanes.

A 128-bit vector with 4 `f32` lanes.

A 256-bit vector with 8 `f32` lanes.

A 512-bit vector with 16 `f32` lanes.

A 128-bit vector with 2 `f64` lanes.

A 256-bit vector with 4 `f64` lanes.

A 512-bit vector with 8 `f64` lanes.

A 16-bit vector with 2 `i8` lanes.

A 32-bit vector with 4 `i8` lanes.

A 64-bit vector with 8 `i8` lanes.

A 128-bit vector with 16 `i8` lanes.

A 256-bit vector with 32 `i8` lanes.

A 512-bit vector with 64 `i8` lanes.

A 32-bit vector with 2 `i16` lanes.

A 64-bit vector with 4 `i16` lanes.

A 128-bit vector with 8 `i16` lanes.

A 256-bit vector with 16 `i16` lanes.

A 512-bit vector with 32 `i16` lanes.

A 64-bit vector with 2 `i32` lanes.

A 128-bit vector with 4 `i32` lanes.

A 256-bit vector with 8 `i32` lanes.

A 512-bit vector with 16 `i32` lanes.

A 128-bit vector with 2 `i64` lanes.

A 256-bit vector with 4 `i64` lanes.

A 512-bit vector with 8 `i64` lanes.

A 128-bit vector with 1 `i128` lane.

A 256-bit vector with 2 `i128` lanes.

A 512-bit vector with 4 `i128` lanes.

A vector with 2 `isize` lanes.

A vector with 4 `isize` lanes.

A vector with 8 `isize` lanes.

A 16-bit vector mask with 2 `m8` lanes.

A 32-bit vector mask with 4 `m8` lanes.

A 64-bit vector mask with 8 `m8` lanes.

A 128-bit vector mask with 16 `m8` lanes.

A 256-bit vector mask with 32 `m8` lanes.

A 512-bit vector mask with 64 `m8` lanes.

A 32-bit vector mask with 2 `m16` lanes.

A 64-bit vector mask with 4 `m16` lanes.

A 128-bit vector mask with 8 `m16` lanes.

A 256-bit vector mask with 16 `m16` lanes.

A 512-bit vector mask with 32 `m16` lanes.

A 64-bit vector mask with 2 `m32` lanes.

A 128-bit vector mask with 4 `m32` lanes.

A 256-bit vector mask with 8 `m32` lanes.

A 512-bit vector mask with 16 `m32` lanes.

A 128-bit vector mask with 2 `m64` lanes.

A 256-bit vector mask with 4 `m64` lanes.

A 512-bit vector mask with 8 `m64` lanes.

A 128-bit vector mask with 1 `m128` lane.

A 256-bit vector mask with 2 `m128` lanes.

A 512-bit vector mask with 4 `m128` lanes.

A vector with 2 `*mut T` lanes.

A vector with 4 `*mut T` lanes.

A vector with 8 `*mut T` lanes.

A vector mask with 2 `msize` lanes.

A vector mask with 4 `msize` lanes.

A vector mask with 8 `msize` lanes.

A 16-bit vector with 2 `u8` lanes.

A 32-bit vector with 4 `u8` lanes.

A 64-bit vector with 8 `u8` lanes.

A 128-bit vector with 16 `u8` lanes.

A 256-bit vector with 32 `u8` lanes.

A 512-bit vector with 64 `u8` lanes.

A 32-bit vector with 2 `u16` lanes.

A 64-bit vector with 4 `u16` lanes.

A 128-bit vector with 8 `u16` lanes.

A 256-bit vector with 16 `u16` lanes.

A 512-bit vector with 32 `u16` lanes.

A 64-bit vector with 2 `u32` lanes.

A 128-bit vector with 4 `u32` lanes.

A 256-bit vector with 8 `u32` lanes.

A 512-bit vector with 16 `u32` lanes.

A 128-bit vector with 2 `u64` lanes.

A 256-bit vector with 4 `u64` lanes.

A 512-bit vector with 8 `u64` lanes.

A 128-bit vector with 1 `u128` lane.

A 256-bit vector with 2 `u128` lanes.

A 512-bit vector with 4 `u128` lanes.

A vector with 2 `usize` lanes.

A vector with 4 `usize` lanes.

A vector with 8 `usize` lanes.