# packed_simd 0.3.3

# Portable packed SIMD vectors

This crate is proposed for stabilization as `std::packed_simd` in RFC2366: `std::simd`.

The examples available in the `examples/` sub-directory of the crate showcase how to use the library in practice.

## Introduction

This crate exports [`Simd<[T; N]>`][`Simd`]: a packed vector of `N` elements of type `T` as well as many type aliases for this type: for example, [`f32x4`], which is just an alias for `Simd<[f32; 4]>`.

The operations on packed vectors are, by default, "vertical": they are applied to each vector lane independently of the others:

```rust
use packed_simd::*;
let a = i32x4::new(1, 2, 3, 4);
let b = i32x4::new(5, 6, 7, 8);
assert_eq!(a + b, i32x4::new(6, 8, 10, 12));
```

Many "horizontal" operations are also provided:

```rust
use packed_simd::*;
let a = i32x4::new(1, 2, 3, 4);
assert_eq!(a.wrapping_sum(), 10);
```
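To make the distinction concrete, here is a minimal scalar sketch in plain Rust that uses `[i32; 4]` arrays as stand-ins for `i32x4`. It does not call the crate: `vertical_add` and `horizontal_wrapping_sum` are hypothetical helper names, and the use of `wrapping_add` for the lane-wise `+` is an assumption of this model rather than something stated above.

```rust
/// Scalar stand-in for a vertical `a + b`: lane `i` of the result depends
/// only on lane `i` of each input.
/// Assumption: lanes wrap on overflow, modelled with `wrapping_add`.
fn vertical_add(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// Scalar stand-in for a horizontal `wrapping_sum`: every lane is folded
/// into a single value, wrapping around on overflow.
fn horizontal_wrapping_sum(v: [i32; 4]) -> i32 {
    v.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```

The vertical helper never looks across lanes, while the horizontal one has to combine all of them, which is the data dependency that tends to make horizontal operations more expensive in hardware.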

On virtually all architectures vertical operations are fast, while horizontal operations are, by comparison, much slower. The most portably efficient way to perform a reduction over a slice is therefore to accumulate partial results into a vector using vertical operations, and to perform a single horizontal operation at the end:

```rust
use packed_simd::*;
fn reduce(x: &[i32]) -> i32 {
    assert!(x.len() % 4 == 0);
    let mut sum = i32x4::splat(0); // [0, 0, 0, 0]
    for i in (0..x.len()).step_by(4) {
        sum += i32x4::from_slice_unaligned(&x[i..]);
    }
    sum.wrapping_sum()
}

let x = [0, 1, 2, 3, 4, 5, 6, 7];
assert_eq!(reduce(&x), 28);
```
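The `reduce` function above requires the slice length to be a multiple of 4. One possible way to lift that restriction is sketched below as a scalar model in plain Rust: four lane accumulators stand in for the `i32x4` sum, and any leftover elements are added one by one at the end. `reduce_any_len` is a hypothetical name, and the code does not use the crate's API.

```rust
/// Sums a slice of any length using four lane-wise accumulators
/// (a scalar model of an `i32x4` running sum) plus a scalar tail.
fn reduce_any_len(x: &[i32]) -> i32 {
    let mut lanes = [0i32; 4];

    // Split the slice into full groups of 4 and a remainder of 0..=3 elements.
    let chunks = x.chunks_exact(4);
    let tail = chunks.remainder();

    // "Vertical" phase: lane `i` only ever accumulates element `i` of a chunk.
    for chunk in chunks {
        for i in 0..4 {
            lanes[i] = lanes[i].wrapping_add(chunk[i]);
        }
    }

    // Single "horizontal" phase: fold the four lanes into one value.
    let vector_part = lanes.iter().fold(0i32, |acc, &v| acc.wrapping_add(v));

    // Scalar tail: elements that did not fill a whole chunk.
    tail.iter().fold(vector_part, |acc, &v| acc.wrapping_add(v))
}
```

For example, `reduce_any_len(&[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])` processes two full chunks and then adds the two trailing elements.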

## Vector types

The vector type aliases are named according to the following scheme:

`{element_type}x{number_of_lanes} == Simd<[element_type; number_of_lanes]>`

where the following element types are supported:

• `i{element_width}`: signed integer
• `u{element_width}`: unsigned integer
• `f{element_width}`: float
• `m{element_width}`: mask (see below)
• `*{const,mut} T`: `const` and `mut` pointers
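As a small worked example of the naming scheme, the total width of a vector is the element width multiplied by the number of lanes. The sketch below checks this arithmetic on the array type parameter `[T; N]` using only the standard library; `array_bits` is a hypothetical helper, and it measures the array, not the `Simd` type itself.

```rust
use std::mem::size_of;

/// Width in bits of the array `[T; N]`, i.e. the type parameter that
/// an alias such as `f32x4 == Simd<[f32; 4]>` is built from.
fn array_bits<T, const N: usize>() -> usize {
    size_of::<[T; N]>() * 8
}
```

Under this model `f32x4` and `i8x16` both describe 128 bits of data, while `f64x8` describes 512 bits.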

## Basic operations

```rust
use packed_simd::*;
// Sets all elements to `0`:
let a = i32x4::splat(0);

// Reads a vector from a slice:
let mut arr = [0, 0, 0, 1, 2, 3, 4, 5];
let b = i32x4::from_slice_unaligned(&arr);

// Reads the fourth element (index 3) of a vector:
assert_eq!(b.extract(3), 1);

// Returns a new vector where the fourth element (index 3) is replaced with `1`:
let a = a.replace(3, 1);
assert_eq!(a, b);

// Writes a vector to a slice:
let a = a.replace(2, 1);
a.write_to_slice_unaligned(&mut arr[4..]);
assert_eq!(arr, [0, 0, 0, 1, 0, 0, 1, 1]);
```

## Conditional operations

One often needs to perform an operation on only some lanes of a vector. Vector masks, like `m32x4`, select the lanes on which an operation is performed:

```rust
use packed_simd::*;
let a = i32x4::new(1, 1, 2, 2);

// Add `1` to the first two lanes of the vector.
let m = m16x4::new(true, true, false, false);
let a = m.select(a + 1, a);
assert_eq!(a, i32x4::splat(2));
```

The elements of a vector mask are either `true` or `false`. Here `true` means that a lane is "selected", while `false` means that a lane is not selected.

All vector masks implement a `mask.select(a: T, b: T) -> T` method that works on all vectors that have the same number of lanes as the mask. The resulting vector contains the elements of `a` for those lanes for which the mask is `true`, and the elements of `b` otherwise.

The example constructs a mask with the first two lanes set to `true` and the last two lanes set to `false`. This selects the first two lanes of `a + 1` and the last two lanes of `a`, producing a vector where the first two lanes have been incremented by `1`.
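The per-lane rule described above can be written out as a scalar model in plain Rust. This is only a sketch of the semantics, with `[bool; 4]` standing in for a mask and `[i32; 4]` for a vector; the free function `select` here is a hypothetical stand-in, not the crate's method.

```rust
/// Scalar model of `mask.select(a, b)`: lane `i` of the result is `a[i]`
/// where the mask is `true` and `b[i]` where it is `false`.
fn select(mask: [bool; 4], a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = if mask[i] { a[i] } else { b[i] };
    }
    out
}
```

Note that both arguments are complete vectors: in the example, `a + 1` is computed for all four lanes, and the mask only decides which lanes of that result are kept.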

Note: mask `select` can be used on any vector type that has the same number of lanes as the mask. The example shows this by using [`m16x4`] instead of [`m32x4`]. It is typically more performant to use a mask element width equal to the element width of the vectors being operated upon. This is, however, not true for 512-bit wide vectors when targeting AVX-512, where the most efficient masks use only 1 bit per element.

All vertical comparison operations return masks:

```rust
use packed_simd::*;
let a = i32x4::new(1, 1, 3, 3);
let b = i32x4::new(2, 2, 0, 0);

// ge: >= (Greater or Equal; see also lt, le, gt, eq, ne).
let m = a.ge(i32x4::splat(2));

if m.any() {
    // all / any / none allow coherent control flow
    let d = m.select(a, b);
    assert_eq!(d, i32x4::new(2, 2, 3, 3));
}
```
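As a rough scalar model of the comparison and the mask reductions used above, the following plain-Rust sketch represents a mask as `[bool; 4]`. The free functions `ge`, `any`, `all` and `none` are hypothetical stand-ins that mirror the method names in the example; they do not call the crate.

```rust
/// Lane-wise `>=`: lane `i` of the mask is `true` when `a[i] >= b[i]`.
fn ge(a: [i32; 4], b: [i32; 4]) -> [bool; 4] {
    let mut m = [false; 4];
    for i in 0..4 {
        m[i] = a[i] >= b[i];
    }
    m
}

/// `true` if at least one lane of the mask is selected.
fn any(m: [bool; 4]) -> bool {
    m.iter().any(|&lane| lane)
}

/// `true` if every lane of the mask is selected.
fn all(m: [bool; 4]) -> bool {
    m.iter().all(|&lane| lane)
}

/// `true` if no lane of the mask is selected.
fn none(m: [bool; 4]) -> bool {
    !any(m)
}
```

Each reduction collapses the whole mask into a single `bool`, which is what lets an ordinary `if` make one decision for the entire vector.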

## Conversions

• lossless widening conversions: [`From`]/[`Into`] are implemented for vectors with the same number of lanes when the conversion is value preserving (same as in `std`).

• safe bitwise conversions: The cargo feature `into_bits` provides the `IntoBits`/`FromBits` traits (`x.into_bits()`). These perform safe bitwise `transmute`s when all bit patterns of the source type are valid bit patterns of the target type, and are also implemented for the architecture-specific vector types of `std::arch`. For example, `let x: u8x8 = m8x8::splat(true).into_bits();` is provided because all `m8x8` bit patterns are valid `u8x8` bit patterns. The opposite is not true: not all `u8x8` bit patterns are valid `m8x8` bit patterns, so that conversion cannot be performed safely using `x.into_bits()`. One needs to use `unsafe { core::mem::transmute(x) }` instead, making sure that the value in the `u8x8` is a valid bit pattern of `m8x8`.

• numeric casts (`as`): performed using [`FromCast`]/[`Cast`] (`x.cast()`), just like `as`:

  • casting integer vectors whose lane types have the same size (e.g. `i32xN` -> `u32xN`) is a no-op,

  • casting from a larger integer to a smaller integer (e.g. `u32xN` -> `u8xN`) will truncate,

  • casting from a smaller integer to a larger integer (e.g. `u8xN` -> `u32xN`) will:

    • zero-extend if the source is unsigned, or
    • sign-extend if the source is signed,

  • casting from a float to an integer will round the float towards zero,

  • casting from an integer to a float will produce the floating point representation of the integer, rounding to nearest, ties to even,

  • casting from an `f32` to an `f64` is perfect and lossless,

  • casting from an `f64` to an `f32` rounds to nearest, ties to even.

Numeric casts are not very "precise": depending on the source and target types, a cast may be lossy or value preserving.
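Because the casts are described as behaving just like `as`, the rules can be illustrated with scalar `as` applied to each lane of a plain array. The sketch below is such a model, not a use of `x.cast()`: the `cast_*` helpers are hypothetical names, and only in-range float values are considered.

```rust
/// Larger to smaller integer: keeps the low 8 bits of each lane (truncation).
fn cast_u32x4_to_u8x4(v: [u32; 4]) -> [u8; 4] {
    v.map(|x| x as u8)
}

/// Smaller signed to larger integer: sign-extends each lane.
fn cast_i8x4_to_i32x4(v: [i8; 4]) -> [i32; 4] {
    v.map(|x| x as i32)
}

/// Smaller unsigned to larger integer: zero-extends each lane.
fn cast_u8x4_to_u32x4(v: [u8; 4]) -> [u32; 4] {
    v.map(|x| x as u32)
}

/// Float to integer: rounds each lane towards zero.
fn cast_f32x4_to_i32x4(v: [f32; 4]) -> [i32; 4] {
    v.map(|x| x as i32)
}

/// Integer to float: rounds each lane to nearest, ties to even.
fn cast_i32x4_to_f32x4(v: [i32; 4]) -> [f32; 4] {
    v.map(|x| x as f32)
}
```

The bit pattern `0xFF` shows the difference between the two extension rules: as an `i8` it is `-1` and stays `-1` after sign extension, while as a `u8` it is `255` and stays `255` after zero extension. Likewise, `16_777_217` (2^24 + 1) has no exact `f32` representation and lands on `16_777_216.0` under ties-to-even.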