# Crate faster

The SIMD library for humans. Faster allows convenient application of explicit SIMD to existing code. It allows you to write explicit SIMD code once and compile it for any target, regardless of architecture, SIMD capability, or age.

# SIMD Iterators

SIMD iterators are formed using `simd_iter`, `simd_iter_mut`, and
`into_simd_iter`, which return types that allow the use of the `simd_map`
and `simd_reduce` functions. These functions automatically pack your
iterator's data into SIMD vectors and let you operate on them transparently
in a closure.
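To make the packing step concrete, here is a minimal std-only sketch of what "packing data into SIMD vectors" means, modelling a vector as a fixed-size array `[T; N]`. The function `pack` and the lane count `N` are hypothetical illustrations, not part of `faster`'s API. How `faster` actually lays out and pads its vectors is an assumption here.

```rust
/// Hypothetical illustration (not faster's implementation): split `data`
/// into `N`-lane "vectors", filling unused lanes of the last one with `default`.
fn pack<T: Copy, const N: usize>(data: &[T], default: T) -> Vec<[T; N]> {
    data.chunks(N)
        .map(|chunk| {
            // Start with every lane set to the default...
            let mut lanes = [default; N];
            // ...then overwrite the lanes that real elements occupy.
            lanes[..chunk.len()].copy_from_slice(chunk);
            lanes
        })
        .collect()
}
```

For example, packing `[1, 2, 3, 4, 5]` into 4-lane vectors with a default of `0` yields `[[1, 2, 3, 4], [5, 0, 0, 0]]`.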

# SIMD Polyfills

Once your data is packed into a SIMD vector, you may perform many common
SIMD operations on it. These operations have names and behavior independent
of any vendor-specific ISA, and have non-SIMD polyfills for machines which
cannot perform these operations in a single cycle. See the `intrin` module
for all available operations.
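As an illustration of what a non-SIMD polyfill looks like, the sketch below computes a lane-wise absolute value one element at a time. The name `abs_polyfill` is hypothetical, and the sketch is not taken from `faster`'s `intrin` module. It only mirrors the behavior described for `abs` in the mapping example: signed `i8` lanes in, unsigned `u8` lanes out. The output is unsigned because |-128| = 128 does not fit in an `i8`.

```rust
/// Hypothetical scalar fallback for a lane-wise absolute value.
/// Returns unsigned lanes because |-128| = 128 does not fit in an i8.
fn abs_polyfill<const N: usize>(v: [i8; N]) -> [u8; N] {
    // i8::unsigned_abs maps -128 to 128 without overflowing.
    v.map(|x| x.unsigned_abs())
}
```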

# Examples

Faster is currently capable of mapping and reductive operations in SIMD.

## Mapping

The simplest example of a computation with `faster` is a single map
operation.

```rust
extern crate faster;
use faster::*;

let lots_of_10s = (&[-10i8; 3000][..]).simd_iter()
    .simd_map(i8s(0), |v| v.abs())
    .scalar_collect();
assert_eq!(lots_of_10s, vec![10u8; 3000]);
```

In this example, a vector of type `i8s` is passed into the closure. The
exact type of `i8s` depends on the compilation target, but it will always
implement the same operations. Because taking the absolute value of a vector
converts it to `u8s`, the closure will return `u8s`.

`scalar_collect` takes the iterator of `u8s` and converts it into a
`Vec<u8>`.
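The following std-only sketch models this map-then-collect pipeline on plain arrays. It packs the input into `N`-lane chunks (padding with the default), applies the closure to each whole chunk, then flattens the results into a `Vec` and drops the padding lanes. `map_collect` is hypothetical, and the assumption that `faster` trims padded lanes in exactly this way is a modelling choice rather than a documented guarantee.

```rust
/// Hypothetical model of `simd_map(default, f)` followed by `scalar_collect()`.
fn map_collect<T, U, F, const N: usize>(data: &[T], default: T, mut f: F) -> Vec<U>
where
    T: Copy,
    U: Copy,
    F: FnMut([T; N]) -> [U; N],
{
    let mut out = Vec::with_capacity(data.len() + N);
    for chunk in data.chunks(N) {
        // Pack: copy the real elements, leave the remaining lanes as `default`.
        let mut lanes = [default; N];
        lanes[..chunk.len()].copy_from_slice(chunk);
        // Map: the closure sees a whole N-lane "vector" at once.
        out.extend_from_slice(&f(lanes));
    }
    // Collect: discard results that were computed from padding lanes.
    out.truncate(data.len());
    out
}
```

With ten `-10i8` inputs and 4 lanes, the last chunk carries two padding lanes. Those lanes are computed and then discarded, so the result is exactly ten `10u8` values.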

## Reduction

Faster can perform reductive operations with similar power to mapping operations:

```rust
extern crate faster;
use faster::*;

let two_hundred = (&[2.0f32; 100][..]).simd_iter()
    .simd_reduce(f32s(0.0), f32s(0.0), |acc, v| acc + v)
    .sum();
assert_eq!(two_hundred, 200.0f32);
```

This example sums every number in the collection. The first parameter to `simd_reduce` is the initial value of the accumulator, just as in any other reduction. The second parameter is used when the collection being reduced does not fit evenly into your system's vectors. It is the default value of the last vector, and each of its lanes is used only if no element of the collection fills it. Typically, a value of 0 (for sums) or 1 (for products) is a suitable default, since it leaves the result unchanged.
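A plain-Rust sketch of this two-default scheme follows. `simd_sum_sketch` is a hypothetical model, not `faster`'s code. The accumulator starts as `acc_default` in every lane, and the final partial chunk is topped up with `tail_default`. The accumulator lanes are then added together at the end, which is presumably the role `.sum()` plays in the example above. The sketch shows why the tail default should be the identity of the operation: with a tail default of 1, the padding lanes leak into the sum.

```rust
/// Hypothetical model of
/// `simd_reduce(acc_default, tail_default, |acc, v| acc + v).sum()`
/// using N-lane arrays in place of real SIMD vectors.
fn simd_sum_sketch<const N: usize>(data: &[f32], acc_default: f32, tail_default: f32) -> f32 {
    let mut acc = [acc_default; N];
    for chunk in data.chunks(N) {
        // Unfilled lanes of the last chunk take the tail default.
        let mut v = [tail_default; N];
        v[..chunk.len()].copy_from_slice(chunk);
        // Lane-wise accumulate, as `acc + v` does on SIMD vectors.
        for (a, x) in acc.iter_mut().zip(v.iter()) {
            *a += *x;
        }
    }
    // Horizontal sum of the accumulator lanes.
    acc.iter().sum()
}
```

Summing ten `2.0`s with 4 lanes gives `20.0` when the tail default is `0.0`, but `22.0` when it is `1.0`, because two padding lanes are filled with `1.0`. A non-zero `acc_default` is also added once per lane, so its effect scales with the lane count.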

Minding portability is very important when performing reductive operations. See the Portability section below for tips on keeping your code portable across all architectures.

## Multiple collections

Faster supports vectorized lockstep iteration over multiple collections.
Simply `zip` them up, and proceed as normal.
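Conceptually, zipped SIMD iteration walks both collections in the same `N`-lane steps and hands the closure one vector from each. The sketch below models that with arrays. `zip_map_add` is hypothetical and assumes both inputs have the same length.

```rust
/// Hypothetical model of zipping two N-lane streams and adding them lane-wise.
/// Assumes `a` and `b` have equal length.
fn zip_map_add<const N: usize>(a: &[i32], b: &[i32]) -> Vec<i32> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    let mut out = Vec::with_capacity(a.len() + N);
    for (ca, cb) in a.chunks(N).zip(b.chunks(N)) {
        // Pack one vector from each collection, padding unused lanes with 0
        // (mirroring the `tuplify!(2, i32s(0))` default in the example).
        let mut va = [0i32; N];
        let mut vb = [0i32; N];
        va[..ca.len()].copy_from_slice(ca);
        vb[..cb.len()].copy_from_slice(cb);
        // The closure body `a + b`, applied lane by lane.
        let mut sum = [0i32; N];
        for ((s, x), y) in sum.iter_mut().zip(va.iter()).zip(vb.iter()) {
            *s = x + y;
        }
        out.extend_from_slice(&sum);
    }
    // Drop results computed from padding lanes.
    out.truncate(a.len());
    out
}
```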

```rust
extern crate faster;
use faster::*;

let sevens = ((&[4i32; 200][..]).simd_iter(), (&[3i32; 200][..]).simd_iter()).zip()
    .simd_map(tuplify!(2, i32s(0)), |(a, b)| a + b)
    .scalar_collect();
// Each pair of lanes sums to 4 + 3 = 7.
assert_eq!(sevens, vec![7i32; 200]);
```

## Striping Collections

Reading every nth element of a collection can be vectorized on most
machines. Simply call `stripe`, or one of the slightly faster tuple-based
functions, such as `stripe_two`.

```rust
extern crate faster;
use faster::*;

// Computes the determinant of 2x2 matrices arranged as [a, b, c, d, a, b, c, d, ...]
let determinants = (&[1.0f32; 1024][..]).simd_iter()
    .stripe_four()
    .zip()
    .simd_map(tuplify!(4, f32s(0.0)), |(a, b, c, d)| a * d - b * c)
    .scalar_collect();
```
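To show what striping does to the data, the sketch below de-interleaves a slice laid out as `[a, b, c, d, a, b, c, d, ...]` into four separate streams. It then computes each 2x2 determinant `a * d - b * c` element-wise. This is a scalar, std-only illustration. `stripe_four_sketch` and `determinants` are hypothetical names, and the sketch assumes the input length is a multiple of 4.

```rust
/// Hypothetical de-interleaving: element i goes to stream i % 4.
fn stripe_four_sketch(data: &[f32]) -> [Vec<f32>; 4] {
    let mut streams: [Vec<f32>; 4] = Default::default();
    for (i, &x) in data.iter().enumerate() {
        streams[i % 4].push(x);
    }
    streams
}

/// Determinants of 2x2 matrices stored row-major as [a, b, c, d].
fn determinants(data: &[f32]) -> Vec<f32> {
    assert!(data.len() % 4 == 0, "length must be a multiple of 4");
    let [col_a, col_b, col_c, col_d] = stripe_four_sketch(data);
    col_a
        .iter()
        .zip(&col_b)
        .zip(&col_c)
        .zip(&col_d)
        .map(|(((a, b), c), d)| a * d - b * c)
        .collect()
}
```

For `[1.0, 2.0, 3.0, 4.0, 2.0, 0.0, 0.0, 2.0]`, this yields `[-2.0, 4.0]`: 1 * 4 - 2 * 3 for the first matrix, and 2 * 2 - 0 * 0 for the second.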

# Portability

While `faster` does most of the work of keeping your code portable
across platforms, users of this library must still understand that it is
very possible to write non-portable algorithms with it. Anything that
relies on vector width, anything impure, and anything that uses constants
in reductive operations is inherently non-portable. Some examples are
below:

```rust
extern crate faster;
use faster::*;

let mut flip = true;
let impure = (&[1i8; 3000][..]).simd_iter()
    .simd_map(i8s(0), |v| {
        flip = !flip;
        if flip { v + i8s(1) } else { v }
    })
    .scalar_collect();
// Depending on the width of your target's SIMD vectors, `impure` could be
// [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, ...] or
// [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, ...], etc.
```

```rust
extern crate faster;
use faster::*;

let length_dependent = (&[0i8; 10][..]).simd_iter()
    .simd_reduce(i8s(0), i8s(0), |acc, v| acc + v + i8s(1))
    .sum();
// `length_dependent` could be a different number on a different target!
```
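The second example can be reproduced without any SIMD at all. The sketch below models the same reduction with `N`-lane arrays. The constant `1` is added once per lane per vector, so the result depends on how many vectors the input is split into, and therefore on the lane count. `length_dependent_sketch` is hypothetical. It uses `i32` rather than `i8` only to rule out overflow, and the lane counts 4 and 16 are chosen purely for illustration.

```rust
/// Hypothetical model of
/// `simd_reduce(zero, zero, |acc, v| acc + v + one).sum()` with N lanes.
fn length_dependent_sketch<const N: usize>(data: &[i32]) -> i32 {
    let mut acc = [0i32; N];
    for chunk in data.chunks(N) {
        // Tail default of 0 for the unfilled lanes of the last chunk.
        let mut v = [0i32; N];
        v[..chunk.len()].copy_from_slice(chunk);
        for (a, x) in acc.iter_mut().zip(v.iter()) {
            // The `+ 1` runs once per lane per vector, so its total
            // contribution depends on N.
            *a += *x + 1;
        }
    }
    acc.iter().sum()
}
```

With ten zeros as input, 4 lanes produce three vectors and a result of 12. With 16 lanes, the input fits in a single vector and the result is 16.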

As a precaution, it is best practice to keep all functions pure, and only operate on SIMD vectors in your SIMD-enabled closures unless you know exactly what is happening under the hood. It's also important to remember that these problems will crop up even if you only support x86; the width difference between AVX and SSE is the primary source of these issues!

## Re-exports

`pub use prelude::*;`

## Modules

- `intrin`
- `iters`
- `prelude`
- `swizzle`
- `vec_patterns`
- `vecs`
- `zip`

## Macros

`tuplify`: A macro that takes a number n and an expression, and returns a tuple containing n copies of the expression. Only works for numbers less than or equal to 12.
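For readers unfamiliar with the pattern, here is a hypothetical `macro_rules!` sketch of the behavior described above, limited to n <= 4 for brevity. It is not `faster`'s definition. Among other possible differences, it re-evaluates the expression once per copy.

```rust
/// Hypothetical re-creation of a `tuplify!`-style macro for n = 1..=4.
/// The expression is expanded (and evaluated) once per tuple element.
macro_rules! tuplify_sketch {
    (1, $e:expr) => { ($e,) };
    (2, $e:expr) => { ($e, $e) };
    (3, $e:expr) => { ($e, $e, $e) };
    (4, $e:expr) => { ($e, $e, $e, $e) };
}
```

For example, `tuplify_sketch!(2, 0i32)` expands to `(0i32, 0i32)`. This matches the shape of the zip default `tuplify!(2, i32s(0))` used earlier.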