Crate multiversion

This crate provides the target and multiversion attributes for implementing function multiversioning.

Many CPU architectures have a variety of instruction set extensions that provide additional functionality. Common examples are single instruction, multiple data (SIMD) extensions such as SSE and AVX on x86/x86-64 and NEON on ARM/AArch64. When available, these extended features can provide significant speed improvements to some functions. These optional features cannot be haphazardly compiled into programs: executing an unsupported instruction will result in a crash.

Function multiversioning is the practice of compiling multiple versions of a function with various features enabled and safely detecting which version to use at runtime.
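
To make the idea concrete, the following is a hand-written sketch of what multiversioning amounts to, using only the standard library. It is not the code this crate generates; the function names square_generic and square_avx are illustrative assumptions.

```rust
/// Portable fallback, compiled with only the baseline features of the target.
fn square_generic(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// Same body, but the compiler is allowed to emit AVX instructions here.
/// Unsafe to call: the caller must ensure that the CPU supports AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// Safe entry point that picks a version at runtime.
pub fn square(x: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed by the check above.
            unsafe { square_avx(x) };
            return;
        }
    }
    // Other architectures, or x86 CPUs without AVX, use the fallback.
    square_generic(x);
}
```

Calling square(&mut data) on any machine squares every element; only the instructions used to do so differ.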

Cargo features

There is one cargo feature, std, enabled by default. When enabled, multiversion uses CPU feature detection at runtime to dispatch to the appropriate function. Disabling this feature restricts the crate to compile-time function dispatch using #[cfg(target_feature)], and allows it to be used in #![no_std] crates.
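
As a rough illustration of compile-time dispatch, the sketch below selects between two hypothetical implementations with #[cfg(target_feature)]. It uses nothing from the standard library, so the same pattern works under #![no_std]. The names selected_version and square are assumptions made for this example, not part of the crate.

```rust
/// Compiled in only when the whole crate is built with AVX enabled,
/// for example with RUSTFLAGS="-C target-feature=+avx".
#[cfg(target_feature = "avx")]
fn selected_version() -> &'static str {
    "avx"
}

/// Compiled in for every other build.
#[cfg(not(target_feature = "avx"))]
fn selected_version() -> &'static str {
    "generic"
}

/// The body is identical in both builds; the compiler may only use AVX
/// instructions for it when AVX was enabled for the whole crate.
pub fn square(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}
```

The choice is fixed when the crate is compiled, so a binary built this way with AVX enabled must not be run on a CPU without AVX.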

Capabilities

The intention of this crate is to allow any function, other than trait methods, to be multiversioned. If any functions do not work, please file an issue on GitHub.

The multiversion macro produces additional functions adjacent to the tagged function which do not correspond to a trait member. If you would like to multiversion a trait method, instead try multiversioning a free function or struct method and calling it from the trait method.
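
The workaround might look like the sketch below. The trait Square and the function square_slice are hypothetical names, and the multiversion attributes are left out so that the example compiles without the dependency; in real code they would be placed on the free function.

```rust
// In a crate that depends on `multiversion`, this free function is where
// `#[multiversion]` and its `#[clone(...)]` helpers would go.
fn square_slice(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// Hypothetical trait whose method cannot be multiversioned directly.
trait Square {
    fn square(&mut self);
}

impl Square for Vec<f32> {
    fn square(&mut self) {
        // The trait method stays a thin wrapper; any dispatch happens
        // inside the free function.
        square_slice(self);
    }
}
```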

Target specification strings

Targets for the target and multiversion attributes are specified as a combination of architecture (as specified in the target_arch attribute) and feature (as specified in the target_feature attribute). A single architecture can be specified as:

  • "arch"
  • "arch+feature"
  • "arch+feature1+feature2"

while multiple architectures can be specified as:

  • "[arch1|arch2]"
  • "[arch1|arch2]+feature"
  • "[arch1|arch2]+feature1+feature2"

The following are all valid target specification strings:

  • "x86" (matches the "x86" architecture)
  • "x86_64+avx+avx2" (matches the "x86_64" architecture with the "avx" and "avx2" features)
  • "[mips|mips64|powerpc|powerpc64]" (matches any of the "mips", "mips64", "powerpc" or "powerpc64" architectures)
  • "[arm|aarch64]+neon" (matches either the "arm" or "aarch64" architectures with the "neon" feature)
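
The grammar above can be sketched as a small parser. This is an illustration only, not the crate's own parser: TargetSpec, parse_target, and is_name are hypothetical, and the set of characters accepted in a name is an assumption.

```rust
/// Hypothetical parsed form of a target specification string.
#[derive(Debug, PartialEq)]
struct TargetSpec {
    architectures: Vec<String>,
    features: Vec<String>,
}

/// Assumed rule for names: non-empty, made of letters, digits, '_', '.' or '-'.
fn is_name(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'))
}

/// Splits a string such as "[arm|aarch64]+neon" into architectures and
/// features. Returns `None` if the input does not follow the grammar.
fn parse_target(spec: &str) -> Option<TargetSpec> {
    // The first `+`-separated piece names the architecture(s);
    // every following piece is a feature.
    let mut parts = spec.split('+');
    let arch_part = parts.next()?;
    let architectures: Vec<String> = match arch_part.strip_prefix('[') {
        // "[a|b]" form: the closing bracket is required.
        Some(rest) => rest
            .strip_suffix(']')?
            .split('|')
            .map(str::to_string)
            .collect(),
        // Single-architecture form.
        None => vec![arch_part.to_string()],
    };
    let features: Vec<String> = parts.map(str::to_string).collect();
    let valid = architectures.iter().chain(features.iter()).all(|s| is_name(s));
    if valid {
        Some(TargetSpec { architectures, features })
    } else {
        None
    }
}
```

For instance, parse_target("x86_64+avx+avx2") yields one architecture and two features, while a string with an unclosed bracket is rejected.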

Example

The following example is a good candidate for optimization with SIMD. The function square optionally uses the AVX instruction set extension on x86 or x86-64. The SSE instruction set extension is always available on x86-64 but is optional on x86, so on x86 the square function also detects SSE. This is automatically implemented by the multiversion attribute.

The following works by compiling multiple clones of the function with various features enabled and detecting which to use at runtime. If none of the targets match the current CPU (e.g. an older x86-64 CPU, or another architecture such as ARM), a clone without any features enabled is used.

use multiversion::multiversion;

#[multiversion]
#[clone(target = "[x86|x86_64]+avx")]
#[clone(target = "x86+sse")]
fn square(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

The following produces a nearly identical function, but instead of cloning the function, the implementations are manually specified. This is typically more useful when the implementations aren't identical, such as when using explicit SIMD instructions instead of relying on compiler optimizations.

use multiversion::{multiversion, target};

#[target("[x86|x86_64]+avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

#[target("x86+sse")]
unsafe fn square_sse(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

#[multiversion]
#[specialize(target = "[x86|x86_64]+avx", fn = "square_avx", unsafe = true)]
#[specialize(target = "x86+sse", fn = "square_sse", unsafe = true)]
fn square(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

Static dispatching

Sometimes it is useful to call multiversioned functions from other multiversioned functions. Performing feature detection again on every such call would be inefficient, and the runtime detection also prevents the called function from being inlined. In this situation, the dispatch helper macro allows bypassing feature detection:

use multiversion::multiversion;

#[multiversion]
#[clone(target = "[x86|x86_64]+avx")]
#[clone(target = "x86+sse")]
fn square(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

#[multiversion]
#[clone(target = "[x86|x86_64]+avx")]
#[clone(target = "x86+sse")]
fn square_plus_one(x: &mut [f32]) {
    dispatch!(square(x)); // this function call bypasses feature detection
    for v in x {
        *v += 1.0;
    }
}

The dispatch macro supports either paths or function calls:

  • dispatch!(foo)
  • dispatch!(Self::foo::<A, B>)
  • dispatch!(foo(a, b))
  • dispatch!(self.foo::<A, B>(a, b))

The statically dispatched function must be multiversioned over a subset of CPU features supported by the caller function. For example, a function compiled for x86_64+avx+avx2 cannot statically dispatch a function compiled for x86_64+avx, but a function compiled for x86_64+avx may statically dispatch a multiversioned function compiled for both [x86|x86_64]+avx and x86+sse since an exact feature match exists for that architecture.
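
Conceptually, static dispatch means that a version already compiled for a given feature set calls the matching version of the other function directly. The hand-written sketch below shows this with standard library attributes only. It is an assumption about the general shape of the technique, not the expansion produced by the dispatch macro, and all function names are illustrative.

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_plus_one_avx(x: &mut [f32]) {
    // Static dispatch: this function was itself built for AVX, so the AVX
    // version of `square` is called directly, with no second runtime check.
    // SAFETY: the caller of this function already guarantees AVX support.
    unsafe { square_avx(x) };
    for v in x.iter_mut() {
        *v += 1.0;
    }
}

fn square_plus_one_generic(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v = *v * *v + 1.0;
    }
}

/// Feature detection happens once, at the outermost call.
pub fn square_plus_one(x: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed by the check above.
            unsafe { square_plus_one_avx(x) };
            return;
        }
    }
    square_plus_one_generic(x);
}
```

Because both AVX functions enable the same feature, the compiler is also free to inline square_avx into its caller.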

Conditional compilation

The #[cfg] attribute allows conditional compilation based on the target architecture and features. However, it does not take into account additional features specified by #[target_feature]. In this scenario, the #[target_cfg] helper attribute provides conditional compilation in functions tagged with multiversion or target.

The #[target_cfg] attribute supports all, any, and not (just like #[cfg]) and supports the following keys:

  • target: takes a target specification string as a value and is true if the target matches the function's target

#[multiversion::multiversion]
#[clone(target = "[x86|x86_64]+avx")]
#[clone(target = "[arm|aarch64]+neon")]
fn print_arch() {
    #[target_cfg(target = "[x86|x86_64]+avx")]
    println!("avx");

    #[target_cfg(target = "[arm|aarch64]+neon")]
    println!("neon");

    #[target_cfg(not(any(target = "[x86|x86_64]+avx", target = "[arm|aarch64]+neon")))]
    println!("generic");
}
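
The limitation of #[cfg] that motivates #[target_cfg] can be observed with the standard library alone. In the sketch below, the hypothetical function cfg_inside_avx_fn is compiled with AVX enabled through #[target_feature], yet cfg! inside it still reports only the crate-wide setting.

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn cfg_inside_avx_fn() -> bool {
    // `cfg!` reflects the features enabled for the whole crate (for example
    // through `-C target-feature`), not the attribute on this function.
    cfg!(target_feature = "avx")
}

/// Returns what `cfg!` reported inside the AVX function, or `None` when that
/// function cannot be called on this machine.
fn cfg_seen_by_avx_fn() -> Option<bool> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed by the check above.
            return Some(unsafe { cfg_inside_avx_fn() });
        }
    }
    None
}
```

In a default build on an AVX-capable x86-64 machine this returns Some(false), even though the function body may use AVX instructions.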

Macros

are_cpu_features_detected

Detects CPU features.

Attribute Macros

multiversion

Provides function multiversioning.

target

Provides a less verbose equivalent to the target_arch and target_feature attributes.