Attribute Macro multiversion::multiversion

#[multiversion]

Provides function multiversioning.

The annotated function is compiled multiple times, once for each target, and the best target is selected at runtime.

Options:

  • targets
    • Takes a list of targets, such as targets("x86_64+avx2", "x86_64+sse4.1").
    • Targets are prioritized from first to last; the first matching target is used.
    • Alternatively, the special value targets = "simd" automatically multiversions for common SIMD target features.
  • attrs
    • Takes a list of attributes to attach to each target clone function.
  • dispatcher
    • Selects the preferred dispatcher. If omitted, default is used.
      • default: If the std feature is enabled, uses either direct or indirect, choosing whichever is expected to be faster. If the std feature is not enabled, uses static.
      • static: Detects features at compile time from the enabled target features.
      • indirect: Detects features at runtime and dispatches with an indirect function call. Cannot be used for generic functions, async functions, or functions that take or return an impl Trait. This is usually the default.
      • direct: Detects features at runtime, and dispatches with direct function calls. This is the default on functions that do not support indirect dispatch, or in the presence of indirect branch exploit mitigations such as retpolines.
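To make the static option concrete, here is a minimal hand-written sketch of compile-time selection. It is not the macro's actual expansion, and the names sum, sum_generic and sum_avx2 are hypothetical. The AVX2 clone is only compiled and chosen when avx2 is enabled for the whole build (for example with RUSTFLAGS=-Ctarget-feature=+avx2); otherwise the generic clone is used. No runtime detection takes place either way.

```rust
// Hypothetical sketch of "static" dispatch: the clone is chosen from the
// target features enabled at compile time, with no runtime detection.

/// Generic clone, compiled for the baseline target.
#[allow(dead_code)] // unused when avx2 is enabled for the whole build
fn sum_generic(x: &[f32]) -> f32 {
    x.iter().sum()
}

/// AVX2 clone: only compiled when avx2 is enabled for the whole build.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(x: &[f32]) -> f32 {
    x.iter().sum()
}

/// Dispatcher when avx2 is statically enabled.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub fn sum(x: &[f32]) -> f32 {
    // SAFETY: the build configuration itself requires avx2.
    unsafe { sum_avx2(x) }
}

/// Dispatcher for every other build configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
pub fn sum(x: &[f32]) -> f32 {
    sum_generic(x)
}
```

Because the choice is fixed at compile time, a binary built without avx2 enabled never uses the AVX2 clone, even on a CPU that supports it; the runtime dispatchers (direct and indirect) cover that case.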

§Example

This function is a good candidate for optimization using SIMD. The following compiles square three times: once for each of the two targets and once for the generic fallback target. Calling square selects the appropriate version at runtime.

```rust
use multiversion::multiversion;

#[multiversion(targets("x86_64+avx", "x86+sse"))]
fn square(x: &mut [f32]) {
    // Square each element in place.
    for v in x {
        *v *= *v;
    }
}
```
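For intuition, the following is a rough hand-written approximation of what multiversioning square with these two targets involves. It is an illustrative sketch, not the code the macro generates, and the names square_avx, square_sse and square_generic are hypothetical. Each clone has the same body but is compiled with different target features, and the wrapper checks the targets in priority order, calling the first one the CPU supports.

```rust
// Hypothetical approximation of targets("x86_64+avx", "x86+sse").
// Not the macro's real expansion; it only illustrates clones + selection.

/// Clone for "x86_64+avx": same body, compiled with AVX enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Clone for "x86+sse": only exists on 32-bit x86.
#[cfg(target_arch = "x86")]
#[target_feature(enable = "sse")]
unsafe fn square_sse(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Generic fallback clone, compiled for the baseline target.
fn square_generic(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Wrapper: targets are checked in priority order; the first match wins.
pub fn square(x: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just verified at runtime.
            return unsafe { square_avx(x) };
        }
    }
    #[cfg(target_arch = "x86")]
    {
        if is_x86_feature_detected!("sse") {
            // SAFETY: SSE support was just verified at runtime.
            return unsafe { square_sse(x) };
        }
    }
    square_generic(x)
}
```

std caches the result of is_x86_feature_detected!, so the repeated checks in this sketch are cheap, although every call still pays for the checks and branches.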

This example is similar, but uses targets = "simd" to multiversion for a predefined set of common SIMD target features instead of listing the targets explicitly:

```rust
use multiversion::multiversion;

#[multiversion(targets = "simd")]
fn square(x: &mut [f32]) {
    // Square each element in place.
    for v in x {
        *v *= *v;
    }
}
```

§Notes on dispatcher performance

§Feature detection is performed only once

The direct and indirect dispatchers perform function selection on the first invocation. This is implemented with a static atomic variable containing the selected function.

This implementation has a few benefits:

  • The function selector is typically only invoked once. Subsequent calls are reduced to an atomic load.
  • If the function is called from multiple threads, there is no lock contention. Several threads may each perform feature detection on their first call, but the atomic ensures the stored selection is synchronized correctly.
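The caching scheme described above can be sketched with a std AtomicPtr that holds the selected clone as a type-erased function pointer. This is a minimal sketch under our own assumptions, not the crate's implementation: SELECTED, detect and the clone names are hypothetical, and a null pointer stands for "not selected yet". After the first call, each call costs one atomic load plus an indirect call; concurrent first calls may each run detect, but they only ever store the same value.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// All clones are stored behind the same `unsafe fn` pointer type.
type SquareFn = unsafe fn(&mut [f32]);

/// Generic fallback clone.
fn square_generic(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// AVX clone (x86_64 only).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Hypothetical selector: returns the best clone for the running CPU.
fn detect() -> SquareFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            return square_avx;
        }
    }
    square_generic
}

/// Cached selection; null means "not selected yet".
static SELECTED: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

pub fn square(x: &mut [f32]) {
    // Fast path: a single atomic load. The stored value is only a code
    // address and publishes no other data, so Relaxed ordering suffices.
    let mut p = SELECTED.load(Ordering::Relaxed);
    if p.is_null() {
        // First call(s): racing threads all compute the same pointer,
        // so storing it more than once is harmless.
        p = detect() as *mut ();
        SELECTED.store(p, Ordering::Relaxed);
    }
    // SAFETY: `p` was produced from a `SquareFn` by `detect`, which only
    // returns clones whose target features this CPU supports.
    unsafe {
        let f = std::mem::transmute::<*mut (), SquareFn>(p);
        f(x)
    }
}
```

Using null as the "unselected" marker keeps the static trivially constant-initialized; an alternative design stores the selector function itself as the initial value so the fast path needs no null check.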

§Dispatcher elision

If the optimal set of features is already known to exist at compile time, the entire dispatcher is elided. For example, if the highest priority target requires avx512f and the function is compiled with RUSTFLAGS=-Ctarget-cpu=skylake-avx512, the function is not multiversioned and the highest priority target is used.
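A minimal sketch of the same principle, with avx2 standing in for avx512f and hypothetical function names: when avx2 is enabled for the entire build (for example with RUSTFLAGS=-Ctarget-cpu=haswell), cfg! evaluates to true at compile time, the runtime check is short-circuited, and the optimizer can drop the detection branch entirely. This illustrates the idea only; it is not how the macro implements elision.

```rust
// Hypothetical sketch of dispatcher elision (avx2 stands in for avx512f).

/// Highest-priority clone.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(x: &[f32]) -> f32 {
    x.iter().sum()
}

/// Generic fallback clone.
fn sum_generic(x: &[f32]) -> f32 {
    x.iter().sum()
}

pub fn sum(x: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        // `cfg!` is a compile-time constant. If avx2 is enabled for the
        // whole build this becomes `true || ...`, so the runtime check is
        // never evaluated and the dispatch disappears after optimization.
        if cfg!(target_feature = "avx2") || is_x86_feature_detected!("avx2") {
            // SAFETY: avx2 is enabled at compile time or detected at runtime.
            return unsafe { sum_avx2(x) };
        }
    }
    sum_generic(x)
}
```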