Attribute Macro multiversion::multiversion
#[multiversion]
Provides function multiversioning.
The annotated function is compiled multiple times, once for each target, and the best target is selected at runtime.
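As a rough illustration of what this automates, the following hand-written sketch uses only the standard library. The names square_avx2 and square_generic are illustrative, and this is not the macro's actual expansion. It assumes an x86 or x86_64 host for the specialized path and falls back to the generic version everywhere else.

```rust
// Hand-written sketch of multiversioning; all names are illustrative and
// this is not the code the macro really generates.

/// Clone compiled with AVX2 enabled. Only sound to call on CPUs with AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn square_avx2(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Fallback compiled with only the default target features.
fn square_generic(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Selects the best available version at runtime.
pub fn square(x: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed by the runtime check above.
            unsafe { square_avx2(x) };
            return;
        }
    }
    square_generic(x);
}
```

Both versions share the same source body; only the target features enabled for each compiled copy differ, which is what lets the compiler auto-vectorize the specialized copy more aggressively.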
Options:
targets
- Takes a list of targets, such as targets("x86_64+avx2", "x86_64+sse4.1").
- Target priority is first to last. The first matching target is used.
- May also take a special value targets = "simd" to automatically multiversion for common SIMD target features.
attrs
- Takes a list of attributes to attach to each target clone function.
dispatcher
- Selects the preferred dispatcher. Defaults to default.
  - default: If the std feature is enabled, uses either direct or indirect, attempting to choose the fastest choice. If the std feature is not enabled, uses static.
  - static: Detects features at compile time from the enabled target features.
  - indirect: Detects features at runtime, and dispatches with an indirect function call. Cannot be used for generic functions, async functions, or functions that take or return an impl Trait. This is usually the default.
  - direct: Detects features at runtime, and dispatches with direct function calls. This is the default on functions that do not support indirect dispatch, or in the presence of indirect branch exploit mitigations such as retpolines.
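The target priority rule described under targets (first to last, first match wins) can be sketched as a simple search. This is a conceptual model only: the Target struct, the TARGETS list, and select_target are hypothetical names invented for this illustration, and feature sets are modeled as plain string slices rather than real CPU detection.

```rust
/// Hypothetical target description: a display name plus the features it needs.
struct Target {
    name: &'static str,
    required: &'static [&'static str],
}

/// Targets in priority order, highest priority first.
const TARGETS: &[Target] = &[
    Target { name: "x86_64+avx2", required: &["avx2"] },
    Target { name: "x86_64+sse4.1", required: &["sse4.1"] },
];

/// Returns the first target, in list order, whose required features are all
/// available. None means no target matched, so the generic version would run.
fn select_target<'a>(targets: &'a [Target], available: &[&str]) -> Option<&'a Target> {
    targets
        .iter()
        .find(|t| t.required.iter().all(|f| available.contains(f)))
}
```

With this model, a CPU offering both avx2 and sse4.1 selects the avx2 target because it is listed first, which is why targets should be ordered from most to least preferred.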
§Example
This function is a good candidate for optimization using SIMD.
The following compiles square three times, once for each target and once for the generic target. Calling square selects the appropriate version at runtime.
use multiversion::multiversion;

#[multiversion(targets("x86_64+avx", "x86+sse"))]
fn square(x: &mut [f32]) {
    for v in x {
        *v *= *v
    }
}
This example is similar, but targets all supported SIMD instruction sets (not just the two shown above):
use multiversion::multiversion;

#[multiversion(targets = "simd")]
fn square(x: &mut [f32]) {
    for v in x {
        *v *= *v
    }
}
§Notes on dispatcher performance
§Feature detection is performed only once
The direct and indirect dispatchers perform function selection on the first invocation.
This is implemented with a static atomic variable containing the selected function.
This implementation has a few benefits:
- The function selector is typically only invoked once. Subsequent calls are reduced to an atomic load.
- If called from multiple threads, there is no contention. Several threads may each perform feature detection, but the atomic ensures the results are synchronized correctly.
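The caching scheme described above can be sketched with a static atomic holding a function pointer. This is a minimal illustration under stated assumptions, not the crate's actual implementation: select is a hypothetical stand-in for real feature detection, SELECTOR_RUNS exists only to make the behavior observable, and relaxed ordering is an assumption that is reasonable here because every thread would compute the same pointer.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

type SquareFn = fn(&mut [f32]);

fn square_generic(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Counts selector runs; present only so the sketch can be checked.
static SELECTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical selector. A real one would detect CPU features and return
/// the best clone; this one always returns the generic version.
fn select() -> SquareFn {
    SELECTOR_RUNS.fetch_add(1, Ordering::Relaxed);
    square_generic
}

/// Null until the first call, then the selected function pointer.
static SELECTED: AtomicPtr<()> = AtomicPtr::new(std::ptr::null_mut());

pub fn square(x: &mut [f32]) {
    let mut ptr = SELECTED.load(Ordering::Relaxed);
    if ptr.is_null() {
        // First call (or a racing thread): run the selector and publish the
        // result. Racing threads store the same value, so no lock is needed.
        ptr = select() as *const () as *mut ();
        SELECTED.store(ptr, Ordering::Relaxed);
    }
    // SAFETY: the only non-null value ever stored is a valid SquareFn.
    let f: SquareFn = unsafe { std::mem::transmute::<*mut (), SquareFn>(ptr) };
    f(x)
}
```

After the first call, each invocation costs one atomic load plus an indirect call, and no thread ever blocks waiting for another.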
§Dispatcher elision
If the optimal set of features is already known to exist at compile time, the entire dispatcher is elided. For example, if the highest priority target requires avx512f and the function is compiled with RUSTFLAGS=-Ctarget-cpu=skylake-avx512, the function is not multiversioned and the highest priority target is used.
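The idea behind elision can be modeled with a compile-time check. The sketch below is an assumption-laden illustration rather than the macro's output: square_avx512f and square_dispatch are hypothetical stand-ins that simply forward to the generic body, and DISPATCHER_ELIDED is a name invented here.

```rust
fn square_generic(x: &mut [f32]) {
    for v in x {
        *v *= *v;
    }
}

/// Hypothetical stand-in for the clone built for the highest priority target.
fn square_avx512f(x: &mut [f32]) {
    square_generic(x)
}

/// Hypothetical stand-in for the runtime dispatcher.
fn square_dispatch(x: &mut [f32]) {
    square_generic(x)
}

/// True when avx512f is already enabled for the whole compilation, for
/// example through -Ctarget-cpu=skylake-avx512.
const DISPATCHER_ELIDED: bool = cfg!(target_feature = "avx512f");

pub fn square(x: &mut [f32]) {
    // The condition is a compile-time constant, so the compiler can drop the
    // branch that is never taken and no runtime detection remains.
    if DISPATCHER_ELIDED {
        square_avx512f(x)
    } else {
        square_dispatch(x)
    }
}
```

Built without extra flags on a typical x86_64 toolchain, DISPATCHER_ELIDED is false and the dispatcher path is kept; built with avx512f enabled, only the direct call survives.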