Crate multiversion
This crate provides the target, target_clones, and multiversion attributes for implementing function multiversioning.
Many CPU architectures have a variety of instruction set extensions that provide additional functionality. Common examples are single instruction, multiple data (SIMD) extensions such as SSE and AVX on x86/x86-64 and NEON on ARM/AArch64. When available, these extended features can provide significant speed improvements to some functions. These optional features cannot be haphazardly compiled into programs: executing an instruction that the CPU does not support will result in a crash.
Function multiversioning is the practice of compiling multiple versions of a function with various features enabled and safely detecting which version to use at runtime.
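To make the idea concrete, here is a minimal hand-written sketch that uses only the standard library, not this crate's attributes. The function names (scale, scale_avx, scale_fallback) are hypothetical and exist only for illustration. The same loop is compiled twice, once with AVX enabled, and a small dispatcher picks a version after runtime detection.

```rust
// Hand-rolled multiversioning with the standard library only.
// All function names here are hypothetical, chosen for illustration.

/// Portable version, compiled with the default target features.
fn scale_fallback(xs: &mut [f32], k: f32) {
    for v in xs.iter_mut() {
        *v *= k;
    }
}

/// Same body, but compiled with AVX enabled so the compiler is allowed to
/// emit AVX instructions. Unsafe: the caller must guarantee AVX support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn scale_avx(xs: &mut [f32], k: f32) {
    for v in xs.iter_mut() {
        *v *= k;
    }
}

/// Dispatcher: detects CPU features at runtime and picks a version.
pub fn scale(xs: &mut [f32], k: f32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound because AVX support was verified just above.
            unsafe { scale_avx(xs, k) };
            return;
        }
    }
    // Other architectures, or x86 CPUs without AVX.
    scale_fallback(xs, k);
}
```

Calling scale on a slice gives the same result on every CPU; only the instructions used to compute it differ. The attributes in this crate exist to generate this kind of boilerplate for you.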
Getting started
If you are unsure where to start, the target_clones attribute requires no knowledge of SIMD beyond understanding the available instruction set extensions for your architecture. For more advanced usage, hand-written SIMD code can be dispatched with target and multiversion.
Capabilities
Most functions can be multiversioned. The following are notable exceptions that are unsupported:
- Methods, associated functions, inner functions, or any other function not at module level. In these cases, create a multiversioned function at module level and call it from the desired location.
- Functions that take or return impl Trait (other than async, which is supported).
If any other functions do not work, please file a bug report.
Target specification strings
Targets for the target, target_clones, and multiversion attributes are specified as a combination of architecture (as specified in the target_arch attribute) and feature (as specified in the target_feature attribute). A single architecture can be specified as:
"arch"
"arch+feature"
"arch+feature1+feature2"
while multiple architectures can be specified as:
"[arch1|arch2]"
"[arch1|arch2]+feature"
"[arch1|arch2]+feature1+feature2"
The following are all valid target specification strings:
- "x86" (matches the "x86" architecture)
- "x86_64+avx+avx2" (matches the "x86_64" architecture with the "avx" and "avx2" features)
- "[mips|mips64|powerpc|powerpc64]" (matches any of the "mips", "mips64", "powerpc", or "powerpc64" architectures)
- "[arm|aarch64]+neon" (matches either the "arm" or "aarch64" architectures with the "neon" feature)
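The grammar above is small enough to sketch as a parser. The following is an illustration only, not the parser this crate uses: TargetSpec, parse_target, and is_name are hypothetical names, and the set of characters accepted in a name is an assumption.

```rust
/// Hypothetical parsed form of a target specification string.
#[derive(Debug, PartialEq)]
pub struct TargetSpec {
    pub architectures: Vec<String>,
    pub features: Vec<String>,
}

/// Assumed rule for a valid name: non-empty, made of ASCII letters,
/// digits, '_', '.', or '-' (e.g. "x86_64", "sse4.1").
fn is_name(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '-')
}

/// Splits "arch+f1+f2" or "[a1|a2]+f1" into architectures and features.
/// Returns None when the string does not follow the documented shape.
pub fn parse_target(spec: &str) -> Option<TargetSpec> {
    let mut parts = spec.split('+');
    // `split` always yields at least one piece: the architecture part.
    let arch_part = parts.next()?;
    let architectures: Vec<String> = match arch_part.strip_prefix('[') {
        // Bracketed form: "[arch1|arch2]"; a missing ']' is an error.
        Some(rest) => rest
            .strip_suffix(']')?
            .split('|')
            .map(str::to_owned)
            .collect(),
        // Plain form: a single architecture name.
        None => vec![arch_part.to_owned()],
    };
    // Everything after the first '+' is a feature name.
    let features: Vec<String> = parts.map(str::to_owned).collect();

    let all_valid = architectures.iter().all(|a| is_name(a))
        && features.iter().all(|f| is_name(f));
    if all_valid {
        Some(TargetSpec { architectures, features })
    } else {
        None
    }
}
```

With this sketch, "[arm|aarch64]+neon" parses into two architectures and one feature, while a malformed string such as "[arm+neon" is rejected.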
Example
The following example is a good candidate for optimization with SIMD. The function square optionally uses the AVX instruction set extension on x86 or x86-64. The SSE instruction set extension is part of x86-64, but is optional on x86, so the square function optionally detects that as well. This is automatically implemented by the target_clones attribute.
This works by compiling multiple clones of the function with various features enabled and detecting which to use at runtime. If none of the targets match the current CPU (e.g. an older x86-64 CPU, or another architecture such as ARM), a clone without any features enabled is used.
    use multiversion::target_clones;

    #[target_clones("[x86|x86_64]+avx", "x86+sse")]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }
The following produces a nearly identical function, but instead of cloning the function, the implementations are manually specified. This is typically more useful when the implementations aren't identical, such as when using explicit SIMD instructions instead of relying on compiler optimizations. The multiversioned function is generated by the multiversion attribute.
    use multiversion::{multiversion, target};

    #[target("[x86|x86_64]+avx")]
    unsafe fn square_avx(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[target("x86+sse")]
    unsafe fn square_sse(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[multiversion(
        "[x86|x86_64]+avx" => square_avx,
        "x86+sse" => square_sse,
    )]
    fn square(x: &mut [f32]) {
        // Fallback body, used when none of the listed targets match.
        for v in x {
            *v *= *v;
        }
    }
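For a case where the implementations really do differ, the sketch below writes the AVX version with explicit intrinsics from std::arch. It uses only the standard library and a hand-written dispatcher rather than the target and multiversion attributes, so it shows the kind of code those attributes would dispatch to, not what they expand to. The helper names and the restriction to x86-64 are assumptions made to keep the example short.

```rust
/// Portable version relying on ordinary compiler optimizations.
fn square_fallback(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// Explicit AVX version: squares eight f32 lanes per iteration.
/// Unsafe: the caller must guarantee that the CPU supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    use std::arch::x86_64::{_mm256_loadu_ps, _mm256_mul_ps, _mm256_storeu_ps};

    let mut chunks = x.chunks_exact_mut(8);
    for chunk in &mut chunks {
        // SAFETY: each chunk holds exactly eight f32 values, and the
        // unaligned load/store intrinsics have no alignment requirement.
        unsafe {
            let v = _mm256_loadu_ps(chunk.as_ptr());
            _mm256_storeu_ps(chunk.as_mut_ptr(), _mm256_mul_ps(v, v));
        }
    }
    // Handle the final 0..=7 elements that do not fill a whole vector.
    for v in chunks.into_remainder() {
        *v *= *v;
    }
}

/// Hand-written dispatcher standing in for the generated one.
pub fn square(x: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Sound because AVX support was verified just above.
            unsafe { square_avx(x) };
            return;
        }
    }
    square_fallback(x);
}
```

The remainder loop matters: a slice whose length is not a multiple of eight would otherwise leave its tail unsquared.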
Static dispatching
Sometimes it may be useful to call multiversioned functions from other multiversioned functions.
In these situations it would be inefficient to perform feature detection multiple times.
Additionally, the runtime detection prevents the function from being inlined. In this situation, the #[static_dispatch] helper attribute allows bypassing feature detection.
The #[static_dispatch] attribute may be used on use statements to bring the implementation with matching features to the current function into scope. Functions created by target_clones and multiversion are capable of being statically dispatched. Functions tagged with target may statically dispatch functions in their body, but cannot themselves be statically dispatched.
Caveats:
- The caller function must exactly match an available feature set in the called function. A function compiled for x86_64+avx+avx2 cannot statically dispatch a function compiled for x86_64+avx. A function compiled for x86_64+avx may statically dispatch a function compiled for [x86|x86_64]+avx, since an exact feature match exists for that architecture.
- use groups are not supported (use foo::{bar, baz}). Renames are supported, however (use bar as baz).
    use multiversion::target_clones;

    #[target_clones("[x86|x86_64]+avx", "x86+sse")]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[target_clones("[x86|x86_64]+avx", "x86+sse")]
    fn square_plus_one(x: &mut [f32]) {
        #[static_dispatch]
        use self::square; // or just `use square` with Rust 1.32.0+
        square(x); // this function call bypasses feature detection
        for v in x {
            *v += 1.0;
        }
    }
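The effect of static dispatch can be imitated by hand. The sketch below is standard-library only and is not the code the attributes generate. The AVX version of square_plus_one calls the AVX version of square directly, so detection runs once per outer call. The DETECTIONS counter and all helper names are hypothetical, added only to make the number of detections observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts runtime feature detections (illustration only).
pub static DETECTIONS: AtomicUsize = AtomicUsize::new(0);

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect_avx() -> bool {
    is_x86_feature_detected!("avx")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect_avx() -> bool {
    false
}

/// Runtime detection, instrumented with the counter above.
fn has_avx() -> bool {
    DETECTIONS.fetch_add(1, Ordering::Relaxed);
    detect_avx()
}

fn square_generic(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// AVX clone of `square`. Unsafe: the caller must guarantee AVX support.
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx"))]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// Dynamically dispatched entry point: one detection per call.
pub fn square(x: &mut [f32]) {
    if has_avx() {
        unsafe { square_avx(x) }
    } else {
        square_generic(x)
    }
}

fn square_plus_one_generic(x: &mut [f32]) {
    square_generic(x);
    for v in x.iter_mut() {
        *v += 1.0;
    }
}

/// AVX clone of `square_plus_one`. It calls `square_avx` directly: its own
/// caller already guarantees AVX, so no second detection is needed.
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx"))]
unsafe fn square_plus_one_avx(x: &mut [f32]) {
    unsafe { square_avx(x) };
    for v in x.iter_mut() {
        *v += 1.0;
    }
}

/// Detects once, then stays inside the matching feature set.
pub fn square_plus_one(x: &mut [f32]) {
    if has_avx() {
        unsafe { square_plus_one_avx(x) }
    } else {
        square_plus_one_generic(x)
    }
}
```

Had square_plus_one_avx called the public square instead, every outer call would have run detection twice, and the inner call would go through a runtime branch rather than a direct call.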
Implementation details
The function version dispatcher consists of a function selector and an atomic function pointer. Initially the function pointer will point to the function selector. On invocation, this selector will then choose an implementation, store a pointer to it in the atomic function pointer for later use and then pass on control to the chosen function. On subsequent calls, the chosen function will be called without invoking the function selector.
Some comments on the benefits of this implementation:
- The function selector is only invoked once. Subsequent calls are reduced to an atomic load and indirect function call (for non-generic, non-async functions). Generic and async functions cannot be stored in the atomic function pointer, which may result in additional branches.
- If called in multiple threads, there is no contention. It is possible for two threads to hit the same function before function selection has completed, which results in each thread invoking the function selector, but the atomic ensures that these are synchronized correctly.
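The selector-plus-atomic-pointer scheme described above can be sketched with the standard library alone. This is an approximation under stated assumptions, not the code generated by this crate: SQUARE_IMPL, select_square, choose, and the SELECTOR_CALLS counter are hypothetical names, and relaxed memory ordering is an assumption that is sufficient here only because the stored values are addresses of functions that exist for the whole program.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

type SquareFn = fn(&mut [f32]);

/// Counts selector invocations (illustration only).
pub static SELECTOR_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Initially points at the selector; later holds the chosen version.
static SQUARE_IMPL: AtomicPtr<()> = AtomicPtr::new(select_square as SquareFn as *mut ());

fn square_fallback(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

/// Safe wrapper so the AVX version fits the `SquareFn` pointer type.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn square_avx_checked(x: &mut [f32]) {
    // SAFETY: `choose` returns this function only after detecting AVX.
    unsafe { square_avx(x) }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn choose() -> SquareFn {
    if is_x86_feature_detected!("avx") {
        square_avx_checked
    } else {
        square_fallback
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn choose() -> SquareFn {
    square_fallback
}

/// Function selector: picks a version, caches it, then forwards the call.
fn select_square(x: &mut [f32]) {
    SELECTOR_CALLS.fetch_add(1, Ordering::Relaxed);
    let chosen = choose();
    SQUARE_IMPL.store(chosen as *mut (), Ordering::Relaxed);
    chosen(x)
}

/// Public entry point: an atomic load followed by an indirect call.
pub fn square(x: &mut [f32]) {
    let ptr = SQUARE_IMPL.load(Ordering::Relaxed);
    // SAFETY: SQUARE_IMPL only ever holds pointers cast from `SquareFn`.
    let f = unsafe { std::mem::transmute::<*mut (), SquareFn>(ptr) };
    f(x)
}
```

If two threads race on the first call, both may run select_square, but each stores the same kind of valid pointer, so later calls are correct whichever store lands last.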
Attribute Macros
multiversion | Provides function multiversioning by explicitly specifying function versions. |
target | Provides a less verbose equivalent to the |
target_clones | Provides automatic function multiversioning by compiling clones of the function for each target. |