Crate multiversion
This crate provides the target and multiversion attributes for implementing function multiversioning.
Many CPU architectures have a variety of instruction set extensions that provide additional functionality. Common examples are single instruction, multiple data (SIMD) extensions such as SSE and AVX on x86/x86-64 and NEON on ARM/AArch64. When available, these extended features can provide significant speed improvements to some functions. These optional features cannot be haphazardly compiled into programs: executing an instruction the CPU does not support will crash the program.
Function multiversioning is the practice of compiling multiple versions of a function with various features enabled and safely detecting which version to use at runtime.
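To make the idea concrete, here is a minimal hand-written sketch of runtime dispatch using only the standard library. It is not what this crate generates. The names square_avx and square_generic are made up for illustration, and only AVX on x86/x86-64 is handled.

```rust
// Hand-written runtime dispatch: a sketch of the general technique only.

// AVX-enabled copy. `#[target_feature]` lets the compiler use AVX
// instructions inside this function even when the rest of the program
// is compiled without them, which is why calling it is `unsafe`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn square_avx(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

// Fallback copy with no extra features enabled.
fn square_generic(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

// Safe entry point: detect features at runtime, then pick a version.
pub fn square(x: &mut [f32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed on this CPU.
            return unsafe { square_avx(x) };
        }
    }
    square_generic(x)
}
```

Both copies have the same body. Any speedup comes from the compiler being allowed to emit AVX instructions in the first one.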
Cargo features
There is one cargo feature, runtime_dispatch, enabled by default. When enabled, multiversion will use CPU feature detection at runtime to dispatch the appropriate function, which requires the std crate. Disabling this feature will only allow compile-time function dispatch using #[cfg(target_feature)] and can be used in #![no_std] crates.
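As a rough illustration of compile-time dispatch, the sketch below selects between two definitions with #[cfg(target_feature)]. The function name selected_version is hypothetical, and this is plain Rust rather than output of the crate. It uses nothing from std, so the same pattern should also work in a #![no_std] crate.

```rust
// Compile-time dispatch: exactly one of these two definitions is compiled,
// depending on the features enabled for the whole build
// (for example via `-C target-feature=+avx`).

#[cfg(target_feature = "avx")]
pub fn selected_version() -> &'static str {
    "avx"
}

#[cfg(not(target_feature = "avx"))]
pub fn selected_version() -> &'static str {
    "generic"
}
```

Unlike runtime dispatch, the choice is fixed when the binary is built. A build with AVX enabled will not fall back on a CPU that lacks it.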
Capabilities
The intention of this crate is to allow any function, other than trait methods, to be multiversioned. If any functions do not work please file an issue on GitHub.
The multiversion macro produces additional functions adjacent to the tagged function which do not correspond to a trait member. If you would like to multiversion a trait method, instead try multiversioning a free function or struct method and calling it from the trait method.
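The workaround might look like the following sketch. The trait Square and the helper square_impl are hypothetical names. The multiversion attributes are only mentioned in a comment so that the snippet builds without the crate as a dependency.

```rust
// Free function holding the real work. With the crate available, this is
// where `#[multiversion]` and its `#[clone(...)]` attributes would go.
fn square_impl(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v *= *v;
    }
}

// Hypothetical trait, used only for illustration.
trait Square {
    fn square(&mut self);
}

impl Square for Vec<f32> {
    // The trait method itself stays untagged and simply forwards.
    fn square(&mut self) {
        square_impl(self.as_mut_slice());
    }
}
```

Callers keep using the trait method, for example v.square() on a Vec<f32>, while any dispatch happens inside the free function.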
Target specification strings
Targets for the target and multiversion attributes are specified as a combination of architecture (as specified in the target_arch attribute) and feature (as specified in the target_feature attribute). A single architecture can be specified as:
"arch"
"arch+feature"
"arch+feature1+feature2"
while multiple architectures can be specified as:
"[arch1|arch2]"
"[arch1|arch2]+feature"
"[arch1|arch2]+feature1+feature2"
The following are all valid target specification strings:
"x86" (matches the "x86" architecture)
"x86_64+avx+avx2" (matches the "x86_64" architecture with the "avx" and "avx2" features)
"[mips|mips64|powerpc|powerpc64]" (matches any of the "mips", "mips64", "powerpc" or "powerpc64" architectures)
"[arm|aarch64]+neon" (matches either the "arm" or "aarch64" architectures with the "neon" feature)
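The grammar above is small enough to be read mechanically. The following sketch splits a specification string into its architectures and features. The function parse_target is a hypothetical helper written for this illustration and is not the crate's own parser. It does not check that the names are real architectures or features.

```rust
/// Splits a target specification string into (architectures, features).
/// Hypothetical helper for illustration; not the crate's own parser.
fn parse_target(spec: &str) -> Option<(Vec<String>, Vec<String>)> {
    // Everything before the first '+' names the architecture(s);
    // every later '+'-separated piece is a feature.
    let mut parts = spec.split('+');
    let arch_part = parts.next()?;
    let features: Vec<String> = parts.map(str::to_string).collect();

    // "[a|b]" lists several architectures; a bare name is a single one.
    let archs: Vec<String> = match arch_part
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
    {
        Some(inner) => inner.split('|').map(str::to_string).collect(),
        None => vec![arch_part.to_string()],
    };

    // Reject empty names ("x86+", "[|arm]") and stray bracket or bar characters.
    let bad = |s: &String| {
        s.is_empty() || s.contains(|c: char| matches!(c, '[' | ']' | '|'))
    };
    if archs.iter().chain(features.iter()).any(bad) {
        return None;
    }
    Some((archs, features))
}
```

For instance, "[arm|aarch64]+neon" yields the architectures arm and aarch64 with the single feature neon, while a malformed string such as "x86+" yields None.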
Example
The following example is a good candidate for optimization with SIMD. The function square optionally uses the AVX instruction set extension on x86 or x86-64. The SSE instruction set extension is part of x86-64, but is optional on x86 so the square function optionally detects that as well. This is automatically implemented by the multiversion attribute.
The following works by compiling multiple clones of the function with various features enabled and detecting which to use at runtime. If none of the targets match the current CPU (e.g. an older x86-64 CPU, or another architecture such as ARM), a clone without any features enabled is used.
    use multiversion::multiversion;

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "x86+sse")]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }
The following produces a nearly identical function, but instead of cloning the function, the implementations are manually specified. This is typically more useful when the implementations aren't identical, such as when using explicit SIMD instructions instead of relying on compiler optimizations.
    use multiversion::{multiversion, target};

    #[target("[x86|x86_64]+avx")]
    unsafe fn square_avx(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[target("x86+sse")]
    unsafe fn square_sse(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }

    #[multiversion]
    #[specialize(target = "[x86|x86_64]+avx", fn = "square_avx", unsafe = true)]
    #[specialize(target = "x86+sse", fn = "square_sse", unsafe = true)]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v;
        }
    }
Static dispatching
Sometimes it may be useful to call multiversioned functions from other multiversioned functions.
In these situations it would be inefficient to perform feature detection multiple times.
Additionally, the runtime detection prevents the function from being inlined. In this situation, the #[static_dispatch] helper attribute allows bypassing feature detection.
The #[static_dispatch] attribute accepts the following arguments:
- fn: path to the function to static dispatch.
- rename (optional): the binding to use for the statically dispatched function. If not provided, the function name is used.
Caveats:
- The caller function must exactly match an available feature set in the called function. A function compiled for x86_64+avx+avx2 cannot statically dispatch a function compiled for x86_64+avx. A function compiled for x86_64+avx may statically dispatch a function compiled for [x86|x86_64]+avx, since an exact feature match exists for that architecture.
- The receiver (self, &self, etc.) must be provided as the first argument to the statically dispatched function, e.g. foo(bar) rather than bar.foo().
    use multiversion::multiversion;

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "x86+sse")]
    fn square(x: &mut [f32]) {
        for v in x {
            *v *= *v
        }
    }

    #[multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "x86+sse")]
    #[static_dispatch(fn = "square")]
    fn square_plus_one(x: &mut [f32]) {
        square(x); // this function call bypasses feature detection
        for v in x {
            *v += 1.0;
        }
    }
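The exact-match caveat can be stated as a small predicate. The sketch below is an interpretation of that rule, not code from the crate. The function can_static_dispatch is hypothetical, and treating the feature list as an unordered set is an assumption.

```rust
use std::collections::BTreeSet;

/// Returns true if a caller compiled for `caller_arch` with `caller_features`
/// finds an exactly matching version in a callee whose target lists
/// `callee_archs` and `callee_features`.
/// Hypothetical helper; feature order is assumed not to matter.
fn can_static_dispatch(
    caller_arch: &str,
    caller_features: &[&str],
    callee_archs: &[&str],
    callee_features: &[&str],
) -> bool {
    // Compare features as sets so the written order is irrelevant.
    let caller: BTreeSet<&str> = caller_features.iter().copied().collect();
    let callee: BTreeSet<&str> = callee_features.iter().copied().collect();

    // The caller's architecture must be listed, and the feature sets must
    // be equal: a superset on the caller side is not enough.
    callee_archs.iter().any(|a| *a == caller_arch) && caller == callee
}
```

With the two cases from the caveats, a caller built for x86_64+avx+avx2 is rejected against a callee target of x86_64+avx, while a caller built for x86_64+avx is accepted against [x86|x86_64]+avx.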
Conditional compilation
The #[cfg] attribute allows conditional compilation based on the target architecture and features; however, it does not take into account additional features specified by #[target_feature]. In this scenario, the #[target_cfg] helper attribute provides conditional compilation in functions tagged with multiversion or target.
The #[target_cfg] attribute supports all, any, and not (just like #[cfg]) and supports the following keys:
- target: takes a target specification string as a value and is true if the target matches the function's target
    #[multiversion::multiversion]
    #[clone(target = "[x86|x86_64]+avx")]
    #[clone(target = "[arm|aarch64]+neon")]
    fn print_arch() {
        #[target_cfg(target = "[x86|x86_64]+avx")]
        println!("avx");

        #[target_cfg(target = "[arm|aarch64]+neon")]
        println!("neon");

        #[target_cfg(not(any(target = "[x86|x86_64]+avx", target = "[arm|aarch64]+neon")))]
        println!("generic");
    }
Attribute Macros
multiversion | Provides function multiversioning. |
target | Provides a less verbose equivalent to the |