This crate provides the #[make_special] attribute
macro to automatically create a series of target feature specialisations for
the given function. This behaves similarly to the Clang target_clones
attribute.
[dependencies]
maybe_special = "1.1"
This crate is designed for Rust edition 2024 (rustc 1.85+).
§Usage
This macro takes in a series of specialisations in the form arch = ["feature1", "feature2", ...]. This macro uses std::arch/std_detect
under the hood, so look at their documentation for more details, especially
since some architectures are currently unstable. Additionally,
specialisations can be marked with static to enable static dispatch on
them, which is explained below.
Usage notes
- This macro does not figure out which specialisations are optimal for a given function; that is still something you must benchmark yourself.
- Use this macro sparingly, as it can paradoxically add a significant performance overhead when applied improperly. This macro adds an atomic memory read and creates an inline boundary for every function it is applied to. Additionally, the initialisation run upon the first call of the function can be comparatively slow due to performance issues with std::arch/std_detect’s is_*_feature_detected macros.
- This macro only specialises the function it is applied to. If that function calls another function which isn’t inlined, the callee will not be specialised.
- Under the hood this macro uses the #[target_feature] attribute which tells LLVM to output code as if those features were enabled. However, it seems there is a bug where any form of LTO undoes some feature-specific optimisations.
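To make the notes above more concrete, here is a hand-written sketch of the general pattern (a `#[target_feature]` function guarded by a run-time check). It is not the code this macro generates; the names `dot_body`, `dot_avx2` and `dot` are hypothetical, and the sketch repeats the detection on every call instead of caching the result.

```rust
// Hypothetical hand-written equivalent; NOT the macro's actual output.

// Shared body. `#[inline(always)]` lets it be inlined into the
// feature-enabled wrapper below, so it is compiled with AVX2 available there.
// A callee that is not inlined would keep the baseline feature set.
#[inline(always)]
fn dot_body(a: &[u32; 16], b: &[u32; 16]) -> u32 {
    a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
}

// Specialised copy: LLVM may assume AVX2 inside this function only.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[u32; 16], b: &[u32; 16]) -> u32 {
    dot_body(a, b)
}

// Outer function: checks the CPU at run-time, then picks an implementation.
pub fn dot(a: &[u32; 16], b: &[u32; 16]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was detected just above.
            return unsafe { dot_avx2(a, b) };
        }
    }
    // Generic fallback for other CPUs and architectures.
    dot_body(a, b)
}
```

The call into `dot_avx2` cannot be inlined into `dot`, because the caller does not have AVX2 enabled. This is the inline boundary mentioned in the notes.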
Example
#[maybe_special::make_special(
    x86 = ["avx512f", "avx512vl"],
    static x86 = ["sse4.1"],
    riscv = ["v"]
)]
pub fn fast_dot_product(a: [u32; 16], b: [u32; 16]) -> u32 {
    a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
}
§Use on types that use self/Self
To allow this macro to work anywhere, it must generate the specialisations
inside the outer function. This has the side-effect of not working
for functions that use self/Self (because the inner function doesn’t know
what Self is).
To get around this, you can do something like the following:
impl SomeType {
    fn clone_multiple(&self, num: usize) -> Vec<Self> {
        #[maybe_special::make_special(x86 = ["avx2"])]
        #[inline(always)]
        fn inner(val: &SomeType, num: usize) -> Vec<SomeType> {
            vec![val.clone(); num]
        }
        inner(self, num)
    }
}
§Manual specification implementations
If you wish to implement the specialisations manually, you can provide an
implementation yourself by putting => unsafe some_impl after the feature
set. Each impl must have the exact same function signature as the generic
impl. The unsafe keyword is required because you must ensure that this
impl always returns the same result as every other impl; otherwise it is
undefined behaviour and may cause hard-to-debug errors.
Note: It is not recommended to use manual implementations. LLVM tends to produce more optimised code than anything a human can produce.
Example
fn dot_product_avx2(a: [u32; 16], b: [u32; 16]) -> u32 {
    // Your impl here
    ...
}
#[maybe_special::make_special(
    static x86 = ["avx2"] => unsafe dot_product_avx2,
    static x86 = ["sse4.1"],
    riscv = ["v"]
)]
pub fn dot_product(a: [u32; 16], b: [u32; 16]) -> u32 {
    a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
}
§no_std support
By default, this macro utilises std::arch. This can be turned off
by disabling the std feature. When the std feature is disabled, the code
generated will instead use the unstable std_detect module, which must be
included manually.
§Dispatch types
When calling the outer function, this macro utilises a dispatch function to figure out which specialisation to use. The different dispatch methods are documented below.
Const dispatch
When applied to a const fn, this macro utilises the const_eval_select
compiler intrinsic to either branch to the inner impl at compile-time, or
the regular dynamic dispatch function at run-time. However, this
intrinsic is currently unstable, so you will need to add
#![feature(core_intrinsics, const_eval_select)] to your crate to use this.
Static dispatch
When the executable/library is being compiled with all checked features
enabled, this macro will skip dynamic dispatch and jump directly to the
inner impl. You can also mark a specialisation with the static keyword so
that it is dispatched statically whenever its own features are enabled at
compile-time, even if features it does not specify are not. This macro
will pick the first static-dispatchable specialisation that meets all its
criteria (or use dynamic dispatch if none meet their criteria at
compile-time).
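As a rough illustration of the idea, the following sketch skips run-time detection when a feature is already enabled for the whole compilation (for example via `-C target-feature=+sse4.1`). It is an assumption about the general shape, not this macro's generated code; `sum_body`, `sum_sse41`, `STATIC_SSE41` and `sum` are hypothetical names.

```rust
// Hypothetical sketch of static vs. dynamic dispatch; NOT the macro's output.

#[inline(always)]
fn sum_body(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, x| acc.wrapping_add(*x))
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse4.1")]
unsafe fn sum_sse41(xs: &[u32]) -> u32 {
    sum_body(xs)
}

// True only when SSE4.1 is enabled for the whole compilation.
const STATIC_SSE41: bool = cfg!(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "sse4.1"
));

pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if STATIC_SSE41 {
            // Static dispatch: the branch is a compile-time constant, so no
            // run-time check is needed.
            // SAFETY: the feature is guaranteed by the compilation flags.
            return unsafe { sum_sse41(xs) };
        }
        if std::arch::is_x86_feature_detected!("sse4.1") {
            // Dynamic dispatch: decided on the running CPU.
            // SAFETY: SSE4.1 support was detected just above.
            return unsafe { sum_sse41(xs) };
        }
    }
    sum_body(xs)
}
```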
Function pointer dispatch
This is the default dispatch method. This macro generates a static mutable function pointer that is called upon calling the outer function. On the first call, the pointer targets an initialiser function rather than a specialisation or the generic impl. The initialiser checks for all enabled features at run-time, determines the best specialisation to call, and saves the result so that all future calls are fast.
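The mechanism described above can be sketched by hand roughly as follows. This is an illustration under assumptions, not the macro's generated code: it stores the pointer in an `AtomicPtr` with relaxed ordering, and the names `SumFn`, `DISPATCH`, `init`, `select`, `sum_generic`, `sum_avx2` and `sum` are all hypothetical.

```rust
use std::mem;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical sketch of function pointer dispatch; NOT the macro's output.
type SumFn = unsafe fn(&[u32]) -> u32;

#[inline(always)]
fn sum_body(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, x| acc.wrapping_add(*x))
}

fn sum_generic(xs: &[u32]) -> u32 {
    sum_body(xs)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    sum_body(xs)
}

// Starts out pointing at the initialiser, so the first call does detection.
static DISPATCH: AtomicPtr<()> = AtomicPtr::new(init as *mut ());

// Picks the best implementation available on the running CPU.
fn select() -> SumFn {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return sum_avx2;
        }
    }
    sum_generic
}

// Initialiser: runs detection, caches the choice, then forwards the call.
fn init(xs: &[u32]) -> u32 {
    let chosen = select();
    DISPATCH.store(chosen as *mut (), Ordering::Relaxed);
    // SAFETY: `select` only returns functions whose features were detected.
    unsafe { chosen(xs) }
}

// Outer function: one atomic load, then an indirect call.
pub fn sum(xs: &[u32]) -> u32 {
    let raw = DISPATCH.load(Ordering::Relaxed);
    // SAFETY: DISPATCH only ever holds the address of `init` or of a function
    // returned by `select`, all of which take `&[u32]` and return `u32`.
    let f = unsafe { mem::transmute::<*mut (), SumFn>(raw) };
    unsafe { f(xs) }
}
```

After the first call, `DISPATCH` holds the selected implementation, so later calls skip detection entirely.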
Jump table dispatch
When applied to a function that contains generics, impl types, or is
async, function pointer dispatch will not work. This is because all types
must be specified exactly to generate a function pointer. async functions
desugar under the hood to returning an impl Future<Output = Ty>, which
makes them behave as if they were generic too. In these cases, this
macro falls back to a jump table dispatch method, where instead of utilising
a function pointer directly, it utilises an index into a jump table.
This dispatch method is almost identical to the function pointer method,
but can be a few cycles slower.
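A minimal sketch of index-based dispatch for a generic function, under assumptions: the cached value is a small index stored in an `AtomicU8`, and a `match` stands in for the jump table (whether the compiler lowers it to an actual table is up to LLVM). This is not the macro's generated code, and the names `CHOICE`, `detect`, `count_body`, `count_avx2` and `count` are hypothetical.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical sketch of jump table dispatch; NOT the macro's output.
const UNINIT: u8 = 0;
const GENERIC: u8 = 1;
const AVX2: u8 = 2;

// One cached index shared by every instantiation of the generic function.
static CHOICE: AtomicU8 = AtomicU8::new(UNINIT);

#[inline(always)]
fn count_body<T: PartialEq>(xs: &[T], needle: &T) -> usize {
    xs.iter().filter(|x| *x == needle).count()
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn count_avx2<T: PartialEq>(xs: &[T], needle: &T) -> usize {
    count_body(xs, needle)
}

// Runs feature detection once and caches the resulting index.
#[cold]
fn detect() -> u8 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let choice = if std::arch::is_x86_feature_detected!("avx2") {
        AVX2
    } else {
        GENERIC
    };
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let choice = GENERIC;
    CHOICE.store(choice, Ordering::Relaxed);
    choice
}

pub fn count<T: PartialEq>(xs: &[T], needle: &T) -> usize {
    let mut choice = CHOICE.load(Ordering::Relaxed);
    if choice == UNINIT {
        choice = detect();
    }
    // The index selects an arm; no function pointer type has to be named,
    // so this works for any `T`.
    match choice {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        // SAFETY: AVX2 is only stored after AVX2 support was detected.
        AVX2 => unsafe { count_avx2(xs, needle) },
        _ => count_body(xs, needle),
    }
}
```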
Attribute Macros§
- make_special - Refer to the crate-level documentation