//! [![github]](https://github.com/pypylia/maybe_special) [![crates-io]](https://crates.io/crates/maybe_special) [![docs-rs]](https://docs.rs/maybe_special) [free of syn](https://github.com/fasterthanlime/free-of-syn)
//!
//! [github]: https://img.shields.io/badge/github-8da0cb?style=for-the-badge&labelColor=555555&logo=github
//! [crates-io]: https://img.shields.io/badge/crates.io-fc8d62?style=for-the-badge&labelColor=555555&logo=rust
//! [docs-rs]: https://img.shields.io/badge/docs.rs-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs
//!
//! This crate provides the [`#[make_special]`](macro@make_special) attribute
//! macro to automatically create a series of target feature specialisations for
//! the given function. This behaves similarly to the Clang [`target_clones`]
//! attribute.
//!
//! [`target_clones`]: https://clang.llvm.org/docs/AttributeReference.html#target-clones
//!
//! ```toml
//! [dependencies]
//! maybe_special = "1.1"
//! ```
//!
//! *This crate is designed for Rust edition 2024 (rustc 1.85+).*
//!
//! # Usage
//! This macro takes in a series of specialisations in the form `arch =
//! ["feature1", "feature2", ...]`. This macro uses [`std::arch`]/[`std_detect`]
//! under the hood, so look at their documentation for more details, especially
//! since some architectures are currently unstable. Additionally,
//! specialisations can be marked with `static` to enable static dispatch on
//! them, which is explained below.
//!
//! <h5>Usage notes</h5>
//!
//! - This macro does not determine which specialisations are best for a given
//! function; you must still benchmark that yourself.
//! - Make sure to use this macro sparingly as it can paradoxically add a
//! significant performance overhead when applied improperly. This macro adds
//! an atomic memory read and creates an inline boundary for every function it
//! is applied to. Additionally, the initialisation run upon the first call of
//! the function can be comparatively quite slow due to performance issues
//! with [`std::arch`]/[`std_detect`]'s `is_*_feature_detected` macros.
//! - This macro only specialises the function it is applied to. If that
//! function calls another function which isn't inlined, the callee will not
//! be specialised.
//! - Under the hood this macro uses the [`#[target_feature]`](https://doc.rust-lang.org/reference/attributes/codegen.html#the-target_feature-attribute)
//! attribute which tells LLVM to output code as if those features were
//! enabled. However, it seems there is a bug where any form of [LTO] undoes
//! some feature-specific optimisations.
//!
//! [LTO]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
//!
//! <h5>Example</h5>
//!
//! ```
//! #[maybe_special::make_special(
//!     x86 = ["avx512f", "avx512vl"],
//!     static x86 = ["sse4.1"],
//!     riscv = ["v"]
//! )]
//! pub fn fast_dot_product(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
//! }
//! ```
//!
//! # Use on types that use `self`/`Self`
//! To work anywhere, this macro must generate the specialisations inside the
//! outer function. As a side effect, it does not work on methods that use
//! `self`/`Self`, because the inner functions don't know what `Self` is.
//!
//! To get around this, you can do something like the following:
//! ```
//! impl SomeType {
//!     fn clone_multiple(&self, num: usize) -> Vec<Self> {
//!         #[maybe_special::make_special(x86 = ["avx2"])]
//!         #[inline(always)]
//!         fn inner(val: &SomeType, num: usize) -> Vec<SomeType> {
//!             vec![val.clone(); num]
//!         }
//!
//!         inner(self, num)
//!     }
//! }
//! ```
//!
//! # Manual specialisation implementations
//! If you wish to implement a specialisation manually, you can provide your
//! own implementation by putting `=> unsafe some_impl` after the feature set.
//! Each impl must have exactly the same function signature as the generic
//! impl. The `unsafe` keyword is required because you must ensure that this
//! impl always returns the same result as every other impl; otherwise it is
//! [undefined behaviour] and may cause hard-to-debug errors.
//!
//! **Note: It is not recommended to use manual implementations. LLVM tends to
//! produce more optimised code than anything a human can produce.**
//!
//! [undefined behaviour]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html
//!
//! <h5>Example</h5>
//!
//! ```
//! fn dot_product_avx2(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     // Your impl here
//!     todo!()
//! }
//!
//! #[maybe_special::make_special(
//!     static x86 = ["avx2"] => unsafe dot_product_avx2,
//!     static x86 = ["sse4.1"],
//!     riscv = ["v"]
//! )]
//! pub fn dot_product(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
//! }
//! ```
//!
//! # `no_std` support
//! By default, this macro utilises [`std::arch`]. This can be turned off by
//! disabling the `std` feature, in which case the generated code will instead
//! use the unstable [`std_detect`] crate, which must be included manually.
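//!
//! As a sketch, assuming `std` is one of this crate's default features (the
//! text above implies this, but the manifest is not shown here), disabling it
//! in `Cargo.toml` would look like:
//!
//! ```toml
//! [dependencies]
//! maybe_special = { version = "1.1", default-features = false }
//! ```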
//!
//! # Dispatch types
//! When calling the outer function, this macro utilises a dispatch function to
//! figure out which specialisation to use. The different dispatch methods are
//! documented below.
//!
//! <h5>Const dispatch</h5>
//!
//! When applied to a `const fn`, this macro utilises the [`const_eval_select`]
//! compiler intrinsic to either branch to the inner impl at compile-time, or
//! the regular dynamic dispatch function at run-time. However, this
//! intrinsic is currently unstable, so you will need to add
//! `#![feature(core_intrinsics, const_eval_select)]` to your crate to use this.
//!
//! [`const_eval_select`]: core::intrinsics::const_eval_select
//!
//! <h5>Static dispatch</h5>
//!
//! When the executable/library is being compiled with all checked features
//! enabled, this macro will skip dynamic dispatch and jump directly to the
//! inner impl. You can also mark a specialisation with the `static` keyword to
//! have it dispatched this way whenever its own features are enabled at
//! compile-time, even if features it does not specify are not. This macro
//! will pick the first static-dispatchable specialisation that meets all its
//! criteria (or use dynamic dispatch if none meet their criteria at
//! compile-time).
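//!
//! A minimal hand-written sketch of the idea, not the code this macro
//! generates: `dot_dynamic` is a hypothetical stand-in for the run-time
//! dispatcher, and the `sse4.1` check mirrors a `static x86 = ["sse4.1"]`
//! specialisation. Because `cfg!` expands to a constant, the branch that is
//! not taken can be removed at compile-time.
//!
//! ```rust
//! // Generic body. When the crate is built with `sse4.1` enabled, this is
//! // already compiled with that feature available.
//! fn dot_generic(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
//! }
//!
//! // Hypothetical stand-in for the run-time (dynamic) dispatcher.
//! fn dot_dynamic(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     dot_generic(a, b)
//! }
//!
//! fn dot(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     // `cfg!` is resolved at compile-time to `true` or `false`.
//!     if cfg!(all(
//!         any(target_arch = "x86", target_arch = "x86_64"),
//!         target_feature = "sse4.1"
//!     )) {
//!         // Static dispatch: call the impl directly, no run-time check.
//!         dot_generic(a, b)
//!     } else {
//!         dot_dynamic(a, b)
//!     }
//! }
//! ```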
//!
//! <h5>Function pointer dispatch</h5>
//!
//! This is the default dispatch method. This macro generates a static mutable
//! function pointer that is called whenever the outer function is called. On
//! the first call, rather than pointing directly at a specialisation or the
//! generic impl, it points at an initialiser function that checks for all
//! enabled features at run-time and determines the best specialisation to
//! call. This result is saved so that all future calls are fast.
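//!
//! The mechanism can be sketched by hand as follows. This is an illustration
//! built on `AtomicPtr`, not the code this macro emits. Every name in it
//! (`DISPATCH`, `init`, `select`, `dot_avx2`) is hypothetical, and only an
//! AVX2 specialisation on `x86_64` is shown.
//!
//! ```rust
//! use std::mem::transmute;
//! use std::sync::atomic::{AtomicPtr, Ordering};
//!
//! type DotFn = fn([u32; 16], [u32; 16]) -> u32;
//!
//! // Starts out pointing at `init`, so the first call runs feature detection.
//! static DISPATCH: AtomicPtr<()> = AtomicPtr::new(init as DotFn as *mut ());
//!
//! fn dot_generic(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
//! }
//!
//! // Same body as the generic impl, but compiled with AVX2 enabled.
//! #[cfg(target_arch = "x86_64")]
//! #[target_feature(enable = "avx2")]
//! unsafe fn dot_avx2_raw(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
//! }
//!
//! // Safe shim so the specialisation has the plain `DotFn` type.
//! #[cfg(target_arch = "x86_64")]
//! fn dot_avx2(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     // SAFETY: `select` only returns this shim after detecting AVX2.
//!     unsafe { dot_avx2_raw(a, b) }
//! }
//!
//! // Picks the best available impl at run-time.
//! fn select() -> DotFn {
//!     #[cfg(target_arch = "x86_64")]
//!     {
//!         if std::arch::is_x86_feature_detected!("avx2") {
//!             return dot_avx2;
//!         }
//!     }
//!     dot_generic
//! }
//!
//! // Initialiser: caches the chosen impl, then forwards the first call to it.
//! fn init(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     let chosen = select();
//!     DISPATCH.store(chosen as *mut (), Ordering::Relaxed);
//!     chosen(a, b)
//! }
//!
//! fn dot(a: [u32; 16], b: [u32; 16]) -> u32 {
//!     let ptr = DISPATCH.load(Ordering::Relaxed);
//!     // SAFETY: `DISPATCH` only ever holds pointers cast from `DotFn` values.
//!     let f = unsafe { transmute::<*mut (), DotFn>(ptr) };
//!     f(a, b)
//! }
//! ```
//!
//! In this sketch, every call after the first costs one relaxed atomic load
//! plus an indirect call, which corresponds to the atomic read and inline
//! boundary mentioned in the usage notes.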
//!
//! <h5>Jump table dispatch</h5>
//!
//! When applied to a function that contains generics, `impl` types, or is
//! `async`, function pointer dispatch will not work, because all types must
//! be specified exactly to generate a function pointer. `async` functions
//! desugar under the hood to returning an `impl Future<Output = Ty>`, so they
//! also behave as if they were generic. In these cases this macro falls back
//! to a jump table dispatch method which, instead of utilising a function
//! pointer directly, utilises an index into a jump table. This dispatch
//! method is almost identical to the function pointer method, but can be a
//! few cycles slower.
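//!
//! One way to picture this, as a hand-written sketch rather than the macro's
//! actual output: a single type-independent index is cached, and a `match` on
//! it plays the role of the jump table, so each monomorphised copy of the
//! function can share it. The names `INDEX`, `detect` and `sum_avx2` are
//! hypothetical, and only an AVX2 specialisation on `x86_64` is shown.
//!
//! ```rust
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! const UNINIT: usize = 0;
//! const GENERIC: usize = 1;
//! const AVX2: usize = 2;
//!
//! // The cached table index; it does not depend on `T`.
//! static INDEX: AtomicUsize = AtomicUsize::new(UNINIT);
//!
//! fn sum_generic<T: Copy + Into<u64>>(xs: &[T]) -> u64 {
//!     xs.iter().map(|&x| Into::<u64>::into(x)).sum()
//! }
//!
//! // Same body as the generic impl, but compiled with AVX2 enabled.
//! #[cfg(target_arch = "x86_64")]
//! #[target_feature(enable = "avx2")]
//! unsafe fn sum_avx2<T: Copy + Into<u64>>(xs: &[T]) -> u64 {
//!     xs.iter().map(|&x| Into::<u64>::into(x)).sum()
//! }
//!
//! // Run-time feature detection, returning a table index.
//! fn detect() -> usize {
//!     #[cfg(target_arch = "x86_64")]
//!     {
//!         if std::arch::is_x86_feature_detected!("avx2") {
//!             return AVX2;
//!         }
//!     }
//!     GENERIC
//! }
//!
//! fn sum<T: Copy + Into<u64>>(xs: &[T]) -> u64 {
//!     let mut idx = INDEX.load(Ordering::Relaxed);
//!     if idx == UNINIT {
//!         // First call: detect once and cache the result.
//!         idx = detect();
//!         INDEX.store(idx, Ordering::Relaxed);
//!     }
//!     match idx {
//!         #[cfg(target_arch = "x86_64")]
//!         // SAFETY: `AVX2` is only stored after AVX2 was detected.
//!         AVX2 => unsafe { sum_avx2(xs) },
//!         _ => sum_generic(xs),
//!     }
//! }
//! ```
//!
//! Compared with the function pointer sketch, the extra `match` on the index
//! is where the few additional cycles would come from.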
//!
//! [`std_detect`]: https://doc.rust-lang.org/nightly/std_detect/index.html
extern crate proc_macro;
use TokenStream;
use ;
use ;
pub use Architecture;
pub use FnBuilder;
pub use Specialisation;
pub
/// Refer to the [crate-level documentation](crate)