#[autoversion]
Let the compiler auto-vectorize scalar code for each architecture.
Write a plain scalar function and let #[autoversion] generate
architecture-specific copies — each compiled with different
#[target_feature] flags via #[arcane] — plus a runtime dispatcher
that calls the best one the CPU supports.
§Quick start
```rust
use archmage::autoversion;

#[autoversion]
fn sum_of_squares(data: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for &x in data {
        sum += x * x;
    }
    sum
}

// Call directly — no token, no unsafe:
let result = sum_of_squares(&my_data);
```

Each variant gets #[arcane] → #[target_feature(enable = "avx2,fma,...")],
which unlocks the compiler’s auto-vectorizer for that feature set.
On x86-64, that loop compiles to vfmadd231ps. On aarch64, fmla.
The _scalar fallback compiles without SIMD target features.
§SimdToken — optional placeholder
You can optionally write _token: SimdToken as a parameter. The macro
recognizes it and strips it from the dispatcher — both forms produce
identical output. Prefer the tokenless form for new code.
```rust
#[autoversion]
fn normalize(_token: SimdToken, data: &mut [f32], scale: f32) {
    for x in data.iter_mut() { *x = (*x - 128.0) * scale; }
}

// Dispatcher is: fn normalize(data: &mut [f32], scale: f32)
```

§What gets generated
#[autoversion] fn process(data: &[f32]) -> f32 expands to:

- process_v4(token: X64V4Token, ...) — AVX-512
- process_v3(token: X64V3Token, ...) — AVX2+FMA
- process_neon(token: NeonToken, ...) — aarch64 NEON
- process_wasm128(token: Wasm128Token, ...) — WASM SIMD
- process_scalar(token: ScalarToken, ...) — no SIMD, always available
- process(data: &[f32]) -> f32 — dispatcher
Variants are private. The dispatcher gets the original function’s visibility. Within the same module, call variants directly for testing or benchmarking.
§Explicit tiers
```rust
#[autoversion(v3, v4, neon, arm_v2, wasm128)]
fn process(data: &[f32]) -> f32 { ... }
```

scalar is always included implicitly.
Default tiers: v4, v3, neon, wasm128, scalar.
Known tiers: v1, v2, v3, v3_crypto, v4, v4x, neon,
neon_aes, neon_sha3, neon_crc, arm_v2, arm_v3, wasm128,
wasm128_relaxed, x64_crypto, scalar.
§Methods
For inherent methods, self works naturally:
```rust
impl ImageBuffer {
    #[autoversion]
    fn normalize(&mut self, gamma: f32) {
        for pixel in &mut self.data {
            *pixel = (*pixel / 255.0).powf(gamma);
        }
    }
}

buffer.normalize(2.2);
```

For trait method delegation, use _self = Type (nested mode):
```rust
impl MyType {
    #[autoversion(_self = MyType)]
    fn compute_impl(&self, data: &[f32]) -> f32 {
        _self.weights.iter().zip(data).map(|(w, d)| w * d).sum()
    }
}
```

§Nesting with incant!
Hand-written SIMD for specific tiers, autoversion for the rest:
```rust
pub fn process(data: &[f32]) -> f32 {
    incant!(process(data), [v4, scalar])
}

#[arcane(import_intrinsics)]
fn process_v4(_t: X64V4Token, data: &[f32]) -> f32 { /* AVX-512 */ }

// Bridge: incant! passes ScalarToken, autoversion doesn't need one
fn process_scalar(_: ScalarToken, data: &[f32]) -> f32 {
    process_auto(data)
}

#[autoversion(v3, neon)]
fn process_auto(data: &[f32]) -> f32 { data.iter().sum() }
```

§Comparison with #[magetypes] + incant!
| | #[autoversion] | #[magetypes] + incant! |
|---|---|---|
| Generates variants + dispatcher | Yes | Variants only (+ separate incant!) |
| Body touched | No (signature only) | Yes (text substitution) |
| Best for | Scalar auto-vectorization | Hand-written SIMD types |