Attribute Macro autoversion 

#[autoversion]
Let the compiler auto-vectorize scalar code for each architecture.

Write a plain scalar function and let #[autoversion] generate architecture-specific copies — each compiled with different #[target_feature] flags via #[arcane] — plus a runtime dispatcher that calls the best one the CPU supports.

§Quick start

use archmage::autoversion;

#[autoversion]
fn sum_of_squares(data: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for &x in data {
        sum += x * x;
    }
    sum
}

// Call directly — no token, no unsafe:
let result = sum_of_squares(&my_data);

Each variant gets #[arcane] #[target_feature(enable = "avx2,fma,...")], which unlocks the compiler’s auto-vectorizer for that feature set. On x86-64, that loop compiles to vfmadd231ps. On aarch64, fmla. The _scalar fallback compiles without SIMD target features.
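To see what the macro automates, here is a hand-rolled sketch of the same pattern in plain std Rust: one #[target_feature] variant, a scalar fallback, and a runtime dispatcher. The function names, the x86-64-only cfg, and the avx2/fma checks are illustrative assumptions, not the macro’s actual output.

```rust
// Hand-rolled sketch of the pattern #[autoversion] generates (illustrative only).

// Variant compiled with extra target features, so LLVM may auto-vectorize
// the plain scalar loop. Calling it requires proving CPU support first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn sum_of_squares_v3(data: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for &x in data {
        sum += x * x;
    }
    sum
}

// Fallback compiled without SIMD target features — always available.
fn sum_of_squares_scalar(data: &[f32]) -> f32 {
    data.iter().map(|x| x * x).sum()
}

// Dispatcher: picks the best variant the running CPU supports.
fn sum_of_squares(data: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
        // Safe: we just verified the CPU supports the enabled features.
        return unsafe { sum_of_squares_v3(data) };
    }
    sum_of_squares_scalar(data)
}

fn main() {
    assert_eq!(sum_of_squares(&[1.0, 2.0, 3.0]), 14.0);
}
```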

§SimdToken — optional placeholder

You can optionally write _token: SimdToken as a parameter. The macro recognizes it and strips it from the dispatcher — both forms produce identical output. Prefer the tokenless form for new code.

#[autoversion]
fn normalize(_token: SimdToken, data: &mut [f32], scale: f32) {
    for x in data.iter_mut() { *x = (*x - 128.0) * scale; }
}
// Dispatcher is: fn normalize(data: &mut [f32], scale: f32)

§What gets generated

#[autoversion] fn process(data: &[f32]) -> f32 expands to:

  • process_v4(token: X64V4Token, ...) — AVX-512
  • process_v3(token: X64V3Token, ...) — AVX2+FMA
  • process_neon(token: NeonToken, ...) — aarch64 NEON
  • process_wasm128(token: Wasm128Token, ...) — WASM SIMD
  • process_scalar(token: ScalarToken, ...) — no SIMD, always available
  • process(data: &[f32]) -> f32 — dispatcher

Variants are private. The dispatcher gets the original function’s visibility. Within the same module, call variants directly for testing or benchmarking.
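The visibility rules can be sketched in plain Rust (the ScalarToken below is a stand-in struct for illustration, not archmage’s real token API): variants stay private to the module, while the dispatcher keeps the original function’s visibility.

```rust
mod pipeline {
    // Stand-in token type for illustration — not archmage's SimdToken API.
    pub struct ScalarToken;

    // Variant: private to the module, so tests and benches here
    // can call it directly, but outside callers cannot.
    fn process_scalar(_t: ScalarToken, data: &[f32]) -> f32 {
        data.iter().sum()
    }

    // Dispatcher: keeps the original function's visibility (`pub` here).
    // The real generated dispatcher would try faster variants first
    // via runtime CPU feature detection.
    pub fn process(data: &[f32]) -> f32 {
        process_scalar(ScalarToken, data)
    }
}

fn main() {
    // Callers outside the module see only the dispatcher.
    assert_eq!(pipeline::process(&[1.0, 2.0, 3.0]), 6.0);
}
```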

§Explicit tiers

#[autoversion(v3, v4, neon, arm_v2, wasm128)]
fn process(data: &[f32]) -> f32 { ... }

scalar is always included implicitly.

Default tiers: v4, v3, neon, wasm128, scalar.

Known tiers: v1, v2, v3, v3_crypto, v4, v4x, neon, neon_aes, neon_sha3, neon_crc, arm_v2, arm_v3, wasm128, wasm128_relaxed, x64_crypto, scalar.

§Methods

For inherent methods, self works naturally:

impl ImageBuffer {
    #[autoversion]
    fn normalize(&mut self, gamma: f32) {
        for pixel in &mut self.data {
            *pixel = (*pixel / 255.0).powf(gamma);
        }
    }
}
buffer.normalize(2.2);

For trait method delegation, use _self = Type (nested mode):

impl MyType {
    #[autoversion(_self = MyType)]
    fn compute_impl(&self, data: &[f32]) -> f32 {
        _self.weights.iter().zip(data).map(|(w, d)| w * d).sum()
    }
}

§Nesting with incant!

Hand-written SIMD for specific tiers, autoversion for the rest:

pub fn process(data: &[f32]) -> f32 {
    incant!(process(data), [v4, scalar])
}

#[arcane(import_intrinsics)]
fn process_v4(_t: X64V4Token, data: &[f32]) -> f32 { /* AVX-512 */ }

// Bridge: incant! passes ScalarToken, autoversion doesn't need one
fn process_scalar(_: ScalarToken, data: &[f32]) -> f32 {
    process_auto(data)
}

#[autoversion(v3, neon)]
fn process_auto(data: &[f32]) -> f32 { data.iter().sum() }

§Comparison with #[magetypes] + incant!

|                                 | #[autoversion]            | #[magetypes] + incant!             |
|---------------------------------|---------------------------|------------------------------------|
| Generates variants + dispatcher | Yes                       | Variants only (+ separate incant!) |
| Body touched                    | No (signature only)       | Yes (text substitution)            |
| Best for                        | Scalar auto-vectorization | Hand-written SIMD types            |