multiversed

Attribute macros wrapping multiversion with predefined SIMD target presets.

Why?

Writing multiversion target strings is tedious and error-prone:

```rust
// Without multiversed: verbose and hard to maintain
#[multiversion::multiversion(targets(
    "x86_64+sse+sse2+sse3+ssse3+sse4.1+sse4.2+popcnt+cmpxchg16b+avx+avx2+bmi1+bmi2+f16c+fma+lzcnt+movbe+xsave+fxsr+avx512f+avx512bw+avx512dq+avx512vl+avx512cd",
    "x86_64+sse+sse2+sse3+ssse3+sse4.1+sse4.2+popcnt+cmpxchg16b+avx+avx2+bmi1+bmi2+f16c+fma+lzcnt+movbe+xsave+fxsr"
))]
fn sum(data: &[f32]) -> f32 { data.iter().sum() }

// With multiversed: clean preset names
#[multiversed("x86-64-v4", "x86-64-v3")]
fn sum(data: &[f32]) -> f32 { data.iter().sum() }
```

Usage

```rust
use multiversed::multiversed;

// Use cargo feature defaults (x86-64-v3, aarch64-basic)
#[multiversed]
pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Explicit presets
#[multiversed("x86-64-v4", "aarch64-sve2")]
pub fn optimized_sum(data: &[f32]) -> f32 {
    data.iter().sum()
}

// Multiple tiers: the best tier the CPU supports is picked at runtime
#[multiversed("x86-64-v4", "x86-64-v3", "x86-64-v2")]
pub fn tiered_dispatch(data: &[f32]) -> f32 {
    data.iter().sum()
}

// Raw target strings (any string containing '+')
#[multiversed("x86_64+avx2+fma")]
pub fn custom_x86(data: &[f32]) -> f32 {
    data.iter().sum()
}

// Mix presets with raw target strings
#[multiversed("x86-64-v3", "x86_64+avx512f+avx512vbmi2", "aarch64-basic")]
pub fn mixed_targets(data: &[f32]) -> f32 {
    data.iter().sum()
}
```

Presets

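Each preset expands to a complete, self-contained feature list (for example, `x86-64-v3` also lists every `x86-64-v2` feature), so presets do not need to be combined to form a tier. The presets are based on the x86-64 psABI microarchitecture levels and ARM architecture versions.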

x86-64

| Preset | Key Features | Hardware |
|---|---|---|
| `x86-64-v2` | SSE4.2, POPCNT | Nehalem 2008+, Bulldozer 2011+ |
| `x86-64-v3` | AVX2, FMA, BMI1/2 | Haswell 2013+, Zen 1 2017+ |
| `x86-64-v4` | AVX-512 (F/BW/DQ/VL/CD) | Xeon 2017+, Zen 4 2022+ |

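Note: Intel consumer CPUs from 12th-15th gen (Alder Lake, Raptor Lake, Arrow Lake) do not support AVX-512, because their E-cores lack it. Current AVX-512 hardware is Intel Xeon servers, i9-X/Xeon-W workstations, and AMD Zen 4 and later; some older Intel client chips (Ice Lake, Tiger Lake, Rocket Lake) also support it.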
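To see which of these tiers the current machine meets, a sketch like the following checks the same features at runtime with the standard library's `is_x86_feature_detected!` macro. The helper `best_x86_preset` is hypothetical and not part of multiversed. The v3 and v4 lists follow the target strings shown at the top of this README; the v2 check approximates the psABI level and does not test LAHF/SAHF.

```rust
/// Hypothetical helper (not part of multiversed): the highest x86-64 preset
/// whose features this CPU reports at runtime, or `None`.
pub fn best_x86_preset() -> Option<&'static str> {
    #[cfg(target_arch = "x86_64")]
    {
        // x86-64-v2 (approximate: LAHF/SAHF is not checked).
        let v2 = is_x86_feature_detected!("sse3")
            && is_x86_feature_detected!("ssse3")
            && is_x86_feature_detected!("sse4.1")
            && is_x86_feature_detected!("sse4.2")
            && is_x86_feature_detected!("popcnt")
            && is_x86_feature_detected!("cmpxchg16b");
        // x86-64-v3 adds the AVX2-era features.
        let v3 = v2
            && is_x86_feature_detected!("avx")
            && is_x86_feature_detected!("avx2")
            && is_x86_feature_detected!("bmi1")
            && is_x86_feature_detected!("bmi2")
            && is_x86_feature_detected!("f16c")
            && is_x86_feature_detected!("fma")
            && is_x86_feature_detected!("lzcnt")
            && is_x86_feature_detected!("movbe")
            && is_x86_feature_detected!("xsave");
        // x86-64-v4 adds the AVX-512 subsets F/BW/DQ/VL/CD.
        let v4 = v3
            && is_x86_feature_detected!("avx512f")
            && is_x86_feature_detected!("avx512bw")
            && is_x86_feature_detected!("avx512dq")
            && is_x86_feature_detected!("avx512vl")
            && is_x86_feature_detected!("avx512cd");
        if v4 {
            return Some("x86-64-v4");
        }
        if v3 {
            return Some("x86-64-v3");
        }
        if v2 {
            return Some("x86-64-v2");
        }
    }
    // Not x86_64, or below x86-64-v2.
    None
}
```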

aarch64

| Preset | Key Features | Hardware |
|---|---|---|
| `aarch64-basic` | dotprod, fp16 | Neoverse N1, Cortex-A75+, Apple M1+, Snapdragon X |
| `aarch64-v84` | + sha3, fcma | Apple M1+, Snapdragon X, Neoverse V1+ |
| `aarch64-sve` | + SVE, i8mm, bf16 | Neoverse V1 (Graviton3) |
| `aarch64-sve2` | + SVE2, i8mm, bf16 | Neoverse N2/V2+ (Graviton4, Grace, Axion) |

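Note: SVE/SVE2 is mainly found on Neoverse server cores. Apple Silicon and Qualcomm Oryon do not implement SVE. Armv9 Cortex-A/X mobile cores (such as Cortex-A710 and Cortex-X2) implement SVE2, but whether it is exposed depends on the SoC and operating system.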
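The aarch64 tiers can be checked the same way with `std::arch::is_aarch64_feature_detected!`. This hypothetical helper tests only the key features named in the table, so it approximates the presets rather than reproducing their full feature lists, which this README does not spell out. It also assumes each tier requires the one before it.

```rust
/// Hypothetical helper (not part of multiversed): the highest aarch64 preset
/// whose key features this CPU reports at runtime, or `None`.
pub fn best_aarch64_preset() -> Option<&'static str> {
    #[cfg(target_arch = "aarch64")]
    {
        // aarch64-basic: dotprod, fp16.
        let basic = std::arch::is_aarch64_feature_detected!("dotprod")
            && std::arch::is_aarch64_feature_detected!("fp16");
        // aarch64-v84: adds sha3, fcma.
        let v84 = basic
            && std::arch::is_aarch64_feature_detected!("sha3")
            && std::arch::is_aarch64_feature_detected!("fcma");
        // aarch64-sve: adds SVE, i8mm, bf16.
        let sve = v84
            && std::arch::is_aarch64_feature_detected!("sve")
            && std::arch::is_aarch64_feature_detected!("i8mm")
            && std::arch::is_aarch64_feature_detected!("bf16");
        // aarch64-sve2: adds SVE2.
        let sve2 = sve && std::arch::is_aarch64_feature_detected!("sve2");
        if sve2 {
            return Some("aarch64-sve2");
        }
        if sve {
            return Some("aarch64-sve");
        }
        if v84 {
            return Some("aarch64-v84");
        }
        if basic {
            return Some("aarch64-basic");
        }
    }
    // Not aarch64, or missing the aarch64-basic features.
    None
}
```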

Dispatch Overhead

Benchmarks show no measurable overhead from feature string complexity:

| Configuration | Time (64 floats) |
|---|---|
| No multiversion | 15.4 ns |
| 1 feature | 15.8 ns |
| 27 features | 15.6 ns |
| 6 targets | 15.6 ns |

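The 0.2-0.4 ns difference is the cost of the indirect call through the dispatcher. The length of the feature list does not matter: CPU feature detection does run at runtime, but its result is cached after the first call, so later calls only pay for the indirect call.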
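The general dispatch pattern can be sketched by hand with the standard library: feature detection runs once, its result is stored in a `OnceLock`, and every call goes through a function pointer. This is a simplified illustration of the technique, not the code multiversion actually generates; the names `sum_generic`, `sum_avx2`, `select`, and `SUM_IMPL` are hypothetical.

```rust
use std::sync::OnceLock;

/// Signature shared by every version of the function.
type SumFn = fn(&[f32]) -> f32;

/// Portable version, used when no better target is available.
fn sum_generic(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Same source, compiled with AVX2 and FMA enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn sum_avx2(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Safe entry point stored in the dispatch slot.
#[cfg(target_arch = "x86_64")]
fn sum_avx2_entry(data: &[f32]) -> f32 {
    // SAFETY: only selected after avx2 and fma were detected at runtime.
    unsafe { sum_avx2(data) }
}

/// Runs CPU feature detection and picks the best available version.
fn select() -> SumFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return sum_avx2_entry;
        }
    }
    sum_generic
}

/// Cached choice: `select` runs at most once per process.
static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

/// Public entry point: a cached load plus one indirect call.
pub fn sum(data: &[f32]) -> f32 {
    let f: SumFn = *SUM_IMPL.get_or_init(select);
    f(data)
}
```

Every version has the same body; only the target features it is compiled with differ, which is exactly what multiversioning automates.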

Cargo Features

```toml
# Pick one of these entries for your [dependencies] section.

# Default: x86-64-v3 + aarch64-basic
multiversed = "0.1"

# Server-focused (AVX-512 + SVE2)
multiversed = { version = "0.1", default-features = false, features = ["x86-64-v4", "aarch64-sve2"] }

# Multiple tiers (runtime dispatch picks the best one)
multiversed = { version = "0.1", features = ["x86-64-v4"] }  # adds v4 to the default v3

# Disable multiversioning (debugging/profiling)
multiversed = { version = "0.1", features = ["force-disable"] }
```

Special Features

| Feature | Description |
|---|---|
| `force-disable` | Passes functions through unchanged. Useful for debugging or faster builds. |

wasm32

The multiversion crate does not support wasm32, because WebAssembly has no runtime feature detection. For wasm32 SIMD, enable the target feature at compile time instead:

```sh
RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown
```
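With `simd128` enabled at compile time, SIMD code can be selected with `#[cfg]` instead of runtime dispatch. The sketch below is hand-written with `std::arch::wasm32` intrinsics; it is not something multiversed generates. A scalar fallback covers every other build. The SIMD path adds elements in a different order than the scalar one, so results can differ in the last bits.

```rust
/// SIMD version, compiled only for wasm32 builds with simd128 enabled.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn sum(data: &[f32]) -> f32 {
    use std::arch::wasm32::{f32x4, f32x4_add, f32x4_extract_lane, f32x4_splat};

    let chunks = data.chunks_exact(4);
    let rest = chunks.remainder();
    let mut acc = f32x4_splat(0.0);
    for c in chunks {
        // Add four floats at a time into the vector accumulator.
        acc = f32x4_add(acc, f32x4(c[0], c[1], c[2], c[3]));
    }
    // Horizontal sum of the four lanes, then the scalar tail.
    let lanes = f32x4_extract_lane::<0>(acc)
        + f32x4_extract_lane::<1>(acc)
        + f32x4_extract_lane::<2>(acc)
        + f32x4_extract_lane::<3>(acc);
    lanes + rest.iter().sum::<f32>()
}

/// Scalar fallback for every other build.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn sum(data: &[f32]) -> f32 {
    data.iter().sum()
}
```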

Architecture Reference

See ARCH_TABLE.md for detailed CPU feature matrices covering:

x86-64: Intel (Nehalem → Arrow Lake), AMD (Bulldozer → Zen 5)
aarch64: Neoverse (N1/V1/N2/V2), Apple (M1-M5), Cortex (A75-X5), Qualcomm Oryon

How It Works

This crate generates #[multiversion::multiversion(targets(...))] attributes with architecture-appropriate target strings. The actual code generation and runtime dispatch are handled by the excellent multiversion crate.
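The expansion step can be pictured as a lookup from preset names to target strings, with raw strings passed through untouched. The sketch below is a hypothetical model, not the crate's actual source: it includes only the two x86-64 strings shown at the top of this README, and rejecting unknown names with an error is an assumption.

```rust
/// Hypothetical preset table (not multiversed's actual source).
const X86_64_V4: &str = "x86_64+sse+sse2+sse3+ssse3+sse4.1+sse4.2+popcnt+cmpxchg16b+avx+avx2+bmi1+bmi2+f16c+fma+lzcnt+movbe+xsave+fxsr+avx512f+avx512bw+avx512dq+avx512vl+avx512cd";
const X86_64_V3: &str = "x86_64+sse+sse2+sse3+ssse3+sse4.1+sse4.2+popcnt+cmpxchg16b+avx+avx2+bmi1+bmi2+f16c+fma+lzcnt+movbe+xsave+fxsr";

/// Maps a preset name to a target string; raw strings (containing '+')
/// pass through unchanged.
fn expand_target(name: &str) -> Option<&str> {
    if name.contains('+') {
        return Some(name);
    }
    match name {
        "x86-64-v4" => Some(X86_64_V4),
        "x86-64-v3" => Some(X86_64_V3),
        _ => None,
    }
}

/// Builds the text of the multiversion attribute for a list of names.
fn targets_attribute(names: &[&str]) -> Result<String, String> {
    let mut quoted = Vec::with_capacity(names.len());
    for name in names {
        // Assumption of this sketch: unknown preset names are an error.
        let target = expand_target(name).ok_or_else(|| format!("unknown preset: {name}"))?;
        quoted.push(format!("\"{target}\""));
    }
    Ok(format!("#[multiversion::multiversion(targets({}))]", quoted.join(", ")))
}
```

For example, `targets_attribute(&["x86_64+avx2+fma"])` yields `#[multiversion::multiversion(targets("x86_64+avx2+fma"))]`.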

Cross-compilation works correctly: cargo features control which targets are available, while `#[cfg_attr]` in the generated code applies only the targets that match the architecture being compiled for.
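The cross-compilation rule works like a filter: every target enabled through cargo features is a candidate, but only those belonging to the architecture being compiled for are kept. This sketch uses `cfg!` (which is also resolved at compile time) in place of the `#[cfg_attr]` attributes the crate generates; the helper names and the prefix-based architecture matching are assumptions.

```rust
/// Hypothetical: which architecture a preset or raw target string belongs to.
fn preset_arch(target: &str) -> Option<&'static str> {
    if target.starts_with("x86-64-") || target.starts_with("x86_64+") {
        Some("x86_64")
    } else if target.starts_with("aarch64-") || target.starts_with("aarch64+") {
        Some("aarch64")
    } else {
        None
    }
}

/// Keeps only the enabled targets that match the architecture being built.
fn targets_for_current_arch<'a>(enabled: &[&'a str]) -> Vec<&'a str> {
    let current = if cfg!(target_arch = "x86_64") {
        "x86_64"
    } else if cfg!(target_arch = "aarch64") {
        "aarch64"
    } else {
        // No presets for other architectures: functions stay unversioned.
        return Vec::new();
    };
    enabled
        .iter()
        .copied()
        .filter(|t| preset_arch(t) == Some(current))
        .collect()
}
```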

License

MIT OR Apache-2.0

multiversed 0.1.0