archmage 0.2.1

Safely invoke your intrinsic power, using the tokens granted to you by the CPU. Cast primitive magics faster than any mage alive.

archmage provides zero-cost capability tokens that prove CPU features are available at runtime, making raw SIMD intrinsics safe to call via the #[arcane] macro.

Quick Start

```toml
[dependencies]
archmage = "0.2"
safe_unaligned_simd = "0.2"  # For safe memory operations
```

```rust
use archmage::{Desktop64, SimdToken, arcane};
use std::arch::x86_64::*;

#[arcane]
fn square(_token: Desktop64, data: &[f32; 8]) -> [f32; 8] {
    let v = safe_unaligned_simd::x86_64::_mm256_loadu_ps(data);
    let squared = _mm256_mul_ps(v, v);
    let mut out = [0.0f32; 8];
    safe_unaligned_simd::x86_64::_mm256_storeu_ps(&mut out, squared);
    out
}

fn main() {
    if let Some(token) = Desktop64::summon() {
        let result = square(token, &[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
        println!("{:?}", result); // [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]
    }
}
```

How It Works

SIMD intrinsics are unsafe for two reasons:

  1. Feature availability: Calling AVX2 instructions on a CPU without AVX2 is undefined behavior
  2. Memory operations: Load/store intrinsics use raw pointers

archmage solves #1 with capability tokens, zero-sized types that can only be created after runtime CPU detection succeeds:

```rust
// summon() checks CPUID and returns Some only if features are available
if let Some(token) = Desktop64::summon() {
    // Token exists = CPU definitely has AVX2 + FMA
}
```
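The capability-token idea itself needs no library support. The sketch below uses only std: a zero-sized struct with a private field, so nothing outside its module can construct it, plus a `summon()` that runs `is_x86_feature_detected!`. The names `tokens` and `Avx2FmaProof` are hypothetical, and this is not how archmage is implemented internally.

```rust
mod tokens {
    /// Hypothetical stand-in for a token such as Desktop64.
    /// The private field stops code outside this module from writing
    /// `Avx2FmaProof { .. }`, so summon() is the only way to obtain one.
    #[derive(Clone, Copy, Debug)]
    pub struct Avx2FmaProof {
        _private: (),
    }

    impl Avx2FmaProof {
        /// Returns Some only if the running CPU (and OS) report AVX2 and FMA.
        #[cfg(target_arch = "x86_64")]
        pub fn summon() -> Option<Self> {
            if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
                Some(Avx2FmaProof { _private: () })
            } else {
                None
            }
        }

        /// On other architectures the type still exists but can never be created.
        #[cfg(not(target_arch = "x86_64"))]
        pub fn summon() -> Option<Self> {
            None
        }
    }
}

use tokens::Avx2FmaProof;
```

Because the type is zero-sized, passing the proof into every SIMD function adds no runtime cost; the only real work is the one-time detection inside `summon()`.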

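The #[arcane] macro transforms your function so that its body is compiled with the token's features enabled via #[target_feature]. Inside such a function, intrinsics that operate only on values (not raw pointers) are safe to call (Rust 1.85+):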

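```rust
#[arcane]
fn example(_token: Desktop64, data: &[f32; 8]) -> [f32; 8] {
    let v = safe_unaligned_simd::x86_64::_mm256_loadu_ps(data);  // Safe: takes a reference, not a raw pointer
    let result = _mm256_mul_ps(v, v);  // Safe: value-based, and the features are enabled by #[arcane]
    let mut out = [0.0f32; 8];
    safe_unaligned_simd::x86_64::_mm256_storeu_ps(&mut out, result);  // Safe: writes through &mut
    out
}
```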

For memory operations (#2), use the safe_unaligned_simd crate which provides reference-based alternatives.
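The underlying pattern can be approximated by hand with std alone: a safe outer function takes the token as proof, and an inner `#[target_feature]` function does the work inside a single `unsafe` block whose soundness rests on that proof. This is a sketch, not archmage's actual macro expansion. `Avx2FmaProof`, `square`, and `square_any` are hypothetical names, and the code calls the raw-pointer intrinsics from `std::arch` directly instead of safe_unaligned_simd so that it has no dependencies.

```rust
/// Hypothetical proof token; only summon() can construct it.
#[derive(Clone, Copy)]
pub struct Avx2FmaProof {
    _private: (),
}

impl Avx2FmaProof {
    #[cfg(target_arch = "x86_64")]
    pub fn summon() -> Option<Self> {
        let ok = is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma");
        if ok { Some(Self { _private: () }) } else { None }
    }

    #[cfg(not(target_arch = "x86_64"))]
    pub fn summon() -> Option<Self> {
        None
    }
}

/// Safe to call: holding an Avx2FmaProof means AVX2 + FMA were detected at runtime.
#[cfg(target_arch = "x86_64")]
pub fn square(_token: Avx2FmaProof, data: &[f32; 8]) -> [f32; 8] {
    // Compiled with AVX2 + FMA enabled; calling it on a CPU without them would be UB.
    #[target_feature(enable = "avx2,fma")]
    unsafe fn inner(data: &[f32; 8]) -> [f32; 8] {
        use std::arch::x86_64::*;
        let mut out = [0.0f32; 8];
        unsafe {
            // Unaligned load/store through pointers to valid 8-element arrays.
            let v = _mm256_loadu_ps(data.as_ptr());
            _mm256_storeu_ps(out.as_mut_ptr(), _mm256_mul_ps(v, v));
        }
        out
    }

    // SAFETY: the token can only exist if summon() confirmed AVX2 + FMA.
    unsafe { inner(data) }
}

/// Uses the SIMD path when a token is available, otherwise plain scalar code.
pub fn square_any(data: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = Avx2FmaProof::summon() {
            return square(token, data);
        }
    }
    data.map(|x| x * x)
}
```

The point of the split is that the only `unsafe` in the public path is justified by a value the caller could not have obtained without passing the runtime check.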

Token Reference

x86-64 Tokens

Start with Desktop64 for most applications:

| Token | Features | CPU Support |
|---|---|---|
| Desktop64 | AVX2 + FMA + BMI2 | Intel Haswell 2013+, AMD Zen 1 2017+ |
| X64V2Token | SSE4.2 + POPCNT | Intel Nehalem 2008+, AMD Bulldozer 2011+ |
| X64V3Token | AVX2 + FMA + BMI2 | Same as Desktop64 (alias) |

Individual feature tokens for fine-grained control:

| Token | Features |
|---|---|
| Avx2FmaToken | AVX2 + FMA |
| Avx2Token | AVX2 only |
| FmaToken | FMA only |
| AvxToken | AVX |
| Sse42Token | SSE4.2 |
| Sse41Token | SSE4.1 |

x86-64 AVX-512 Tokens (requires avx512 feature)

```toml
[dependencies]
archmage = { version = "0.2", features = ["avx512"] }
```

| Token | Features | CPU Support |
|---|---|---|
| X64V4Token | AVX-512 F/BW/CD/DQ/VL | Intel Skylake-X 2017+, AMD Zen 4 2022+ |
| Avx512ModernToken | + VBMI2, VNNI, BF16, etc. | Intel Ice Lake 2019+, AMD Zen 4+ |
| Avx512Fp16Token | + FP16 | Intel Sapphire Rapids 2023+ |

Note: Intel 12th-14th gen consumer CPUs do NOT have AVX-512.
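Since AVX-512 support varies even across recent CPUs from the same vendor, it has to be checked at runtime. The std-only sketch below tests the subsets listed for X64V4Token (F, BW, CD, DQ, VL). `has_avx512_v4_subset` and `pick_kernel` are hypothetical names, and archmage's own detection may check more than this.

```rust
/// True if the CPU and OS report every AVX-512 subset listed for X64V4Token.
#[cfg(target_arch = "x86_64")]
pub fn has_avx512_v4_subset() -> bool {
    is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512cd")
        && is_x86_feature_detected!("avx512dq")
        && is_x86_feature_detected!("avx512vl")
}

/// AVX-512 is x86-only, so other targets always report false.
#[cfg(not(target_arch = "x86_64"))]
pub fn has_avx512_v4_subset() -> bool {
    false
}

/// Picks a kernel label; a real program would call different functions instead.
pub fn pick_kernel() -> &'static str {
    if has_avx512_v4_subset() {
        "avx512"
    } else {
        "avx2-or-scalar"
    }
}
```

On the consumer CPUs mentioned above, `pick_kernel()` would be expected to return `"avx2-or-scalar"`, which is why an AVX-512 path should always sit next to a narrower fallback.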

ARM Tokens

| Token | Features | CPU Support |
|---|---|---|
| Arm64 | NEON | All AArch64 (baseline) |
| NeonToken | NEON | Same as Arm64 (alias) |
| NeonAesToken | NEON + AES | ARM with crypto extensions |
| NeonSha3Token | NEON + SHA3 | ARMv8.2+ |
| ArmCryptoToken | AES + SHA2 + CRC | Most ARMv8 CPUs |
| ArmCrypto3Token | + SHA3 | ARMv8.4+ (M1/M2/M3, Graviton 2+) |

WASM Tokens

| Token | Features |
|---|---|
| Simd128Token | WASM SIMD |

Token Hierarchy

Tokens form a hierarchy. Higher-level tokens can extract lower-level ones:

```rust
if let Some(v3) = X64V3Token::summon() {
    let v2: X64V2Token = v3.v2();           // v3 implies v2
    let avx2_fma: Avx2FmaToken = v3.avx2_fma();
    let avx2: Avx2Token = v3.avx2();
    let fma: FmaToken = v3.fma();
    let sse42: Sse42Token = v3.sse42();
}
```
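The hierarchy is sound because conversion only goes downward: every CPU that passes the v3 check also passes the v2 check, so a v3 proof can become a v2 proof with no new detection, while going upward would need a fresh runtime check. A std-only sketch with hypothetical `V3Proof` and `V2Proof` types and a deliberately abbreviated feature list:

```rust
/// Hypothetical proof of SSE4.2 + POPCNT (an abbreviated x86-64-v2 check).
#[derive(Clone, Copy, Debug)]
pub struct V2Proof {
    _private: (),
}

/// Hypothetical proof of AVX2 + FMA + BMI2 on top of the v2 features.
#[derive(Clone, Copy, Debug)]
pub struct V3Proof {
    _private: (),
}

#[cfg(target_arch = "x86_64")]
fn cpu_has_v2() -> bool {
    is_x86_feature_detected!("sse4.2") && is_x86_feature_detected!("popcnt")
}

#[cfg(target_arch = "x86_64")]
fn cpu_has_v3() -> bool {
    // Including the v2 check here makes the downgrade below sound by construction.
    cpu_has_v2()
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("bmi2")
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_v2() -> bool {
    false
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_v3() -> bool {
    false
}

impl V2Proof {
    pub fn summon() -> Option<Self> {
        cpu_has_v2().then(|| V2Proof { _private: () })
    }
}

impl V3Proof {
    pub fn summon() -> Option<Self> {
        cpu_has_v3().then(|| V3Proof { _private: () })
    }

    /// Downgrade without a new runtime check: v3 detection already covered v2.
    pub fn v2(self) -> V2Proof {
        V2Proof { _private: () }
    }
}
// Deliberately no V2Proof -> V3Proof conversion: upgrading needs a fresh summon().
```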

Trait Bounds

Use trait bounds for generic SIMD code:

```rust
use archmage::{HasX64V2, SimdToken, arcane};

// Accept any token with at least v2 features
#[arcane]
fn process<T: HasX64V2>(_token: T, data: &[u8]) {
    // SSE4.2 intrinsics available
}
```
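A trait bound lets one function accept every token that implies a given level. In the std-only sketch below, a hypothetical `HasV2` marker trait is implemented for both a v2 and a v3 proof type. The popcount body is plain scalar code; with archmage the function would be `#[arcane]` so that its body is compiled with the implied features enabled. All names here are illustrative.

```rust
/// Hypothetical marker trait: implemented only by proofs of SSE4.2 + POPCNT.
pub trait HasV2: Copy {}

#[derive(Clone, Copy)]
pub struct V2Proof {
    _private: (),
}

#[derive(Clone, Copy)]
pub struct V3Proof {
    _private: (),
}

impl HasV2 for V2Proof {}
impl HasV2 for V3Proof {} // v3 implies v2, so it satisfies the bound too

#[cfg(target_arch = "x86_64")]
fn cpu_has_v3() -> bool {
    is_x86_feature_detected!("sse4.2")
        && is_x86_feature_detected!("popcnt")
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("bmi2")
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_v3() -> bool {
    false
}

impl V3Proof {
    pub fn summon() -> Option<Self> {
        cpu_has_v3().then(|| V3Proof { _private: () })
    }

    pub fn v2(self) -> V2Proof {
        V2Proof { _private: () }
    }
}

/// Accepts any proof that implies at least the v2 level.
pub fn count_set_bits<T: HasV2>(_token: T, data: &[u8]) -> u32 {
    data.iter().map(|b| b.count_ones()).sum()
}
```

Passing a type that does not implement `HasV2` is a compile error, so the "at least v2" requirement is checked by the type system rather than at runtime.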

Available traits:

| Trait | Meaning |
|---|---|
| SimdToken | Base trait for all tokens |
| HasX64V2 | Has SSE4.2 + POPCNT |
| HasX64V4 | Has AVX-512 (requires avx512 feature) |
| Has128BitSimd | Has 128-bit vectors |
| Has256BitSimd | Has 256-bit vectors |
| Has512BitSimd | Has 512-bit vectors |
| HasNeon | Has ARM NEON |
| HasNeonAes | Has NEON + AES |
| HasNeonSha3 | Has NEON + SHA3 |

Cross-Platform Code

All tokens compile on all platforms. summon() returns None on unsupported architectures:

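```rust
use archmage::{Desktop64, Arm64, SimdToken};

// process_avx2, process_neon, and process_scalar are your own functions;
// the first two would typically be #[arcane] functions that take the token.
fn process(data: &mut [f32]) {
    if let Some(token) = Desktop64::summon() {
        process_avx2(token, data);
    } else if let Some(token) = Arm64::summon() {
        process_neon(token, data);
    } else {
        process_scalar(data);
    }
}
```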
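For comparison, the same dispatch shape written directly against std's detection macros makes explicit the `cfg` gating that a cross-platform token type hides. In this sketch `Path`, `detect_path`, and `scale` are hypothetical, and every branch runs the same scalar code so the example stays portable; real branches would call the architecture-specific implementations.

```rust
/// Which implementation the dispatcher chose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    Avx2Fma,
    Neon,
    Scalar,
}

/// Runtime detection; each block only compiles on its own architecture.
pub fn detect_path() -> Path {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return Path::Avx2Fma;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Path::Neon;
        }
    }
    Path::Scalar
}

fn scale_scalar(data: &mut [f32], factor: f32) {
    for x in data.iter_mut() {
        *x *= factor;
    }
}

/// Multiplies every element by `factor` and reports which path was selected.
pub fn scale(data: &mut [f32], factor: f32) -> Path {
    let path = detect_path();
    match path {
        // Placeholders: real code would call an AVX2/FMA or NEON kernel in these arms.
        Path::Avx2Fma => scale_scalar(data, factor),
        Path::Neon => scale_scalar(data, factor),
        Path::Scalar => scale_scalar(data, factor),
    }
    path
}
```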

SIMD Types

archmage provides token-gated SIMD types with ergonomic operators:

```rust
use archmage::{Desktop64, SimdToken, simd::f32x8};

if let Some(token) = Desktop64::summon() {
    let a = f32x8::splat(token, 2.0);
    let b = f32x8::from_array(token, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
    let c = a * b + a;  // Operators work naturally
    let result = c.sqrt();
    println!("{:?}", result.to_array());
}
```
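One way to read "token-gated" types: if every constructor demands a token, then any existing vector value already implies the proof, so operators and methods need no token argument. The sketch below illustrates that design with hypothetical `Proof` and `F32x8` types backed by a plain array so that it runs anywhere; it mirrors the API shape shown above but is not archmage's implementation, which would hold a real SIMD register.

```rust
mod simd {
    use std::ops::{Add, Mul};

    /// Hypothetical proof token standing in for Desktop64.
    #[derive(Clone, Copy)]
    pub struct Proof {
        _private: (),
    }

    impl Proof {
        #[cfg(target_arch = "x86_64")]
        pub fn summon() -> Option<Self> {
            let ok = is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma");
            ok.then(|| Proof { _private: () })
        }

        #[cfg(not(target_arch = "x86_64"))]
        pub fn summon() -> Option<Self> {
            None
        }
    }

    /// Hypothetical 8-lane vector. The field is private to this module,
    /// so outside code can only build one through a token-taking constructor.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct F32x8([f32; 8]);

    impl F32x8 {
        pub fn splat(_token: Proof, value: f32) -> Self {
            F32x8([value; 8])
        }
        pub fn from_array(_token: Proof, lanes: [f32; 8]) -> Self {
            F32x8(lanes)
        }
        pub fn to_array(self) -> [f32; 8] {
            self.0
        }
        pub fn sqrt(self) -> Self {
            F32x8(self.0.map(f32::sqrt))
        }
    }

    impl Add for F32x8 {
        type Output = Self;
        fn add(self, rhs: Self) -> Self {
            let mut lanes = self.0;
            for (l, r) in lanes.iter_mut().zip(rhs.0) {
                *l += r;
            }
            F32x8(lanes)
        }
    }

    impl Mul for F32x8 {
        type Output = Self;
        fn mul(self, rhs: Self) -> Self {
            let mut lanes = self.0;
            for (l, r) in lanes.iter_mut().zip(rhs.0) {
                *l *= r;
            }
            F32x8(lanes)
        }
    }
}

use simd::{F32x8, Proof};
```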

Available Types

| Width | Float | Signed Int | Unsigned Int | Token Required |
|---|---|---|---|---|
| 128-bit | f32x4, f64x2 | i8x16, i16x8, i32x4, i64x2 | u8x16, u16x8, u32x4, u64x2 | Sse41Token |
| 256-bit | f32x8, f64x4 | i8x32, i16x16, i32x8, i64x4 | u8x32, u16x16, u32x8, u64x4 | Avx2FmaToken |
| 512-bit | f32x16, f64x8 | i8x64, i16x32, i32x16, i64x8 | u8x64, u16x32, u32x16, u64x8 | Avx512Token |

Operations

Construction (requires token): splat, from_array, load, zero

Extraction: to_array, as_array, store, raw

Arithmetic: +, -, *, / and assignment variants

Bitwise: &, |, ^ and assignment variants

Math (float): sqrt, abs, floor, ceil, round, min, max, clamp, mul_add, mul_sub, recip, rsqrt

Transcendentals (float): log2_lowp, log2_midp, exp2_lowp, exp2_midp, ln_lowp, ln_midp, exp_lowp, exp_midp, pow_lowp, pow_midp, cbrt_midp

Comparison: simd_eq, simd_ne, simd_lt, simd_le, simd_gt, simd_ge

Reduction: reduce_add, reduce_min, reduce_max

Integer: shl::<N>, shr::<N>, shr_arithmetic::<N>
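For signed lanes, the two right shifts differ only in what fills the vacated high bits. Assuming, as the naming suggests, that `shr::<N>` is a logical shift (zero fill) and `shr_arithmetic::<N>` copies the sign bit, their per-lane behavior matches the scalar reference below, which can also serve as a test oracle for SIMD results. The function names are hypothetical.

```rust
/// Logical right shift per lane: vacated high bits become 0.
/// N must be < 32; larger shift amounts overflow (a panic in debug builds).
pub fn shr_lanes<const N: u32>(lanes: [i32; 4]) -> [i32; 4] {
    lanes.map(|x| ((x as u32) >> N) as i32)
}

/// Arithmetic right shift per lane: vacated high bits copy the sign bit.
pub fn shr_arithmetic_lanes<const N: u32>(lanes: [i32; 4]) -> [i32; 4] {
    lanes.map(|x| x >> N)
}

/// Left shift per lane; bits shifted past the top are discarded.
pub fn shl_lanes<const N: u32>(lanes: [i32; 4]) -> [i32; 4] {
    lanes.map(|x| x << N)
}
```

For example, shifting `-8` right by one gives `-4` arithmetically but `2147483644` logically, because the logical shift clears the sign bit.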

Feature Flags

| Feature | Description |
|---|---|
| std (default) | Standard library support |
| macros (default) | #[arcane] macro |
| avx512 | AVX-512 tokens |
| __composite | Transpose, dot product (unstable) |
| __wide | wide crate integration (unstable) |

Testing Fallback Paths

Set ARCHMAGE_DISABLE=1 to force summon() to return None:

```shell
ARCHMAGE_DISABLE=1 cargo test
```
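The switch lets one test suite cover both branches: run it once normally and once with ARCHMAGE_DISABLE=1, and compare every path against a scalar reference. The std-only sketch below honors a variable of the same name in a hypothetical dispatcher. `simd_disabled`, `sum_squares`, and `sum_squares_scalar` are illustrative names, and archmage's own parsing of the variable (for example, whether values other than `1` count) may differ.

```rust
use std::env;

/// Hypothetical kill switch: true when ARCHMAGE_DISABLE is set to exactly "1".
fn simd_disabled() -> bool {
    matches!(env::var("ARCHMAGE_DISABLE").as_deref(), Ok("1"))
}

#[cfg(target_arch = "x86_64")]
fn cpu_has_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    false
}

/// Scalar reference implementation; every other path must agree with it.
fn sum_squares_scalar(data: &[f32]) -> f32 {
    data.iter().map(|x| x * x).sum::<f32>()
}

/// Returns the result plus the name of the path taken, so test logs show which branch ran.
pub fn sum_squares(data: &[f32]) -> (f32, &'static str) {
    if !simd_disabled() && cpu_has_avx2() {
        // Placeholder: a real build would call a token-gated AVX2 kernel here.
        (sum_squares_scalar(data), "simd")
    } else {
        (sum_squares_scalar(data), "scalar")
    }
}
```

A real SIMD reduction usually adds elements in a different order than the scalar loop, so floating-point comparisons between the two paths generally need a tolerance rather than exact equality.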

License

MIT OR Apache-2.0

AI-Generated Code Notice

Developed with Claude (Anthropic). Review critical paths before production use.