# archmage
Safely invoke your intrinsic power, using the tokens granted to you by the CPU.
Zero overhead. Archmage generates identical assembly to hand-written unsafe code. The safety abstractions exist only at compile time—at runtime, you get raw SIMD instructions with no wrapper overhead.
```toml
[dependencies]
archmage = "0.5"
magetypes = "0.5"
```
## Raw intrinsics with `#[arcane]`
```rust
use archmage::prelude::*;
```
`summon()` checks CPUID. `#[arcane]` enables `#[target_feature]`, making intrinsics safe (Rust 1.85+). The prelude re-exports `safe_unaligned_simd` functions directly: `_mm256_loadu_ps` takes `&[f32; 8]`, not a raw pointer. Compile with `-C target-cpu=haswell` to elide the runtime check.
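A minimal sketch of the pattern (the function and variable names here are illustrative, and `summon()` is assumed to be an associated constructor returning `Option<Self>`):

```rust
use archmage::prelude::*;

// Entry point: the token parameter proves AVX2 was detected.
#[arcane]
fn add_arrays(_token: Desktop64, a: &[f32; 8], b: &[f32; 8], out: &mut [f32; 8]) {
    // Reference-based loads/stores from safe_unaligned_simd, re-exported by the prelude.
    let va = _mm256_loadu_ps(a);
    let vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
}

fn main() {
    let (a, b) = ([1.0_f32; 8], [2.0_f32; 8]);
    let mut out = [0.0_f32; 8];
    if let Some(token) = Desktop64::summon() {
        // CPUID confirmed AVX2/FMA/BMI2 are present.
        add_arrays(token, &a, &b, &mut out);
    }
}
```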
## Inner helpers with `#[rite]`
`#[rite]` should be your default; use `#[arcane]` only at entry points. A sketch of the shape (function names illustrative):
```rust
use archmage::prelude::*;

// Entry point: use #[arcane]
#[arcane]
fn double_all(token: Desktop64, data: &mut [f32; 8]) {
    let v = _mm256_loadu_ps(data);
    _mm256_storeu_ps(data, double(token, v));
}

// Inner helper: use #[rite] (no wrapper overhead)
#[rite]
fn double(_token: Desktop64, v: __m256) -> __m256 {
    _mm256_add_ps(v, v)
}
```
`#[rite]` adds `#[target_feature]` + `#[inline]` without a wrapper function. Since Rust 1.85, calling `#[target_feature]` functions from matching contexts is safe, so no `unsafe` is needed between `#[arcane]` and `#[rite]` functions.
**Performance rule:** never call `#[arcane]` from `#[arcane]`. Use `#[rite]` for any function called exclusively from SIMD code.
### Why this matters
Processing 1000 8-float vector additions:
| Pattern | Time |
|---|---|
| `#[rite]` helper called from `#[arcane]` | 572 ns |
| `#[arcane]` called from loop | 2320 ns (4x slower) |
The difference is wrapper overhead: `#[rite]` inlines fully, while `#[arcane]` generates an inner function call per invocation.
## SIMD types with `magetypes`
```rust
use archmage::prelude::*;
use magetypes::f32x8;
```
`f32x8` wraps `__m256` with token-gated construction and natural operators.
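A sketch of how that reads in practice (the constructor and accessor names below are assumptions for illustration, not necessarily `magetypes`' exact API):

```rust
use archmage::prelude::*;
use magetypes::f32x8;

#[arcane]
fn axpy(token: Desktop64, a: f32, x: &[f32; 8], y: &[f32; 8]) -> [f32; 8] {
    // Token-gated construction: building an f32x8 requires proof that AVX2 exists.
    let a = f32x8::splat(token, a);
    let x = f32x8::load(token, x);
    let y = f32x8::load(token, y);
    // Natural operators instead of raw _mm256_* intrinsic calls.
    (a * x + y).to_array()
}
```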
## Runtime dispatch with `incant!`
Write platform-specific variants with concrete types, then dispatch at runtime. The shape looks roughly like this (bodies and the exact `incant!` invocation are abbreviated):
```rust
use archmage::prelude::*;
use magetypes::f32x8;

const LANES: usize = 8;

/// AVX2 path — processes 8 floats at a time.
#[arcane]
fn add_v3(token: Desktop64, a: &[f32], b: &[f32], out: &mut [f32]) {
    // Chunked f32x8 loop over LANES elements at a time (body elided).
}

/// Scalar fallback.
fn add_scalar(_token: ScalarToken, a: &[f32], b: &[f32], out: &mut [f32]) {
    for i in 0..out.len() {
        out[i] = a[i] + b[i];
    }
}

/// Public API — dispatches to the best available at runtime.
pub fn add(a: &[f32], b: &[f32], out: &mut [f32]) {
    // Illustrative invocation: incant! resolves to add_v3/add_scalar via CPUID.
    incant!(add(a, b, out))
}
```
`incant!` looks for `_v3`, `_v4`, `_neon`, `_wasm128`, and `_scalar`-suffixed functions and dispatches to the best one the CPU supports. Each variant uses concrete SIMD types for its platform; the scalar fallback uses plain math.
### `#[magetypes]` for simple cases
If your function body doesn't use SIMD types (only `Token`), `#[magetypes]` can generate the variants for you by replacing `Token` with the concrete token type for each platform:
```rust
use archmage::magetypes;
use archmage::prelude::*;

// Sketch: the body uses only `Token`, so #[magetypes] can stamp out the
// _v3/_neon/_wasm128/_scalar variants by substituting the concrete token type.
// (Function name and body are illustrative.)
#[magetypes]
fn checksum(_token: Token, data: &[u8]) -> u64 {
    data.iter().fold(0u64, |h, &b| h.wrapping_mul(31).wrapping_add(b as u64))
}
```
For functions that use platform-specific SIMD types (`f32x8`, `f32x4`, etc.), write the variants manually and use `incant!` as shown above.
## Tokens
| Token | Alias | Features |
|---|---|---|
| `X64V2Token` | | SSE4.2, POPCNT |
| `X64V3Token` | `Desktop64` | AVX2, FMA, BMI2 |
| `X64V4Token` | `Server64` | AVX-512 (requires the `avx512` feature) |
| `NeonToken` | `Arm64` | NEON |
| `Wasm128Token` | | WASM SIMD |
| `ScalarToken` | | Always available |
All tokens compile on all platforms; `summon()` returns `None` on unsupported architectures. Detection is cached: ~1.3 ns after the first call, and 0 ns with `-C target-cpu=haswell` (the check compiles away).
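So cross-platform selection can be written directly, with no `cfg` gymnastics; a sketch (again assuming `summon()` is an associated function returning `Option<Self>`):

```rust
use archmage::prelude::*;

fn best_backend() -> &'static str {
    // Every branch compiles on every target; summon() simply returns None
    // on CPUs or architectures that lack the features.
    if Desktop64::summon().is_some() {
        "AVX2"
    } else if Arm64::summon().is_some() {
        "NEON"
    } else {
        "scalar"
    }
}
```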
## The prelude
`use archmage::prelude::*` gives you:
- Tokens: `Desktop64`, `Arm64`, `ScalarToken`, etc.
- Traits: `SimdToken`, `IntoConcreteToken`, `HasX64V2`, etc.
- Macros: `#[arcane]`, `#[rite]`, `#[magetypes]`, `incant!`
- Intrinsics: `core::arch::*` for your platform
- Memory ops: `safe_unaligned_simd` functions (reference-based, no raw pointers)
## Feature flags
| Feature | Default | Description |
|---|---|---|
| `std` | yes | Standard library |
| `macros` | yes | `#[arcane]`, `#[magetypes]`, `incant!` |
| `safe_unaligned_simd` | yes | Re-exports via prelude |
| `avx512` | no | AVX-512 tokens |
## License
MIT OR Apache-2.0