archmage
Safely invoke your intrinsic power, using the tokens granted to you by the CPU.
Zero overhead. Archmage generates identical assembly to hand-written unsafe code. The safety abstractions exist only at compile time—at runtime, you get raw SIMD instructions. Calling an #[arcane] function costs exactly the same as calling a bare #[target_feature] function directly.
[dependencies]
archmage = "0.6"
magetypes = "0.6"
Raw intrinsics with #[arcane]
use archmage::prelude::*;
summon() checks CPUID. #[arcane] enables #[target_feature], making intrinsics safe (Rust 1.85+). The prelude re-exports safe_unaligned_simd functions directly — _mm256_loadu_ps takes &[f32; 8], not a raw pointer. Compile with -C target-cpu=haswell to elide the runtime check.
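For orientation, the underlying pattern (detect once, then enter a `#[target_feature]` function) can be sketched with the standard library alone. This is not archmage's API: `add8`, `add8_avx2`, and `add8_scalar` are hypothetical names, and the sketch uses raw-pointer loads inside `unsafe`, which is the boilerplate the token and macro are described as removing.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 body. Marked `unsafe` because the caller must prove AVX2 exists.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add8_avx2(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    // SAFETY: each array holds exactly 8 floats, so the unaligned
    // 256-bit loads and the store stay in bounds.
    unsafe {
        let sum = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Portable fallback used when AVX2 is missing or on other architectures.
fn add8_scalar(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| a[i] + b[i])
}

/// Safe entry point: checks the CPU at runtime, then picks a path.
pub fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { add8_avx2(a, b) };
        }
    }
    add8_scalar(a, b)
}
```

Calling `add8(&[1.0; 8], &[2.0; 8])` returns eight lanes of `3.0` on either path.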
Inner helpers with #[rite]
#[rite] should be your default. Use #[arcane] only at entry points.
use archmage::prelude::*;
// Entry point: use #[arcane]
// Inner helper: use #[rite] (inlines into #[arcane] — features match)
#[rite] adds #[target_feature] + #[inline] without a wrapper function. Since Rust 1.85+, calling #[target_feature] functions from matching contexts is safe—no unsafe needed between #[arcane] and #[rite] functions.
Performance rule: Never call #[arcane] from #[arcane]. Use #[rite] for any function called exclusively from SIMD code.
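The shape this rule asks for can be sketched without archmage, using only `std::arch`. The names `avx2_sketch`, `add_chunk`, `add_all`, and `add_slices` are hypothetical. The point is structural: the loop lives inside one feature-enabled entry function, and the per-chunk helper carries the same `#[target_feature]` set, so the compiler is free to inline it.

```rust
#[cfg(target_arch = "x86_64")]
mod avx2_sketch {
    use std::arch::x86_64::*;

    /// Inner helper (the role described for #[rite]): same features as its caller.
    #[inline]
    #[target_feature(enable = "avx2")]
    pub unsafe fn add_chunk(a: &[f32], b: &[f32], out: &mut [f32]) {
        debug_assert!(a.len() == 8 && b.len() == 8 && out.len() == 8);
        // SAFETY: the caller passes slices of exactly 8 floats.
        unsafe {
            let sum = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
            _mm256_storeu_ps(out.as_mut_ptr(), sum);
        }
    }

    /// Entry point (the role described for #[arcane]): the loop is inside the feature region.
    #[target_feature(enable = "avx2")]
    pub unsafe fn add_all(a: &[f32], b: &[f32], out: &mut [f32]) {
        let chunks = a.chunks_exact(8).zip(b.chunks_exact(8)).zip(out.chunks_exact_mut(8));
        for ((ca, cb), co) in chunks {
            // SAFETY: chunks_exact yields slices of length 8; AVX2 is enabled here.
            unsafe { add_chunk(ca, cb, co) };
        }
    }
}

/// Handles the largest multiple-of-8 prefix with SIMD; returns its length.
fn add_simd_prefix(a: &[f32], b: &[f32], out: &mut [f32]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            let n = a.len() - a.len() % 8;
            // SAFETY: AVX2 was detected, and all three slices have length n.
            unsafe { avx2_sketch::add_all(&a[..n], &b[..n], &mut out[..n]) };
            return n;
        }
    }
    let _ = (a, b, out); // nothing handled: no AVX2 or not x86-64
    0
}

/// Safe public function: one feature check, one boundary crossing, scalar tail.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    let done = add_simd_prefix(a, b, out);
    for i in done..a.len() {
        out[i] = a[i] + b[i];
    }
}
```

The anti-pattern would be to run the feature check and cross into `add_chunk` from ordinary code once per chunk, which is the per-iteration boundary the rule warns about.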
Why this matters
Processing 1000 8-float vector additions (full benchmark details):
| Pattern | Time | Why |
|---|---|---|
| #[rite] in #[arcane] | 547 ns | Features match; LLVM inlines |
| #[arcane] per iteration | 2209 ns (4x) | Target-feature boundary per call |
| Bare #[target_feature] (no archmage) | 2222 ns (4x) | Same boundary; archmage adds nothing |
The 4x penalty comes from LLVM's #[target_feature] optimization boundary, not from archmage. Bare #[target_feature] has the same cost. With real workloads (DCT-8), the boundary costs up to 6.2x. Use #[rite] for helpers called from SIMD code — it inlines into callers with matching features, eliminating the boundary.
SIMD types with magetypes
use ;
use f32x8;
f32x8 wraps __m256 with token-gated construction and natural operators.
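To make "token-gated construction" concrete, here is a minimal sketch of the idea in plain Rust. It is not the magetypes implementation: `Avx2Proof`, `F32x8Sketch`, and `add_lanes` are hypothetical, and the wrapper stores a `[f32; 8]` rather than `__m256` so the sketch compiles on every architecture. The invariant is that a value can only be built by someone holding a proof of detection, which is what makes the `unsafe` call inside the operator sound.

```rust
use std::ops::Add;

/// Hypothetical zero-sized proof that AVX2 was detected (stand-in for a token).
#[derive(Clone, Copy, Debug)]
pub struct Avx2Proof(());

impl Avx2Proof {
    /// Returns a proof only when the running CPU reports AVX2.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Some(Avx2Proof(()));
            }
        }
        None
    }
}

/// Hypothetical 8-lane vector; construction requires an `Avx2Proof`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8Sketch([f32; 8]);

impl F32x8Sketch {
    pub fn from_array(_proof: Avx2Proof, lanes: [f32; 8]) -> Self {
        Self(lanes)
    }
    pub fn to_array(self) -> [f32; 8] {
        self.0
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_lanes(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    use std::arch::x86_64::*;
    let mut out = [0.0f32; 8];
    // SAFETY: all three arrays hold exactly 8 floats.
    unsafe {
        let sum = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Never reached in practice (no proof can exist here), but keeps the sketch portable.
#[cfg(not(target_arch = "x86_64"))]
unsafe fn add_lanes(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    std::array::from_fn(|i| a[i] + b[i])
}

impl Add for F32x8Sketch {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // SAFETY: every F32x8Sketch was built from an Avx2Proof, so AVX2 is present.
        Self(unsafe { add_lanes(&self.0, &rhs.0) })
    }
}
```

In a real crate the tuple fields would stay private to their module, so `from_array` would be the only way to obtain a value.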
Runtime dispatch with incant!
Write platform-specific variants with concrete types, then dispatch at runtime:
use incant;
use f32x8;
const LANES: usize = 8;
/// AVX2 path — processes 8 floats at a time.
/// Scalar fallback.
/// Public API — dispatches to the best available at runtime.
incant! looks for _v3, _v4, _neon, _wasm128, and _scalar suffixed functions, and dispatches to the best one the CPU supports. Each variant uses concrete SIMD types for its platform; the scalar fallback uses plain math.
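As a rough picture of what suffix-based dispatch amounts to, the sketch below writes the dispatcher by hand with the standard library. It does not use `incant!` and makes no claim about the code the macro generates. `dot`, `dot_v3`, and `dot_scalar` are hypothetical names that follow the suffix convention described above.

```rust
const LANES: usize = 8;

/// Scalar fallback: plain math, works everywhere.
pub fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX2+FMA variant: 8 floats per step, scalar code for the tail.
/// Caller must guarantee AVX2 and FMA support and `a.len() == b.len()`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_v3(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let chunks = a.len() / LANES;
    let mut lanes = [0.0f32; LANES];
    // SAFETY: every load reads LANES floats starting at i * LANES, which is
    // in bounds because i < chunks and both slices have the same length.
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            let va = _mm256_loadu_ps(a.as_ptr().add(i * LANES));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * LANES));
            acc = _mm256_fmadd_ps(va, vb, acc);
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    let tail = chunks * LANES;
    lanes.iter().sum::<f32>() + dot_scalar(&a[tail..], &b[tail..])
}

/// Public API: picks the best variant the running CPU supports.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both features were detected and the lengths match.
            return unsafe { dot_v3(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

The two variants sum in a different order, so results can differ in the last bits for general inputs; a test that compares them should use a tolerance or exactly representable values.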
#[magetypes] for simple cases
If your function body doesn't use SIMD types (only Token), #[magetypes] can generate the variants for you by replacing Token with the concrete token type for each platform:
use magetypes;
For functions that use platform-specific SIMD types (f32x8, f32x4, etc.), write the variants manually and use incant! as shown above.
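The "one body, one variant per token" idea can be approximated with a declarative macro. This is only an analogy, not how `#[magetypes]` is implemented: `V3Sketch`, `ScalarSketch`, and `sum_squares_variants!` are hypothetical, and the generated functions here carry no `#[target_feature]` attributes.

```rust
/// Hypothetical stand-ins for platform token types.
#[derive(Clone, Copy)]
pub struct V3Sketch;
#[derive(Clone, Copy)]
pub struct ScalarSketch;

/// Stamps out one function per `name => TokenType` pair from a single body.
macro_rules! sum_squares_variants {
    ($( $name:ident => $token:ty ),+ $(,)?) => {
        $(
            pub fn $name(_token: $token, data: &[f32]) -> f32 {
                // The body never names a SIMD type, only the token parameter,
                // so the same text is valid for every token type.
                data.iter().map(|x| x * x).sum()
            }
        )+
    };
}

sum_squares_variants! {
    sum_squares_v3 => V3Sketch,
    sum_squares_scalar => ScalarSketch,
}
```

The suffixed names match what a suffix-based dispatcher would look for, which is why this style only fits bodies that do not depend on platform-specific vector types.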
Tokens
| Token | Alias | Features |
|---|---|---|
| X64V2Token | | SSE4.2, POPCNT |
| X64V3Token | Desktop64 | AVX2, FMA, BMI2 |
| X64V4Token | Server64 | AVX-512 (requires avx512 feature) |
| NeonToken | Arm64 | NEON |
| Wasm128Token | | WASM SIMD |
| ScalarToken | | Always available |
All tokens compile on all platforms. summon() returns None on unsupported architectures. Detection is cached: ~1.3 ns after first call, 0 ns with -Ctarget-cpu=haswell (compiles away).
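Cached detection with a compile-time shortcut can be sketched as follows. This is an illustration using `std::sync::OnceLock`, not archmage's internals; `has_v3` and `detect_avx2_fma` are hypothetical names. The `cfg!` branch is a compile-time constant, so when the binary is built with those features enabled the function can fold to `true`.

```rust
use std::sync::OnceLock;

/// Uncached check: compile-time knowledge first, then a runtime query.
pub fn detect_avx2_fma() -> bool {
    // Constant `true` when built with e.g. -Ctarget-cpu=haswell.
    if cfg!(all(target_feature = "avx2", target_feature = "fma")) {
        return true;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return true;
        }
    }
    false
}

/// Cached check: the detection runs once, later calls only read the stored flag.
pub fn has_v3() -> bool {
    static CACHE: OnceLock<bool> = OnceLock::new();
    *CACHE.get_or_init(detect_avx2_fma)
}
```

On architectures other than x86-64 both functions return `false`, which mirrors the statement that tokens compile everywhere but cannot be summoned where unsupported.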
The prelude
use archmage::prelude::* gives you:
- Tokens: Desktop64, Arm64, ScalarToken, etc.
- Traits: SimdToken, IntoConcreteToken, HasX64V2, etc.
- Macros: #[arcane], #[rite], #[magetypes], incant!
- Intrinsics: core::arch::* for your platform
- Memory ops: safe_unaligned_simd functions (reference-based, no raw pointers)
Testing SIMD dispatch paths
Every incant! dispatch and if let Some(token) = summon() branch creates a fallback path. You can test all of them on your native hardware — no cross-compilation needed.
Exhaustive permutation testing
for_each_token_permutation runs your closure once for every unique combination of token tiers, from "all SIMD enabled" down to "scalar only". It handles the disable/re-enable lifecycle, mutex serialization, cascade logic, and deduplication.
use ;
On an AVX-512 machine, this runs 5–7 permutations (all enabled → AVX-512 only → AVX2+FMA → SSE4.2 → scalar). On a Haswell-era CPU without AVX-512, 3 permutations. Tokens the CPU doesn't have are skipped — they'd produce duplicate states.
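The cascade behind these counts can be modelled with a few lines of plain Rust. This is a simplified model, not the crate's `for_each_token_permutation`: it assumes a single linear chain of tiers, whereas the real token set has additional tiers and branches, and `for_each_tier_state` is a hypothetical name.

```rust
/// Runs `f` once per reachable state, from "all available tiers enabled"
/// down to "scalar only" (an empty list). `available` is ordered lowest
/// tier first and contains only tiers the CPU actually has.
/// Returns the number of states visited.
pub fn for_each_tier_state(
    available: &[&'static str],
    mut f: impl FnMut(&[&'static str]),
) -> usize {
    // Disabling a tier also disables every tier above it, so each state
    // is a prefix of the list. Distinct prefixes are distinct states,
    // which is why tiers the CPU lacks would add no new states.
    for keep in (0..=available.len()).rev() {
        f(&available[..keep]);
    }
    available.len() + 1
}
```

With `available = ["v2", "v3"]`, as on a CPU without AVX-512, the closure sees `["v2", "v3"]`, then `["v2"]`, then `[]`: three states, matching the count given above for that case.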
Token disabling is process-wide, so run the tests on a single thread: cargo test -- --test-threads=1
CompileTimePolicy and -Ctarget-cpu
If you compiled with -Ctarget-cpu=native, the compiler bakes feature detection into the binary. summon() returns Some unconditionally, and tokens can't be disabled at runtime — the runtime check was compiled out.
The CompileTimePolicy enum controls what happens when for_each_token_permutation encounters these undisableable tokens:
- Warn: Exclude the token from permutations silently. Warnings are collected in the report.
- WarnStderr: Same, but also prints each warning to stderr with actionable fix instructions.
- Fail: Panic with the exact compiler flags needed to fix it.
For full coverage in CI, use the disable_compile_time_tokens feature. This makes compiled_with() return None even when features are baked in, so summon() uses runtime detection and tokens can be disabled:
# In your CI test configuration
[]
archmage = { version = "0.6", features = ["disable_compile_time_tokens"] }
Enforcing full coverage via env var
Wire an environment variable to switch between Warn in local development and Fail in CI:
use ;
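The environment-variable switch can be sketched as below. `PolicySketch` is a hypothetical local enum standing in for `CompileTimePolicy`, so the sketch compiles without the crate; only the variable name `ARCHMAGE_FULL_PERMUTATIONS` comes from this document. Treating an empty value or `0` as "off" is an assumption of this sketch.

```rust
use std::env;

/// Hypothetical stand-in for the two policies used here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PolicySketch {
    Warn,
    Fail,
}

/// Pure decision function, kept separate so it can be unit-tested
/// without touching the process environment.
pub fn policy_from_value(value: Option<&str>) -> PolicySketch {
    match value {
        // Assumption: any non-empty value other than "0" means strict mode.
        Some(v) if !v.is_empty() && v != "0" => PolicySketch::Fail,
        _ => PolicySketch::Warn,
    }
}

/// Reads ARCHMAGE_FULL_PERMUTATIONS and maps it to a policy.
pub fn policy_from_env() -> PolicySketch {
    policy_from_value(env::var("ARCHMAGE_FULL_PERMUTATIONS").ok().as_deref())
}
```

A test would pass the resulting policy to the permutation runner, so local runs tolerate baked-in features while CI refuses to skip any tier.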
Then in CI (with disable_compile_time_tokens enabled):
ARCHMAGE_FULL_PERMUTATIONS=1
If a token is still compile-time guaranteed (you forgot the feature or have stale RUSTFLAGS), Fail panics with the exact flags to fix it:
x86-64-v3: compile-time guaranteed, excluded from permutations. To include it, either:
1. Add `disable_compile_time_tokens` to archmage features in Cargo.toml
2. Remove `-Ctarget-cpu` from RUSTFLAGS
3. Compile with RUSTFLAGS="-Ctarget-feature=-avx2,-fma,-bmi1,-bmi2,-f16c,-lzcnt"
Manual single-token disable
For targeted tests that only need to disable one token:
use ;
Disabling cascades downward: disabling V2 also disables V3/V4/Modern/Fp16; disabling NEON also disables Aes/Sha3/Crc. dangerously_disable_tokens_except_wasm(true) disables everything at once.
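Why disabling is process-wide, and how a cascade falls out of it, can be illustrated with a single global cap. This models only the software switch and ignores what the CPU supports; `Tier`, `disable`, `enable_all`, and `is_enabled` are hypothetical and cover just the x86 chain named above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical linear tier chain; a higher value implies all lower ones.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Tier {
    Scalar = 0,
    V2 = 1,
    V3 = 2,
    V4 = 3,
}

/// Highest tier currently allowed. One global value shared by all threads,
/// which is why tests that change it must not run in parallel.
static MAX_ENABLED: AtomicU8 = AtomicU8::new(Tier::V4 as u8);

/// Disables `tier` and, by construction, every tier above it.
/// Scalar cannot be disabled: the cap never drops below 0.
pub fn disable(tier: Tier) {
    let cap = (tier as u8).saturating_sub(1);
    MAX_ENABLED.fetch_min(cap, Ordering::SeqCst);
}

/// Restores the default state with every tier allowed.
pub fn enable_all() {
    MAX_ENABLED.store(Tier::V4 as u8, Ordering::SeqCst);
}

/// A summon-style check would consult this before looking at the CPU.
pub fn is_enabled(tier: Tier) -> bool {
    tier as u8 <= MAX_ENABLED.load(Ordering::SeqCst)
}
```

After `disable(Tier::V2)`, `is_enabled` returns `false` for V2, V3, and V4 and `true` for Scalar, until `enable_all()` is called.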
Feature flags
| Feature | Default | Description |
|---|---|---|
| std | yes | Standard library |
| macros | yes | #[arcane], #[magetypes], incant! |
| safe_unaligned_simd | yes | Re-exports via prelude |
| avx512 | no | AVX-512 tokens |
License
MIT OR Apache-2.0