# archmage
Safely invoke your intrinsic power, using the tokens granted to you by the CPU. Cast primitive magics faster than any mage alive.
archmage provides capability tokens that prove CPU feature availability at runtime, making raw SIMD intrinsics safe to call via the `#[arcane]` macro.
## Quick Start

```rust
// Illustrative imports; the crate's actual module paths may differ.
use archmage::prelude::*;
use archmage::mem::avx;        // safe load/store (enabled by default)
use core::arch::x86_64::*;     // raw intrinsics for use inside #[arcane] functions
```
## How It Works

### The Problem

Raw SIMD intrinsics have two safety concerns:

- Feature availability: calling `_mm256_add_ps` on a CPU without AVX is undefined behavior
- Memory safety: `_mm256_loadu_ps(ptr)` dereferences a raw pointer

Rust 1.85+ made value-based intrinsics safe inside `#[target_feature]` functions, but calling those functions is still `unsafe` because the compiler can't verify the CPU supports the features.
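A minimal std-only illustration of that boundary (a sketch assuming an x86-64 target and a recent toolchain where safe `#[target_feature]` functions and safe value-based intrinsics are stable; `add8` and `sum_first` are hypothetical names):

```rust
use std::arch::x86_64::*;

// Inside a #[target_feature] function, value-based intrinsics are safe to call.
#[target_feature(enable = "avx2")]
fn add8(a: __m256, b: __m256) -> __m256 {
    _mm256_add_ps(a, b) // no `unsafe` needed here
}

fn sum_first(x: f32, y: f32) -> f32 {
    if is_x86_feature_detected!("avx2") {
        // Calling add8 is still `unsafe`: the compiler can't see our runtime check.
        unsafe {
            let v = add8(_mm256_set1_ps(x), _mm256_set1_ps(y));
            let mut out = [0.0f32; 8];
            _mm256_storeu_ps(out.as_mut_ptr(), v);
            out[0]
        }
    } else {
        x + y // scalar fallback
    }
}
```

archmage's tokens exist to discharge exactly that `unsafe` call obligation once, at detection time, instead of at every call site.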
### The Solution: Tokens + #[arcane]

archmage solves this with two components:

1. Capability Tokens - zero-sized proof types created after runtime CPU detection:

```rust
use archmage::Avx2Token; // illustrative path

// summon() checks CPUID and returns Some only if the features are available
// (the check is elided if compiled with -C target-cpu=native or similar)
if let Some(token) = Avx2Token::summon() {
    // `token` is compile-time-visible proof that AVX2 is supported
}
```
2. The `#[arcane]` Macro - transforms your function to enable `#[target_feature]`:

The macro generates:
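Roughly the following (a simplified sketch with a stub token type standing in for archmage's real `Avx2Token`; the actual expansion may differ):

```rust
use std::arch::x86_64::*;

// Stub stand-in for archmage's zero-sized Avx2Token proof type.
#[derive(Clone, Copy)]
pub struct Avx2Token;

// What you write with #[arcane]:
//     fn sum8(token: Avx2Token, a: __m256, b: __m256) -> __m256 {
//         _mm256_add_ps(a, b)
//     }
//
// Roughly what the macro generates:
pub fn sum8(token: Avx2Token, a: __m256, b: __m256) -> __m256 {
    #[target_feature(enable = "avx2")]
    fn inner(_token: Avx2Token, a: __m256, b: __m256) -> __m256 {
        _mm256_add_ps(a, b) // safe: the feature is statically enabled here
    }
    // SAFETY: a token can only be obtained from summon(), which verified AVX2.
    unsafe { inner(token, a, b) }
}
```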
Why is this safe?

- `inner()` has `#[target_feature(enable = "avx2")]`, so Rust allows intrinsics without `unsafe`
- Calling `inner()` is `unsafe`, but we know it's valid because:
  - The function requires a token parameter
  - Tokens can only be created via `summon()`, which checks CPU features
  - Therefore, if you have a token, the CPU supports the features
## Generic Token Bounds

Functions accept any token that provides the required capabilities:

```rust
use archmage::{arcane, HasAvx2, HasFma}; // illustrative paths

// Accept any token with AVX2 (Avx2Token, Desktop64, Server64, etc.)
#[arcane]
fn sum_avx2<T: HasAvx2>(token: T, data: &[f32]) { /* ... */ }

// Require multiple features with inline bounds
#[arcane]
fn mul_add<T: HasAvx2 + HasFma>(token: T, data: &[f32]) { /* ... */ }

// Where clause syntax
#[arcane]
fn mul_add2<T>(token: T, data: &[f32]) where T: HasAvx2 + HasFma { /* ... */ }
```
The trait hierarchy means broader tokens satisfy narrower bounds:

- `Desktop64` implements `HasAvx2`, `HasFma`, `HasSse42`, etc.
- `Server64` implements everything `Desktop64` does, plus `HasAvx512f`, etc.
## Choosing a Token

Start with `Desktop64` - it's the sweet spot for modern x86-64:

| Token | Features | Hardware Coverage |
|---|---|---|
| `Desktop64` | AVX2 + FMA + BMI2 | Intel Haswell (2013+), AMD Zen 1 (2017+); ~95% of x86-64 |
| `Server64` | `Desktop64` + AVX-512 | Intel Skylake-X (2017+), AMD Zen 4 (2022+) |
| `X64V2Token` | SSE4.2 + POPCNT | Intel Nehalem (2008+), AMD Bulldozer (2011+) |
For specific features:

| Token | Use Case |
|---|---|
| `Avx2Token` | Need AVX2 but not FMA |
| `Avx2FmaToken` | AVX2 + FMA (most floating-point SIMD) |
| `FmaToken` | FMA only |
| `Sse2Token` | Baseline x86-64 (always available) |
ARM tokens:

| Token | Features | Hardware |
|---|---|---|
| `NeonToken` | NEON | All AArch64 (baseline, including Apple M-series) |
| `SveToken` | SVE | Graviton 3, A64FX |
| `Sve2Token` | SVE2 | ARMv9: Graviton 4, Cortex-X2+ |
## Cross-Architecture Tokens

All token types are available on all architectures. This makes cross-platform code easier to write without `#[cfg]` guards everywhere:

```rust
use archmage::NeonToken; // illustrative path

// This compiles on ARM, x86, WASM - no #[cfg] needed!
fn neon_available() -> bool {
    NeonToken::summon().is_some()
}
```

- `summon()` returns `None` on unsupported architectures
- Rust's type system ensures intrinsic methods don't exist on the wrong arch
- You get compile errors if you try to use x86 intrinsics in ARM code
## Safe Memory Operations (mem module)

The `mem` module (enabled by default) provides safe load/store using references instead of raw pointers:

```rust
use archmage::{mem::avx, Avx2Token}; // illustrative paths

if let Some(token) = Avx2Token::summon() {
    let data = [1.0f32; 8];
    let v = avx::_mm256_loadu_ps(token, &data); // reference-based, no raw pointer
}
```
Available submodules:

| Module | Functions | Token Required |
|---|---|---|
| `mem::sse` | `_mm_loadu_ps`, `_mm_storeu_ps`, etc. | `impl HasSse` |
| `mem::sse2` | `_mm_loadu_pd`, `_mm_loadu_si128`, etc. | `impl HasSse2` |
| `mem::avx` | `_mm256_loadu_ps`, `_mm256_storeu_ps`, etc. | `impl HasAvx` |
| `mem::avx2` | `_mm256_loadu_si256`, etc. | `impl HasAvx2` |
| `mem::avx512f` | `_mm512_loadu_ps`, etc. | `impl HasAvx512f` |
| `mem::neon` | `vld1q_f32`, `vst1q_f32`, etc. | `impl HasNeon` |
The wrappers accept any compatible token (e.g., `Desktop64` works with `mem::avx` because it implements `HasAvx`).
## When to Use archmage

archmage is for when you need specific instructions that autovectorization won't produce:
- Complex shuffles and permutes
- Exact FMA sequences for numerical precision
- DCT butterflies and signal processing
- Gather/scatter operations
- Bit manipulation (BMI1/BMI2)
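For example, an exact fused multiply-add (one rounding step instead of two) is easy to pin down with intrinsics but hard to guarantee from autovectorized code. A raw `std::arch` sketch of the kind of kernel you would wrap with `#[arcane]` (assuming an x86-64 target with a recent toolchain; `fused` is a hypothetical name):

```rust
use std::arch::x86_64::*;

// Computes a*b + c with a single rounding step via the FMA instruction.
#[target_feature(enable = "avx2,fma")]
fn fused(a: f32, b: f32, c: f32) -> f32 {
    let v = _mm256_fmadd_ps(
        _mm256_set1_ps(a),
        _mm256_set1_ps(b),
        _mm256_set1_ps(c),
    );
    // Extract lane 0 of the 8-lane result.
    _mm_cvtss_f32(_mm256_castps256_ps128(v))
}
```

With archmage, the runtime `is_x86_feature_detected!` guard and the `unsafe` call would be replaced by an `Avx2FmaToken` parameter.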
For portable SIMD without manual intrinsics, use the `wide` crate instead.

| Approach | When to Use |
|---|---|
| `wide` | Portable code; let the compiler choose instructions |
| `archmage` | Need specific instructions, complex algorithms |
## Feature Flags

```toml
[dependencies]
archmage = "0.1"
```
| Feature | Description |
|---|---|
| `std` (default) | Enable standard library support |
| `macros` (default) | Enable the `#[arcane]` macro (alias: `#[simd_fn]`) |
| `safe_unaligned_simd` (default) | Safe load/store via references (exposed as the `mem` module) |

Unstable features (API may change):

| Feature | Description |
|---|---|
| `__composite` | Higher-level ops (transpose, dot product) |
| `__wide` | Integration with the `wide` crate |
## Testing Scalar Fallbacks

Set the `ARCHMAGE_DISABLE` environment variable to force scalar code paths:

```sh
# e.g. run your test suite with all SIMD paths disabled
ARCHMAGE_DISABLE=1 cargo test
```

```rust
// With ARCHMAGE_DISABLE set, this always takes the fallback path
if let Some(token) = Avx2Token::summon() {
    simd_path(token, data)
} else {
    scalar_fallback(data)
}
```
## Methods with Self Receivers

Methods with `self`, `&self`, or `&mut self` receivers are supported via the `_self = Type` argument.
Use `_self` in the function body instead of `self`:

```rust
use archmage::{arcane, Avx2Token}; // illustrative paths

struct Kernel {
    scale: f32,
}

impl Kernel {
    #[arcane(_self = Kernel)] // sketch of the `_self = Type` argument
    fn apply(&self, token: Avx2Token, x: f32) -> f32 {
        x * _self.scale // `_self`, not `self`
    }
}
```

Why `_self`? The macro generates an inner function where `self` becomes a regular parameter named `_self`. Using `_self` in your code reminds you that you're not using the normal `self` keyword.

All receiver types are supported: `self` (move), `&self` (ref), `&mut self` (mut ref).
## License

MIT OR Apache-2.0
## AI-Generated Code Notice

Developed with Claude (Anthropic). Not all code has been manually reviewed; review critical paths before production use.