# archmage
> Safely invoke your intrinsic power, using the tokens granted to you by the CPU. Cast primitive magics faster than any mage alive.
## CRITICAL: Documentation Examples
**ALWAYS use `archmage::mem` for load/store in examples.** The entire point of this crate is to make SIMD safe. Never write examples with `unsafe { _mm256_loadu_ps(ptr) }` - that defeats the purpose.
```rust
// WRONG - bypasses the safety archmage provides
let v = unsafe { _mm256_loadu_ps(data.as_ptr()) };
// CORRECT - use the safe mem wrappers
let v = avx::_mm256_loadu_ps(token, &data);
```
## Quick Start
```bash
cargo test # Run tests
cargo test --all-features # Test with all integrations
cargo clippy --all-features # Lint
cargo run -p xtask -- generate # Regenerate safe_unaligned_simd wrappers
```
## Core Insight: Rust 1.85+ Changed Everything
As of Rust 1.85, **value-based intrinsics are safe inside `#[target_feature]` functions**:
```rust
#[target_feature(enable = "avx2")]
unsafe fn example() {
let a = _mm256_setzero_ps(); // SAFE!
let b = _mm256_add_ps(a, a); // SAFE!
let c = _mm256_fmadd_ps(a, a, a); // SAFE!
// Only memory ops remain unsafe (raw pointers)
let v = unsafe { _mm256_loadu_ps(ptr) }; // Still needs unsafe
}
```
This means we **don't need to wrap** arithmetic, shuffle, compare, bitwise, or other value-based intrinsics. Only three pieces are needed:
1. **Tokens** - Prove CPU features are available
2. **`#[arcane]` macro** - Enable `#[target_feature]` via token proof
3. **Safe load/store** - Reference-based memory operations (optional)
## How `#[arcane]` Works
The macro generates an inner function with `#[target_feature]`:
```rust
// You write:
#[arcane]
fn my_kernel(token: impl HasAvx2, data: &[f32; 8]) -> [f32; 8] {
    let v = _mm256_setzero_ps(); // Safe!
    // ...
}

// Macro generates:
fn my_kernel(token: impl HasAvx2, data: &[f32; 8]) -> [f32; 8] {
    #[target_feature(enable = "avx2")]
    unsafe fn inner(data: &[f32; 8]) -> [f32; 8] {
        let v = _mm256_setzero_ps(); // Safe inside #[target_feature]!
        // ...
    }
    // SAFETY: Token proves CPU support was verified via summon()
    unsafe { inner(data) }
}
```
**Why is this safe?**
1. `inner()` has `#[target_feature]`, so intrinsics are safe inside
2. Calling `inner()` is unsafe, but valid because:
- The function requires a token parameter
- Tokens can only be created via `summon()` which checks CPUID
- If you have a token, the CPU supports the features
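The only place runtime detection happens is token construction. As a rough sketch of the pattern (illustrative only; archmage's actual token internals may differ), `summon()` can be thought of as a gate over `std::arch::is_x86_feature_detected!`:
```rust
// Hypothetical sketch of a token type, not archmage's actual implementation.
#[derive(Clone, Copy)]
pub struct Avx2Token(()); // private field: unconstructible outside this module

impl Avx2Token {
    /// Returns Some only if the running CPU reports AVX2 support.
    pub fn summon() -> Option<Self> {
        if std::arch::is_x86_feature_detected!("avx2") {
            Some(Avx2Token(()))
        } else {
            None
        }
    }
}
```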
## Generic Token Bounds
Use trait bounds to accept any compatible token:
```rust
#[arcane]
fn process(token: impl HasAvx2, data: &[f32; 8]) -> [f32; 8] {
    // Works with Avx2Token, X64V3Token, X64V4Token, etc.
}

#[arcane]
fn fma_kernel<T: HasAvx2 + HasFma>(token: T, a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    // Requires both AVX2 and FMA
}
```
**Recommended starting point:** `Desktop64` (alias for `X64V3Token`)
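In practice, any concrete token satisfying the bound is passed straight through. A brief usage sketch (assuming the kernel bodies above are filled in, and that tokens are `Copy`):
```rust
let data = [1.0f32; 8];
if let Some(token) = Desktop64::summon() {
    let out = process(token, &data);            // Desktop64 implements HasAvx2
    let fused = fma_kernel(token, &data, &out); // ...and HasFma
}
```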
## Friendly Aliases
Use these intuitive names instead of memorizing microarchitecture levels:
| Alias | Target | Features |
|-------|--------|----------|
| `Desktop64` | x86_64 desktops | AVX2 + FMA (Haswell 2013+, Zen 1+) |
| `Server64` | x86_64 servers | + AVX-512 (Xeon 2017+, Zen 4+) |
| `Arm64` | AArch64 | NEON (all 64-bit ARM) |
**Why these names?**
- `Desktop64` - Universal on modern desktops. Intel removed AVX-512 from consumer chips (12th-14th gen), so this is the safe choice.
- `Server64` - AVX-512 is reliable on Xeon servers, Intel HEDT, and AMD Zen 4+.
- `Arm64` - NEON is baseline on all AArch64, always available.
```rust
use archmage::{Desktop64, SimdToken, arcane};

#[arcane]
fn process(token: Desktop64, data: &mut [f32; 8]) {
    // AVX2 + FMA intrinsics safe here
}

let mut data = [0.0f32; 8];
if let Some(token) = Desktop64::summon() {
    process(token, &mut data);
}
```
## Directory Structure
```
src/
├── lib.rs # Main exports
├── tokens/
│ ├── mod.rs # SimdToken trait, marker traits (HasAvx2, etc.)
│ ├── x86.rs # x86 token types
│ ├── arm.rs # ARM token types
│ └── wasm.rs # WASM token types
├── composite/ # Higher-level operations (__composite feature)
│ ├── mod.rs
│ ├── simd_ops.rs # SIMD operation traits
│ ├── scalar_ops.rs # Scalar fallback traits
│ ├── x86_impls.rs # Token trait implementations
│ ├── transpose.rs # 8x8 matrix transpose
│ ├── dot_product.rs # Dot product
│ └── horizontal.rs # Horizontal reductions
├── integrate/
│ └── wide_ops.rs # wide crate integration (__wide feature)
├── mem.rs # Re-exports generated wrappers
└── generated/ # AUTO-GENERATED (safe_unaligned_simd feature)
├── x86/ # 235 x86_64 functions
└── aarch64/ # 240 NEON functions
xtask/
└── src/main.rs # Wrapper generator
```
## Token Hierarchy
**Recommended (friendly aliases):**
- `Desktop64` - AVX2 + FMA + BMI2 (Haswell 2013+, Zen 1+) **← Start here for x86**
- `Server64` - + AVX-512 (Xeon 2017+, Zen 4+)
- `Arm64` - NEON baseline **← Start here for ARM**
**x86 Profile Tokens (same as aliases):**
- `X64V3Token` = `Desktop64`
- `X64V4Token` = `Server64`
- `X64V2Token` - SSE4.2 + POPCNT (Nehalem 2008+)
**x86 Feature Tokens:**
- `Sse2Token` → `Sse41Token` → `Sse42Token` → `AvxToken` → `Avx2Token`
- `FmaToken` (independent), `Avx2FmaToken` (combined)
- `Avx512fToken`, `Avx512bwToken`, `Avx512Vbmi2Token` + VL variants
**ARM:**
- `NeonToken` = `Arm64` (baseline), `SveToken`, `Sve2Token`
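A common pattern is to summon the strongest token first and fall back down the hierarchy. A sketch (the three kernels here are hypothetical, not part of archmage):
```rust
// Runtime dispatch down the token hierarchy; the kernels are hypothetical.
fn sum_squares(data: &[f32; 8]) -> f32 {
    if let Some(token) = Server64::summon() {
        avx512_sum_squares(token, data) // AVX-512 path
    } else if let Some(token) = Desktop64::summon() {
        avx2_sum_squares(token, data) // AVX2 + FMA path
    } else {
        data.iter().map(|x| x * x).sum() // scalar fallback
    }
}
```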
## Marker Traits
Enable generic bounds:
```rust
fn requires_avx2(token: impl HasAvx2) { ... }
fn requires_fma(token: impl HasFma) { ... }
fn requires_both<T: HasAvx2 + HasFma>(token: T) { ... }
```
**Width traits:** `Has128BitSimd`, `Has256BitSimd`, `Has512BitSimd`
**Feature traits:** `HasSse`, `HasSse2`, `HasAvx`, `HasAvx2`, `HasFma`, `HasAvx512f`, etc.
**ARM traits:** `HasNeon`, `HasSve`, `HasSve2`
## Safe Memory Operations
With the `safe_unaligned_simd` feature enabled, the `mem` module provides reference-based load/store:
```rust
use archmage::{Desktop64, SimdToken, mem::avx};

let data = [1.0f32; 8];
let mut out = [0.0f32; 8];
if let Some(token) = Desktop64::summon() {
    let v = avx::_mm256_loadu_ps(token, &data); // Safe! Reference, not pointer
    avx::_mm256_storeu_ps(token, &mut out, v);
}
```
## Methods with Self Receivers
Use `_self = Type` to enable self receivers. Use `_self` in the body instead of `self`:
```rust
use archmage::mem::avx;

trait SimdOps {
    fn double(&self, token: impl HasAvx2) -> Self;
}

impl SimdOps for [f32; 8] {
    #[arcane(_self = [f32; 8])]
    fn double(&self, token: impl HasAvx2) -> Self {
        // Use _self instead of self; load/store through the safe mem wrappers
        let v = avx::_mm256_loadu_ps(token, _self);
        let doubled = _mm256_add_ps(v, v);
        let mut out = [0.0f32; 8];
        avx::_mm256_storeu_ps(token, &mut out, doubled);
        out
    }
}
```
**All receiver types supported:** `self`, `&self`, `&mut self`
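Call sites then read like ordinary method calls once a token has been summoned:
```rust
if let Some(token) = Desktop64::summon() {
    let arr = [1.0f32; 8];
    let doubled = arr.double(token); // dispatches through the #[arcane] impl above
}
```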
## Generated Wrappers
The `mem` module wraps `safe_unaligned_simd` with token requirements:
```bash
cargo run -p xtask -- generate # Regenerate after safe_unaligned_simd updates
```
The generator:
1. Parses `safe_unaligned_simd` source from the cargo cache
2. Extracts function signatures and `#[target_feature]` attributes
3. Generates wrappers with `impl HasXxx` bounds
4. Groups by feature set (sse, sse2, avx, neon, etc.)
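The emitted wrappers have roughly this shape (an illustrative reconstruction; the exact bounds, module paths, and delegation target may differ from the generator's real output):
```rust
// Illustrative only: the plausible shape of one generated wrapper.
use core::arch::x86_64::__m256;

pub fn _mm256_loadu_ps(token: impl HasAvx, data: &[f32; 8]) -> __m256 {
    #[target_feature(enable = "avx")]
    unsafe fn inner(data: &[f32; 8]) -> __m256 {
        // Delegates to safe_unaligned_simd's reference-based load.
        safe_unaligned_simd::x86_64::_mm256_loadu_ps(data)
    }
    let _ = token;
    // SAFETY: holding a token proves AVX was detected at runtime.
    unsafe { inner(data) }
}
```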
## License
MIT OR Apache-2.0