//! Inlineable `#[rite]` functions for embedding in your own `#[arcane]` code.
//!
//! These functions carry `#[target_feature]` + `#[inline]` directly — no wrapper,
//! no dispatch overhead. When called from a context with matching features (e.g.
//! your own `#[arcane]` entry point), LLVM inlines them fully.
//!
//! # Modules
//!
//! Organized by SIMD unit width:
//!
//! - `x4` — 4×f32 operations (128-bit: SSE on x86-64, NEON on AArch64, SIMD128 on WebAssembly)
//! - `x8` — 8×f32 operations (256-bit: AVX2+FMA on x86-64)
//! - `x16` — 16×f32 operations (512-bit: AVX-512 on x86-64)
//!
//! # Naming Convention
//!
//! Function suffixes match the required token type:
//!
//! - `_v3` — requires [`X64V3Token`](archmage::X64V3Token) (x86-64-v3: AVX2+FMA)
//! - `_v4` — requires [`X64V4Token`](archmage::X64V4Token) (x86-64-v4: AVX-512)
//! - `_neon` — requires [`NeonToken`](archmage::NeonToken) (AArch64 NEON)
//! - `_wasm128` — requires [`Wasm128Token`](archmage::Wasm128Token) (WebAssembly SIMD128)
//!
//! On x86-64, the `x4` and `x8` modules both use the `_v3` suffix (both require
//! AVX2+FMA); the module name (`x4` vs `x8`) distinguishes the SIMD width.
//!
//! # Example
//!
//! ```rust,ignore
//! use archmage::arcane;
//! use linear_srgb::tokens::{X64V3Token, x4, x8};
//!
//! #[arcane]
//! fn process_pixel(token: X64V3Token, rgba: [f32; 4]) -> [f32; 4] {
//!     // x4 for single RGBA pixels
//!     x4::srgb_to_linear_v3(token, rgba)
//! }
//!
//! #[arcane]
//! fn process_batch(token: X64V3Token, data: &mut [f32]) {
//!     // x8 for maximum throughput on slices
//!     x8::srgb_to_linear_slice_v3(token, data);
//! }
//! ```
// Re-export token types so users can `use linear_srgb::tokens::X64V3Token` etc.
pub use archmage::{NeonToken, Wasm128Token, X64V3Token, X64V4Token};
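// For reference, the curve that the SIMD routines above vectorize is the
// standard sRGB decoding transfer function from IEC 61966-2-1. A minimal
// scalar sketch follows; the helper name is hypothetical and not part of
// this crate's public API.

```rust
/// Scalar sRGB -> linear reference (IEC 61966-2-1 decoding curve).
/// Hypothetical illustration only; the `x4`/`x8`/`x16` modules apply
/// this same curve lane-wise with SIMD.
#[allow(dead_code)]
fn srgb_to_linear_scalar(c: f32) -> f32 {
    if c <= 0.04045 {
        // Linear segment near black
        c / 12.92
    } else {
        // Power-law segment for the rest of the range
        ((c + 0.055) / 1.055).powf(2.4)
    }
}
```

The endpoints are fixed by the standard: an input of 0.0 maps to 0.0 and 1.0 maps to 1.0, with the piecewise split at 0.04045 keeping the two segments continuous.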