SIMD abstraction layer for high-performance FFT computation.
Provides a unified interface for SIMD operations across different architectures, enabling vectorized FFT butterflies and complex arithmetic.
§Overview
OxiFFT’s SIMD layer provides:
- Automatic runtime detection via `detect_simd_level()`
- Unified traits (`SimdVector`, `SimdComplex`) for portable code
- Architecture-specific implementations for maximum performance
§Available Backends
| Backend | Architecture | Vector Width | Lanes (f64) | Lanes (f32) | Features |
|---|---|---|---|---|---|
| `Scalar` | All | 64/32-bit | 1 | 1 | Always available |
| `Sse2F64`/`Sse2F32` | x86_64 | 128-bit | 2 | 4 | SSE2 (baseline x86_64) |
| `AvxF64`/`AvxF32` | x86_64 | 256-bit | 4 | 8 | AVX |
| `Avx2F64`/`Avx2F32` | x86_64 | 256-bit | 4 | 8 | AVX2 + FMA3 |
| `Avx512F64`/`Avx512F32` | x86_64 | 512-bit | 8 | 16 | AVX-512F |
| `NeonF64`/`NeonF32` | aarch64 | 128-bit | 2 | 4 | NEON (mandatory) |
| Portable* | All | Variable | 2-8 | 4-16 | Nightly + `portable_simd` |
*Portable SIMD requires nightly Rust and the portable_simd feature flag.
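The lane counts in the table come from dividing the register width by the element size: 64 bits for f64 and 32 bits for f32. The helper below is only an illustration of that arithmetic. `lanes` is a hypothetical name and is not part of the OxiFFT API.

```rust
use std::mem::size_of;

/// Number of lanes of element type `T` that fit in a `vector_bits`-wide register.
/// Illustrative helper only; not an OxiFFT function.
fn lanes<T>(vector_bits: usize) -> usize {
    // size_of returns bytes, so convert to bits before dividing.
    vector_bits / (8 * size_of::<T>())
}
```

For example, `lanes::<f32>(256)` gives 8, which matches the AVX/AVX2 f32 column.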
§CPU Requirements
§x86_64
- SSE2: Part of the x86_64 baseline, so every x86_64 CPU supports it (since the first AMD64 processors in 2003)
- AVX: Intel Sandy Bridge (2011+), AMD Bulldozer (2011+)
- AVX2 + FMA: Intel Haswell (2013+), AMD Excavator (2015+)
- AVX-512: Intel Skylake-X and Skylake-SP (2017+) and later Xeon server CPUs, AMD Zen 4 (2022+); availability on Intel client CPUs varies by generation
§aarch64 (ARM64)
- NEON: Mandatory on aarch64, always available (Apple M1/M2/M3, AWS Graviton, Ampere)
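The sketch below maps the CPU features above onto backend families using std's `is_x86_feature_detected!` macro. It shows one way such a check can be written. It does not reproduce how `detect_simd_level()` is implemented. The function name `widest_backend` and its string labels are hypothetical, and SVE is not checked.

```rust
/// Hypothetical helper: names the widest backend family the running CPU supports.
#[cfg(target_arch = "x86_64")]
fn widest_backend() -> &'static str {
    // Check from widest to narrowest; the first match wins.
    if is_x86_feature_detected!("avx512f") {
        "avx512"
    } else if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
        // The Avx2 backends also rely on FMA3, so both features are required.
        "avx2"
    } else if is_x86_feature_detected!("avx") {
        "avx"
    } else {
        // SSE2 is part of the x86_64 baseline, so no runtime check is needed.
        "sse2"
    }
}

/// NEON is mandatory on aarch64, so no runtime check is needed.
#[cfg(target_arch = "aarch64")]
fn widest_backend() -> &'static str {
    "neon"
}

/// Every other architecture falls back to scalar code in this sketch.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn widest_backend() -> &'static str {
    "scalar"
}
```

Each architecture gets its own `cfg`-gated function because `is_x86_feature_detected!` only exists when compiling for x86/x86_64.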
§Runtime Detection
Use detect_simd_level() to query the highest available SIMD level at runtime:
```rust
use oxifft::simd::{detect_simd_level, SimdLevel};

let level = detect_simd_level();
match level {
    SimdLevel::Avx512 => println!("Using AVX-512 (512-bit vectors)"),
    SimdLevel::Avx2 => println!("Using AVX2 with FMA (256-bit vectors)"),
    SimdLevel::Avx => println!("Using AVX (256-bit vectors)"),
    SimdLevel::Sse2 => println!("Using SSE2 (128-bit vectors)"),
    SimdLevel::Neon => println!("Using NEON (128-bit vectors)"),
    SimdLevel::Sve => println!("Using ARM SVE (scalable vectors)"),
    SimdLevel::Scalar => println!("No SIMD, using scalar fallback"),
}
```
§Performance Guidelines
§Memory Alignment
For optimal SIMD performance, data should be aligned:
- SSE2/NEON: 16-byte alignment
- AVX/AVX2: 32-byte alignment
- AVX-512: 64-byte alignment
Use `alloc_complex_aligned` or `AlignedBuffer` for aligned memory.
Unaligned loads/stores work but may be slower on some architectures.
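The required alignment in bytes is the register width divided by 8. The std-only sketch below shows how to check a pointer against it and how `#[repr(align(64))]` produces storage that meets all three requirements. It is an assumption-labelled illustration and does not replace OxiFFT's `AlignedBuffer` or `alloc_complex_aligned`, whose signatures are not shown here. `Aligned64`, `alignment_for_bits`, and `is_aligned` are hypothetical names.

```rust
/// Byte alignment matching a SIMD register width: 128 -> 16, 256 -> 32, 512 -> 64.
fn alignment_for_bits(vector_bits: usize) -> usize {
    vector_bits / 8
}

/// True if the address of `ptr` is a multiple of `align` bytes.
fn is_aligned<T>(ptr: *const T, align: usize) -> bool {
    (ptr as usize) % align == 0
}

/// Hypothetical stand-in for an aligned buffer: `repr(align(64))` forces 64-byte
/// alignment, which also satisfies the 16- and 32-byte SSE2/NEON/AVX requirements.
#[repr(C, align(64))]
struct Aligned64([f64; 16]);
```

Aligning to the largest width (64 bytes) lets one buffer serve every backend, at the cost of some padding.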
§Expected Speedups
Typical speedups over scalar code for FFT operations:
- SSE2/NEON: 1.5-2x for f64, 2-3x for f32
- AVX/AVX2: 2-3x for f64, 3-5x for f32
- AVX-512: 3-5x for f64, 5-8x for f32
Actual speedups depend on problem size, memory bandwidth, and cache behavior.
§FMA (Fused Multiply-Add)
AVX2 and later include FMA instructions which:
- Compute `a * b + c` in a single operation
- Provide better precision (a single rounding instead of two)
- Reduce pipeline stalls in complex arithmetic
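Complex multiplication is where FMA matters most in an FFT: each output component is a product plus or minus another product. The scalar sketch below uses std's `f64::mul_add` to compare the fused and unfused forms. When the target is compiled with FMA enabled, `mul_add` lowers to a hardware fused multiply-add; otherwise it calls a slower software routine. The function names are hypothetical, and this is not OxiFFT's internal kernel.

```rust
/// Complex product (ar + i*ai)(br + i*bi): every multiply and add rounds separately.
fn cmul_separate(ar: f64, ai: f64, br: f64, bi: f64) -> (f64, f64) {
    (ar * br - ai * bi, ar * bi + ai * br)
}

/// Same product using `mul_add`, which evaluates `a * b + c` with one rounding.
fn cmul_fma(ar: f64, ai: f64, br: f64, bi: f64) -> (f64, f64) {
    let re = ar.mul_add(br, -(ai * bi)); // ar*br - ai*bi, fused outer step
    let im = ar.mul_add(bi, ai * br); // ar*bi + ai*br, fused outer step
    (re, im)
}
```

The precision difference shows up with `0.1 * 10.0 - 1.0`. The unfused product rounds to exactly `1.0`, so the result is `0.0`. `0.1f64.mul_add(10.0, -1.0)` instead keeps the residual 2^-54 that comes from `0.1` not being exactly representable.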
§Feature Flags
- `portable_simd`: Enable the experimental portable SIMD backend (requires nightly)
§Example: Using SIMD Traits
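For reference, a radix-2 butterfly on a single complex pair returns `(a + b, a - b)`, which is exactly the 2-point DFT of `[a, b]`. The scalar sketch below makes that arithmetic concrete. A SIMD implementation presumably applies the same adds and subtracts to every lane at once. `Complex` and `butterfly` here are hypothetical illustration types, not OxiFFT's.

```rust
/// Minimal complex number for illustration (not an OxiFFT type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

/// Scalar radix-2 butterfly: returns (a + b, a - b), component-wise.
fn butterfly(a: Complex, b: Complex) -> (Complex, Complex) {
    (
        Complex { re: a.re + b.re, im: a.im + b.im },
        Complex { re: a.re - b.re, im: a.im - b.im },
    )
}
```

The generic version below does the same thing through the `SimdComplex` trait, so one function body serves every backend.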
```rust
use oxifft::simd::{SimdVector, SimdComplex};

fn complex_butterfly<V: SimdComplex>(a: V, b: V) -> (V, V) {
    V::butterfly(a, b) // Returns (a+b, a-b)
}
```
§Safety
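All SIMD types use `unsafe` internally and expose a safe API, except for the raw-pointer load/store operations, which are `unsafe`:
- `load_aligned`/`store_aligned`: require a pointer valid for `LANES` elements and aligned to the vector width (see Memory Alignment above)
- `load_unaligned`/`store_unaligned`: require a pointer valid for `LANES` elements; no alignment is needed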
OxiFFT’s internal code handles alignment automatically.
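A common way to honour an unaligned-load contract is to check the slice length once in a safe wrapper. After that check, the `unsafe` call cannot read out of bounds. The sketch uses a hypothetical 4-lane type `F64x4` and std's `ptr::read_unaligned`. It illustrates the pattern only and does not reproduce OxiFFT's `load_unaligned` signature.

```rust
/// Hypothetical 4-lane f64 vector used to illustrate the load contract (not an OxiFFT type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct F64x4([f64; 4]);

impl F64x4 {
    const LANES: usize = 4;

    /// # Safety
    /// `ptr` must be valid for reads of `LANES` consecutive `f64` values.
    /// No particular alignment is required.
    unsafe fn load_unaligned(ptr: *const f64) -> Self {
        // read_unaligned copies the bytes without assuming `[f64; 4]` alignment.
        F64x4(unsafe { std::ptr::read_unaligned(ptr as *const [f64; 4]) })
    }

    /// Safe wrapper: the length check upholds the contract, so callers need no `unsafe`.
    fn from_slice(data: &[f64]) -> Option<Self> {
        if data.len() < Self::LANES {
            return None;
        }
        // SAFETY: `data` holds at least LANES initialized f64 values.
        Some(unsafe { Self::load_unaligned(data.as_ptr()) })
    }
}
```

An aligned variant would also need to check the pointer address against the vector width before loading.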
§Structs
- `Avx2F32`: AVX2 f32 vector type with FMA support (8 lanes).
- `Avx2F64`: AVX2 f64 vector type with FMA support (4 lanes).
- `Avx512F32`: AVX-512 f32 vector type (16 lanes).
- `Avx512F64`: AVX-512 f64 vector type (8 lanes).
- `AvxF32`: AVX f32 vector type (8 lanes).
- `AvxF64`: AVX f64 vector type (4 lanes).
- `Scalar`: Scalar “SIMD” type (1-lane fallback).
- `Sse2F32`: SSE2 f32 vector type (4 lanes).
- `Sse2F64`: SSE2 f64 vector type (2 lanes).
§Enums
- `SimdLevel`: SIMD capability level.
§Traits
- `SimdComplex`: Complex SIMD operations.
- `SimdVector`: Core SIMD vector trait.
§Functions
- `detect_simd_level`: Detect the highest available SIMD level.