SIMD codelet generation.
Generates architecture-aware SIMD FFT codelets with multi-architecture dispatch. At compile time, it generates:
- AVX-512F variant (512-bit, 8×f64/16×f32) for x86_64
- AVX2+FMA variant (256-bit, 4×f64) for x86_64
- Pure-AVX variant (256-bit, 4×f64, no FMA, no AVX2) for x86_64
- SSE2 variant (128-bit, 2×f64) for x86_64
- NEON variant (128-bit, 2×f64) for aarch64
- Scalar fallback for all architectures
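The register widths above determine how many values each variant processes per instruction (lanes = register bits / element bits). The sketch below simply tabulates that relationship; `SimdVariant` and its methods are hypothetical illustrations, not part of this module's API.

```rust
/// Hypothetical enum mirroring the generated variants (not this crate's API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SimdVariant {
    Avx512F,
    Avx2Fma,
    Avx,
    Sse2,
    Neon,
    Scalar,
}

impl SimdVariant {
    /// f64 lanes per register: register width in bits divided by 64.
    fn f64_lanes(self) -> usize {
        match self {
            SimdVariant::Avx512F => 8,                  // 512 / 64
            SimdVariant::Avx2Fma | SimdVariant::Avx => 4, // 256 / 64
            SimdVariant::Sse2 | SimdVariant::Neon => 2,   // 128 / 64
            SimdVariant::Scalar => 1,                   // one value at a time
        }
    }

    /// f32 lanes per register: twice the f64 count for SIMD variants.
    fn f32_lanes(self) -> usize {
        match self {
            SimdVariant::Scalar => 1,
            v => 2 * v.f64_lanes(),
        }
    }
}

fn main() {
    let all = [
        SimdVariant::Avx512F,
        SimdVariant::Avx2Fma,
        SimdVariant::Avx,
        SimdVariant::Sse2,
        SimdVariant::Neon,
        SimdVariant::Scalar,
    ];
    for v in all {
        println!("{:?}: {} x f64, {} x f32", v, v.f64_lanes(), v.f32_lanes());
    }
}
```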
The dispatcher function selects the best available SIMD path: on x86_64 it probes CPU features at runtime with is_x86_feature_detected!, and on aarch64 it selects the NEON path at compile time via cfg.
Probe order for x86_64: AVX-512F > AVX2+FMA > AVX > SSE2 > scalar.
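A dispatcher following this probe order might look roughly like the sketch below. The `Path` enum and `select_path` function are hypothetical names; the generated dispatcher may be structured differently (for example, caching the result or calling the codelet directly). Only `is_x86_feature_detected!`, `cfg!`, and `#[cfg]` are real std facilities here.

```rust
/// Hypothetical identifiers for the generated code paths.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Path {
    Avx512F,
    Avx2Fma,
    Avx,
    Sse2,
    Neon,
    Scalar,
}

/// x86_64: runtime probing in the order AVX-512F > AVX2+FMA > AVX > SSE2 > scalar.
#[cfg(target_arch = "x86_64")]
fn select_path() -> Path {
    if is_x86_feature_detected!("avx512f") {
        Path::Avx512F
    } else if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
        Path::Avx2Fma
    } else if is_x86_feature_detected!("avx") {
        Path::Avx
    } else if is_x86_feature_detected!("sse2") {
        Path::Sse2
    } else {
        Path::Scalar
    }
}

/// aarch64: the choice is made at compile time from the target features.
#[cfg(target_arch = "aarch64")]
fn select_path() -> Path {
    if cfg!(target_feature = "neon") {
        Path::Neon
    } else {
        Path::Scalar
    }
}

/// Every other architecture uses the scalar fallback.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn select_path() -> Path {
    Path::Scalar
}

fn main() {
    println!("selected path: {:?}", select_path());
}
```

Because SSE2 is part of the x86_64 baseline, the scalar branch there is effectively unreachable on real hardware; it is kept only so the chain mirrors the documented order.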
AVX-512F is probed first so that, when the host supports it, the butterflies use _mm512_fmadd_pd/_mm512_fmsub_pd-based fused multiply-add arithmetic.
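To show where fused multiply-add fits into the arithmetic, here is a scalar sketch of a radix-2 decimation-in-time butterfly, `(a + w*b, a - w*b)`. It is an assumption-labeled illustration rather than the generated code: `f64::mul_add(x, y, z)` computes `x*y + z` with a single rounding, which corresponds to one lane of `_mm512_fmadd_pd`, and passing a negated addend gives the `fmsub` form. The `C64` type and `butterfly` function are hypothetical.

```rust
/// Hypothetical complex type (re, im); the real codelets may use a split layout.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

/// Radix-2 DIT butterfly: returns (a + w*b, a - w*b).
fn butterfly(a: C64, b: C64, w: C64) -> (C64, C64) {
    // Complex product t = w * b, with each real part folded into one fused op.
    let t = C64 {
        re: w.re.mul_add(b.re, -(w.im * b.im)), // fmsub: w.re*b.re - w.im*b.im
        im: w.re.mul_add(b.im, w.im * b.re),    // fmadd: w.re*b.im + w.im*b.re
    };
    (
        C64 { re: a.re + t.re, im: a.im + t.im },
        C64 { re: a.re - t.re, im: a.im - t.im },
    )
}

fn main() {
    let a = C64 { re: 1.0, im: 2.0 };
    let b = C64 { re: 3.0, im: -1.0 };
    // w = e^{-i*pi/2} = -i, the twiddle for N = 4, k = 1.
    let w = C64 { re: 0.0, im: -1.0 };
    let (x0, x1) = butterfly(a, b, w);
    println!("{:?} {:?}", x0, x1);
}
```

In a vectorized codelet the same two fused operations would be applied to 8 f64 lanes at once, which is the motivation for preferring the AVX-512F path.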
Functions
- generate: Generate a SIMD-optimized codelet for the given FFT size.