Module gen_simd


SIMD codelet generation.

Generates architecture-aware SIMD FFT codelets with multi-architecture dispatch. The following variants are generated at compile time:

  • AVX-512F variant (512-bit, 8×f64 / 16×f32) for x86_64
  • AVX2+FMA variant (256-bit, 4×f64) for x86_64
  • Pure-AVX variant (256-bit, 4×f64, no FMA, no AVX2) for x86_64
  • SSE2 variant (128-bit, 2×f64) for x86_64
  • NEON variant (128-bit, 2×f64) for aarch64
  • Scalar fallback for all architectures

The dispatcher function selects the best SIMD path at runtime with is_x86_feature_detected! on x86_64, and at compile time with cfg on aarch64.

Probe order for x86_64: AVX-512F > AVX2+FMA > AVX > SSE2 > scalar. AVX-512F is probed first so that hosts supporting it use butterfly arithmetic based on _mm512_fmadd_pd/_mm512_fmsub_pd.
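
The probe order above can be sketched as a small selection function. This is only an illustration of the dispatch idea, not the code this module emits: the `SimdPath` enum, `select_simd_path`, and `f64_lanes` are hypothetical names introduced here, and the lane counts are taken from the variant list above.

```rust
/// Hypothetical label for the selected code path (not part of this crate's API).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdPath {
    Avx512f,
    Avx2Fma,
    Avx,
    Sse2,
    Neon,
    Scalar,
}

impl SimdPath {
    /// Number of f64 lanes per vector register, per the variant list above.
    fn f64_lanes(self) -> usize {
        match self {
            SimdPath::Avx512f => 8,
            SimdPath::Avx2Fma | SimdPath::Avx => 4,
            SimdPath::Sse2 | SimdPath::Neon => 2,
            SimdPath::Scalar => 1,
        }
    }
}

/// x86_64: runtime probing, widest instruction set first.
#[cfg(target_arch = "x86_64")]
fn select_simd_path() -> SimdPath {
    if std::is_x86_feature_detected!("avx512f") {
        return SimdPath::Avx512f;
    }
    // The AVX2+FMA variant needs both features to be present.
    if std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("fma") {
        return SimdPath::Avx2Fma;
    }
    if std::is_x86_feature_detected!("avx") {
        return SimdPath::Avx;
    }
    if std::is_x86_feature_detected!("sse2") {
        return SimdPath::Sse2;
    }
    SimdPath::Scalar
}

/// aarch64: the choice is made at compile time via cfg, with no runtime probe.
#[cfg(target_arch = "aarch64")]
fn select_simd_path() -> SimdPath {
    SimdPath::Neon
}

/// Every other architecture uses the scalar fallback.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn select_simd_path() -> SimdPath {
    SimdPath::Scalar
}
```

A caller could, for example, print `select_simd_path()` with `{:?}` to see which variant a given host would use. In a real dispatcher each branch would call the corresponding generated codelet instead of returning a label.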

Functions

generate
Generate a SIMD-optimized codelet for the given FFT size.