Module simd

SIMD abstraction layer for high-performance FFT computation.

Provides a unified interface for SIMD operations across different architectures, enabling vectorized FFT butterflies and complex arithmetic.

§Overview

OxiFFT’s SIMD layer provides:

  • Architecture-specific vector backends (SSE2, AVX, AVX2, AVX-512, NEON) with a scalar fallback
  • Runtime CPU feature detection via detect_simd_level()
  • Aligned memory utilities (alloc_complex_aligned, AlignedBuffer)
  • The SimdVector and SimdComplex traits for vectorized butterflies and complex arithmetic

§Available Backends

| Backend               | Architecture | Vector Width | Lanes (f64) | Lanes (f32) | Features                |
|-----------------------|--------------|--------------|-------------|-------------|-------------------------|
| Scalar                | All          | 64/32-bit    | 1           | 1           | Always available        |
| Sse2F64/Sse2F32       | x86_64       | 128-bit      | 2           | 4           | SSE2 (baseline x86_64)  |
| AvxF64/AvxF32         | x86_64       | 256-bit      | 4           | 8           | AVX                     |
| Avx2F64/Avx2F32       | x86_64       | 256-bit      | 4           | 8           | AVX2 + FMA3             |
| Avx512F64/Avx512F32   | x86_64       | 512-bit      | 8           | 16          | AVX-512F                |
| NeonF64/NeonF32       | aarch64      | 128-bit      | 2           | 4           | NEON (mandatory)        |
| Portable*             | All          | Variable     | 2-8         | 4-16        | Nightly + portable_simd |

*Portable SIMD requires nightly Rust and the portable_simd feature flag.

§CPU Requirements

§x86_64

  • SSE2: Part of the x86_64 baseline, guaranteed on every x86_64 CPU (since 2003)
  • AVX: Intel Sandy Bridge (2011+), AMD Bulldozer (2011+)
  • AVX2 + FMA: Intel Haswell (2013+), AMD Excavator (2015+)
  • AVX-512: Intel Skylake-X (2017+), AMD Zen 4 (2022+), limited server CPUs

§aarch64 (ARM64)

  • NEON: Mandatory on aarch64, always available (Apple M1/M2/M3, AWS Graviton, Ampere)

§Runtime Detection

Use detect_simd_level() to query the highest available SIMD level at runtime:

use oxifft::simd::{detect_simd_level, SimdLevel};

let level = detect_simd_level();
match level {
    SimdLevel::Avx512 => println!("Using AVX-512 (512-bit vectors)"),
    SimdLevel::Avx2 => println!("Using AVX2 with FMA (256-bit vectors)"),
    SimdLevel::Avx => println!("Using AVX (256-bit vectors)"),
    SimdLevel::Sse2 => println!("Using SSE2 (128-bit vectors)"),
    SimdLevel::Neon => println!("Using NEON (128-bit vectors)"),
    SimdLevel::Sve => println!("Using ARM SVE (scalable vectors)"),
    SimdLevel::Scalar => println!("No SIMD, using scalar fallback"),
}
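As a self-contained illustration, detection of this kind is commonly built on the standard library’s feature-detection macros. The function name `best_level` and the string labels below are illustrative, not part of OxiFFT’s API:

```rust
// Sketch of runtime SIMD detection on x86_64 using std's
// feature-detection macro. Checks highest level first and falls
// through to the guaranteed SSE2 baseline.
#[cfg(target_arch = "x86_64")]
fn best_level() -> &'static str {
    if is_x86_feature_detected!("avx512f") {
        "avx512"
    } else if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
        "avx2"
    } else if is_x86_feature_detected!("avx") {
        "avx"
    } else {
        "sse2" // guaranteed baseline on x86_64
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn best_level() -> &'static str {
    // aarch64 detection would use std::arch::is_aarch64_feature_detected!
    "scalar"
}

fn main() {
    println!("detected: {}", best_level());
}
```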

§Performance Guidelines

§Memory Alignment

For optimal SIMD performance, data should be aligned:

  • SSE2/NEON: 16-byte alignment
  • AVX/AVX2: 32-byte alignment
  • AVX-512: 64-byte alignment

Use alloc_complex_aligned or AlignedBuffer for aligned memory. Unaligned loads/stores work but may be slower on some architectures.
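For the strictest case above, a 64-byte-aligned buffer can be obtained directly from the standard allocator. This is a sketch of what an aligned-allocation helper does underneath; the buffer size is illustrative:

```rust
use std::alloc::{alloc, dealloc, Layout};

fn main() {
    // Allocate 1024 interleaved (re, im) f64 pairs with 64-byte
    // alignment, enough for aligned AVX-512 loads/stores (and, a
    // fortiori, the 16/32-byte requirements of SSE2/NEON and AVX).
    let n = 1024;
    let layout = Layout::from_size_align(n * 2 * std::mem::size_of::<f64>(), 64)
        .expect("valid layout");
    unsafe {
        let ptr = alloc(layout) as *mut f64;
        assert!(!ptr.is_null(), "allocation failed");
        assert_eq!(ptr as usize % 64, 0); // 64-byte aligned
        dealloc(ptr as *mut u8, layout);
    }
    println!("aligned allocation ok");
}
```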

§Expected Speedups

Typical speedups over scalar code for FFT operations:

  • SSE2/NEON: 1.5-2x for f64, 2-3x for f32
  • AVX/AVX2: 2-3x for f64, 3-5x for f32
  • AVX-512: 3-5x for f64, 5-8x for f32

Actual speedups depend on problem size, memory bandwidth, and cache behavior.

§FMA (Fused Multiply-Add)

AVX2 and later include FMA instructions which:

  • Compute a * b + c in a single operation
  • Provide better precision (single rounding instead of two)
  • Reduce pipeline stalls in complex arithmetic
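The single-rounding property can be observed directly with std’s f64::mul_add; the value below is chosen so the low-order term of the product survives only in the fused result:

```rust
fn main() {
    let x = 1.0_f64 + (2.0_f64).powi(-27);
    // Exactly, x*x = 1 + 2^-26 + 2^-54. The 2^-54 term is lost when
    // x*x is rounded to f64, but survives the fused multiply-add.
    let naive = x * x - 1.0;        // two roundings
    let fused = x.mul_add(x, -1.0); // one rounding
    assert_eq!(naive, (2.0_f64).powi(-26));
    assert_eq!(fused - naive, (2.0_f64).powi(-54));
    println!("naive = {naive:e}, fused = {fused:e}");
}
```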

§Feature Flags

  • portable_simd: Enable experimental portable SIMD backend (requires nightly)

§Example: Using SIMD Traits

use oxifft::simd::{SimdVector, SimdComplex};

fn complex_butterfly<V: SimdComplex>(a: V, b: V) -> (V, V) {
    V::butterfly(a, b) // Returns (a+b, a-b)
}
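For reference, the radix-2 butterfly that SimdComplex::butterfly vectorizes looks like this in scalar form, using plain (re, im) tuples in place of a SIMD type:

```rust
// Scalar reference for the radix-2 butterfly: (a, b) -> (a + b, a - b)
// on complex numbers represented as (re, im) tuples.
type C = (f64, f64);

fn butterfly(a: C, b: C) -> (C, C) {
    ((a.0 + b.0, a.1 + b.1), (a.0 - b.0, a.1 - b.1))
}

fn main() {
    let (sum, diff) = butterfly((1.0, 2.0), (3.0, -1.0));
    assert_eq!(sum, (4.0, 1.0));
    assert_eq!(diff, (-2.0, 3.0));
    println!("sum = {sum:?}, diff = {diff:?}");
}
```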

§Safety

All SIMD types use unsafe internally but expose a safe API. The unsafe operations are:

  • load_aligned/store_aligned: Require proper alignment
  • load_unaligned/store_unaligned: Require valid pointer for LANES elements

OxiFFT’s internal code handles alignment automatically.
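The safe-wrapper pattern can be sketched as follows; the `LANES` constant, function name, and the raw read standing in for a SIMD load intrinsic are all illustrative:

```rust
// Sketch of the safe-wrapper pattern: the public function verifies the
// invariants (length, element alignment) before the unsafe block runs.
const LANES: usize = 4;

fn load_lane(data: &[f64]) -> [f64; LANES] {
    assert!(data.len() >= LANES, "need at least LANES elements");
    assert_eq!(data.as_ptr() as usize % std::mem::align_of::<f64>(), 0);
    // Invariants checked above, so the raw read is sound.
    unsafe { *(data.as_ptr() as *const [f64; LANES]) }
}

fn main() {
    let v = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    println!("{:?}", load_lane(&v));
}
```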

§Structs

Avx2F32
AVX2 f32 vector type with FMA support (8 lanes).
Avx2F64
AVX2 f64 vector type with FMA support (4 lanes).
Avx512F32
AVX-512 f32 vector type (16 lanes).
Avx512F64
AVX-512 f64 vector type (8 lanes).
AvxF32
AVX f32 vector type (8 lanes).
AvxF64
AVX f64 vector type (4 lanes).
Scalar
Scalar “SIMD” type (1-lane fallback).
Sse2F32
SSE2 f32 vector type (4 lanes).
Sse2F64
SSE2 f64 vector type (2 lanes).

§Enums

SimdLevel
SIMD capability level.

§Traits

SimdComplex
Complex SIMD operations.
SimdVector
Core SIMD vector trait.

§Functions

detect_simd_level
Detect the highest available SIMD level.