
Module multi_transform

Build-time codegen for SIMD vrank multi-transform codelets.

A multi-transform codelet processes V DFTs of size N simultaneously.

§Implementations

  • SSE2 f32 (V=4): true SIMD for sizes 2 and 4 via notw_{size}_v4_sse2_f32_soa.
  • AVX2 f32 (V=8): true SIMD for sizes 2, 4, and 8 via notw_{size}_v8_avx2_f32_soa.
  • All other ISA, precision, and size combinations: sequential scalar fallback over the AoS layout.

§Data layouts

§AoS (Array-of-Structs) — outer function signature

For V transforms of size N:

data[element_idx * v * 2 + transform_idx * 2 + 0]  = re of x[element_idx] for transform transform_idx
data[element_idx * v * 2 + transform_idx * 2 + 1]  = im of x[element_idx] for transform transform_idx
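
As an illustration of the AoS indexing above, here is a minimal sketch that packs several transforms into one interleaved buffer. The helper names `aos_re_index` and `pack_aos` are hypothetical and are not part of this module.

```rust
/// Index of the real part of x[element_idx] for transform `transform_idx`
/// in the interleaved AoS buffer; the imaginary part is at the next index.
/// (Hypothetical helper, not part of the module.)
pub fn aos_re_index(element_idx: usize, transform_idx: usize, v: usize) -> usize {
    element_idx * v * 2 + transform_idx * 2
}

/// Pack `v = transforms.len()` transforms of `n` complex samples each into
/// one AoS buffer. `transforms[t][k]` holds (re, im) of x[k] for transform t.
/// (Hypothetical helper, not part of the module.)
pub fn pack_aos(transforms: &[Vec<(f32, f32)>], n: usize) -> Vec<f32> {
    let v = transforms.len();
    let mut data = vec![0.0f32; n * v * 2];
    for (t, xs) in transforms.iter().enumerate() {
        assert_eq!(xs.len(), n, "every transform must have n samples");
        for (k, &(re, im)) in xs.iter().enumerate() {
            let i = aos_re_index(k, t, v);
            data[i] = re;
            data[i + 1] = im;
        }
    }
    data
}
```

With this layout, all V values of a given element index are adjacent in memory, with re and im interleaved per transform.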

§SoA (Struct-of-Arrays) — inner SIMD function signature

For V transforms of size N (only used internally by SIMD paths):

re_in[element_idx * v + transform_idx] = real  part of x[element_idx] for transform transform_idx
im_in[element_idx * v + transform_idx] = imag  part of x[element_idx] for transform transform_idx
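
The relationship between the two layouts can be sketched as a pair of conversion routines. The functions `aos_to_soa` and `soa_to_aos` below are hypothetical illustrations built only from the index formulas above; the generated code may perform this shuffle differently (for example, in registers).

```rust
/// Split an AoS buffer (n elements, v transforms) into SoA re/im planes.
/// (Hypothetical helper, not part of the module.)
pub fn aos_to_soa(data: &[f32], n: usize, v: usize) -> (Vec<f32>, Vec<f32>) {
    assert_eq!(data.len(), n * v * 2);
    let mut re = vec![0.0f32; n * v];
    let mut im = vec![0.0f32; n * v];
    for k in 0..n {
        for t in 0..v {
            let a = k * v * 2 + t * 2; // AoS index of the real part
            let s = k * v + t;         // SoA index within each plane
            re[s] = data[a];
            im[s] = data[a + 1];
        }
    }
    (re, im)
}

/// Inverse of `aos_to_soa`: interleave SoA planes back into an AoS buffer.
/// (Hypothetical helper, not part of the module.)
pub fn soa_to_aos(re: &[f32], im: &[f32], n: usize, v: usize) -> Vec<f32> {
    assert_eq!(re.len(), n * v);
    assert_eq!(im.len(), n * v);
    let mut data = vec![0.0f32; n * v * 2];
    for k in 0..n {
        for t in 0..v {
            let a = k * v * 2 + t * 2;
            let s = k * v + t;
            data[a] = re[s];
            data[a + 1] = im[s];
        }
    }
    data
}
```

In SoA form, the V real parts of one element are contiguous, so a single vector load can fetch one lane per transform.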

The SIMD functions operate natively in SoA. The outer AoS function calls the inner SoA function when the ISA, precision, and size match one of the SIMD paths listed under Implementations; otherwise it falls back to the sequential scalar loop.
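
The outer/inner split can be modelled in scalar code. The sketch below is not the generated code: it assumes `notw` denotes a no-twiddle codelet (as in FFTW naming), uses a size-2 DFT (a single butterfly) for brevity, and replaces the SIMD lanes with a plain loop over V. Both function names are hypothetical.

```rust
/// Scalar model of a size-2 no-twiddle kernel over V lanes in SoA layout:
/// y[0] = x[0] + x[1] and y[1] = x[0] - x[1], independently per lane.
/// (Hypothetical model, not the generated SIMD code.)
pub fn notw_2_soa_model<const V: usize>(
    re_in: &[f32],
    im_in: &[f32],
    re_out: &mut [f32],
    im_out: &mut [f32],
) {
    for t in 0..V {
        re_out[t] = re_in[t] + re_in[V + t];
        im_out[t] = im_in[t] + im_in[V + t];
        re_out[V + t] = re_in[t] - re_in[V + t];
        im_out[V + t] = im_in[t] - im_in[V + t];
    }
}

/// Model of an outer AoS function: deinterleave to SoA, run the inner
/// kernel, then interleave the result back to AoS.
/// (Hypothetical model; the real wrapper may shuffle in registers instead.)
pub fn notw_2_aos_model<const V: usize>(input: &[f32], output: &mut [f32]) {
    const N: usize = 2;
    assert_eq!(input.len(), N * V * 2);
    assert_eq!(output.len(), N * V * 2);

    let mut re_in = vec![0.0f32; N * V];
    let mut im_in = vec![0.0f32; N * V];
    for k in 0..N {
        for t in 0..V {
            re_in[k * V + t] = input[k * V * 2 + t * 2];
            im_in[k * V + t] = input[k * V * 2 + t * 2 + 1];
        }
    }

    let mut re_out = vec![0.0f32; N * V];
    let mut im_out = vec![0.0f32; N * V];
    notw_2_soa_model::<V>(&re_in, &im_in, &mut re_out, &mut im_out);

    for k in 0..N {
        for t in 0..V {
            output[k * V * 2 + t * 2] = re_out[k * V + t];
            output[k * V * 2 + t * 2 + 1] = im_out[k * V + t];
        }
    }
}
```

In this model every iteration of the inner `for t in 0..V` loop is independent, which is the property a SIMD path can exploit by handling all V lanes in one instruction.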

§Generated function signatures

Outer (AoS, called by users):

pub unsafe fn notw_4_v8_avx2_f32(
    input: *const f32, output: *mut f32,
    istride: usize, ostride: usize, count: usize,
)

Inner SoA SIMD helpers (emitted alongside, for direct use or testing):

pub unsafe fn notw_4_v8_avx2_f32_soa(
    re_in: *const f32, im_in: *const f32,
    re_out: *mut f32, im_out: *mut f32,
)
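
Because the SoA helpers take raw pointers, callers are responsible for buffer sizes. The sketch below shows one possible safe wrapper around any function with this signature shape. It assumes each plane holds exactly N * V values, which is inferred from the SoA layout above and not confirmed by the source. `SoaKernel`, `call_soa_checked`, and `standin_2_v4_soa` are hypothetical names; the stand-in kernel only copies data and is not a generated codelet.

```rust
/// Signature shape of the inner SoA helpers shown above.
pub type SoaKernel = unsafe fn(*const f32, *const f32, *mut f32, *mut f32);

/// Call `kernel` after checking that every plane holds exactly n * v values.
/// ASSUMPTION: the kernel reads and writes n * v floats per plane.
/// (Hypothetical wrapper, not part of the module.)
pub fn call_soa_checked(
    kernel: SoaKernel,
    n: usize,
    v: usize,
    re_in: &[f32],
    im_in: &[f32],
    re_out: &mut [f32],
    im_out: &mut [f32],
) -> Result<(), &'static str> {
    let need = n * v;
    if re_in.len() != need
        || im_in.len() != need
        || re_out.len() != need
        || im_out.len() != need
    {
        return Err("each SoA plane must hold exactly n * v values");
    }
    // SAFETY: all four planes were checked to hold n * v values, and the
    // input and output slices cannot alias because the outputs are &mut.
    unsafe { kernel(re_in.as_ptr(), im_in.as_ptr(), re_out.as_mut_ptr(), im_out.as_mut_ptr()) };
    Ok(())
}

/// Stand-in with the same signature shape for n = 2, v = 4. It only copies
/// input to output; it is NOT a DFT and NOT the generated code.
pub unsafe fn standin_2_v4_soa(
    re_in: *const f32,
    im_in: *const f32,
    re_out: *mut f32,
    im_out: *mut f32,
) {
    unsafe {
        std::ptr::copy_nonoverlapping(re_in, re_out, 8);
        std::ptr::copy_nonoverlapping(im_in, im_out, 8);
    }
}
```

When calling a real AVX2 or SSE2 helper directly, it is likely also necessary to confirm CPU support first, for example with `is_x86_feature_detected!("avx2")`.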

Structs§

MultiTransformConfig
Configuration for a vectorized multi-transform codelet.

Enums§

Precision
Floating-point precision for a multi-transform codelet.
SimdIsa
Target ISA for a multi-transform codelet.

Functions§

generate_from_macro
Entry point for the gen_multi_transform_codelet! proc-macro.
generate_multi_transform
Generate a multi-transform codelet TokenStream.