pub fn fft<R: Runtime>(
device: &R::Device,
input: &[f32],
) -> (Vec<f32>, Vec<f32>)
Computes the Cooley-Tukey radix-2 decimation-in-time (DIT) FFT of input and returns the result as a (real, imag) pair.
If input.len() is not a power of two, the signal is zero-padded to the
next power of two. Both returned vectors have length input.len().next_power_of_two().
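The padding rule above can be sketched on the host side as follows. This is a minimal illustration, not the crate's implementation: the helper name `zero_pad_to_pow2` is hypothetical, and where the crate performs the padding (host or device) is not stated here.

```rust
/// Hypothetical helper: copies `input` into a buffer whose length is the
/// next power of two and fills the tail with zeros.
fn zero_pad_to_pow2(input: &[f32]) -> Vec<f32> {
    // `usize::next_power_of_two` returns the smallest power of two >= len.
    let n = input.len().next_power_of_two();
    let mut padded = Vec::with_capacity(n);
    padded.extend_from_slice(input);
    // Append zeros until the buffer reaches the padded length.
    padded.resize(n, 0.0);
    padded
}
```

A 3-sample input becomes a 4-sample buffer, and an input whose length is already a power of two is copied unchanged. Note that `0usize.next_power_of_two()` is 1, so this sketch turns an empty input into a single zero; how `fft` itself treats empty input is not documented above.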
§Launch strategy
All stages where half_stride < TILE_SIZE / 2 are fused into a single
butterfly_inner dispatch using workgroup shared memory, which eliminates the
per-stage kernel-launch overhead that dominates small-N performance.
The remaining outer stages use butterfly_stage_radix4 (two radix-2 stages
per dispatch) where possible, falling back to butterfly_stage for a single
trailing stage when the outer-stage count is odd.
| N | Inner dispatches | Outer dispatches | Total dispatches |
|---|---|---|---|
| ≤ 1 024 | 1 | 0 | 1 |
| 4 096 | 1 | 1 (r4) | 2 |
| 65 536 | 1 | 3 (r4) | 4 |
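The dispatch counts in the table can be reproduced with a small planning function. This is a sketch under stated assumptions, not the crate's code: `DispatchPlan`, `plan_dispatches`, and `ASSUMED_INNER_STAGES` are hypothetical names, and the value 10 is inferred from the table (N ≤ 1 024 needs a single dispatch) rather than read from `TILE_SIZE`, whose value is not given here.

```rust
/// Assumption: number of radix-2 stages fused into the single inner
/// dispatch. Inferred from the table; not taken from the crate source.
const ASSUMED_INNER_STAGES: u32 = 10;

/// Hypothetical summary of the kernel launches for one transform.
#[derive(Debug, PartialEq)]
struct DispatchPlan {
    inner: u32,  // fused shared-memory dispatch
    radix4: u32, // outer dispatches covering two radix-2 stages each
    radix2: u32, // single trailing radix-2 dispatch (0 or 1)
}

impl DispatchPlan {
    fn total(&self) -> u32 {
        self.inner + self.radix4 + self.radix2
    }
}

/// Plans the launches for a transform of length `n` (a power of two).
fn plan_dispatches(n: usize) -> DispatchPlan {
    assert!(n.is_power_of_two(), "n must already be padded to a power of two");
    // For a power of two, the number of radix-2 stages is log2(n).
    let stages = n.trailing_zeros();
    // Stages not covered by the fused inner dispatch.
    let outer = stages.saturating_sub(ASSUMED_INNER_STAGES);
    DispatchPlan {
        inner: 1,
        radix4: outer / 2, // pair up outer stages where possible
        radix2: outer % 2, // one leftover stage when the count is odd
    }
}
```

Under this assumption, `plan_dispatches(4096)` gives one inner and one radix-4 dispatch, matching the table. A length such as 8 192 has three outer stages, so the sketch yields one radix-4 dispatch plus one trailing radix-2 dispatch.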
§Example
use cubecl::wgpu::WgpuRuntime;
use gpu_fft::fft::fft;
let (real, imag) = fft::<WgpuRuntime>(&Default::default(), &[1.0f32, 0.0, 0.0, 0.0]);
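To sanity-check the GPU result on small inputs, a naive O(N²) DFT on the CPU can serve as a reference. The function below is an illustrative sketch, not part of the crate: `naive_dft` is a hypothetical name, and it assumes the unnormalized forward convention with a negative exponent, which the documentation above does not state.

```rust
use std::f32::consts::PI;

/// Hypothetical CPU reference: direct evaluation of the DFT sum
/// X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N), returned as (real, imag).
fn naive_dft(input: &[f32]) -> (Vec<f32>, Vec<f32>) {
    let n = input.len();
    let mut real = vec![0.0f32; n];
    let mut imag = vec![0.0f32; n];
    for k in 0..n {
        for (t, &x) in input.iter().enumerate() {
            // Assumed sign convention: negative exponent for the forward transform.
            let angle = -2.0 * PI * (k * t) as f32 / n as f32;
            real[k] += x * angle.cos();
            imag[k] += x * angle.sin();
        }
    }
    (real, imag)
}
```

For the unit impulse used in the example, this reference returns a flat spectrum: every real component is 1 and every imaginary component is 0. If the GPU output is compared against it, a small tolerance is advisable because both paths accumulate f32 rounding error.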