Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
gpufft
Unified GPU-accelerated FFT for Rust, backed by VkFFT on Vulkan and cuFFT on CUDA.
Why
The Rust ecosystem has good CPU FFT libraries (rustfft, ndrustfft) and solid
GPU compute frameworks (wgpu, cubecl, cudarc), but no ergonomic
cross-vendor GPU FFT. gpufft fills that gap with a single trait surface that
works the same on NVIDIA, AMD, and Intel GPUs.
- Vulkan backend wraps VkFFT via a thin
FFI shim (
gpufft-vulkan-sys), on pureashwith nowgpudependency. - CUDA backend wraps cuFFT via
bindgen on the system CUDA Toolkit (
gpufft-cuda-sys). - Backends are selected by Cargo feature flags. Buffers and plans are typed
at the backend plus scalar level, so mixing a Vulkan buffer with a CUDA
plan, or an
f32plan with aComplex64buffer, is a compile error. - One plan-creation method per transform kind:
plan_c2c,plan_r2c,plan_c2r. f32 (Complex32/f32) and f64 (Complex64/f64) at 1D, 2D, and 3D.
Installation
[]
= { = "0.1", = ["vulkan"] }
# or
= { = "0.1", = ["cuda"] }
# or both
= { = "0.1", = ["vulkan", "cuda"] }
System prerequisites
Vulkan backend
- Fedora:
sudo dnf install vulkan-headers vulkan-loader-devel glslang-devel spirv-tools-devel - Debian/Ubuntu: install the LunarG Vulkan SDK.
CUDA backend: CUDA Toolkit 12.x or later on the build host (the bindgen
pass needs cufft.h + cuda_runtime.h). Runtime needs a matching NVIDIA
driver. CUDA_PATH / CUDA_HOME override the default /usr/local/cuda lookup.
Usage
Each device exposes three plan types:
| Method | Use |
|---|---|
plan_c2c::<T>() |
complex-to-complex, in-place |
plan_r2c::<F>() |
real-to-complex, forward only |
plan_c2r::<F>() |
complex-to-real, inverse only |
use ;
use Complex32;
let device = new_device?;
let mut buffer = device.?;
buffer.write?;
let mut plan = device.?;
plan.execute?;
let mut host_out = vec!;
buffer.read?;
R2C / C2R use Hermitian-symmetric half-spectra on the last (contiguous)
dimension, matching cuFFT and VkFFT conventions. For Shape::D3([nx, ny, nz])
the complex side has nx * ny * (nz / 2 + 1) elements.
Backend-generic code composes over any B: Backend:
use ;
Performance
3D R2C+C2R pair, f32, 10 iterations after 3 warmup. NVIDIA RTX 5060 Laptop GPU, Vulkan 1.4, VkFFT 1.3.4, CUDA 13.0 / driver 580:
| Shape | cuFFT | VkFFT (gpufft) | ratio |
|---|---|---|---|
| 32³ | 25 µs | 86 µs | 3.4× |
| 64³ | 36 µs | 105 µs | 2.9× |
| 128³ | 117 µs | 386 µs | 3.3× |
| 256³ | 2.5 ms | 5.28 ms | 2.1× |
cuFFT is NVIDIA's vendor-tuned path; VkFFT is the cross-vendor fallback and is the only option on AMD / Intel GPUs. The Vulkan backend uses a compute-shader padder to collapse the innermost-axis stride handling for R2C / C2R into a single dispatch.
Design notes
- C2C is truly zero-copy: the user's buffer is passed through
VkFFTLaunchParams.bufferat dispatch time. - R2C / C2R use a compute-shader padder to align the real innermost
axis to VkFFT's
2 * (n/2 + 1)stride. - Plan lifetimes:
VkFFTConfigurationretains raw pointers to its handle fields for the application's lifetime, so all handle storage lives inside a boxedInnerstruct with a stable heap address.
Non-goals (v0.1)
- Cross-backend buffer sharing (external-memory interop is a later concern).
- General GPU compute framework; use
wgpu/cubeclfor that. - Real-to-real transforms (DCT / DST), 4D+ shapes, non-power-of-two auto-tuning.
Crate layout
| Crate | Purpose |
|---|---|
gpufft |
Public API, trait surface, backend modules |
gpufft-vulkan-sys |
FFI bindings to vendored VkFFT |
gpufft-cuda-sys |
FFI bindings to cuFFT (system CUDA Toolkit) |
Repository: https://github.com/alejandro-soto-franco/gpufft
License
Apache-2.0.