gpufft
Unified GPU-accelerated FFT for Rust, backed by VkFFT on Vulkan and cuFFT on CUDA.
Why
The Rust ecosystem has good CPU FFT libraries (rustfft, ndrustfft) and solid
GPU compute frameworks (wgpu, cubecl, cudarc), but no ergonomic
cross-vendor GPU FFT. gpufft fills that gap with a single trait surface that
works the same on NVIDIA, AMD, and Intel GPUs.
- Vulkan backend wraps VkFFT via a thin FFI shim (`gpufft-vulkan-sys`), on pure `ash` with no `wgpu` dependency.
- CUDA backend wraps cuFFT via `bindgen` on the system CUDA Toolkit (`gpufft-cuda-sys`).
- Backends are selected by Cargo feature flags. Buffers and plans are typed at the backend plus scalar level, so mixing a Vulkan buffer with a CUDA plan, or an `f32` plan with a `Complex64` buffer, is a compile error.
- One plan-creation method per transform kind: `plan_c2c`, `plan_r2c`, `plan_c2r`. Both f32 (`Complex32`/`f32`) and f64 (`Complex64`/`f64`) are supported at 1D, 2D, and 3D.
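The compile-time guarantee described above can be illustrated with a self-contained toy model. The `Vulkan`/`Cuda` markers and the `Buffer` and `Plan` types below are hypothetical stand-ins, not gpufft's actual API. The sketch only assumes that buffers and plans carry the backend and the scalar as type parameters.

```rust
use std::marker::PhantomData;

// Hypothetical backend marker types (gpufft's real type names may differ).
struct Vulkan;
struct Cuda;

/// Toy device buffer tagged with backend `B` and scalar `T`.
struct Buffer<B, T> {
    data: Vec<T>, // stands in for device memory
    _backend: PhantomData<B>,
}

/// Toy plan tagged the same way.
struct Plan<B, T> {
    len: usize,
    _tags: PhantomData<(B, T)>,
}

impl<B, T: Clone + Default> Buffer<B, T> {
    fn zeroed(len: usize) -> Self {
        Buffer { data: vec![T::default(); len], _backend: PhantomData }
    }
}

impl<B, T> Plan<B, T> {
    fn new(len: usize) -> Self {
        Plan { len, _tags: PhantomData }
    }

    /// Only accepts a buffer with the same backend *and* scalar type.
    fn execute(&self, buf: &mut Buffer<B, T>) -> usize {
        assert_eq!(buf.data.len(), self.len, "plan/buffer length mismatch");
        self.len // a real backend would launch the FFT here
    }
}

fn typed_demo() -> usize {
    let mut vk_buf: Buffer<Vulkan, f32> = Buffer::zeroed(8);
    let vk_plan: Plan<Vulkan, f32> = Plan::new(8);

    // Neither line compiles (error[E0308]: mismatched types):
    // Plan::<Cuda, f32>::new(8).execute(&mut vk_buf);   // wrong backend
    // Plan::<Vulkan, f64>::new(8).execute(&mut vk_buf); // wrong scalar

    vk_plan.execute(&mut vk_buf)
}
```

Zero-sized `PhantomData` tags cost nothing at runtime. All of the checking happens in the type checker.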
Installation
```toml
[dependencies]
gpufft = { version = "0.1", features = ["vulkan"] }
# or
gpufft = { version = "0.1", features = ["cuda"] }
# or both
gpufft = { version = "0.1", features = ["vulkan", "cuda"] }
```
System prerequisites
Vulkan backend
- Fedora: `sudo dnf install vulkan-headers vulkan-loader-devel glslang-devel spirv-tools-devel`
- Debian/Ubuntu: install the LunarG Vulkan SDK.
CUDA backend: CUDA Toolkit 12.x or later on the build host (the `bindgen` pass needs `cufft.h` and `cuda_runtime.h`). At runtime, a matching NVIDIA driver is required. `CUDA_PATH` / `CUDA_HOME` override the default `/usr/local/cuda` lookup.
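The toolkit lookup described above can be sketched as follows. This sketch assumes `CUDA_PATH` is checked before `CUDA_HOME`, because the README does not state the precedence. It also assumes empty values are ignored. `cuda_root_from` and `cuda_root` are hypothetical helpers, not part of `gpufft-cuda-sys`. Passing the lookup in as a closure keeps the function testable without touching the process environment.

```rust
use std::path::PathBuf;

/// Hypothetical helper: resolve the CUDA Toolkit root, letting
/// `CUDA_PATH` / `CUDA_HOME` override `/usr/local/cuda`.
/// Assumption: `CUDA_PATH` wins over `CUDA_HOME`; empty values are skipped.
fn cuda_root_from(lookup: impl Fn(&str) -> Option<String>) -> PathBuf {
    ["CUDA_PATH", "CUDA_HOME"]
        .iter()
        .copied()
        .find_map(|var| lookup(var).filter(|value| !value.is_empty()))
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("/usr/local/cuda"))
}

/// Same lookup against the real process environment.
fn cuda_root() -> PathBuf {
    cuda_root_from(|var| std::env::var(var).ok())
}
```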
Usage
Each device exposes three plan-creation methods:
| Method | Use |
|---|---|
| `plan_c2c::<T>()` | complex-to-complex, in-place |
| `plan_r2c::<F>()` | real-to-complex, forward only |
| `plan_c2r::<F>()` | complex-to-real, inverse only |
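```rust
use …;
use …::Complex32;

let device = …::new_device(…)?;   // open a device on the chosen backend
let mut buffer = device.…(…)?;     // allocate a Complex32 device buffer
buffer.write(…)?;                   // upload host data
let mut plan = device.…(…)?;       // create a C2C plan
plan.execute(…)?;                   // run the transform in place
let mut host_out = vec![…];         // host-side destination
buffer.read(…)?;                    // download the result
```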
R2C / C2R use Hermitian-symmetric half-spectra on the last (contiguous) dimension, matching cuFFT and VkFFT conventions. For `Shape::D3([nx, ny, nz])` the complex side has `nx * ny * (nz / 2 + 1)` elements.
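The element counts can be checked with a small helper. The `Shape` enum here is a hypothetical local mirror; only `Shape::D3` appears above. The 1D and 2D arms assume the same rule, halving only the last axis.

```rust
/// Hypothetical local mirror of the shape type, for illustration only.
enum Shape {
    D1(usize),
    D2([usize; 2]),
    D3([usize; 3]),
}

/// Real elements on the R2C input / C2R output side.
fn real_len(shape: &Shape) -> usize {
    match shape {
        Shape::D1(n) => *n,
        Shape::D2([nx, ny]) => nx * ny,
        Shape::D3([nx, ny, nz]) => nx * ny * nz,
    }
}

/// Complex elements on the half-spectrum side: only the last
/// (contiguous) axis shrinks, keeping n/2 + 1 bins.
fn complex_len(shape: &Shape) -> usize {
    match shape {
        Shape::D1(n) => n / 2 + 1,
        Shape::D2([nx, ny]) => nx * (ny / 2 + 1),
        Shape::D3([nx, ny, nz]) => nx * ny * (nz / 2 + 1),
    }
}
```

For example, a 64³ real volume maps to 64 × 64 × 33 = 135,168 complex values. That is slightly more than half of the 262,144 real inputs.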
Backend-generic code composes over any `B: Backend`:

```rust
use …;
```
Performance
3D R2C + C2R pair, f32, 10 iterations after 3 warmup iterations. NVIDIA RTX 5060 Laptop GPU, Vulkan 1.4, VkFFT 1.3.4, CUDA 13.0 / driver 580:
| Shape | cuFFT | VkFFT (gpufft) | Ratio (VkFFT / cuFFT) |
|---|---|---|---|
| 32³ | 25 µs | 86 µs | 3.4× |
| 64³ | 36 µs | 105 µs | 2.9× |
| 128³ | 117 µs | 386 µs | 3.3× |
| 256³ | 2.5 ms | 5.28 ms | 2.1× |
cuFFT is NVIDIA's vendor-tuned path; VkFFT is the cross-vendor fallback and is the only option on AMD / Intel GPUs. The Vulkan backend uses a compute-shader padder to collapse the innermost-axis stride handling for R2C / C2R into a single dispatch.
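The padder's effect on memory layout can be sketched on the host. This is a CPU illustration, not the compute shader itself. It assumes rows of `n` packed reals are copied into rows of VkFFT's in-place R2C stride, `2 * (n/2 + 1)` reals, with the tail of each row zero-filled. The zero padding is an assumption here. `padded_stride`, `pad_rows`, and `unpad_rows` are hypothetical names.

```rust
/// Padded innermost stride, in real elements: room for n/2 + 1 complex
/// values, i.e. 2 * (n/2 + 1) reals.
fn padded_stride(n: usize) -> usize {
    2 * (n / 2 + 1)
}

/// Copy rows of `n` tightly packed reals into rows of `padded_stride(n)`
/// reals, zero-filling the tail of each row (before an R2C transform).
fn pad_rows(input: &[f32], n: usize) -> Vec<f32> {
    assert!(n > 0 && input.len() % n == 0, "input must be whole rows of length n");
    let stride = padded_stride(n);
    let rows = input.len() / n;
    let mut out = vec![0.0f32; rows * stride];
    for (src, dst) in input.chunks_exact(n).zip(out.chunks_exact_mut(stride)) {
        dst[..n].copy_from_slice(src);
    }
    out
}

/// Inverse of `pad_rows`: drop the padding (after a C2R transform).
fn unpad_rows(padded: &[f32], n: usize) -> Vec<f32> {
    let stride = padded_stride(n);
    padded
        .chunks_exact(stride)
        .flat_map(|row| row[..n].iter().copied())
        .collect()
}
```

Even and odd lengths share a stride. For example, n = 4 and n = 5 both pad to 6 reals per row.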
Design notes
- C2C is truly zero-copy: the user's buffer is passed through `VkFFTLaunchParams.buffer` at dispatch time.
- R2C / C2R use a compute-shader padder to align the real innermost axis to VkFFT's `2 * (n/2 + 1)` stride.
- Plan lifetimes: `VkFFTConfiguration` retains raw pointers to its handle fields for the application's lifetime, so all handle storage lives inside a boxed `Inner` struct with a stable heap address.
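The stable-address requirement can be shown in plain Rust. In this hypothetical sketch, `config_ptr` plays the role of a pointer that the C configuration keeps into handle storage. Moving the `Plan` value does not move the heap allocation, so the pointer stays valid. The sketch holds the box as a raw pointer via `Box::into_raw` and frees it in `Drop`. That is one sound way to keep such an address; gpufft's `Inner` may be stored differently.

```rust
use std::ptr::addr_of;

/// Hypothetical handle storage that an FFI configuration points into.
struct Inner {
    device_handle: u64,
    queue_handle: u64,
}

/// Hypothetical plan. `config_ptr` stands in for a raw pointer that the C
/// configuration keeps into `inner` for as long as the plan exists.
struct Plan {
    inner: *mut Inner,      // owned heap allocation, freed in `Drop`
    config_ptr: *const u64, // points into that allocation, never into `Plan`
}

impl Plan {
    fn new(device_handle: u64, queue_handle: u64) -> Self {
        let inner = Box::into_raw(Box::new(Inner { device_handle, queue_handle }));
        // SAFETY: `inner` is a fresh, non-null, properly aligned allocation.
        let config_ptr = unsafe { addr_of!((*inner).device_handle) };
        Plan { inner, config_ptr }
    }

    fn queue_handle(&self) -> u64 {
        // SAFETY: `inner` stays allocated until `drop` runs.
        unsafe { (*self.inner).queue_handle }
    }

    fn device_handle_via_config(&self) -> u64 {
        // SAFETY: `config_ptr` targets the heap allocation owned by `self`.
        unsafe { *self.config_ptr }
    }
}

impl Drop for Plan {
    fn drop(&mut self) {
        // SAFETY: `inner` came from `Box::into_raw` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.inner)) };
    }
}

/// Moves the `Plan` value to a new address and checks the pointer still works.
fn pointer_survives_move() -> bool {
    let plan = Plan::new(42, 7);
    let before = plan.config_ptr;
    let moved = Box::new(plan); // the `Plan` struct itself now lives elsewhere
    moved.config_ptr == before
        && moved.device_handle_via_config() == 42
        && moved.queue_handle() == 7
}
```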
Shared memory
On Linux with both the `vulkan` and `cuda` features, the `shared` feature gates zero-copy interop between the two backends via `VK_KHR_external_memory_fd` and CUDA's `cudaImportExternalMemory`. The same physical allocation is addressable from both VkFFT and cuFFT plans: no host round trip, no staging buffer.
```toml
[dependencies]
gpufft = { version = "0.1", features = ["shared"] }
```
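```rust
use …;

let vk_dev = …::new_device(…)?;   // Vulkan device
let cu_dev = …::new_device(…)?;   // CUDA device on the same physical GPU
let buf = …::new(…)?;             // allocation shared by both backends
let mut vk_plan = vk_dev.…(…)?;   // VkFFT plan
let mut cu_plan = cu_dev.…(…)?;   // cuFFT plan
vk_plan.execute_shared(…)?;       // transform on Vulkan
cu_plan.execute_shared(…)?;       // transform on CUDA, same memory
```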
Constraints:
- Linux only (relies on FD-based external memory; Win32 / D3D handle types are not yet wired up).
- The caller must construct both backends on the same physical GPU. No UUID gate is enforced in v0.1: cross-GPU import will either fail at the driver level or silently produce a non-shared mapping.
- C2C only in v0.1. R2C / C2R `execute_shared` variants are a follow-up.
- Normalization is NOT applied on the CUDA path (the runtime backend rejects `PlanDesc::normalize = true`). For a forward-then-inverse pair through CUDA, divide by N on the host or run the inverse on Vulkan.
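Dividing by N fixes the round trip because an unnormalized forward DFT followed by an unnormalized inverse returns the input scaled by N. cuFFT applies no 1/N scaling in either direction. The naive O(N²) DFT below is a host-side reference sketch that uses a local `C64` type; real code would more likely use `num_complex::Complex64`. `dft` and `normalize_in_place` are hypothetical helpers, not gpufft functions.

```rust
/// Minimal complex type so the sketch has no dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

/// Naive unnormalized DFT: X[k] = sum_j x[j] * exp(sign * 2*pi*i * j*k / N).
/// Use sign = -1.0 for forward and +1.0 for inverse; no 1/N is applied.
fn dft(x: &[C64], sign: f64) -> Vec<C64> {
    let n = x.len();
    (0..n)
        .map(|k| {
            let mut acc = C64 { re: 0.0, im: 0.0 };
            for (j, v) in x.iter().enumerate() {
                let theta = sign * 2.0 * std::f64::consts::PI * (j * k) as f64 / n as f64;
                let (s, c) = theta.sin_cos();
                // (v.re + i v.im) * (c + i s)
                acc.re += v.re * c - v.im * s;
                acc.im += v.re * s + v.im * c;
            }
            acc
        })
        .collect()
}

/// Host-side fix-up after an unnormalized round trip: divide by N.
fn normalize_in_place(x: &mut [C64], n: usize) {
    let scale = 1.0 / n as f64;
    for v in x.iter_mut() {
        v.re *= scale;
        v.im *= scale;
    }
}
```

Without `normalize_in_place`, a length-4 round trip returns every element multiplied by 4.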
Non-goals (v0.1)
- Cross-backend buffer sharing (external-memory interop is a later concern).
- General GPU compute framework; use `wgpu`/`cubecl` for that.
- Real-to-real transforms (DCT / DST), 4D+ shapes, non-power-of-two auto-tuning.
Crate layout
| Crate | Purpose |
|---|---|
| `gpufft` | Public API, trait surface, backend modules |
| `gpufft-vulkan-sys` | FFI bindings to vendored VkFFT |
| `gpufft-cuda-sys` | FFI bindings to cuFFT (system CUDA Toolkit) |
Repository: https://github.com/alejandro-soto-franco/gpufft
License
Apache-2.0.