gpufft 0.1.2

Unified GPU-accelerated FFT for Rust: Vulkan via VkFFT, CUDA via cuFFT.

gpufft

Unified GPU-accelerated FFT for Rust, backed by VkFFT on Vulkan and cuFFT on CUDA.

Why

The Rust ecosystem has good CPU FFT libraries (rustfft, ndrustfft) and solid GPU compute frameworks (wgpu, cubecl, cudarc), but no ergonomic cross-vendor GPU FFT. gpufft fills that gap with a single trait surface that works the same on NVIDIA, AMD, and Intel GPUs.

  • Vulkan backend wraps VkFFT via a thin FFI shim (gpufft-vulkan-sys), built directly on ash with no wgpu dependency.
  • CUDA backend wraps cuFFT via bindgen-generated bindings against the system CUDA Toolkit (gpufft-cuda-sys).
  • Backends are selected by Cargo feature flags. Buffers and plans are typed by both backend and scalar type, so mixing a Vulkan buffer with a CUDA plan, or an f32 plan with a Complex64 buffer, is a compile error.
  • One plan-creation method per transform kind: plan_c2c, plan_r2c, plan_c2r. Each supports f32 (Complex32 / f32) and f64 (Complex64 / f64) precision in 1D, 2D, and 3D.
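
The compile-time guarantee described in the list above can be illustrated with phantom type tags. This is a minimal sketch using only the standard library; the names Vulkan, Cuda, Buffer, and Plan here are hypothetical stand-ins and are not gpufft's actual definitions, and execute is a placeholder that only checks sizes.

```rust
use std::marker::PhantomData;

// Hypothetical backend markers; gpufft's real backend types may look different.
#[allow(dead_code)]
struct Vulkan;
#[allow(dead_code)]
struct Cuda;

// A buffer tagged with backend `B` and element type `T`.
struct Buffer<B, T> {
    data: Vec<T>,
    _backend: PhantomData<B>,
}

impl<B, T: Clone + Default> Buffer<B, T> {
    fn new(len: usize) -> Self {
        Buffer { data: vec![T::default(); len], _backend: PhantomData }
    }
}

// A plan carrying the same two tags; `execute` only accepts a matching buffer.
struct Plan<B, T> {
    len: usize,
    _tags: PhantomData<(B, T)>,
}

impl<B, T> Plan<B, T> {
    fn new(len: usize) -> Self {
        Plan { len, _tags: PhantomData }
    }

    // Placeholder for a transform: it only checks that the sizes agree.
    fn execute(&self, buf: &mut Buffer<B, T>) -> Result<(), String> {
        if buf.data.len() == self.len {
            Ok(())
        } else {
            Err(format!("plan expects {} elements, buffer has {}", self.len, buf.data.len()))
        }
    }
}

fn main() {
    let plan: Plan<Vulkan, f32> = Plan::new(8);
    let mut vk_f32: Buffer<Vulkan, f32> = Buffer::new(8);
    assert!(plan.execute(&mut vk_f32).is_ok());

    let _cuda_f32: Buffer<Cuda, f32> = Buffer::new(8);
    let _vk_f64: Buffer<Vulkan, f64> = Buffer::new(8);
    // Passing either of these to `plan.execute` is rejected by the compiler
    // with a mismatched-types error, because the tags differ from the plan's.
}
```

Because the tags are zero-sized, a scheme like this adds no runtime cost; the mismatch is caught by the type checker before any GPU call is made.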

Installation

[dependencies]
gpufft = { version = "0.1", features = ["vulkan"] }
# or
gpufft = { version = "0.1", features = ["cuda"] }
# or both
gpufft = { version = "0.1", features = ["vulkan", "cuda"] }

System prerequisites

Vulkan backend

  • Fedora: sudo dnf install vulkan-headers vulkan-loader-devel glslang-devel spirv-tools-devel
  • Debian/Ubuntu: install the LunarG Vulkan SDK.

CUDA backend

  • Build host: CUDA Toolkit 12.x or later (the bindgen pass needs cufft.h and cuda_runtime.h).
  • Runtime: an NVIDIA driver that matches the toolkit version.
  • Set CUDA_PATH or CUDA_HOME to override the default /usr/local/cuda lookup.
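
The toolkit lookup can be pictured as follows. This is a sketch of how a build script might resolve the CUDA root, not the code in gpufft-cuda-sys: resolve_cuda_root is a hypothetical helper, and checking CUDA_PATH before CUDA_HOME is an assumption, since the precedence between the two is not stated here.

```rust
use std::path::PathBuf;

/// Picks the CUDA Toolkit root from two optional overrides.
/// Assumption: CUDA_PATH wins over CUDA_HOME; empty values are ignored.
fn resolve_cuda_root(cuda_path: Option<String>, cuda_home: Option<String>) -> PathBuf {
    let non_empty = |v: Option<String>| v.filter(|s| !s.is_empty());
    non_empty(cuda_path)
        .or_else(|| non_empty(cuda_home))
        .map(PathBuf::from)
        // Fall back to the default location when neither variable is set.
        .unwrap_or_else(|| PathBuf::from("/usr/local/cuda"))
}

fn main() {
    let root = resolve_cuda_root(
        std::env::var("CUDA_PATH").ok(),
        std::env::var("CUDA_HOME").ok(),
    );
    // In a standard toolkit layout the headers live under `include/`.
    let header = root.join("include").join("cufft.h");
    println!("looking for {}", header.display());
}
```

Taking the two values as arguments rather than reading the environment inside the helper keeps the precedence logic easy to test.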

Usage

Each device exposes three plan types:

Method            Use
plan_c2c::<T>()   complex-to-complex, in-place
plan_r2c::<F>()   real-to-complex, forward only
plan_c2r::<F>()   complex-to-real, inverse only

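use gpufft::{
    vulkan::VulkanBackend, BufferOps, C2cPlanOps, Device, Direction, PlanDesc, Shape,
};
use num_complex::Complex32;

// The statements below belong inside a function that returns a Result,
// so that `?` can propagate errors.

// Example input: 1024 complex samples on the host.
let host_data = vec![Complex32::new(1.0, 0.0); 1024];

// Create a device, allocate a device buffer, and upload the input.
let device = VulkanBackend::new_device(Default::default())?;
let mut buffer = device.alloc::<Complex32>(1024)?;
buffer.write(&host_data)?;

// Plan a single 1D complex-to-complex transform of length 1024.
let mut plan = device.plan_c2c::<Complex32>(&PlanDesc {
    shape: Shape::D1(1024),
    batch: 1,
    normalize: false,
})?;
// Run the forward transform in place on the device buffer.
plan.execute(&mut buffer, Direction::Forward)?;

// Download the spectrum back to the host.
let mut host_out = vec![Complex32::default(); 1024];
buffer.read(&mut host_out)?;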

R2C / C2R use Hermitian-symmetric half-spectra on the last (contiguous) dimension, matching cuFFT and VkFFT conventions. For Shape::D3([nx, ny, nz]) the complex side has nx * ny * (nz / 2 + 1) elements.
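
The half-spectrum size formula above can be checked with a few lines of arithmetic. This is a minimal sketch; r2c_lens is a hypothetical helper written for illustration and is not part of the gpufft API.

```rust
/// Element counts on the real and complex sides of a 3D R2C transform.
/// Only the last (contiguous) dimension is halved: nz becomes nz / 2 + 1.
fn r2c_lens(shape: [usize; 3]) -> (usize, usize) {
    let [nx, ny, nz] = shape;
    let real = nx * ny * nz;
    let complex = nx * ny * (nz / 2 + 1);
    (real, complex)
}

fn main() {
    let (real, complex) = r2c_lens([64, 64, 64]);
    // 64 * 64 * 64 = 262144 real values, 64 * 64 * 33 = 135168 complex values.
    println!("real: {real}, complex: {complex}");
}
```

The integer division matters for odd sizes: an innermost length of 7 gives 7 / 2 + 1 = 4 complex elements per row.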

Backend-generic code composes over any B: Backend:

use gpufft::{Backend, C2cPlanOps, Direction};

fn forward_c2c<B: Backend>(
    plan: &mut B::C2cPlan<num_complex::Complex32>,
    buf: &mut B::Buffer<num_complex::Complex32>,
) -> Result<(), B::Error> {
    plan.execute(buf, Direction::Forward)
}

Performance

3D R2C + C2R pair, f32, 10 timed iterations after 3 warmup iterations. Measured on an NVIDIA RTX 5060 Laptop GPU with Vulkan 1.4, VkFFT 1.3.4, CUDA 13.0, and driver 580:

Shape   cuFFT    VkFFT (gpufft)   Ratio (VkFFT / cuFFT)
32³     25 µs    86 µs            3.4×
64³     36 µs    105 µs           2.9×
128³    117 µs   386 µs           3.3×
256³    2.5 ms   5.28 ms          2.1×

cuFFT is NVIDIA's vendor-tuned path; VkFFT is the cross-vendor fallback and is the only option on AMD / Intel GPUs. The Vulkan backend uses a compute-shader padder to collapse the innermost-axis stride handling for R2C / C2R into a single dispatch.
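
For readers who want to reproduce this kind of measurement, the warmup-then-measure pattern can be sketched as below. This is not the benchmark harness behind the table: mean_time is a hypothetical helper, reporting the mean is an assumption (the table does not say which statistic it uses), and the closure in main is a placeholder workload rather than an FFT.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `warmup` times untimed, then `iters` times timed,
/// and returns the mean wall-clock time per timed iteration.
fn mean_time<F: FnMut()>(mut op: F, warmup: u32, iters: u32) -> Duration {
    assert!(iters > 0, "need at least one timed iteration");
    for _ in 0..warmup {
        op(); // Warmup runs absorb one-time costs and are not timed.
    }
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed() / iters
}

fn main() {
    let mut calls = 0u32;
    // Placeholder workload; a real benchmark would run the R2C + C2R pair here.
    let mean = mean_time(
        || {
            calls += 1;
            std::hint::black_box(calls);
        },
        3,
        10,
    );
    println!("{calls} calls, mean {mean:?} per iteration");
}
```

When timing GPU work from the host this way, the closure has to block until the device has finished; otherwise only the submission cost is measured.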

Design notes

  • C2C is truly zero-copy: the user's buffer is passed through VkFFTLaunchParams.buffer at dispatch time.
  • R2C / C2R use a compute-shader padder to align the real innermost axis to VkFFT's 2 * (n/2 + 1) stride.
  • Plan lifetimes: VkFFTConfiguration retains raw pointers to its handle fields for the lifetime of the VkFFT application object, so all handle storage lives inside a boxed Inner struct with a stable heap address.
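
The 2 * (n/2 + 1) stride in the padder note comes from the half-spectrum layout: a row of n reals transforms to n/2 + 1 complex values, each occupying two reals. The sketch below does the same padding on the CPU to show the layout; padded_stride and pad_rows are hypothetical helpers, and the crate itself performs this step in a compute shader.

```rust
/// Stride, in real elements, of one padded innermost row: 2 * (n / 2 + 1).
fn padded_stride(n: usize) -> usize {
    2 * (n / 2 + 1)
}

/// Copies tightly packed rows of length `n` into rows of `padded_stride(n)`
/// elements, leaving the trailing pad elements of each row at zero.
fn pad_rows(src: &[f32], n: usize) -> Vec<f32> {
    assert!(n > 0 && src.len() % n == 0, "src must hold whole rows");
    let stride = padded_stride(n);
    let rows = src.len() / n;
    let mut dst = vec![0.0f32; rows * stride];
    for (row_in, row_out) in src.chunks_exact(n).zip(dst.chunks_exact_mut(stride)) {
        row_out[..n].copy_from_slice(row_in);
    }
    dst
}

fn main() {
    // Two rows of length 4 become two rows of stride 6.
    let packed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let padded = pad_rows(&packed, 4);
    println!("{padded:?}");
}
```

With this layout the real input and the complex output of a row occupy the same number of bytes, which is what allows the transform to run in place on the padded buffer.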

Non-goals (v0.1)

  • Cross-backend buffer sharing (external-memory interop is a later concern).
  • General GPU compute framework; use wgpu / cubecl for that.
  • Real-to-real transforms (DCT / DST), 4D+ shapes, non-power-of-two auto-tuning.

Crate layout

Crate               Purpose
gpufft              Public API, trait surface, backend modules
gpufft-vulkan-sys   FFI bindings to vendored VkFFT
gpufft-cuda-sys     FFI bindings to cuFFT (system CUDA Toolkit)

Repository: https://github.com/alejandro-soto-franco/gpufft

License

Apache-2.0.