docs.rs failed to build gpufft-0.1.2
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

gpufft

Unified GPU-accelerated FFT for Rust, backed by VkFFT on Vulkan and cuFFT on CUDA.

Why

The Rust ecosystem has good CPU FFT libraries (rustfft, ndrustfft) and solid GPU compute frameworks (wgpu, cubecl, cudarc), but no ergonomic cross-vendor GPU FFT. gpufft fills that gap with a single trait surface that works the same on NVIDIA, AMD, and Intel GPUs.

Vulkan backend wraps VkFFT via a thin FFI shim (gpufft-vulkan-sys), on pure ash with no wgpu dependency.
CUDA backend wraps cuFFT via bindgen on the system CUDA Toolkit (gpufft-cuda-sys).
Backends are selected by Cargo feature flags. Buffers and plans are typed at the backend plus scalar level, so mixing a Vulkan buffer with a CUDA plan, or an f32 plan with a Complex64 buffer, is a compile error.
One plan-creation method per transform kind: plan_c2c, plan_r2c, plan_c2r. f32 (Complex32 / f32) and f64 (Complex64 / f64) at 1D, 2D, and 3D.

Installation

[dependencies]
gpufft = { version = "0.1", features = ["vulkan"] }
# or
gpufft = { version = "0.1", features = ["cuda"] }
# or both
gpufft = { version = "0.1", features = ["vulkan", "cuda"] }

System prerequisites

Vulkan backend

Fedora: sudo dnf install vulkan-headers vulkan-loader-devel glslang-devel spirv-tools-devel
Debian/Ubuntu: install the LunarG Vulkan SDK.

CUDA backend: CUDA Toolkit 12.x or later on the build host (the bindgen pass needs cufft.h + cuda_runtime.h). Runtime needs a matching NVIDIA driver. CUDA_PATH / CUDA_HOME override the default /usr/local/cuda lookup.

Usage

Each device exposes three plan types:

Method	Use
`plan_c2c::<T>()`	complex-to-complex, in-place
`plan_r2c::<F>()`	real-to-complex, forward only
`plan_c2r::<F>()`	complex-to-real, inverse only

use gpufft::{
    vulkan::VulkanBackend, BufferOps, C2cPlanOps, Device, Direction, PlanDesc, Shape,
};
use num_complex::Complex32;

let device = VulkanBackend::new_device(Default::default())?;
let mut buffer = device.alloc::<Complex32>(1024)?;
buffer.write(&host_data)?;

let mut plan = device.plan_c2c::<Complex32>(&PlanDesc {
    shape: Shape::D1(1024),
    batch: 1,
    normalize: false,
})?;
plan.execute(&mut buffer, Direction::Forward)?;

let mut host_out = vec![Complex32::default(); 1024];
buffer.read(&mut host_out)?;

R2C / C2R use Hermitian-symmetric half-spectra on the last (contiguous) dimension, matching cuFFT and VkFFT conventions. For Shape::D3([nx, ny, nz]) the complex side has nx * ny * (nz / 2 + 1) elements.

Backend-generic code composes over any B: Backend:

use gpufft::{Backend, C2cPlanOps, Direction};

fn forward_c2c<B: Backend>(
    plan: &mut B::C2cPlan<num_complex::Complex32>,
    buf: &mut B::Buffer<num_complex::Complex32>,
) -> Result<(), B::Error> {
    plan.execute(buf, Direction::Forward)
}

Performance

3D R2C+C2R pair, f32, 10 iterations after 3 warmup. NVIDIA RTX 5060 Laptop GPU, Vulkan 1.4, VkFFT 1.3.4, CUDA 13.0 / driver 580:

Shape	cuFFT	VkFFT (gpufft)	ratio
32³	25 µs	86 µs	3.4×
64³	36 µs	105 µs	2.9×
128³	117 µs	386 µs	3.3×
256³	2.5 ms	5.28 ms	2.1×

cuFFT is NVIDIA's vendor-tuned path; VkFFT is the cross-vendor fallback and is the only option on AMD / Intel GPUs. The Vulkan backend uses a compute-shader padder to collapse the innermost-axis stride handling for R2C / C2R into a single dispatch.

Design notes

C2C is truly zero-copy: the user's buffer is passed through VkFFTLaunchParams.buffer at dispatch time.
R2C / C2R use a compute-shader padder to align the real innermost axis to VkFFT's 2 * (n/2 + 1) stride.
Plan lifetimes: VkFFTConfiguration retains raw pointers to its handle fields for the application's lifetime, so all handle storage lives inside a boxed Inner struct with a stable heap address.

Non-goals (v0.1)

Cross-backend buffer sharing (external-memory interop is a later concern).
General GPU compute framework; use wgpu / cubecl for that.
Real-to-real transforms (DCT / DST), 4D+ shapes, non-power-of-two auto-tuning.

Crate layout

Crate	Purpose
`gpufft`	Public API, trait surface, backend modules
`gpufft-vulkan-sys`	FFI bindings to vendored VkFFT
`gpufft-cuda-sys`	FFI bindings to cuFFT (system CUDA Toolkit)

Repository: https://github.com/alejandro-soto-franco/gpufft

License

Apache-2.0.

gpufft 0.1.2