
gpufft

Unified GPU-accelerated FFT for Rust, backed by VkFFT on Vulkan and cuFFT on CUDA.

Why

The Rust ecosystem has good CPU FFT libraries (rustfft, ndrustfft) and solid GPU compute frameworks (wgpu, cubecl, cudarc), but no ergonomic cross-vendor GPU FFT. gpufft fills that gap with a single trait surface that works the same on NVIDIA, AMD, and Intel GPUs.

Vulkan backend wraps VkFFT through a thin FFI shim (gpufft-vulkan-sys), built on pure ash with no wgpu dependency.
CUDA backend wraps cuFFT through bindgen bindings to the system CUDA Toolkit (gpufft-cuda-sys).
Backends are selected with Cargo feature flags. Buffers and plans are typed by both backend and scalar type, so mixing a Vulkan buffer with a CUDA plan, or an f32 plan with a Complex64 buffer, is a compile error.
One plan-creation method per transform kind: plan_c2c, plan_r2c, plan_c2r. Single precision (Complex32 / f32) and double precision (Complex64 / f64) are supported for 1D, 2D, and 3D shapes.
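The compile-time guarantee comes from tagging every buffer and plan with type parameters. The sketch below is a toy illustration of that technique under stated assumptions. `Vulkan`, `Cuda`, `Buffer`, `C2cPlan`, and `demo` are hypothetical stand-ins, not gpufft's real types (the crate uses associated types on `Backend`).

```rust
use std::marker::PhantomData;

// Hypothetical backend markers; gpufft's real types are VulkanBackend / CudaBackend.
#[allow(dead_code)]
struct Vulkan;
#[allow(dead_code)]
struct Cuda;

/// A "device" buffer tagged with backend `B` and element type `T`.
struct Buffer<B, T> {
    data: Vec<T>,
    _backend: PhantomData<B>,
}

/// A C2C plan tagged the same way; `execute` only accepts a matching buffer.
struct C2cPlan<B, T> {
    len: usize,
    _tags: PhantomData<(B, T)>,
}

impl<B, T: Clone> Buffer<B, T> {
    fn filled(len: usize, value: T) -> Self {
        Buffer { data: vec![value; len], _backend: PhantomData }
    }
}

impl<B, T> C2cPlan<B, T> {
    fn new(len: usize) -> Self {
        C2cPlan { len, _tags: PhantomData }
    }

    /// Both type parameters must match the plan's, so a mismatch fails to compile.
    fn execute(&mut self, buf: &mut Buffer<B, T>) -> usize {
        // A real backend would launch the FFT here; the sketch only checks the size.
        assert_eq!(buf.data.len(), self.len);
        buf.data.len()
    }
}

/// Matching backend and scalar type: compiles and runs.
fn demo() -> usize {
    let mut buf: Buffer<Vulkan, [f32; 2]> = Buffer::filled(8, [0.0, 0.0]);
    let mut plan: C2cPlan<Vulkan, [f32; 2]> = C2cPlan::new(8);
    // Uncommenting either line below is a type error:
    // C2cPlan::<Cuda, [f32; 2]>::new(8).execute(&mut buf);   // backend mismatch
    // C2cPlan::<Vulkan, [f64; 2]>::new(8).execute(&mut buf); // scalar mismatch
    plan.execute(&mut buf)
}
```

`PhantomData` costs nothing at runtime. The mismatch is caught entirely by the type checker.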

Installation

[dependencies]
gpufft = { version = "0.1", features = ["vulkan"] }
# or
gpufft = { version = "0.1", features = ["cuda"] }
# or both
gpufft = { version = "0.1", features = ["vulkan", "cuda"] }

System prerequisites

Vulkan backend

Fedora: sudo dnf install vulkan-headers vulkan-loader-devel glslang-devel spirv-tools-devel
Debian/Ubuntu: install the LunarG Vulkan SDK.

CUDA backend

CUDA Toolkit 12.x or later on the build host (the bindgen pass needs cufft.h and cuda_runtime.h). At runtime, a matching NVIDIA driver is required. CUDA_PATH or CUDA_HOME overrides the default /usr/local/cuda lookup.
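Here is a minimal sketch of the toolkit lookup described above, as it might appear in a build script. The precedence is an assumption and has not been confirmed as gpufft-cuda-sys behavior: CUDA_PATH is checked before CUDA_HOME, and empty values are ignored. `resolve_cuda_root` and `required_headers` are hypothetical helpers. The environment lookup is passed in as a closure so the logic can be tested without touching real variables.

```rust
use std::path::{Path, PathBuf};

/// Resolve the CUDA Toolkit root: an explicit, non-empty CUDA_PATH or
/// CUDA_HOME wins (CUDA_PATH first, by assumption); otherwise fall back
/// to /usr/local/cuda.
fn resolve_cuda_root<F>(lookup: F) -> PathBuf
where
    F: Fn(&str) -> Option<String>,
{
    for var in ["CUDA_PATH", "CUDA_HOME"] {
        if let Some(value) = lookup(var) {
            if !value.is_empty() {
                return PathBuf::from(value);
            }
        }
    }
    PathBuf::from("/usr/local/cuda")
}

/// The two headers the bindgen pass needs, under the toolkit's include/ dir.
fn required_headers(root: &Path) -> [PathBuf; 2] {
    let include = root.join("include");
    [include.join("cufft.h"), include.join("cuda_runtime.h")]
}
```

In a real build.rs, the call would be `resolve_cuda_root(|k| std::env::var(k).ok())`.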

Usage

Each device exposes three plan-creation methods:

Method	Use
`plan_c2c::<T>()`	complex-to-complex, in-place
`plan_r2c::<F>()`	real-to-complex, forward only
`plan_c2r::<F>()`	complex-to-real, inverse only

use gpufft::{
    vulkan::VulkanBackend, BufferOps, C2cPlanOps, Device, Direction, PlanDesc, Shape,
};
use num_complex::Complex32;

let device = VulkanBackend::new_device(Default::default())?;
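let mut buffer = device.alloc::<Complex32>(1024)?;
// Input signal: 1024 complex samples (a unit impulse here, for illustration).
let mut host_data = vec![Complex32::default(); 1024];
host_data[0] = Complex32::new(1.0, 0.0);
buffer.write(&host_data)?;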

let mut plan = device.plan_c2c::<Complex32>(&PlanDesc {
    shape: Shape::D1(1024),
    batch: 1,
    normalize: false,
})?;
plan.execute(&mut buffer, Direction::Forward)?;

let mut host_out = vec![Complex32::default(); 1024];
buffer.read(&mut host_out)?;

R2C / C2R use Hermitian-symmetric half-spectra on the last (contiguous) dimension, matching cuFFT and VkFFT conventions. For Shape::D3([nx, ny, nz]) the complex side has nx * ny * (nz / 2 + 1) elements.
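The element counts follow from halving only the last axis. For real input, the spectrum satisfies X[k] = conj(X[N - k]), so only n / 2 + 1 entries of that axis need to be stored. Below is a minimal sketch. It uses a local `Shape` enum as a stand-in for gpufft's; the D2 variant is assumed by analogy with D1 / D3.

```rust
/// Local stand-in for gpufft's `Shape` (hypothetical; D2 assumed by analogy).
#[derive(Clone, Copy)]
enum Shape {
    D1(usize),
    D2([usize; 2]),
    D3([usize; 3]),
}

/// Number of real samples on the real side of an R2C / C2R transform.
fn real_len(shape: Shape) -> usize {
    match shape {
        Shape::D1(n) => n,
        Shape::D2([nx, ny]) => nx * ny,
        Shape::D3([nx, ny, nz]) => nx * ny * nz,
    }
}

/// Number of complex elements on the half-spectrum side: only the last
/// (contiguous) axis shrinks, to n / 2 + 1 entries.
fn complex_len(shape: Shape) -> usize {
    match shape {
        Shape::D1(n) => n / 2 + 1,
        Shape::D2([nx, ny]) => nx * (ny / 2 + 1),
        Shape::D3([nx, ny, nz]) => nx * ny * (nz / 2 + 1),
    }
}
```

For example, a 64³ real volume (262144 samples) maps to 64 * 64 * 33 = 135168 complex elements. The integer division also covers odd lengths: n = 7 keeps 4 bins.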

Backend-generic code composes over any B: Backend:

use gpufft::{Backend, C2cPlanOps, Direction};

fn forward_c2c<B: Backend>(
    plan: &mut B::C2cPlan<num_complex::Complex32>,
    buf: &mut B::Buffer<num_complex::Complex32>,
) -> Result<(), B::Error> {
    plan.execute(buf, Direction::Forward)
}

Performance

3D R2C + C2R round trip, f32, 10 timed iterations after 3 warmup iterations. NVIDIA RTX 5060 Laptop GPU, Vulkan 1.4, VkFFT 1.3.4, CUDA 13.0, driver 580:

Shape	cuFFT	VkFFT (gpufft)	ratio (VkFFT / cuFFT)
32³	25 µs	86 µs	3.4×
64³	36 µs	105 µs	2.9×
128³	117 µs	386 µs	3.3×
256³	2.5 ms	5.28 ms	2.1×
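The measurement protocol (3 warmup iterations, then 10 timed ones) can be sketched as a small harness. Reporting the mean of the timed iterations is an assumption; the table does not say whether it shows a mean, median, or minimum. `mean_time` and `ratio` are hypothetical helpers. For GPU work, the closure must wait for the device to finish; otherwise only the submission cost is measured.

```rust
use std::time::{Duration, Instant};

/// Run `warmup` untimed iterations, then return the mean wall-clock time of
/// `iters` timed iterations. `op` must block until the GPU work completes.
fn mean_time<F: FnMut()>(warmup: usize, iters: usize, mut op: F) -> Duration {
    assert!(iters > 0, "need at least one timed iteration");
    for _ in 0..warmup {
        op(); // warm caches, pipelines, and plan state; not timed
    }
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed() / iters as u32
}

/// Slowdown of the VkFFT path relative to cuFFT, as in the ratio column.
fn ratio(vkfft: Duration, cufft: Duration) -> f64 {
    vkfft.as_secs_f64() / cufft.as_secs_f64()
}
```

Applied to the first row, `ratio(86 µs, 25 µs)` gives 3.44, which the table rounds to 3.4×.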

cuFFT is NVIDIA's vendor-tuned path. VkFFT is the cross-vendor fallback and the only gpufft backend available on AMD and Intel GPUs. For R2C / C2R, the Vulkan backend uses a compute-shader padder to collapse the innermost-axis stride handling into a single dispatch.
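The padding step can be pictured with a CPU reference. This is a sketch only: gpufft's padder is a compute shader whose exact indexing is not documented here, and `pad_rows` / `unpad_rows` are hypothetical names. The sketch assumes tightly packed f32 rows whose padding slots are zero-filled. Each output element depends on at most one input element, so on the GPU every element can be written by an independent invocation.

```rust
/// Copy rows of `n` tightly packed reals into rows of stride 2 * (n / 2 + 1),
/// the padded innermost-axis layout VkFFT uses for R2C; tail slots stay zero.
fn pad_rows(input: &[f32], n: usize) -> Vec<f32> {
    assert!(n > 0 && input.len() % n == 0, "input must be whole rows of length n");
    let stride = 2 * (n / 2 + 1);
    let rows = input.len() / n;
    let mut out = vec![0.0f32; rows * stride];
    for (row, src) in input.chunks_exact(n).enumerate() {
        out[row * stride..row * stride + n].copy_from_slice(src);
    }
    out
}

/// Inverse of `pad_rows`: after a C2R, drop the padding so rows are packed again.
fn unpad_rows(padded: &[f32], n: usize) -> Vec<f32> {
    let stride = 2 * (n / 2 + 1);
    assert!(n > 0 && padded.len() % stride == 0, "input must be whole padded rows");
    padded
        .chunks_exact(stride)
        .flat_map(|row| row[..n].iter().copied())
        .collect()
}
```

For n = 4 the padded stride is 6, and for n = 5 it is also 6. The padded row is exactly large enough to hold n / 2 + 1 complex values in place.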

Design notes

C2C is truly zero-copy: the user's buffer is passed through VkFFTLaunchParams.buffer at dispatch time.
R2C / C2R use a compute-shader padder to align the real innermost axis to VkFFT's 2 * (n/2 + 1) stride.
Plan lifetimes: VkFFTConfiguration retains raw pointers to its handle fields for the lifetime of the VkFFT application, so all handle storage lives inside a boxed Inner struct whose heap address stays stable even when the plan value is moved.
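The boxed-Inner pattern can be illustrated without VkFFT. In this sketch, `HandleStorage`, `Config`, `Inner`, and `Plan` are hypothetical stand-ins. `Config` plays the role of VkFFTConfiguration by holding raw pointers into sibling fields. Boxing `Inner` keeps those fields at a fixed heap address, so the pointers remain valid when `Plan` is moved.

```rust
/// Hypothetical handle fields that the "C library" points into.
struct HandleStorage {
    device: u64,
    queue: u64,
}

/// Hypothetical config holding raw pointers, as the C side would.
struct Config {
    device: *const u64,
    queue: *const u64,
}

struct Inner {
    handles: HandleStorage,
    config: Config,
}

struct Plan {
    // Boxed so `handles` stays put when `Plan` itself is moved.
    inner: Box<Inner>,
}

impl Plan {
    fn new(device: u64, queue: u64) -> Plan {
        let mut inner = Box::new(Inner {
            handles: HandleStorage { device, queue },
            config: Config { device: std::ptr::null(), queue: std::ptr::null() },
        });
        // Take the addresses only after boxing, so they point into the heap.
        let dev_ptr: *const u64 = &inner.handles.device;
        let queue_ptr: *const u64 = &inner.handles.queue;
        inner.config.device = dev_ptr;
        inner.config.queue = queue_ptr;
        Plan { inner }
    }

    /// Read through the stored pointers, as the C library would at dispatch time.
    fn read_handles(&self) -> (u64, u64) {
        // SAFETY: both pointers target fields of `*self.inner`, a heap allocation
        // that is neither moved nor freed while `self` is alive.
        unsafe { (*self.inner.config.device, *self.inner.config.queue) }
    }
}
```

Taking the addresses before boxing would leave dangling pointers to the stack copy of the struct. Boxing first is what makes the pointers stable.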

Shared memory

On Linux, with both the vulkan and cuda features enabled, the shared feature gates zero-copy interop between the two backends. It uses VK_KHR_external_memory_fd on the Vulkan side and CUDA's cudaImportExternalMemory on the CUDA side. The same physical allocation is addressable from both VkFFT and cuFFT plans, with no host round trip and no staging buffer.

[dependencies]
gpufft = { version = "0.1", features = ["shared"] }

use gpufft::{
    cuda::CudaBackend,
    shared::SharedFftBuffer,
    vulkan::VulkanBackend,
    Backend, Direction, PlanDesc, Shape,
};

let vk_dev = VulkanBackend::new_device(Default::default())?;
let cu_dev = CudaBackend::new_device(Default::default())?;
let buf = SharedFftBuffer::new(&vk_dev, &cu_dev, 1024)?;

let mut vk_plan = vk_dev.plan_c2c::<num_complex::Complex32>(&PlanDesc {
    shape: Shape::D1(1024), batch: 1, normalize: false,
})?;
let mut cu_plan = cu_dev.plan_c2c::<num_complex::Complex32>(&PlanDesc {
    shape: Shape::D1(1024), batch: 1, normalize: false,
})?;

vk_plan.execute_shared(&buf, Direction::Forward)?;
cu_plan.execute_shared(&buf, Direction::Inverse)?;

Constraints:

Linux only (relies on FD-based external memory; Win32 / D3D handle types are not yet wired).
The caller must construct both backends on the same physical GPU. No device-UUID check is enforced in v0.1, so a cross-GPU import will either fail at the driver level or silently produce a non-shared mapping.
C2C only in v0.1. R2C / C2R execute_shared variants are a follow-up.
Normalization is NOT applied on the CUDA path (the CUDA runtime backend rejects PlanDesc::normalize = true). For a forward-then-inverse pair through CUDA, either divide each element by N (the number of points per transform) on the host, or run the inverse on Vulkan.
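The host-side divide is needed because cuFFT's transforms are unnormalized: a forward transform followed by an inverse returns the input scaled by N. A naive O(N²) DFT reproduces that behavior on the CPU. `dft` and `normalize_in_place` are hypothetical helpers for illustration, not gpufft API, and they use (re, im) tuples to stay dependency-free. For 2D / 3D shapes, N is the product of all axis lengths. With batch > 1, each transform is scaled by its own N, not by the total buffer length.

```rust
use std::f64::consts::PI;

/// Unnormalized DFT: forward uses exp(-2πi jk/N), inverse exp(+2πi jk/N),
/// and neither direction scales the result.
fn dft(input: &[(f64, f64)], inverse: bool) -> Vec<(f64, f64)> {
    let n = input.len();
    let sign = if inverse { 1.0 } else { -1.0 };
    (0..n)
        .map(|k| {
            input.iter().enumerate().fold((0.0, 0.0), |(acc_re, acc_im), (j, &(re, im))| {
                // Reduce j * k mod n to keep the angle small and accurate.
                let theta = sign * 2.0 * PI * ((j * k) % n) as f64 / n as f64;
                let (s, c) = theta.sin_cos();
                // (re + i*im) * (c + i*s)
                (acc_re + re * c - im * s, acc_im + re * s + im * c)
            })
        })
        .collect()
}

/// Host-side fix-up after an unnormalized forward-then-inverse pair.
fn normalize_in_place(data: &mut [(f64, f64)], n_total: usize) {
    let scale = 1.0 / n_total as f64;
    for (re, im) in data.iter_mut() {
        *re *= scale;
        *im *= scale;
    }
}
```

Running `dft(&dft(&x, false), true)` on a 4-element signal yields 4 * x. Calling `normalize_in_place` with N = 4 then recovers x.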

Non-goals (v0.1)

Cross-backend buffer sharing beyond the Linux-only, C2C-only shared feature (Win32 / D3D handles and R2C / C2R interop are follow-ups).
General GPU compute framework; use wgpu / cubecl for that.
Real-to-real transforms (DCT / DST), 4D+ shapes, non-power-of-two auto-tuning.

Crate layout

Crate	Purpose
`gpufft`	Public API, trait surface, backend modules
`gpufft-vulkan-sys`	FFI bindings to vendored VkFFT
`gpufft-cuda-sys`	FFI bindings to cuFFT (system CUDA Toolkit)

Repository: https://github.com/alejandro-soto-franco/gpufft

License

Apache-2.0.

gpufft 0.1.3