moe-gpu-dsp
Zero-copy GPU signal processing framework for Rust. Batch cuFFT, CUDA kernel dispatch, and full STFT/IFFT pipelines that stay entirely in GPU memory.
Built on cudarc 0.19.
What it does
Upload a signal once. Run all processing on GPU. Download the result once. No CPU round-trips between stages.
Signal -> GPU upload -> Hann window -> Batch FFT R2C
-> Magnitude -> Processing kernels -> Soft mask
-> Batch FFT C2R -> Overlap-add -> GPU download -> Output
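As a CPU reference for the first stages of this pipeline, the sketch below extracts hop-spaced frames and applies a Hann window. It is not the crate's kernel: the function names, the periodic Hann form, and the no-padding framing rule are assumptions made for illustration.

```rust
use std::f32::consts::PI;

/// Periodic Hann window of length `n`.
/// Assumed form; the crate may use the symmetric variant instead.
fn hann(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n as f32).cos())
        .collect()
}

/// Number of full frames when the signal is not padded (an assumption).
fn num_frames(signal_len: usize, frame_len: usize, hop: usize) -> usize {
    assert!(hop > 0, "hop must be positive");
    if signal_len < frame_len {
        0
    } else {
        1 + (signal_len - frame_len) / hop
    }
}

/// Hypothetical CPU equivalent of hop-based frame extraction with windowing.
/// Output is frame-major: frame `f` occupies `out[f * frame_len..(f + 1) * frame_len]`.
fn window_frames_cpu(signal: &[f32], frame_len: usize, hop: usize) -> Vec<f32> {
    let w = hann(frame_len);
    let n = num_frames(signal.len(), frame_len, hop);
    let mut out = Vec::with_capacity(n * frame_len);
    for f in 0..n {
        let start = f * hop;
        for i in 0..frame_len {
            out.push(signal[start + i] * w[i]);
        }
    }
    out
}
```

A reference like this can be handy for checking GPU output on a short test signal, provided the window form and framing rule are first matched to what the crate actually does.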
Kernels included
| Kernel | Purpose |
|---|---|
| window_frames | Hann windowing with hop-based frame extraction |
| magnitude | Complex to float magnitude with transpose |
| median_filter | 2D median filter (horizontal or vertical) |
| soft_mask | a^2 / (a^2 + b^2 + eps) applied to complex data |
| overlap_add | ISTFT reconstruction via atomicAdd |
All kernels are compiled once at init via NVRTC (typically < 0.1s).
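The soft mask formula in the table can be written out on the CPU as follows. This is a sketch under stated assumptions, not the crate's kernel: the function names are hypothetical, `a` and `b` are taken to be two magnitude arrays of equal length, and the complex data is assumed to be interleaved as `[re0, im0, re1, im1, ...]`. The transposed layout produced by the magnitude kernel is ignored here.

```rust
/// mask = a^2 / (a^2 + b^2 + eps)
fn soft_mask_value(a: f32, b: f32, eps: f32) -> f32 {
    let a2 = a * a;
    a2 / (a2 + b * b + eps)
}

/// Scales interleaved complex data in place by the per-bin mask.
/// Hypothetical helper; the layout is an assumption.
fn apply_soft_mask(complex: &mut [f32], a: &[f32], b: &[f32], eps: f32) {
    assert_eq!(a.len(), b.len());
    assert_eq!(complex.len(), 2 * a.len());
    for (i, (&ai, &bi)) in a.iter().zip(b).enumerate() {
        let m = soft_mask_value(ai, bi, eps);
        complex[2 * i] *= m; // real part
        complex[2 * i + 1] *= m; // imaginary part
    }
}
```

The `eps` term keeps the denominator non-zero when both magnitudes are zero, so silent bins produce a mask of 0 rather than NaN.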
Usage
[dependencies]
moe-gpu-dsp = { version = "0.1", features = ["cuda"] }
use moe_gpu_dsp::GpuDsp;
// Init GPU + compile kernels
let dsp = GpuDsp::new().unwrap();
// Upload + window
let = dsp.window_frames;
// Batch FFT (all frames at once)
let complex = dsp.batch_fft_r2c;
// Magnitude (transposed for frequency-axis processing)
let mag = dsp.magnitude;
let complex_floats = complex_as_floats;
// Median filter (horizontal = time axis, vertical = frequency axis)
let h_filtered = dsp.median_filter;
let v_filtered = dsp.median_filter;
// Soft mask + IFFT + overlap-add
let mut masked = dsp.soft_mask;
let output = dsp.batch_ifft_c2r_ola;
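For the final overlap-add step, a CPU reference might look like the sketch below. The function name and the frame-major input layout are assumptions, and window normalization is left out. On the GPU, overlapping frames can write to the same output sample from different threads, which is presumably why the kernel accumulates with atomicAdd; a sequential loop needs no such care.

```rust
/// Sums frames back into one signal at hop-spaced offsets.
/// Hypothetical CPU reference: `frames` is frame-major, each frame `frame_len` long.
fn overlap_add_cpu(frames: &[f32], frame_len: usize, hop: usize) -> Vec<f32> {
    assert!(frame_len > 0 && frames.len() % frame_len == 0);
    let n = frames.len() / frame_len;
    if n == 0 {
        return Vec::new();
    }
    // The last frame starts at (n - 1) * hop and spans frame_len samples.
    let mut out = vec![0.0f32; (n - 1) * hop + frame_len];
    for (f, frame) in frames.chunks_exact(frame_len).enumerate() {
        for (i, &x) in frame.iter().enumerate() {
            out[f * hop + i] += x;
        }
    }
    out
}
```

With two frames of four ones and a hop of two, the middle two samples receive contributions from both frames.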
Setup
1. Install CUDA toolkit
Linux / WSL:
# Ubuntu/Debian
# Verify
Windows: Download the installer from the NVIDIA CUDA Toolkit page and run it.
2. Set environment variables
The build needs to find your CUDA installation:
If your CUDA toolkit version differs from your driver version (common on WSL), pin it:
# Check driver version: nvidia-smi
# Set toolkit version to match driver (e.g. driver 13.1 = 13010)
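The example above (13.1 = 13010) matches the usual CUDA version encoding of major * 1000 + minor * 10. A small sketch of that arithmetic, with hypothetical helper names that are not part of this crate:

```rust
/// Encodes a CUDA version as major * 1000 + minor * 10, e.g. 13.1 -> 13010.
fn cuda_version_code(major: u32, minor: u32) -> u32 {
    major * 1000 + minor * 10
}

/// Parses a "major.minor" string such as "13.1" into (13, 1).
fn parse_cuda_version(s: &str) -> Option<(u32, u32)> {
    let (major, minor) = s.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}
```

For instance, a version string of "12.8" would map to 12080 under this encoding.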
3. Add to your project
[dependencies]
moe-gpu-dsp = { version = "0.1", features = ["cuda"] }
4. Build
5. Run
At runtime, the CUDA libraries must be on LD_LIBRARY_PATH:
Configure GPU architecture
Default target is sm_86 (RTX 3070/3080/3090). Change it in DspConfig:
let config = DspConfig ;
Common architectures:
| GPU | Architecture |
|---|---|
| RTX 2070/2080 | sm_75 |
| RTX 3070/3080/3090 | sm_86 |
| RTX 4070/4080/4090 | sm_89 |
| A100 | sm_80 |
| H100 | sm_90 |
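If the architecture needs to be picked from a device name at startup, the table above can be expressed as a lookup. This is a hypothetical helper, not part of the crate, and it only knows the entries listed in the table:

```rust
/// Model substrings from the table above and their target architectures.
const ARCH_TABLE: &[(&str, &str)] = &[
    ("RTX 2070", "sm_75"),
    ("RTX 2080", "sm_75"),
    ("RTX 3070", "sm_86"),
    ("RTX 3080", "sm_86"),
    ("RTX 3090", "sm_86"),
    ("RTX 4070", "sm_89"),
    ("RTX 4080", "sm_89"),
    ("RTX 4090", "sm_89"),
    ("A100", "sm_80"),
    ("H100", "sm_90"),
];

/// Returns the architecture string for a device name, or `None` if the
/// model is not in the table (hypothetical helper).
fn arch_for_gpu(name: &str) -> Option<&'static str> {
    let upper = name.to_ascii_uppercase();
    ARCH_TABLE
        .iter()
        .find(|(model, _)| upper.contains(*model))
        .map(|&(_, arch)| arch)
}
```

Returning `None` for unknown devices leaves the caller free to keep the sm_86 default or stop with an error.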
WSL-specific notes
- CUDA toolkit goes in WSL. Do NOT install Linux GPU drivers in WSL (the Windows driver bridges automatically).
- If you get CUDA_ERROR_UNSUPPORTED_PTX_VERSION, your toolkit version is newer than your driver. Install an older toolkit or update the Windows NVIDIA driver.
- cuFFT libraries are in /usr/local/cuda/lib64/. Make sure LD_LIBRARY_PATH includes this at runtime.
Graceful fallback
Without the cuda feature, GpuDsp::new() returns None. You can fall back to CPU:
let dsp = GpuDsp::new();
if dsp.is_none() {
    // No GPU backend available: fall back to a CPU implementation here.
}
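One way to structure the fallback is to route each operation through a function that takes an optional GPU handle. The sketch below is self-contained and uses stand-in types: `FakeGpu`, `scale_cpu`, and `scale_with_fallback` are hypothetical names, not this crate's API, and the "GPU" path simply runs on the CPU so the example can execute anywhere.

```rust
/// Hypothetical stand-in for a GPU backend handle; not the crate's type.
struct FakeGpu;

impl FakeGpu {
    /// Mirrors the behaviour described above: `None` when no backend is available.
    fn new(cuda_enabled: bool) -> Option<Self> {
        if cuda_enabled {
            Some(FakeGpu)
        } else {
            None
        }
    }

    /// Placeholder for a GPU operation.
    fn scale(&self, x: &[f32], gain: f32) -> Vec<f32> {
        x.iter().map(|v| v * gain).collect()
    }
}

/// CPU implementation of the same operation.
fn scale_cpu(x: &[f32], gain: f32) -> Vec<f32> {
    x.iter().map(|v| v * gain).collect()
}

/// Uses the GPU path when a handle exists, otherwise the CPU path.
/// The returned label reports which path ran.
fn scale_with_fallback(gpu: Option<&FakeGpu>, x: &[f32], gain: f32) -> (Vec<f32>, &'static str) {
    match gpu {
        Some(dev) => (dev.scale(x, gain), "gpu"),
        None => (scale_cpu(x, gain), "cpu"),
    }
}
```

Matching on the `Option` once at the call boundary keeps the rest of the code unaware of which backend is in use.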
Performance
On RTX 3070, processing a 3-minute audio signal (7.4M samples, 14,562 STFT frames):
- NVRTC kernel compilation: 0.03s (cached after first run)
- Full pipeline (window + FFT + magnitude + 2x median filter + soft mask + IFFT + overlap-add): 99ms
License
MIT OR Apache-2.0