ferrotorch-gpu 0.4.5

CUDA GPU backend for ferrotorch
docs.rs failed to build ferrotorch-gpu-0.4.5
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Visit the last successful build: ferrotorch-gpu-0.4.2

ferrotorch-gpu

CUDA GPU backend for ferrotorch.

What it provides

  • GpuDevice -- CUDA device management and context handling
  • Data transfer -- cpu_to_gpu, gpu_to_cpu, tensor_to_gpu, tensor_to_cpu, alloc_zeros
  • GpuTensor -- GPU-resident tensor wrapper with cuda() and cuda_default() helpers
  • CUDA buffer -- CudaBuffer for raw GPU memory, CudaAllocator for memory management
  • BLAS -- gpu_matmul_f32, gpu_matmul_f64 via cuBLAS, gpu_bmm_* batched
  • cuSOLVER -- gpu_lu_solve, gpu_cholesky, gpu_eigh, gpu_eig, gpu_lstsq via cuSOLVER
  • cuFFT -- gpu_fft_*, gpu_rfft_*, gpu_ifft_* interleaved-complex FFTs
  • Convolution -- gpu_conv2d_f32
  • Element-wise kernels -- gpu_add, gpu_sub, gpu_mul, gpu_neg, gpu_relu, plus 50+ PTX kernels for reductions, scans, masked ops, scatter/gather, strided copies
  • Graph capture -- GpuGraphPool for CUDA-graph stream capture and replay
  • Memory guard -- MemoryGuard, MemoryWatchdog, MemoryStats, OomPolicy for GPU memory management and pressure monitoring

Feature flags

Feature Default Description
cuda yes Links against CUDA driver API via cudarc

Quick start

use ferrotorch_gpu::{GpuDevice, cpu_to_gpu, gpu_to_cpu};

let device = GpuDevice::new(0)?;
let host_data = vec![1.0f32, 2.0, 3.0];
let gpu_buf = cpu_to_gpu(&host_data, &device)?;
let back = gpu_to_cpu(&gpu_buf, &device)?;
assert_eq!(back, host_data);

Part of ferrotorch

This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.

License

MIT OR Apache-2.0