Crate cudarc

Safe CUDA wrappers for:

library	dynamic load	dynamic link	static link
CUDA driver	✅	✅	❌
NVRTC	✅	✅	✅
cuRAND	✅	✅	✅
cuBLAS	✅	✅	✅
cuBLASLt	✅	✅	✅
NCCL	✅	✅	✅
cuDNN	✅	✅	✅
cuSPARSE	✅	✅	✅
cuSOLVER	✅	✅	❌
cuFILE	✅	✅	✅
CUPTI	✅	✅	✅
nvtx	✅	✅	❌

CUDA versions supported:

11.4-11.8
12.0-12.9
13.0

cuDNN versions supported:

9.12.0

NCCL versions supported:

2.28.3

§Configuring CUDA version

Select the CUDA version with one of the following features:

-F cuda-version-from-build-system: at build time, gets the CUDA toolkit version using nvcc.
    -F fallback-latest: controls what happens if that detection fails. It is disabled by default, in which case the build script panics. If it is enabled, the highest version of the bindings available is used.
-F cuda-<major>0<minor>0: builds for a specific version of CUDA (for example, -F cuda-12040 for CUDA 12.4).
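
The naming pattern and the fallback rule above can be modelled in a few lines of plain Rust. This is only a sketch of the described behaviour, not the crate's build script: `cuda_feature_name` and `select_version` are hypothetical names, `detected` stands in for whatever nvcc reports, and an `Err` stands in for the build-script panic.

```rust
/// Builds the Cargo feature name for a CUDA toolkit version, following the
/// `cuda-<major>0<minor>0` pattern (hypothetical helper, not part of cudarc).
fn cuda_feature_name(major: u32, minor: u32) -> String {
    format!("cuda-{major}0{minor}0")
}

/// Models the version selection described above (hypothetical helper).
///
/// * `detected`: the version reported by nvcc, or `None` if detection failed.
/// * `fallback_latest`: whether the `fallback-latest` feature is enabled.
/// * `supported`: the versions for which bindings exist.
fn select_version(
    detected: Option<(u32, u32)>,
    fallback_latest: bool,
    supported: &[(u32, u32)],
) -> Result<(u32, u32), String> {
    match detected {
        // Detection worked: use what the build system reported.
        Some(version) => Ok(version),
        // Detection failed, fallback enabled: pick the highest known version.
        None if fallback_latest => supported
            .iter()
            .copied()
            .max()
            .ok_or_else(|| "no supported versions listed".to_string()),
        // Detection failed, fallback disabled: the real build script panics here.
        None => Err("could not detect CUDA version and fallback-latest is off".to_string()),
    }
}
```

For instance, `cuda_feature_name(11, 8)` yields `cuda-11080`, and `select_version(None, true, &[(12, 9), (13, 0)])` picks `(13, 0)`.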

§Configuring linking

By default we use -F dynamic-loading, which will not require any libraries to be present at build time.

You can also enable -F dynamic-linking or -F static-linking for your use case.

§Crate organization

Each of the API modules listed above is organized into three levels:

A safe module, which provides safe abstractions over the result module
A result module, which is a thin wrapper around the sys module that ensures every function returns a Result
A sys module, which contains the raw FFI bindings

API	Safe	Result	Sys
driver	driver::safe	driver::result	driver::sys
cublas	cublas::safe	cublas::result	cublas::sys
cublaslt	cublaslt::safe	cublaslt::result	cublaslt::sys
nvrtc	nvrtc::safe	nvrtc::result	nvrtc::sys
curand	curand::safe	curand::result	curand::sys
cudnn	cudnn::safe	cudnn::result	cudnn::sys
cusparse	-	cusparse::result	cusparse::sys
cusolver	cusolver::safe	cusolver::result	cusolver::sys
cusolvermg	cusolvermg::safe	cusolvermg::result	cusolvermg::sys
cupti	-	cupti::result	cupti::sys
nvtx	nvtx::safe	nvtx::result	nvtx::sys
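
To make the three levels concrete, here is a toy, CUDA-free mock of the layering. Every name in it (`create_handle`, `destroy_handle`, `Error`, `Handle`, the status constants) is invented for illustration and does not mirror cudarc's actual items. A real sys module would contain `extern "C"` bindings rather than plain Rust functions.

```rust
/// Stand-in for a `sys` level: raw calls that report status codes.
pub mod sys {
    pub const SUCCESS: i32 = 0;
    pub const INVALID_VALUE: i32 = 1;

    /// Writes a fake handle through `out` and returns a status code.
    pub unsafe fn create_handle(out: *mut u64, ordinal: i32) -> i32 {
        if out.is_null() || ordinal < 0 {
            return INVALID_VALUE;
        }
        unsafe { out.write(0x1000 + ordinal as u64) };
        SUCCESS
    }

    /// Pretends to release a handle.
    pub unsafe fn destroy_handle(_handle: u64) -> i32 {
        SUCCESS
    }
}

/// Stand-in for a `result` level: same calls, status codes become `Result`.
pub mod result {
    use super::sys;

    #[derive(Debug, PartialEq, Eq)]
    pub struct Error(pub i32);

    pub fn create_handle(ordinal: i32) -> Result<u64, Error> {
        let mut handle = 0u64;
        // SAFETY: `handle` is a valid, writable location.
        let status = unsafe { sys::create_handle(&mut handle, ordinal) };
        if status == sys::SUCCESS { Ok(handle) } else { Err(Error(status)) }
    }

    /// # Safety
    /// `handle` must come from `create_handle` and must not be destroyed twice.
    pub unsafe fn destroy_handle(handle: u64) -> Result<(), Error> {
        let status = unsafe { sys::destroy_handle(handle) };
        if status == sys::SUCCESS { Ok(()) } else { Err(Error(status)) }
    }
}

/// Stand-in for a `safe` level: owns the handle and releases it on drop.
pub mod safe {
    use super::result;

    #[derive(Debug)]
    pub struct Handle {
        raw: u64,
    }

    impl Handle {
        pub fn new(ordinal: i32) -> Result<Self, result::Error> {
            result::create_handle(ordinal).map(|raw| Handle { raw })
        }

        pub fn raw(&self) -> u64 {
            self.raw
        }
    }

    impl Drop for Handle {
        fn drop(&mut self) {
            // SAFETY: `raw` came from `create_handle` and is released exactly once.
            let _ = unsafe { result::destroy_handle(self.raw) };
        }
    }
}
```

In this mock, a caller of `safe::Handle::new(0)` never writes `unsafe` and cannot forget the cleanup call, which is the kind of guarantee a safe level is meant to add on top of a result level.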

§Core Concepts

At the core is the driver API. It exposes many structs, but the main ones are:

driver::CudaContext is a handle to a specific device ordinal (e.g. 0, 1, 2, …)
driver::CudaStream is how you submit work to a device
driver::CudaSlice<T> represents a Vec<T> on the device and is allocated through a driver::CudaStream

The following table maps CPU concepts to their CUDA counterparts:

Concept	CPU	Cuda
Memory allocator	`std::alloc::GlobalAlloc`	`driver::CudaContext`
List of values on heap	`Vec<T>`	`driver::CudaSlice<T>`
Slice	`&[T]`	`driver::CudaView<T>`
Mutable Slice	`&mut [T]`	`driver::CudaViewMut<T>`
Function	`Fn`	`driver::CudaFunction`
Calling a function	`my_function(a, b, c)`	`driver::LaunchArgs::launch()`
Thread	`std::thread::Thread`	`driver::CudaStream`
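
As a reading aid, the CPU column of the table can be exercised in plain std Rust. The sketch below runs entirely on the host; `double_on_worker` is a hypothetical name, and the comments only point at the CUDA counterpart named in the table without calling any cudarc API.

```rust
use std::thread;

/// Doubles every element of `input` on a worker thread (hypothetical helper).
fn double_on_worker(input: &[f32]) -> Vec<f32> {
    // `Vec<T>`: an owned heap buffer (table counterpart: `driver::CudaSlice<T>`).
    let mut out: Vec<f32> = vec![0.0; input.len()];

    // `&[T]` and `&mut [T]`: borrowed views into buffers
    // (table counterparts: `driver::CudaView<T>` and `driver::CudaViewMut<T>`).
    let src: &[f32] = input;
    let dst: &mut [f32] = &mut out;

    // `Fn`: the code to run (table counterpart: `driver::CudaFunction`).
    let kernel = |dst: &mut [f32], src: &[f32]| {
        for (d, s) in dst.iter_mut().zip(src) {
            *d = 2.0 * *s;
        }
    };

    // A thread carries out the call (table counterpart: `driver::CudaStream`).
    // The scope joins the worker before `out` is returned.
    thread::scope(|scope| {
        scope.spawn(|| kernel(dst, src));
    });

    out
}
```

The analogy is loose: the table pairs concepts by role, and device-side details such as launch configuration have no CPU equivalent in this sketch.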

§Combining the different APIs

The highest-level APIs are designed to work together.

§nvrtc

nvrtc::compile_ptx() returns an nvrtc::Ptx, which can be loaded onto a device with driver::CudaContext::load_module().

§cublas

cublas::CudaBlas can perform gemm and gemv operations through the cublas::Gemm<T> and cublas::Gemv<T> traits. Both traits are generic over memory allocated by the driver, in the form of driver::CudaSlice<T>, driver::CudaView<T>, and driver::CudaViewMut<T>.

§curand

curand::CudaRng can fill a driver::CudaSlice<T> with random data, based on one of its available distributions.

§Combining safe/result/sys

The result and sys levels are largely interchangeable for each API. However, the safe APIs don’t necessarily allow you to mix in the result level. This is to encourage going through the safe API when possible.

If you need functionality that isn’t present in the safe API, please open a ticket.

Modules§

cublas: CudaBlas wraps around the cublas API.
cublaslt: CudaBlasLT wraps around cuBLASLt.
cudnn
cufile: Cufile wraps around cuFILE.
cupti: Wrappers around the CUDA Profiling Tools Interface, in two levels: an unsafe low-level API and a (still unsafe) thin wrapper around it.
curand: CudaRng safe bindings around cuRAND.
cusolver
cusolvermg
cusparse
driver: Wrappers around the CUDA driver API, in three levels. See crate documentation for description of each.
nccl: Comm wraps around the NCCL API.
nvrtc: Wrappers around the Nvidia Runtime Compilation (nvrtc) API, in three levels. See crate documentation for description of each.
runtime: Wrappers around the CUDA Runtime API, in two levels: an unsafe low-level API and a (still unsafe) thin wrapper around it.
types: Exposes CudaTypeName which maps between rust type names and the corresponding cuda kernel type names.

Macros§

group
