Safe CUDA wrappers for:
| library | dynamic load | dynamic link | static link |
|---|---|---|---|
| CUDA driver | ✅ | ✅ | ❌ |
| NVRTC | ✅ | ✅ | ✅ |
| cuRAND | ✅ | ✅ | ✅ |
| cuBLAS | ✅ | ✅ | ✅ |
| cuBLASLt | ✅ | ✅ | ✅ |
| NCCL | ✅ | ✅ | ✅ |
| cuDNN | ✅ | ✅ | ✅ |
| cuSPARSE | ✅ | ✅ | ✅ |
| cuSOLVER | ✅ | ✅ | ❌ |
| cuFILE | ✅ | ✅ | ✅ |
| CUPTI | ✅ | ✅ | ✅ |
| nvtx | ✅ | ✅ | ❌ |
CUDA versions supported:
- 11.4-11.8
- 12.0-12.9
- 13.0
cuDNN versions supported:
- 9.12.0
NCCL versions supported:
- 2.28.3
§Configuring CUDA version
Select the CUDA version with one of:
- -F cuda-version-from-build-system: at build time, gets the CUDA toolkit version using nvcc.
  - -F fallback-latest: controls the behavior if this fails. It is not enabled by default, in which case the build script panics. If -F fallback-latest is enabled, the highest available bindings are used.
- -F cuda-<major>0<minor>0: builds for a specific version of CUDA.
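The selection rules above can be modeled in a few lines of plain Rust. This is only a sketch of the described behavior, not the crate's build script: `cuda_feature` and `select_version` are hypothetical names, the supported-version list is passed in by the caller, and the error is returned as an `Err` here where the real build script is described as panicking.

```rust
/// Hypothetical helper: builds the feature name for a CUDA version,
/// following the `cuda-<major>0<minor>0` pattern (single-digit minor assumed).
fn cuda_feature(major: u32, minor: u32) -> String {
    format!("cuda-{major}0{minor}0")
}

/// Hypothetical model of the version-selection rules:
/// - a detected toolkit version is used as-is;
/// - if detection fails and `fallback_latest` is set, the highest
///   supported version is used;
/// - otherwise selection fails.
fn select_version(
    detected: Option<(u32, u32)>,
    fallback_latest: bool,
    supported: &[(u32, u32)],
) -> Result<(u32, u32), String> {
    match detected {
        Some(version) => Ok(version),
        // Tuples compare lexicographically, so `max` picks the newest (major, minor).
        None if fallback_latest => supported
            .iter()
            .copied()
            .max()
            .ok_or_else(|| "no supported versions listed".to_string()),
        None => Err("could not detect the CUDA toolkit version".to_string()),
    }
}
```

Under these assumptions, `cuda_feature(12, 6)` yields `cuda-12060`, and `select_version(None, true, &[(11, 8), (12, 9), (13, 0)])` falls back to `(13, 0)`.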
§Configuring linking
By default we use -F dynamic-loading, which will not require any libraries to be present at build time.
You can also enable -F dynamic-linking or -F static-linking for your use case.
§crate organization
Each of the modules for the above is organized into three levels:
- A safe module, which provides safe abstractions over the result module
- A result module, which is a thin wrapper around the sys module that ensures all functions return Result
- A sys module, which contains the raw FFI bindings
| API | Safe | Result | Sys |
|---|---|---|---|
| driver | driver::safe | driver::result | driver::sys |
| cublas | cublas::safe | cublas::result | cublas::sys |
| cublaslt | cublaslt::safe | cublaslt::result | cublaslt::sys |
| nvrtc | nvrtc::safe | nvrtc::result | nvrtc::sys |
| curand | curand::safe | curand::result | curand::sys |
| cudnn | cudnn::safe | cudnn::result | cudnn::sys |
| cusparse | - | cusparse::result | cusparse::sys |
| cusolver | cusolver::safe | cusolver::result | cusolver::sys |
| cusolvermg | cusolvermg::safe | cusolvermg::result | cusolvermg::sys |
| cupti | - | cupti::result | cupti::sys |
| nvtx | nvtx::safe | nvtx::result | nvtx::sys |
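The three-level layering can be illustrated with a small, self-contained sketch. Nothing below is taken from the crate: the `sys` function is a plain-Rust stand-in for a C binding (so the example runs without CUDA installed), and `device_get_count`, `DriverError`, and `Device` are hypothetical names chosen for illustration.

```rust
/// `sys` level: raw bindings. A real `sys` module would declare `extern "C"`
/// functions; this stand-in mimics the C convention of an out-pointer
/// plus an integer status code.
mod sys {
    pub const STATUS_SUCCESS: i32 = 0;
    pub const STATUS_INVALID_VALUE: i32 = 1;

    pub unsafe fn device_get_count(count: *mut i32) -> i32 {
        if count.is_null() {
            return STATUS_INVALID_VALUE;
        }
        // SAFETY: the pointer was checked for null; the caller guarantees validity.
        unsafe { *count = 2 }; // pretend two devices are present
        STATUS_SUCCESS
    }
}

/// `result` level: a thin wrapper that turns status codes into `Result`.
mod result {
    use super::sys;

    #[derive(Debug, PartialEq)]
    pub struct DriverError(pub i32);

    pub fn device_get_count() -> Result<i32, DriverError> {
        let mut count = 0;
        // SAFETY: `count` is a valid, writable i32 for the duration of the call.
        let status = unsafe { sys::device_get_count(&mut count) };
        if status == sys::STATUS_SUCCESS {
            Ok(count)
        } else {
            Err(DriverError(status))
        }
    }
}

/// `safe` level: an abstraction whose invariants are checked at construction.
mod safe {
    use super::{result, sys};

    pub struct Device {
        ordinal: usize,
    }

    impl Device {
        /// Fails if `ordinal` does not name an existing device.
        pub fn new(ordinal: usize) -> Result<Self, result::DriverError> {
            let count = result::device_get_count()?;
            if ordinal < count as usize {
                Ok(Device { ordinal })
            } else {
                Err(result::DriverError(sys::STATUS_INVALID_VALUE))
            }
        }

        pub fn ordinal(&self) -> usize {
            self.ordinal
        }
    }
}
```

In this sketch only the `sys` call requires `unsafe`; the `result` layer confines it to one place, and the `safe` layer exposes a type that cannot be constructed in an invalid state.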
§Core Concepts
At the core is the driver API. It exposes many structs; the main ones are:
- driver::CudaContext is a handle to a specific device ordinal (e.g. 0, 1, 2, …)
- driver::CudaStream is how you submit work to a device
- driver::CudaSlice<T>, which represents a Vec<T> on the device, can be allocated using the aforementioned driver::CudaStream.
Here is a table of similar concepts between CPU and Cuda:
| Concept | CPU | Cuda |
|---|---|---|
| Memory allocator | std::alloc::GlobalAlloc | driver::CudaContext |
| List of values on heap | Vec<T> | driver::CudaSlice<T> |
| Slice | &[T] | driver::CudaView<T> |
| Mutable Slice | &mut [T] | driver::CudaViewMut<T> |
| Function | Fn | driver::CudaFunction |
| Calling a function | my_function(a, b, c) | driver::LaunchArgs::launch() |
| Thread | std::thread::Thread | driver::CudaStream |
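To make the CPU column of this table concrete, here is a minimal CPU-only sketch with comments naming the CUDA counterpart from each row. It uses only the standard library and calls nothing from this crate; `scale` and `run_on_worker` are hypothetical names.

```rust
use std::thread;

/// A plain function: the CPU counterpart of a `driver::CudaFunction`.
/// `&[f32]` plays the role of `driver::CudaView<T>` and
/// `&mut [f32]` the role of `driver::CudaViewMut<T>`.
fn scale(input: &[f32], output: &mut [f32], factor: f32) {
    for (out, value) in output.iter_mut().zip(input) {
        *out = *value * factor;
    }
}

/// Runs `scale` on a separate thread and returns the result.
fn run_on_worker(values: Vec<f32>, factor: f32) -> Vec<f32> {
    // A thread is where CPU work executes, as a `driver::CudaStream`
    // is where device work is submitted.
    let handle = thread::spawn(move || {
        // A heap-allocated `Vec<T>` corresponds to `driver::CudaSlice<T>`.
        let mut out = vec![0.0f32; values.len()];
        // A direct call corresponds to `driver::LaunchArgs::launch()`.
        scale(&values, &mut out, factor);
        out
    });
    handle.join().expect("worker thread panicked")
}
```

The analogy is loose: a CPU call returns when the work is done, whereas work submitted to a stream generally runs asynchronously with respect to the host.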
§Combining the different APIs
All of the highest-level APIs are designed to work together.
§nvrtc
nvrtc::compile_ptx() outputs a nvrtc::Ptx, which can
be loaded into a device with driver::CudaContext::load_module().
§cublas
cublas::CudaBlas can perform gemm and gemv operations using cublas::Gemm<T>
and cublas::Gemv<T>. Both of these traits can generically accept memory
allocated by the driver in the form of driver::CudaSlice<T>,
driver::CudaView<T>, and driver::CudaViewMut<T>.
§curand
curand::CudaRng can fill a driver::CudaSlice<T> with random data, based on
one of its available distributions.
§Combining safe/result/sys
The result and sys levels are largely interchangeable for each API. However, the safe APIs don’t necessarily allow you to mix in the result level. This is to encourage going through the safe API when possible.
If you need some functionality that isn’t present in the safe API, please open a ticket.
Modules§
- cublas
- CudaBlas wraps around the cublas API.
- cublaslt
- CudaBlasLT wraps around cuBLASLt via:
- cudnn
- cufile
- Cufile wraps around cuFILE via:
- cupti
- Wrappers around the CUDA Profiling Tools Interface, in two levels: an unsafe low-level API and a (still unsafe) thin wrapper around it.
- curand
- CudaRng safe bindings around cuRAND.
- cusolver
- cusolvermg
- cusparse
- driver
- Wrappers around the CUDA driver API, in three levels. See crate documentation for description of each.
- nccl
- Comm wraps around the NCCL API, via:
- nvrtc
- Wrappers around the Nvidia Runtime Compilation (nvrtc) API, in three levels. See crate documentation for description of each.
- runtime
- Wrappers around the CUDA Runtime API, in two levels: an unsafe low-level API and a (still unsafe) thin wrapper around it.
- types
- Exposes CudaTypeName which maps between rust type names and the corresponding cuda kernel type names.