Safe abstractions over the CUDA driver, NVRTC, cuRAND, and cuBLAS APIs.
Each of the modules for the above is organized into three levels:
- A safe module, which provides safe abstractions over the result module
- A result module, which is a thin wrapper around the sys module to ensure all functions return Result
- A sys module, which contains the raw FFI bindings
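To make the layering concrete, here is a minimal sketch of the same three-level pattern built around a toy allocator. Everything in it is hypothetical (the toy_alloc and toy_free functions, ToyError, and Buffer are invented for illustration) and it is not how this crate is actually implemented. Plain unsafe Rust functions stand in for the extern "C" declarations a real sys module would hold.

```rust
/// Level 1: raw bindings. Hypothetical stand-ins for FFI functions that
/// report failure through an integer status code.
pub mod sys {
    pub const SUCCESS: i32 = 0;
    pub const ERROR_INVALID_VALUE: i32 = 1;

    /// Writes a pointer to `len` zeroed bytes through `out`.
    pub unsafe fn toy_alloc(out: *mut *mut u8, len: usize) -> i32 {
        if out.is_null() || len == 0 {
            return ERROR_INVALID_VALUE;
        }
        let buf = vec![0u8; len].into_boxed_slice();
        unsafe { *out = Box::into_raw(buf) as *mut u8 };
        SUCCESS
    }

    /// Frees memory from `toy_alloc`; `len` must match the allocation.
    pub unsafe fn toy_free(ptr: *mut u8, len: usize) -> i32 {
        if ptr.is_null() {
            return ERROR_INVALID_VALUE;
        }
        drop(unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len)) });
        SUCCESS
    }
}

/// Level 2: the same operations, but every status code becomes a Result.
/// Functions stay unsafe because pointer validity is still the caller's job.
pub mod result {
    use super::sys;

    #[derive(Debug, PartialEq, Eq)]
    pub struct ToyError(pub i32);

    fn check(code: i32) -> Result<(), ToyError> {
        if code == sys::SUCCESS { Ok(()) } else { Err(ToyError(code)) }
    }

    pub unsafe fn alloc(len: usize) -> Result<*mut u8, ToyError> {
        let mut ptr = std::ptr::null_mut();
        check(unsafe { sys::toy_alloc(&mut ptr, len) })?;
        Ok(ptr)
    }

    pub unsafe fn free(ptr: *mut u8, len: usize) -> Result<(), ToyError> {
        check(unsafe { sys::toy_free(ptr, len) })
    }
}

/// Level 3: an owning type that upholds the invariants itself, so callers
/// never write `unsafe`.
pub mod safe {
    use super::result::{self, ToyError};

    pub struct Buffer {
        ptr: *mut u8,
        len: usize,
    }

    impl Buffer {
        pub fn new(len: usize) -> Result<Self, ToyError> {
            let ptr = unsafe { result::alloc(len) }?;
            Ok(Self { ptr, len })
        }

        pub fn len(&self) -> usize {
            self.len
        }

        pub fn as_slice(&self) -> &[u8] {
            // Sound because `ptr` came from `alloc(len)` and is freed only in Drop.
            unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
        }
    }

    impl Drop for Buffer {
        fn drop(&mut self) {
            let _ = unsafe { result::free(self.ptr, self.len) };
        }
    }
}
```

With this layout, calling safe::Buffer::new(4) yields a buffer that is released automatically when dropped, while code that needs finer control can drop down to result::alloc and result::free and take on the pointer bookkeeping itself.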
At the core is the driver API, which exposes a bunch of structs, but the main ones are:
- driver::CudaDevice, which is a handle to a specific device ordinal (e.g. 0, 1, 2, …)
- driver::CudaSlice<T>, which represents a Vec<T> on the device and can be allocated using the aforementioned CudaDevice
Here is a table of similar concepts between CPU and CUDA:
|List of values on heap|
|Calling a function|
Combining the different APIs
All of the highest-level APIs are designed to work together.
nvrtc::compile_ptx() outputs a nvrtc::Ptx, which can be loaded into a driver::CudaDevice.
cublas::CudaBlas can perform gemm and gemv operations using cublas::Gemm<T> and cublas::Gemv<T>. Both of these traits can generically accept memory allocated by the driver, in forms such as driver::CudaSlice<T>.
curand::CudaRng can fill a driver::CudaSlice<T> with random data, based on one of its available distributions.
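The idea of one operation generically accepting several kinds of driver memory can be sketched on the host with a small trait. This is only an analogy that runs entirely on the CPU: ReadableMem, OwnedBuf, BufView, and the gemv function below are hypothetical names invented for this sketch, not items from this crate, and the arithmetic is an ordinary loop rather than a cuBLAS call.

```rust
/// Hypothetical trait: anything that can expose its contents as a read-only slice.
pub trait ReadableMem<T> {
    fn as_host_slice(&self) -> &[T];
}

/// Plays the role of an owning allocation.
pub struct OwnedBuf<T>(pub Vec<T>);

/// Plays the role of a borrowed window into an allocation.
pub struct BufView<'a, T>(pub &'a [T]);

impl<T> ReadableMem<T> for OwnedBuf<T> {
    fn as_host_slice(&self) -> &[T] {
        &self.0
    }
}

impl<'a, T> ReadableMem<T> for BufView<'a, T> {
    fn as_host_slice(&self) -> &[T] {
        self.0
    }
}

/// Computes y = A * x for a row-major `rows x cols` matrix.
/// The inputs may be owned buffers or views; the function does not care which.
pub fn gemv<A, X>(rows: usize, cols: usize, a: &A, x: &X) -> Vec<f32>
where
    A: ReadableMem<f32>,
    X: ReadableMem<f32>,
{
    let (a, x) = (a.as_host_slice(), x.as_host_slice());
    assert!(cols > 0, "matrix must have at least one column");
    assert_eq!(a.len(), rows * cols, "matrix size mismatch");
    assert_eq!(x.len(), cols, "vector size mismatch");
    a.chunks(cols)
        .map(|row| row.iter().zip(x).map(|(p, q)| p * q).sum::<f32>())
        .collect()
}
```

Because the bound is on a trait rather than a concrete type, the caller can pass an OwnedBuf for the matrix and a BufView over part of a larger vector without copying either one.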
The result and sys levels are largely interchangeable for each API. However, the safe APIs don't necessarily allow you to mix in the result level. This is to encourage going through the safe API when possible.
If you need some functionality that isn't present in the safe API, please open a ticket.
- Wrappers around the cuBLAS API, in three levels. See the crate documentation for a description of each.
- Wrappers around the cuRAND API, in three levels. See the crate documentation for a description of each.
- Wrappers around the CUDA driver API, in three levels. See the crate documentation for a description of each.
- Wrappers around the NVIDIA Runtime Compilation (NVRTC) API, in three levels. See the crate documentation for a description of each.
- Exposes CudaTypeName, which maps between Rust type names and the corresponding CUDA kernel type names.
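A mapping from Rust types to kernel type names is typically useful when generating kernel source text for a particular element type. The sketch below shows one way such a mapping could look. The KernelTypeName trait, the kernel_type_name macro, and scale_kernel_source are hypothetical names chosen for this example, and the crate's own CudaTypeName may be defined differently.

```rust
/// Hypothetical trait associating a Rust type with the name of the
/// equivalent type in CUDA C++ source.
pub trait KernelTypeName {
    const NAME: &'static str;
}

/// Implements the trait for a list of `RustType => "cuda type"` pairs.
macro_rules! kernel_type_name {
    ($($rust:ty => $cuda:literal),* $(,)?) => {
        $(impl KernelTypeName for $rust {
            const NAME: &'static str = $cuda;
        })*
    };
}

kernel_type_name! {
    f32 => "float",
    f64 => "double",
    i32 => "int",
    u32 => "unsigned int",
    i64 => "long long",
    u8 => "unsigned char",
}

/// Builds the source of a trivial element-wise kernel specialised for `T`.
pub fn scale_kernel_source<T: KernelTypeName>() -> String {
    let ty = T::NAME;
    [
        format!("extern \"C\" __global__ void scale({ty} *data, {ty} factor, size_t n) {{"),
        "    size_t i = blockIdx.x * blockDim.x + threadIdx.x;".to_string(),
        "    if (i < n) data[i] *= factor;".to_string(),
        "}".to_string(),
    ]
    .join("\n")
}
```

Calling scale_kernel_source::<f64>() produces kernel text whose parameters are declared as double, which could then be handed to a runtime compiler.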