Trait cudarc::driver::safe::LaunchAsync

pub unsafe trait LaunchAsync<Params> {
    // Required methods
    unsafe fn launch(
        self,
        cfg: LaunchConfig,
        params: Params
    ) -> Result<(), DriverError>;
    unsafe fn launch_on_stream(
        self,
        stream: &CudaStream,
        cfg: LaunchConfig,
        params: Params
    ) -> Result<(), DriverError>;
}

Consumes a CudaFunction to execute asynchronously on the device, with params determined by the generic parameter Params.

This is implemented multiple times for different numbers and types of params. In general, Params should implement DeviceRepr.

let my_kernel: CudaFunction = dev.get_func("my_module", "my_kernel").unwrap();
let cfg: LaunchConfig = LaunchConfig {
    grid_dim: (1, 1, 1),
    block_dim: (1, 1, 1),
    shared_mem_bytes: 0,
};
let params = (1i32, 2u64, 3usize);
unsafe { my_kernel.launch(cfg, params) }.unwrap();

§Safety

This is essentially never safe, because there is no guarantee that Params will work for any CudaFunction passed in. Take great care to ensure that the CudaFunction works with Params and that the correct parameters have &mut in front of them.

Additionally, kernels can mutate data that is marked as immutable, such as &CudaSlice<T>.

See LaunchAsync::launch for more details.

Required Methods§

unsafe fn launch( self, cfg: LaunchConfig, params: Params ) -> Result<(), DriverError>

Launches the CudaFunction with the corresponding Params.

§Safety

This method is very unsafe.

See the CUDA documentation notes on this as well: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

  1. params can be changed regardless of & or &mut usage.
  2. params will be changed at some later point after the function returns, because the kernel is executed asynchronously.
  3. There are no guarantees that the params are the correct number/types/order for func.
  4. Specifying the wrong values for LaunchConfig can result in accessing/modifying values past memory limits.
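Point 4 can be made concrete with a little arithmetic. The following is a minimal host-side sketch, not cudarc code: LaunchConfigSketch is a local stand-in that only mirrors the three LaunchConfig fields shown earlier, and config_for_elems, total_threads, and excess_threads are hypothetical helpers introduced for illustration. It shows that a grid rounded up to whole blocks usually launches more threads than there are elements, so a kernel that does not check its index against the element count would touch memory past the end of the buffer.

```rust
/// Local stand-in mirroring the three `LaunchConfig` fields shown above.
/// This is NOT the cudarc type; it only reuses the same field names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LaunchConfigSketch {
    grid_dim: (u32, u32, u32),
    block_dim: (u32, u32, u32),
    shared_mem_bytes: u32,
}

/// Hypothetical helper: a 1-D configuration covering `n` elements with
/// `threads_per_block` threads per block, rounding the block count up.
fn config_for_elems(n: u32, threads_per_block: u32) -> LaunchConfigSketch {
    assert!(threads_per_block > 0, "block size must be non-zero");
    // Ceiling division written so that it cannot overflow.
    let blocks = n / threads_per_block + u32::from(n % threads_per_block != 0);
    LaunchConfigSketch {
        // Keep at least one block so the grid is never empty.
        grid_dim: (blocks.max(1), 1, 1),
        block_dim: (threads_per_block, 1, 1),
        shared_mem_bytes: 0,
    }
}

/// Total number of threads the configuration would launch.
fn total_threads(cfg: &LaunchConfigSketch) -> u64 {
    let (gx, gy, gz) = cfg.grid_dim;
    let (bx, by, bz) = cfg.block_dim;
    [gx, gy, gz, bx, by, bz].iter().map(|&d| u64::from(d)).product()
}

/// Threads whose linear index is >= `n`. Each of these would read or
/// write out of bounds unless the kernel guards with `if i < n`.
fn excess_threads(cfg: &LaunchConfigSketch, n: u32) -> u64 {
    total_threads(cfg).saturating_sub(u64::from(n))
}
```

For 1000 elements and 256 threads per block, this yields 4 blocks and 1024 threads, so 24 threads have an index past the last element. Passing the element count as a kernel parameter and checking it inside the kernel is the usual guard.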
§Asynchronous mutation

Since this library queues kernels to be launched on a single stream, and the only practical way to modify a crate::driver::CudaSlice is through kernels, mutating the same crate::driver::CudaSlice with multiple kernels is safe. This is because each kernel is executed sequentially on the stream.

Modifying a value on the host while it is in use by a kernel is undefined behavior, but this is hard to do accidentally.

For the same reason, do not pass any values to kernels that can be modified on the host. This is why DeviceRepr is not implemented for Rust primitive references.

§Use after free

Since the drop implementation for crate::driver::CudaSlice also occurs on the device’s single stream, any kernels launched before the drop will complete before the value is actually freed.

If you launch a kernel or drop a value on a different stream, this may not hold.

unsafe fn launch_on_stream( self, stream: &CudaStream, cfg: LaunchConfig, params: Params ) -> Result<(), DriverError>

Launches the function on a stream that runs concurrently with the device’s default work stream.

§Safety

This method is even more unsafe than LaunchAsync::launch. All the same rules apply, except that kernels may now execute in parallel with each other.

That means that if any of the kernels modify the same memory location, you’ll get race conditions or potentially undefined behavior.
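Why ordering matters can be shown with a toy host-side model. This sketch assumes nothing about cudarc: Kernel, double_all, add_one_all, and run_in_order are hypothetical names, and plain Rust functions stand in for device kernels. A single stream fixes one order; separate streams fix none, so either result below (or, with truly overlapping writes on a device, something worse) can occur.

```rust
/// Hypothetical alias: a host function standing in for a device kernel
/// that mutates a buffer in place.
type Kernel = fn(&mut [i32]);

/// Stand-in "kernel" that doubles every element.
fn double_all(buf: &mut [i32]) {
    for x in buf.iter_mut() {
        *x *= 2;
    }
}

/// Stand-in "kernel" that adds one to every element.
fn add_one_all(buf: &mut [i32]) {
    for x in buf.iter_mut() {
        *x += 1;
    }
}

/// Applies the kernels strictly one after another, in the given order,
/// which is the behavior a single stream provides.
fn run_in_order(init: &[i32], kernels: &[Kernel]) -> Vec<i32> {
    let mut buf = init.to_vec();
    for kernel in kernels {
        kernel(&mut buf);
    }
    buf
}
```

Starting from [1, 2, 3], running double_all then add_one_all gives [3, 5, 7], while the reverse order gives [4, 6, 8]. When two kernels that write the same memory are placed on different streams, nothing selects between these orders unless the streams are explicitly synchronized.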

Implementors§

impl LaunchAsync<&mut Vec<*mut c_void>> for CudaFunction

impl LaunchAsync<&mut [*mut c_void]> for CudaFunction

impl<A: DeviceRepr> LaunchAsync<(A,)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr> LaunchAsync<(A, B)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr> LaunchAsync<(A, B, C)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr> LaunchAsync<(A, B, C, D)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr> LaunchAsync<(A, B, C, D, E)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr> LaunchAsync<(A, B, C, D, E, F)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr, G: DeviceRepr> LaunchAsync<(A, B, C, D, E, F, G)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr, G: DeviceRepr, H: DeviceRepr> LaunchAsync<(A, B, C, D, E, F, G, H)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr, G: DeviceRepr, H: DeviceRepr, I: DeviceRepr> LaunchAsync<(A, B, C, D, E, F, G, H, I)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr, G: DeviceRepr, H: DeviceRepr, I: DeviceRepr, J: DeviceRepr> LaunchAsync<(A, B, C, D, E, F, G, H, I, J)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr, G: DeviceRepr, H: DeviceRepr, I: DeviceRepr, J: DeviceRepr, K: DeviceRepr> LaunchAsync<(A, B, C, D, E, F, G, H, I, J, K)> for CudaFunction

impl<A: DeviceRepr, B: DeviceRepr, C: DeviceRepr, D: DeviceRepr, E: DeviceRepr, F: DeviceRepr, G: DeviceRepr, H: DeviceRepr, I: DeviceRepr, J: DeviceRepr, K: DeviceRepr, L: DeviceRepr> LaunchAsync<(A, B, C, D, E, F, G, H, I, J, K, L)> for CudaFunction