pub struct Kernel { /* private fields */ }
A launchable GPU kernel with module lifetime management.
Holds an Arc<Module> to ensure the PTX module remains loaded
as long as any Kernel references it. This is important because
Function handles become invalid once their parent module is
unloaded.
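The ownership pattern described above can be sketched with standard-library types only. `FakeModule` and `FakeKernel` below are hypothetical stand-ins, not the real `Module` and `Kernel` types; the sketch assumes only that the kernel stores an `Arc` to its module and that the module is unloaded in `Drop`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a loaded PTX module.
struct FakeModule {
    loaded: Arc<AtomicBool>,
}

impl Drop for FakeModule {
    fn drop(&mut self) {
        // A real module would be unloaded from the driver at this point.
        self.loaded.store(false, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for a kernel that shares ownership of its module.
struct FakeKernel {
    _module: Arc<FakeModule>,
    name: String,
}

impl FakeKernel {
    fn from_module(module: Arc<FakeModule>, name: &str) -> Self {
        FakeKernel { _module: module, name: name.to_string() }
    }

    fn name(&self) -> &str {
        &self.name
    }
}

/// Returns whether the module was still loaded (a) after the caller dropped
/// its own handle and (b) after the last kernel was dropped.
fn module_outlives_caller_handle() -> (bool, bool) {
    let loaded = Arc::new(AtomicBool::new(true));
    let module = Arc::new(FakeModule { loaded: Arc::clone(&loaded) });
    let kernel = FakeKernel::from_module(Arc::clone(&module), "my_kernel");

    drop(module); // the kernel's Arc still keeps the module alive
    let while_kernel_alive = loaded.load(Ordering::SeqCst);

    drop(kernel); // last reference gone, so FakeModule::drop runs
    let after_kernel_dropped = loaded.load(Ordering::SeqCst);

    (while_kernel_alive, after_kernel_dropped)
}
```

Because the kernel holds its own reference, the caller can drop its handle to the module at any time without invalidating the function handle.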
§Creating a kernel
let module = Arc::new(Module::from_ptx(ptx)?);
let kernel = Kernel::from_module(module, "my_kernel")?;
println!("loaded kernel: {}", kernel.name());
§Launching
let stream = Stream::new(&ctx)?;
let params = LaunchParams::new(4u32, 256u32);
kernel.launch(&params, &stream, &(42u32, 1024u32))?;
Implementations
impl Kernel
pub fn from_module(module: Arc<Module>, name: &str) -> CudaResult<Self>
Creates a new Kernel from a module and function name.
Looks up the named function in the module. The Arc<Module> ensures
the module is not unloaded while this kernel exists.
§Errors
Returns CudaError::NotFound if no
function with the given name exists in the module, or another
CudaError on driver failure.
pub fn launch<A: KernelArgs>(
    &self,
    params: &LaunchParams,
    stream: &Stream,
    args: &A,
) -> CudaResult<()>
Launches the kernel with the given parameters and arguments on a stream.
This is the primary entry point for kernel execution. It calls
cuLaunchKernel with the specified grid/block dimensions, shared
memory, stream, and kernel arguments.
The launch is asynchronous: it returns immediately while the kernel
executes on the GPU. Use Stream::synchronize to wait for completion.
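Choosing the grid dimension usually comes down to a ceiling division of the element count by the block size. The helper below is a minimal sketch of that arithmetic; `blocks_for` is a hypothetical name, not part of this crate. It assumes a one-dimensional launch with one thread per element.

```rust
/// Number of blocks needed so that `blocks * block_size >= n`
/// (ceiling division). Returns `None` if `block_size` is zero.
fn blocks_for(n: u32, block_size: u32) -> Option<u32> {
    if block_size == 0 {
        return None;
    }
    // Written without `n + block_size - 1` so it cannot overflow.
    Some(n / block_size + u32::from(n % block_size != 0))
}
```

For 1024 elements and 256 threads per block this gives 4 blocks. A result of 0 (for `n == 0`) is not a valid grid dimension, so callers would typically skip the launch in that case.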
§Type safety
The args parameter accepts any type implementing KernelArgs,
including tuples of Copy types up to 24 elements. The caller is
responsible for ensuring the argument types match the kernel signature.
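The reason the types cannot be checked is that `cuLaunchKernel` receives its arguments as an array of untyped pointers (`void **kernelParams`). The sketch below shows how a tuple might be flattened into such an array. `ArgPointers` is a hypothetical stand-in trait; how the real `KernelArgs` is implemented is not shown in this page.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for a `KernelArgs`-like trait: exposes each
/// argument as an untyped pointer, the form the driver expects.
trait ArgPointers {
    fn as_pointers(&self) -> Vec<*const c_void>;
}

impl<A: Copy, B: Copy> ArgPointers for (A, B) {
    fn as_pointers(&self) -> Vec<*const c_void> {
        vec![
            // The element types A and B are erased by these casts, so
            // nothing downstream can verify them against the kernel.
            &self.0 as *const A as *const c_void,
            &self.1 as *const B as *const c_void,
        ]
    }
}
```

The pointers are only valid while the tuple they were taken from is alive and unmoved. Passing `(42u64, 1024u32)` to a kernel that expects two `u32` values would compile equally well, which is why matching the signature is left to the caller.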
§Errors
Returns a CudaError if the launch fails
(e.g., invalid dimensions, insufficient resources, driver error).
pub fn function(&self) -> &Function
Returns a reference to the underlying Function handle.
This can be used for occupancy queries and other function-level
operations provided by oxicuda-driver.
pub fn max_active_blocks_per_sm(
    &self,
    block_size: i32,
    dynamic_smem: usize,
) -> CudaResult<i32>
Returns the maximum number of active blocks per streaming multiprocessor for a given block size and dynamic shared memory.
Delegates to Function::max_active_blocks_per_sm.
§Parameters
block_size: number of threads per block.
dynamic_smem: dynamic shared memory per block, in bytes.
§Errors
Returns a CudaError if the query fails.
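The block count returned by this query is often converted into an occupancy fraction: resident threads divided by the SM's thread capacity. The helper below is a sketch of that calculation. `occupancy` is a hypothetical name, and `max_threads_per_sm` is a device-specific limit that has to be obtained separately; this page does not say how.

```rust
/// Fraction of an SM's thread slots that are occupied when
/// `active_blocks` blocks of `block_size` threads are resident.
/// Returns `None` for non-positive sizes or a negative block count.
fn occupancy(active_blocks: i32, block_size: i32, max_threads_per_sm: i32) -> Option<f64> {
    if active_blocks < 0 || block_size <= 0 || max_threads_per_sm <= 0 {
        return None;
    }
    // Widen before multiplying so the product cannot overflow i32.
    let resident_threads = i64::from(active_blocks) * i64::from(block_size);
    Some(resident_threads as f64 / f64::from(max_threads_per_sm))
}
```

For example, 4 active blocks of 256 threads on a device with a limit of 2048 threads per SM (an illustrative value) corresponds to an occupancy of 0.5.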
pub fn optimal_block_size(&self, dynamic_smem: usize) -> CudaResult<(i32, i32)>
Returns the optimal block size for this kernel and the minimum grid size to achieve maximum occupancy.
Delegates to Function::optimal_block_size.
The result is the tuple (min_grid_size, optimal_block_size); note that the grid size comes first.
§Parameters
dynamic_smem: dynamic shared memory per block, in bytes.
§Errors
Returns a CudaError if the query fails.
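One way to turn the returned tuple into launch dimensions is sketched below. `LaunchDims` and `launch_dims` are hypothetical names introduced for illustration. The sketch assumes a one-dimensional launch with one thread per element and takes the tuple in the documented `(min_grid_size, optimal_block_size)` order.

```rust
/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
struct LaunchDims {
    grid: u32,
    block: u32,
    /// True if the grid is at least as large as the minimum grid size
    /// reported for maximum occupancy.
    fills_device: bool,
}

/// Derives launch dimensions for `n` elements from an occupancy result
/// given as `(min_grid_size, optimal_block_size)`.
/// Returns `None` if the reported block size is not positive.
fn launch_dims(occupancy: (i32, i32), n: u32) -> Option<LaunchDims> {
    let (min_grid_size, block_size) = occupancy; // grid size comes first
    if block_size <= 0 {
        return None;
    }
    let block = block_size as u32; // safe: checked positive above
    let grid = n / block + u32::from(n % block != 0); // ceiling division
    Some(LaunchDims {
        grid,
        block,
        fills_device: i64::from(grid) >= i64::from(min_grid_size),
    })
}
```

When `fills_device` is false, the problem is too small to reach the occupancy the query reports, which can be a useful hint that a smaller block size or batching more work per launch is worth considering.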