oxicuda_launch::kernel

Struct Kernel

pub struct Kernel { /* private fields */ }

A launchable GPU kernel with module lifetime management.

Holds an Arc<Module> to ensure the PTX module remains loaded as long as any Kernel references it. This is important because Function handles become invalid once their parent module is unloaded.
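
The ownership pattern described above can be modelled without a GPU. The following is a minimal sketch using hypothetical stand-in types (`MockModule`, `MockKernel`), not the real oxicuda types; it only illustrates how a shared `Arc` keeps the module alive until the last kernel that references it is dropped.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a loaded PTX module (not the real `Module`).
struct MockModule {
    /// Set to `true` when the module is dropped, i.e. "unloaded".
    unloaded: Arc<AtomicBool>,
}

impl Drop for MockModule {
    fn drop(&mut self) {
        self.unloaded.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for `Kernel`: it owns one share of the module.
struct MockKernel {
    _module: Arc<MockModule>,
}

impl MockKernel {
    fn from_module(module: Arc<MockModule>) -> Self {
        MockKernel { _module: module }
    }
}

/// Returns whether the module had been unloaded (before, after) the last
/// kernel referencing it was dropped.
fn unload_order() -> (bool, bool) {
    let unloaded = Arc::new(AtomicBool::new(false));
    let module = Arc::new(MockModule { unloaded: Arc::clone(&unloaded) });

    let kernel = MockKernel::from_module(Arc::clone(&module));
    drop(module); // the caller's handle goes away; the kernel's share remains

    let before = unloaded.load(Ordering::SeqCst);
    drop(kernel); // last owner gone: the module is dropped here
    let after = unloaded.load(Ordering::SeqCst);
    (before, after)
}
```

In this model `unload_order()` reports that the module is still loaded while the kernel exists and is unloaded only after the kernel is dropped.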

§Creating a kernel

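// `ptx` holds the PTX source text, obtained elsewhere.
let module = Arc::new(Module::from_ptx(ptx)?);
// The kernel keeps its own reference to the module, so the module stays loaded.
let kernel = Kernel::from_module(module, "my_kernel")?;
println!("loaded kernel: {}", kernel.name());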

§Launching

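// `ctx` and `kernel` are assumed to exist already.
let stream = Stream::new(&ctx)?;
let params = LaunchParams::new(4u32, 256u32);
// Asynchronous: synchronize `stream` before relying on the kernel's results.
kernel.launch(&params, &stream, &(42u32, 1024u32))?;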

Implementations§

impl Kernel

pub fn from_module(module: Arc<Module>, name: &str) -> CudaResult<Self>

Creates a new Kernel from a module and function name.

Looks up the named function in the module. The Arc<Module> ensures the module is not unloaded while this kernel exists.

§Errors

Returns CudaError::NotFound if no function with the given name exists in the module, or another CudaError on driver failure.
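
Because a missing name is reported separately from driver failures, a caller can treat the two cases differently. The sketch below shows one way to do that, assuming hypothetical mock types (`MockCudaError`, `MockModule`) and a hypothetical helper `first_available`; none of these are part of oxicuda.

```rust
use std::collections::HashSet;

/// Hypothetical error type mirroring the two cases named above.
#[derive(Debug, PartialEq)]
enum MockCudaError {
    NotFound,
    Driver(i32),
}

/// Hypothetical module that only knows which function names it exports.
struct MockModule {
    functions: HashSet<&'static str>,
    /// When set, every lookup fails with this simulated driver error code.
    driver_failure: Option<i32>,
}

impl MockModule {
    fn get_function(&self, name: &str) -> Result<String, MockCudaError> {
        if let Some(code) = self.driver_failure {
            return Err(MockCudaError::Driver(code));
        }
        if self.functions.contains(name) {
            Ok(name.to_string())
        } else {
            Err(MockCudaError::NotFound)
        }
    }
}

/// Tries each candidate name in order, skipping `NotFound` but
/// propagating any other error immediately.
fn first_available(
    module: &MockModule,
    candidates: &[&str],
) -> Result<String, MockCudaError> {
    for name in candidates {
        match module.get_function(name) {
            Ok(found) => return Ok(found),
            Err(MockCudaError::NotFound) => continue,
            Err(other) => return Err(other),
        }
    }
    Err(MockCudaError::NotFound)
}
```

The design choice here is that only `NotFound` is treated as recoverable; any other error stops the search, since retrying with a different name would not fix a driver failure.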

pub fn launch<A: KernelArgs>( &self, params: &LaunchParams, stream: &Stream, args: &A, ) -> CudaResult<()>

Launches the kernel with the given parameters and arguments on a stream.

This is the primary entry point for kernel execution. It calls cuLaunchKernel with the specified grid/block dimensions, shared memory, stream, and kernel arguments.

The launch is asynchronous: the call returns immediately while the kernel executes on the GPU. Use Stream::synchronize to wait for completion.
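
The launch-then-synchronize pattern can be illustrated with a CPU-side analogy. This sketch is not CUDA code: `MockStream` is a hypothetical type that runs queued closures in order on a worker thread, which is enough to show why results should be read only after synchronizing.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a stream: jobs run in submission order.
struct MockStream {
    tx: Option<Sender<Job>>,
    worker: Option<JoinHandle<()>>,
}

impl MockStream {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let worker = thread::spawn(move || {
            // Run jobs until the sending side is dropped.
            for job in rx {
                job();
            }
        });
        MockStream { tx: Some(tx), worker: Some(worker) }
    }

    /// Enqueues work and returns immediately, like an asynchronous launch.
    fn launch(&self, job: impl FnOnce() + Send + 'static) {
        let tx = self.tx.as_ref().expect("stream is open");
        tx.send(Box::new(job)).expect("worker is alive");
    }

    /// Blocks until every job enqueued so far has finished.
    fn synchronize(&self) {
        let (done_tx, done_rx) = mpsc::channel();
        self.launch(move || {
            let _ = done_tx.send(());
        });
        done_rx.recv().expect("worker is alive");
    }
}

impl Drop for MockStream {
    fn drop(&mut self) {
        drop(self.tx.take()); // closing the channel ends the worker loop
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Launches one slow job, waits for it, then reads its result.
fn run_and_wait() -> u32 {
    let result = Arc::new(AtomicU32::new(0));
    let stream = MockStream::new();
    let out = Arc::clone(&result);
    stream.launch(move || {
        thread::sleep(Duration::from_millis(20));
        out.store(42, Ordering::SeqCst);
    });
    // The job may still be running here; `result` could still read 0.
    stream.synchronize();
    result.load(Ordering::SeqCst)
}
```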

§Type safety

The args parameter accepts any type implementing KernelArgs, including tuples of up to 24 Copy elements. Argument types are not checked against the kernel's signature, so the caller must ensure they match.
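
To see why the types must match, recall that cuLaunchKernel receives its arguments as an array of untyped pointers, one per parameter. The sketch below models that with a hypothetical trait `MockKernelArgs` implemented for two-element tuples; the real KernelArgs trait's methods are not shown on this page, so the method name `param_ptrs` is an assumption made for illustration.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for `KernelArgs`.
trait MockKernelArgs {
    /// One untyped pointer per argument, in declaration order.
    fn param_ptrs(&self) -> Vec<*mut c_void>;
}

impl<A: Copy, B: Copy> MockKernelArgs for (A, B) {
    fn param_ptrs(&self) -> Vec<*mut c_void> {
        vec![
            &self.0 as *const A as *mut c_void,
            &self.1 as *const B as *mut c_void,
        ]
    }
}

/// Reads the arguments back the way a kernel declared with two `u32`
/// parameters would interpret them.
fn read_back_as_u32_pair(args: &(u32, u32)) -> (u32, u32) {
    let ptrs = args.param_ptrs();
    // SAFETY: both pointers come from `args`, which is alive for the whole
    // call and really does hold two `u32` values.
    unsafe { (*(ptrs[0] as *const u32), *(ptrs[1] as *const u32)) }
}

/// Any pair of `Copy` values is accepted; element types are erased.
fn arg_count<T: MockKernelArgs>(args: &T) -> usize {
    args.param_ptrs().len()
}
```

Note that `arg_count(&(1.5f32, 7u64))` compiles just as readily as a `(u32, u32)` tuple: once the pointers are type-erased, nothing in this model ties them to what the kernel expects.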

§Errors

Returns a CudaError if the launch fails (e.g., invalid dimensions, insufficient resources, driver error).

pub fn name(&self) -> &str

Returns the kernel function name.

pub fn function(&self) -> &Function

Returns a reference to the underlying Function handle.

This can be used for occupancy queries and other function-level operations provided by oxicuda-driver.

pub fn max_active_blocks_per_sm( &self, block_size: i32, dynamic_smem: usize, ) -> CudaResult<i32>

Returns the maximum number of active blocks per streaming multiprocessor for a given block size and dynamic shared memory.

Delegates to Function::max_active_blocks_per_sm.

§Parameters

block_size — number of threads per block.
dynamic_smem — dynamic shared memory per block in bytes.

§Errors

Returns a CudaError if the query fails.
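
The block count returned by this query is often turned into an occupancy fraction. A minimal sketch, assuming a hypothetical helper `occupancy` and a caller-supplied `max_threads_per_sm` (a device property that this page does not cover); it counts threads rather than warps, which gives the same ratio when the block size is a multiple of the warp size.

```rust
/// Fraction of an SM's thread slots that are occupied, in `0.0..=1.0`.
///
/// `active_blocks` is the value returned by the occupancy query.
/// `max_threads_per_sm` is an assumption of this sketch: obtain it from
/// the device rather than hard-coding it.
fn occupancy(active_blocks: i32, block_size: i32, max_threads_per_sm: i32) -> Option<f64> {
    if active_blocks < 0 || block_size <= 0 || max_threads_per_sm <= 0 {
        return None; // reject nonsensical inputs instead of dividing by zero
    }
    let resident_threads = i64::from(active_blocks) * i64::from(block_size);
    let fraction = resident_threads as f64 / f64::from(max_threads_per_sm);
    Some(fraction.min(1.0))
}
```

For example, 4 active blocks of 256 threads on a device with 2048 thread slots per SM gives `occupancy(4, 256, 2048) == Some(0.5)`.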

pub fn optimal_block_size(&self, dynamic_smem: usize) -> CudaResult<(i32, i32)>

Returns the optimal block size for this kernel and the minimum grid size to achieve maximum occupancy.

Delegates to Function::optimal_block_size.

The result is a tuple in the order (min_grid_size, optimal_block_size); note that the minimum grid size comes first.

§Parameters

dynamic_smem — dynamic shared memory per block in bytes.

§Errors

Returns a CudaError if the query fails.
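
One way to turn the returned tuple into launch dimensions is sketched below. The helper `shape_for` and the struct `LaunchShape` are hypothetical; the sketch assumes a one-dimensional problem of `n` elements with one thread per element, and converts the `i32` results to the `u32` values used in the launching example.

```rust
use std::convert::TryFrom;

/// Hypothetical launch shape derived from an occupancy suggestion.
#[derive(Debug, PartialEq)]
struct LaunchShape {
    grid: u32,
    block: u32,
    /// True when the grid is at least the suggested minimum grid size.
    fills_device: bool,
}

/// `suggestion` is `(min_grid_size, optimal_block_size)`, in that order.
fn shape_for(suggestion: (i32, i32), n: u32) -> Option<LaunchShape> {
    let (min_grid_size, block_size) = suggestion;
    if n == 0 {
        return None; // nothing to launch
    }
    // Reject negative or zero values before using them as dimensions.
    let block = u32::try_from(block_size).ok().filter(|&b| b > 0)?;
    let min_grid = u32::try_from(min_grid_size).ok()?;
    // Ceiling division: enough blocks to cover all `n` elements.
    let grid = n / block + u32::from(n % block != 0);
    Some(LaunchShape { grid, block, fills_device: grid >= min_grid })
}
```

With a suggestion of `(32, 256)` and 1024 elements, this yields a grid of 4 blocks of 256 threads and reports `fills_device: false`, since 4 blocks is below the suggested minimum of 32.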

Trait Implementations§

impl Debug for Kernel

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Kernel

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl Freeze for Kernel

impl RefUnwindSafe for Kernel

impl Send for Kernel

impl Sync for Kernel

impl Unpin for Kernel

impl UnsafeUnpin for Kernel

impl UnwindSafe for Kernel

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.