Struct Kernel 

pub struct Kernel { /* private fields */ }

A launchable GPU kernel with module lifetime management.

Holds an Arc<Module> to ensure the PTX module remains loaded as long as any Kernel references it. This is important because Function handles become invalid once their parent module is unloaded.
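The lifetime guarantee described above can be sketched with stand-in types. FakeModule and FakeKernel below are hypothetical and only mimic the ownership pattern; they are not the crate's Module or Kernel, and the "unload" is simulated by a flag set in Drop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for a loaded PTX module (not the real `Module`).
struct FakeModule {
    unloaded: Arc<AtomicBool>,
}

impl Drop for FakeModule {
    // A real module would be unloaded from the driver here.
    fn drop(&mut self) {
        self.unloaded.store(true, Ordering::SeqCst);
    }
}

// Hypothetical stand-in for `Kernel`: it owns one reference to the module.
struct FakeKernel {
    _module: Arc<FakeModule>,
}

// Returns whether the module was unloaded (a) while a kernel still held it
// and (b) after that kernel was dropped.
fn unload_observations() -> (bool, bool) {
    let unloaded = Arc::new(AtomicBool::new(false));
    let module = Arc::new(FakeModule { unloaded: Arc::clone(&unloaded) });
    let kernel = FakeKernel { _module: Arc::clone(&module) };

    drop(module); // the caller's handle goes away, the kernel keeps its own
    let while_kernel_alive = unloaded.load(Ordering::SeqCst);

    drop(kernel); // last reference released, so the module is dropped
    let after_kernel_dropped = unloaded.load(Ordering::SeqCst);

    (while_kernel_alive, after_kernel_dropped)
}
```

In this sketch the module survives the caller dropping its own handle and is released only once the last kernel referencing it is gone.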

§Creating a kernel

let module = Arc::new(Module::from_ptx(ptx)?);
let kernel = Kernel::from_module(module, "my_kernel")?;
println!("loaded kernel: {}", kernel.name());

§Launching

let stream = Stream::new(&ctx)?;
let params = LaunchParams::new(4u32, 256u32);
kernel.launch(&params, &stream, &(42u32, 1024u32))?;

Implementations§

impl Kernel

pub fn from_module(module: Arc<Module>, name: &str) -> CudaResult<Self>

Creates a new Kernel from a module and function name.

Looks up the named function in the module. The Arc<Module> ensures the module is not unloaded while this kernel exists.

§Errors

Returns CudaError::NotFound if no function with the given name exists in the module, or another CudaError on driver failure.

pub fn launch<A: KernelArgs>(&self, params: &LaunchParams, stream: &Stream, args: &A) -> CudaResult<()>

Launches the kernel with the given parameters and arguments on a stream.

This is the primary entry point for kernel execution. It calls cuLaunchKernel with the specified grid/block dimensions, shared memory, stream, and kernel arguments.

The launch is asynchronous: the call returns immediately, and the kernel executes on the GPU independently of the calling thread. Use Stream::synchronize to wait for completion.
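For a one-dimensional problem, the grid dimension passed to a launch is commonly derived from the element count by rounding up. The helper below is a hypothetical sketch, not part of this crate, and it makes no assumption about how LaunchParams interprets its arguments.

```rust
/// Number of blocks needed so that `blocks * block_size >= n`.
/// Hypothetical helper for illustration; not provided by the crate.
fn grid_size_for(n: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block size must be non-zero");
    // Ceiling division: a partially filled last block still needs launching.
    n.div_ceil(block_size)
}
```

With 1024 elements and 256 threads per block this gives 4 blocks; 1025 elements would need 5, and the kernel itself must then bounds-check the extra threads.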

§Type safety

The args parameter accepts any type implementing KernelArgs, including tuples of Copy types up to 24 elements. Nothing in this signature ties A to the kernel's parameter list, so the caller is responsible for ensuring that the argument count, order, and types match the kernel signature.
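To see why the caller carries that responsibility, consider how a tuple can be turned into the array of argument pointers that cuLaunchKernel expects. The ArgPointers trait below is a hypothetical stand-in written for this sketch; the real KernelArgs trait may be shaped differently.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for a `KernelArgs`-like trait.
trait ArgPointers {
    /// One type-erased pointer per kernel parameter, in tuple order.
    fn arg_pointers(&self) -> Vec<*mut c_void>;
}

impl<A: Copy, B: Copy> ArgPointers for (A, B) {
    fn arg_pointers(&self) -> Vec<*mut c_void> {
        // The element types are erased here, so nothing can later check
        // them against the parameter types the kernel was compiled with.
        vec![
            &self.0 as *const A as *mut c_void,
            &self.1 as *const B as *mut c_void,
        ]
    }
}
```

Once the pointers are type-erased, passing (42u32, 1024u32) to a kernel that expects, say, a float and a pointer would compile without complaint, which is exactly the mismatch the caller has to rule out.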

§Errors

Returns a CudaError if the launch fails (e.g., invalid dimensions, insufficient resources, driver error).

pub fn name(&self) -> &str

Returns the kernel function name.

pub fn function(&self) -> &Function

Returns a reference to the underlying Function handle.

This can be used for occupancy queries and other function-level operations provided by oxicuda-driver.

pub fn max_active_blocks_per_sm(&self, block_size: i32, dynamic_smem: usize) -> CudaResult<i32>

Returns the maximum number of active blocks per streaming multiprocessor for a given block size and dynamic shared memory.

Delegates to Function::max_active_blocks_per_sm.

§Parameters

  • block_size: number of threads per block.
  • dynamic_smem: dynamic shared memory per block in bytes.

§Errors

Returns a CudaError if the query fails.
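The block count returned by this query is often converted into a theoretical occupancy figure. A minimal sketch, assuming the caller has obtained the device's maximum resident threads per SM separately (the value 2048 in the usage note is purely illustrative, and theoretical_occupancy is a hypothetical helper):

```rust
/// Fraction of an SM's thread capacity that is occupied when
/// `active_blocks` blocks of `block_size` threads are resident.
/// Hypothetical helper; `max_threads_per_sm` is a device attribute
/// that must be queried elsewhere.
fn theoretical_occupancy(active_blocks: i32, block_size: i32, max_threads_per_sm: i32) -> f64 {
    let resident_threads = f64::from(active_blocks) * f64::from(block_size);
    resident_threads / f64::from(max_threads_per_sm)
}
```

For example, 4 active blocks of 256 threads on a device with 2048 threads per SM correspond to an occupancy of 0.5.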

pub fn optimal_block_size(&self, dynamic_smem: usize) -> CudaResult<(i32, i32)>

Returns the optimal block size for this kernel and the minimum grid size to achieve maximum occupancy.

Delegates to Function::optimal_block_size.

The returned tuple is (min_grid_size, optimal_block_size): the minimum grid size comes first and the block size second.

§Parameters

  • dynamic_smem: dynamic shared memory per block in bytes.

§Errors

Returns a CudaError if the query fails.
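Because the tuple puts the grid size before the block size, it is easy to swap the two when destructuring. The sketch below shows one way to consume the pair for a one-dimensional problem of n elements; LaunchPlan and plan_launch are hypothetical names introduced for illustration, not part of the crate.

```rust
use std::convert::TryFrom;

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
struct LaunchPlan {
    grid: u32,
    block: u32,
    /// True if the grid is at least the occupancy-derived minimum.
    fills_device: bool,
}

/// Builds a 1D launch shape from `(min_grid_size, optimal_block_size)`.
/// Returns `None` if the suggestion contains a non-positive block size
/// or a negative grid size.
fn plan_launch(n: u32, suggestion: (i32, i32)) -> Option<LaunchPlan> {
    // Order matches the documented return value: grid first, block second.
    let (min_grid_size, block_size) = suggestion;
    let block = u32::try_from(block_size).ok().filter(|b| *b > 0)?;
    let min_grid = u32::try_from(min_grid_size).ok()?;

    // One thread per element, rounding up to cover the tail.
    let grid = n.div_ceil(block);
    Some(LaunchPlan { grid, block, fills_device: grid >= min_grid })
}
```

A small problem may produce a grid below min_grid_size; the launch is still valid, it simply cannot reach maximum occupancy.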

Trait Implementations§

impl Debug for Kernel

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Kernel

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.