Struct KernelFunction 

Source
pub struct KernelFunction<'a> { /* private fields */ }

Implementations§

Source§

impl KernelFunction<'_>

Source

pub const unsafe fn from_raw( handle: DeviceFunction, module: &Module, ) -> KernelFunction<'_>

Source

pub const unsafe fn launch_operation<'kernel, 'config, P>( &'kernel self, config: &'config LaunchConfig, params: P, ) -> KernelLaunchOperation<'kernel, 'config, P>

Creates a stream operation that launches this kernel.

§Safety

If this operation is recorded during stream capture, CUDA copies kernel argument values into the captured graph. For pointer arguments, only the pointer address is copied. The caller must ensure every copied pointer value remains valid for every captured graph execution that can use this operation, and mutable pointer arguments must remain exclusive for the work ordered by those graph launches.

Source

pub fn launch<'a, P>(&self, config: &LaunchConfig, params: P) -> Result<()>
where P: KernelLaunchArgs<'a>,

Invokes this kernel function on a grid of blocks. Each block contains the threads specified by LaunchConfig::block_dim.

LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block.

Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.

Launching the kernel invalidates the persistent function state set through the following deprecated APIs: sys::cuFuncSetBlockShape, sys::cuFuncSetSharedSize, sys::cuParamSetSize, sys::cuParamSeti, sys::cuParamSetf, sys::cuParamSetv.

The kernel must satisfy one of two conditions: either it was compiled with toolchain version 3.2 or later, so that it contains kernel parameter information, or it has no kernel parameters. If neither condition holds, the launch returns crate::error::Status::InvalidImage.

§Errors

Returns crate::error::Status::InvalidImage if the kernel parameter metadata requirements above are not met. Also returns an error if the module context cannot be bound, CUDA rejects the launch, or a previous asynchronous launch reports an error.
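The grid-of-blocks model above usually means rounding the problem size up to a whole number of blocks. The following is a minimal sketch of that arithmetic only. `blocks_for` is a hypothetical helper, not part of this crate, and how the result is stored in LaunchConfig is not shown here.

```rust
/// Hypothetical helper (not part of this crate): the number of blocks needed
/// so that `blocks * block_size >= element_count`.
fn blocks_for(element_count: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block size must be non-zero");
    // Ceiling division written so that it cannot overflow.
    element_count / block_size + u32::from(element_count % block_size != 0)
}
```

Because the last block may be only partially used, a kernel launched this way normally compares its global thread index against the element count before touching memory.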

Source

pub fn launch_on<'a, P>( &self, config: &LaunchConfig, params: P, stream: &Stream, ) -> Result<()>
where P: KernelLaunchArgs<'a>,

Invokes this kernel function on a grid of blocks using the given stream. Each block contains the threads specified by LaunchConfig::block_dim.

LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block.

Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.

Launching the kernel invalidates the persistent function state set through the following deprecated APIs: sys::cuFuncSetBlockShape, sys::cuFuncSetSharedSize, sys::cuParamSetSize, sys::cuParamSeti, sys::cuParamSetf, sys::cuParamSetv.

The kernel must satisfy one of two conditions: either it was compiled with toolchain version 3.2 or later, so that it contains kernel parameter information, or it has no kernel parameters. If neither condition holds, the launch returns crate::error::Status::InvalidImage.

§Errors

Returns crate::error::Status::InvalidImage if the kernel parameter metadata requirements above are not met. Also returns an error if stream belongs to a different context, the module context cannot be bound, CUDA rejects the launch, or a previous asynchronous launch reports an error.

Source

pub unsafe fn add_to_graph<'a, P>( &self, graph: &mut Graph, dependencies: &[GraphNode], config: &LaunchConfig, params: P, ) -> Result<GraphNode>
where P: KernelLaunchArgs<'a>,

Adds this kernel to graph as a kernel node.

§Safety

CUDA copies each kernel argument value during this call. Non-pointer argument values may therefore be borrowed from stack or temporary storage, provided that storage outlives this call. If an argument value is a pointer, CUDA stores only the pointer address. The caller must ensure every copied pointer value remains valid for every graph instantiation, update, and launch that can execute the created node. Mutable pointer arguments must remain exclusive for the work ordered by those launches.
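The difference between copying a value and copying a pointer address can be illustrated on the host without any CUDA calls. The sketch below is an analogy under stated assumptions: `RecordedArgs`, `record`, and `replay` are hypothetical names that stand in for a graph node and its launches, and they are not part of this crate.

```rust
/// Hypothetical stand-in for the arguments a recorded kernel node keeps.
struct RecordedArgs {
    scale: f32,     // non-pointer argument: copied by value at record time
    data: *mut f32, // pointer argument: only the address is copied
    len: usize,
}

/// Copies the scalar and stores the buffer's address, as described above.
fn record(scale: &f32, data: &mut [f32]) -> RecordedArgs {
    RecordedArgs {
        scale: *scale,
        data: data.as_mut_ptr(),
        len: data.len(),
    }
}

/// Stands in for one launch of the recorded node.
///
/// # Safety
/// The buffer behind `args.data` must still be alive, must hold at least
/// `args.len` elements, and must not be accessed by anything else meanwhile.
unsafe fn replay(args: &RecordedArgs) {
    for i in 0..args.len {
        unsafe { *args.data.add(i) *= args.scale };
    }
}
```

Changing the original scalar after `record` has no effect on later replays, whereas dropping or reallocating the buffer would leave `data` dangling. That second case is the obligation this safety section places on the caller.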

Source

pub unsafe fn set_graph_node_params<'a, P>( &self, executable: &mut ExecutableGraph, node: GraphNode, config: &LaunchConfig, params: P, ) -> Result<()>
where P: KernelLaunchArgs<'a>,

Updates this kernel’s parameters in an executable graph node.

§Safety

CUDA copies each kernel argument value during this call. Non-pointer argument values may therefore be borrowed from stack or temporary storage, provided that storage outlives this call. If an argument value is a pointer, CUDA stores only the pointer address. The caller must ensure every copied pointer value remains valid for every future launch that can execute node. Mutable pointer arguments must remain exclusive for the work ordered by those launches.

Source

pub const fn module(&self) -> &Module

Source

pub fn name(&self) -> Result<String>

Source

pub fn attribute(&self, attribute: FunctionAttribute) -> Result<i32>

Source

pub fn set_attribute( &self, attribute: FunctionAttribute, value: i32, ) -> Result<()>

Source

pub fn set_max_dynamic_shared_memory_bytes(&self, bytes: i32) -> Result<()>

Source

pub fn set_preferred_shared_memory_carveout( &self, carveout: SharedMemoryCarveout, ) -> Result<()>

Source

pub fn attributes(&self) -> Result<FunctionAttributes>

Source

pub fn occupancy_max_active_blocks_per_multiprocessor( &self, block_size: i32, dynamic_shared_memory_bytes: usize, ) -> Result<i32>

Source

pub fn occupancy_max_active_blocks_per_multiprocessor_with_flags( &self, block_size: i32, dynamic_shared_memory_bytes: usize, flags: OccupancyFlags, ) -> Result<i32>

Returns the maximum number of active blocks per streaming multiprocessor.

flags controls how special cases are handled; the valid values are those defined by OccupancyFlags.

For context-less kernels obtained via Library::kernel, this wrapper uses the current context for its calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
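The block count returned by this query is often turned into a theoretical occupancy figure. The sketch below shows that arithmetic only. `theoretical_occupancy` is a hypothetical helper, not part of this crate, and `max_threads_per_sm` is assumed to come from a separate device attribute query that is not shown.

```rust
/// Hypothetical helper (not part of this crate): the fraction of a streaming
/// multiprocessor's thread slots that would be resident at the given
/// occupancy result.
fn theoretical_occupancy(
    active_blocks_per_sm: i32,
    block_size: i32,
    max_threads_per_sm: i32,
) -> f64 {
    assert!(max_threads_per_sm > 0, "device limit must be positive");
    // Threads resident on one multiprocessor at this block count.
    let resident_threads = f64::from(active_blocks_per_sm) * f64::from(block_size);
    resident_threads / f64::from(max_threads_per_sm)
}
```

For instance, 4 active blocks of 256 threads on a multiprocessor that holds 2048 threads gives a ratio of 0.5.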

Source

pub fn occupancy_available_dynamic_shared_memory_per_block( &self, num_blocks: i32, block_size: i32, ) -> Result<usize>

Returns dynamic shared memory available per block when launching num_blocks blocks on a streaming multiprocessor.

The returned value is the maximum size of dynamic shared memory that allows num_blocks blocks per streaming multiprocessor.

For context-less kernels obtained via Library::kernel, this wrapper uses the current context for its calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.

Source

pub fn occupancy_max_potential_block_size( &self, dynamic_shared_memory_bytes: usize, block_size_limit: i32, ) -> Result<OccupancyMaxPotentialBlockSize>

Source

pub fn occupancy_max_potential_block_size_with_flags( &self, dynamic_shared_memory_bytes: usize, block_size_limit: i32, flags: OccupancyFlags, ) -> Result<OccupancyMaxPotentialBlockSize>

An extended version of sys::cuOccupancyMaxPotentialBlockSize that takes the same arguments plus flags.

flags controls how special cases are handled; the valid values are those defined by OccupancyFlags.

For context-less kernels obtained via Library::kernel, this wrapper uses the current context for its calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.

Source

pub fn occupancy_max_potential_cluster_size( &self, config: ClusterLaunchConfig, ) -> Result<i32>

Given this kernel and launch configuration, returns the maximum cluster size.

The cluster dimensions in config are ignored. If the kernel has a required cluster size set, the returned value reflects the required cluster size.

By default this returns a value that is portable on future hardware. A higher value may be returned if the kernel function allows non-portable cluster sizes.

Respects the compile-time launch bounds.

For context-less kernels obtained via Library::kernel, this wrapper uses the current context for its calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.

Source

pub fn occupancy_max_active_clusters( &self, config: ClusterLaunchConfig, ) -> Result<i32>

Given this kernel and launch configuration, returns the maximum number of clusters that could co-exist on the target device.

If the kernel already has a required cluster size set, the cluster size in config must either be unspecified or match the required size. If the kernel has no required cluster size, config must specify one; otherwise this method returns an error.

Various kernel function attributes may affect the occupancy calculation. The runtime environment may also affect how the hardware schedules clusters, so the calculated occupancy is not guaranteed to be achievable.

For context-less kernels obtained via Library::kernel, this wrapper uses the current context for its calculations.

§Errors

Returns an error if the module context cannot be bound, config does not specify a valid cluster size for this kernel, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
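The rule for combining a kernel's required cluster size with the size given in config can be written as a small decision table. The sketch below models it under simplifying assumptions: a cluster size is reduced to a single count rather than three dimensions, and `effective_cluster_size` and `ClusterSizeError` are hypothetical names, not part of this crate.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum ClusterSizeError {
    /// The kernel requires one size and the configuration asks for another.
    Mismatch { required: u32, requested: u32 },
    /// Neither the kernel nor the configuration specifies a size.
    Unspecified,
}

/// Hypothetical helper: picks the cluster size an occupancy query would use.
/// `required` is the size fixed by the kernel, if any; `requested` is the
/// size given in the launch configuration, if any.
fn effective_cluster_size(
    required: Option<u32>,
    requested: Option<u32>,
) -> Result<u32, ClusterSizeError> {
    match (required, requested) {
        // A required size with nothing requested: the required size wins.
        (Some(required), None) => Ok(required),
        // A required size and a matching request agree.
        (Some(required), Some(requested)) if required == requested => Ok(required),
        // A required size and a different request conflict.
        (Some(required), Some(requested)) => {
            Err(ClusterSizeError::Mismatch { required, requested })
        }
        // No required size: the configuration must supply one.
        (None, Some(requested)) => Ok(requested),
        (None, None) => Err(ClusterSizeError::Unspecified),
    }
}
```

The two error cases correspond to the situations in which this method is documented to reject config. The actual status values returned by CUDA are not modeled here.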

Source

pub const fn as_raw(&self) -> DeviceFunction

Trait Implementations§

Source§

impl<'a> Debug for KernelFunction<'a>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

§

impl<'a> !Send for KernelFunction<'a>

§

impl<'a> !Sync for KernelFunction<'a>

§

impl<'a> Freeze for KernelFunction<'a>

§

impl<'a> RefUnwindSafe for KernelFunction<'a>

§

impl<'a> Unpin for KernelFunction<'a>

§

impl<'a> UnsafeUnpin for KernelFunction<'a>

§

impl<'a> UnwindSafe for KernelFunction<'a>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.