pub struct KernelFunction<'a> { /* private fields */ }
Implementations
impl KernelFunction<'_>
pub const unsafe fn from_raw( handle: DeviceFunction, module: &Module, ) -> KernelFunction<'_>
pub const unsafe fn launch_operation<'kernel, 'config, P>( &'kernel self, config: &'config LaunchConfig, params: P, ) -> KernelLaunchOperation<'kernel, 'config, P>
Creates a stream operation that launches this kernel.
§Safety
If this operation is recorded during stream capture, CUDA copies kernel argument values into the captured graph. For pointer arguments, only the pointer address is copied. The caller must ensure every copied pointer value remains valid for every captured graph execution that can use this operation, and mutable pointer arguments must remain exclusive for the work ordered by those graph launches.
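Because only pointer addresses are captured, the buffers behind them have to outlive every replay of the graph. One way to encode that in Rust is to store the graph and those buffers in one owner and rely on struct fields dropping in declaration order. The sketch below is a minimal model under that assumption: `DeviceBuffer`, `CapturedGraph` and `GraphWithArgs` are hypothetical stand-ins, not types from this crate, and exclusivity of mutable pointers is not modeled.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log used to observe the order in which values are dropped.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for a device allocation whose address was captured.
struct DeviceBuffer {
    name: &'static str,
    log: Log,
}

impl Drop for DeviceBuffer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical stand-in for an executable graph that replays captured launches.
struct CapturedGraph {
    log: Log,
}

impl Drop for CapturedGraph {
    fn drop(&mut self) {
        self.log.borrow_mut().push("graph");
    }
}

// Fields drop in declaration order: the graph goes first, so no replay can
// outlive the buffers whose addresses it captured.
struct GraphWithArgs {
    _graph: CapturedGraph,
    _buffers: Vec<DeviceBuffer>,
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _owned = GraphWithArgs {
            _graph: CapturedGraph { log: Rc::clone(&log) },
            _buffers: vec![
                DeviceBuffer { name: "input", log: Rc::clone(&log) },
                DeviceBuffer { name: "output", log: Rc::clone(&log) },
            ],
        };
        // Graph launches would happen here, while the buffers are still alive.
    }
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

Placing the buffers in a field declared after the graph is a design choice; swapping the field order would silently reintroduce the hazard.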
pub fn launch<'a, P>(&self, config: &LaunchConfig, params: P) -> Result<()>
where
P: KernelLaunchArgs<'a>,
Invokes this kernel function on a grid of blocks.
Each block contains the threads specified by LaunchConfig::block_dim.
LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block.
Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.
Launching the kernel invalidates the persistent function state set through the following deprecated APIs: sys::cuFuncSetBlockShape, sys::cuFuncSetSharedSize, sys::cuParamSetSize, sys::cuParamSeti, sys::cuParamSetf, sys::cuParamSetv.
The kernel must either have been compiled with toolchain version 3.2 or later so that it contains kernel parameter information, or have no kernel parameters.
If neither condition is met, the launch returns crate::error::Status::InvalidImage.
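The precondition is a disjunction: a launch fails only when the image lacks parameter metadata and the kernel also takes parameters. The following is a hypothetical model of that rule for illustration; `has_param_metadata` and `launch_precondition_ok` are invented names, and the real check happens inside CUDA.

```rust
/// Hypothetical model: an image compiled with toolchain 3.2 or later carries
/// kernel parameter metadata. `(major, minor)` tuples compare lexicographically.
fn has_param_metadata(toolchain: (u32, u32)) -> bool {
    toolchain >= (3, 2)
}

/// The launch precondition holds if the image has parameter metadata OR the
/// kernel takes no parameters; InvalidImage corresponds to both failing.
fn launch_precondition_ok(toolchain: (u32, u32), param_count: usize) -> bool {
    has_param_metadata(toolchain) || param_count == 0
}

fn main() {
    for (toolchain, params) in [((3, 0), 0), ((3, 1), 2), ((12, 4), 3)] {
        println!(
            "{toolchain:?} with {params} params -> ok = {}",
            launch_precondition_ok(toolchain, params)
        );
    }
}
```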
§Errors
Returns crate::error::Status::InvalidImage if the kernel parameter metadata
requirements above are not met. Also returns an error if the module
context cannot be bound, CUDA rejects the launch, or a previous
asynchronous launch reports an error.
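Launching over N elements usually means picking a block size and rounding the grid up so that grid × block threads cover every element. The helpers below sketch the arithmetic behind values one might pass as LaunchConfig::block_dim and LaunchConfig::shared_memory_bytes; `blocks_for` and `tile_shared_bytes` are hypothetical helpers, and building the LaunchConfig itself is left out because its grid field and constructors are not documented here.

```rust
/// Number of blocks needed so that `blocks * threads_per_block >= n`
/// (ceiling division, written to avoid overflow near `u32::MAX`).
fn blocks_for(n: u32, threads_per_block: u32) -> u32 {
    assert!(threads_per_block > 0, "block size must be nonzero");
    n / threads_per_block + u32::from(n % threads_per_block != 0)
}

/// Dynamic shared memory for a per-block tile holding one `T` per thread.
fn tile_shared_bytes<T>(threads_per_block: u32) -> usize {
    threads_per_block as usize * std::mem::size_of::<T>()
}

fn main() {
    let n = 1000;
    let threads = 256;
    let blocks = blocks_for(n, threads);
    // 4 blocks * 256 threads = 1024 threads; the last 24 threads must skip
    // work via an `if idx < n` guard inside the kernel.
    println!(
        "grid = {blocks}, block = {threads}, shared = {} bytes",
        tile_shared_bytes::<f32>(threads)
    );
}
```

Note that `n == 0` yields a grid of zero blocks, which is not a valid launch, so callers would typically skip the launch in that case.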
pub fn launch_on<'a, P>(
&self,
config: &LaunchConfig,
params: P,
stream: &Stream,
) -> Result<()>
where
P: KernelLaunchArgs<'a>,
Invokes this kernel function on a grid of blocks using the given stream.
Each block contains the threads specified by LaunchConfig::block_dim.
LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block.
Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.
Launching the kernel invalidates the persistent function state set through the following deprecated APIs: sys::cuFuncSetBlockShape, sys::cuFuncSetSharedSize, sys::cuParamSetSize, sys::cuParamSeti, sys::cuParamSetf, sys::cuParamSetv.
The kernel must either have been compiled with toolchain version 3.2 or later so that it contains kernel parameter information, or have no kernel parameters.
If neither condition is met, the launch returns crate::error::Status::InvalidImage.
§Errors
Returns crate::error::Status::InvalidImage if the kernel parameter metadata
requirements above are not met. Also returns an error if stream belongs
to a different context, the module context cannot be bound, CUDA rejects
the launch, or a previous asynchronous launch reports an error.
pub unsafe fn add_to_graph<'a, P>(
&self,
graph: &mut Graph,
dependencies: &[GraphNode],
config: &LaunchConfig,
params: P,
) -> Result<GraphNode>
where
P: KernelLaunchArgs<'a>,
Adds this kernel to graph as a kernel node.
§Safety
CUDA copies each kernel argument value during this call, so non-pointer argument values may be borrowed from stack or temporary storage that only needs to live until this call returns. If an argument value is a pointer, CUDA stores only the pointer address. The caller must ensure every copied pointer value remains valid for every graph instantiation, update, and launch that can execute the created node. Mutable pointer arguments must remain exclusive for the work ordered by those launches.
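The distinction between copied values and copied addresses can be modeled on the host: each argument's bytes are copied into owned storage, so scalars may be temporaries, while a pointer argument contributes only its address bytes. This is a simplified model, not the crate's or the driver's implementation; `pack_args`, `to_array` and the device address value are hypothetical.

```rust
/// Copies the bytes of every argument into owned storage, the way the driver
/// copies each kernel argument value when the node is created.
fn pack_args(args: &[&[u8]]) -> Vec<Vec<u8>> {
    args.iter().map(|bytes| bytes.to_vec()).collect()
}

/// Reads a fixed-size byte array back out of a packed argument.
fn to_array<const N: usize>(bytes: &[u8]) -> [u8; N] {
    let mut buf = [0u8; N];
    buf.copy_from_slice(bytes);
    buf
}

fn packed_example() -> (u32, f32, u64) {
    let packed = {
        // Temporaries: they only need to live until pack_args returns.
        let n: u32 = 1024;
        let scale: f32 = 0.5;
        // Hypothetical device address: only the address is copied,
        // not the device memory behind it.
        let device_ptr: u64 = 0x7f00_0000_1000;
        let (a, b, c) = (n.to_ne_bytes(), scale.to_ne_bytes(), device_ptr.to_ne_bytes());
        let args: [&[u8]; 3] = [&a, &b, &c];
        pack_args(&args)
    };
    // n, scale and device_ptr are out of scope here, yet their copied values survive.
    (
        u32::from_ne_bytes(to_array(&packed[0])),
        f32::from_ne_bytes(to_array(&packed[1])),
        u64::from_ne_bytes(to_array(&packed[2])),
    )
}

fn main() {
    println!("{:?}", packed_example());
}
```

The pointer survives only as a number; whether the memory at that address is still allocated when the node runs is exactly what the Safety section leaves to the caller.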
pub unsafe fn set_graph_node_params<'a, P>(
&self,
executable: &mut ExecutableGraph,
node: GraphNode,
config: &LaunchConfig,
params: P,
) -> Result<()>
where
P: KernelLaunchArgs<'a>,
Updates this kernel’s parameters in an executable graph node.
§Safety
CUDA copies each kernel argument value during this call, so non-pointer
argument values may be borrowed from stack or temporary storage that only
needs to live until this call returns. If an argument value is a pointer,
CUDA stores only the pointer address. The caller must ensure every copied
pointer value remains valid for every future launch that can execute node.
Mutable pointer arguments must remain exclusive for the work ordered by
those launches.
pub const fn module(&self) -> &Module
pub fn name(&self) -> Result<String>
pub fn attribute(&self, attribute: FunctionAttribute) -> Result<i32>
pub fn set_attribute( &self, attribute: FunctionAttribute, value: i32, ) -> Result<()>
pub fn attributes(&self) -> Result<FunctionAttributes>
pub fn occupancy_max_active_blocks_per_multiprocessor( &self, block_size: i32, dynamic_shared_memory_bytes: usize, ) -> Result<i32>
pub fn occupancy_max_active_blocks_per_multiprocessor_with_flags( &self, block_size: i32, dynamic_shared_memory_bytes: usize, flags: OccupancyFlags, ) -> Result<i32>
Returns the maximum number of active blocks per streaming multiprocessor.
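Conceptually, blocks per SM are bounded by whichever per-SM resource runs out first. The sketch below is a deliberately simplified estimate under that assumption: it takes the minimum over thread, block, register and shared-memory limits and ignores allocation granularity, which the real occupancy calculator accounts for. `SmLimits`, `max_active_blocks` and the device numbers are hypothetical.

```rust
/// Hypothetical per-SM resource limits; real values come from the device.
struct SmLimits {
    max_threads: u32,
    max_blocks: u32,
    registers: u32,
    shared_mem_bytes: u32,
}

/// Simplified estimate: the tightest of the thread, block, register and
/// shared memory limits. `shared_per_block` is static plus dynamic bytes.
fn max_active_blocks(limits: &SmLimits, block_size: u32, regs_per_thread: u32, shared_per_block: u32) -> u32 {
    if block_size == 0 {
        return 0;
    }
    let by_threads = limits.max_threads / block_size;
    let regs_per_block = regs_per_thread * block_size;
    let by_regs = if regs_per_block == 0 { u32::MAX } else { limits.registers / regs_per_block };
    let by_smem = if shared_per_block == 0 { u32::MAX } else { limits.shared_mem_bytes / shared_per_block };
    limits.max_blocks.min(by_threads).min(by_regs).min(by_smem)
}

fn main() {
    // Hypothetical device: 2048 threads, 32 blocks, 64K registers, 100 KiB shared memory per SM.
    let limits = SmLimits { max_threads: 2048, max_blocks: 32, registers: 65536, shared_mem_bytes: 102_400 };
    let blocks = max_active_blocks(&limits, 256, 32, 16_384);
    println!("{blocks} blocks/SM ({} of {} threads)", blocks * 256, limits.max_threads);
}
```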
flags controls how special cases are handled.
The valid flags are:
- OccupancyFlags::DEFAULT, which maintains the default behavior as sys::cuOccupancyMaxActiveBlocksPerMultiprocessor;
- OccupancyFlags::DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, if caching is enabled but per-block SM resource usage would result in zero occupancy, the occupancy calculator calculates the occupancy as if caching were disabled. Setting OccupancyFlags::DISABLE_CACHING_OVERRIDE makes the occupancy calculator return 0 in such cases. More information about this feature can be found in the “Unified L1/Texture Cache” section of the Maxwell tuning guide.
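The two flags differ only in the fallback taken when caching is enabled and would yield zero occupancy. The model below encodes just that decision rule, assuming the occupancies with and without caching are already known; `CachingFlag` is a hypothetical stand-in for OccupancyFlags and `reported_occupancy` is not part of this crate.

```rust
/// Hypothetical stand-in for the two OccupancyFlags values described above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CachingFlag {
    Default,
    DisableCachingOverride,
}

/// Models only the fallback rule: `with_caching` and `without_caching` are
/// precomputed occupancies (blocks per SM) for the two cache configurations.
fn reported_occupancy(with_caching: u32, without_caching: u32, caching_enabled: bool, flag: CachingFlag) -> u32 {
    if !caching_enabled {
        return without_caching;
    }
    match flag {
        // Default: a zero result with caching falls back to the no-caching figure.
        CachingFlag::Default if with_caching == 0 => without_caching,
        // DisableCachingOverride, or a nonzero result: report the caching figure as is.
        _ => with_caching,
    }
}

fn main() {
    for flag in [CachingFlag::Default, CachingFlag::DisableCachingOverride] {
        println!("{flag:?}: {}", reported_occupancy(0, 3, true, flag));
    }
}
```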
For context-less kernels obtained through Library::kernel, this wrapper uses the current context for the calculation.
§Errors
Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
Returns the dynamic shared memory available per block when launching num_blocks blocks on a streaming multiprocessor.
The returned value is the maximum dynamic shared memory size that still allows num_blocks blocks per streaming multiprocessor.
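To see how the answer shrinks as num_blocks grows, consider a simplified model that splits the SM's shared memory evenly across the requested blocks, subtracts each block's static shared memory, and caps the result at the per-block limit. This is an approximation for intuition only. It ignores any per-block reserved shared memory and allocation granularity, and `available_dynamic_smem` and its inputs are hypothetical.

```rust
/// Simplified model: the SM's shared memory is split evenly across
/// `num_blocks` blocks, minus each block's static shared memory, and capped
/// by the per-block limit.
fn available_dynamic_smem(smem_per_sm: u32, per_block_limit: u32, static_per_block: u32, num_blocks: u32) -> u32 {
    if num_blocks == 0 {
        return 0;
    }
    let by_sm = (smem_per_sm / num_blocks).saturating_sub(static_per_block);
    let by_block = per_block_limit.saturating_sub(static_per_block);
    by_sm.min(by_block)
}

fn main() {
    // Hypothetical device: 64 KiB per SM, 48 KiB per-block limit, 1 KiB static per block.
    for num_blocks in [1, 2, 4] {
        println!("{num_blocks} blocks -> {} bytes", available_dynamic_smem(65_536, 49_152, 1024, num_blocks));
    }
}
```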
For context-less kernels obtained through Library::kernel, this wrapper uses the current context for the calculation.
§Errors
Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
pub fn occupancy_max_potential_block_size( &self, dynamic_shared_memory_bytes: usize, block_size_limit: i32, ) -> Result<OccupancyMaxPotentialBlockSize>
pub fn occupancy_max_potential_block_size_with_flags( &self, dynamic_shared_memory_bytes: usize, block_size_limit: i32, flags: OccupancyFlags, ) -> Result<OccupancyMaxPotentialBlockSize>
An extended version of sys::cuOccupancyMaxPotentialBlockSize.
In addition to arguments passed to sys::cuOccupancyMaxPotentialBlockSize, KernelFunction::occupancy_max_potential_block_size_with_flags also takes flags.
flags controls how special cases are handled.
The valid flags are:
- OccupancyFlags::DEFAULT, which maintains the default behavior as sys::cuOccupancyMaxPotentialBlockSize;
- OccupancyFlags::DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, the launch configurations that produce maximal occupancy might not support global caching. Setting OccupancyFlags::DISABLE_CACHING_OVERRIDE guarantees that the produced launch configuration is global caching compatible at a potential cost of occupancy. More information about this feature can be found in the “Unified L1/Texture Cache” section of the Maxwell tuning guide.
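The idea behind a "max potential block size" query is a search: try candidate block sizes, estimate active blocks per SM for each, and keep the size that maximizes resident threads. The minimum grid size is then the number of blocks needed to fill every SM at that occupancy. The sketch below assumes an exhaustive scan over warp multiples with a simplified thread/shared-memory limit; the driver's exact strategy is not documented here, and `Device`, `PotentialBlockSize` and both functions are hypothetical. A `block_size_limit` of 0 is treated as "no limit", matching the driver API convention.

```rust
#[derive(Debug, PartialEq)]
struct PotentialBlockSize {
    min_grid_size: u32,
    block_size: u32,
}

/// Hypothetical per-device limits used by this sketch.
struct Device {
    num_sms: u32,
    max_threads_per_block: u32,
    max_threads_per_sm: u32,
    max_blocks_per_sm: u32,
    smem_per_sm: u32,
}

const WARP_SIZE: u32 = 32;

/// Simplified active-block estimate from thread and shared memory limits only.
fn active_blocks(dev: &Device, block_size: u32, smem_per_block: u32) -> u32 {
    let by_threads = dev.max_threads_per_sm / block_size;
    let by_smem = if smem_per_block == 0 { u32::MAX } else { dev.smem_per_sm / smem_per_block };
    dev.max_blocks_per_sm.min(by_threads).min(by_smem)
}

fn max_potential_block_size(dev: &Device, smem_per_block: u32, block_size_limit: u32) -> PotentialBlockSize {
    // A limit of 0 means "no limit"; otherwise clamp to the device maximum.
    let cap = if block_size_limit == 0 {
        dev.max_threads_per_block
    } else {
        block_size_limit.min(dev.max_threads_per_block)
    };
    let mut best = PotentialBlockSize { min_grid_size: 0, block_size: 0 };
    let mut best_threads = 0;
    // Walk warp multiples from the largest downward; the strict `>` keeps
    // the larger block size when two sizes tie on resident threads.
    let mut size = cap / WARP_SIZE * WARP_SIZE;
    while size >= WARP_SIZE {
        let blocks = active_blocks(dev, size, smem_per_block);
        let threads = blocks * size;
        if threads > best_threads {
            best_threads = threads;
            best = PotentialBlockSize { min_grid_size: blocks * dev.num_sms, block_size: size };
        }
        size -= WARP_SIZE;
    }
    best
}

fn main() {
    let dev = Device { num_sms: 80, max_threads_per_block: 1024, max_threads_per_sm: 1536, max_blocks_per_sm: 32, smem_per_sm: 65_536 };
    println!("{:?}", max_potential_block_size(&dev, 0, 0));
}
```

With 1536 threads per SM, a 1024-thread block leaves a third of the SM idle, so the search settles on 768 threads per block instead.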
For context-less kernels obtained through Library::kernel, this wrapper uses the current context for the calculation.
§Errors
Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
pub fn occupancy_max_potential_cluster_size( &self, config: ClusterLaunchConfig, ) -> Result<i32>
Given this kernel and launch configuration, returns the maximum cluster size.
The cluster dimensions in config are ignored.
If the kernel has a required cluster size set, the returned value reflects the required cluster size.
By default this returns a value that is portable on future hardware. A higher value may be returned if the kernel function allows non-portable cluster sizes.
The result respects the kernel's compile-time launch bounds.
For context-less kernels obtained through Library::kernel, this wrapper uses the current context for the calculation.
§Errors
Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
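A returned maximum cluster size is typically used to choose a cluster shape: the product of the cluster dimensions (in thread blocks) must not exceed it, and CUDA cluster launches also require each grid dimension to be a multiple of the corresponding cluster dimension. The helpers below sketch that selection; `cluster_fits` and `largest_pow2_cluster_x` are hypothetical, and the `i32` result from the query is assumed to be converted to `u32` first.

```rust
/// Checks a candidate cluster shape against a maximum cluster size and the grid shape.
fn cluster_fits(grid: (u32, u32, u32), cluster: (u32, u32, u32), max_cluster_size: u32) -> bool {
    let (gx, gy, gz) = grid;
    let (cx, cy, cz) = cluster;
    if cx == 0 || cy == 0 || cz == 0 {
        return false;
    }
    // Cluster size is measured in thread blocks; widen to avoid overflow.
    let size = u64::from(cx) * u64::from(cy) * u64::from(cz);
    size <= u64::from(max_cluster_size) && gx % cx == 0 && gy % cy == 0 && gz % cz == 0
}

/// Largest power-of-two cluster width along x that fits the limit and divides the grid.
fn largest_pow2_cluster_x(grid_x: u32, max_cluster_size: u32) -> u32 {
    let mut c = 1;
    while c * 2 <= max_cluster_size && grid_x % (c * 2) == 0 {
        c *= 2;
    }
    c
}

fn main() {
    // Pretend the occupancy query returned 8; clamp negatives before converting.
    let max_from_query: i32 = 8;
    let max = max_from_query.max(0) as u32;
    let cx = largest_pow2_cluster_x(64, max);
    println!("cluster width {cx}, fits = {}", cluster_fits((64, 1, 1), (cx, 1, 1), max));
}
```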
pub fn occupancy_max_active_clusters( &self, config: ClusterLaunchConfig, ) -> Result<i32>
Given this kernel and launch configuration, returns the maximum number of clusters that could co-exist on the target device.
If the kernel already has a required cluster size set, the cluster size from config must either be unspecified or match the required size.
Without required sizes, the cluster size must be specified in config; otherwise this method returns an error.
Various kernel function attributes may affect the occupancy calculation. The runtime environment may affect how the hardware schedules the clusters, so the calculated occupancy is not guaranteed to be achievable.
For context-less kernels obtained through Library::kernel, this wrapper uses the current context for the calculation.
§Errors
Returns an error if the module context cannot be bound, config does
not specify a valid cluster size for this kernel, CUDA rejects the
occupancy query, or a previous asynchronous launch reports an error.
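The cluster-size rules above reduce to a small decision table, and the returned count can be turned into a rough number of scheduling waves for a grid. The sketch below models both under stated assumptions. `check_cluster_size` and `estimated_waves` are hypothetical helpers, not crate APIs. Because the calculated occupancy is not guaranteed to be achievable, the wave count is only an optimistic estimate.

```rust
/// Mirrors the documented rules: a required size must match (or the request
/// may be unspecified); without a required size, the request must be given.
fn check_cluster_size(required: Option<u32>, requested: Option<u32>) -> Result<u32, String> {
    match (required, requested) {
        (Some(r), None) => Ok(r),
        (Some(r), Some(q)) if q == r => Ok(r),
        (Some(r), Some(q)) => Err(format!("requested {q} but the kernel requires {r}")),
        (None, Some(q)) => Ok(q),
        (None, None) => Err("cluster size must be specified".to_string()),
    }
}

/// Ceiling of total_clusters / max_active_clusters; None if nothing can be resident.
fn estimated_waves(total_clusters: u32, max_active_clusters: u32) -> Option<u32> {
    if max_active_clusters == 0 {
        return None;
    }
    Some(total_clusters / max_active_clusters + u32::from(total_clusters % max_active_clusters != 0))
}

fn main() {
    println!("{:?}", check_cluster_size(Some(4), Some(2)));
    println!("{:?}", estimated_waves(100, 32));
}
```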