Struct KernelFunction

pub struct KernelFunction<'a> { /* private fields */ }

Implementations§

impl KernelFunction<'_>

pub const unsafe fn from_raw( handle: DeviceFunction, module: &Module, ) -> KernelFunction<'_>

pub const unsafe fn launch_operation<'kernel, 'config, P>( &'kernel self, config: &'config LaunchConfig, params: P, ) -> KernelLaunchOperation<'kernel, 'config, P>

Creates a stream operation that launches this kernel.

§Safety

If this operation is recorded during stream capture, CUDA copies kernel argument values into the captured graph. For pointer arguments, only the pointer address is copied. The caller must ensure every copied pointer value remains valid for every captured graph execution that can use this operation, and mutable pointer arguments must remain exclusive for the work ordered by those graph launches.

pub fn launch<'a, P>(&self, config: &LaunchConfig, params: P) -> Result<()>
where P: KernelLaunchArgs<'a>,

Invokes this kernel function on a grid of blocks. Each block contains the threads specified by LaunchConfig::block_dim.

LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block.

Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.

Launching the kernel invalidates the persistent function state set through the following deprecated APIs: sys::cuFuncSetBlockShape, sys::cuFuncSetSharedSize, sys::cuParamSetSize, sys::cuParamSeti, sys::cuParamSetf, sys::cuParamSetv.

The kernel must either have been compiled with toolchain version 3.2 or later so that it contains kernel parameter information, or have no kernel parameters. If neither condition is met, the launch returns crate::error::Status::InvalidImage.

§Errors

Returns crate::error::Status::InvalidImage if the kernel parameter metadata requirements above are not met. Also returns an error if the module context cannot be bound, CUDA rejects the launch, or a previous asynchronous launch reports an error.
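Because the kernel runs on a grid of blocks, each containing the threads given by LaunchConfig::block_dim, callers usually need enough blocks to cover the whole problem. The following is a minimal sketch of that arithmetic for a one-dimensional problem. The helper name blocks_for is hypothetical and is not part of this crate; how the resulting count is stored in LaunchConfig is not shown here.

```rust
/// Number of blocks needed so that `blocks * block_dim >= n`.
///
/// Hypothetical helper for illustration only; it is not part of the
/// documented API.
fn blocks_for(n: u32, block_dim: u32) -> u32 {
    assert!(block_dim > 0, "block_dim must be non-zero");
    // Ceiling division: a partially filled last block still counts.
    n.div_ceil(block_dim)
}
```

With this rounding, the last block may contain threads whose index is past the end of the data, so the kernel itself typically needs a bounds check.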

pub fn launch_on<'a, P>( &self, config: &LaunchConfig, params: P, stream: &Stream, ) -> Result<()>
where P: KernelLaunchArgs<'a>,

Invokes this kernel function on a grid of blocks using the given stream. Each block contains the threads specified by LaunchConfig::block_dim.

LaunchConfig::shared_memory_bytes sets the amount of dynamic shared memory available to each thread block.

Kernel parameters are passed with KernelParameters or tuples of shared or mutable references.

Launching the kernel invalidates the persistent function state set through the following deprecated APIs: sys::cuFuncSetBlockShape, sys::cuFuncSetSharedSize, sys::cuParamSetSize, sys::cuParamSeti, sys::cuParamSetf, sys::cuParamSetv.

The kernel must either have been compiled with toolchain version 3.2 or later so that it contains kernel parameter information, or have no kernel parameters. If neither condition is met, the launch returns crate::error::Status::InvalidImage.

§Errors

Returns crate::error::Status::InvalidImage if the kernel parameter metadata requirements above are not met. Also returns an error if stream belongs to a different context, the module context cannot be bound, CUDA rejects the launch, or a previous asynchronous launch reports an error.
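LaunchConfig::shared_memory_bytes is a byte count per thread block, so it is usually derived from an element count and an element type. The sketch below shows one way to compute it with overflow checking. The helper dynamic_shared_bytes is hypothetical, and the integer type that LaunchConfig actually expects is an assumption that may require a conversion.

```rust
use std::mem::size_of;

/// Bytes of dynamic shared memory needed for `elems_per_block` values of `T`.
///
/// Returns `None` if the multiplication overflows `usize`.
/// Hypothetical helper for illustration only.
fn dynamic_shared_bytes<T>(elems_per_block: usize) -> Option<usize> {
    elems_per_block.checked_mul(size_of::<T>())
}
```

For example, a block that stages 256 f32 values in shared memory needs 1024 bytes.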

pub unsafe fn add_to_graph<'a, P>( &self, graph: &mut Graph, dependencies: &[GraphNode], config: &LaunchConfig, params: P, ) -> Result<GraphNode>
where P: KernelLaunchArgs<'a>,

Adds this kernel to graph as a kernel node.

§Safety

CUDA copies each kernel argument value during this call. Non-pointer argument values may be borrowed from stack or temporary storage that outlives this call. If an argument value is a pointer, CUDA stores only the pointer address. The caller must ensure every copied pointer value remains valid for every graph instantiation, update, and launch that can execute the created node. Mutable pointer arguments must remain exclusive for the work ordered by those launches.
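The difference between copied values and copied pointer addresses can be modeled in plain Rust. The sketch below does not use CUDA or this crate: CapturedArgs, capture, and replay_sum are hypothetical stand-ins for the argument snapshot a kernel node keeps and for a later graph launch that reads through the stored address.

```rust
/// Hypothetical stand-in for the argument snapshot stored in a graph node.
#[derive(Clone, Copy, Debug)]
struct CapturedArgs {
    /// Non-pointer argument: the value itself is copied.
    len: u32,
    /// Pointer argument: only the address is copied, not the pointee.
    data: *const f32,
}

/// Models the copy made while the node is created. The borrows of `len`
/// and `data` end when this function returns.
fn capture(len: &u32, data: &[f32]) -> CapturedArgs {
    CapturedArgs { len: *len, data: data.as_ptr() }
}

/// Models a later launch that dereferences the stored address.
///
/// # Safety
/// `args.data` must still point at `args.len` live, initialized `f32`
/// values that are not being mutated elsewhere.
unsafe fn replay_sum(args: CapturedArgs) -> f32 {
    let values = unsafe { std::slice::from_raw_parts(args.data, args.len as usize) };
    values.iter().sum()
}
```

In this model, overwriting the original len after capture has no effect on the snapshot, while dropping the buffer before replay_sum runs would leave a dangling address. That second case is the obligation the safety contract above places on the caller.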

pub unsafe fn set_graph_node_params<'a, P>( &self, executable: &mut ExecutableGraph, node: GraphNode, config: &LaunchConfig, params: P, ) -> Result<()>
where P: KernelLaunchArgs<'a>,

Updates this kernel’s parameters in an executable graph node.

§Safety

CUDA copies each kernel argument value during this call. Non-pointer argument values may be borrowed from stack or temporary storage that outlives this call. If an argument value is a pointer, CUDA stores only the pointer address. The caller must ensure every copied pointer value remains valid for every future launch that can execute node. Mutable pointer arguments must remain exclusive for the work ordered by those launches.

pub fn set_attribute( &self, attribute: FunctionAttribute, value: i32, ) -> Result<()>

pub fn set_max_dynamic_shared_memory_bytes(&self, bytes: i32) -> Result<()>

pub fn set_preferred_shared_memory_carveout( &self, carveout: SharedMemoryCarveout, ) -> Result<()>

pub fn attributes(&self) -> Result<FunctionAttributes>

pub fn occupancy_max_active_blocks_per_multiprocessor( &self, block_size: i32, dynamic_shared_memory_bytes: usize, ) -> Result<i32>

pub fn occupancy_max_active_blocks_per_multiprocessor_with_flags( &self, block_size: i32, dynamic_shared_memory_bytes: usize, flags: OccupancyFlags, ) -> Result<i32>

Returns the maximum number of active blocks per streaming multiprocessor.

flags controls how special cases are handled. The valid flags are:

OccupancyFlags::DEFAULT, which keeps the same behavior as sys::cuOccupancyMaxActiveBlocksPerMultiprocessor;
OccupancyFlags::DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, if caching is enabled but per-block SM resource usage would result in zero occupancy, the occupancy calculator computes the occupancy as if caching were disabled. Setting OccupancyFlags::DISABLE_CACHING_OVERRIDE makes the occupancy calculator return 0 in such cases. For more information, see the “Unified L1/Texture Cache” section of the Maxwell tuning guide.

For context-less kernels queried via Library::kernel, this wrapper uses the current context for calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
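The returned block count is often converted into an occupancy ratio: resident threads divided by the maximum number of threads a streaming multiprocessor can hold. The sketch below shows that conversion. The function occupancy_fraction is hypothetical, and max_threads_per_sm is a device limit the caller must obtain separately; the value 2048 used in the example afterwards is only an assumed figure.

```rust
/// Fraction of an SM's thread capacity that is occupied when
/// `active_blocks_per_sm` blocks of `block_size` threads are resident.
///
/// Hypothetical helper for illustration only.
fn occupancy_fraction(
    active_blocks_per_sm: i32,
    block_size: i32,
    max_threads_per_sm: i32,
) -> f64 {
    // Widen before multiplying so large inputs cannot overflow i32.
    let resident_threads = i64::from(active_blocks_per_sm) * i64::from(block_size);
    resident_threads as f64 / f64::from(max_threads_per_sm)
}
```

For instance, 4 active blocks of 256 threads on an SM assumed to hold 2048 threads gives a ratio of 0.5.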

pub fn occupancy_available_dynamic_shared_memory_per_block( &self, num_blocks: i32, block_size: i32, ) -> Result<usize>

Returns the dynamic shared memory available per block when launching num_blocks blocks on a streaming multiprocessor.

The returned value is the maximum size of dynamic shared memory that allows num_blocks blocks per streaming multiprocessor.

For context-less kernels queried via Library::kernel, this wrapper uses the current context for calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.

pub fn occupancy_max_potential_block_size( &self, dynamic_shared_memory_bytes: usize, block_size_limit: i32, ) -> Result<OccupancyMaxPotentialBlockSize>

pub fn occupancy_max_potential_block_size_with_flags( &self, dynamic_shared_memory_bytes: usize, block_size_limit: i32, flags: OccupancyFlags, ) -> Result<OccupancyMaxPotentialBlockSize>

An extended version of sys::cuOccupancyMaxPotentialBlockSize. In addition to arguments passed to sys::cuOccupancyMaxPotentialBlockSize, KernelFunction::occupancy_max_potential_block_size_with_flags also takes flags.

flags controls how special cases are handled. The valid flags are:

OccupancyFlags::DEFAULT, which keeps the same behavior as sys::cuOccupancyMaxPotentialBlockSize;
OccupancyFlags::DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, the launch configurations that produce maximal occupancy might not support global caching. Setting OccupancyFlags::DISABLE_CACHING_OVERRIDE guarantees that the produced launch configuration is compatible with global caching, at a potential cost in occupancy. For more information, see the “Unified L1/Texture Cache” section of the Maxwell tuning guide.

For context-less kernels queried via Library::kernel, this wrapper uses the current context for calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.
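To give a feel for what a "maximum potential block size" query optimizes, here is a sketch of a brute-force search that tries warp-aligned block sizes and keeps the one with the most resident threads per multiprocessor. It is an illustration under stated assumptions, not a description of what the driver does: BlockSizeChoice and search_block_size are hypothetical names, the warp size of 32 is assumed, and the max_active_blocks closure stands in for a per-block-size occupancy query.

```rust
/// Hypothetical result type; the field names are illustrative and are not
/// taken from OccupancyMaxPotentialBlockSize.
#[derive(Debug, PartialEq)]
struct BlockSizeChoice {
    block_size: i32,
    min_grid_size: i32,
}

/// Tries block sizes from `block_size_limit` downward in warp-sized steps
/// and keeps the first one that maximizes resident threads per SM.
fn search_block_size(
    block_size_limit: i32,
    sm_count: i32,
    max_active_blocks: impl Fn(i32) -> i32,
) -> BlockSizeChoice {
    const WARP_SIZE: i32 = 32; // assumed granularity
    let mut best = BlockSizeChoice { block_size: 0, min_grid_size: 0 };
    let mut best_threads = 0;
    // Start at the largest warp-aligned size that fits under the limit.
    let mut candidate = (block_size_limit / WARP_SIZE) * WARP_SIZE;
    while candidate > 0 {
        let blocks = max_active_blocks(candidate);
        let threads = blocks * candidate;
        if threads > best_threads {
            best_threads = threads;
            best = BlockSizeChoice {
                block_size: candidate,
                // Enough blocks to fill every SM at this occupancy.
                min_grid_size: blocks * sm_count,
            };
        }
        candidate -= WARP_SIZE;
    }
    best
}
```

Because the comparison is strict, ties go to the larger block size. A toy occupancy model such as `|bs| (1536 / bs).min(16)` can be passed as the closure to experiment with the search.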

pub fn occupancy_max_potential_cluster_size( &self, config: ClusterLaunchConfig, ) -> Result<i32>

Given this kernel and launch configuration, returns the maximum cluster size.

The cluster dimensions in config are ignored. If the kernel has a required cluster size set, the returned value reflects the required cluster size.

By default this returns a value that is portable on future hardware. A higher value may be returned if the kernel function allows non-portable cluster sizes.

Respects the compile-time launch bounds.

For context-less kernels queried via Library::kernel, this wrapper uses the current context for calculations.

§Errors

Returns an error if the module context cannot be bound, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.

pub fn occupancy_max_active_clusters( &self, config: ClusterLaunchConfig, ) -> Result<i32>

Given this kernel and launch configuration, returns the maximum number of clusters that could co-exist on the target device.

If the kernel already has a required cluster size set, the cluster size from config must either be unspecified or match the required size. Without required sizes, the cluster size must be specified in config; otherwise this method returns an error.

Various kernel function attributes may affect the occupancy calculation. The runtime environment may also affect how the hardware schedules clusters, so the calculated occupancy is not guaranteed to be achievable.

For context-less kernels queried via Library::kernel, this wrapper uses the current context for calculations.

§Errors

Returns an error if the module context cannot be bound, config does not specify a valid cluster size for this kernel, CUDA rejects the occupancy query, or a previous asynchronous launch reports an error.

pub const fn as_raw(&self) -> DeviceFunction

Trait Implementations§

impl<'a> Debug for KernelFunction<'a>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl<'a> !Send for KernelFunction<'a>

impl<'a> !Sync for KernelFunction<'a>

impl<'a> Freeze for KernelFunction<'a>

impl<'a> RefUnwindSafe for KernelFunction<'a>

impl<'a> Unpin for KernelFunction<'a>

impl<'a> UnsafeUnpin for KernelFunction<'a>

impl<'a> UnwindSafe for KernelFunction<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl KernelFunction<'_>

pub const fn module(&self) -> &Module

pub fn name(&self) -> Result<String>

pub fn attribute(&self, attribute: FunctionAttribute) -> Result<i32>