pub struct Function<'a> { /* private fields */ }
Handle to a global kernel function.
Implementations
impl<'a> Function<'a>
pub fn get_attribute(&self, attr: FunctionAttribute) -> CudaResult<i32>
Returns information about a function.
Examples
use cust::function::FunctionAttribute;
let function = module.get_function("sum")?;
let shared_memory = function.get_attribute(FunctionAttribute::SharedMemorySizeBytes)?;
println!("This function uses {} bytes of shared memory", shared_memory);
pub fn set_cache_config(&mut self, config: CacheConfig) -> CudaResult<()>
Sets the preferred cache configuration for this function.
On devices where L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for this function. This is only a preference. The driver will use the requested configuration if possible, but is free to choose a different configuration if required to execute the function. This setting will override the context-wide setting.
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
Example
use cust::context::CacheConfig;
let mut function = module.get_function("sum")?;
function.set_cache_config(CacheConfig::PreferL1)?;
pub fn set_shared_memory_config(&mut self, cfg: SharedMemoryConfig) -> CudaResult<()>
Sets the preferred shared memory configuration for this function.
On devices with configurable shared memory banks, this function will set this function’s shared memory bank size which is used for subsequent launches of this function. If not set, the context-wide setting will be used instead.
Example
use cust::context::SharedMemoryConfig;
let mut function = module.get_function("sum")?;
function.set_shared_memory_config(SharedMemoryConfig::EightByteBankSize)?;
pub fn to_raw(&self) -> CUfunction
Retrieves a raw handle to this function.
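A minimal sketch of passing the raw handle to driver-API code that cust does not wrap. This assumes `module` is a loaded `Module` containing a kernel named `sum`, and that `CUfunction` is in scope from the raw driver bindings (e.g. the `cust_raw` crate):

```rust
// Sketch only: assumes `module` is a loaded `Module` with a "sum" kernel,
// and that the raw driver bindings exporting `CUfunction` are in scope.
let function = module.get_function("sum")?;
let raw: CUfunction = function.to_raw();
// `raw` can now be handed to driver-API calls not exposed by cust.
// The handle is only valid while `function` (and its module) remain alive.
```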
The amount of dynamic shared memory available per block when launching blocks on a streaming multiprocessor.
pub fn max_active_blocks_per_multiprocessor(
    &self,
    block_size: BlockSize,
    dynamic_smem_size: usize
) -> CudaResult<u32>
The maximum number of active blocks per streaming multiprocessor when this function is launched with a specific block_size and some amount of dynamic shared memory.
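A hedged sketch of an occupancy query for a 128-thread block with no dynamic shared memory. It assumes `module` is a loaded `Module` with a `sum` kernel, and that `BlockSize` implements `From<u32>` so `128.into()` converts:

```rust
// Assumes `module` is a loaded `Module` containing a "sum" kernel.
let function = module.get_function("sum")?;
// How many 128-thread blocks can be resident per SM when the kernel
// requests 0 bytes of dynamic shared memory?
let blocks = function.max_active_blocks_per_multiprocessor(128.into(), 0)?;
println!("up to {} active blocks per multiprocessor", blocks);
```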
pub fn suggested_launch_configuration(
    &self,
    dynamic_smem_size: usize,
    block_size_limit: BlockSize
) -> CudaResult<(u32, u32)>
Returns a reasonable block and grid size to achieve maximum occupancy for the launch (the maximum number of active warps with the fewest blocks per multiprocessor).
Params

dynamic_smem_size is the amount of dynamic shared memory required by this function. We currently do not expose a way of determining this dynamically based on block size due to safety concerns.

block_size_limit is the maximum block size that this function is designed to handle. If this is 0, CUDA will use the maximum block size permitted by the device/function instead.
Note: any panics raised while computing dynamic_smem_size are ignored, and the function will use 0 instead.
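A hedged sketch of feeding the suggestion into a launch. It assumes `module` is a loaded `Module`, that `BlockSize` implements `From<u32>`, and that the returned tuple is ordered (grid size, block size):

```rust
// Assumes `module` is a loaded `Module` containing a "sum" kernel.
let function = module.get_function("sum")?;
// 0 bytes of dynamic shared memory; a limit of 0 lets CUDA pick the
// maximum block size the device/function permits.
let (grid_size, block_size) = function.suggested_launch_configuration(0, 0.into())?;
// The pair can then serve as launch dimensions, e.g. with cust's
// launch! macro: launch!(function<<<grid_size, block_size, 0, stream>>>(..)).
```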
Trait Implementations
impl Send for Function<'_>
impl Sync for Function<'_>
Auto Trait Implementations
impl<'a> RefUnwindSafe for Function<'a>
impl<'a> Unpin for Function<'a>
impl<'a> UnwindSafe for Function<'a>
Blanket Implementations
impl<T> BorrowMut<T> for T where
    T: ?Sized,
pub fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.