pub struct Function<'a> { /* private fields */ }
Handle to a global kernel function.
Implementations
impl<'a> Function<'a>
pub fn get_attribute(&self, attr: FunctionAttribute) -> CudaResult<i32>
Returns information about a function.
Examples
use cust::function::FunctionAttribute;
let function = module.get_function("sum")?;
let shared_memory = function.get_attribute(FunctionAttribute::SharedMemorySizeBytes)?;
println!("This function uses {} bytes of shared memory", shared_memory);
pub fn set_cache_config(&mut self, config: CacheConfig) -> CudaResult<()>
Sets the preferred cache configuration for this function.
On devices where L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for this function. This is only a preference. The driver will use the requested configuration if possible, but is free to choose a different configuration if required to execute the function. This setting will override the context-wide setting.
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
Example
use cust::context::CacheConfig;
let mut function = module.get_function("sum")?;
function.set_cache_config(CacheConfig::PreferL1)?;
pub fn set_shared_memory_config(&mut self, cfg: SharedMemoryConfig) -> CudaResult<()>
Sets the preferred shared memory configuration for this function.
On devices with configurable shared memory banks, this function will set this function’s shared memory bank size which is used for subsequent launches of this function. If not set, the context-wide setting will be used instead.
Example
use cust::context::SharedMemoryConfig;
let mut function = module.get_function("sum")?;
function.set_shared_memory_config(SharedMemoryConfig::EightByteBankSize)?;
pub fn to_raw(&self) -> CUfunction
Retrieves a raw handle to this function.
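A minimal sketch of passing the raw handle to driver-API code that cust does not wrap. This assumes `module` is a loaded `Module` containing a kernel named `sum`, and that `CUfunction` is in scope from the raw driver bindings (e.g. the `cust_raw` crate):

```rust
// Sketch only: assumes `module` is a loaded `Module` with a "sum" kernel,
// and that the raw driver bindings exporting `CUfunction` are in scope.
let function = module.get_function("sum")?;
let raw: CUfunction = function.to_raw();
// `raw` can now be handed to driver-API calls not exposed by cust.
// The handle is only valid while `function` (and its module) remain alive.
```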
The amount of dynamic shared memory available per block when launching blocks on a streaming multiprocessor.
pub fn max_active_blocks_per_multiprocessor(
    &self,
    block_size: BlockSize,
    dynamic_smem_size: usize
) -> CudaResult<u32>
The maximum number of active blocks per streaming multiprocessor when this function is launched with a specific block_size and some amount of dynamic shared memory.
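A hedged sketch of an occupancy query for a 128-thread block with no dynamic shared memory. It assumes `module` is a loaded `Module` with a `sum` kernel, and that `BlockSize` implements `From<u32>` so `128.into()` converts:

```rust
// Assumes `module` is a loaded `Module` containing a "sum" kernel.
let function = module.get_function("sum")?;
// How many 128-thread blocks can be resident per SM when the kernel
// requests 0 bytes of dynamic shared memory?
let blocks = function.max_active_blocks_per_multiprocessor(128.into(), 0)?;
println!("up to {} active blocks per multiprocessor", blocks);
```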
pub fn suggested_launch_configuration(
    &self,
    dynamic_smem_size: usize,
    block_size_limit: BlockSize
) -> CudaResult<(u32, u32)>
Returns a reasonable block and grid size to achieve maximum occupancy for the launch (the maximum number of active warps with the fewest blocks per multiprocessor).
Params

dynamic_smem_size is the amount of dynamic shared memory required by this function. We currently do not expose a way of determining this dynamically based on block size due to safety concerns.

block_size_limit is the maximum block size that this function is designed to handle. If this is 0, CUDA will use the maximum block size permitted by the device/function instead.
Note: any panics raised while computing dynamic_smem_size are ignored, and the function will use 0 instead.
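A hedged sketch of feeding the suggestion into a launch. It assumes `module` is a loaded `Module`, that `BlockSize` implements `From<u32>`, and that the returned tuple is ordered (grid size, block size):

```rust
// Assumes `module` is a loaded `Module` containing a "sum" kernel.
let function = module.get_function("sum")?;
// 0 bytes of dynamic shared memory; a limit of 0 lets CUDA pick the
// maximum block size the device/function permits.
let (grid_size, block_size) = function.suggested_launch_configuration(0, 0.into())?;
// The pair can then serve as launch dimensions, e.g. with cust's
// launch! macro: launch!(function<<<grid_size, block_size, 0, stream>>>(..)).
```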
Trait Implementations
impl Send for Function<'_>
impl Sync for Function<'_>
Auto Trait Implementations
impl<'a> RefUnwindSafe for Function<'a>
impl<'a> Unpin for Function<'a>
impl<'a> UnwindSafe for Function<'a>
Blanket Implementations
impl<T> BorrowMut<T> for T where
    T: ?Sized,
pub fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.