Function cuda_batch_step 

pub fn cuda_batch_step<A, S>(
    store: &S,
    ptx_source: &str,
    _module_name: &str,
    kernel_name: &str,
    block_size: u32,
) -> Result<AccelStepResult, String>
where
    A: SoaExtractable,
    S: AgentStore<A>,

CUDA batch step over SoA columns.

Uploads agent columns to the GPU on a dedicated cudarc::driver::CudaStream, launches the named kernel from the provided PTX source, then downloads results back onto the host and writes them into the agent store.

Supports any number of SoA columns: the launch uses the stream-based cudarc::driver::CudaStream::launch_builder API and passes (col_0, col_1, …, col_{k-1}, n) as kernel arguments, in that order.

Failures returned as Err(String) include:

  • an invalid block_size
  • CUDA context initialization failures
  • PTX compile/load or kernel lookup failures
  • host/device transfer failures
  • kernel launch or stream synchronization failures

§Safety requirements for the PTX kernel

  • the kernel signature must match the launched argument list
  • the kernel must bounds-check against n
  • the kernel must not read or write outside the provided column buffers
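
The three requirements above can be illustrated with a CPU-side stand-in for a kernel. This is only a sketch: the two f32 columns, the update rule, and the names kernel_thread and emulate_launch are hypothetical and not part of this crate. It shows the argument order (col_0, col_1, n), the global-index computation, and the bounds check that keeps surplus threads in the last block from touching memory outside the columns.

```rust
/// CPU stand-in for ONE GPU thread of a hypothetical two-column kernel.
/// Argument order mirrors the launch slots: (col_0, col_1, n).
/// The f32 element type is an assumption made for this sketch.
fn kernel_thread(
    block_idx: u32,
    block_dim: u32,
    thread_idx: u32,
    col_0: &mut [f32],
    col_1: &[f32],
    n: u32,
) {
    // Global index, the analogue of blockIdx.x * blockDim.x + threadIdx.x.
    let i = block_idx as u64 * block_dim as u64 + thread_idx as u64;
    // Bounds check against n: surplus threads in the last block do nothing.
    if i >= n as u64 {
        return;
    }
    let i = i as usize;
    // Hypothetical update rule: add col_1 into col_0 in place.
    col_0[i] += col_1[i];
}

/// Runs every thread of a 1-D grid sequentially on the CPU.
fn emulate_launch(col_0: &mut [f32], col_1: &[f32], block_dim: u32) {
    assert!(block_dim > 0, "block_dim must be non-zero");
    assert_eq!(col_0.len(), col_1.len(), "columns must have equal length");
    let n = col_0.len() as u32;
    // Round up so a partial last block is still covered.
    let grid = n.div_ceil(block_dim);
    for b in 0..grid {
        for t in 0..block_dim {
            kernel_thread(b, block_dim, t, col_0, col_1, n);
        }
    }
}
```

With five elements and a block size of 4, two blocks (eight threads) run, and the last three threads exit at the bounds check. A real kernel without that check would access memory past the end of the column buffers.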

§Arguments

  • store – the agent store to extract from / write back to
  • ptx_source – PTX source string (compile your .cu to PTX offline or embed it)
  • _module_name – name for the loaded module (unused with cudarc 0.19, kept for source compatibility with the previous API)
  • kernel_name – the __global__ function name inside the PTX
  • block_size – CUDA threads per block (e.g. 256)
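
How block_size relates to the number of blocks launched is not documented here. A common pattern for 1-D launches is to round the agent count up to a whole number of blocks, sketched below. The helper launch_geometry, the 1024-thread upper limit, and the error strings are assumptions made for illustration; they do not describe the validation cuda_batch_step actually performs.

```rust
/// Assumed upper bound on threads per block; the real limit depends on the
/// device and is NOT taken from this crate's documentation.
const MAX_THREADS_PER_BLOCK: u32 = 1024;

/// Hypothetical helper: returns (grid_size, block_size) for `n` agents.
fn launch_geometry(n: usize, block_size: u32) -> Result<(u32, u32), String> {
    if block_size == 0 || block_size > MAX_THREADS_PER_BLOCK {
        return Err(format!("invalid block_size: {block_size}"));
    }
    let n = u32::try_from(n)
        .map_err(|_| format!("agent count {n} does not fit in u32"))?;
    // Ceiling division: a partial last block still needs to be launched.
    let grid = n.div_ceil(block_size);
    Ok((grid, block_size))
}
```

Under these assumptions, 1000 agents with a block size of 256 give four blocks, so 24 threads in the last block fall past n and rely on the kernel's bounds check. A grid size of zero means there is nothing to launch.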