pub struct StreamCapture { /* private fields */ }
Records GPU operations submitted to a stream into a Graph.
Stream capture intercepts operations that would normally be submitted
to a CUDA stream and instead records them as graph nodes. The captured
operations can then be replayed efficiently via GraphExec.
§Usage
let mut capture = StreamCapture::begin(&stream)?;
capture.record_kernel("my_kernel", (4, 1, 1), (256, 1, 1), 0);
capture.record_memcpy(MemcpyDirection::DeviceToHost, 1024);
let graph = capture.end()?;
assert_eq!(graph.node_count(), 2);
Implementations
impl StreamCapture
pub fn begin(_stream: &Stream) -> CudaResult<Self>
Begins capturing operations on the given stream.
On a real CUDA system, this would call
cuStreamBeginCapture(stream, CU_STREAM_CAPTURE_MODE_GLOBAL).
§Errors
Returns CudaError::NotInitialized if the CUDA driver is not
available.
pub fn record_kernel(
    &mut self,
    function_name: &str,
    grid: (u32, u32, u32),
    block: (u32, u32, u32),
    shared_mem: u32,
)
Records a kernel launch operation in the capture.
§Parameters
function_name - Name of the kernel function.
grid - Grid dimensions (x, y, z).
block - Block dimensions (x, y, z).
shared_mem - Dynamic shared memory in bytes.
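The grid and block tuples together determine how many threads a recorded launch would cover. The following is a minimal sketch of that arithmetic. The helpers total_threads and blocks_for are hypothetical names introduced here for illustration and are not part of StreamCapture.

```rust
/// Total number of threads for a launch: (blocks in the grid) * (threads per block).
/// Hypothetical helper, not part of the documented API.
/// The product is widened to u64 so realistic launch sizes do not overflow u32.
pub fn total_threads(grid: (u32, u32, u32), block: (u32, u32, u32)) -> u64 {
    let blocks = grid.0 as u64 * grid.1 as u64 * grid.2 as u64;
    let threads_per_block = block.0 as u64 * block.1 as u64 * block.2 as u64;
    blocks * threads_per_block
}

/// Number of blocks along x needed to cover `n` elements when each block
/// has `block_x` threads (ceiling division). Hypothetical helper.
pub fn blocks_for(n: u32, block_x: u32) -> u32 {
    assert!(block_x > 0, "block size must be non-zero");
    n / block_x + u32::from(n % block_x != 0)
}
```

With the values from the usage example, a grid of (4, 1, 1) and a block of (256, 1, 1), total_threads gives 4 * 256 = 1024 threads.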
pub fn record_memcpy(&mut self, direction: MemcpyDirection, size: usize)
Records a memory copy operation in the capture.
§Parameters
direction - Direction of the memory copy.
size - Size of the transfer in bytes.
pub fn record_memset(&mut self, size: usize, value: u8)
Records a memset operation in the capture.
§Parameters
size - Number of bytes to set.
value - Byte value to fill with.
pub fn recorded_count(&self) -> usize
Returns the number of operations recorded so far.
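To make the recording model concrete, here is a self-contained toy recorder. It is a sketch under assumptions, not the crate's implementation: ToyCapture and RecordedOp are hypothetical types, the copy direction is omitted, and the only behavior modeled is that each record_* call appends one operation, so the count equals the number of calls made so far.

```rust
/// One recorded operation. Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum RecordedOp {
    Kernel {
        name: String,
        grid: (u32, u32, u32),
        block: (u32, u32, u32),
        shared_mem: u32,
    },
    Memcpy { size: usize },
    Memset { size: usize, value: u8 },
}

/// Toy stand-in for a capture session: it only stores operations in order.
#[derive(Debug, Default)]
pub struct ToyCapture {
    ops: Vec<RecordedOp>,
}

impl ToyCapture {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a kernel launch to the recorded list.
    pub fn record_kernel(
        &mut self,
        name: &str,
        grid: (u32, u32, u32),
        block: (u32, u32, u32),
        shared_mem: u32,
    ) {
        self.ops.push(RecordedOp::Kernel {
            name: name.to_string(),
            grid,
            block,
            shared_mem,
        });
    }

    /// Appends a memory copy of `size` bytes.
    pub fn record_memcpy(&mut self, size: usize) {
        self.ops.push(RecordedOp::Memcpy { size });
    }

    /// Appends a memset of `size` bytes filled with `value`.
    pub fn record_memset(&mut self, size: usize, value: u8) {
        self.ops.push(RecordedOp::Memset { size, value });
    }

    /// Number of operations recorded so far.
    pub fn recorded_count(&self) -> usize {
        self.ops.len()
    }
}
```

In this model, a fresh recorder reports 0, and after one record_kernel call and one record_memset call it reports 2.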
pub fn end(self) -> CudaResult<Graph>
Ends the capture and returns the resulting Graph.
On a real CUDA system, this would call cuStreamEndCapture
and return the captured graph handle.
The captured nodes are connected in a linear chain (each node depends on the previous one) to preserve the order in which operations were recorded.
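The linear chain described above can be sketched as a list of dependency edges between consecutive node indices. This is an illustrative model only: linear_chain_edges is a hypothetical helper, and the real Graph type may represent dependencies differently.

```rust
/// Builds the dependency edges of a linear chain over `node_count` nodes.
/// Each pair (from, to) means node `to` depends on node `from`.
/// Hypothetical helper, not part of the documented API.
pub fn linear_chain_edges(node_count: usize) -> Vec<(usize, usize)> {
    // Node i depends on node i - 1; node 0 has no predecessor.
    (1..node_count).map(|i| (i - 1, i)).collect()
}
```

For the two-node graph in the usage example this yields a single edge, (0, 1): the memcpy runs after the kernel. In general a chain of n nodes has n - 1 edges, and an empty or single-node capture has none.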
§Errors
Returns CudaError::StreamCaptureUnmatched if the capture
was already ended.