Struct Stream

pub struct Stream { /* private fields */ }

An asynchronous work queue on a CUDA device.

Work submitted to the same stream executes in order; work on different streams may run concurrently, subject to device scheduling. Streams are Send + Sync — CUDA explicitly permits concurrent submission from multiple host threads.

Implementations

impl Stream

pub fn begin_capture(&self, mode: CaptureMode) -> Result<()>

Begin recording operations submitted to this stream into a CUDA graph.

Call Stream::end_capture to retrieve the resulting Graph. Most operations (kernel launches, memcpys, event records) enqueued between these two calls are captured rather than executed.

pub fn end_capture(&self) -> Result<Graph>

Stop capture and return the graph of everything that was recorded.

pub fn capture<F>(&self, mode: CaptureMode, f: F) -> Result<Graph>
where F: FnOnce(&Stream) -> Result<()>,

Convenience wrapper: run f, capturing everything it submits to this stream, and return the resulting graph.

f should enqueue its work on self. If f returns an error mid-capture, the capture is still ended so that the captured state is not leaked.
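The "always end the capture" behavior described above can be sketched with a toy model. Everything here (MockStream, MockGraph, the String error type, and the choice to report f's error in preference to any end-capture error) is a hypothetical stand-in for illustration, not the crate's actual implementation:

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the captured graph: just a node count.
#[derive(Debug, PartialEq)]
struct MockGraph {
    nodes: u32,
}

/// Hypothetical stand-in for a stream; it only tracks capture state.
struct MockStream {
    capturing: Cell<bool>,
    recorded: Cell<u32>,
}

impl MockStream {
    fn new() -> Self {
        MockStream { capturing: Cell::new(false), recorded: Cell::new(0) }
    }

    fn is_capturing(&self) -> bool {
        self.capturing.get()
    }

    fn begin_capture(&self) -> Result<(), String> {
        if self.capturing.get() {
            return Err("already capturing".to_string());
        }
        self.recorded.set(0);
        self.capturing.set(true);
        Ok(())
    }

    fn end_capture(&self) -> Result<MockGraph, String> {
        if !self.capturing.get() {
            return Err("not capturing".to_string());
        }
        self.capturing.set(false);
        Ok(MockGraph { nodes: self.recorded.get() })
    }

    /// Pretend to enqueue one operation; while capturing it is only recorded.
    fn enqueue(&self) -> Result<(), String> {
        if self.capturing.get() {
            self.recorded.set(self.recorded.get() + 1);
        }
        Ok(())
    }

    /// Run `f` under capture and always end the capture afterwards.
    fn capture<F>(&self, f: F) -> Result<MockGraph, String>
    where
        F: FnOnce(&MockStream) -> Result<(), String>,
    {
        self.begin_capture()?;
        let body = f(self);
        // End the capture before inspecting `body`, so an error from `f`
        // cannot leave the stream stuck in capture mode.
        let graph = self.end_capture();
        body?;
        graph
    }
}
```

In this model a failing closure still leaves is_capturing() returning false once capture returns, which is the property the wrapper is meant to guarantee.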

pub fn is_capturing(&self) -> Result<bool>

Ok(true) if this stream is currently in capture mode.

impl Stream

pub fn new(context: &Context) -> Result<Self>

Create a new stream on context with default flags (blocking with respect to the legacy default stream).

pub fn non_blocking(context: &Context) -> Result<Self>

Create a non-blocking stream — work on this stream does not synchronize with the legacy null stream.

pub fn with_flags(context: &Context, flags: u32) -> Result<Self>

Create a stream with a raw flag bitmask (see CUstream_flags).
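As a small illustration of working with the raw bitmask, the sketch below restates the two CUstream_flags values from the CUDA driver headers as local constants. The helper is_non_blocking is hypothetical and not part of this crate; in real code the constants would more likely come from the sys crate:

```rust
/// Values of the CUDA driver's `CUstream_flags` enum, restated locally.
const CU_STREAM_DEFAULT: u32 = 0x0;
const CU_STREAM_NON_BLOCKING: u32 = 0x1;

/// Hypothetical helper: does a flags bitmask request a non-blocking stream?
fn is_non_blocking(flags: u32) -> bool {
    (flags & CU_STREAM_NON_BLOCKING) != 0
}

/// Hypothetical helper: human-readable description of a flags bitmask,
/// e.g. for the value returned by `Stream::flags`.
fn describe_flags(flags: u32) -> &'static str {
    if is_non_blocking(flags) {
        "non-blocking: no implicit sync with the legacy default stream"
    } else if flags == CU_STREAM_DEFAULT {
        "default: synchronizes with the legacy default stream"
    } else {
        "unrecognized flag bits"
    }
}
```

Passing CU_STREAM_NON_BLOCKING to with_flags would presumably be equivalent to calling non_blocking, though that equivalence is an assumption rather than something this page states.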

pub fn with_priority(context: &Context, flags: u32, priority: i32) -> Result<Self>

Create a stream with a specific priority. Use Context::stream_priority_range to discover the legal range on this device. Numerically lower values mean higher priority.
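Because the priority scale is inverted, it is easy to clamp in the wrong direction. The sketch below assumes the range is available as two i32 values in the CUDA driver's convention: least (the numerically largest, lowest-priority value) and greatest (the numerically smallest, highest-priority value). The helper clamp_priority is hypothetical, and the exact return shape of Context::stream_priority_range is not shown on this page:

```rust
/// Hypothetical helper: clamp `requested` into the device's legal range.
///
/// `least` is the lowest scheduling priority (numerically largest value),
/// `greatest` is the highest scheduling priority (numerically smallest).
fn clamp_priority(requested: i32, least: i32, greatest: i32) -> i32 {
    // Order the bounds numerically so `clamp` never sees min > max.
    let (lo, hi) = if greatest <= least {
        (greatest, least)
    } else {
        (least, greatest)
    };
    requested.clamp(lo, hi)
}
```

For a device reporting least = 0 and greatest = -5, a request of -10 clamps to -5 (the highest priority available) and a request of 3 clamps to 0.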

pub fn priority(&self) -> Result<i32>

This stream’s scheduling priority.

pub fn flags(&self) -> Result<u32>

This stream’s flags bitmask.

pub fn launch_host_func<F>(&self, f: F) -> Result<()>
where F: FnOnce() + Send + 'static,

Enqueue a host-side callback on this stream. The callback runs on a driver-owned thread after all prior stream work completes.

The closure is boxed and freed after it runs; a panic inside will abort the process (there’s no way to propagate it through the C callback). Keep the closure simple.
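The boxing and the abort-on-panic behavior can be sketched without any CUDA dependency. The names HostFn, into_user_data, trampoline, and run_once_demo are hypothetical; this shows one common way to pass an FnOnce through a C callback that takes a single opaque pointer, not necessarily how this crate does it:

```rust
use std::ffi::c_void;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

type HostFn = Box<dyn FnOnce() + Send + 'static>;

/// Box the closure twice so the resulting raw pointer is thin (one word)
/// and can travel through a `void*` user-data argument.
fn into_user_data<F>(f: F) -> *mut c_void
where
    F: FnOnce() + Send + 'static,
{
    let inner: HostFn = Box::new(f);
    Box::into_raw(Box::new(inner)) as *mut c_void
}

/// C-ABI trampoline of the shape a host-function callback expects.
extern "C" fn trampoline(user_data: *mut c_void) {
    // SAFETY: `user_data` came from `into_user_data` and is consumed exactly
    // once here, which also frees the boxes after the closure runs.
    let closure: Box<HostFn> = unsafe { Box::from_raw(user_data as *mut HostFn) };
    let result = catch_unwind(AssertUnwindSafe(move || {
        let f: HostFn = *closure;
        f();
    }));
    // A panic cannot be propagated through the C caller, so abort instead.
    if result.is_err() {
        std::process::abort();
    }
}

/// Simulate the driver invoking the callback once; returns whether it ran.
fn run_once_demo() -> bool {
    let ran = Arc::new(AtomicBool::new(false));
    let ran_in_callback = Arc::clone(&ran);
    let user_data = into_user_data(move || ran_in_callback.store(true, Ordering::SeqCst));
    trampoline(user_data);
    ran.load(Ordering::SeqCst)
}
```

The Send + 'static bound on launch_host_func follows from this shape: the closure outlives the call that enqueued it and runs on a different thread.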

pub fn synchronize(&self) -> Result<()>

Block the calling thread until all work previously enqueued on this stream has completed.

pub fn is_complete(&self) -> Result<bool>

Ok(true) if the stream has completed all queued work, Ok(false) if work is still outstanding.
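A non-blocking query like this is typically used in a polling loop when the host thread has other work to interleave. The helper below is a hypothetical sketch that takes the query as a closure, so it runs without a GPU; with a real stream the closure would be something like `|| stream.is_complete()`:

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `is_complete` up to `max_polls` times, sleeping
/// `interval` between attempts. Returns Ok(true) as soon as the query reports
/// completion, Ok(false) if the budget runs out, and forwards any error.
fn poll_until_complete<E>(
    mut is_complete: impl FnMut() -> Result<bool, E>,
    max_polls: usize,
    interval: Duration,
) -> Result<bool, E> {
    for _ in 0..max_polls {
        if is_complete()? {
            return Ok(true);
        }
        thread::sleep(interval);
    }
    Ok(false)
}
```

When there is nothing else for the thread to do, synchronize is the simpler choice; polling only pays off if the loop body does useful work between queries.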

pub fn context(&self) -> &Context

The Context this stream lives in.

pub fn as_raw(&self) -> CUstream

Raw CUstream handle. Use with care.

pub fn memcpy_dtod<T: DeviceRepr>(&self, src: &DeviceBuffer<T>, dst: &mut DeviceBuffer<T>) -> Result<()>

Device-to-device async copy scheduled on this stream.

Sugar over DeviceBuffer::copy_to_device_async with the borrows flipped the way the call site usually wants them: the destination buffer is taken by &mut, so the borrow checker will catch aliasing bugs at compile time. src.len() must equal dst.len().

let ctx = Context::new(&Device::get(0)?)?;
let stream = Stream::new(&ctx)?;
let src: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
let mut dst: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
stream.memcpy_dtod(&src, &mut dst)?;

pub fn id(&self) -> Result<u64>

Return the driver-assigned 64-bit ID for this stream. Useful for correlating CUPTI traces against baracuda streams.

pub fn copy_attributes_from(&self, src: &Stream) -> Result<()>

Copy all CUDA-managed attributes (access policy window, sync policy) from src onto self. Does not copy priority or flags (those are set at stream creation time).

pub fn wait_event(&self, event: &Event, flags: u32) -> Result<()>

Make this stream wait for event to complete before processing any subsequent work. flags is typically 0 (CU_EVENT_WAIT_DEFAULT). Use this for cross-stream dependencies — record an event on stream A, then have stream B wait on it.
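The record-then-wait pattern can be illustrated with a CPU-only toy model in which each "stream" is a worker thread draining a FIFO queue and an "event" is a flag guarded by a condition variable. ModelStream, ModelEvent, and producer_consumer_demo are hypothetical names, and the model deliberately ignores details of real CUDA events (for example, re-recording, or waiting on an event that was never recorded):

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Toy stream: jobs run in submission order on a dedicated worker thread.
struct ModelStream {
    tx: Sender<Job>,
    worker: JoinHandle<()>,
}

/// Toy event: a flag that is set once the recording stream reaches it.
#[derive(Clone)]
struct ModelEvent(Arc<(Mutex<bool>, Condvar)>);

impl ModelEvent {
    fn new() -> Self {
        ModelEvent(Arc::new((Mutex::new(false), Condvar::new())))
    }

    /// Enqueue "signal this event" behind the work already on `stream`.
    fn record(&self, stream: &ModelStream) {
        let ev = self.clone();
        stream.enqueue(move || {
            let (lock, cv) = &*ev.0;
            *lock.lock().unwrap() = true;
            cv.notify_all();
        });
    }
}

impl ModelStream {
    fn new() -> Self {
        let (tx, rx) = channel::<Job>();
        let worker = thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        ModelStream { tx, worker }
    }

    fn enqueue(&self, job: impl FnOnce() + Send + 'static) {
        self.tx.send(Box::new(job)).expect("worker thread is alive");
    }

    /// Enqueue "stall this stream until `event` has been signalled".
    fn wait_event(&self, event: &ModelEvent) {
        let ev = event.clone();
        self.enqueue(move || {
            let (lock, cv) = &*ev.0;
            let mut signalled = lock.lock().unwrap();
            while !*signalled {
                signalled = cv.wait(signalled).unwrap();
            }
        });
    }

    /// Close the queue and block until all enqueued jobs have run.
    fn finish(self) {
        let ModelStream { tx, worker } = self;
        drop(tx); // closing the channel ends the worker's loop
        worker.join().expect("worker panicked");
    }
}

/// Stream A produces and records the event; stream B waits, then consumes.
fn producer_consumer_demo() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let (a, b) = (ModelStream::new(), ModelStream::new());
    let event = ModelEvent::new();

    let l = Arc::clone(&log);
    a.enqueue(move || { l.lock().unwrap().push("produce on A"); });
    event.record(&a);

    b.wait_event(&event);
    let l = Arc::clone(&log);
    b.enqueue(move || { l.lock().unwrap().push("consume on B"); });

    a.finish();
    b.finish();
    let out = log.lock().unwrap().clone();
    out
}
```

Neither host call blocks the submitting thread in this model; only stream B's queue stalls, which mirrors why wait_event is preferred over a host-side synchronize for cross-stream ordering.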

pub unsafe fn get_attribute(&self, attr: i32, value_out: *mut c_void) -> Result<()>

Read a CUstreamAttrValue for attr from this stream. The caller passes a writable buffer big enough for the largest attribute value (CUstreamAttrValue is up to 48 bytes). Use the CU_STREAM_ATTRIBUTE_* constants for attr.

Safety

value_out must be a writable region matching the layout of the CUstreamAttrValue variant for attr.

pub unsafe fn set_attribute(&self, attr: i32, value: *const c_void) -> Result<()>

Set a CUstreamAttrValue on this stream. See Self::get_attribute for the value layout.

Safety

value must point at a properly-initialized CUstreamAttrValue variant for attr.

pub fn attach_mem_async(&self, dptr: CUdeviceptr, length: usize, flags: u32) -> Result<()>

Associate a managed-memory region with this stream. Pass flags = 0 for the default (“one thread”).

pub fn write_value_32(&self, addr: CUdeviceptr, value: u32, flags: u32) -> Result<()>

Enqueue a 32-bit write of value to device memory addr on this stream, ordered like any other stream op.

flags is a bitmask of baracuda_cuda_sys::types::CUstreamWriteValue_flags.

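pub fn write_value_64(&self, addr: CUdeviceptr, value: u64, flags: u32) -> Result<()>

64-bit counterpart of Stream::write_value_32: enqueue a 64-bit write of value to device memory addr on this stream.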

pub fn wait_value_32(&self, addr: CUdeviceptr, value: u32, flags: u32) -> Result<()>

Block the stream until the device memory at addr, compared against value, satisfies the condition selected by flags. See baracuda_cuda_sys::types::CUstreamWaitValue_flags: GEQ / EQ / AND / NOR, optionally OR’d with FLUSH.
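The four conditions are easiest to compare side by side as a host-side predicate. The sketch below restates the flag values and comparison rules from the CUDA driver's CUstreamWaitValue_flags documentation as local constants. The function wait_condition_met is a hypothetical illustration of the semantics only; the real check is performed by the device, not by host code:

```rust
// Values of the CUDA driver's `CUstreamWaitValue_flags`, restated locally.
const CU_STREAM_WAIT_VALUE_GEQ: u32 = 0x0;
const CU_STREAM_WAIT_VALUE_EQ: u32 = 0x1;
const CU_STREAM_WAIT_VALUE_AND: u32 = 0x2;
const CU_STREAM_WAIT_VALUE_NOR: u32 = 0x3;
const CU_STREAM_WAIT_VALUE_FLUSH: u32 = 1 << 30;

/// Hypothetical model of when a 32-bit wait would be released, given the
/// current memory contents `mem`. Returns None for an unknown condition.
fn wait_condition_met(mem: u32, value: u32, flags: u32) -> Option<bool> {
    // FLUSH only affects memory ordering, not the comparison itself.
    match flags & !CU_STREAM_WAIT_VALUE_FLUSH {
        // Cyclic comparison: (int32_t)(*addr - value) >= 0, ignoring wraparound.
        CU_STREAM_WAIT_VALUE_GEQ => Some((mem.wrapping_sub(value) as i32) >= 0),
        CU_STREAM_WAIT_VALUE_EQ => Some(mem == value),
        CU_STREAM_WAIT_VALUE_AND => Some((mem & value) != 0),
        CU_STREAM_WAIT_VALUE_NOR => Some(!(mem | value) != 0),
        _ => None,
    }
}
```

The GEQ case is the one most likely to surprise: because the comparison is cyclic, a counter that has wrapped from u32::MAX to 0 still counts as having passed u32::MAX.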

pub fn wait_value_64(&self, addr: CUdeviceptr, value: u64, flags: u32) -> Result<()>

64-bit counterpart of Stream::wait_value_32, using the same flags.

pub fn batch_mem_op(&self, ops: &mut [CUstreamBatchMemOpParams], flags: u32) -> Result<()>

Submit a batch of wait/write value ops to this stream in a single driver call; the operations are enqueued in the order they appear in ops. ops is typically a small array built via baracuda_cuda_sys::types::CUstreamBatchMemOpParams::wait_value_32 etc.

pub fn capture_info(&self) -> Result<(bool, u64, CUgraph)>

Query stream-capture state. Returns (active, capture_id, graph_handle) where active is true if the stream is currently capturing. The graph handle is only meaningful while capturing.