pub struct Stream { /* private fields */ }
An asynchronous work queue on a CUDA device.
Work submitted to the same stream executes in order; work on different
streams may run concurrently, subject to device scheduling. Streams are
Send + Sync — CUDA explicitly permits concurrent submission from
multiple host threads.
Implementations
impl Stream
pub fn begin_capture(&self, mode: CaptureMode) -> Result<()>
Begin recording operations submitted to this stream into a CUDA graph.
Call Stream::end_capture to retrieve the resulting Graph.
Most operations (kernel launches, memcpys, event records) enqueued
between these two calls are captured rather than executed.
pub fn end_capture(&self) -> Result<Graph>
Stop capture and return the graph of everything that was recorded.
pub fn capture<F>(&self, mode: CaptureMode, f: F) -> Result<Graph>
Convenience wrapper: run f, capturing everything it submits to
this stream, and return the resulting graph.
f should enqueue its work on self. If f returns an error mid-capture,
the capture is still ended so the stream is not left in capture mode.
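The "always end the capture" behaviour described above can be sketched with a stand-in type. This is a minimal model and not baracuda's implementation: MockStream, its String errors and the "graph" placeholder are hypothetical, and the CaptureMode argument is left out for brevity.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a stream; it only tracks whether capture is active.
struct MockStream {
    capturing: Cell<bool>,
}

impl MockStream {
    fn new() -> Self {
        MockStream { capturing: Cell::new(false) }
    }

    fn begin_capture(&self) -> Result<(), String> {
        if self.capturing.get() {
            return Err("already capturing".to_string());
        }
        self.capturing.set(true);
        Ok(())
    }

    /// Returns a placeholder where the real API would return a Graph.
    fn end_capture(&self) -> Result<&'static str, String> {
        if !self.capturing.get() {
            return Err("not capturing".to_string());
        }
        self.capturing.set(false);
        Ok("graph")
    }

    fn is_capturing(&self) -> bool {
        self.capturing.get()
    }

    /// Run `f` between begin/end. The capture is ended even when `f` fails,
    /// and the error from `f` takes precedence over the captured result.
    fn capture<F>(&self, f: F) -> Result<&'static str, String>
    where
        F: FnOnce(&Self) -> Result<(), String>,
    {
        self.begin_capture()?;
        let body = f(self);
        let graph = self.end_capture();
        body?;
        graph
    }
}
```

In this model, a closure that fails still leaves the stream out of capture mode, so a later capture on the same stream can start cleanly.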
pub fn is_capturing(&self) -> Result<bool>
Returns Ok(true) if this stream is currently in capture mode.
impl Stream
pub fn new(context: &Context) -> Result<Self>
Create a new stream on context with default flags (blocking with respect to
the legacy default stream).
pub fn non_blocking(context: &Context) -> Result<Self>
Create a non-blocking stream — work on this stream does not synchronize with the legacy null stream.
pub fn with_flags(context: &Context, flags: u32) -> Result<Self>
Create a stream with a raw flag bitmask (see CUstream_flags).
pub fn with_priority(context: &Context, flags: u32, priority: i32) -> Result<Self>
Create a stream with a specific priority. Use
Context::stream_priority_range to discover the legal range on
this device. Numerically lower values mean higher priority.
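Because the numeric order is inverted, it is easy to clamp a priority the wrong way round. The helper below is a small sketch: clamp_priority is a hypothetical name, and since the return shape of Context::stream_priority_range is not shown here, the two bounds are passed as plain integers. The CUDA driver reports the range as a "least" (numerically largest) and "greatest" (numerically smallest) priority and clamps out-of-range requests itself, so this mainly makes the ordering explicit.

```rust
/// Clamp `requested` into a device's legal stream-priority range.
///
/// `least` is the lowest priority (numerically largest value) and
/// `greatest` is the highest priority (numerically smallest value).
fn clamp_priority(requested: i32, least: i32, greatest: i32) -> i32 {
    // `max` pulls too-negative requests up to the highest legal priority;
    // `min` pulls too-positive requests down to the lowest legal priority.
    requested.max(greatest).min(least)
}

/// True if priority `a` is scheduled ahead of priority `b`.
fn is_higher_priority(a: i32, b: i32) -> bool {
    a < b
}
```

With an illustrative range of least = 0 and greatest = -5, a request of -10 becomes -5 and a request of 3 becomes 0.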
pub fn launch_host_func<F>(&self, f: F) -> Result<()>
Enqueue a host-side callback on this stream. The callback runs on a driver-owned thread after all prior stream work completes.
The closure is boxed and freed after it runs; a panic inside will abort the process (there’s no way to propagate it through the C callback). Keep the closure simple.
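The boxing and abort-on-panic behaviour can be illustrated with the usual trampoline pattern for passing a Rust closure through a C callback. This is a sketch of one common approach and not necessarily how baracuda implements it; HostFn, trampoline and into_callback are hypothetical names, and the `extern "C"` signature is assumed to match the driver's `void (*)(void *userData)` callback shape.

```rust
use std::ffi::c_void;
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Assumed shape of the C callback: a function taking one user-data pointer.
type HostFn = unsafe extern "C" fn(*mut c_void);

/// Reclaims the boxed closure, runs it once, and frees it.
/// A panic cannot cross the C boundary, so it is turned into an abort.
unsafe extern "C" fn trampoline<F: FnOnce() + Send + 'static>(user_data: *mut c_void) {
    // Taking ownership back means the box is dropped when this function returns.
    let f = unsafe { Box::from_raw(user_data as *mut F) };
    if catch_unwind(AssertUnwindSafe(move || f())).is_err() {
        std::process::abort();
    }
}

/// Box `f` and return the (function pointer, user-data pointer) pair
/// that would be handed to the C API.
fn into_callback<F: FnOnce() + Send + 'static>(f: F) -> (HostFn, *mut c_void) {
    let cb: HostFn = trampoline::<F>;
    (cb, Box::into_raw(Box::new(f)) as *mut c_void)
}
```

The pair must be invoked exactly once: never calling it leaks the box, and calling it twice is a double free, which is why the real method keeps this plumbing private.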
pub fn synchronize(&self) -> Result<()>
Block the calling thread until all work previously enqueued on this stream has completed.
pub fn is_complete(&self) -> Result<bool>
Ok(true) if the stream has completed all queued work, Ok(false)
if work is still outstanding.
pub fn memcpy_dtod<T: DeviceRepr>(&self, src: &DeviceBuffer<T>, dst: &mut DeviceBuffer<T>) -> Result<()>
Device-to-device async copy scheduled on this stream.
Sugar over DeviceBuffer::copy_to_device_async with the borrows
flipped the way the call site usually wants them: the destination
buffer is taken by &mut, so the borrow checker catches
aliasing bugs at compile time. src.len() must equal dst.len().
let ctx = Context::new(&Device::get(0)?)?;
let stream = Stream::new(&ctx)?;
let src: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
let mut dst: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
stream.memcpy_dtod(&src, &mut dst)?;
pub fn id(&self) -> Result<u64>
Return the driver-assigned 64-bit ID for this stream. Useful for correlating CUPTI traces against baracuda streams.
pub fn copy_attributes_from(&self, src: &Stream) -> Result<()>
Copy all CUDA-managed attributes (access policy window, sync
policy) from src onto self. Does not copy priority or flags
(those are set at stream creation time).
pub fn wait_event(&self, event: &Event, flags: u32) -> Result<()>
Make this stream wait for event to complete before processing
any subsequent work. flags is typically 0
(CU_EVENT_WAIT_DEFAULT). Use this for cross-stream
dependencies — record an event on stream A, then have stream B
wait on it.
pub unsafe fn get_attribute(&self, attr: i32, value_out: *mut c_void) -> Result<()>
Read a CUstreamAttrValue for attr from this stream. The
caller passes a writable buffer big enough for the largest
attribute value (CUstreamAttrValue is up to 48 bytes).
Use the CU_STREAM_ATTRIBUTE_* constants for attr.
Safety
value_out must be a writable region matching the layout of the
CUstreamAttrValue variant for attr.
pub unsafe fn set_attribute(&self, attr: i32, value: *const c_void) -> Result<()>
Set a CUstreamAttrValue on this stream. See Self::get_attribute
for the value layout.
Safety
value must point at a properly-initialized CUstreamAttrValue
variant for attr.
pub fn attach_mem_async(&self, dptr: CUdeviceptr, length: usize, flags: u32) -> Result<()>
Associate a managed-memory region with this stream. Pass
flags = 0 for the default (“one thread”).
pub fn write_value_32(&self, addr: CUdeviceptr, value: u32, flags: u32) -> Result<()>
Enqueue a 32-bit write of value to device memory addr on this
stream, ordered like any other stream op.
flags is a bitmask of
baracuda_cuda_sys::types::CUstreamWriteValue_flags.
pub fn write_value_64(&self, addr: CUdeviceptr, value: u64, flags: u32) -> Result<()>
pub fn wait_value_32(&self, addr: CUdeviceptr, value: u32, flags: u32) -> Result<()>
Block the stream until the device memory at addr satisfies the
condition specified by flags (see
baracuda_cuda_sys::types::CUstreamWaitValue_flags —
GEQ / EQ / AND / NOR, optionally OR’d with FLUSH).
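The four comparison modes can be modelled on the host to make their meaning concrete. The sketch below is an illustration only: wait_condition_met is a hypothetical helper, the constant names are local to this example, and their values are assumed to mirror the CUstreamWaitValue_flags enum in the CUDA driver header. Note that GEQ is a wrapping (cyclic) comparison, not a plain unsigned `>=`.

```rust
// Local copies of the comparison modes; values assumed to match the driver enum.
const WAIT_VALUE_GEQ: u32 = 0x0;
const WAIT_VALUE_EQ: u32 = 0x1;
const WAIT_VALUE_AND: u32 = 0x2;
const WAIT_VALUE_NOR: u32 = 0x3;
// Modifier bit that may be OR'd onto any of the modes above.
const WAIT_VALUE_FLUSH: u32 = 1 << 30;

/// Host-side model of the 32-bit wait condition: would a stream waiting on
/// `value` with `flags` be released when memory holds `current`?
/// Returns None for an unrecognised comparison mode.
fn wait_condition_met(current: u32, value: u32, flags: u32) -> Option<bool> {
    // FLUSH does not change the comparison, so mask it off first.
    match flags & !WAIT_VALUE_FLUSH {
        // Signed view of the wrapping difference, so counters may wrap around.
        WAIT_VALUE_GEQ => Some((current.wrapping_sub(value) as i32) >= 0),
        WAIT_VALUE_EQ => Some(current == value),
        // Released when any bit set in `value` is also set in memory.
        WAIT_VALUE_AND => Some((current & value) != 0),
        // Released when at least one bit is clear in both operands.
        WAIT_VALUE_NOR => Some(!(current | value) != 0),
        _ => None,
    }
}
```

For example, with GEQ a memory value of 1 satisfies a wait on u32::MAX, because the wrapping difference is 2 when viewed as a signed integer.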
pub fn wait_value_64(&self, addr: CUdeviceptr, value: u64, flags: u32) -> Result<()>
pub fn batch_mem_op(&self, ops: &mut [CUstreamBatchMemOpParams], flags: u32) -> Result<()>
Submit a batch of wait/write value ops atomically on this stream.
ops is typically a small array built via
baracuda_cuda_sys::types::CUstreamBatchMemOpParams::wait_value_32
etc.