Struct Stream 

pub struct Stream { /* private fields */ }

An asynchronous work queue on a CUDA device.

Work submitted to the same stream executes in order; work on different streams may run concurrently, subject to device scheduling. Streams are Send + Sync — CUDA explicitly permits concurrent submission from multiple host threads.
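The ordering guarantee above can be pictured with a CPU-only analogy. In this sketch, the hypothetical `HostStream` type (not part of baracuda) drains a FIFO queue on a single worker thread. Work submitted to it is therefore serialized in submission order, and because the handle is `Sync`, several host threads can enqueue onto it at once.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;
use std::thread::{self, JoinHandle};

/// A unit of queued work.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical CPU stand-in for a stream: one worker thread drains a FIFO queue.
pub struct HostStream {
    // The Mutex makes the handle Sync, so it can be shared by reference across threads.
    tx: Mutex<Option<Sender<Job>>>,
    worker: Option<JoinHandle<()>>,
}

impl HostStream {
    pub fn new() -> Self {
        let (tx, rx) = channel::<Job>();
        // A single consumer runs jobs strictly in the order they were received.
        let worker = thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        HostStream { tx: Mutex::new(Some(tx)), worker: Some(worker) }
    }

    /// Enqueue work and return immediately, like an asynchronous launch.
    pub fn submit<F: FnOnce() + Send + 'static>(&self, f: F) {
        let guard = self.tx.lock().unwrap();
        guard.as_ref().expect("stream closed").send(Box::new(f)).unwrap();
    }

    /// Close the queue and wait for every enqueued job to finish.
    pub fn finish(mut self) {
        // Dropping the only Sender ends the worker's receive loop.
        self.tx.lock().unwrap().take();
        if let Some(w) = self.worker.take() {
            w.join().unwrap();
        }
    }
}
```

Jobs from different submitting threads may interleave, but each thread's own jobs run in the order that thread submitted them. The same holds for CUDA work submitted to one stream.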

Implementations

impl Stream

pub fn begin_capture(&self, mode: CaptureMode) -> Result<()>

Begin recording operations submitted to this stream into a CUDA graph.

Call Stream::end_capture to retrieve the resulting Graph. Most operations (kernel launches, memcpys, event records) enqueued between these two calls are captured rather than executed.

pub fn end_capture(&self) -> Result<Graph>

Stop capture and return the graph of everything that was recorded.

pub fn capture<F>(&self, mode: CaptureMode, f: F) -> Result<Graph>
where F: FnOnce(&Stream) -> Result<()>,

Convenience wrapper: run f, capturing everything it submits to this stream, and return the resulting graph.

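f should enqueue its work on the stream it receives (this stream). If f returns an error mid-capture, the capture is still ended. This keeps the stream from being left in capture mode and prevents the captured state from leaking.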
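The error-path contract of `capture` can be sketched without a GPU. In the model below, `MockStream` and `Graph` are hypothetical stand-ins, and the `CaptureMode` parameter is omitted. While capturing, operations are recorded instead of executed, the "graph" is simply the recorded list, and `capture` always calls `end_capture`, even when `f` fails. This mirrors the documented behaviour; it is not baracuda's implementation.

```rust
use std::cell::RefCell;

/// Stand-in for a captured graph: the ordered list of recorded operations.
#[derive(Debug, PartialEq)]
pub struct Graph(pub Vec<String>);

#[derive(Default)]
pub struct MockStream {
    /// `Some(ops)` while capturing, `None` otherwise.
    recording: RefCell<Option<Vec<String>>>,
    /// Operations that actually "ran" because they were submitted outside capture.
    executed: RefCell<Vec<String>>,
}

impl MockStream {
    pub fn begin_capture(&self) -> Result<(), String> {
        let mut rec = self.recording.borrow_mut();
        if rec.is_some() {
            return Err("already capturing".into());
        }
        *rec = Some(Vec::new());
        Ok(())
    }

    pub fn end_capture(&self) -> Result<Graph, String> {
        self.recording
            .borrow_mut()
            .take()
            .map(Graph)
            .ok_or_else(|| "not capturing".to_string())
    }

    pub fn is_capturing(&self) -> bool {
        self.recording.borrow().is_some()
    }

    /// Submit an operation: recorded during capture, executed otherwise.
    pub fn launch(&self, op: &str) {
        match self.recording.borrow_mut().as_mut() {
            Some(ops) => ops.push(op.to_string()),
            None => self.executed.borrow_mut().push(op.to_string()),
        }
    }

    /// Run `f` under capture. Capture is ended on both the success and the error path.
    pub fn capture<F>(&self, f: F) -> Result<Graph, String>
    where
        F: FnOnce(&MockStream) -> Result<(), String>,
    {
        self.begin_capture()?;
        let result = f(self);
        // Always runs, so the stream never stays in capture mode after an error.
        let graph = self.end_capture();
        result?;
        graph
    }
}
```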

pub fn is_capturing(&self) -> Result<bool>

Returns Ok(true) if this stream is currently in capture mode.

impl Stream

pub fn new(context: &Context) -> Result<Self>

Create a new stream on context with default flags. Work on this stream synchronizes with the legacy default (NULL) stream.

pub fn non_blocking(context: &Context) -> Result<Self>

Create a non-blocking stream — work on this stream does not synchronize with the legacy null stream.

pub fn with_flags(context: &Context, flags: u32) -> Result<Self>

Create a stream with a raw flag bitmask (see CUstream_flags).
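Here is a minimal sketch of working with the raw bitmask. It assumes the two `CUstream_flags` values from the CUDA driver headers (`CU_STREAM_DEFAULT = 0x0`, `CU_STREAM_NON_BLOCKING = 0x1`). The constants are redeclared locally rather than imported from baracuda_cuda_sys, whose exact paths are not shown on this page. The hypothetical helper decodes a value such as the one returned by `Stream::flags`.

```rust
/// Values of the CUDA driver's `CUstream_flags` enum, redeclared for a self-contained sketch.
pub const CU_STREAM_DEFAULT: u32 = 0x0;
pub const CU_STREAM_NON_BLOCKING: u32 = 0x1;

/// Describe a stream-flags bitmask (for example, the value returned by `Stream::flags`).
pub fn describe_stream_flags(flags: u32) -> &'static str {
    if (flags & CU_STREAM_NON_BLOCKING) != 0 {
        "non-blocking: no implicit sync with the legacy NULL stream"
    } else {
        "default: synchronizes with the legacy NULL stream"
    }
}
```

Under these assumptions, a stream from `Stream::non_blocking` would be expected to report the non-blocking bit. A stream from `Stream::new` would report `CU_STREAM_DEFAULT`.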

pub fn with_priority(context: &Context, flags: u32, priority: i32) -> Result<Self>

Create a stream with a specific priority. Use Context::stream_priority_range to discover the legal range on this device. Numerically lower values mean higher priority.
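CUDA reports the priority range as a (least, greatest) pair, where "greatest" is the numerically lower value. The CUDA driver also clamps out-of-range requests into that range. The sketch below reproduces that arithmetic in plain Rust. The tuple shape and the example range (0, -5) are assumptions for illustration: they are not values queried from a device or the documented return type of Context::stream_priority_range.

```rust
/// Clamp a requested priority into `[greatest, least]`, mirroring how the CUDA driver
/// treats out-of-range values. `least` is the lowest priority (numerically largest) and
/// `greatest` is the highest priority (numerically smallest), so `greatest <= least`.
pub fn clamp_priority(requested: i32, least: i32, greatest: i32) -> i32 {
    requested.clamp(greatest, least)
}

/// True if priority `a` is scheduled ahead of priority `b` (the lower number wins).
pub fn outranks(a: i32, b: i32) -> bool {
    a < b
}
```

With the hypothetical range (0, -5), a request for -10 becomes -5 (the highest priority), and a request for 3 becomes 0 (the lowest).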

pub fn priority(&self) -> Result<i32>

This stream’s scheduling priority.

pub fn flags(&self) -> Result<u32>

This stream’s flags bitmask.

pub fn launch_host_func<F>(&self, f: F) -> Result<()>
where F: FnOnce() + Send + 'static,

Enqueue a host-side callback on this stream. The callback runs on a driver-owned thread after all prior stream work completes.

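The closure is boxed and freed after it runs. A panic inside it aborts the process, because there is no way to propagate a panic through the C callback. Keep the closure simple: as with any CUDA host function, it must not make CUDA API calls, and it must not synchronize on CUDA work that is not guaranteed to run before it.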
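Passing a Rust closure through a C callback usually goes through a small trampoline. The sketch below shows that pattern in isolation. It assumes a driver-style callback of shape `extern "C" fn(*mut c_void)` (the shape of CUDA's `CUhostFn`). To run without a GPU, it calls the trampoline directly instead of going through the driver. It shows why the closure must be `Send + 'static` and why it is freed after exactly one call. It is not baracuda's actual code, and `into_user_data`/`trampoline` are hypothetical names.

```rust
use std::ffi::c_void;

/// Type-erased host callback. It is boxed twice so the outer pointer is thin
/// and fits in a single `void*`.
type HostFn = Box<dyn FnOnce() + Send + 'static>;

/// Move a closure onto the heap and hand ownership to a raw `user_data` pointer.
pub fn into_user_data<F>(f: F) -> *mut c_void
where
    F: FnOnce() + Send + 'static,
{
    let outer: Box<HostFn> = Box::new(Box::new(f));
    Box::into_raw(outer).cast::<c_void>()
}

/// C-ABI shim with the shape of `CUhostFn`. It reclaims the box, runs the closure once,
/// and frees it when `f` goes out of scope. A panic cannot unwind out of an
/// `extern "C"` function; recent Rust versions abort the process instead.
pub extern "C" fn trampoline(user_data: *mut c_void) {
    // SAFETY: `user_data` was produced by `into_user_data` and is consumed exactly once.
    let outer: Box<HostFn> = unsafe { Box::from_raw(user_data.cast::<HostFn>()) };
    let f: HostFn = *outer;
    f();
}
```

Because the closure runs on a thread the driver owns, it must be `Send`. Because it may run after the submitting function has returned, it must be `'static`.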

pub fn synchronize(&self) -> Result<()>

Block the calling thread until all work previously enqueued on this stream has completed.

pub fn is_complete(&self) -> Result<bool>

Ok(true) if the stream has completed all queued work, Ok(false) if work is still outstanding.
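The difference between `is_complete` (poll) and `synchronize` (block) can be modelled with a pending-work counter and a condition variable. The hypothetical `Completion` type below is a host-side analogy only. `poll` returns immediately, like `is_complete`. `wait` parks the calling thread until the counter reaches zero, like `synchronize`.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical host-side model of a stream's outstanding-work counter.
#[derive(Default)]
pub struct Completion {
    pending: Mutex<usize>,
    done: Condvar,
}

impl Completion {
    /// Record that one more unit of work was enqueued.
    pub fn enqueue(&self) {
        *self.pending.lock().unwrap() += 1;
    }

    /// Mark one unit of work finished, and wake any waiters once the queue drains.
    pub fn finish_one(&self) {
        let mut n = self.pending.lock().unwrap();
        *n -= 1;
        if *n == 0 {
            self.done.notify_all();
        }
    }

    /// Non-blocking check, analogous to `is_complete`.
    pub fn poll(&self) -> bool {
        *self.pending.lock().unwrap() == 0
    }

    /// Blocking wait, analogous to `synchronize`.
    pub fn wait(&self) {
        let mut n = self.pending.lock().unwrap();
        while *n > 0 {
            n = self.done.wait(n).unwrap();
        }
    }
}
```

Polling lets the host overlap its own work with the device. Blocking is simpler when the host has nothing else to do.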

pub fn context(&self) -> &Context

The Context this stream lives in.

pub fn as_raw(&self) -> CUstream

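Returns the raw CUstream handle, for use with driver calls that this type does not wrap. The handle remains owned by this Stream: do not destroy it, and do not use it after the stream has been dropped.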

pub fn memcpy_dtod<T: DeviceRepr>(&self, src: &DeviceBuffer<T>, dst: &mut DeviceBuffer<T>) -> Result<()>

Device-to-device async copy scheduled on this stream.

Convenience wrapper over DeviceBuffer::copy_to_device_async, with the arguments arranged the way call sites usually want them. src is borrowed shared and the destination buffer is taken by &mut, so the borrow checker catches aliasing bugs (such as passing the same buffer as both source and destination) at compile time. src.len() must equal dst.len().

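```rust
// Assumes Context, Device, DeviceBuffer and Stream are in scope, inside a function returning Result.
let ctx = Context::new(&Device::get(0)?)?;
let stream = Stream::new(&ctx)?;
let src: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
let mut dst: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
// Both buffers hold 1024 elements, so src.len() == dst.len().
stream.memcpy_dtod(&src, &mut dst)?;
// The copy is asynchronous: wait for it before relying on dst's contents.
stream.synchronize()?;
```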
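The length precondition and the borrow shape can be shown with host slices standing in for device buffers. In this sketch, `copy_dtod` is a hypothetical host analogue. Returning an error on a length mismatch is an assumption: this page states only that the lengths must match, not how a mismatch is reported.

```rust
/// Host-side analogue of `Stream::memcpy_dtod`: `src` is shared and `dst` is exclusive,
/// so the compiler rejects `copy_dtod(&buf, &mut buf)` (a shared and a mutable borrow
/// of the same value cannot coexist).
pub fn copy_dtod<T: Copy>(src: &[T], dst: &mut [T]) -> Result<(), String> {
    if src.len() != dst.len() {
        return Err(format!("length mismatch: src {} vs dst {}", src.len(), dst.len()));
    }
    dst.copy_from_slice(src);
    Ok(())
}
```

Uncommenting a call such as `copy_dtod(&buf, &mut buf)` fails to compile with a borrow error. That is the same aliasing protection the `&mut DeviceBuffer<T>` parameter provides.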

pub fn id(&self) -> Result<u64>

Return the driver-assigned 64-bit ID for this stream. Useful for correlating CUPTI traces against baracuda streams.

pub fn copy_attributes_from(&self, src: &Stream) -> Result<()>

Copy all CUDA-managed attributes (access policy window, sync policy) from src onto self. Does not copy priority or flags (those are set at stream creation time).

pub fn wait_event(&self, event: &Event, flags: u32) -> Result<()>

Make all work submitted to this stream after this call wait until event completes. The wait is performed by the device; the calling host thread does not block. flags is typically 0 (CU_EVENT_WAIT_DEFAULT). Use this for cross-stream dependencies: record an event on stream A, then have stream B wait on it.
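The record-then-wait pattern can be modelled on the host with two threads standing in for stream A and stream B, and a one-shot flag standing in for the event. The hypothetical `HostEvent` below is only an analogy for `Event` plus `Stream::wait_event`. Unlike this model, a real `wait_event` does not block a host thread; it holds back stream B's queued work on the device.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical one-shot event: `record` marks it complete, and `wait` blocks until then.
#[derive(Default)]
pub struct HostEvent {
    fired: Mutex<bool>,
    cv: Condvar,
}

impl HostEvent {
    /// Mark the event complete and wake every waiter.
    pub fn record(&self) {
        *self.fired.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Block until `record` has been called.
    pub fn wait(&self) {
        let mut fired = self.fired.lock().unwrap();
        while !*fired {
            fired = self.cv.wait(fired).unwrap();
        }
    }
}
```

In the usage below, "stream A" writes a value and then records the event. "Stream B" waits on the event before reading, so it is guaranteed to see A's write.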

pub unsafe fn get_attribute(&self, attr: i32, value_out: *mut c_void) -> Result<()>

Read the CUstreamAttrValue for attr from this stream into value_out. The caller passes a writable buffer large enough for the attribute's value. Allocating a whole CUstreamAttrValue (size_of::<CUstreamAttrValue>() bytes) is always sufficient. Use the CU_STREAM_ATTRIBUTE_* constants for attr.

Safety

value_out must be a writable region matching the layout of the CUstreamAttrValue variant for attr.
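The calling convention above is the usual C out-parameter pattern. Reserve uninitialised storage of the right type, pass a `*mut c_void` to it, and read it only after the call succeeds. The sketch below shows that pattern with a hypothetical `#[repr(C)]` union and a mock `mock_get_attribute` function standing in for the driver. Neither matches the real `CUstreamAttrValue` layout or attribute IDs.

```rust
use std::ffi::c_void;
use std::mem::MaybeUninit;

/// Hypothetical two-variant attribute value; NOT the real CUstreamAttrValue layout.
#[repr(C)]
#[derive(Clone, Copy)]
pub union AttrValue {
    pub sync_policy: u32,
    pub pad: [u8; 64],
}

/// Hypothetical attribute ID used only by this sketch.
pub const ATTR_SYNC_POLICY: i32 = 1;

/// Mock of a driver call that fills `value_out` for `attr`.
///
/// # Safety
/// `value_out` must be valid for writes of an `AttrValue`.
pub unsafe fn mock_get_attribute(attr: i32, value_out: *mut c_void) -> Result<(), String> {
    if attr != ATTR_SYNC_POLICY {
        return Err(format!("unsupported attribute {attr}"));
    }
    // Build the whole union first so every byte written to the caller is initialised.
    let mut v = AttrValue { pad: [0; 64] };
    v.sync_policy = 2;
    unsafe { value_out.cast::<AttrValue>().write(v) };
    Ok(())
}

/// Safe wrapper: reserve uninitialised storage, let the callee fill it, then read it.
pub fn read_sync_policy() -> Result<u32, String> {
    let mut slot = MaybeUninit::<AttrValue>::uninit();
    // SAFETY: `slot` is correctly sized and aligned for an AttrValue, and it is only
    // read after the callee reported success.
    unsafe {
        mock_get_attribute(ATTR_SYNC_POLICY, slot.as_mut_ptr().cast::<c_void>())?;
        Ok(slot.assume_init().sync_policy)
    }
}
```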

pub unsafe fn set_attribute(&self, attr: i32, value: *const c_void) -> Result<()>

Set a CUstreamAttrValue on this stream. See Self::get_attribute for the value layout.

Safety

value must point at a properly-initialized CUstreamAttrValue variant for attr.

pub fn attach_mem_async(&self, dptr: CUdeviceptr, length: usize, flags: u32) -> Result<()>

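Associate a managed-memory region with this stream, asynchronously. flags must be one of the CUmemAttach_flags values: CU_MEM_ATTACH_GLOBAL, CU_MEM_ATTACH_HOST, or CU_MEM_ATTACH_SINGLE. CU_MEM_ATTACH_SINGLE makes the memory accessible only from this stream, and it is the default in the CUDA runtime API.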

pub fn write_value_32(&self, addr: CUdeviceptr, value: u32, flags: u32) -> Result<()>

Enqueue a 32-bit write of value to the device address addr. Like any other stream operation, the write is ordered with respect to the other work on this stream.

flags is a bitmask of baracuda_cuda_sys::types::CUstreamWriteValue_flags.

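pub fn write_value_64(&self, addr: CUdeviceptr, value: u64, flags: u32) -> Result<()>

64-bit counterpart of Stream::write_value_32: enqueues a 64-bit write of value to addr, with flags interpreted the same way.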

pub fn wait_value_32(&self, addr: CUdeviceptr, value: u32, flags: u32) -> Result<()>

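Make the stream wait until the 32-bit word at device address addr, compared against value, satisfies the condition selected by flags. See baracuda_cuda_sys::types::CUstreamWaitValue_flags: GEQ, EQ, AND, or NOR, optionally OR'd with FLUSH. The wait is enqueued like any other stream operation; the host thread does not block.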
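The four wait conditions are easy to misread, especially the cyclic GEQ comparison. Below is a plain-Rust restatement of the 32-bit predicates as the CUDA driver documents them. The `CUstreamWaitValue_flags` values are redeclared locally (GEQ = 0x0, EQ = 0x1, AND = 0x2, NOR = 0x3, FLUSH = 1 << 30) rather than imported from baracuda_cuda_sys. The hypothetical `wait_satisfied_32` evaluates the condition once; the real operation keeps the stream waiting until the condition becomes true.

```rust
/// `CUstreamWaitValue_flags` values from the CUDA driver API, redeclared for this sketch.
pub const WAIT_GEQ: u32 = 0x0;
pub const WAIT_EQ: u32 = 0x1;
pub const WAIT_AND: u32 = 0x2;
pub const WAIT_NOR: u32 = 0x3;
pub const WAIT_FLUSH: u32 = 1 << 30;

/// Would a 32-bit wait with this `value` and `flags` be satisfied when the word at
/// `addr` currently holds `mem`? Returns `None` for an unknown condition.
pub fn wait_satisfied_32(mem: u32, value: u32, flags: u32) -> Option<bool> {
    // FLUSH requests a flush of outstanding remote writes after the wait;
    // it does not change the comparison, so it is masked off here.
    match flags & !WAIT_FLUSH {
        // Cyclic comparison, (int32_t)(*addr - value) >= 0, which tolerates counter wraparound.
        WAIT_GEQ => Some((mem.wrapping_sub(value) as i32) >= 0),
        WAIT_EQ => Some(mem == value),
        WAIT_AND => Some((mem & value) != 0),
        WAIT_NOR => Some(!(mem | value) != 0),
        _ => None,
    }
}
```

The GEQ case is what makes monotonically increasing counters safe to wait on: a counter that has wrapped from 0xFFFF_FFFF to 1 still compares as "at or past" 0xFFFF_FFFF.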

pub fn wait_value_64(&self, addr: CUdeviceptr, value: u64, flags: u32) -> Result<()>

64-bit counterpart of Stream::wait_value_32: the condition is evaluated on the 64-bit word at addr.

pub fn batch_mem_op(&self, ops: &mut [CUstreamBatchMemOpParams], flags: u32) -> Result<()>

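Submit a batch of wait-value and write-value operations to this stream in a single call. The operations are enqueued in array order. ops is typically a small slice built via baracuda_cuda_sys::types::CUstreamBatchMemOpParams::wait_value_32 and similar constructors. flags is reserved by CUDA and must currently be 0.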
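A batch behaves like its operations submitted back to back on the stream, in array order. The sketch below models that on a host `Vec<u32>` with a hypothetical `MemOp` enum; it is not the real `CUstreamBatchMemOpParams` union. Execution stops at the first unsatisfied wait, which is where a real stream would stall until another agent updates the memory.

```rust
/// Hypothetical host-side stand-in for one entry of a batch.
#[derive(Clone, Copy, Debug)]
pub enum MemOp {
    Write32 { addr: usize, value: u32 },
    WaitEq32 { addr: usize, value: u32 },
}

/// Execute `ops` in array order against `mem` and return how many ops completed.
/// A result shorter than `ops.len()` means the batch is stalled on a wait.
pub fn run_batch(mem: &mut [u32], ops: &[MemOp]) -> usize {
    for (i, op) in ops.iter().enumerate() {
        match *op {
            MemOp::Write32 { addr, value } => mem[addr] = value,
            MemOp::WaitEq32 { addr, value } => {
                if mem[addr] != value {
                    // A real stream would block here until the value changes.
                    return i;
                }
            }
        }
    }
    ops.len()
}
```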

pub fn capture_info(&self) -> Result<(bool, u64, CUgraph)>

Query stream-capture state. Returns (active, capture_id, graph_handle) where active is true if the stream is currently capturing. The graph handle is only meaningful while capturing.

Trait Implementations

impl Clone for Stream

fn clone(&self) -> Stream

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Stream

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.