Struct DeviceBuffer

pub struct DeviceBuffer<T: DeviceRepr> { /* private fields */ }

Owned, typed allocation of device memory.

The underlying memory is freed when the buffer is dropped. Clone and Copy are deliberately not implemented: duplicating a buffer means copying byte_size() bytes of device memory, which is not free, so baracuda makes you spell it out as an explicit D2D memcpy (copy_to_device, or the stream-ordered copy_to_device_async).

Implementations

impl<T: DeviceRepr> DeviceBuffer<T>

pub fn new(context: &Context, len: usize) -> Result<Self>

Allocate an uninitialized buffer of len elements on the given context’s device.

len == 0 (or a zero-sized T) short-circuits: CUDA rejects 0-byte allocations with CUDA_ERROR_INVALID_VALUE, so we produce a sentinel null-pointer buffer. Drop knows to skip the free on such buffers, and every copy method below treats len == 0 as a no-op.
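
The short-circuit can be pictured with a small host-only model. This is a sketch, not baracuda's code: SentinelBuffer, ALLOCS and FREES are hypothetical names, and the atomic counters stand in for calls to cuMemAlloc and cuMemFree. A zero byte count (from len == 0 or a zero-sized T) yields a null pointer, and Drop frees only non-null pointers.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for cuMemAlloc / cuMemFree: they only count calls,
// so the short-circuit logic can be exercised without a GPU.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static FREES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical model of a typed device buffer; `ptr == 0` is the sentinel
/// meaning "nothing was allocated".
pub struct SentinelBuffer<T> {
    ptr: u64,
    len: usize,
    _marker: PhantomData<T>,
}

impl<T> SentinelBuffer<T> {
    pub fn new(len: usize) -> Self {
        let bytes = len * std::mem::size_of::<T>();
        if bytes == 0 {
            // CUDA rejects 0-byte allocations, so skip the driver call entirely.
            return SentinelBuffer { ptr: 0, len, _marker: PhantomData };
        }
        ALLOCS.fetch_add(1, Ordering::SeqCst);
        // Fake non-null device address standing in for a real allocation.
        SentinelBuffer { ptr: 0x1000, len, _marker: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl<T> Drop for SentinelBuffer<T> {
    fn drop(&mut self) {
        // Only real allocations are handed back to the (mock) driver.
        if self.ptr != 0 {
            FREES.fetch_add(1, Ordering::SeqCst);
        }
    }
}
```

Because the byte count, not len, drives the decision, a buffer of 1000 zero-sized elements also takes the sentinel path while still reporting len() == 1000.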

pub fn new_async(context: &Context, len: usize, stream: &Stream) -> Result<Self>

Allocate len elements asynchronously on stream using the device’s default memory pool. Requires CUDA 11.2+.

Unlike new, this call does not block the host: the allocation is usable by any operation submitted to stream afterwards, in stream order. Reclaim it with free_async on the same stream, or let Drop free it synchronously.

pub fn free_async(self, stream: &Stream) -> Result<()>

Free self asynchronously on stream. Work submitted to stream before this call may still use the memory; anything ordered after the free on stream must not. Consumes self, so Drop does not try to free the memory a second time.

Requires CUDA 11.2+.
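
Taking self by value is what lets free_async hand the memory to the stream without a double free. A minimal sketch of that ownership pattern, assuming hypothetical MockStream and Owned types that only log what would happen; the real method enqueues a driver call rather than pushing a string.

```rust
use std::cell::RefCell;
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for a CUDA stream: it only records enqueued work.
pub struct MockStream {
    pub queue: RefCell<Vec<String>>,
}

/// Hypothetical owning buffer; `sync_frees` records frees performed by Drop.
pub struct Owned<'a> {
    ptr: u64,
    sync_frees: &'a RefCell<Vec<u64>>,
}

impl<'a> Owned<'a> {
    pub fn new(ptr: u64, sync_frees: &'a RefCell<Vec<u64>>) -> Self {
        Owned { ptr, sync_frees }
    }

    /// Consumes `self`: after the move the caller can no longer use the
    /// buffer, and ManuallyDrop keeps Drop from freeing the pointer again.
    pub fn free_async(self, stream: &MockStream) {
        let this = ManuallyDrop::new(self);
        stream.queue.borrow_mut().push(format!("free {:#x}", this.ptr));
    }
}

impl Drop for Owned<'_> {
    fn drop(&mut self) {
        // Synchronous free path, taken only when free_async was not called.
        self.sync_frees.borrow_mut().push(self.ptr);
    }
}
```

std::mem::forget(self) after reading the pointer would work equally well; ManuallyDrop just makes the "destructor intentionally skipped" intent explicit.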

pub fn zeros(context: &Context, len: usize) -> Result<Self>

Allocate and fill with zero bytes. When len == 0 no cuMemsetD8 call is issued and the sentinel empty buffer described under new is returned.

pub fn zero(&self) -> Result<()>

Synchronously fill this buffer with zero bytes via cuMemsetD8. Empty buffers are a no-op (no FFI call). Use it to reset an existing allocation to zero without paying for a second allocation.

pub fn zero_async(&self, stream: &Stream) -> Result<()>

Stream-ordered zero-fill via cuMemsetD8Async. Empty buffers are a no-op. The fill is ordered with respect to other work submitted to stream; synchronize the stream before reading from the host.
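
Stream ordering, which also governs new_async and free_async, can be modelled as a FIFO queue that the host drains when it synchronizes. The sketch below is a hypothetical host-only model (ModelStream and its methods are illustrative, not baracuda API): queued work leaves memory untouched until synchronize runs, and then executes in submission order.

```rust
/// Hypothetical model of a CUDA stream: submitted work is queued and only
/// executes, in submission order, when the stream is synchronized.
pub struct ModelStream {
    pending: Vec<Box<dyn FnOnce(&mut [u8])>>,
}

impl ModelStream {
    pub fn new() -> Self {
        ModelStream { pending: Vec::new() }
    }

    /// Enqueue a byte fill (the role cuMemsetD8Async plays) and return at once.
    pub fn memset_async(&mut self, value: u8) {
        self.pending.push(Box::new(move |mem: &mut [u8]| mem.fill(value)));
    }

    /// Enqueue a single-byte write, ordered after all earlier work.
    pub fn write_async(&mut self, index: usize, value: u8) {
        self.pending.push(Box::new(move |mem: &mut [u8]| mem[index] = value));
    }

    /// Drain the queue in FIFO order, like waiting for the stream to finish.
    pub fn synchronize(&mut self, mem: &mut [u8]) {
        for op in self.pending.drain(..) {
            op(&mut *mem);
        }
    }
}
```

Reading the memory before synchronize returns the old contents, which is exactly the hazard the note above warns about.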

pub fn from_slice(context: &Context, src: &[T]) -> Result<Self>

Allocate and copy src synchronously from host memory. Empty slices produce a sentinel zero-length buffer (no CUDA calls).

pub fn copy_from_host(&self, src: &[T]) -> Result<()>

Synchronous H2D copy. src.len() must equal self.len(). No-op when the buffer is empty — no cuMemcpy is issued.

pub fn copy_to_host(&self, dst: &mut [T]) -> Result<()>

Synchronous D2H copy. dst.len() must equal self.len(). No-op on empty buffers.

pub fn copy_from_host_async(&self, src: &[T], stream: &Stream) -> Result<()>

Asynchronous H2D copy on stream. No-op on empty buffers.

pub fn copy_to_host_async(&self, dst: &mut [T], stream: &Stream) -> Result<()>

Asynchronous D2H copy on stream. No-op on empty buffers. The copied data is only guaranteed to be in dst after stream has been synchronized.

pub fn copy_to_device(&self, dst: &DeviceBuffer<T>) -> Result<()>

Device-to-device copy into another buffer of the same length. No-op on empty buffers.

pub fn copy_to_device_async(&self, dst: &DeviceBuffer<T>, stream: &Stream) -> Result<()>

Asynchronous device-to-device copy on stream. No-op on empty buffers.
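
The copy methods share two rules stated above: the lengths must match, and empty buffers issue no driver call. A sketch of that decision logic, with hypothetical plan_copy, CopyPlan and CopyError names. Two details are assumptions rather than documented behavior: a mismatch is reported as an error (the text does not say whether baracuda returns an error or panics), and the length check runs before the empty-buffer check. Skipping on a zero byte count also covers zero-sized T.

```rust
use std::mem::size_of;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum CopyError {
    LengthMismatch { buffer: usize, other: usize },
}

/// What a copy method would do after validating its arguments.
#[derive(Debug, PartialEq)]
pub enum CopyPlan {
    /// Nothing to move: no memcpy is issued.
    Skip,
    /// Issue a single memcpy of `bytes` bytes.
    Copy { bytes: usize },
}

/// `buffer_len` plays the role of self.len(); `other_len` is the length of
/// the host slice or the peer device buffer.
pub fn plan_copy<T>(buffer_len: usize, other_len: usize) -> Result<CopyPlan, CopyError> {
    // Assumption: lengths are validated before the empty-buffer check.
    if buffer_len != other_len {
        return Err(CopyError::LengthMismatch { buffer: buffer_len, other: other_len });
    }
    let bytes = buffer_len * size_of::<T>();
    if bytes == 0 {
        // Empty buffer (or zero-sized T): the sentinel null pointer is never touched.
        return Ok(CopyPlan::Skip);
    }
    Ok(CopyPlan::Copy { bytes })
}
```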

pub fn len(&self) -> usize

Number of elements in the buffer.

pub fn byte_size(&self) -> usize

Size of the buffer in bytes, i.e. len() * size_of::<T>().

pub fn is_empty(&self) -> bool

true if the buffer has zero elements.

pub fn context(&self) -> &Context

The Context this buffer was allocated in.

pub fn as_raw(&self) -> CUdeviceptr

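Raw device pointer. Use with care: baracuda still owns the allocation and frees it when the buffer drops, so never free the pointer yourself or use it after the buffer is gone. Empty buffers return the null sentinel.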

pub fn as_slice(&self) -> DeviceSlice<'_, T>

Borrow the whole buffer as a DeviceSlice<'_, T>.

pub fn as_slice_mut(&mut self) -> DeviceSliceMut<'_, T>

Borrow the whole buffer as a DeviceSliceMut<'_, T>.

pub fn slice(&self, range: Range<usize>) -> DeviceSlice<'_, T>

Borrow a sub-range of the buffer as an immutable DeviceSlice.

Panics if the range is out of bounds or inverted. The range is in elements, not bytes: the slice starts at byte offset range.start * size_of::<T>().

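```rust
// Runs inside a function that returns Result, so `?` can propagate errors.
let ctx = Context::new(&Device::get(0)?)?;
let buf: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
let first_half = buf.slice(0..512); // elements 0..512, i.e. bytes 0..2048
let tail = buf.slice(512..1024); // starts at byte offset 512 * 4 = 2048
```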

pub fn slice_mut(&mut self, range: Range<usize>) -> DeviceSliceMut<'_, T>

Mutable counterpart of slice.
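
The bounds rules and element-to-byte conversion shared by slice and slice_mut come down to a few lines of arithmetic. In this sketch, slice_bytes is a hypothetical helper, not part of baracuda; it panics on the same conditions listed for slice and returns the byte offset and byte length of the view. For the full range 0..len its byte length is len * size_of::<T>().

```rust
use std::mem::size_of;
use std::ops::Range;

/// Hypothetical helper: map an element range of a buffer holding `len`
/// elements of `T` to `(byte_offset, byte_len)`.
pub fn slice_bytes<T>(len: usize, range: Range<usize>) -> (usize, usize) {
    // Same panic conditions as slice/slice_mut: inverted or out-of-bounds ranges.
    assert!(range.start <= range.end, "inverted range {:?}", range);
    assert!(range.end <= len, "range {:?} out of bounds for length {}", range, len);
    let offset = range.start * size_of::<T>();
    let byte_len = (range.end - range.start) * size_of::<T>();
    (offset, byte_len)
}
```

With the f32 example above, 0..512 maps to (0, 2048) and 512..1024 to (2048, 2048).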

impl DeviceBuffer<u8>

pub fn view_as<U: DeviceRepr>(&self) -> DeviceSlice<'_, U>

Reinterpret the byte buffer as an immutable typed DeviceSlice<'_, U>.

This is the recommended primitive for layering safe, typed APIs over byte-shaped storage: for example, a unified-binding table that stores every device tensor as a DeviceBuffer<u8> and only commits to an element type at the edges, where it calls into typed CUDA libraries.

Alignment is guaranteed: cuMemAlloc returns 256-byte-aligned pointers, which satisfies any U: DeviceRepr we ship today and any reasonable user type.
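
The alignment argument reduces to divisibility: alignments are powers of two, so a 256-byte-aligned address is aligned for every U with align_of::<U>() <= 256. A host-only sketch of that check; is_aligned_for, always_aligned_on_device and Overaligned are hypothetical names, and the 256 constant is the figure quoted above.

```rust
use std::mem::align_of;

/// Alignment the text above states for cuMemAlloc-returned pointers.
pub const DEVICE_ALLOC_ALIGN: u64 = 256;

/// True if `addr` is suitably aligned to be read as a `U`.
pub fn is_aligned_for<U>(addr: u64) -> bool {
    addr % align_of::<U>() as u64 == 0
}

/// True if every 256-byte-aligned base address is also aligned for `U`.
/// Because alignments are powers of two, this holds exactly when
/// align_of::<U>() <= 256.
pub fn always_aligned_on_device<U>() -> bool {
    DEVICE_ALLOC_ALIGN % align_of::<U>() as u64 == 0
}

/// A user type with a deliberately unusual alignment requirement.
#[repr(C, align(512))]
pub struct Overaligned([u8; 512]);
```

A type declared with #[repr(align(512))] is the kind of unusual user type the 256-byte guarantee does not cover.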