Struct DeviceBuffer 

pub struct DeviceBuffer<T: DeviceRepr> { /* private fields */ }

Owned, typed allocation of device memory.

The underlying device memory is freed when the buffer is dropped. Clone and Copy are deliberately not implemented: duplicating a buffer means copying byte_size() bytes of device memory, which is not free, so baracuda makes the caller spell it out as an explicit stream-ordered D2D memcpy.

Implementations

impl<T: DeviceRepr> DeviceBuffer<T>

pub fn new(context: &Context, len: usize) -> Result<Self>

Allocate an uninitialized buffer of len elements on the given context’s device.

If len == 0 or T is zero-sized, new short-circuits: CUDA rejects 0-byte allocations with CUDA_ERROR_INVALID_VALUE, so new returns a sentinel buffer holding a null device pointer. Drop skips the free for such buffers, and every copy method below treats len == 0 as a no-op.
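
The short-circuit rule can be sketched as a host-only model in plain Rust. This is not baracuda's implementation: `ModelBuffer`, `alloc_bytes`, `DevPtr`, and the free counter are hypothetical stand-ins for DeviceBuffer, cuMemAlloc, CUdeviceptr, and cuMemFree, used only to show the decision logic (a zero byte size yields a null sentinel that Drop never frees). The overflow check is also an assumption.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical stand-in for CUdeviceptr; 0 plays the role of the null sentinel.
type DevPtr = u64;

thread_local! {
    // Counts calls to the hypothetical free routine so the behaviour is observable.
    static FREES: Cell<usize> = Cell::new(0);
}

// Hypothetical stand-in for cuMemAlloc; the caller never passes 0 bytes.
fn alloc_bytes(bytes: usize) -> DevPtr {
    assert!(bytes > 0, "CUDA rejects 0-byte allocations");
    0x1000 // pretend device address
}

struct ModelBuffer<T> {
    ptr: DevPtr,
    len: usize,
    _marker: PhantomData<T>,
}

impl<T> ModelBuffer<T> {
    fn new(len: usize) -> Self {
        // Overflow handling here is an assumption, not documented baracuda behaviour.
        let bytes = len
            .checked_mul(size_of::<T>())
            .expect("allocation size overflows usize");
        // len == 0 or zero-sized T: skip the allocator and keep a null sentinel.
        let ptr = if bytes == 0 { 0 } else { alloc_bytes(bytes) };
        ModelBuffer { ptr, len, _marker: PhantomData }
    }

    fn len(&self) -> usize {
        self.len
    }

    fn as_raw(&self) -> DevPtr {
        self.ptr
    }
}

impl<T> Drop for ModelBuffer<T> {
    fn drop(&mut self) {
        // Sentinel buffers were never allocated, so there is nothing to free.
        if self.ptr != 0 {
            FREES.with(|f| f.set(f.get() + 1)); // stands in for cuMemFree
        }
    }
}

fn frees() -> usize {
    FREES.with(|f| f.get())
}
```

Note that a zero-sized T keeps its logical len while still holding the null sentinel, which is why the model decides on the byte count rather than on len alone.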

pub fn new_async(context: &Context, len: usize, stream: &Stream) -> Result<Self>

Allocate len elements asynchronously on stream using the device’s default memory pool. Requires CUDA 11.2+.

Unlike new, this call does not block the host: the allocation is usable by any operation enqueued on stream after it, in stream order. Use free_async to reclaim it on the same stream, or let Drop reclaim it synchronously.

pub fn free_async(self, stream: &Stream) -> Result<()>

Free self asynchronously on stream. Work ordered after this call on stream must no longer access the memory, because the pool may hand it out again. Since free_async consumes self, Drop does not free the allocation a second time.

Requires CUDA 11.2+.
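
The "consume self so Drop does not also free" pattern can be illustrated with a small host-only model. `Handle`, the log, and the string tags are hypothetical; the real type issues cuMemFreeAsync or cuMemFree where this sketch only records which path ran. `ManuallyDrop` is one standard way to suppress the destructor; `std::mem::forget` is another.

```rust
use std::cell::RefCell;
use std::mem::ManuallyDrop;

thread_local! {
    // Records which hypothetical free path ran.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

// Hypothetical owned handle; `ptr` stands in for a CUdeviceptr, 0 for the null sentinel.
struct Handle {
    ptr: u64,
}

impl Handle {
    // Taking `self` by value means the caller cannot use the handle afterwards,
    // and wrapping it in ManuallyDrop stops Drop from running as well.
    fn free_async(self) {
        let this = ManuallyDrop::new(self);
        if this.ptr != 0 {
            LOG.with(|l| l.borrow_mut().push("free_async"));
        }
    }
}

impl Drop for Handle {
    // The synchronous fallback path, used only when free_async was not called.
    fn drop(&mut self) {
        if self.ptr != 0 {
            LOG.with(|l| l.borrow_mut().push("drop"));
        }
    }
}

fn log() -> Vec<&'static str> {
    LOG.with(|l| l.borrow().clone())
}
```

Each allocation is released exactly once: either through free_async or through Drop, never both.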

pub fn zeros(context: &Context, len: usize) -> Result<Self>

Allocate len elements and fill them with zero bytes. A zero-length buffer skips the fill entirely (no cuMemsetD8 call is issued).

pub fn zero(&self) -> Result<()>

Synchronously fill this buffer with zero bytes via cuMemsetD8. Empty buffers are a no-op (no FFI call). Use this to reuse an existing allocation when you want zeroed contents without paying the allocation cost a second time.

pub fn zero_async(&self, stream: &Stream) -> Result<()>

Stream-ordered zero-fill via cuMemsetD8Async. Empty buffers are a no-op. The fill is ordered with respect to other work submitted to stream, so later work on stream sees the zeroed contents; synchronize stream before relying on them from the host.

pub fn from_slice(context: &Context, src: &[T]) -> Result<Self>

Allocate and copy src synchronously from host memory. Empty slices produce a sentinel zero-length buffer (no CUDA calls).

pub fn copy_from_host(&self, src: &[T]) -> Result<()>

Synchronous H2D copy. src.len() must equal self.len(). No-op when the buffer is empty: no cuMemcpy call is issued.

pub fn copy_to_host(&self, dst: &mut [T]) -> Result<()>

Synchronous D2H copy. dst.len() must equal self.len(). No-op on empty buffers.
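
The length and empty-buffer rules shared by copy_from_host and copy_to_host can be captured in a small pure-Rust helper. `plan_copy` and `LenMismatch` are hypothetical names; the docs say the lengths must match but not whether a mismatch is reported as an error or a panic, so this sketch assumes an error and assumes the length check runs before the empty-buffer short-circuit.

```rust
use std::mem::size_of;

// Hypothetical error type for a buffer/host length mismatch.
#[derive(Debug, PartialEq)]
struct LenMismatch {
    buffer: usize,
    host: usize,
}

/// Decide what a host<->device copy should do before any driver call.
/// Ok(None): nothing to transfer, so no cuMemcpy is issued.
/// Ok(Some(bytes)): the byte count the copy would move.
fn plan_copy<T>(buffer_len: usize, host: &[T]) -> Result<Option<usize>, LenMismatch> {
    // Assumption: lengths are validated first.
    if host.len() != buffer_len {
        return Err(LenMismatch { buffer: buffer_len, host: host.len() });
    }
    let bytes = buffer_len * size_of::<T>();
    if bytes == 0 {
        // Empty buffer (or zero-sized T): treat the copy as a no-op.
        return Ok(None);
    }
    Ok(Some(bytes))
}
```

For example, a three-element f32 slice against a three-element buffer plans a 12-byte transfer, while an empty buffer with an empty slice plans no driver call at all.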

pub fn copy_from_host_async(&self, src: &[T], stream: &Stream) -> Result<()>

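Asynchronous H2D copy on stream. No-op on empty buffers. Keep src unmodified until the copy has completed in stream order, for example until stream has been synchronized.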

pub fn copy_to_host_async(&self, dst: &mut [T], stream: &Stream) -> Result<()>

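Asynchronous D2H copy on stream. No-op on empty buffers. dst is only guaranteed to hold the copied data once the copy has completed, so synchronize stream before reading it.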

pub fn copy_to_device(&self, dst: &DeviceBuffer<T>) -> Result<()>

Device-to-device copy from self into dst, which must have the same length. No-op on empty buffers.

pub fn copy_to_device_async(&self, dst: &DeviceBuffer<T>, stream: &Stream) -> Result<()>

Asynchronous device-to-device copy on stream. No-op on empty buffers.

pub fn len(&self) -> usize

Number of elements in the buffer.

pub fn byte_size(&self) -> usize

Size of the buffer in bytes: len() * size_of::<T>().

pub fn is_empty(&self) -> bool

true if the buffer has zero elements.

pub fn context(&self) -> &Context

The Context this buffer was allocated in.

pub fn as_raw(&self) -> CUdeviceptr

Raw device pointer. Use with care: baracuda still owns the allocation, so do not free the pointer yourself or use it after the buffer has been dropped.

pub fn as_slice(&self) -> DeviceSlice<'_, T>

Borrow the whole buffer as a DeviceSlice<'_, T>.

pub fn as_slice_mut(&mut self) -> DeviceSliceMut<'_, T>

Borrow the whole buffer as a DeviceSliceMut<'_, T>.

pub fn slice(&self, range: Range<usize>) -> DeviceSlice<'_, T>

Borrow a sub-range of the buffer as an immutable DeviceSlice.

Panics if the range is out of bounds or inverted (start > end). Indices are in elements, not bytes: the byte offset of the slice is range.start * size_of::<T>().

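```rust
// `?` requires an enclosing function that returns a compatible Result.
let ctx = Context::new(&Device::get(0)?)?;
let buf: DeviceBuffer<f32> = DeviceBuffer::zeros(&ctx, 1024)?;
let first_half = buf.slice(0..512); // elements 0..512, byte offset 0
let tail = buf.slice(512..1024);    // byte offset 512 * size_of::<f32>() = 2048
```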
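
The element-to-byte arithmetic and the two panic conditions described for slice can be written out as a small pure-Rust helper. `slice_bytes` is a hypothetical name that only mirrors the documented rules; it is not part of baracuda.

```rust
use std::mem::size_of;
use std::ops::Range;

/// Byte offset and byte length of `range` within a buffer of `len` elements of T.
/// Panics, as `slice` is documented to, if the range is inverted or out of bounds.
fn slice_bytes<T>(len: usize, range: Range<usize>) -> (usize, usize) {
    assert!(range.start <= range.end, "inverted range {:?}", range);
    assert!(
        range.end <= len,
        "range end {} out of bounds for length {}",
        range.end,
        len
    );
    // Indices are elements; multiply by the element size to get bytes.
    let offset = range.start * size_of::<T>();
    let bytes = (range.end - range.start) * size_of::<T>();
    (offset, bytes)
}
```

For the 1024-element f32 buffer in the example, both halves span 2048 bytes, and the tail starts 2048 bytes past the base pointer.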

pub fn slice_mut(&mut self, range: Range<usize>) -> DeviceSliceMut<'_, T>

Mutable counterpart of slice.

impl DeviceBuffer<u8>

pub fn view_as<U: DeviceRepr>(&self) -> DeviceSlice<'_, U>

Reinterpret the byte buffer as an immutable typed DeviceSlice<'_, U>.

This is the recommended primitive for layering safe typed APIs over byte-shaped storage: for example, a unified binding table that stores every device tensor as a DeviceBuffer<u8> and only assigns element types at the edges, where it calls into typed CUDA libraries.

Alignment is guaranteed: cuMemAlloc returns pointers aligned to at least 256 bytes, which satisfies every U: DeviceRepr shipped today and any user type whose alignment is at most 256 bytes.

Panics

Panics if the buffer’s byte length isn’t an integer multiple of size_of::<U>(). Zero-sized U produces a zero-length view.
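
The length rules for view_as reduce to a divisibility check. `view_len` and `DEVICE_ALLOC_ALIGN` are hypothetical names used for this sketch; baracuda's own check may be structured differently.

```rust
use std::mem::{align_of, size_of};

// The minimum alignment of device allocations quoted above (256 bytes).
const DEVICE_ALLOC_ALIGN: usize = 256;

/// Number of U elements visible when reinterpreting `byte_len` bytes.
/// Zero-sized U gives an empty view; a byte length that is not a
/// multiple of size_of::<U>() panics.
fn view_len<U>(byte_len: usize) -> usize {
    let size = size_of::<U>();
    if size == 0 {
        return 0;
    }
    assert!(
        byte_len % size == 0,
        "{} bytes is not a multiple of size_of::<U>() = {}",
        byte_len,
        size
    );
    // Sanity check: U's alignment must divide the base pointer's alignment.
    debug_assert!(DEVICE_ALLOC_ALIGN % align_of::<U>() == 0);
    byte_len / size
}
```

A 4096-byte buffer therefore views as 1024 f32 values or 512 u64 values, while a 12-byte buffer cannot be viewed as u64.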

pub fn view_as_mut<U: DeviceRepr>(&mut self) -> DeviceSliceMut<'_, U>

Mutable counterpart of view_as.

Trait Implementations

impl<T: DeviceRepr> Debug for DeviceBuffer<T>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<T: DeviceRepr> DevicePtr<T> for DeviceBuffer<T>

fn device_ptr(&self) -> CUdeviceptr

Raw device pointer to element 0.

fn len(&self) -> usize

Number of T elements visible through this pointer.

fn is_empty(&self) -> bool

true if len is 0.

fn byte_size(&self) -> usize

Size in bytes (len * size_of::<T>()).
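
Implementing DevicePtr for DeviceBuffer lets generic code accept any type that exposes a device pointer and an element count. The sketch below models that idea with a hypothetical trait, `DevicePtrLike`, and a hypothetical `View` type; CUdeviceptr is modelled as u64, and whether the real trait supplies is_empty and byte_size as default methods is an assumption.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for DevicePtr<T>. Only device_ptr and len are
/// required here; is_empty and byte_size are derived default methods.
trait DevicePtrLike<T> {
    fn device_ptr(&self) -> u64;
    fn len(&self) -> usize;
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
    fn byte_size(&self) -> usize {
        self.len() * size_of::<T>()
    }
}

/// Hypothetical borrowed view over f32 device memory.
struct View {
    base: u64,
    len: usize,
}

impl DevicePtrLike<f32> for View {
    fn device_ptr(&self) -> u64 {
        self.base
    }
    fn len(&self) -> usize {
        self.len
    }
}

/// Generic code only needs the trait, not the concrete buffer type.
fn describe<T, P: DevicePtrLike<T>>(p: &P) -> (u64, usize) {
    (p.device_ptr(), p.byte_size())
}
```

Deriving byte_size from len keeps the two values consistent by construction, matching the len * size_of::<T>() definition above.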

impl<T: DeviceRepr> DevicePtrMut<T> for DeviceBuffer<T>

fn device_ptr_mut(&mut self) -> CUdeviceptr

Raw mutable device pointer.

impl<T: DeviceRepr> Drop for DeviceBuffer<T>

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬 This is a nightly-only experimental API (pin_ergonomics).
Executes the destructor for this type; unlike Drop::drop, it requires self to be pinned.

impl<T: DeviceRepr> KernelArg for &DeviceBuffer<T>

impl<T: DeviceRepr> KernelArg for &mut DeviceBuffer<T>

impl<T: DeviceRepr + Send> Send for DeviceBuffer<T>

Auto Trait Implementations

impl<T> Freeze for DeviceBuffer<T>

impl<T> RefUnwindSafe for DeviceBuffer<T>
where T: RefUnwindSafe,

impl<T> Sync for DeviceBuffer<T>
where T: Sync,

impl<T> Unpin for DeviceBuffer<T>
where T: Unpin,

impl<T> UnsafeUnpin for DeviceBuffer<T>

impl<T> UnwindSafe for DeviceBuffer<T>
where T: UnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.