Struct cust::memory::DeviceBuffer

#[repr(C)]
pub struct DeviceBuffer<T: DeviceCopy> { /* private fields */ }

Fixed-size device-side buffer. Provides basic access to device memory.

Implementations

impl<T: DeviceCopy> DeviceBuffer<T>

pub unsafe fn uninitialized(size: usize) -> CudaResult<Self>

Allocates a new device buffer large enough to hold size values of type T, without initializing the contents.

Errors

If the allocation fails, returns the error from CUDA. If size is large enough that size * mem::size_of::<T>() overflows usize, returns InvalidMemoryAllocation.
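The overflow condition described here can be sketched with standard-library arithmetic alone. The helper name checked_byte_size is hypothetical and is not part of cust; it only illustrates the kind of check that would lead to InvalidMemoryAllocation.

```rust
use std::mem;

/// Hypothetical helper (not a cust API): the byte size needed for `count`
/// elements of `T`, or `None` if the multiplication overflows `usize`.
fn checked_byte_size<T>(count: usize) -> Option<usize> {
    count.checked_mul(mem::size_of::<T>())
}
```

Under this sketch, checked_byte_size::<u64>(5) yields Some(40), while checked_byte_size::<u64>(usize::MAX) yields None, which is the case an allocator would have to reject.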

Safety

The caller must ensure that the contents of the buffer are initialized before reading from the buffer.

pub unsafe fn uninitialized_async(
size: usize,
stream: &Stream
) -> CudaResult<Self>

Allocates device memory asynchronously on a stream, without initializing it.

This doesn’t actually allocate if T is zero-sized.

The allocated memory retains all of the unsafety of DeviceBuffer::uninitialized, with the additional consideration that the memory cannot be used until it is actually allocated on the stream. This means proper stream-ordering semantics must be followed, such as only enqueuing kernel launches that use the memory after the allocation call.

You can synchronize the stream to ensure the memory allocation operation is complete.
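As a rough mental model of the ordering rule, a stream can be pictured as a FIFO queue of operations. The sketch below is a host-only toy, not how CUDA or cust work internally: ToyStream and Op are hypothetical names, and a real stream starts executing work in the background rather than waiting for synchronize.

```rust
use std::collections::VecDeque;

/// Operations a toy stream can hold (hypothetical, for illustration only).
enum Op {
    Alloc(usize),
    Fill(u8),
    Free,
}

/// Host-side stand-in for a stream: work is queued and later run in FIFO order.
struct ToyStream {
    queue: VecDeque<Op>,
    memory: Option<Vec<u8>>,
}

impl ToyStream {
    fn new() -> Self {
        ToyStream { queue: VecDeque::new(), memory: None }
    }

    /// Enqueuing only records the operation; nothing runs yet.
    fn enqueue(&mut self, op: Op) {
        self.queue.push_back(op);
    }

    /// Runs all queued work in order. Fails if a fill is reached
    /// before any allocation has been performed.
    fn synchronize(&mut self) -> Result<(), &'static str> {
        while let Some(op) = self.queue.pop_front() {
            match op {
                // Freshly allocated memory holds arbitrary bytes (0xAA here).
                Op::Alloc(n) => self.memory = Some(vec![0xAA; n]),
                Op::Fill(v) => match self.memory.as_mut() {
                    Some(m) => m.iter_mut().for_each(|b| *b = v),
                    None => return Err("fill before alloc"),
                },
                Op::Free => self.memory = None,
            }
        }
        Ok(())
    }
}
```

Enqueuing Alloc and then Fill succeeds in this model, while enqueuing Fill before Alloc fails, which mirrors the requirement that work using the memory be enqueued after the allocation call.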

pub fn drop_async(self, stream: &Stream) -> CudaResult<()>

Enqueues an operation to free the memory backing this DeviceBuffer on a particular stream. The stream frees the allocation as soon as it reaches the operation. You can ensure the memory is freed by synchronizing the stream.

This function uses internal memory pool semantics. Async allocations will reserve memory in the default memory pool in the stream, and async frees will release the memory back to the pool for further use by async allocations.

The memory inside of the pool is all freed back to the OS once the stream is synchronized unless a custom pool is configured to not do so.

Examples

use cust::{memory::*, stream::*};
let stream = Stream::new(StreamFlags::DEFAULT, None)?;
let mut host_vals = [1, 2, 3];
unsafe {
    let mut allocated = DeviceBuffer::from_slice_async(&[4u8, 5, 6], &stream)?;
    allocated.async_copy_to(&mut host_vals, &stream)?;
    allocated.drop_async(&stream)?;
}
// ensure all async ops are done before trying to access the value
stream.synchronize()?;
assert_eq!(host_vals, [4, 5, 6]);

pub unsafe fn from_raw_parts(
ptr: DevicePointer<T>,
capacity: usize
) -> DeviceBuffer<T>

Creates a DeviceBuffer<T> directly from the raw components of another device buffer.

Safety

This is highly unsafe, due to the number of invariants that aren’t checked:

ptr needs to have been previously allocated via DeviceBuffer or cuda_malloc.
ptr’s T needs to have the same size and alignment as it was allocated with.
capacity needs to be the capacity that the pointer was allocated with.

Violating these may cause problems like corrupting the CUDA driver’s internal data structures.

The ownership of ptr is effectively transferred to the DeviceBuffer<T> which may then deallocate, reallocate or change the contents of memory pointed to by the pointer at will. Ensure that nothing else uses the pointer after calling this function.

Examples

use std::mem;
use cust::memory::*;

let mut buffer = DeviceBuffer::from_slice(&[0u64; 5]).unwrap();
let ptr = buffer.as_device_ptr();
let size = buffer.len();

mem::forget(buffer);

let buffer = unsafe { DeviceBuffer::from_raw_parts(ptr, size) };
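The same ownership hand-off can be tried on the host with the standard library's Vec::from_raw_parts, which has comparable invariants. This is only an analogy under that assumption: the function rebuild is hypothetical, and it uses ManuallyDrop in place of mem::forget to give up ownership of the original allocation.

```rust
use std::mem::ManuallyDrop;

/// Host-side analogy (hypothetical helper): take a Vec apart into its raw
/// components and rebuild it, transferring ownership of the allocation.
fn rebuild(v: Vec<u64>) -> Vec<u64> {
    // Prevent the original Vec from freeing the allocation.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: ptr, len and cap come from a live Vec<u64> that will never be
    // dropped, so the rebuilt Vec is the sole owner of the allocation.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

As with the device buffer, nothing else may use the raw pointer once the rebuilt value owns it.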

pub fn drop(dev_buf: DeviceBuffer<T>) -> DropResult<DeviceBuffer<T>>

Destroys a DeviceBuffer, returning any error that occurs.

Deallocating device memory can return errors from previous asynchronous work. This function destroys the given buffer and returns the error and the un-destroyed buffer on failure.

Example

use cust::memory::*;
let x = DeviceBuffer::from_slice(&[10, 20, 30]).unwrap();
match DeviceBuffer::drop(x) {
    Ok(()) => println!("Successfully destroyed"),
    Err((e, buf)) => {
        println!("Failed to destroy buffer: {:?}", e);
        // Do something with buf
    },
}
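The pattern in the example above, where a failed destroy hands back both the error and the value, can be sketched without a GPU. Handle and its release function are hypothetical, and the alias below uses String as the error type purely for illustration; only the Ok(()) / Err((error, value)) shape is taken from the example.

```rust
/// Hypothetical stand-in for a resource whose release can fail.
#[derive(Debug, PartialEq)]
struct Handle {
    id: u32,
    fail_on_release: bool,
}

/// Same shape as the result matched on above: on failure, both the error
/// and the un-destroyed value are returned to the caller.
type DropResult<T> = Result<(), (String, T)>;

impl Handle {
    /// Consumes the handle; gives it back together with an error on failure.
    fn release(handle: Handle) -> DropResult<Handle> {
        if handle.fail_on_release {
            return Err((format!("release of handle {} failed", handle.id), handle));
        }
        Ok(())
    }
}
```

Returning the value on failure lets the caller retry or log the problem, which an implicit Drop implementation cannot offer because it has no way to report an error.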

impl<T: DeviceCopy + Zeroable> DeviceBuffer<T>

pub fn zeroed(size: usize) -> CudaResult<Self>

This is supported on crate feature bytemuck only.

Allocate device memory and fill it with zeroes (0u8).

This doesn’t actually allocate if T is zero-sized.

Examples

use cust::memory::*;
let mut zero = DeviceBuffer::zeroed(4).unwrap();
let mut values = [1u8, 2, 3, 4];
zero.copy_to(&mut values).unwrap();
assert_eq!(values, [0; 4]);

pub unsafe fn zeroed_async(size: usize, stream: &Stream) -> CudaResult<Self>

This is supported on crate feature bytemuck only.

Allocates device memory asynchronously and asynchronously fills it with zeroes (0u8).

This doesn’t actually allocate if T is zero-sized.

Safety

This method enqueues two operations on the stream: an async allocation and an async memset. Because of this, you must ensure that the memory is not used in any way before it is actually allocated on the stream. You can ensure this by synchronizing the stream explicitly or by using events.

Examples

use cust::{memory::*, stream::*};
let stream = Stream::new(StreamFlags::DEFAULT, None)?;
let mut values = [1u8, 2, 3, 4];
unsafe {
    let mut zero = DeviceBuffer::zeroed_async(4, &stream)?;
    zero.async_copy_to(&mut values, &stream)?;
    zero.drop_async(&stream)?;
}
stream.synchronize()?;
assert_eq!(values, [0; 4]);

impl<A: DeviceCopy + Pod> DeviceBuffer<A>

pub fn cast<B: Pod + DeviceCopy>(self) -> DeviceBuffer<B>

This is supported on crate feature bytemuck only.

Same as DeviceBuffer::try_cast but panics if the cast fails.

Panics

See DeviceBuffer::try_cast.

pub fn try_cast<B: Pod + DeviceCopy>(
self
) -> Result<DeviceBuffer<B>, PodCastError>

This is supported on crate feature bytemuck only.

Tries to convert a DeviceBuffer of type A to a DeviceBuffer of type B, returning an error if the conversion fails.

The length of the buffer after the conversion may have changed.

Failure

If the target type has a greater alignment requirement.
If the target element type is a different size and the output buffer wouldn’t have a whole number of elements, such as 3 x u16 -> 1.5 x u32.
If either type is a ZST (but not both).
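The three failure conditions above can be restated as a small length calculation. This is a sketch of the rules as listed, not the crate's implementation: cast_len and CastError are hypothetical names, and the check order is an assumption.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical error type mirroring the three failure cases listed above.
#[derive(Debug, PartialEq)]
enum CastError {
    AlignmentMismatch,
    OutputWouldHaveSlop,
    SizeMismatch,
}

/// Element count a buffer of `len` values of `A` would have when viewed as `B`.
fn cast_len<A, B>(len: usize) -> Result<usize, CastError> {
    // Rule 1: the target type must not need stricter alignment.
    if align_of::<B>() > align_of::<A>() {
        return Err(CastError::AlignmentMismatch);
    }
    let (a, b) = (size_of::<A>(), size_of::<B>());
    if a == b {
        return Ok(len); // same element size (including both ZSTs)
    }
    // Rule 3: exactly one of the two types is zero-sized.
    if a == 0 || b == 0 {
        return Err(CastError::SizeMismatch);
    }
    // Rule 2: the total byte count must divide evenly into `B` elements.
    let bytes = len * a;
    if bytes % b != 0 {
        return Err(CastError::OutputWouldHaveSlop);
    }
    Ok(bytes / b)
}
```

For instance, cast_len::<u32, u16>(3) gives Ok(6), since the length doubles when the element size halves. Casting three u16 values to [u16; 2], which has the same alignment as u16 but twice the size, fails the whole-number rule in isolation.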

impl<T: DeviceCopy> DeviceBuffer<T>

pub fn from_slice(slice: &[T]) -> CudaResult<Self>

Allocate a new device buffer of the same size as slice, initialized with a clone of the data in slice.

Errors

If the allocation fails, returns the error from CUDA.

Examples

use cust::memory::*;
let values = [0u64; 5];
let mut buffer = DeviceBuffer::from_slice(&values).unwrap();

pub unsafe fn from_slice_async(slice: &[T], stream: &Stream) -> CudaResult<Self>

Asynchronously allocate a new buffer of the same size as slice, initialized with a clone of the data in slice.

Safety

For why this function is unsafe, see AsyncCopyDestination.

Errors

If the allocation fails, returns the error from CUDA.

Examples

use cust::memory::*;
use cust::stream::{Stream, StreamFlags};

let stream = Stream::new(StreamFlags::NON_BLOCKING, None).unwrap();
let values = [0u64; 5];
unsafe {
    let mut buffer = DeviceBuffer::from_slice_async(&values, &stream).unwrap();
    stream.synchronize().unwrap();
    // Perform some operation on the buffer
}

pub fn as_slice(&self) -> &DeviceSlice<T>

Explicitly creates a DeviceSlice from this buffer.

Methods from Deref<Target = DeviceSlice<T>>

pub fn as_host_vec(&self) -> CudaResult<Vec<T>>

pub fn len(&self) -> usize

Returns the number of elements in the slice.

Examples

use cust::memory::*;
let a = DeviceBuffer::from_slice(&[1, 2, 3]).unwrap();
assert_eq!(a.len(), 3);

pub fn is_empty(&self) -> bool

Returns true if the slice has a length of 0.

Examples

use cust::memory::*;
let a : DeviceBuffer<u64> = unsafe { DeviceBuffer::uninitialized(0).unwrap() };
assert!(a.is_empty());

pub fn as_device_ptr(&self) -> DevicePointer<T>

Return a raw device-pointer to the slice’s buffer.

The caller must ensure that the slice outlives the pointer this function returns, or else it will end up pointing to garbage. The caller must also ensure that the pointer is not dereferenced by the CPU.

Examples

use cust::memory::*;
let a = DeviceBuffer::from_slice(&[1, 2, 3]).unwrap();
println!("{:p}", a.as_device_ptr());

pub fn set_8(&mut self, value: u8) -> CudaResult<()>

This is supported on crate feature bytemuck only.

Sets the memory range of this buffer to contiguous 8-bit values of value.

In total, it sets size_of::<T>() * len values of value contiguously.

pub unsafe fn set_8_async(
 &mut self,
 value: u8,
 stream: &Stream
) -> CudaResult<()>

This is supported on crate feature bytemuck only.

Sets the memory range of this buffer to contiguous 8-bit values of value asynchronously.

In total, it sets size_of::<T>() * len values of value contiguously.

Safety

This operation is asynchronous and uses stream-ordering semantics, so it does not complete immediately. Do not read from or write to the memory range until the operation is complete.

pub fn set_16(&mut self, value: u16) -> CudaResult<()>

This is supported on crate feature bytemuck only.

Sets the memory range of this buffer to contiguous 16-bit values of value.

In total, it sets (size_of::<T>() * len) / 2 values of value contiguously.

Panics

Panics if:

self.ptr % 2 != 0 (the pointer is not aligned to at least 2 bytes).
(size_of::<T>() * self.len) % 2 != 0 (the data size is not a multiple of 2 bytes).
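The two panic conditions and the resulting value count can be combined into one check. The function fill_count below is a hypothetical illustration that takes a plain address and a nonzero fill width in bytes (2 for set_16, 4 for set_32); it returns None where the documented methods would panic.

```rust
use std::mem::size_of;

/// Hypothetical pre-check (not a cust API): number of `width`-byte values
/// needed to cover `len` elements of `T` starting at address `addr`, or
/// `None` if the address or total size is not a multiple of `width`.
/// `width` is assumed to be nonzero.
fn fill_count<T>(addr: usize, len: usize, width: usize) -> Option<usize> {
    let bytes = size_of::<T>().checked_mul(len)?;
    if addr % width != 0 || bytes % width != 0 {
        return None;
    }
    Some(bytes / width)
}
```

With this sketch, six u8 elements at an even address need three 16-bit values, while five u8 elements cannot be covered by whole 16-bit values at all.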

pub unsafe fn set_16_async(
 &mut self,
 value: u16,
 stream: &Stream
) -> CudaResult<()>

This is supported on crate feature bytemuck only.

Sets the memory range of this buffer to contiguous 16-bit values of value asynchronously.

In total, it sets (size_of::<T>() * len) / 2 values of value contiguously.

Panics

Panics if:

self.ptr % 2 != 0 (the pointer is not aligned to at least 2 bytes).
(size_of::<T>() * self.len) % 2 != 0 (the data size is not a multiple of 2 bytes).

Safety

This operation is asynchronous and uses stream-ordering semantics, so it does not complete immediately. Do not read from or write to the memory range until the operation is complete.

pub fn set_32(&mut self, value: u32) -> CudaResult<()>

This is supported on crate feature bytemuck only.

Sets the memory range of this buffer to contiguous 32-bit values of value.

In total, it sets (size_of::<T>() * len) / 4 values of value contiguously.

Panics

Panics if:

self.ptr % 4 != 0 (the pointer is not aligned to at least 4 bytes).
(size_of::<T>() * self.len) % 4 != 0 (the data size is not a multiple of 4 bytes).

pub unsafe fn set_32_async(
 &mut self,
 value: u32,
 stream: &Stream
) -> CudaResult<()>

This is supported on crate feature bytemuck only.

Sets the memory range of this buffer to contiguous 32-bit values of value asynchronously.

In total, it sets (size_of::<T>() * len) / 4 values of value contiguously.

Panics

Panics if:

self.ptr % 4 != 0 (the pointer is not aligned to at least 4 bytes).
(size_of::<T>() * self.len) % 4 != 0 (the data size is not a multiple of 4 bytes).

Safety

This operation is asynchronous and uses stream-ordering semantics, so it does not complete immediately. Do not read from or write to the memory range until the operation is complete.

pub fn set_zero(&mut self) -> CudaResult<()>

Sets this slice’s data to zero.

pub unsafe fn set_zero_async(&mut self, stream: &Stream) -> CudaResult<()>

Sets this slice’s data to zero asynchronously.

Safety

This operation is asynchronous and uses stream-ordering semantics, so it does not complete immediately. Do not read from or write to the memory range until the operation is complete.