pub struct DeviceMemory<T> { /* private fields */ }
Represents a region of owned CUDA device memory for elements of type T.
Implementations§
impl<T> DeviceMemory<T>
Associated utility functions.
pub unsafe fn alloc(count: usize) -> Result<*mut T>
Allocates linear memory on the device for count elements of T (count * size_of::<T>() bytes) and returns a pointer to the allocated memory.
The allocated memory is suitably aligned for any kind of variable.
The memory is not cleared.
DeviceMemory::alloc returns crate::error::Status::OutOfMemory on allocation failure.
A pointer allocated through this host API cannot be freed by the device-side version of cudaFree (called from kernel code), and vice versa.
§Errors
Returns an error if the requested byte size overflows, CUDA cannot
allocate device memory, a previous asynchronous launch reports an error,
or CUDA reports runtime initialization diagnostics such as
crate::error::Status::NotInitialized, crate::error::Status::CallRequiresNewerDriver,
or crate::error::Status::NoDevice.
§Safety
The returned pointer is uninitialized device memory. The caller must use
it only for count elements of T and eventually free it with a
compatible CUDA free function.
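The byte-size overflow case listed under Errors can be pictured with a minimal sketch. The helper checked_byte_len is hypothetical and is not part of this crate. It only illustrates how an element count might be turned into a byte size before it is handed to CUDA.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of this crate): converts an element count
/// into a byte size, returning `None` when the multiplication overflows.
fn checked_byte_len<T>(count: usize) -> Option<usize> {
    count.checked_mul(size_of::<T>())
}
```

A caller would map the None case to an error instead of passing a wrapped-around size to the allocator.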
pub unsafe fn alloc_managed(count: usize, flags: MemoryAttachFlags) -> Result<*mut T>
pub unsafe fn free(ptr: *mut T) -> Result<()>
Frees the memory space pointed to by ptr, which must have been returned by a previous call to one of these allocation functions: DeviceMemory::alloc, sys::cudaMallocPitch, DeviceMemory::alloc_managed, DeviceMemory::alloc_async, or sys::cudaMallocFromPoolAsync.
This does not perform implicit synchronization when the pointer was allocated with DeviceMemory::alloc_async or sys::cudaMallocFromPoolAsync.
Callers must ensure that all accesses to this pointer have completed before invoking DeviceMemory::free.
For best performance and memory reuse, use DeviceMemory::free_async to free memory allocated via the stream ordered memory allocator.
For all other pointers, this call may perform implicit synchronization.
If DeviceMemory::free has already been called on ptr, an error is returned.
If ptr is null, no operation is performed.
DeviceMemory::free returns an error on failure.
A pointer allocated through the host API cannot be freed by the device-side version of cudaFree (called from kernel code), and vice versa.
§Errors
Returns an error if CUDA cannot free ptr, ptr has already been
freed, a previous asynchronous launch reports an error, or CUDA reports
runtime initialization diagnostics.
§Safety
ptr must be null or a live allocation returned by a compatible CUDA
device allocation function, and no work may access it after it is freed.
pub unsafe fn copy(dst: *mut T, src: *const T, count: usize, kind: MemoryCopyKind) -> Result<()>
Copies count elements from src to dst.
The transfer direction is specified by MemoryCopyKind.
MemoryCopyKind::Default is recommended when unified virtual addressing is available, in which case the transfer direction is inferred from the pointer values.
Calling DeviceMemory::copy with dst and src pointers that do not match the direction of the copy results in undefined behavior.
- Exhibits synchronous behavior for most use cases.
- Memory regions requested must be either entirely registered with CUDA or, in the case of host pageable transfers, not registered at all. Memory regions spanning allocations that are both registered and not registered with CUDA are not supported and return crate::error::Status::InvalidValue.
§Errors
Returns an error if the requested byte count overflows, CUDA rejects the pointer combination or copy kind, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
§Safety
src and dst must be valid for count elements of T according to
kind, and the source and destination regions must not overlap unless
CUDA permits that transfer.
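The non-overlap requirement can be checked on raw addresses before calling into CUDA. The following is a minimal sketch, assuming both pointers live in the same address space (for example under unified virtual addressing). The helper regions_overlap is hypothetical and is not part of this crate.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of this crate): reports whether two
/// `count`-element regions overlap, comparing raw addresses only.
/// Returns `None` if the byte length or an end address overflows.
fn regions_overlap<T>(dst: *const T, src: *const T, count: usize) -> Option<bool> {
    let bytes = count.checked_mul(size_of::<T>())?;
    let (d, s) = (dst as usize, src as usize);
    let d_end = d.checked_add(bytes)?;
    let s_end = s.checked_add(bytes)?;
    // Half-open ranges [d, d_end) and [s, s_end) intersect when each one
    // starts before the other ends. Empty ranges never overlap.
    Some(bytes != 0 && d < s_end && s < d_end)
}
```

The comparison is only meaningful when the two addresses are comparable. A host pointer and a device pointer outside a unified address space may share a numeric value without aliasing.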
pub unsafe fn set(dst: *mut T, value: u8, count: usize) -> Result<()>
Fills the first count elements (count * size_of::<T>() bytes) of the memory area pointed to by dst with the constant byte value.
This call is asynchronous with respect to the host unless dst refers to pinned host memory.
See the CUDA memset synchronization rules for when this operation blocks the host.
§Errors
Returns an error if the requested byte count overflows, CUDA rejects the pointer or size, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
§Safety
dst must be valid for writes of count * size_of::<T>() bytes and
must refer to memory that CUDA can memset.
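Because value is a single byte, every byte of each element receives the same value. The following host-side analogue is a sketch only. It uses std::ptr::write_bytes, which also takes an element count, and assumes that count in the signature above is an element count, as the Safety section suggests. The function fill_u32 is hypothetical and does not touch device memory.

```rust
/// Host-side analogue (not part of this crate): fills the first `count`
/// elements of a zeroed `u32` buffer of length `len` with the byte `value`.
fn fill_u32(len: usize, value: u8, count: usize) -> Vec<u32> {
    assert!(count <= len, "count must not exceed the buffer length");
    let mut buf = vec![0u32; len];
    // SAFETY: `count <= len`, so the write stays inside `buf`.
    // `write_bytes` sets `count * size_of::<u32>()` bytes to `value`.
    unsafe { std::ptr::write_bytes(buf.as_mut_ptr(), value, count) };
    buf
}
```

Filling with 0x01 therefore produces elements equal to 0x0101_0101, not 1.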
pub unsafe fn alloc_host(size: usize) -> Result<*mut ()>
pub unsafe fn free_host(ptr: *mut ()) -> Result<()>
Frees host memory returned by DeviceMemory::alloc_host or DeviceMemory::alloc_pinned.
§Errors
Returns an error if CUDA cannot free the host allocation, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
§Safety
ptr must be null or a live host allocation returned by a compatible
CUDA host allocation function.
pub unsafe fn alloc_pinned(size: usize, flags: HostAllocationFlags) -> Result<*mut ()>
Allocates size bytes of host memory that is page-locked and accessible to the device.
The driver tracks the allocated virtual memory ranges and automatically accelerates calls such as DeviceMemory::copy.
Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc().
Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging.
As a result, use this sparingly to allocate staging areas for data exchange between host and device.
flags selects allocation options:
- HostAllocationFlags::DEFAULT: equivalent to DeviceMemory::alloc_host.
- HostAllocationFlags::PORTABLE: the memory returned by this call is considered pinned memory by all CUDA contexts, not just the one that performed the allocation.
- HostAllocationFlags::MAPPED: maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling sys::cudaHostGetDevicePointer.
- HostAllocationFlags::WRITE_COMBINED: allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers written by the CPU and read by the device via mapped pinned memory or host->device transfers.
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
For HostAllocationFlags::MAPPED to have any effect, the CUDA context must support ContextFlags::MAP_HOST, which can be checked via Device::flags.
ContextFlags::MAP_HOST is implicitly set for contexts created via the runtime API.
HostAllocationFlags::MAPPED may be specified on CUDA contexts for devices that do not support mapped pinned memory.
The failure is deferred to sys::cudaHostGetDevicePointer because the memory may be mapped into other CUDA contexts via HostAllocationFlags::PORTABLE.
Memory allocated by this method must be freed with DeviceMemory::free_host.
§Errors
Returns an error if CUDA cannot allocate pinned host memory, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
§Safety
The returned pointer is uninitialized host memory. The caller must ensure
it is accessed within size bytes and freed with DeviceMemory::free_host.
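The orthogonality of the allocation flags means they combine as independent bits. The sketch below uses a stand-in type, HostFlags, which is an assumption and is not this crate's HostAllocationFlags. Its bit values mirror the CUDA runtime's cudaHostAlloc* constants.

```rust
/// Stand-in flags type for illustration; not the crate's `HostAllocationFlags`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HostFlags(u32);

impl HostFlags {
    // Bit values mirror the CUDA runtime's cudaHostAlloc* constants.
    const DEFAULT: Self = Self(0x00);
    const PORTABLE: Self = Self(0x01);
    const MAPPED: Self = Self(0x02);
    const WRITE_COMBINED: Self = Self(0x04);

    /// Returns true when every bit set in `other` is also set in `self`.
    fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

impl std::ops::BitOr for HostFlags {
    type Output = Self;

    /// Combines two flag sets; each flag occupies its own bit.
    fn bitor(self, rhs: Self) -> Self {
        Self(self.0 | rhs.0)
    }
}
```

Since each flag has its own bit, any subset can be requested together, and DEFAULT contributes no bits to a combination.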
pub unsafe fn register_host(ptr: *mut (), size: usize, flags: HostRegisterFlags) -> Result<()>
Page-locks the memory range specified by ptr and size, and maps it for the devices selected by flags.
This memory range is also added to the same tracking mechanism as DeviceMemory::alloc_pinned to automatically accelerate calls to functions such as DeviceMemory::copy.
Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory that has not been registered.
Page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of memory available to the system for paging.
As a result, use this sparingly to register staging areas for data exchange between host and device.
On systems where DeviceProperties::pageable_memory_access_uses_host_page_tables is enabled, DeviceMemory::register_host does not page-lock the memory range specified by ptr and instead only populates unpopulated pages.
DeviceMemory::register_host is supported only on I/O coherent devices where DeviceProperties::host_register_supported is enabled.
flags selects registration options:
- HostRegisterFlags::DEFAULT: on a system with unified virtual addressing, the memory is both mapped and portable. On a system with no unified virtual addressing, the memory is neither mapped nor portable.
- HostRegisterFlags::PORTABLE: the memory returned by this call is considered pinned memory by all CUDA contexts, not just the one that performed the allocation.
- HostRegisterFlags::MAPPED: maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling sys::cudaHostGetDevicePointer.
- HostRegisterFlags::IO_MEMORY: the passed memory pointer is treated as pointing to some memory-mapped I/O space, for example belonging to a third-party PCIe device, and it is marked as non-cache-coherent and contiguous.
- HostRegisterFlags::READ_ONLY: the passed memory pointer is treated as pointing to memory that is considered read-only by the device. On platforms without DeviceProperties::pageable_memory_access_uses_host_page_tables, this flag is required to register memory mapped to the CPU as read-only. Query support with DeviceProperties::host_register_read_only_supported. Using this flag with a current context associated with a device that does not have this attribute set makes DeviceMemory::register_host return crate::error::Status::NotSupported.
All of these flags are orthogonal to one another: a developer may page-lock memory that is portable or mapped with no restrictions.
The CUDA context must have been created with ContextFlags::MAP_HOST for HostRegisterFlags::MAPPED to have any effect.
HostRegisterFlags::MAPPED may be specified on CUDA contexts for devices that do not support mapped pinned memory.
The failure is deferred to sys::cudaHostGetDevicePointer because the memory may be mapped into other CUDA contexts via HostRegisterFlags::PORTABLE.
On devices where DeviceProperties::can_use_host_pointer_for_registered_mem is enabled, the memory can also be accessed from the device using the original host pointer.
The device pointer returned by sys::cudaHostGetDevicePointer may or may not match the original host pointer and depends on the devices visible to the application.
If DeviceProperties::can_use_host_pointer_for_registered_mem is enabled on all devices visible to the application, the device pointer returned by sys::cudaHostGetDevicePointer matches the original host pointer.
If it is disabled on any visible device, the returned device pointer does not match the original host pointer, but is suitable for use on all devices provided Unified Virtual Addressing is enabled.
In such systems, devices on which the property is enabled may access the memory through either pointer.
Such devices must access the memory through only one of the two pointers, not both.
The memory page-locked by this method must be unregistered with DeviceMemory::unregister_host.
§Errors
Returns an error if CUDA cannot register the host range, the pointer, size, or flags are invalid, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
§Safety
ptr..ptr + size must be a valid host memory range and must remain valid
until it is unregistered.
pub unsafe fn unregister_host(ptr: *mut ()) -> Result<()>
Unmaps the memory range whose base address is specified by ptr, and makes it pageable again.
The base address must be the same one specified to DeviceMemory::register_host.
§Errors
Returns an error if CUDA cannot unregister the host range, ptr is not
the base address of a registered range, a previous asynchronous launch
reports an error, or CUDA reports runtime initialization diagnostics.
§Safety
ptr must be the base address of a host range registered with
DeviceMemory::register_host and must not be unregistered twice.
pub fn memory_info() -> Result<(usize, usize)>
Returns the total amount of memory available to the current context and the amount of memory free on the device. CUDA is not guaranteed to be able to allocate all of the memory that the OS reports as free. In a multi-tenant situation, the free-memory estimate is prone to a race condition: an allocation or free by another process or thread between estimation and reporting can make the reported free value differ from actual free memory.
The integrated GPU on Tegra shares memory with the CPU and other components of the SoC. The free and total values returned by this call exclude the swap space maintained by the OS on some platforms. The OS may move some memory pages into the swap area as the GPU or CPU allocates or accesses memory. See the Tegra app note on how to calculate total and free memory on Tegra.
§Errors
Returns an error if CUDA cannot query memory information, a previous asynchronous launch reports an error, or CUDA reports runtime initialization diagnostics.
pub fn pointer_attributes(ptr: *const T) -> Result<PointerAttributes>
Returns the attributes of ptr.
If ptr was not allocated in, mapped by, or registered with a context that supports unified addressing, crate::error::Status::InvalidValue is returned.
In CUDA 11.0 and later, passing an unregistered host pointer succeeds and reports MemoryType::Unregistered in PointerAttributes::memory_type.
- PointerAttributes::memory_type identifies the type of memory. It can be MemoryType::Unregistered for unregistered host memory, MemoryType::Host for registered host memory, MemoryType::Device for device memory, or MemoryType::Managed for managed memory.
- PointerAttributes::device is the device against which ptr was allocated. If ptr has memory type MemoryType::Device, this identifies the device on which the memory physically resides. If ptr has memory type MemoryType::Host, this identifies the device that was current when the allocation was made, and if that device is deinitialized then this allocation will vanish with that device’s state.
- PointerAttributes::device_pointer is the device pointer alias through which the memory referred to by ptr may be accessed on the current device. If the memory referred to by ptr cannot be accessed directly by the current device then this is null.
- PointerAttributes::host_pointer is the host pointer alias through which the memory referred to by ptr may be accessed on the host. If the memory referred to by ptr cannot be accessed directly by the host then this is null.
§Errors
Returns an error if CUDA cannot query attributes for ptr, ptr is not
known to a unified-addressing context, or CUDA reports runtime
initialization diagnostics.
pub unsafe fn alloc_async(count: usize, stream: &Stream) -> Result<*mut T>
pub unsafe fn free_async(ptr: *mut T, stream: &Stream) -> Result<()>
Inserts a free operation into stream.
The allocation must not be accessed after stream execution reaches the free.
After this call returns, accessing the memory from any subsequent work launched on the GPU or querying its pointer attributes results in undefined behavior.
During stream capture, this creates a free node and must therefore be passed the address of a graph allocation.
§Errors
Returns an error if CUDA cannot enqueue the free on stream, ptr is
invalid for asynchronous freeing, a previous asynchronous launch reports
an error, or CUDA reports runtime initialization diagnostics.
§Safety
ptr must be null or a live stream-ordered CUDA allocation. No work may
access it after stream reaches the enqueued free.
pub unsafe fn copy_async(dst: *mut T, src: *const T, count: usize, kind: MemoryCopyKind, stream: &Stream) -> Result<()>
pub unsafe fn set_async(dst: *mut T, value: u8, count: usize, stream: &Stream) -> Result<()>
Fills the first count elements (count * size_of::<T>() bytes) of the memory area pointed to by dst with the constant byte value.
DeviceMemory::set_async is asynchronous with respect to the host, so the call may return before the memset is complete.
The operation is enqueued on stream.
If stream is not the default stream, the operation may overlap with operations in other streams.
The device version only handles device-to-device copies and cannot be given local or shared pointers.
See the CUDA memset synchronization rules for when this operation blocks the host.
§Errors
Returns an error if the requested byte count overflows, CUDA cannot
enqueue the memset on stream, a previous asynchronous launch reports an
error, or CUDA reports runtime initialization diagnostics.
§Safety
dst must be valid for writes of count * size_of::<T>() bytes until
stream reaches the enqueued memset.
pub fn prefetch_async(ptr: DevicePtr, count: usize, location: MemoryLocation, stream: &Stream) -> Result<()>
Prefetches memory to the specified destination location.
ptr is the base device pointer of the memory to be prefetched, location specifies the destination location, count specifies the number of bytes to prefetch, and stream is the stream in which the operation is enqueued.
The memory range must refer to managed memory allocated via DeviceMemory::alloc_managed or declared via __managed__ variables. It may also refer to memory allocated from a managed memory pool, or to system-allocated memory on systems where DeviceProperties::pageable_memory_access is enabled.
Setting MemoryLocation::kind to MemoryLocationKind::Device prefetches memory to the GPU identified by MemoryLocation::id. That device, and the device associated with stream, must support concurrent managed access.
Setting MemoryLocation::kind to MemoryLocationKind::Host prefetches data to host memory.
Applications can request prefetching memory to a specific host NUMA node by using MemoryLocationKind::Numa with a valid NUMA node identifier, or to the NUMA node closest to the current thread’s CPU by using MemoryLocationKind::NumaCurrent.
When MemoryLocation::kind is MemoryLocationKind::Host or MemoryLocationKind::NumaCurrent, MemoryLocation::id is ignored.
The start and end addresses of the memory range are rounded down and up, respectively, to CPU page-size alignment before the prefetch operation is enqueued in the stream.
If no physical memory has been allocated for this region, CUDA populates and maps it on the destination device.
If there is insufficient memory to prefetch the desired region, the Unified Memory driver may evict pages from other DeviceMemory::alloc_managed allocations to host memory to make room.
Device memory allocated using DeviceMemory::alloc or sys::cudaMallocArray is not evicted.
By default, mappings to the previous location of the migrated pages are removed and mappings for the new location are only set up at the destination.
The exact behavior also depends on the settings applied to this memory range via cuMemAdvise as described below:
- If read-mostly advice was set on any subset of this memory range, then that subset will create a read-only copy of the pages at the destination location. If the destination location is a host NUMA node, any pages of that subset that are already in another host NUMA node are transferred to the destination.
- If preferred-location advice was set on any subset of this memory range, then the pages will migrate to location even if it is not the preferred location of every page in the range.
- If accessed-by advice was set on any subset of this memory range, then mappings to those pages from all appropriate processors are updated to refer to the new location if establishing such a mapping is possible. Otherwise, those mappings are cleared.
Prefetching is not required for correctness; it improves performance by allowing the application to migrate data to a suitable location before access. Memory accesses to this range are always coherent and are allowed even when the data is actively being migrated.
This call is asynchronous with respect to the host and all work on other devices.
§Errors
Returns an error if CUDA cannot enqueue the prefetch on stream, the
memory range or destination location is invalid, a previous asynchronous
launch reports an error, or CUDA reports runtime initialization
diagnostics.
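The page rounding that prefetch_async applies to the requested range can be pictured as follows. This is a minimal sketch, assuming a power-of-two page size. The helper page_aligned_range is hypothetical and is not part of this crate, and the actual CPU page size depends on the platform.

```rust
/// Hypothetical helper (not part of this crate): expands the byte range
/// `[addr, addr + len)` outward to page boundaries.
/// Returns `None` if `page_size` is not a power of two or on overflow.
fn page_aligned_range(addr: usize, len: usize, page_size: usize) -> Option<(usize, usize)> {
    if !page_size.is_power_of_two() {
        return None;
    }
    let mask = page_size - 1;
    // Round the start address down to the enclosing page boundary.
    let start = addr & !mask;
    // Round the end address up to the next page boundary.
    let end = addr.checked_add(len)?.checked_add(mask)? & !mask;
    Some((start, end))
}
```

A request that covers only part of a page therefore affects the whole page, which matters when neighbouring data shares that page.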
impl<T> DeviceMemory<T>
pub unsafe fn from_raw_parts(ptr: *mut T, length: usize) -> Self
Takes ownership of an existing device allocation.
§Safety
ptr must be null for an empty allocation or point to length live
elements allocated by cudaMallocManaged or another CUDA allocation
function compatible with cudaFree. length * size_of::<T>() must fit
in usize.
No other owner may free the pointer while the returned value is alive.
pub fn into_raw_parts(self) -> (*mut T, usize)
pub fn create(length: usize) -> Result<Self>
pub fn zeroes(length: usize) -> Result<Self>
pub fn from_slice(v: &[T]) -> Result<Self>
pub unsafe fn from_slice_async(v: &[T], stream: &Stream) -> Result<Self>
pub const fn len(&self) -> usize
pub const fn is_empty(&self) -> bool
pub fn byte_len(&self) -> usize
pub const fn as_ptr(&self) -> *const T
pub const fn as_mut_ptr(&self) -> *mut T
pub fn copy_from_host(&mut self, host_slice: &[T]) -> Result<()>
pub fn copy_from_host_async<'scope, 'env>(&mut self, host_slice: &'env [T], stream: &StreamScope<'scope, 'env>) -> Result<()>
pub unsafe fn copy_from_host_async_unchecked(&mut self, host_slice: &[T], stream: &Stream) -> Result<()>
§Safety
The caller must ensure self and host_slice both remain valid until
stream has completed the transfer.
pub unsafe fn copy_from_host_operation<'a>(&'a mut self, host_slice: &'a [T]) -> Result<MemoryCopyOperation<'a, T>>
Returns a capture operation that copies from host memory into this device allocation.
§Safety
Capturing this operation stores the host and device pointer addresses in
the resulting CUDA graph. The caller must ensure self and host_slice
remain valid whenever a captured graph using this operation is launched.
The destination allocation must remain exclusive for the work ordered by
those launches.
pub fn copy_to_host(&self, host_slice: &mut [T]) -> Result<()>
pub fn copy_to_host_async<'scope, 'env>(&self, host_slice: &'env mut [T], stream: &StreamScope<'scope, 'env>) -> Result<()>
pub unsafe fn copy_to_host_async_unchecked(&self, host_slice: &mut [T], stream: &Stream) -> Result<()>
§Safety
The caller must ensure self and host_slice both remain valid until
stream has completed the transfer.
pub unsafe fn copy_to_host_operation<'a>(&'a self, host_slice: &'a mut [T]) -> Result<MemoryCopyOperation<'a, T>>
Returns a capture operation that copies this allocation into host memory.
§Safety
Capturing this operation stores the device and host pointer addresses in
the resulting CUDA graph. The caller must ensure self and host_slice
remain valid whenever a captured graph using this operation is launched.
The host destination must remain exclusive for the work ordered by those
launches.
pub fn copy_to_host_vec(&self) -> Result<Vec<T>>
pub fn copy_from_device(&mut self, src: &Self) -> Result<()>
pub fn copy_from_device_async<'scope, 'env>(&mut self, src: &Self, stream: &StreamScope<'scope, 'env>) -> Result<()>
pub unsafe fn copy_from_device_async_unchecked(&mut self, src: &Self, stream: &Stream) -> Result<()>
§Safety
The caller must ensure self and src both remain valid until
stream has completed the transfer.
pub unsafe fn copy_from_device_operation<'a>(&'a mut self, src: &'a Self) -> Result<MemoryCopyOperation<'a, T>>
Returns a capture operation that copies from another device allocation into this allocation.
§Safety
Capturing this operation stores both device pointer addresses in the
resulting CUDA graph. The caller must ensure self and src remain
valid whenever a captured graph using this operation is launched. The
destination allocation must remain exclusive for the work ordered by
those launches.
pub fn set_zeroes(&mut self) -> Result<()>
pub fn set_value(&mut self, value: u8) -> Result<()>
pub fn set_value_async<'scope, 'env>(&mut self, value: u8, stream: &StreamScope<'scope, 'env>) -> Result<()>
pub unsafe fn set_value_async_unchecked(&mut self, value: u8, stream: &Stream) -> Result<()>
pub unsafe fn set_value_operation<'a>(&'a mut self, value: u8) -> MemorySetOperation<'a, T>
Returns a capture operation that fills this device allocation with value.
§Safety
Capturing this operation stores this allocation’s pointer address in the
resulting CUDA graph. The caller must ensure self remains valid and
exclusive whenever a captured graph using this operation is launched.
pub fn ipc_handle(&self) -> Result<IpcMemoryHandle>
Exports this device memory allocation, which must have been created with DeviceMemory::alloc, for use in another process.
This is a lightweight operation and may be called multiple times on an allocation without adverse effects.
If a region of memory is freed with DeviceMemory::free and a subsequent call to DeviceMemory::alloc returns memory with the same device address, DeviceMemory::ipc_handle returns a unique handle for the new memory.
IPC is restricted to devices with unified-addressing support on Linux and Windows.
IPC on Windows is supported for compatibility but is not recommended because of its performance cost.
Check device IPC support through the device properties exposed by this crate, for example DeviceProperties::ipc_event_supported.
§Errors
Returns an error if the allocation is empty, CUDA cannot export an IPC handle for the allocation, or CUDA reports runtime initialization diagnostics.