Function rcudnn::cudaMemPrefetchAsync

pub unsafe extern "C" fn cudaMemPrefetchAsync(
    devPtr: *const c_void,
    count: usize,
    dstDevice: i32,
    stream: *mut CUstream_st
) -> cudaError

\brief Prefetches memory to the specified destination device

Prefetches memory to the specified destination device. \p devPtr is the base device pointer of the memory to be prefetched and \p dstDevice is the destination device. \p count specifies the number of bytes to prefetch. \p stream is the stream in which the operation is enqueued. The memory range must refer to managed memory allocated via ::cudaMallocManaged or declared via managed variables.
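Because \p count is a size in bytes rather than an element count, a caller holding a typed buffer has to convert before calling. A minimal sketch of that conversion, assuming a hypothetical helper `prefetch_byte_count` that is not part of rcudnn:

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of rcudnn): converts an element count into
/// the byte count expected by the `count` parameter.
/// Returns `None` if the multiplication would overflow `usize`.
pub fn prefetch_byte_count<T>(len: usize) -> Option<usize> {
    len.checked_mul(size_of::<T>())
}
```

For example, `prefetch_byte_count::<f32>(1024)` yields `Some(4096)`, which is the value to pass as \p count for a managed buffer of 1024 `f32` elements.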

Passing in cudaCpuDeviceId for \p dstDevice will prefetch the data to host memory. If \p dstDevice is a GPU, then the device attribute ::cudaDevAttrConcurrentManagedAccess must be non-zero. Additionally, \p stream must be associated with a device that has a non-zero value for the device attribute ::cudaDevAttrConcurrentManagedAccess.
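The choice of \p dstDevice described above can be sketched as a small helper. This is an illustration under stated assumptions, not rcudnn API: `PrefetchTarget`, `dst_device`, and `CPU_DEVICE_ID` are hypothetical names, and `CPU_DEVICE_ID` mirrors the value -1 that the CUDA runtime headers give `cudaCpuDeviceId` instead of importing it from the bindings. The attribute value is taken as a plain integer that the caller would obtain by querying ::cudaDevAttrConcurrentManagedAccess for the destination GPU. The sketch covers only the destination-device condition; the matching requirement on the stream's device is not checked.

```rust
/// Assumption: mirrors `cudaCpuDeviceId` (-1) from the CUDA runtime headers;
/// defined locally here rather than imported from rcudnn.
pub const CPU_DEVICE_ID: i32 = -1;

/// Hypothetical type naming where the data should be prefetched to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrefetchTarget {
    /// Prefetch to host memory.
    Host,
    /// Prefetch to the GPU with this device ordinal.
    Gpu(i32),
}

/// Returns the value to pass as `dstDevice`, or `None` if the target is a GPU
/// with a negative ordinal or a zero concurrent-managed-access attribute.
/// `concurrent_managed_access` is the attribute value for the destination GPU
/// and is ignored for `PrefetchTarget::Host`.
pub fn dst_device(target: PrefetchTarget, concurrent_managed_access: i32) -> Option<i32> {
    match target {
        PrefetchTarget::Host => Some(CPU_DEVICE_ID),
        PrefetchTarget::Gpu(ordinal) if ordinal >= 0 && concurrent_managed_access != 0 => {
            Some(ordinal)
        }
        PrefetchTarget::Gpu(_) => None,
    }
}
```

Returning `None` up front lets a caller skip the prefetch on hardware that cannot honor it, which is safe because the prefetch is only a performance hint.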

The start address and end address of the memory range will be rounded down and rounded up respectively to be aligned to CPU page size before the prefetch operation is enqueued in the stream.
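The rounding rule can be illustrated with plain address arithmetic. The sketch below assumes a hypothetical helper `page_aligned_range` and a power-of-two page size supplied by the caller; the real CPU page size is platform dependent, and the driver performs this rounding itself, so the code only shows which range ends up covered.

```rust
/// Hypothetical helper (not part of rcudnn): returns the page-aligned
/// `(start, end)` byte range covering `addr .. addr + count`.
/// The start is rounded down and the end is rounded up to `page_size`.
/// Returns `None` if `page_size` is not a power of two or on overflow.
pub fn page_aligned_range(addr: usize, count: usize, page_size: usize) -> Option<(usize, usize)> {
    if !page_size.is_power_of_two() {
        return None;
    }
    let mask = page_size - 1;
    // Clear the low bits to round the start address down.
    let start = addr & !mask;
    // Add (page_size - 1) before clearing the low bits to round the end up.
    let end = addr.checked_add(count)?.checked_add(mask)? & !mask;
    Some((start, end))
}
```

With an assumed 4096-byte page, a request for 0x20 bytes at address 0x1010 covers the whole page from 0x1000 to 0x2000, so a small prefetch can move more data than \p count suggests.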

If no physical memory has been allocated for this region, then this memory region will be populated and mapped on the destination device. If there’s insufficient memory to prefetch the desired region, the Unified Memory driver may evict pages from other ::cudaMallocManaged allocations to host memory in order to make room. Device memory allocated using ::cudaMalloc or ::cudaMallocArray will not be evicted.

By default, any mappings to the previous location of the migrated pages are removed and mappings for the new location are only set up on \p dstDevice. The exact behavior, however, also depends on the settings applied to this memory range via ::cudaMemAdvise, as described below:

If ::cudaMemAdviseSetReadMostly was set on any subset of this memory range, then a read-only copy of the pages in that subset will be created on \p dstDevice.

If ::cudaMemAdviseSetPreferredLocation was called on any subset of this memory range, then the pages will be migrated to \p dstDevice even if \p dstDevice is not the preferred location of any pages in the memory range.

If ::cudaMemAdviseSetAccessedBy was called on any subset of this memory range, then mappings to those pages from all the appropriate processors are updated to refer to the new location if establishing such a mapping is possible. Otherwise, those mappings are cleared.

Note that this API is not required for functionality and only serves to improve performance by allowing the application to migrate data to a suitable location before it is accessed. Memory accesses to this range are always coherent and are allowed even when the data is actively being migrated.

Note that this function is asynchronous with respect to the host and all work on other devices.

\param devPtr - Pointer to be prefetched
\param count - Size in bytes
\param dstDevice - Destination device to prefetch to
\param stream - Stream to enqueue prefetch operation

\return ::cudaSuccess, ::cudaErrorInvalidValue, ::cudaErrorInvalidDevice
\notefnerr \note_async \note_null_stream \note_init_rt \note_callback

\sa ::cudaMemcpy, ::cudaMemcpyPeer, ::cudaMemcpyAsync, ::cudaMemcpy3DPeerAsync, ::cudaMemAdvise, ::cuMemPrefetchAsync