Function rcudnn::cudaSetDeviceFlags

pub unsafe extern "C" fn cudaSetDeviceFlags(flags: u32) -> cudaError

Sets flags to be used for device executions.

Records `flags` as the flags for the current device. If the current device has been set and that device has already been initialized, the previous flags are overwritten. If the current device has not been initialized, it is initialized with the provided flags. If no device has been made current to the calling thread, a default device is selected and initialized with the provided flags.

The two least significant bits (LSBs) of the `flags` parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.

  • ::cudaDeviceScheduleAuto: The default value when the `flags` parameter is zero. It uses a heuristic based on the number of active CUDA contexts in the process, C, and the number of logical processors in the system, P. If C > P, CUDA yields to other OS threads when waiting for the device; otherwise CUDA does not yield while waiting for results and actively spins on the processor. Additionally, on Tegra devices, ::cudaDeviceScheduleAuto uses a heuristic based on the power profile of the platform and may choose ::cudaDeviceScheduleBlockingSync for low-powered devices.
  • ::cudaDeviceScheduleSpin: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
  • ::cudaDeviceScheduleYield: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
  • ::cudaDeviceScheduleBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
  • ::cudaDeviceBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
    Deprecated: This flag was deprecated as of CUDA 4.0 and replaced with ::cudaDeviceScheduleBlockingSync.
  • ::cudaDeviceMapHost: This flag enables allocating pinned host memory that is accessible to the device. It is implicit for the runtime but may be absent if a context is created using the driver API. If this flag is not set, ::cudaHostGetDevicePointer() will always return a failure code.
  • ::cudaDeviceLmemResizeToMax: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
    Deprecated: The behavior this flag enabled is now the default and cannot be disabled.
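
The ::cudaDeviceScheduleAuto rule in the list above can be pictured with a small model. This is only a sketch of the documented C > P comparison: `WaitPolicy` and `auto_wait_policy` are hypothetical names that exist neither in rcudnn nor in CUDA, and the Tegra power-profile case is deliberately left out.

```rust
/// Illustrative type for how a waiting CPU thread behaves (not part of rcudnn).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum WaitPolicy {
    /// Busy-wait on the processor until the device finishes.
    Spin,
    /// Give the processor to other OS threads while waiting.
    Yield,
}

/// Simplified model of the documented heuristic: yield when the number of
/// active CUDA contexts (C) exceeds the number of logical processors (P),
/// otherwise spin. The Tegra-specific behavior is not modelled.
fn auto_wait_policy(active_contexts: usize, logical_processors: usize) -> WaitPolicy {
    if active_contexts > logical_processors {
        WaitPolicy::Yield
    } else {
        WaitPolicy::Spin
    }
}
```

Under this model, two contexts on an eight-thread machine would spin, while sixteen contexts on the same machine would yield. The real driver may weigh additional factors that this page does not describe.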

Parameter `flags`: parameters for device operation
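
A `flags` value is typically built by OR-ing one scheduling policy with any additional bits. The sketch below declares its own constants so that it compiles without CUDA installed; the numeric values are assumed to match the `#define`s in CUDA's `driver_types.h` and should be checked against your CUDA version. The upper-case names and both helper functions are hypothetical, and real code would use the constants exposed by the bindings instead.

```rust
// Assumed to mirror the #defines in CUDA's driver_types.h. Declared locally
// so this sketch builds without linking CUDA; verify against your headers.
const CUDA_DEVICE_SCHEDULE_AUTO: u32 = 0x00;
const CUDA_DEVICE_SCHEDULE_SPIN: u32 = 0x01;
const CUDA_DEVICE_SCHEDULE_YIELD: u32 = 0x02;
const CUDA_DEVICE_SCHEDULE_BLOCKING_SYNC: u32 = 0x04;
const CUDA_DEVICE_SCHEDULE_MASK: u32 = 0x07;
const CUDA_DEVICE_MAP_HOST: u32 = 0x08;

/// Combine one scheduling policy with the optional map-host bit.
/// Bits outside the scheduling mask are stripped from `schedule`.
fn build_flags(schedule: u32, map_host: bool) -> u32 {
    let mut flags = schedule & CUDA_DEVICE_SCHEDULE_MASK;
    if map_host {
        flags |= CUDA_DEVICE_MAP_HOST;
    }
    flags
}

/// Extract the scheduling policy bits from a combined flags value.
fn schedule_bits(flags: u32) -> u32 {
    flags & CUDA_DEVICE_SCHEDULE_MASK
}
```

For example, `build_flags(CUDA_DEVICE_SCHEDULE_YIELD, true)` produces a value that requests yielding and mapped pinned host memory together, and `schedule_bits` recovers the policy from it.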

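Returns ::cudaSuccess or ::cudaErrorInvalidValue.

Notes:
  • This function may also return error codes from previous, asynchronous launches.
  • This function may also return ::cudaErrorInitializationError, ::cudaErrorInsufficientDriver or ::cudaErrorNoDevice if this call tries to initialize internal CUDA runtime state.
  • As specified by ::cudaStreamAddCallback, no CUDA function may be called from a callback. ::cudaErrorNotPermitted may, but is not guaranteed to, be returned as a diagnostic in such a case.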
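
Because the function is `unsafe extern "C"` and reports failure through its return value, callers usually wrap it in a safe function that converts the status into a `Result`. The following is a minimal sketch of that pattern, not the crate's API: `CudaStatus`, `SetFlagsFn`, `set_device_flags_checked` and `mock_set_device_flags` are all hypothetical stand-ins, so the example runs without a CUDA installation. The mock's accepted bit range is arbitrary.

```rust
/// Stand-in for the crate's `cudaError` type; only the two codes listed
/// above are modelled (hypothetical, not the real enum).
#[repr(C)]
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum CudaStatus {
    Success,
    InvalidValue,
}

/// Function pointer shaped like `cudaSetDeviceFlags`, so a mock can take
/// the place of the real FFI call in this sketch.
type SetFlagsFn = unsafe extern "C" fn(u32) -> CudaStatus;

/// Call the flag-setting function and convert its status into a `Result`.
fn set_device_flags_checked(set_flags: SetFlagsFn, flags: u32) -> Result<(), CudaStatus> {
    // SAFETY: the caller supplies a function that upholds the FFI contract.
    let status = unsafe { set_flags(flags) };
    match status {
        CudaStatus::Success => Ok(()),
        err => Err(err),
    }
}

/// Mock used only in this sketch: accepts the low five bits and rejects
/// anything else. The real validation rules are defined by CUDA.
unsafe extern "C" fn mock_set_device_flags(flags: u32) -> CudaStatus {
    if flags & !0x1f == 0 {
        CudaStatus::Success
    } else {
        CudaStatus::InvalidValue
    }
}
```

With the real binding, the same shape should apply: call `cudaSetDeviceFlags` inside an `unsafe` block and compare the returned `cudaError` against its success value before continuing.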

See also: ::cudaGetDeviceFlags, ::cudaGetDeviceCount, ::cudaGetDevice, ::cudaGetDeviceProperties, ::cudaSetDevice, ::cudaSetValidDevices, ::cudaChooseDevice, ::cuDevicePrimaryCtxSetFlags