Function rcudnn_sys::cudaGraphAddKernelNode

pub unsafe extern "C" fn cudaGraphAddKernelNode(
    pGraphNode: *mut cudaGraphNode_t, 
    graph: cudaGraph_t, 
    pDependencies: *const cudaGraphNode_t, 
    numDependencies: usize, 
    pNodeParams: *const cudaKernelNodeParams
) -> cudaError_t

\brief Creates a kernel execution node and adds it to a graph

Creates a new kernel execution node and adds it to \p graph with \p numDependencies dependencies specified via \p pDependencies and arguments specified in \p pNodeParams. It is possible for \p numDependencies to be 0, in which case the node will be placed at the root of the graph. \p pDependencies may not have any duplicate entries. A handle to the new node will be returned in \p pGraphNode.

The cudaKernelNodeParams structure is defined as:

\code
struct cudaKernelNodeParams {
    void* func;
    dim3 gridDim;
    dim3 blockDim;
    unsigned int sharedMemBytes;
    void **kernelParams;
    void **extra;
};
\endcode

When the graph is launched, the node will invoke kernel \p func on a (\p gridDim.x x \p gridDim.y x \p gridDim.z) grid of blocks. Each block contains (\p blockDim.x x \p blockDim.y x \p blockDim.z) threads.
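The launch geometry described above can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the bindings: `Dim3`, `blocks_for`, and `total_threads` are hypothetical names introduced here, and `Dim3` is only a local stand-in for CUDA's `dim3`.

```rust
/// Hypothetical stand-in for CUDA's `dim3`; not the type exported by rcudnn_sys.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

/// Number of blocks needed to cover `n` elements when each block
/// holds `block` threads (ceiling division).
fn blocks_for(n: u32, block: u32) -> u32 {
    (n + block - 1) / block
}

/// Total threads launched: (blocks in the grid) * (threads per block).
fn total_threads(grid: Dim3, block: Dim3) -> u64 {
    let blocks = grid.x as u64 * grid.y as u64 * grid.z as u64;
    let threads_per_block = block.x as u64 * block.y as u64 * block.z as u64;
    blocks * threads_per_block
}

fn main() {
    let n = 1000;
    let block = Dim3 { x: 256, y: 1, z: 1 };
    let grid = Dim3 { x: blocks_for(n, block.x), y: 1, z: 1 };
    println!("grid = {:?}, total threads = {}", grid, total_threads(grid, block));
}
```

Because the grid is rounded up to whole blocks, the total thread count can exceed the element count, so a kernel launched this way would typically bounds-check its index.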

\p sharedMemBytes sets the amount of dynamic shared memory that will be available to each thread block.

Kernel parameters to \p func can be specified in one of two ways:

1) Kernel parameters can be specified via \p kernelParams. If the kernel has N parameters, then \p kernelParams needs to be an array of N pointers. Each pointer, from \p kernelParams[0] to \p kernelParams[N-1], points to the region of memory from which the actual parameter will be copied. The number of kernel parameters and their offsets and sizes do not need to be specified as that information is retrieved directly from the kernel’s image.
2) Kernel parameters can also be packaged by the application into a single buffer that is passed in via \p extra. This places the burden on the application of knowing each kernel parameter’s size and alignment/padding within the buffer. The \p extra parameter exists to allow this function to take additional less commonly used arguments. \p extra specifies a list of names of extra settings and their corresponding values. Each extra setting name is immediately followed by the corresponding value. The list must be terminated with either NULL or ::CU_LAUNCH_PARAM_END.

::CU_LAUNCH_PARAM_END, which indicates the end of the \p extra array;
::CU_LAUNCH_PARAM_BUFFER_POINTER, which specifies that the next value in \p extra will be a pointer to a buffer containing all the kernel parameters for launching kernel \p func;
::CU_LAUNCH_PARAM_BUFFER_SIZE, which specifies that the next value in \p extra will be a pointer to a size_t containing the size of the buffer specified with ::CU_LAUNCH_PARAM_BUFFER_POINTER.
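The \p extra mechanism can be sketched on the host side as follows. This is an illustration under stated assumptions, not the bindings' own API: `push_param`, `pack_demo_params`, and `build_extra` are hypothetical helpers, the kernel signature `(int n, float* x, float a)` is invented for the example, and the three sentinel constants are redefined locally with the values I understand `cuda.h` to use (verify against your headers). It also assumes host alignment rules match what the kernel expects, which holds for simple scalar and pointer parameters.

```rust
use std::ffi::c_void;
use std::mem::{align_of, size_of};

// Local redefinitions of the sentinel values (assumption: these match cuda.h).
const CU_LAUNCH_PARAM_END: *mut c_void = 0x00usize as *mut c_void;
const CU_LAUNCH_PARAM_BUFFER_POINTER: *mut c_void = 0x01usize as *mut c_void;
const CU_LAUNCH_PARAM_BUFFER_SIZE: *mut c_void = 0x02usize as *mut c_void;

/// Appends `value` to `buf` at the next offset that is a multiple of
/// `T`'s alignment, zero-filling any padding. Intended for primitive
/// types without internal padding (integers, floats, addresses).
fn push_param<T: Copy>(buf: &mut Vec<u8>, value: T) {
    let align = align_of::<T>();
    let padded = (buf.len() + align - 1) / align * align;
    buf.resize(padded, 0);
    let bytes = unsafe {
        std::slice::from_raw_parts(&value as *const T as *const u8, size_of::<T>())
    };
    buf.extend_from_slice(bytes);
}

/// Packs arguments for a hypothetical kernel `(int n, float* x, float a)`.
/// The device pointer is modeled as a `u64` placeholder address.
fn pack_demo_params() -> Vec<u8> {
    let mut buf = Vec::new();
    push_param(&mut buf, 1024i32);        // n
    push_param(&mut buf, 0xdead_0000u64); // x (placeholder device address)
    push_param(&mut buf, 2.0f32);         // a
    buf
}

/// Builds the name/value list: each setting name is immediately
/// followed by its value, and the list ends with the END sentinel.
fn build_extra(buf: &mut [u8], size: &mut usize) -> [*mut c_void; 5] {
    [
        CU_LAUNCH_PARAM_BUFFER_POINTER,
        buf.as_mut_ptr() as *mut c_void,
        CU_LAUNCH_PARAM_BUFFER_SIZE,
        size as *mut usize as *mut c_void,
        CU_LAUNCH_PARAM_END,
    ]
}

fn main() {
    let mut buf = pack_demo_params();
    let mut size = buf.len();
    let extra = build_extra(&mut buf, &mut size);
    println!("packed {} bytes, extra has {} entries", size, extra.len());
}
```

In a real call, a pointer to the first element of such an array would be stored in the `extra` field of the node parameters, with `kernelParams` left NULL.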

The error ::cudaErrorInvalidValue will be returned if kernel parameters are specified with both \p kernelParams and \p extra (i.e. both \p kernelParams and \p extra are non-NULL).

The \p kernelParams or \p extra array, as well as the argument values it points to, are copied during this call.
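The \p kernelParams form, together with the rule that only one of \p kernelParams and \p extra may be non-NULL, can be sketched like this. The types below are hypothetical local mirrors of the C structure listed earlier, not the definitions exported by rcudnn_sys, and `make_params` and `params_are_exclusive` are helper names invented for illustration. The `func` pointer is left NULL as a placeholder, so the result is not something to pass to the real function as is.

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical mirror of CUDA's `dim3`.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

/// Hypothetical local mirror of `cudaKernelNodeParams`, field for field.
#[repr(C)]
#[allow(dead_code)]
struct KernelNodeParams {
    func: *mut c_void,
    grid_dim: Dim3,
    block_dim: Dim3,
    shared_mem_bytes: u32,
    kernel_params: *mut *mut c_void,
    extra: *mut *mut c_void,
}

/// Fills in the parameters using the `kernelParams` form and leaves
/// `extra` NULL. `args` must hold one pointer per kernel parameter.
fn make_params(args: &mut [*mut c_void]) -> KernelNodeParams {
    KernelNodeParams {
        func: ptr::null_mut(), // placeholder: a real kernel address goes here
        grid_dim: Dim3 { x: 4, y: 1, z: 1 },
        block_dim: Dim3 { x: 256, y: 1, z: 1 },
        shared_mem_bytes: 0,
        kernel_params: args.as_mut_ptr(),
        extra: ptr::null_mut(),
    }
}

/// Host-side pre-check for the documented rule: the two fields must
/// not both be non-NULL.
fn params_are_exclusive(p: &KernelNodeParams) -> bool {
    p.kernel_params.is_null() || p.extra.is_null()
}

fn main() {
    // Arguments for a hypothetical kernel taking `(int n, float a)`.
    let mut n: i32 = 1024;
    let mut a: f32 = 2.0;
    let mut args: [*mut c_void; 2] = [
        &mut n as *mut i32 as *mut c_void,
        &mut a as *mut f32 as *mut c_void,
    ];
    let params = make_params(&mut args);
    println!("exclusive: {}", params_are_exclusive(&params));
}
```

Since the array and the values it points to are copied during the call, locals such as `n`, `a`, and `args` would only need to stay alive until the call returns.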

\note Kernels launched using graphs must not use texture and surface references. Reading or writing through any texture or surface reference is undefined behavior. This restriction does not apply to texture and surface objects.

\param pGraphNode - Returns newly created node
\param graph - Graph to which to add the node
\param pDependencies - Dependencies of the node
\param numDependencies - Number of dependencies
\param pNodeParams - Parameters for the GPU execution node

\return ::cudaSuccess, ::cudaErrorInvalidValue, ::cudaErrorInvalidDeviceFunction
\note_graph_thread_safety \notefnerr \note_init_rt \note_callback

\sa ::cudaLaunchKernel, ::cudaGraphKernelNodeGetParams, ::cudaGraphKernelNodeSetParams, ::cudaGraphCreate, ::cudaGraphDestroyNode, ::cudaGraphAddChildGraphNode, ::cudaGraphAddEmptyNode, ::cudaGraphAddHostNode, ::cudaGraphAddMemcpyNode, ::cudaGraphAddMemsetNode