Functions for dealing with the parallel thread execution model employed by CUDA.
§CUDA Thread model
The CUDA thread model is based on 3 main structures:
- Threads
- Thread Blocks
- Grids
§Threads
Threads are the fundamental unit of GPU computing. Every thread executes the same kernel, and each thread decides which part of the work it handles by retrieving its own global thread ID (see `index`).
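To illustrate what a global thread ID is, the following host-side Rust sketch models the 1d case: a thread's global ID is its block index times the block size, plus its index within the block. The names `global_index_1d` and `simulate_vector_add` are hypothetical, and their plain `u32` parameters stand in for the values a kernel would read through `block_idx_x`, `block_dim_x`, and `thread_idx_x`. This runs on the CPU and is only a model of the idea, not the cuda_std implementation.

```rust
/// Hypothetical CPU-side model of a 1d global thread ID.
/// On the GPU these inputs would come from block_idx_x(), block_dim_x()
/// and thread_idx_x(); here they are plain parameters.
pub fn global_index_1d(block_idx_x: u32, block_dim_x: u32, thread_idx_x: u32) -> u32 {
    block_idx_x * block_dim_x + thread_idx_x
}

/// Simulates every thread of a 1d launch computing `out[i] = a[i] + b[i]`.
/// Each (block, thread) pair picks its element from its global ID, and
/// threads whose ID falls past the end of the data do nothing.
pub fn simulate_vector_add(a: &[f32], b: &[f32], grid_dim_x: u32, block_dim_x: u32) -> Vec<f32> {
    let mut out = vec![0.0; a.len()];
    for block in 0..grid_dim_x {
        for thread in 0..block_dim_x {
            let i = global_index_1d(block, block_dim_x, thread) as usize;
            // Bounds check: a launch may contain more threads than elements.
            if i < a.len() {
                out[i] = a[i] + b[i];
            }
        }
    }
    out
}
```

Launching 2 blocks of 4 threads over 5 elements yields 8 threads, so the bounds check is what keeps the last 3 threads from writing out of range.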
§Thread Blocks
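After threads, thread blocks are the most important structure: they arrange threads into 1d, 2d, or 3d groups. All threads in a block can synchronize with each other (for example with `sync_threads`) and can access the block's shared memory.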
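To make the 3d arrangement concrete, a small CPU-side sketch can flatten a thread's 3d position inside its block into a single in-block ID with x as the fastest-varying axis, which is the convention the CUDA programming model uses for threads within a block: `id = x + y * dim_x + z * dim_x * dim_y`. The function names and tuple parameters below are hypothetical; only the arithmetic is the point.

```rust
/// Hypothetical helper: flattens the 3d index `idx` of a thread within a
/// block of dimensions `dim` into a single ID, x varying fastest.
pub fn thread_id_in_block(idx: (u32, u32, u32), dim: (u32, u32, u32)) -> u32 {
    let (x, y, z) = idx;
    let (dim_x, dim_y, _dim_z) = dim;
    x + y * dim_x + z * dim_x * dim_y
}

/// Total number of threads in a block of the given dimensions.
pub fn threads_per_block(dim: (u32, u32, u32)) -> u32 {
    dim.0 * dim.1 * dim.2
}
```

For a 4×2×3 block this maps the 24 threads onto the IDs 0 through 23, each exactly once.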
§Functions
- block_dim - Gets the 3d dimensions of the thread blocks executing this kernel, that is, how many threads each thread block contains along each axis.
- block_dim_x
- block_dim_y
- block_dim_z
- block_idx - Gets the 3d index of the thread block that contains the calling thread.
- block_idx_x
- block_idx_y
- block_idx_z
- device_fence - Acts as a memory fence at the device level.
- first - Returns whether this is the first thread of the launch by index (not the first thread to start executing). It is guaranteed to return true in only one of the threads that call it, which makes it useful for doing something only once.
- grid_dim - Gets the 3d dimensions of the grid executing this kernel, that is, how many thread blocks the grid contains along each axis.
- grid_dim_x
- grid_dim_y
- grid_dim_z
- grid_fence - Acts as a memory fence at the grid level (all threads in the current kernel launch).
- index - Gets the global thread index, taking 1d, 2d, and 3d block and grid dimensions into account. It is most commonly used to index into data, and it is guaranteed to be unique for every thread executing this kernel, regardless of the launch configuration.
- index_1d
- index_2d
- index_3d
- nanosleep - Suspends the calling thread for approximately `nanos` nanoseconds.
- sync_threads - Waits until all threads in the thread block have reached this point. It also guarantees that global and shared memory accesses made by those threads before the call are visible to every thread in the block after it.
- sync_threads_and - Identical to `sync_threads`, but additionally evaluates the predicate for every thread in the block and returns a non-zero integer if and only if the predicate is non-zero for all of them.
- sync_threads_count - Identical to `sync_threads`, but additionally evaluates the predicate for every thread in the block and returns the number of threads for which it is non-zero.
- sync_threads_or - Identical to `sync_threads`, but additionally evaluates the predicate for every thread in the block and returns a non-zero integer if and only if the predicate is non-zero for at least one of them.
- system_fence - Acts as a memory fence at the system level.
- thread_idx - Gets the 3d index of the calling thread within its thread block.
- thread_
idx_ x - thread_
idx_ y - thread_
idx_ z - warp_
size - Gets the number of threads inside of a warp. Currently 32 threads on every GPU architecture.
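The uniqueness guarantee of `index` can be understood through a standard linearization: flatten the block's position in the grid, multiply by the number of threads per block, then add the thread's flattened position within its block. The CPU-side sketch below assumes that formula, with x varying fastest in both the grid and the block. It models the idea and is not necessarily the exact expression cuda_std uses; the `Dim3` type and the function names are hypothetical.

```rust
/// Hypothetical 3d triple standing in for the values returned by
/// grid_dim(), block_idx(), block_dim() and thread_idx().
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dim3 {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

impl Dim3 {
    /// Number of elements covered by these extents.
    pub fn product(self) -> u32 {
        self.x * self.y * self.z
    }
}

/// Flattens a 3d index within extents `dim`, x varying fastest.
fn linearize(idx: Dim3, dim: Dim3) -> u32 {
    idx.x + idx.y * dim.x + idx.z * dim.x * dim.y
}

/// Model of a global thread index:
/// (flat block ID) * (threads per block) + (flat thread ID within the block).
pub fn global_index(grid_dim: Dim3, block_idx: Dim3, block_dim: Dim3, thread_idx: Dim3) -> u32 {
    linearize(block_idx, grid_dim) * block_dim.product() + linearize(thread_idx, block_dim)
}

/// Visits every (block, thread) pair of a launch and collects its global index.
pub fn all_indices(grid_dim: Dim3, block_dim: Dim3) -> Vec<u32> {
    let mut out = Vec::new();
    for bz in 0..grid_dim.z {
        for by in 0..grid_dim.y {
            for bx in 0..grid_dim.x {
                let b = Dim3 { x: bx, y: by, z: bz };
                for tz in 0..block_dim.z {
                    for ty in 0..block_dim.y {
                        for tx in 0..block_dim.x {
                            let t = Dim3 { x: tx, y: ty, z: tz };
                            out.push(global_index(grid_dim, b, block_dim, t));
                        }
                    }
                }
            }
        }
    }
    out
}
```

Because each flattening is a one-to-one mapping onto `0..product`, a 2×3×2 grid of 4×1×2 blocks produces each of the indices 0 through 95 exactly once.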
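The return values of the three predicate variants of `sync_threads` can be summarized with a CPU-side model over the predicate values evaluated by every thread in one block. This sketch only models the reductions, not the barrier itself, and the `model_sync_*` names and the `i32` return type are assumptions made for illustration.

```rust
// Hypothetical models of what each variant returns, given the predicate
// value evaluated by every thread in the block.

/// Like sync_threads_count: number of threads whose predicate is non-zero.
pub fn model_sync_count(predicates: &[i32]) -> i32 {
    predicates.iter().filter(|&&p| p != 0).count() as i32
}

/// Like sync_threads_and: non-zero if and only if every predicate is non-zero.
pub fn model_sync_and(predicates: &[i32]) -> i32 {
    predicates.iter().all(|&p| p != 0) as i32
}

/// Like sync_threads_or: non-zero if and only if at least one predicate is non-zero.
pub fn model_sync_or(predicates: &[i32]) -> i32 {
    predicates.iter().any(|&p| p != 0) as i32
}
```

For predicates `[1, 0, 5, 0]`, the count model returns 2, the "and" model returns 0, and the "or" model returns a non-zero value.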