Rust DSL functions for writing CUDA kernels.
This module provides Rust functions that map to CUDA intrinsics during transpilation. These functions have CPU fallback implementations for testing but are transpiled to the corresponding CUDA operations when used in kernel code.
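The dual behaviour described above can be pictured with a small self-contained sketch. The stand-in functions below are hypothetical and defined locally; they are not the crate's implementations. Both the `u32` return type and the fallback values (a single thread in a single block) are assumptions made purely for illustration.

```rust
// Hypothetical stand-ins for DSL functions, defined locally for illustration.
// Assumption: on the CPU, the fallback behaves like thread 0 of a launch with
// one block of one thread. The real crate may use different values or types.

/// Would correspond to `threadIdx.x` in generated CUDA.
fn thread_idx_x() -> u32 {
    0
}

/// Would correspond to `blockIdx.x` in generated CUDA.
fn block_idx_x() -> u32 {
    0
}

/// Would correspond to `blockDim.x` in generated CUDA.
fn block_dim_x() -> u32 {
    1
}

/// Kernel-style logic written once; on the CPU it runs against the stand-ins.
fn global_index() -> u32 {
    block_idx_x() * block_dim_x() + thread_idx_x()
}
```

Under these assumed fallback values, `global_index()` evaluates to 0 on the CPU, which is enough to exercise the surrounding logic in an ordinary unit test.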
§Thread/Block Index Access
use ringkernel_cuda_codegen::dsl::*;
fn my_kernel(...) {
    let tx = thread_idx_x(); // -> threadIdx.x
    let bx = block_idx_x(); // -> blockIdx.x
    let idx = bx * block_dim_x() + tx; // Global thread index
}
§Thread Synchronization
sync_threads(); // -> __syncthreads()
Functions§
- block_dim_x - Get the block dimension (x dimension). Transpiles to: blockDim.x
- block_dim_y - Get the block dimension (y dimension). Transpiles to: blockDim.y
- block_dim_z - Get the block dimension (z dimension). Transpiles to: blockDim.z
- block_idx_x - Get the block index within a grid (x dimension). Transpiles to: blockIdx.x
- block_idx_y - Get the block index within a grid (y dimension). Transpiles to: blockIdx.y
- block_idx_z - Get the block index within a grid (z dimension). Transpiles to: blockIdx.z
- grid_dim_x - Get the grid dimension (x dimension). Transpiles to: gridDim.x
- grid_dim_y - Get the grid dimension (y dimension). Transpiles to: gridDim.y
- grid_dim_z - Get the grid dimension (z dimension). Transpiles to: gridDim.z
- sync_threads - Synchronize all threads in a block. Transpiles to: __syncthreads()
- thread_fence - Thread memory fence. Transpiles to: __threadfence()
- thread_fence_block - Block-level memory fence. Transpiles to: __threadfence_block()
- thread_idx_x - Get the thread index within a block (x dimension). Transpiles to: threadIdx.x
- thread_idx_y - Get the thread index within a block (y dimension). Transpiles to: threadIdx.y
- thread_idx_z - Get the thread index within a block (z dimension). Transpiles to: threadIdx.z
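The index arithmetic from the Thread/Block Index Access example can be checked on the CPU without a GPU. The following is a minimal sketch that walks every (block, thread) pair of a 1D launch sequentially. `Launch1D`, `global_index_x`, and `coverage` are hypothetical helpers invented for this illustration and are not part of the DSL; the `u32` index type is likewise an assumption.

```rust
/// Launch geometry for a 1D grid (hypothetical helper type, not part of the DSL).
#[derive(Clone, Copy)]
struct Launch1D {
    grid_dim_x: u32,  // number of blocks, like gridDim.x
    block_dim_x: u32, // threads per block, like blockDim.x
}

/// Same arithmetic as `bx * block_dim_x() + tx` in the kernel example.
fn global_index_x(block_idx_x: u32, block_dim_x: u32, thread_idx_x: u32) -> u32 {
    block_idx_x * block_dim_x + thread_idx_x
}

/// Visit every (block, thread) pair sequentially and count how often each
/// element index below `len` is produced. Threads whose index falls past
/// `len` are skipped, mirroring the usual bounds check in a kernel.
fn coverage(launch: Launch1D, len: usize) -> Vec<u32> {
    let mut hits = vec![0u32; len];
    for bx in 0..launch.grid_dim_x {
        for tx in 0..launch.block_dim_x {
            let idx = global_index_x(bx, launch.block_dim_x, tx) as usize;
            if idx < len {
                hits[idx] += 1;
            }
        }
    }
    hits
}
```

For a launch of 3 blocks with 4 threads each over 10 elements, `coverage` reports one hit per element: every index is produced exactly once, and the two surplus threads (indices 10 and 11) are filtered out by the bounds check.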