Shared memory support for CUDA code generation.
This module provides types and utilities for working with CUDA shared memory
(__shared__) in the Rust DSL. Shared memory is fast on-chip memory that is
shared among all threads in a block.
§Overview
Shared memory is crucial for efficient GPU programming:
- Much faster than global memory (~100x lower latency)
- Shared among all threads in a block
- Limited size (typically 48KB-164KB per SM)
- Requires explicit synchronization after writes
§Usage in DSL
use ringkernel_cuda_codegen::shared::SharedTile;

fn kernel(data: &[f32], out: &mut [f32], width: i32) {
    // Declare a 16x16 shared memory tile
    let tile = SharedTile::<f32, 16, 16>::new();

    // Load from global memory
    let gx = block_idx_x() * 16 + thread_idx_x();
    let gy = block_idx_y() * 16 + thread_idx_y();
    tile.set(thread_idx_x(), thread_idx_y(), data[gy * width + gx]);

    // Synchronize before reading
    sync_threads();

    // Read from shared memory
    let val = tile.get(thread_idx_x(), thread_idx_y());
    out[gy * width + gx] = val * 2.0;
}
§Generated CUDA
The above DSL generates:
__shared__ float tile[16][16];
int gx = blockIdx.x * 16 + threadIdx.x;
int gy = blockIdx.y * 16 + threadIdx.y;
tile[threadIdx.y][threadIdx.x] = data[gy * width + gx];
__syncthreads();
float val = tile[threadIdx.y][threadIdx.x];
out[gy * width + gx] = val * 2.0f;
Structs§
- SharedArray - A 1D shared memory array (see the sketch after this list).
- SharedMemoryConfig - Shared memory configuration for a kernel.
- SharedMemoryDecl - Information about a shared memory declaration for transpilation.
- SharedTile - A 2D shared memory tile.
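For the 1D case, a SharedArray can play the same staging role as the SharedTile in the example above. The following is a minimal sketch, assuming SharedArray takes its length as a const generic and exposes new/get/set analogous to SharedTile; the actual API is documented on the struct's own page.

use ringkernel_cuda_codegen::shared::SharedArray;

fn kernel(data: &[f32], out: &mut [f32]) {
    // Assumed API: SharedArray<T, N> with new/get/set mirroring SharedTile.
    let scratch = SharedArray::<f32, 256>::new();
    let tid = thread_idx_x();
    let gid = block_idx_x() * 256 + tid;

    // Stage one element per thread in shared memory.
    scratch.set(tid, data[gid]);

    // Synchronize before reading another thread's element.
    sync_threads();

    // Each thread reads its neighbour's value, wrapping within the block.
    out[gid] = scratch.get((tid + 1) % 256);
}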
Functions§
- parse_shared_array_type - Parse a SharedArray type to extract its size.
- parse_shared_tile_type - Parse a SharedTile type to extract its dimensions.
- rust_to_cuda_element_type - Map a Rust element type to the corresponding CUDA type for shared memory.
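The listing above gives only one-line summaries; the exact signatures live on the individual function pages. As a rough illustration of what the element-type mapping is expected to produce (an assumption, not the crate's actual implementation), the SharedTile::<f32, 16, 16> example earlier lowers to __shared__ float, which suggests a translation along these lines:

// Illustrative sketch only; the real rust_to_cuda_element_type may differ
// in signature and in the set of supported types.
fn rust_to_cuda_element_type_sketch(rust_ty: &str) -> Option<&'static str> {
    match rust_ty {
        "f32" => Some("float"),
        "f64" => Some("double"),
        "i32" => Some("int"),
        "u32" => Some("unsigned int"),
        _ => None,
    }
}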