
Module shared



Shared memory support for CUDA code generation.

This module provides types and utilities for working with CUDA shared memory (__shared__) in the Rust DSL. Shared memory is fast on-chip memory that is shared among all threads in a block.

§Overview

Shared memory is crucial for efficient GPU programming:

  • Much faster than global memory (roughly 100x lower latency than uncached global memory, provided there are no bank conflicts)
  • Shared among all threads in a block
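  • Limited size (typically 48 KB to 164 KB per SM depending on architecture; static __shared__ allocations are capped at 48 KB per block)
  • Requires explicit synchronization (__syncthreads()) after writes, before other threads in the block read the written values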
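The synchronization requirement can be illustrated on the host with an analogy. The sketch below uses plain Rust threads and std::sync::Barrier to stand in for one thread block and __syncthreads(); it is not how the GPU implements the barrier, and the function name neighbor_exchange is hypothetical. Each thread writes only its own slot, waits, then reads a slot written by a different thread.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Barrier;
use std::thread;

/// Host-side analogy for one block of `n` threads sharing a scratch array
/// (hypothetical helper, not part of this crate).
fn neighbor_exchange(n: usize) -> Vec<i32> {
    // Every slot starts at -1 so a premature read would be visible.
    let shared: Vec<AtomicI32> = (0..n).map(|_| AtomicI32::new(-1)).collect();
    let barrier = Barrier::new(n);
    let mut results = vec![0; n];

    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|tid| {
                let shared = &shared;
                let barrier = &barrier;
                s.spawn(move || {
                    // Phase 1: each thread writes only its own slot.
                    shared[tid].store(tid as i32 * 10, Ordering::SeqCst);
                    // Analog of __syncthreads(): no thread proceeds until all have written.
                    barrier.wait();
                    // Phase 2: read the slot written by the next thread.
                    shared[(tid + 1) % n].load(Ordering::SeqCst)
                })
            })
            .collect();
        for (tid, handle) in handles.into_iter().enumerate() {
            results[tid] = handle.join().unwrap();
        }
    });
    results
}

fn main() {
    println!("{:?}", neighbor_exchange(4)); // [10, 20, 30, 0]
}
```

Removing the barrier.wait() call would allow a thread to observe -1 in its neighbor's slot, which is the shared memory race that __syncthreads() prevents.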

§Usage in DSL

use ringkernel_cuda_codegen::shared::SharedTile;

fn kernel(data: &[f32], out: &mut [f32], width: i32) {
    // Declare a 16x16 shared memory tile
    let tile = SharedTile::<f32, 16, 16>::new();

    // Load from global memory
    let gx = block_idx_x() * 16 + thread_idx_x();
    let gy = block_idx_y() * 16 + thread_idx_y();
    tile.set(thread_idx_x(), thread_idx_y(), data[gy * width + gx]);

    // Synchronize before reading
    sync_threads();

    // Read from shared memory
    let val = tile.get(thread_idx_x(), thread_idx_y());
    out[gy * width + gx] = val * 2.0;
}
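To make the indexing convention of the example concrete, here is a sequential host-side emulation. HostTile and emulate are hypothetical illustrations, not this crate's types. The sketch assumes SharedTile's generic parameters are <T, W, H> and that set(x, y, v) stores at row y, column x, which matches the tile[threadIdx.y][threadIdx.x] subscripts in the generated CUDA. Like the DSL example, it assumes the image dimensions are whole multiples of 16.

```rust
/// Hypothetical host stand-in for a W x H shared tile, stored row-major.
struct HostTile<T: Copy + Default, const W: usize, const H: usize> {
    data: [[T; W]; H],
}

impl<T: Copy + Default, const W: usize, const H: usize> HostTile<T, W, H> {
    fn new() -> Self {
        Self { data: [[T::default(); W]; H] }
    }
    /// `x` selects the column and `y` the row, so this writes `data[y][x]`.
    fn set(&mut self, x: usize, y: usize, v: T) {
        self.data[y][x] = v;
    }
    fn get(&self, x: usize, y: usize) -> T {
        self.data[y][x]
    }
}

/// Sequentially emulates the example kernel over a grid of 16x16 blocks.
fn emulate(data: &[f32], out: &mut [f32], width: usize, height: usize) {
    const TILE: usize = 16;
    assert!(width % TILE == 0 && height % TILE == 0, "example assumes whole tiles");
    for by in 0..height / TILE {
        for bx in 0..width / TILE {
            let mut tile = HostTile::<f32, TILE, TILE>::new();
            // Phase 1: every "thread" (tx, ty) loads one element.
            for ty in 0..TILE {
                for tx in 0..TILE {
                    let (gx, gy) = (bx * TILE + tx, by * TILE + ty);
                    tile.set(tx, ty, data[gy * width + gx]);
                }
            }
            // All loads for this block are complete here, as after sync_threads().
            // Phase 2: every thread reads its element back and writes the result.
            for ty in 0..TILE {
                for tx in 0..TILE {
                    let (gx, gy) = (bx * TILE + tx, by * TILE + ty);
                    out[gy * width + gx] = tile.get(tx, ty) * 2.0;
                }
            }
        }
    }
}

fn main() {
    let (width, height) = (32, 16);
    let data: Vec<f32> = (0..width * height).map(|i| i as f32).collect();
    let mut out = vec![0.0; width * height];
    emulate(&data, &mut out, width, height);
    println!("out[17] = {}", out[17]); // out[17] = 34
}
```

Running the whole load phase before the read phase is what the barrier guarantees on the GPU; a real kernel for arbitrary sizes would also need bounds checks on gx and gy.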

§Generated CUDA

The DSL above generates the following kernel body:

__shared__ float tile[16][16];
int gx = blockIdx.x * 16 + threadIdx.x;
int gy = blockIdx.y * 16 + threadIdx.y;
tile[threadIdx.y][threadIdx.x] = data[gy * width + gx];
__syncthreads();
float val = tile[threadIdx.y][threadIdx.x];
out[gy * width + gx] = val * 2.0f;
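The key detail in this translation is the swap of argument order: the accessors take the column (x) first, while the CUDA array is declared as [rows][cols], so subscripts are emitted as [y][x]. A minimal sketch of that lowering step follows. The functions lower_tile_get, lower_tile_set, and lower_intrinsic are hypothetical and work on plain strings; the actual transpiler's internals are not shown here.

```rust
/// Hypothetical lowering of `tile.get(x, y)` to a CUDA subscript expression.
/// The row index (y) comes first because the array is declared [rows][cols].
fn lower_tile_get(tile: &str, x: &str, y: &str) -> String {
    format!("{tile}[{y}][{x}]")
}

/// Hypothetical lowering of `tile.set(x, y, value)` to a CUDA assignment.
fn lower_tile_set(tile: &str, x: &str, y: &str, value: &str) -> String {
    format!("{tile}[{y}][{x}] = {value};")
}

/// Maps the DSL intrinsics used in the example to their CUDA built-ins.
fn lower_intrinsic(call: &str) -> Option<&'static str> {
    match call {
        "thread_idx_x()" => Some("threadIdx.x"),
        "thread_idx_y()" => Some("threadIdx.y"),
        "block_idx_x()" => Some("blockIdx.x"),
        "block_idx_y()" => Some("blockIdx.y"),
        "sync_threads()" => Some("__syncthreads()"),
        _ => None,
    }
}

fn main() {
    let x = lower_intrinsic("thread_idx_x()").unwrap();
    let y = lower_intrinsic("thread_idx_y()").unwrap();
    // tile[threadIdx.y][threadIdx.x] = data[gy * width + gx];
    println!("{}", lower_tile_set("tile", x, y, "data[gy * width + gx]"));
    // float val = tile[threadIdx.y][threadIdx.x];
    println!("float val = {};", lower_tile_get("tile", x, y));
}
```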

§Structs

SharedArray
A 1D shared memory array.
SharedMemoryConfig
Shared memory configuration for a kernel.
SharedMemoryDecl
Information about a shared memory declaration for transpilation.
SharedTile
A 2D shared memory tile.
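As a rough illustration of what a declaration record and a per-kernel configuration might hold, the sketch below uses hypothetical stand-ins Decl and Config; their fields are assumptions and do not reflect the actual fields of SharedMemoryDecl or SharedMemoryConfig. It emits a __shared__ declaration and checks the total against the 48 KB per-block cap on static shared memory.

```rust
/// Hypothetical stand-in for a shared memory declaration (fields are assumptions).
struct Decl {
    name: String,
    cuda_type: &'static str,
    elem_bytes: usize,
    dims: Vec<usize>, // outermost first, e.g. [rows, cols] for a tile
}

impl Decl {
    /// Emits e.g. `__shared__ float tile[16][16];`.
    fn to_cuda(&self) -> String {
        let subscripts: String = self.dims.iter().map(|d| format!("[{d}]")).collect();
        format!("__shared__ {} {}{};", self.cuda_type, self.name, subscripts)
    }
    fn bytes(&self) -> usize {
        self.elem_bytes * self.dims.iter().product::<usize>()
    }
}

/// Static __shared__ allocations are limited to 48 KB per block.
const STATIC_LIMIT_BYTES: usize = 48 * 1024;

/// Hypothetical per-kernel configuration: all declarations plus a size check.
struct Config {
    decls: Vec<Decl>,
}

impl Config {
    fn total_bytes(&self) -> usize {
        self.decls.iter().map(Decl::bytes).sum()
    }
    fn fits_static_limit(&self) -> bool {
        self.total_bytes() <= STATIC_LIMIT_BYTES
    }
}

fn main() {
    let tile = Decl { name: "tile".into(), cuda_type: "float", elem_bytes: 4, dims: vec![16, 16] };
    let buf = Decl { name: "buf".into(), cuda_type: "float", elem_bytes: 4, dims: vec![256] };
    println!("{}", tile.to_cuda()); // __shared__ float tile[16][16];
    let cfg = Config { decls: vec![tile, buf] };
    println!("{} bytes, fits: {}", cfg.total_bytes(), cfg.fits_static_limit()); // 2048 bytes, fits: true
}
```

A 16x16 f32 tile costs 1 KB, so many such tiles fit in one block, whereas a single 128x128 f32 tile (64 KB) already exceeds the static limit.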

§Functions

parse_shared_array_type
Parse a SharedArray type to extract size.
parse_shared_tile_type
Parse a SharedTile type to extract dimensions.
rust_to_cuda_element_type
Map Rust element type to CUDA type for shared memory.
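The three functions above can be approximated as follows. This is a sketch under stated assumptions: the hypothetical helpers parse_tile, parse_array, and rust_to_cuda operate on plain type strings such as "SharedTile<f32, 16, 16>" rather than on whatever parsed representation the real functions accept, and the type mapping table is an assumption covering common primitives only.

```rust
/// Hypothetical mapping from a Rust primitive to a CUDA element type.
fn rust_to_cuda(elem: &str) -> Option<&'static str> {
    Some(match elem {
        "f32" => "float",
        "f64" => "double",
        "i32" => "int",
        "u32" => "unsigned int",
        "i64" => "long long",
        "u64" => "unsigned long long",
        "bool" => "bool",
        _ => return None,
    })
}

/// Parses "SharedTile<elem, A, B>" into (elem, A, B); returns None on malformed input.
fn parse_tile(ty: &str) -> Option<(&str, usize, usize)> {
    let inner = ty.trim().strip_prefix("SharedTile<")?.strip_suffix('>')?;
    let parts: Vec<&str> = inner.split(',').map(str::trim).collect();
    match parts.as_slice() {
        [elem, a, b] => Some((*elem, a.parse::<usize>().ok()?, b.parse::<usize>().ok()?)),
        _ => None,
    }
}

/// Parses "SharedArray<elem, N>" into (elem, N).
fn parse_array(ty: &str) -> Option<(&str, usize)> {
    let inner = ty.trim().strip_prefix("SharedArray<")?.strip_suffix('>')?;
    let (elem, n) = inner.split_once(',')?;
    Some((elem.trim(), n.trim().parse::<usize>().ok()?))
}

fn main() {
    let (elem, a, b) = parse_tile("SharedTile<f32, 16, 16>").unwrap();
    let cuda = rust_to_cuda(elem).unwrap();
    println!("{cuda} [{a}][{b}]"); // float [16][16]
    println!("{:?}", parse_array("SharedArray<i32, 256>")); // Some(("i32", 256))
}
```

Returning Option lets a caller report an unsupported element type or a malformed generic list as a transpilation error instead of emitting invalid CUDA.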