Shared memory management for CUDA kernel emulation
Emulates CUDA shared memory (__shared__) on the CPU, including:
- Static shared memory allocation (known size at compile time)
- Dynamic (extern) shared memory allocation (size provided at launch)
- Bank conflict detection for profiling/debugging
In CUDA, shared memory is per-block SRAM accessible by all threads in a block. On the CPU we emulate this with heap-allocated buffers shared among threads in the same block.
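The idea of a per-block buffer shared among CPU threads can be sketched with the standard library alone. This is an illustrative sketch, not this module's API: `run_block` is a hypothetical helper, the `Mutex`-guarded `Vec` stands in for the block's shared memory, and a `Barrier` plays the role of `__syncthreads()`.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical helper (not part of this module): runs one emulated block of
/// `block_dim` CPU threads that all share a single heap-allocated buffer.
fn run_block(block_dim: usize) -> Vec<u32> {
    // One buffer per block, standing in for a `__shared__` array.
    let shared = Arc::new(Mutex::new(vec![0u32; block_dim]));
    // The barrier emulates `__syncthreads()` for this block.
    let barrier = Arc::new(Barrier::new(block_dim));

    let handles: Vec<_> = (0..block_dim)
        .map(|tid| {
            let shared = Arc::clone(&shared);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Phase 1: each thread writes its own slot.
                // The lock guard is a temporary and is released at the end
                // of this statement, before the barrier.
                shared.lock().unwrap()[tid] = tid as u32;

                // Wait until every thread in the block has written.
                barrier.wait();

                // Phase 2: each thread reads the slot written by its mirror thread.
                let buf = shared.lock().unwrap();
                buf[block_dim - 1 - tid]
            })
        })
        .collect();

    // Collect per-thread results in thread-index order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_block(4)` returns each thread's view of its mirror slot, so the values come back reversed. A real emulator would likely avoid a single lock around the whole buffer; it is used here only to keep the sketch safe and short.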
Structs
- BankConflictDetector - Tracks shared memory access patterns to detect bank conflicts.
- DynamicSharedMemory - Dynamic shared memory allocation.
- SharedMemory - Static shared memory allocation.
Constants
- BANK_WIDTH_BYTES - Size of each bank in bytes (4 bytes = 32 bits, matching CUDA).
- NUM_BANKS - Number of memory banks (matches NVIDIA GPU shared memory banks).
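As a rough illustration of how these two constants combine, the sketch below maps a byte address to a bank and computes a conflict degree for one group of accesses. It is not the BankConflictDetector implementation: the constants are redefined locally, the value 32 for NUM_BANKS is an assumption based on current NVIDIA hardware, and `bank_of` and `conflict_degree` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

// Local stand-ins for the module's constants.
const BANK_WIDTH_BYTES: usize = 4; // 4 bytes per bank word, as documented above
const NUM_BANKS: usize = 32; // ASSUMPTION: 32 banks, as on current NVIDIA GPUs

/// Hypothetical helper: bank index for a byte address in shared memory.
/// Successive 4-byte words map to successive banks, wrapping at NUM_BANKS.
fn bank_of(byte_addr: usize) -> usize {
    (byte_addr / BANK_WIDTH_BYTES) % NUM_BANKS
}

/// Hypothetical helper: worst-case conflict degree for a set of simultaneous
/// accesses, i.e. the largest number of *distinct* words that fall in one bank.
/// Accesses to the same word are counted once, modeling a broadcast.
/// Returns 0 for an empty access list and 1 when there is no conflict.
fn conflict_degree(byte_addrs: &[usize]) -> usize {
    let mut words_per_bank: HashMap<usize, HashSet<usize>> = HashMap::new();
    for &addr in byte_addrs {
        let word = addr / BANK_WIDTH_BYTES;
        words_per_bank
            .entry(word % NUM_BANKS)
            .or_default()
            .insert(word);
    }
    words_per_bank
        .values()
        .map(|words| words.len())
        .max()
        .unwrap_or(0)
}
```

Under these assumptions, 32 threads reading consecutive 4-byte words touch 32 different banks (degree 1), while a stride of 32 words sends every access to bank 0 (degree 32).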