Global cache for compiled CUDA modules and kernel functions.
Without caching, every call to a GPU kernel (e.g. gpu_add, gpu_conv2d_f32,
gpu_flash_attention_f32) recompiles its PTX source into a CUBIN via
CudaContext::load_module(Ptx::from_src(...)). This compilation takes
roughly 1700 µs per call, far longer than the actual kernel execution.
This module provides get_or_compile, which compiles the PTX only on
first use and returns a cached CudaFunction on subsequent calls. The
cache is keyed by the static kernel name string, which is unique per
kernel entry point in this crate.
§Thread safety
The cache uses a global Mutex-protected HashMap. The critical
section is short (a hash lookup + optional insert), so contention is
negligible in practice.
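The pattern above can be sketched in plain Rust. This is a hedged simplification: the `Function` struct and `compile` function below are hypothetical stand-ins for cudarc's `CudaFunction` and the real PTX-to-CUBIN compilation, so the sketch runs without a GPU while preserving the cache logic.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Stand-in for the compiled-kernel handle; the real module would cache
// a CudaFunction here. (Hypothetical type for illustration.)
#[derive(Clone, Debug, PartialEq)]
struct Function(String);

// Global cache keyed by the static kernel name string.
static CACHE: OnceLock<Mutex<HashMap<&'static str, Function>>> = OnceLock::new();

// Stand-in for the expensive PTX -> CUBIN compilation step.
fn compile(name: &str) -> Function {
    Function(format!("compiled:{name}"))
}

// Compile on first use; return the cached handle on later calls.
// The critical section is a hash lookup plus an optional insert.
fn get_or_compile(name: &'static str) -> Function {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    map.entry(name).or_insert_with(|| compile(name)).clone()
}

fn main() {
    let first = get_or_compile("gpu_add");
    let second = get_or_compile("gpu_add"); // cache hit: no recompilation
    assert_eq!(first, second);
    println!("{first:?}");
}
```

Cloning the handle out of the map keeps the lock held only for the lookup, so kernel launches never contend on the mutex.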
Functions§
- get_or_compile — Get a compiled kernel function, compiling the PTX only on first use.