Module kernel_cache


Global kernel deduplication cache.

This module provides a global concurrent cache that maps (UOp ID, device) pairs to compiled kernels. It uses papaya's lock-free HashMap so that parallel tensor operations can access the cache safely from multiple threads.

§Thread Safety

All operations are thread-safe. Multiple threads can look up and compile kernels concurrently without any explicit synchronization on the caller's side.
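
The calling pattern this implies can be sketched as follows. This is a minimal, std-only illustration and not the module's implementation: it swaps papaya's lock-free map for a plain `Mutex<HashMap>`, and the names `SketchCache` and `get_or_compile`, the `u64` ID type, and the `source` field on `CachedKernel` are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the module's `CachedKernel`; the real fields are not shown in the docs.
#[derive(Debug)]
pub struct CachedKernel {
    pub source: String,
}

/// Sketch of a shared cache keyed by (UOp ID, device string).
/// A `Mutex<HashMap>` replaces the lock-free map purely to keep the example std-only.
pub struct SketchCache {
    map: Mutex<HashMap<(u64, String), Arc<CachedKernel>>>,
}

impl SketchCache {
    pub fn new() -> Self {
        SketchCache { map: Mutex::new(HashMap::new()) }
    }

    /// Return the cached kernel for this key, running `compile` only if the key is absent.
    /// The lock is held while compiling, so `compile` runs at most once per key,
    /// at the cost of serializing compilation across all keys.
    pub fn get_or_compile<F>(&self, uop_id: u64, device: &str, compile: F) -> Arc<CachedKernel>
    where
        F: FnOnce() -> CachedKernel,
    {
        let mut map = self.map.lock().unwrap();
        map.entry((uop_id, device.to_string()))
            .or_insert_with(|| Arc::new(compile()))
            .clone()
    }
}
```

Callers share one `SketchCache` by reference (or behind an `Arc`) and never take a lock themselves. A lock-free map avoids the serialization noted in the comment, which is presumably why the module uses one.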

§Deduplication

Thanks to hash consing in ir/src/uop/hash_consing.rs, identical ASTs automatically have identical IDs, so kernel deduplication reduces to a lookup by ID. The key includes both the AST ID and the device string to support multi-GPU systems, where the same kernel might be compiled differently for different devices.
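
To illustrate why hash consing makes the ID a sufficient key, here is a toy interner. It is a sketch under stated assumptions, not the code in hash_consing.rs: the `Node` enum, the `Interner` type, the `cache_key` helper, and the `u64` ID type are all hypothetical, as are the device strings used with it.

```rust
use std::collections::HashMap;

/// Toy AST node; children are referenced by their interned IDs.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Node {
    Const(i64),
    Add(u64, u64),
    Mul(u64, u64),
}

/// Hypothetical interner: structurally equal nodes always receive the same ID.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<Node, u64>,
}

impl Interner {
    /// Return the existing ID for `node`, or assign the next free one.
    pub fn intern(&mut self, node: Node) -> u64 {
        let next = self.ids.len() as u64;
        *self.ids.entry(node).or_insert(next)
    }
}

/// Build a cache key from an AST ID and a device string.
pub fn cache_key(ast_id: u64, device: &str) -> (u64, String) {
    (ast_id, device.to_string())
}
```

Building the same expression twice through one `Interner` yields the same root ID, so `cache_key(id, "CUDA:0")` hits the same cache entry both times, while `cache_key(id, "CUDA:1")` is a distinct entry for a second device.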

Structs§

CachedKernel: Cached kernel that can be reused across tensors.

Functions§

clear_all: Clear all cached kernels.
gc_unused_kernels: Remove kernels whose AST IDs are no longer in the live UOp set.
get_or_compile_kernel: Get or compile a kernel by UOp ID and device.
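
The garbage-collection step described for gc_unused_kernels can be sketched as a filter over the cache by AST ID. This is an illustration only: the function name `gc_unused`, its signature, the returned count, and the use of a plain `HashMap` with `String` values in place of compiled kernels are assumptions, since the real signatures are not shown here.

```rust
use std::collections::{HashMap, HashSet};

/// Drop every entry whose AST ID is not in `live`, across all devices.
/// Returns how many entries were removed (an assumed convenience for the sketch).
pub fn gc_unused(cache: &mut HashMap<(u64, String), String>, live: &HashSet<u64>) -> usize {
    let before = cache.len();
    // Only the ID half of the key matters; the device string is ignored.
    cache.retain(|(id, _device), _kernel| live.contains(id));
    before - cache.len()
}
```

Because the key is an (ID, device) pair, one dead AST ID can remove several entries, one per device it was compiled for.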