Global kernel deduplication cache.
This module provides a global concurrent cache that maps (UOp ID, device) pairs to compiled kernels. It uses papaya’s lock-free HashMap for thread-safe access across parallel tensor operations.
§Thread Safety
All operations are thread-safe. Multiple threads can look up and compile kernels concurrently without explicit synchronization.
§Deduplication
Thanks to hash consing in ir/src/uop/hash_consing.rs, identical ASTs automatically
have identical IDs, making kernel deduplication trivial. The key includes both the
AST ID and the device string to support multi-GPU systems where the same kernel
might be compiled differently for different devices.
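The keying scheme described above can be sketched with the standard library alone. This is an illustration, not the module's implementation: `SketchKernel`, `SketchCache`, and `kernel_count` are hypothetical names, the `u64` ID type is an assumption, and a `std::sync::RwLock` around a `HashMap` stands in for papaya's lock-free map so that the example has no external dependencies.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a compiled kernel; the real `CachedKernel`
/// fields are not documented here.
#[derive(Debug)]
pub struct SketchKernel {
    pub ast_id: u64,
    pub device: String,
}

/// Cache keyed by (AST ID, device string), mirroring the key described above.
/// Assumption: an RwLock-guarded HashMap replaces papaya's lock-free map.
#[derive(Default)]
pub struct SketchCache {
    map: RwLock<HashMap<(u64, String), Arc<SketchKernel>>>,
}

impl SketchCache {
    /// Returns the cached kernel for the key, or runs `compile` and stores
    /// the result. `compile` runs at most once per key.
    pub fn get_or_compile<F>(&self, ast_id: u64, device: &str, compile: F) -> Arc<SketchKernel>
    where
        F: FnOnce() -> SketchKernel,
    {
        let key = (ast_id, device.to_string());

        // Fast path: a shared read lock, released at the end of this statement.
        let hit = self.map.read().unwrap().get(&key).cloned();
        if let Some(kernel) = hit {
            return kernel;
        }

        // Slow path: `entry` re-checks under the write lock, so a kernel
        // inserted by another thread in the meantime is reused.
        let mut guard = self.map.write().unwrap();
        Arc::clone(guard.entry(key).or_insert_with(|| Arc::new(compile())))
    }

    /// Number of cached kernels.
    pub fn kernel_count(&self) -> usize {
        self.map.read().unwrap().len()
    }
}
```

Because the device string is part of the key, the same AST ID requested for two devices yields two entries, while a repeated request for the same pair returns the existing `Arc` without recompiling. Unlike a lock-free map, this sketch holds the write lock while compiling, which serializes compilation across keys.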
Structs§
- CachedKernel - Cached kernel that can be reused across tensors.
Functions§
- clear_all - Clear all cached kernels.
- gc_unused_kernels - Remove kernels whose AST IDs are no longer in the live UOp set.
- get_or_compile_kernel - Get or compile a kernel by UOp ID and device.
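
The garbage-collection step listed above amounts to retaining only the entries whose AST ID is still live. A minimal sketch under stated assumptions: `gc_unused` is a hypothetical helper, the `u64` ID type and the plain `HashMap` are stand-ins, and the actual signature of `gc_unused_kernels` is not shown on this page.

```rust
use std::collections::{HashMap, HashSet};

/// Drops every entry whose AST ID is not in `live_ids` and returns how many
/// entries were removed. Hypothetical helper; not the module's actual API.
pub fn gc_unused<V>(cache: &mut HashMap<(u64, String), V>, live_ids: &HashSet<u64>) -> usize {
    let before = cache.len();
    // The key is (AST ID, device); only the AST ID decides liveness, so all
    // per-device variants of a dead AST are removed together.
    cache.retain(|(ast_id, _device), _kernel| live_ids.contains(ast_id));
    before - cache.len()
}
```

In this sketch, a cache holding AST 1 on two devices and AST 2 on one device, collected with a live set containing only ID 1, keeps both variants of AST 1 and drops the single AST 2 entry.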