//! Hand-optimized compute kernels for performance-critical operations.
//!
//! Each submodule contains kernel source code as `const &str` for a specific
//! dialect (MSL or CUDA). These are compiled at runtime and cached by the
//! respective backend.