pub fn matmul_f16w_kernel(device: &Device) -> Option<&'static Kernel>
Matmul kernel with f16 weights and f32 compute. Returns Some only when the
device exposes the SHADER_F16 feature, and None otherwise. EXPERIMENTAL:
currently slower than the f32 baseline on Apple Silicon. It is kept as a
foundation; see matmul_f16w.wgsl for the empirical analysis.
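
Because the function returns an Option, a caller would typically fall back to an f32 kernel when the feature is missing. The sketch below illustrates one way to do that. It is not the crate's actual code: the Device and Kernel definitions, the shader_f16 field, the kernel names, and the matmul_f32_kernel and select_matmul_kernel helpers are all hypothetical stand-ins so the example compiles on its own. Since the f16-weight path is documented as experimental and currently slower on Apple Silicon, the sketch makes it opt-in rather than the default.

```rust
// Hypothetical stand-ins for the crate's real `Device` and `Kernel` types.
struct Device {
    // Assumption: models whether the device exposes SHADER_F16.
    shader_f16: bool,
}

struct Kernel {
    name: &'static str,
}

static F32_KERNEL: Kernel = Kernel { name: "matmul_f32" };
static F16W_KERNEL: Kernel = Kernel { name: "matmul_f16w" };

// Mirrors the documented contract: `Some` only when SHADER_F16 is available.
fn matmul_f16w_kernel(device: &Device) -> Option<&'static Kernel> {
    if device.shader_f16 {
        Some(&F16W_KERNEL)
    } else {
        None
    }
}

// Hypothetical f32 baseline that is always available.
fn matmul_f32_kernel(_device: &Device) -> &'static Kernel {
    &F32_KERNEL
}

// Use the f16-weight kernel only when the caller opts in and the device
// supports it; otherwise fall back to the f32 baseline.
fn select_matmul_kernel(device: &Device, prefer_f16w: bool) -> &'static Kernel {
    if prefer_f16w {
        if let Some(kernel) = matmul_f16w_kernel(device) {
            return kernel;
        }
    }
    matmul_f32_kernel(device)
}

fn main() {
    let device = Device { shader_f16: true };
    let kernel = select_matmul_kernel(&device, true);
    println!("selected kernel: {}", kernel.name);
}
```

Keeping the fallback in a single helper means call sites never have to handle None themselves, and the opt-in flag can be flipped once the experimental kernel is competitive.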