pub fn bf16_dot(a: &[BFloat16], b: &[BFloat16]) -> f32
Computes the dot product of two bf16 slices. Each product is accumulated in f32, and the f32 sum is returned.
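
As an illustration of what "accumulated in f32" means, here is a minimal sketch and not the actual implementation. The `BFloat16` struct below is a hypothetical stand-in (a `u16` holding the upper 16 bits of an f32), and its `from_f32` and `to_f32` helpers are assumed names, not the real type's API. The source does not say how mismatched slice lengths are handled; this sketch stops at the shorter slice, which is also an assumption, as is the plain left-to-right summation order.

```rust
/// Hypothetical stand-in for `BFloat16`: the upper 16 bits of an IEEE 754 f32.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BFloat16(pub u16);

impl BFloat16 {
    /// Assumed helper: truncating conversion that drops the low 16 bits (no rounding).
    pub fn from_f32(x: f32) -> Self {
        BFloat16((x.to_bits() >> 16) as u16)
    }

    /// Assumed helper: exact widening, placing the bf16 bits in the high half of an f32.
    pub fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }
}

/// Sketch of a bf16 dot product with an f32 accumulator.
/// Assumption: when lengths differ, extra elements of the longer slice are ignored.
pub fn bf16_dot(a: &[BFloat16], b: &[BFloat16]) -> f32 {
    a.iter()
        .zip(b.iter())
        // Widen both operands to f32 before multiplying, so the product
        // and the running sum are both kept in f32 rather than bf16.
        .map(|(x, y)| x.to_f32() * y.to_f32())
        .sum()
}
```

With this sketch, the slices `[1.0, 2.0, 3.0]` and `[4.0, 5.0, 6.0]` (converted through the assumed `from_f32`) give `32.0`, since every input and partial sum is exactly representable. A production version might use SIMD or a different summation order, which can change the low bits of the result.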