pub fn dot_simd(a: &GpuMultivector, b: &GpuMultivector) -> f32
Computes the dot product (inner product) of `a` and `b` using a SIMD-optimized code path and returns it as an `f32`.
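The layout of `GpuMultivector` and the body of `dot_simd` are not shown here, so the following is only a minimal sketch of the general technique, not the library's implementation. It assumes a hypothetical stand-in type, `DemoMultivector`, holding 8 `f32` coefficients, and a plain Euclidean coefficient-wise product (a real geometric-algebra inner product may also apply a metric signature). The function name `dot_lanes` and the lane width of 4 are likewise illustrative assumptions.

```rust
/// Hypothetical stand-in for `GpuMultivector`; the real layout is not shown
/// in this document. Assumes 8 `f32` coefficients.
#[derive(Clone, Copy, Debug)]
pub struct DemoMultivector {
    pub coeffs: [f32; 8],
}

/// Illustrative lane-wise dot product (assumed Euclidean, coefficient-wise).
/// Not the library's `dot_simd`.
pub fn dot_lanes(a: &DemoMultivector, b: &DemoMultivector) -> f32 {
    // Assumed lane width: four f32 values fit in one 128-bit SIMD register.
    const LANES: usize = 4;

    // One independent partial sum per lane. Keeping the sums independent
    // is what lets the compiler map the inner loop onto SIMD multiply-adds.
    let mut acc = [0.0f32; LANES];

    // Walk both coefficient arrays in lockstep, LANES elements at a time.
    for (ca, cb) in a
        .coeffs
        .chunks_exact(LANES)
        .zip(b.coeffs.chunks_exact(LANES))
    {
        for i in 0..LANES {
            acc[i] += ca[i] * cb[i];
        }
    }

    // Horizontal reduction: collapse the per-lane partial sums to a scalar.
    acc.iter().sum()
}
```

Calling `dot_lanes(&a, &b)` mirrors the shape of `dot_simd(&a, &b)`: two borrowed operands in, one `f32` out. Because the lanes are summed separately and reduced at the end, the result can differ in the last bits from a strictly left-to-right scalar sum; that trade-off is typical of SIMD reductions in general and is not a confirmed property of `dot_simd`.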