pub fn transpose(m: [u32x8; 8]) -> [u32x8; 8]
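Transposes an 8x8 matrix of u32 values stored as eight u32x8 SIMD rows: lane j of output row i is lane i of input row j. See https://stackoverflow.com/questions/25622745/transpose-an-8x8-float-using-avx-avx2 for the AVX/AVX2 technique.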
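
To make the expected result concrete, here is a minimal scalar sketch of the same operation. It is not the library's implementation: the `Row` alias (a plain `[u32; 8]` standing in for `u32x8`) and the function name `transpose_scalar` are hypothetical and exist only for illustration. A real SIMD version would presumably use shuffle and permute instructions rather than element-wise copies.

```rust
/// Hypothetical stand-in for `u32x8`: eight `u32` lanes in a plain array.
pub type Row = [u32; 8];

/// Scalar reference transpose of an 8x8 matrix given as eight rows.
/// Element (r, c) of the input ends up at (c, r) in the output.
pub fn transpose_scalar(m: [Row; 8]) -> [Row; 8] {
    let mut out = [[0u32; 8]; 8];
    for (r, row) in m.iter().enumerate() {
        for (c, &value) in row.iter().enumerate() {
            // Swap the row and column indices.
            out[c][r] = value;
        }
    }
    out
}
```

A reference like this can be useful as a test oracle: feed the same rows to the SIMD `transpose` and to the scalar version and compare lane by lane. Transposing twice should also give back the original matrix.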
u32x8