Crate fastlanes

Modules§

scalar: Portable scalar transpose using a 64-bit gather and the classic 8x8 bit-matrix transpose. Used as the fallback when no SIMD implementation is available.
x86: x86-64 transpose implementations: BMI2 (PEXT/PDEP) and AVX-512 VBMI.
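
The "classic 8x8 bit-matrix transpose" mentioned for the scalar module is commonly written as three masked swap steps on a single u64. The following is a minimal sketch of that well-known technique, not the crate's actual code: the function name transpose_8x8_sketch is hypothetical, and the bit convention (bit 8*r + c holds row r, column c) is an assumption made for illustration.

```rust
/// Transpose an 8x8 bit matrix packed into a `u64`.
///
/// Assumed convention (illustrative only): bit `8 * r + c` holds the
/// element at row `r`, column `c`. After the call, that element sits
/// at bit `8 * c + r`.
pub const fn transpose_8x8_sketch(mut x: u64) -> u64 {
    // Step 1: transpose each 2x2 block by swapping its off-diagonal bits.
    let mut t = (x ^ (x >> 7)) & 0x00AA_00AA_00AA_00AA;
    x = x ^ t ^ (t << 7);

    // Step 2: treat each 4x4 block as a 2x2 matrix of 2x2 tiles and
    // swap the off-diagonal tiles.
    t = (x ^ (x >> 14)) & 0x0000_CCCC_0000_CCCC;
    x = x ^ t ^ (t << 14);

    // Step 3: swap the off-diagonal 4x4 quadrants of the full matrix.
    t = (x ^ (x >> 28)) & 0x0000_0000_F0F0_F0F0;
    x ^ t ^ (t << 28)
}
```

Because a transpose is its own inverse, applying the function twice returns the original word, which makes a convenient sanity check.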

Functions§

transpose: Return the corresponding index in a transposed FastLanes vector.
transpose_bits: Transpose 1024 bits into FastLanes layout, dispatching to the best implementation.
untranspose_bits: Untranspose a T-width comparison mask (1024 bits) from FastLanes layout into logical row order, dispatching to the best implementation. For T = u64 this is the canonical FastLanes bit untranspose; for narrower T it undoes the per-lane packing produced by unpack_cmp for that width.
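
To make the index mapping behind transpose concrete, here is a hedged sketch of one way the FastLanes transposed index can be computed for a 1024-element vector. It assumes the layout described in the FastLanes paper (16 lanes, 8 rows per tile, tile order 0, 4, 2, 6, 1, 5, 3, 7). The names FL_ORDER_SKETCH, transpose_index_sketch, and untranspose_index_sketch are hypothetical, and the crate's own function may differ in detail.

```rust
/// Assumed FastLanes tile order ("04261537"). It is the 3-bit
/// bit-reversal permutation, so it is its own inverse.
pub const FL_ORDER_SKETCH: [usize; 8] = [0, 4, 2, 6, 1, 5, 3, 7];

/// Map a logical index in `0..1024` to its position in the
/// transposed layout (sketch under the assumptions stated above).
pub const fn transpose_index_sketch(idx: usize) -> usize {
    let lane = idx % 16; // which of the 16 lanes
    let order = (idx / 16) % 8; // which tile within the row group
    let row = idx / 128; // which of the 8 rows
    lane * 64 + FL_ORDER_SKETCH[order] * 8 + row
}

/// Inverse of `transpose_index_sketch`: map a transposed position
/// back to the logical index.
pub const fn untranspose_index_sketch(t: usize) -> usize {
    let lane = t / 64;
    let order = (t / 8) % 8;
    let row = t % 8;
    row * 128 + FL_ORDER_SKETCH[order] * 16 + lane
}
```

Under these assumptions the mapping is a permutation of 0..1024, so every logical index has exactly one transposed position and the inverse recovers it. This sketch works on element indices only; it does not model the bit-level or T-width handling that transpose_bits and untranspose_bits describe.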