SIMD distance kernels for Quiver — cosine, squared-L2, and inner product over
f32 and i8, plus Hamming distance over packed-bit (u64) vectors, with
runtime CPU-feature dispatch and a scalar fallback.
Each public function picks the best available implementation on every call;
the check is cheap because std caches is_x86_feature_detected! results. When
no SIMD path applies, the function falls back to the scalar implementation.
The SIMD paths are differential-tested against the
scalar reference. Design: docs/index/distance-kernels.md, ADR-0009.
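
The dispatch pattern described above might look roughly like the following. This is a minimal sketch, not Quiver's actual code: the names `l2_sq`, `l2_sq_scalar`, `l2_sq_avx2`, and `hamming` are hypothetical, AVX2 is assumed as the only SIMD tier, and the real kernels are documented in the design doc.

```rust
/// Scalar reference: squared L2 distance. (Hypothetical name.)
fn l2_sq_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// AVX2 path: 8 lanes per iteration, scalar tail for the remainder.
/// Assumption: AVX2 is the only SIMD tier shown here.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn l2_sq_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let chunks = a.len() / 8;
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            // Unaligned loads: callers are not required to align their slices.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            let d = _mm256_sub_ps(va, vb);
            acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
        }
        // Horizontal sum via a store to a stack array.
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        // Scalar tail for lengths that are not a multiple of 8.
        for i in chunks * 8..a.len() {
            let d = a[i] - b[i];
            sum += d * d;
        }
        sum
    }
}

/// Public entry point: checks CPU features on each call, then dispatches.
pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { l2_sq_avx2(a, b) };
        }
    }
    l2_sq_scalar(a, b)
}

/// Hamming distance over packed-bit vectors: popcount of the XOR.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

A differential test in this style would compare `l2_sq` against `l2_sq_scalar` on the same inputs. With general floating-point data the two paths sum in different orders, so such a test would normally compare within a tolerance rather than for exact equality.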