cosine-fast 0.1.0

Hot-loop cosine similarity for f32 slices. Auto-vectorized scalar core, optional precompute-norms helper. Zero deps.
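The tagline names two ideas: a scalar loop the compiler can auto-vectorize, and a helper that precomputes norms so repeated comparisons only pay for a dot product. Below is a minimal sketch of both, assuming cosine(a, b) = dot(a, b) / (|a| * |b|). The functions `dot`, `norm`, `cosine`, and `cosine_prenormed` are hypothetical names for illustration, not the crate's actual API. Returning 0.0 when either vector has zero norm is also an assumption.

```rust
/// Hypothetical sketch: dot product written as plain scalar code over
/// fixed-size chunks so LLVM can auto-vectorize it (no SIMD intrinsics).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    // Eight independent accumulators break the serial dependency on a single
    // running sum; without fast-math, LLVM will not reorder float additions
    // itself, so this is what lets it fill SIMD lanes.
    let mut acc = [0.0f32; 8];
    let chunks_a = a.chunks_exact(8);
    let chunks_b = b.chunks_exact(8);
    // Grab the leftover tails (len % 8 elements) before the iterators are consumed.
    let rem_a = chunks_a.remainder();
    let rem_b = chunks_b.remainder();
    for (ca, cb) in chunks_a.zip(chunks_b) {
        for ((s, x), y) in acc.iter_mut().zip(ca).zip(cb) {
            *s += x * y;
        }
    }
    // Fold the lane accumulators, then handle the scalar tail.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in rem_a.iter().zip(rem_b) {
        sum += x * y;
    }
    sum
}

/// Euclidean norm, reusing the vectorizable dot product.
fn norm(a: &[f32]) -> f32 {
    dot(a, a).sqrt()
}

/// Cosine similarity computing both norms on every call.
/// Assumption: returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    cosine_prenormed(a, norm(a), b, norm(b))
}

/// Hypothetical "precompute-norms" variant: callers pass norms computed
/// once up front, so the hot loop only does one dot product per pair.
fn cosine_prenormed(a: &[f32], norm_a: f32, b: &[f32], norm_b: f32) -> f32 {
    let denom = norm_a * norm_b;
    if denom == 0.0 {
        0.0
    } else {
        dot(a, b) / denom
    }
}

fn main() {
    let query = [0.1f32, 0.7, 0.2, 0.9];
    let corpus: Vec<Vec<f32>> = vec![
        vec![0.1, 0.7, 0.2, 0.9],
        vec![0.9, 0.1, 0.8, 0.0],
        vec![-0.1, -0.7, -0.2, -0.9],
    ];
    // Compute every norm once, outside the comparison loop.
    let query_norm = norm(&query);
    let corpus_norms: Vec<f32> = corpus.iter().map(|v| norm(v)).collect();
    for (v, &n) in corpus.iter().zip(&corpus_norms) {
        let score = cosine_prenormed(&query, query_norm, v, n);
        println!("{score:.4}");
    }
    // One-off comparisons can use the plain version.
    println!("{:.4}", cosine(&query, &corpus[1]));
}
```

The multiple-accumulator layout changes the order of floating-point additions, so results can differ from a naive left-to-right loop in the last few bits. Precomputing norms pays off when one query is compared against many stored vectors, since each norm would otherwise be recomputed on every call.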