Cosine similarity for f32 slices, built for hot loops. The core is plain scalar code that the compiler auto-vectorizes well; precompute_norm computes a vector's norm once so that repeated queries against it stay cheap.
    use cosine_fast::{cosine, batch_cosine};

    let a = vec![1.0f32, 2.0, 3.0];
    let b = vec![2.0, 4.0, 6.0];
    // b is a positive multiple of a, so the similarity is 1 (up to rounding).
    assert!((cosine(&a, &b) - 1.0).abs() < 1e-5);
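To illustrate why precomputing a norm helps with repeated queries, here is a minimal standalone sketch. It does not reproduce this crate's source or its exact signatures: the names dot, l2_norm, cosine_with_norms and score_all are hypothetical, and returning 0.0 for a zero-length vector is an assumption made for the example.

```rust
/// Dot product of two equal-length slices (hypothetical helper).
/// A simple zip/map/sum loop like this is the kind of scalar code
/// compilers can often auto-vectorize.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) norm of a slice.
fn l2_norm(v: &[f32]) -> f32 {
    dot(v, v).sqrt()
}

/// Cosine similarity when both norms are already known.
/// Assumption for this sketch: a zero norm yields 0.0 instead of NaN.
fn cosine_with_norms(a: &[f32], a_norm: f32, b: &[f32], b_norm: f32) -> f32 {
    let denom = a_norm * b_norm;
    if denom == 0.0 {
        0.0
    } else {
        dot(a, b) / denom
    }
}

/// Scores one query against many candidates.
/// The query norm is computed once, outside the loop, and reused.
fn score_all(query: &[f32], candidates: &[Vec<f32>]) -> Vec<f32> {
    let q_norm = l2_norm(query);
    candidates
        .iter()
        .map(|c| cosine_with_norms(query, q_norm, c, l2_norm(c)))
        .collect()
}
```

Calling score_all with one query and N candidates computes the query norm once instead of N times. If the candidates are also long-lived, their norms could be cached in the same way, leaving a single dot product per comparison.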
Zero deps. MIT or Apache-2.0.