pub fn simhash_from_sparse_vector(entries: &[(u32, f32)]) -> u64
Compute SimHash from a sparse vector’s dimension IDs and weights.
Each (dim_id, weight) pair contributes to a 64-bit fingerprint:
- Hash dim_id with Stafford mix13 to get a 64-bit pseudo-random mask
- For each bit, accumulate +weight or -weight based on that bit
- Final hash: bit i = 1 iff accumulator[i] > 0
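
The steps above can be sketched roughly as follows. This is an illustrative re-implementation, not the crate's actual source: the names `mix13` and `simhash_sketch` are hypothetical, the bit polarity (+weight when the mask bit is 1) is assumed from the usual SimHash convention, and the real function may seed or salt `dim_id` before mixing.

```rust
/// Stafford's "Mix13" 64-bit finalizer (the mixer also used by SplitMix64).
fn mix13(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d049bb133111eb);
    z ^ (z >> 31)
}

/// Hypothetical sketch of the documented algorithm.
fn simhash_sketch(entries: &[(u32, f32)]) -> u64 {
    // One signed accumulator per output bit.
    let mut acc = [0.0f32; 64];

    for &(dim_id, weight) in entries {
        // Assumption: dim_id is mixed directly, with no seed or offset.
        let mask = mix13(dim_id as u64);
        for (i, a) in acc.iter_mut().enumerate() {
            // Assumption: a set mask bit adds the weight, a clear bit subtracts it.
            if (mask >> i) & 1 == 1 {
                *a += weight;
            } else {
                *a -= weight;
            }
        }
    }

    // Bit i of the fingerprint is 1 iff accumulator[i] is strictly positive.
    let mut hash = 0u64;
    for (i, &a) in acc.iter().enumerate() {
        if a > 0.0 {
            hash |= 1u64 << i;
        }
    }
    hash
}
```

Under these assumptions an empty slice hashes to 0, and a single entry with a positive weight hashes to exactly the mask of its `dim_id`. When one entry's weight exceeds the sum of all the others, its mask alone decides every bit.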
Documents whose sparse vectors share most of their heavily weighted dimensions tend to produce hashes that differ in only a few bits (small Hamming distance), which enables block-reorder clustering.
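
As a rough illustration of how such fingerprints might be compared, the sketch below computes the Hamming distance between two hashes and orders documents by fingerprint. Both `hamming_distance` and `order_by_fingerprint` are hypothetical helpers, and sorting by numeric hash value is only one crude way to place similar fingerprints near each other. It is not necessarily how this crate performs block reordering.

```rust
/// Number of bit positions in which two 64-bit fingerprints differ.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical helper: document indices sorted by fingerprint value.
/// Fingerprints that agree on their high-order bits end up adjacent.
fn order_by_fingerprint(hashes: &[u64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..hashes.len()).collect();
    // Stable sort, so documents with equal fingerprints keep their input order.
    order.sort_by_key(|&i| hashes[i]);
    order
}
```

A distance of 0 means the fingerprints are identical and 64 means every bit differs. Any threshold for treating two documents as "similar" would have to be tuned for the data at hand.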