Function simhash_from_sparse_vector 

pub fn simhash_from_sparse_vector(entries: &[(u32, f32)]) -> u64

Compute SimHash from a sparse vector’s dimension IDs and weights.

Each (dim_id, weight) pair contributes to a 64-bit fingerprint:

  • Hash dim_id with Stafford mix13 to get a 64-bit pseudo-random mask
  • For each bit position i, add weight to accumulator[i] if bit i of the mask is set, otherwise subtract weight
  • Final hash: bit i = 1 iff accumulator[i] > 0
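The steps above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the crate's source. `stafford_mix13` and `simhash_sketch` are hypothetical names. The mixing constants are the published Stafford Mix13 parameters, the same variant used as the SplitMix64 finalizer. The sketch also assumes each `dim_id` is widened to `u64` and mixed directly, with no seed or offset, which this page does not state.

```rust
/// Stafford's "Mix13" 64-bit finalizer (the variant used by SplitMix64).
pub fn stafford_mix13(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    z ^ (z >> 31)
}

/// Hypothetical sketch of a weighted SimHash over (dim_id, weight) pairs.
pub fn simhash_sketch(entries: &[(u32, f32)]) -> u64 {
    // One signed accumulator per output bit.
    let mut acc = [0.0f32; 64];
    for &(dim_id, weight) in entries {
        // Assumption: dim_id is mixed directly, with no seed or offset.
        let mask = stafford_mix13(dim_id as u64);
        for (i, slot) in acc.iter_mut().enumerate() {
            if (mask >> i) & 1 == 1 {
                *slot += weight; // bit i of the mask is set: vote +weight
            } else {
                *slot -= weight; // bit i of the mask is clear: vote -weight
            }
        }
    }
    // Bit i of the fingerprint is 1 iff its accumulator is strictly positive.
    acc.iter()
        .enumerate()
        .fold(0u64, |hash, (i, &a)| if a > 0.0 { hash | (1u64 << i) } else { hash })
}
```

Because each mask depends only on `dim_id`, two vectors that carry most of their weight on the same dimensions push every accumulator in nearly the same direction. Only the bits whose accumulators sit close to zero tend to differ. A negative weight inverts that dimension's vote, and an empty input yields `0` because no accumulator is strictly positive.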

Documents with similar dimension sets (and similar weights on those dimensions) produce fingerprints with a small Hamming distance, which enables block-reorder clustering.
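Fingerprint similarity is measured by Hamming distance: the number of bit positions at which two fingerprints differ. The following minimal Rust sketch shows one way to compute it. `hamming_distance` and `is_near` are hypothetical helpers, and `max_bits` is an illustrative threshold parameter, not a value taken from this crate.

```rust
/// Number of bit positions at which two 64-bit fingerprints differ.
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    // XOR leaves a 1 exactly where the fingerprints disagree.
    (a ^ b).count_ones()
}

/// Hypothetical near-duplicate test: true if the fingerprints differ in at most
/// `max_bits` positions. The threshold is illustrative, not from this crate.
pub fn is_near(a: u64, b: u64, max_bits: u32) -> bool {
    hamming_distance(a, b) <= max_bits
}
```

For example, `hamming_distance(0b1011, 0b0001)` is 2, and two identical fingerprints are at distance 0.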