

Function jaccard_similarity 

pub fn jaccard_similarity(a: &str, b: &str) -> f64

Computes the trigram-Jaccard similarity between two strings.

The score is |A ∩ B| / |A ∪ B|, where A and B are the sets of character trigrams extracted from the two inputs. Trigrams are taken over Unicode scalar values via char_indices, so multi-byte UTF-8 input is handled without byte-boundary errors (such as the panic caused by slicing a str in the middle of a character).
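As a worked example of the formula, assume trigrams are the overlapping runs of three consecutive characters with no padding (the description does not say whether inputs are padded), and write T(s) for the trigram set of s (a notation introduced here only for illustration). Comparing "hello" with "help":

$$
\begin{aligned}
A &= T(\texttt{hello}) = \{\texttt{hel}, \texttt{ell}, \texttt{llo}\} \\
B &= T(\texttt{help}) = \{\texttt{hel}, \texttt{elp}\} \\
|A \cap B| &= |\{\texttt{hel}\}| = 1 \\
|A \cup B| &= |\{\texttt{hel}, \texttt{ell}, \texttt{llo}, \texttt{elp}\}| = 4 \\
J(A, B) &= \frac{|A \cap B|}{|A \cup B|} = \frac{1}{4} = 0.25
\end{aligned}
$$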

Edge cases

  • Both inputs empty: returns 1.0. Both trigram sets are empty, so |A ∪ B| = 0 and the ratio itself is undefined; by convention, two empty inputs count as identical.
  • One input empty, the other non-empty: returns 0.0 (no overlap).
  • Identical inputs: returns 1.0.
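The definition and edge cases above can be made concrete with a minimal Rust sketch. It is not the crate's source: the helper trigrams, the choice of HashSet, the unpadded three-character window, and the fallback for inputs that are non-empty but shorter than three characters (which the description does not cover) are all assumptions. The fallback compares the inputs for exact equality, which also reproduces the three documented edge cases.

```rust
use std::collections::HashSet;

/// Hypothetical helper: the set of overlapping three-character windows in `s`.
/// Assumption: no padding, so strings shorter than three chars yield an empty set.
/// Slices borrow from `s`, so the only allocation is the set itself.
fn trigrams(s: &str) -> HashSet<&str> {
    let mut set = HashSet::new();
    // char_indices yields the byte offset of every char, so each window
    // starts on a valid UTF-8 boundary.
    for (start, _) in s.char_indices() {
        let tail = &s[start..];
        // Char boundaries inside `tail` after its first char, then its end.
        let mut boundaries = tail
            .char_indices()
            .skip(1)
            .map(|(off, _)| off)
            .chain(std::iter::once(tail.len()));
        // The third boundary closes a window of exactly three chars; it is
        // missing when fewer than three chars remain.
        if let Some(end) = boundaries.nth(2) {
            set.insert(&tail[..end]);
        }
    }
    set
}

/// Sketch of trigram-Jaccard similarity: |A ∩ B| / |A ∪ B|.
pub fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let ta = trigrams(a);
    let tb = trigrams(b);
    let inter = ta.intersection(&tb).count();
    // |A ∪ B| = |A| + |B| - |A ∩ B|
    let union = ta.len() + tb.len() - inter;
    if union == 0 {
        // Neither input has a trigram. Assumption: fall back to exact
        // equality, giving 1.0 for ("", "") and 0.0 for ("", "a").
        return if a == b { 1.0 } else { 0.0 };
    }
    inter as f64 / union as f64
}
```

For instance, "café" and "cafe" share only "caf" out of three distinct trigrams, so this sketch returns 1/3, with "afé" sliced on a character boundary rather than mid-byte. HashSet is used for expected linear time; since the score does not depend on iteration order, an ordered set would give the same results.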

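The function is pure: it performs no I/O, allocates nothing beyond the two trigram sets, and always returns the same result for the same pair of inputs. Because it has no side effects, it is safe to call from hot paths and from several threads at once, although each call still builds both sets, so its cost grows with the length of the inputs.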