Module probabilistic_collections::similarity
source · Expand description
Locality-sensitive hashing schemes for measuring similarities between sets.
Structs
MinHash
is a locality sensitive hashing scheme that can estimate the Jaccard Similarity
measure between two sets s1
and s2
. It uses multiple hash functions and for each hash
function h
, finds the minimum hash value obtained from the hashing an item in s1
using h
and hashing an item in s2
using h
. Our estimate for the Jaccard Similarity is the number of
minimum hash values that are equal divided by the number of total hash functions used.A w-shingle iterator for an list of items.
SimHash
is a locality sensitive hashing scheme. If two sets s1
and s2
are similar,
SimHash
will generate hashes for s1
and s2
that has a small Hamming Distance between
them.Functions
Computes the Jaccard Similarity between two iterators. The Jaccard Similarity is the quotient
between the intersection and the union.