Expand description
UMASH is an almost-universal family of hash functions. Each
Params
struct defines a specific hash function; when the
parameters are generated pseudorandomly, the probability that two
different inputs of up to s
bytes collide (for an independently
generated set of parameters) is at most ceil(s / 4096) 2^-55
for
the 64-bit hash. The 128-bit fingerprint reduces that probability
to less than 2^-70
for inputs of 1 GB or less.
See the reference repo for more details and proofs.
Structs§
- A 128-bit
Fingerprint
is constructed by UMASH as if we had computed theUmashComponent::Hash
andUmashComponent::Secondary
functions, and stored them in theFingerprint::hash
array in order. - A
Fingerprinter
implements the 128-bit fingerprinting function defined by a specificParams
struct, further tweaked by a seed. ConstructFingerprinter
s withParams::fingerprinter
. - A
Hasher
implements one of the two hash 64-bit functions defined by a specificParams
struct, further tweaked by a seed. ConstructHasher
s withParams::hasher
,Params::secondary_hasher
, orParams::component_hasher
. - A
Params
stores a set of hashing parameters that define a specific UMASH function.
Enums§
- A given
Params
struct defines a pair of 64-bit hash functions. TheUmashComponent::Hash
is the primary hash value; we find a 128-bit fingerprint by combining that primary value with theUmashComponent::Secondary
hash (in practice, it’s more efficient to compute both hash values concurrently, when one knows they want a fingerprint).