Function gear_fingerprinter::fingerprint
pub fn fingerprint(data: &[u8]) -> Fingerprint
Perform GEAR-based MinHash fingerprinting of the provided data, using the fastest implementation available on the current hardware.
GEAR Hash
This method derives a fingerprint based on a GEAR hash, a rolling hash where each byte's hash is calculated as:
hash = (hash << 1) + table[byte]
Where `table` is a pre-calculated list of 256 random values, one for each possible value of a byte. The width of the rolling hash window is determined by the bit width of the integer type used, in this case `u32`, corresponding to a 32-byte hash window: each byte's contribution is shifted left by one bit per step, so it has been shifted out entirely after 32 further bytes. The GEAR hash function used can thus also be stated as:
hash = ((hash * 2) + table[byte]) mod 2^32
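As an illustration of the update rule above, here is a minimal sketch of a 32-bit GEAR rolling hash. The names `make_table`, `gear_step`, and `gear_hash` are hypothetical and not part of this crate's API, and the xorshift-generated table is only a stand-in for the crate's pre-calculated values.

```rust
/// Build a 256-entry table of pseudo-random u32 values.
/// Assumption: the crate ships its own pre-calculated table; this
/// xorshift32 generator is only a placeholder for illustration.
fn make_table(seed: u32) -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut state = seed.max(1); // xorshift must not start at zero
    for entry in table.iter_mut() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        *entry = state;
    }
    table
}

/// Feed one byte into the rolling hash:
/// hash = ((hash << 1) + table[byte]) mod 2^32.
fn gear_step(hash: u32, byte: u8, table: &[u32; 256]) -> u32 {
    // The shift drops the top bit; wrapping_add performs the mod 2^32.
    (hash << 1).wrapping_add(table[byte as usize])
}

/// Hash a whole slice, starting from zero.
fn gear_hash(data: &[u8], table: &[u32; 256]) -> u32 {
    data.iter().fold(0, |hash, &byte| gear_step(hash, byte, table))
}
```

Because every earlier contribution is shifted out after 32 steps, `gear_hash` over a long input equals `gear_hash` over just its last 32 bytes, which is the 32-byte window described above.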
Derived Hashes
This method uses 24 different hashes derived from the internal GEAR hash, based on the following formula, where `X` is in the range `0..24`, and `hash` is the original GEAR hash:
hash_X = (hash * N[X]) + M[X]
Here, `N` and `M` are separate, arbitrarily chosen tables of integers.
The method keeps track of the minimum value encountered for each of the derived hashes as it consumes the bytes of the data, and those minimum values compose the fingerprint.
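The minimum-tracking step can be sketched as follows. This is a scalar illustration under stated assumptions, not the crate's implementation: `Tables`, `make_tables`, and `sketch_fingerprint` are hypothetical names, all table values are placeholders, the result is returned as a plain `[u32; 24]` rather than the crate's `Fingerprint` type, and minima are tracked from the very first byte, which the real function may or may not do.

```rust
const NUM_HASHES: usize = 24;

/// One xorshift32 step, used only to produce placeholder constants.
fn xorshift32(mut x: u32) -> u32 {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    x
}

/// Placeholder tables (assumptions, not the crate's real values).
struct Tables {
    gear: [u32; 256],
    n: [u32; NUM_HASHES],
    m: [u32; NUM_HASHES],
}

fn make_tables(seed: u32) -> Tables {
    let mut state = seed.max(1); // xorshift must not start at zero
    let mut next = || {
        state = xorshift32(state);
        state
    };
    let mut gear = [0u32; 256];
    for e in gear.iter_mut() {
        *e = next();
    }
    let mut n = [0u32; NUM_HASHES];
    for e in n.iter_mut() {
        // Choice made for this sketch only: odd multipliers are
        // invertible mod 2^32, so no hash values are merged.
        *e = next() | 1;
    }
    let mut m = [0u32; NUM_HASHES];
    for e in m.iter_mut() {
        *e = next();
    }
    Tables { gear, n, m }
}

/// Track the minimum of each derived hash over all bytes of `data`.
fn sketch_fingerprint(data: &[u8], t: &Tables) -> [u32; NUM_HASHES] {
    let mut minima = [u32::MAX; NUM_HASHES];
    let mut hash = 0u32;
    for &byte in data {
        // GEAR update: hash = ((hash << 1) + table[byte]) mod 2^32
        hash = (hash << 1).wrapping_add(t.gear[byte as usize]);
        for x in 0..NUM_HASHES {
            // hash_X = (hash * N[X]) + M[X], wrapping at 2^32
            let derived = hash.wrapping_mul(t.n[x]).wrapping_add(t.m[x]);
            minima[x] = minima[x].min(derived);
        }
    }
    minima
}
```

Two inputs that share many 32-byte windows will tend to share many of these minima, which is what makes the resulting array usable as a MinHash-style similarity fingerprint.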