Function gear_fingerprinter::fingerprint

pub fn fingerprint(data: &[u8]) -> Fingerprint

Performs GEAR-based MinHash fingerprinting of the provided data, using the fastest implementation available on the current hardware.

GEAR Hash

This method derives a fingerprint based on a GEAR hash, a rolling hash where each byte's hash is calculated as:

hash = (hash << 1) + table[byte]

Here, table is a pre-calculated list of 256 random values, one for each possible value of a byte. The width of the rolling hash window is determined by the bit width of the integer type used: each step shifts the hash left by one bit, so with u32 a byte's contribution is shifted out entirely after 32 steps, which corresponds to a 32-byte hash window. The GEAR hash function used can thus also be stated as:

hash = ((hash * 2) + table[byte]) mod 2^32
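
As an illustration of the update rule and the 32-byte window, here is a minimal sketch. It is not the crate's implementation: the table values are not given in this documentation, so make_table fills a hypothetical table from a simple xorshift generator, and gear_step and gear_hash are names invented for this example.

```rust
/// Build a hypothetical 256-entry table. The crate's real table values
/// are not documented here, so a xorshift32 generator stands in for them.
fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut state: u32 = 0x9E37_79B9;
    for entry in table.iter_mut() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        *entry = state;
    }
    table
}

/// One GEAR step: hash = ((hash << 1) + table[byte]) mod 2^32.
/// The shift discards the top bit and wrapping_add discards the carry,
/// which together give the reduction modulo 2^32.
fn gear_step(hash: u32, byte: u8, table: &[u32; 256]) -> u32 {
    (hash << 1).wrapping_add(table[byte as usize])
}

/// Feed every byte of `data` through the rolling hash, starting from 0.
fn gear_hash(data: &[u8], table: &[u32; 256]) -> u32 {
    data.iter().fold(0u32, |hash, &byte| gear_step(hash, byte, table))
}

fn main() {
    let table = make_table();

    // Two inputs that differ only before their final 32 bytes.
    let a: Vec<u8> = (0u8..100).collect();
    let mut b = a.clone();
    b[0] = 255;
    b[10] = 3;

    // Bytes older than 32 steps have been shifted out of the u32,
    // so both inputs end in the same hash state.
    assert_eq!(gear_hash(&a, &table), gear_hash(&b, &table));
    println!("final hash: {:#010x}", gear_hash(&a, &table));
}
```

The assertion in main demonstrates the window property: only the last 32 bytes consumed influence the current hash value.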

Derived Hashes

This method uses 24 different hashes derived from the internal GEAR hash using the following formula, where X is in the range 0..24 and hash is the original GEAR hash:

hash_X = (hash * N[X]) + M[X]

Here, N and M are separate, arbitrarily chosen tables of integers.

The method keeps track of the minimum value encountered for each of the derived hashes as it consumes the bytes of the data, and those minimum values compose the fingerprint.
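
The procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the table, N, and M values are hypothetical (generated here with xorshift32), the derived hashes are assumed to wrap modulo 2^32, the fingerprint is modeled as a plain [u32; 24] rather than the crate's Fingerprint type, and minima are tracked from the first byte onward. The name sketch_fingerprint is invented for this example.

```rust
const NUM_HASHES: usize = 24;

/// Simple xorshift32 generator used only to fill the hypothetical tables.
fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

/// Sketch of GEAR-based MinHash: returns the minimum of each derived hash.
fn sketch_fingerprint(data: &[u8]) -> [u32; NUM_HASHES] {
    let mut seed: u32 = 0x9E37_79B9;

    // Hypothetical GEAR table, one value per possible byte.
    let mut table = [0u32; 256];
    for entry in table.iter_mut() {
        *entry = xorshift32(&mut seed);
    }

    // Hypothetical multiplier (N) and offset (M) tables.
    let mut n = [0u32; NUM_HASHES];
    let mut m = [0u32; NUM_HASHES];
    for x in 0..NUM_HASHES {
        n[x] = xorshift32(&mut seed) | 1; // odd, so the mapping is invertible mod 2^32
        m[x] = xorshift32(&mut seed);
    }

    let mut mins = [u32::MAX; NUM_HASHES];
    let mut hash = 0u32;
    for &byte in data {
        // GEAR step: hash = ((hash << 1) + table[byte]) mod 2^32.
        hash = (hash << 1).wrapping_add(table[byte as usize]);

        // hash_X = (hash * N[X]) + M[X], keeping the smallest value seen.
        for ((min, &nx), &mx) in mins.iter_mut().zip(&n).zip(&m) {
            let derived = hash.wrapping_mul(nx).wrapping_add(mx);
            *min = (*min).min(derived);
        }
    }
    mins
}

fn main() {
    let fp = sketch_fingerprint(b"the quick brown fox jumps over the lazy dog");
    println!("{:08x?}", fp);
}
```

Because each slot only ever decreases as more bytes are consumed, appending data to an input can lower individual slots of this sketch's output but never raise them.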