Crate seahash [] [src]

SeaHash: A blazingly fast, portable hash function with proven statistical guarantees.

SeaHash is a hash function with performance similar (within around ±7% difference) to XXHash and MetroHash, but with stronger guarantees.

This implementation only allows hashing buffers of 4096 bytes, but the algorithm itself can relatively simply be extended for arbitrary length buffers.

SeaHash is a portable hash function, meaning that the output is not dependent on the hosting architecture, and makes no assumptions on endianness or the alike. This stable layout allows it to be used for on-disk/permanent storage (e.g. checksums).

Achieving the performance

Like any good general-purpose hash function, SeaHash reads 8 bytes at once effectively reducing the running time by an order of ~5.

Secondly, SeaHash achieves the performance by heavily exploiting Instruction-Level Parallelism. In particular, it fetches 4 integers in every round and independently diffuses them. This yields four different states, which are finally combined.

Statistical guarantees

SeaHash comes with certain proven guarantees about the statistical properties of the output:

  1. Pick some n-byte sequence, s. The number of n-byte sequence colliding with s is independent of the choice of s (all equivalence class have equal size).
  2. If you flip any bit in the input, the probability for any bit in the output to be flipped is 0.5.

The first guarantee can be derived through deduction, by proving that the diffusion function is bijective (reverse the XORs and find the congruence inverses to the primes).

The second guarantee requires more complex calculations: Construct a matrix of probabilities and set one to certain (1), then apply transformations through the respective operations. The proof is a bit long, but relatively simple.

Specification

The hasher's state is 4-tuple of 4 64-bit integers, starting at (0x16f11fe89b0d677c, 0xb480a793d8e6c86c, 0x6fe2e5aaf078ebc9, 0x14f994a4c5259381). The input is split into blocks of the size of the tuple. The block is split into 4 64-bit integers, each of which is added (modulo 2⁶⁴) to the state, in matching order. The integers are read in little-endian.

When the block is read and the state is updated, the diffusion is applied. The diffusion function, diffuse(x), is as follows:

x ← x ⊕ (x >> 32)
x ← px
x ← x ⊕ (x >> 32)
x ← px
x ← x ⊕ (x >> 32)

with p = 0x7ed0e9fa0d94a33.

When the whole buffer (all blocks) is read, the state tuple, (a, b, c, d) is mixed into a single number:

h = a + diffuse(b) + diffuse(c + diffuse(d))

Functions

hash

Hash a 4 KiB buffer.