hash-iter
Implementation of the enhanced double hashing technique based on the Bloom Filters in Probabilistic Verification paper (Dillinger and Manolios, 2004).
Motivation
This crate is very simple: given a key key
, one can hash it to a sequence of k
hashes
[hash_1, hash_2, .., hash_k]
, instead of a single hash.
This is useful for implementing many hash-based algorithms, such as:
- Hash tables with open addressing, where double hashing is used to resolve collisions.
- Bloom filters, since as shown in
Less Hashing, Same Performance: Building a Better Bloom Filter
(Kirsch and Mitzenmacher, 2006) instead of using
k
different hashers, we can rely on double hashing to producek
filter positions. - Consistent hashing, where double hashing is used to map keys to a ring of servers (for example in Multi-probe consistent hashing).
Usage
Create and configure a hasher state (basically a reusable configuration), use it to create a hasher
object. The hasher object then can be used to hash keys into sequence of k
hash points.
Basic usage
Hasher state allows you to configure how hash iterators are produced. The only required parameter is
number of hashes per key input, k
.
// Create a hasher state with 3 hashes per key.
let state = new;
// Create a hasher object.
// It holds state and can be used to produce hash iterators.
let hasher = state.build_hash_iter_hasher;
// Hash keys to several hash points (`hash_iter()` returns an iterator).
let key = "hello";
let hashes = hasher.hash_iter.;
When we are relying on default parameters (for seed values, max hash value etc), and do not need to
keep state around (which is normally the case, as a single generated hasher is often enough), we can
use the DoubleHasher
directly:
let hasher = new;
let hashes = hasher.hash_iter.;
let hashes = hasher.hash_iter.;
Configuring hasher state
In addition to k
, there are several optional parameters that can be configured: n
(max hash
value produced, by default it is usize::MAX
, so that array indexing is safe), seed1
and seed2
(seeds for the two hash functions, by default they are 12345
and 67890
respectively).
// Specify default values explicitly.
let hasher = new
.with_seed1
.with_seed2
.with_n
.build_hash_iter_hasher;
let hashes = hasher.hash_iter.;
Custom hash functions
One can specify which hash functions to use when creating the first two hash values used to produce
the sequence. All you need to do is to supply DoubleHasher::with_hash_builders()
function with two
structs that implement hash::BuildHasher
:
use Xxh3Builder;
let hasher = with_hash_builders;
let hashes = hasher.hash_iter.;