Crate ptr_hash

Source
Expand description

§PtrHash: Minimal Perfect Hashing at RAM Throughput

PtrHash builds a minimal perfect hash function, that is, a hash function that maps a fixed set of keys to {0, ..., n-1}.

PtrHash was developed for large key sets of at least 1 million keys, and has been tested up to 10^11 keys. It only uses 2.4 bits per key.

It can also be used for arbitrary small sets. In this case, the space efficiency will be less due to a relatively large constant overhead.

See the GitHub readme or paper (arXiv, blog version) for details on the algorithm and performance.

Usage example:

use ptr_hash::{PtrHash, PtrHashParams};

// Generate some random keys.
let n = 1_000_000;
let keys = ptr_hash::util::generate_keys(n);

// Build the datastructure.
// NOTE: For small sets, say <1M keys, use `PtrHashParams::default_fast()` instead.
// The default parameters are optimized for large sets, and need large (>100k or so) inputs
// to ensure internal bucket sizes don't deviate too much from their expectation.
let mphf = <PtrHash>::new(&keys, PtrHashParams::default());

// Get the index of a key.
let key = 0;
let idx = mphf.index(&key);
assert!(idx < n);

// Get the non-minimal index of a key.
// Can be slightly faster returns keys up to `n/alpha ~ 1.01*n`.
let _idx = mphf.index_no_remap(&key);

// An iterator over the indices of the keys.
// 32: number of iterations ahead to prefetch.
// true: remap to a minimal key in [0, n).
// _: placeholder to infer the type of keys being iterated.
let indices = mphf.index_stream::<32, true, _>(&keys);
assert_eq!(indices.sum::<usize>(), (n * (n - 1)) / 2);

// Query a batch of keys.
let keys = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
let mut indices = mphf.index_batch::<16, true, _>(keys);
indices.sort();
for i in 0..indices.len()-1 {
    assert!(indices[i] != indices[i+1]);
}

// Test that all items map to different indices
let mut taken = vec![false; n];
for key in keys {
    let idx = mphf.index(&key);
    assert!(!taken[idx]);
    taken[idx] = true;
}

§Partitioning

By default, PtrHash partitions the keys into multiple parts. This speeds up construction in two ways:

  • smaller parts have better cache locality, and
  • parts can be constructed in parallel.

However, at query time there is a small overhead to compute the part of each key. To achieve slightly faster queries, set PtrHashParams::single_part to true, and then use PtrHash::index_single_part() instead of PtrHash::index().

Modules§

bucket_fn
Various “bucket functions” are implemented here.
hash
Customizable Hasher trait. Implementations of the Hash trait that abstracts over 64 and 128-bit hashes.
pack
Extendable backing storage trait and types. The Packed and MutPacked traits are used for the underlying storage of the remap vector.
util
Some internal logging and testing utilities. Internal utilities that are only exposed for testing/benchmarking purposes. Do not use externally.

Structs§

PtrHash
PtrHash datastructure. It is recommended to use PtrHash with default types.
PtrHashParams
Parameters for PtrHash construction.

Enums§

Sharding
The sharding method to use.

Traits§

KeyT
Trait that keys must satisfy.

Type Aliases§

DefaultPtrHash
Type alias to simplify construction.