rapidhash - rust implementation
A rust implementation of the rapidhash function, the official successor to wyhash.
- High quality, the fastest hash passing all tests in the SMHasher and SMHasher3 benchmark. Collision-based study showed a collision probability lower than wyhash, foldhash, and close to ideal.
- Very fast, the fastest passing hash in SMHasher3. Significant peak throughput improvement over wyhash and foldhash. Fastest memory-safe hash. Fastest platform-independent hash. Fastest const hash.
- Platform independent, works on all platforms, no dependency on machine-specific vectorized or cryptographic hardware instructions. Optimised for both AMD64 and AArch64.
- Memory safe, when the
unsafefeature is disabled (default). This implementation has also been fuzz-tested withcargo fuzz. - No dependencies and no-std compatible when disabling default features.
- Official successor to wyhash with improved speed, quality, and compatibility.
- Inline variants that use
#[inline(always)]onRapidInlineHashandRapidInlineHashBuilderto force compiler optimisations on specific input types (can double the hash performance depending on the hashed type). - Run-time and compile-time hashing as the hash implementation is fully
const. - Idiomatic
std::hash::Hashercompatible hasher forHashMapandHashSetusage. - Non-cryptographic hash function that's "minimally DoS resistant" in the same manner as foldhash.
Usage
Persistent Hashing
Full compatibility with C++ rapidhash algorithms, methods are provided for all rapidhash V1, V2, and V3 (with micro/nano) variants. These are stable functions whose output will not change between crate versions.
use ;
// rapidhash V3 algorithm, for fast, high-quality, persistent hashing
use rapidhash_v3;
assert_eq!;
In-Memory Hashing
Following rust's std::hash traits, the underlying hash function may change between minor versions, and is only suitable for in-memory hashing. These types are optimised for speed and minimal DoS resistance.
RapidHasher: Astd::hash::Hashercompatible hasher that uses the rapidhash algorithm.RapidHashBuilder: Astd::hash::BuildHasherfor initialising the hasher with the default seed and secrets.RandomState: Astd::hash::BuildHasherfor initialising the hasher with a random seed and secrets.RapidHashMapandRapidHashSet: Helper types for usingRapidHasherwithHashMapandHashSet.
use RapidHashMap;
// std HashMap with the RapidHashBuilder hasher.
let mut map = default;
map.insert;
CLI
Rapidhash can also be installed as a CLI tool to hash files or stdin. This is not a cryptographic hash, but should be much faster than cryptographic hashes. This is fully compatible with the C++ rapidhash V1, V2, and V3 algorithms.
Output is the decimal string of the u64 hash value.
# install
cargo install rapidhash
# hash a file (output: 8543579700415218186)
rapidhash --v3 example.txt
# hash stdin (output: 8543579700415218186)
echo "example" | rapidhash --v3
Features
default:stdstd: Enables theRapidHashMapandRapidHashSethelper types.rand: Enables using therandlibrary to more securely initialiseRandomState. Includes therandcrate dependency.rng: EnablesRapidRng, a fast, non-cryptographic PRNG based on rapidrng. Includes therand_corecrate dependency.unsafe: Uses unsafe pointer arithmetic to skip some unnecessary bounds checks for a small 3-4% performance improvement.
Benchmarks
Initial benchmarks on M1 Max (aarch64) for various input sizes.
Hashing Benchmarks
Comparison to foldhash
- Rapidhash is generally faster with string and byte inputs.
- Foldhash is generally faster with integer tuples by using a 128bit buffer for integer inputs.
┌────────────────┬─────────────┬────────────┐
│ metric ┆ rapidhash-f ┆ foldhash-f │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞════════════════╪═════════════╪════════════╡
│ avg_rank ┆ 1.50 ┆ 1.50 │
│ geometric_mean ┆ 4.72 ┆ 4.83 │
└────────────────┴─────────────┴────────────┘
┌────────────────┬────────────┬─────────────┬────────────┐
│ distr ┆ bench ┆ rapidhash-f ┆ foldhash-f │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ f64 │
╞════════════════╪════════════╪═════════════╪════════════╡
│ u32 ┆ hashonly ┆ 0.56 ┆ 0.62 │
│ u32 ┆ lookupmiss ┆ 1.40 ┆ 1.51 │
│ u32 ┆ lookuphit ┆ 1.76 ┆ 1.84 │
│ u32 ┆ setbuild ┆ 3.86 ┆ 4.07 │
│ u32pair ┆ hashonly ┆ 0.85 ┆ 0.62 │
│ u32pair ┆ lookupmiss ┆ 1.88 ┆ 1.59 │
│ u32pair ┆ lookuphit ┆ 2.25 ┆ 1.88 │
│ u32pair ┆ setbuild ┆ 4.52 ┆ 4.28 │
│ u64 ┆ hashonly ┆ 0.81 ┆ 0.62 │
│ u64 ┆ lookupmiss ┆ 1.77 ┆ 1.46 │
│ u64 ┆ lookuphit ┆ 2.08 ┆ 1.83 │
│ u64 ┆ setbuild ┆ 4.32 ┆ 4.10 │
│ u64lobits ┆ hashonly ┆ 0.81 ┆ 0.62 │
│ u64lobits ┆ lookupmiss ┆ 1.73 ┆ 1.46 │
│ u64lobits ┆ lookuphit ┆ 2.00 ┆ 1.81 │
│ u64lobits ┆ setbuild ┆ 4.18 ┆ 4.02 │
│ u64hibits ┆ hashonly ┆ 0.81 ┆ 0.62 │
│ u64hibits ┆ lookupmiss ┆ 1.71 ┆ 1.46 │
│ u64hibits ┆ lookuphit ┆ 2.12 ┆ 1.80 │
│ u64hibits ┆ setbuild ┆ 4.04 ┆ 4.05 │
│ u64pair ┆ hashonly ┆ 1.31 ┆ 0.78 │
│ u64pair ┆ lookupmiss ┆ 2.52 ┆ 1.84 │
│ u64pair ┆ lookuphit ┆ 2.91 ┆ 2.14 │
│ u64pair ┆ setbuild ┆ 5.18 ┆ 4.33 │
│ ipv4 ┆ hashonly ┆ 0.55 ┆ 0.62 │
│ ipv4 ┆ lookupmiss ┆ 1.45 ┆ 1.52 │
│ ipv4 ┆ lookuphit ┆ 1.77 ┆ 1.83 │
│ ipv4 ┆ setbuild ┆ 4.02 ┆ 4.05 │
│ ipv6 ┆ hashonly ┆ 0.83 ┆ 0.78 │
│ ipv6 ┆ lookupmiss ┆ 1.81 ┆ 1.74 │
│ ipv6 ┆ lookuphit ┆ 2.55 ┆ 2.39 │
│ ipv6 ┆ setbuild ┆ 4.44 ┆ 4.32 │
│ rgba ┆ hashonly ┆ 1.25 ┆ 0.63 │
│ rgba ┆ lookupmiss ┆ 2.52 ┆ 1.71 │
│ rgba ┆ lookuphit ┆ 3.28 ┆ 2.51 │
│ rgba ┆ setbuild ┆ 5.90 ┆ 4.72 │
│ strenglishword ┆ hashonly ┆ 1.38 ┆ 6.29 │
│ strenglishword ┆ lookupmiss ┆ 4.26 ┆ 6.65 │
│ strenglishword ┆ lookuphit ┆ 8.73 ┆ 10.92 │
│ strenglishword ┆ setbuild ┆ 14.78 ┆ 17.22 │
│ struuid ┆ hashonly ┆ 2.75 ┆ 5.51 │
│ struuid ┆ lookupmiss ┆ 6.42 ┆ 8.01 │
│ struuid ┆ lookuphit ┆ 9.95 ┆ 12.05 │
│ struuid ┆ setbuild ┆ 13.82 ┆ 16.29 │
│ strurl ┆ hashonly ┆ 5.01 ┆ 7.38 │
│ strurl ┆ lookupmiss ┆ 8.01 ┆ 9.89 │
│ strurl ┆ lookuphit ┆ 14.45 ┆ 16.15 │
│ strurl ┆ setbuild ┆ 21.27 ┆ 22.78 │
│ strdate ┆ hashonly ┆ 1.33 ┆ 5.47 │
│ strdate ┆ lookupmiss ┆ 4.40 ┆ 6.29 │
│ strdate ┆ lookuphit ┆ 6.42 ┆ 8.01 │
│ strdate ┆ setbuild ┆ 9.76 ┆ 12.54 │
│ accesslog ┆ hashonly ┆ 1.55 ┆ 1.16 │
│ accesslog ┆ lookupmiss ┆ 2.93 ┆ 2.26 │
│ accesslog ┆ lookuphit ┆ 4.06 ┆ 3.21 │
│ accesslog ┆ setbuild ┆ 6.46 ┆ 5.48 │
│ kilobyte ┆ hashonly ┆ 31.78 ┆ 31.93 │
│ kilobyte ┆ lookupmiss ┆ 35.09 ┆ 33.73 │
│ kilobyte ┆ lookuphit ┆ 72.32 ┆ 76.98 │
│ kilobyte ┆ setbuild ┆ 101.35 ┆ 114.84 │
│ tenkilobyte ┆ hashonly ┆ 235.44 ┆ 314.27 │
│ tenkilobyte ┆ lookupmiss ┆ 243.00 ┆ 317.11 │
│ tenkilobyte ┆ lookuphit ┆ 608.65 ┆ 683.77 │
│ tenkilobyte ┆ setbuild ┆ 1034.19 ┆ 1079.70 │
└────────────────┴────────────┴─────────────┴────────────┘
Benchmark notes
- Hash throughput/latency does not measure hash "quality", and many of the benchmarked functions fail SMHasher3 quality tests. Hash quality affects hashmap performance, as well as algorithms that benefit from high quality hash functions such as HyperLogLog and MinHash.
- Most hash functions will be affected heavily by whether the compiler has inlined them. Rapidhash tries very hard to always be inlined by the compiler, but the larger a program or benchmark gets, the less likely it is to be inlined due to Rust's
BuildHasher::hash_onemethod not being#[inline(always)]. gxhashhas high throughput by using AES instructions. It's a great hash function, but is not a portable hash function (often requirestarget-cpu=nativeto compile), uses unsafe code, and is not minimally DoS resistant.- Benchmark your own use case, with your real world dataset! We suggest experimenting with rapidhash, foldhash, and gxhash to see what works for you. Different inputs will benefit from different hash functions.
Rapidhash Versions
Rapidhash has multiple versions of the algorithm.
Persistent Hashing
Fixed versioning with C++ compatibility is presented in rapidhash::v1, rapidhash::v2, and rapidhash::v3 modules.
Rapidhash V3 is the recommended, fastest, and most recent version of the hash. Others are provided for backwards compatibility.
In-Memory Hashing
Rust hasing traits (RapidHasher, RapidBuildHasher, etc.) are implemented in rapidhash::fast, rapidhash::quality, and rapidhash::inner modules. These are not guaranteed to give a consistent hash output between crate versions as the rust Hasher trait is not designed for this.
- Use
rapidhash::fastfor optimal hashing speed with a slightly lower hash quality. Best for most datastructures such as HashMap and HashSet usage. - Use
rapidhash::qualitywhere statistical hash quality is the priority, such as HyperLogLog or MinHash algorithms. - Use
rapidhash::innerto configure advanced parameters to configure the hash function specifically to your use case. This allows tweaking the following compile time parameters, which all change the hash output:AVALANCHE: Enables the final avalanche mixing step to improve hash quality. Enabled on quality, disabled on fast.FNV: Hash integer types using FNV instead of the rapidhash algorithm. Reduces DoS resistance on integer types. Disabled on quality, enabled on fast.COMPACT: Generates fewer instructions at compile time with less manual loop unrolling, but may be slower on some platforms. Disabled by default.PROTECTED: Slightly stronger hash quality and DoS resistance by performing two extra XOR instructions on every mix step. Disabled by default.
Versioning
The minimum supported Rust version (MSRV) is 1.77.0.
The rapidhash crate follows the following versioning scheme:
- Major for breaking API changes and MSRV version bumps or any changes to
rapidhash_v*method output. - Minor for significant API additions/deprecations or any changes to
RapidHasheroutput. - Patch for bug fixes and performance improvements.
Persistent hash outputs (eg. rapidhash_v3) are guaranteed to be stable. In-memory hash outputs (eg. RapidHasher) may change between minor versions to allow us to freely improve performance.
License and Acknowledgements
This project is licensed under both the MIT and Apache-2.0 licenses. You are free to choose either license.
With thanks to Nicolas De Carli for the original rapidhash C++ implementation, which is licensed under the MIT License.
With thanks to Justin Bradford for letting us use the rapidhash crate name 🍻