Crate simplehash

Source
Expand description

§SimpleHash

A simple, fast Rust library implementing common non-cryptographic hash functions.

§Overview

This library provides implementations of several widely-used non-cryptographic hash functions:

  • FNV-1 (32-bit and 64-bit variants)
  • FNV-1a (32-bit and 64-bit variants)
  • MurmurHash3 (32-bit, 64-bit, and 128-bit variants)
  • CityHash (64-bit variant)
  • Rendezvous hashing (Highest Random Weight hashing)

Non-cryptographic hash functions are designed for fast computation and good distribution properties, making them suitable for hash tables, checksums, and other general-purpose hashing needs. They are NOT suitable for cryptographic purposes.

All hash functions in this library implement the std::hash::Hasher trait, making them compatible with Rust’s standard collections like HashMap and HashSet.

The CityHash implementation is based on the algorithm developed by Google and available at https://github.com/google/cityhash. CityHash was created by Geoff Pike and Jyrki Alakuijala and is licensed under the MIT License.

§When to use each hash function:

  • FNV: Good for small inputs (< 32 bytes), particularly for integer keys. Simple implementation.
  • MurmurHash3: Excellent general-purpose hash with good distribution across all input sizes.
  • CityHash: Optimized for short strings (< 64 bytes) with excellent performance on modern processors.
  • Rendezvous: For distributed systems when keys need to be consistently mapped to nodes.

§When NOT to use these hash functions:

None of the hash functions in this library should be used for:

  • Cryptographic purposes
  • Password hashing
  • Security-sensitive applications
  • Protection against malicious inputs that could cause collisions

§Example Usage

use simplehash::{fnv1_32, fnv1a_32, fnv1_64, fnv1a_64, murmurhash3_32, murmurhash3_64, murmurhash3_128};

let input = "hello world";
let bytes = input.as_bytes();

// Computing various hashes
let fnv1_32_hash = fnv1_32(bytes);
let fnv1a_32_hash = fnv1a_32(bytes);
let fnv1_64_hash = fnv1_64(bytes);
let fnv1a_64_hash = fnv1a_64(bytes);
let murmur3_32_hash = murmurhash3_32(bytes, 0);
let murmur3_64_hash = murmurhash3_64(bytes, 0);
let murmur3_128_hash = murmurhash3_128(bytes, 0);

println!("FNV1-32: 0x{:x}", fnv1_32_hash);
println!("FNV1a-32: 0x{:x}", fnv1a_32_hash);
println!("FNV1-64: 0x{:x}", fnv1_64_hash);
println!("FNV1a-64: 0x{:x}", fnv1a_64_hash);
println!("MurmurHash3-32: 0x{:x}", murmur3_32_hash);
println!("MurmurHash3-64: 0x{:x}", murmur3_64_hash);
println!("MurmurHash3-128: 0x{:x}", murmur3_128_hash);

§Rendezvous Hashing Example

use simplehash::rendezvous::RendezvousHasher;
use std::collections::hash_map::RandomState;
use std::hash::BuildHasherDefault;
use simplehash::fnv::Fnv1aHasher64;

// Create a RendezvousHasher with the standard hasher
let std_hasher = RendezvousHasher::<_, RandomState>::new(RandomState::new());

// Create a RendezvousHasher with FNV-1a 64-bit hasher
let fnv_hasher = RendezvousHasher::<_, BuildHasherDefault<Fnv1aHasher64>>::new(
    BuildHasherDefault::<Fnv1aHasher64>::default()
);

// Define some nodes (servers, cache instances, etc.)
let nodes = vec!["server1", "server2", "server3", "server4", "server5"];

// Select the preferred node for a key
let key = "user_12345";
let selected_node = fnv_hasher.select(&key, &nodes).unwrap();
println!("Key '{}' is assigned to node '{}'", key, selected_node);

// Get the index of the selected node
let selected_idx = fnv_hasher.select_index(&key, &nodes).unwrap();
println!("Index of selected node: {}", selected_idx);

// Get all nodes ranked by preference for this key
let ranked_nodes = fnv_hasher.rank(&key, &nodes);
println!("Nodes ranked by preference for key '{}':", key);
for (i, node) in ranked_nodes.iter().enumerate() {
    println!("  {}. {}", i+1, node);
}

§Choosing a Hash Function

  • FNV-1a: Good general-purpose hash function. Simple to implement with reasonable performance and distribution properties. The FNV-1a variant is generally preferred over FNV-1.

  • MurmurHash3: Offers excellent distribution properties and performance, especially for larger inputs. The 128-bit variant provides better collision resistance.

  • CityHash: Developed by Google specifically for string hashing, with excellent performance for short to medium-length strings. It’s a good choice for hash tables where the keys are typically strings.

  • Rendezvous Hashing: Not a raw hash function but rather a consistent hashing algorithm that uses an underlying hash function. It’s designed for distributed systems where keys need to be consistently mapped to servers, with minimal redistribution when servers are added or removed. This library’s implementation works with any hasher implementing the std::hash::Hasher trait.

§Using with HashMap and HashSet

All hashers in this library implement the std::hash::Hasher trait, making them compatible with standard collections. This allows you to use these faster, non-cryptographic hashers in place of the default SipHash implementation when performance is a priority.

use simplehash::fnv::Fnv1aHasher64;
use simplehash::murmur::{MurmurHasher32, MurmurHasher64};
use std::collections::{HashMap, HashSet};
use std::hash::BuildHasherDefault;

// Using FNV-1a with HashMap
let mut map: HashMap<String, u32, BuildHasherDefault<Fnv1aHasher64>> =
    HashMap::with_hasher(BuildHasherDefault::<Fnv1aHasher64>::default());
map.insert("key".to_string(), 42);

// Using FNV-1a with HashSet
let mut set: HashSet<String, BuildHasherDefault<Fnv1aHasher64>> =
    HashSet::with_hasher(BuildHasherDefault::<Fnv1aHasher64>::default());
set.insert("value".to_string());

// Create a BuildHasher for MurmurHash3 32-bit
#[derive(Default, Clone)]
struct MurmurHash3_32BuildHasher;

impl std::hash::BuildHasher for MurmurHash3_32BuildHasher {
    type Hasher = MurmurHasher32;

    fn build_hasher(&self) -> Self::Hasher {
        MurmurHasher32::new(0) // Using seed 0
    }
}

// Create a BuildHasher for MurmurHash3 64-bit
#[derive(Default, Clone)]
struct MurmurHash3_64BuildHasher;

impl std::hash::BuildHasher for MurmurHash3_64BuildHasher {
    type Hasher = MurmurHasher64;

    fn build_hasher(&self) -> Self::Hasher {
        MurmurHasher64::new(0) // Using seed 0
    }
}

// Using MurmurHash3 32-bit with HashMap
let mut murmur32_map: HashMap<String, u32, MurmurHash3_32BuildHasher> =
    HashMap::with_hasher(MurmurHash3_32BuildHasher);
murmur32_map.insert("key".to_string(), 42);

// Using MurmurHash3 64-bit with HashMap
let mut murmur64_map: HashMap<String, u32, MurmurHash3_64BuildHasher> =
    HashMap::with_hasher(MurmurHash3_64BuildHasher);
murmur64_map.insert("key".to_string(), 42);

§Performance Characteristics

  • FNV-1a:

    • Very fast for small keys (like short strings, integers)
    • Uses less memory than SipHash (Rust’s default)
    • Particularly effective for HashMaps with small keys
    • Not recommended for untrusted inputs (e.g., network data) due to lack of DoS protection
  • MurmurHash3:

    • Excellent performance for medium to large inputs
    • Better distribution properties than FNV
    • Good compromise between speed and collision resistance
    • The 32-bit version is faster, while the 128-bit version has better collision resistance
    • When developers need a 64-bit hash, they can use the 64-bit variant which uses half of the 128-bit output

§Implementation Notes

All hash functions in this library:

  • Accept a byte slice (&[u8]) as input
  • Return an unsigned integer of the appropriate size
  • Are deterministic (same input always produces same output)
  • Are endian-agnostic (produce same result regardless of platform endianness)

The MurmurHash3 implementations are compatible with reference implementations in other languages (Python’s mmh3 package and the original C++ implementation).

Re-exports§

pub use city::*;
pub use fnv::*;
pub use murmur::*;
pub use rendezvous::*;

Modules§

city
fnv
murmur
rendezvous

Functions§

fnv1_32
Computes the FNV-1 hash (32-bit) of the provided data.
fnv1_64
Computes the FNV-1 hash (64-bit) of the provided data.
fnv1a_32
Computes the FNV-1a hash (32-bit) of the provided data.
fnv1a_64
Computes the FNV-1a hash (64-bit) of the provided data.
murmurhash3_32
Computes the MurmurHash3 32-bit hash of the provided data.
murmurhash3_64
Computes the MurmurHash3 64-bit hash of the provided data.
murmurhash3_128
Computes the MurmurHash3 128-bit hash of the provided data.