fasthash_fork/city.rs
1//! `CityHash`, a family of hash functions for strings.
2//!
3//! by Geoff Pike and Jyrki Alakuijala
4//!
5//! https://github.com/google/cityhash
6//!
7//! Introduction
8//! ============
9//! `CityHash` provides hash functions for strings. The functions mix the
10//! input bits thoroughly but are not suitable for cryptography. See
11//! "Hash Quality," below, for details on how `CityHash` was tested and so on.
12//!
13//! We provide reference implementations in C++, with a friendly MIT license.
14//!
15//! `CityHash32` returns a 32-bit hash.
16//!
17//! `CityHash64` and similar return a 64-bit hash.
18//!
19//! `CityHash128` and similar return a 128-bit hash and are tuned for
20//! strings of at least a few hundred bytes. Depending on your compiler
21//! and hardware, it's likely faster than `CityHash64` on sufficiently long
22//! strings. It's slower than necessary on shorter strings, but we expect
23//! that case to be relatively unimportant.
24//!
25//! `CityHashCrc128` and similar are variants of `CityHash128` that depend
26//! on `_mm_crc32_u64()`, an intrinsic that compiles to a CRC32 instruction
27//! on some CPUs. However, none of the functions we provide are CRCs.
28//!
29//! `CityHashCrc256` is a variant of `CityHashCrc128` that also depends
30//! on `_mm_crc32_u64()`. It returns a 256-bit hash.
31//!
32//! All members of the `CityHash` family were designed with heavy reliance
33//! on previous work by Austin Appleby, Bob Jenkins, and others.
34//! For example, `CityHash32` has many similarities with `Murmur3a`.
35//!
36//! Performance on long strings: 64-bit CPUs
37//! ========================================
38//!
39//! We are most excited by the performance of `CityHash64` and its variants on
40//! short strings, but long strings are interesting as well.
41//!
42//! `CityHash` is intended to be fast, under the constraint that it hash very
43//! well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
44//! designed as a hash function and shouldn't be used as one. `CityHashCrc128`
45//! is not a CRC, but it uses the CRC32 machinery.
46//!
47//! On a single core of a 2.67GHz Intel Xeon X5550, `CityHashCrc256` peaks at about
48//! 5 to 5.5 bytes/cycle. The other `CityHashCrc` functions are wrappers around
49//! `CityHashCrc256` and should have similar performance on long strings.
50//! (`CityHashCrc256` in v1.0.3 was even faster, but we decided it wasn't as thorough
51//! as it should be.) `CityHash128` peaks at about 4.3 bytes/cycle. The fastest
52//! Murmur variant on that hardware, `Murmur3F`, peaks at about 2.4 bytes/cycle.
53//! We expect the peak speed of `CityHash128` to dominate `CityHash64`, which is
54//! aimed more toward short strings or use in hash tables.
55//!
56//! For long strings, a new function by Bob Jenkins, `SpookyHash`, is just
57//! slightly slower than `CityHash128` on Intel x86-64 CPUs, but noticeably
58//! faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs
59//! and/or CPUs without the CRC instruction, `SpookyHash` may be just as
60//! good or better than any of the `CityHash` variants.
61//!
62//! Performance on short strings: 64-bit CPUs
63//! =========================================
64//!
65//! For short strings, e.g., most hash table keys, `CityHash64` is faster than
66//! `CityHash128`, and probably faster than all the aforementioned functions,
67//! depending on the mix of string lengths. Here are a few results from that
68//! same hardware, where we (unrealistically) tested a single string length over
69//! and over again:
70//!
71//! Hash Results
72//! ------------------------------------------------------------------------------
73//! `CityHash64` v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
74//! `Murmur2` (64-bit) 6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
75//! `Murmur3F` 14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
76//!
77//! We don't have `CityHash64` benchmarks results for v1.1, but we expect the
78//! numbers to be similar.
79//!
80//! Performance: 32-bit CPUs
81//! ========================
82//!
83//! `CityHash32` is the newest variant of `CityHash`. It is intended for
84//! 32-bit hardware in general but has been mostly tested on x86. Our benchmarks
85//! suggest that `Murmur3` is the nearest competitor to `CityHash32` on x86.
86//! We don't know of anything faster that has comparable quality. The speed rankings
87//! in our testing: `CityHash32` > `Murmur3`f > `Murmur3`a (for long strings), and
88//! `CityHash32` > `Murmur3a` > `Murmur3f` (for short strings).
89//!
90//! Limitations
91//! ===========
92//!
93//! 1) `CityHash32` is intended for little-endian 32-bit code, and everything else in
94//! the current version of `CityHash` is intended for little-endian 64-bit CPUs.
95//!
96//! All functions that don't use the CRC32 instruction should work in
97//! little-endian 32-bit or 64-bit code. `CityHash` should work on big-endian CPUs
98//! as well, but we haven't tested that very thoroughly yet.
99//!
100//! 2) `CityHash` is fairly complex. As a result of its complexity, it may not
101//! perform as expected on some compilers. For example, preliminary reports
102//! suggest that some Microsoft compilers compile `CityHash` to assembly that's
103//! 10-20% slower than it could be.
104//!
105//! # Example
106//!
107//! ```
108//! use std::hash::{Hash, Hasher};
109//!
110//! use fasthash::{city, CityHasher};
111//!
112//! fn hash<T: Hash>(t: &T) -> u64 {
113//! let mut s: CityHasher = Default::default();
114//! t.hash(&mut s);
115//! s.finish()
116//! }
117//!
118//! let h = city::hash64(b"hello world\xff");
119//!
120//! assert_eq!(h, hash(&"hello world"));
121//! ```
122//!
123use std::mem;
124
125use crate::ffi;
126
127use crate::hasher::FastHash;
128
129/// `CityHash` 32-bit hash functions
130///
131/// # Example
132///
133/// ```
134/// use fasthash::{city::Hash32, FastHash};
135///
136/// assert_eq!(Hash32::hash(b"hello"), 2039911270);
137/// assert_eq!(Hash32::hash_with_seed(b"hello", 123), 3366460263);
138/// assert_eq!(Hash32::hash(b"helloworld"), 4037657980);
139/// ```
140#[derive(Clone, Default)]
141pub struct Hash32;
142
143impl FastHash for Hash32 {
144 type Hash = u32;
145 type Seed = u32;
146
147 #[inline(always)]
148 fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u32) -> u32 {
149 unsafe {
150 ffi::CityHash32WithSeed(
151 bytes.as_ref().as_ptr() as *const i8,
152 bytes.as_ref().len(),
153 seed,
154 )
155 }
156 }
157}
158
159trivial_hasher! {
160 /// # Example
161 ///
162 /// ```
163 /// use std::hash::Hasher;
164 ///
165 /// use fasthash::{city::Hasher32, FastHasher};
166 ///
167 /// let mut h = Hasher32::new();
168 ///
169 /// h.write(b"hello");
170 /// assert_eq!(h.finish(), 2039911270);
171 ///
172 /// h.write(b"world");
173 /// assert_eq!(h.finish(), 4037657980);
174 /// ```
175 Hasher32(Hash32) -> u32
176}
177
178/// `CityHash` 64-bit hash functions
179///
180/// # Example
181///
182/// ```
183/// use fasthash::{city::Hash64, FastHash};
184///
185/// assert_eq!(Hash64::hash(b"hello"), 2578220239953316063);
186/// assert_eq!(
187/// Hash64::hash_with_seed(b"hello", 123),
188/// 11802079543206271427
189/// );
190/// assert_eq!(
191/// Hash64::hash_with_seeds(b"hello", 123, 456),
192/// 13699505624668345539
193/// );
194/// assert_eq!(Hash64::hash(b"helloworld"), 16622738483577116029);
195/// ```
196#[derive(Clone, Default)]
197pub struct Hash64;
198
199impl Hash64 {
200 /// Hash functions for a byte array.
201 /// For convenience, seeds are also hashed into the result.
202 #[inline(always)]
203 pub fn hash_with_seeds<T: AsRef<[u8]>>(bytes: T, seed0: u64, seed1: u64) -> u64 {
204 unsafe {
205 ffi::CityHash64WithSeeds(
206 bytes.as_ref().as_ptr() as *const i8,
207 bytes.as_ref().len(),
208 seed0,
209 seed1,
210 )
211 }
212 }
213}
214
215impl FastHash for Hash64 {
216 type Hash = u64;
217 type Seed = u64;
218
219 #[inline(always)]
220 fn hash<T: AsRef<[u8]>>(bytes: T) -> u64 {
221 unsafe { ffi::CityHash64(bytes.as_ref().as_ptr() as *const i8, bytes.as_ref().len()) }
222 }
223
224 /// Hash functions for a byte array.
225 /// For convenience, a seed is also hashed into the result.
226 #[inline(always)]
227 fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u64) -> u64 {
228 unsafe {
229 ffi::CityHash64WithSeed(
230 bytes.as_ref().as_ptr() as *const i8,
231 bytes.as_ref().len(),
232 seed,
233 )
234 }
235 }
236}
237
238trivial_hasher! {
239 /// # Example
240 ///
241 /// ```
242 /// use std::hash::Hasher;
243 ///
244 /// use fasthash::{city::Hasher64, FastHasher};
245 ///
246 /// let mut h = Hasher64::new();
247 ///
248 /// h.write(b"hello");
249 /// assert_eq!(h.finish(), 2578220239953316063);
250 ///
251 /// h.write(b"world");
252 /// assert_eq!(h.finish(), 16622738483577116029);
253 /// ```
254 Hasher64(Hash64) -> u64
255}
256
257/// `CityHash` 128-bit hash functions
258///
259/// # Example
260///
261/// ```
262/// use fasthash::{city::Hash128, FastHash};
263///
264/// assert_eq!(
265/// Hash128::hash(b"hello"),
266/// 321050694807308650239948771137913318383,
267/// );
268/// assert_eq!(
269/// Hash128::hash_with_seed(b"hello", 123),
270/// 191203071519574338941297548675763958113
271/// );
272/// assert_eq!(
273/// Hash128::hash(b"helloworld"),
274/// 137438709495761624905137796394169174828
275/// );
276/// ```
277#[derive(Clone, Default)]
278pub struct Hash128;
279
280impl FastHash for Hash128 {
281 type Hash = u128;
282 type Seed = u128;
283
284 #[inline(always)]
285 fn hash<T: AsRef<[u8]>>(bytes: T) -> u128 {
286 unsafe {
287 mem::transmute(ffi::CityHash128(
288 bytes.as_ref().as_ptr() as *const i8,
289 bytes.as_ref().len(),
290 ))
291 }
292 }
293
294 #[inline(always)]
295 fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u128) -> u128 {
296 unsafe {
297 mem::transmute(ffi::CityHash128WithSeed(
298 bytes.as_ref().as_ptr() as *const i8,
299 bytes.as_ref().len(),
300 mem::transmute(seed),
301 ))
302 }
303 }
304}
305
306trivial_hasher! {
307 /// # Example
308 ///
309 /// ```
310 /// use std::hash::Hasher;
311 ///
312 /// use fasthash::{city::Hasher128, FastHasher, HasherExt};
313 ///
314 /// let mut h = Hasher128::new();
315 ///
316 /// h.write(b"hello");
317 /// assert_eq!(h.finish_ext(), 321050694807308650239948771137913318383);
318 ///
319 /// h.write(b"world");
320 /// assert_eq!(h.finish_ext(), 137438709495761624905137796394169174828);
321 /// ```
322 Hasher128(Hash128) -> u128
323}
324
325/// `CityHash` hash functions using HW CRC instruction.
326#[cfg(any(feature = "sse42", target_feature = "sse4.2"))]
327pub mod crc {
328 use std::mem;
329
330 use crate::FastHash;
331
332 /// `CityHash` 128-bit hash functions using HW CRC instruction.
333 ///
334 /// # Example
335 ///
336 /// ```
337 /// use fasthash::{city::crc::Hash128, FastHash};
338 ///
339 /// assert_eq!(
340 /// Hash128::hash(b"hello"),
341 /// 321050694807308650239948771137913318383
342 /// );
343 /// assert_eq!(
344 /// Hash128::hash_with_seed(b"hello", 123),
345 /// 191203071519574338941297548675763958113
346 /// );
347 /// assert_eq!(
348 /// Hash128::hash(b"helloworld"),
349 /// 137438709495761624905137796394169174828
350 /// );
351 /// ```
352
353 #[derive(Clone, Default)]
354 pub struct Hash128;
355
356 impl FastHash for Hash128 {
357 type Hash = u128;
358 type Seed = u128;
359
360 #[inline(always)]
361 fn hash<T: AsRef<[u8]>>(bytes: T) -> u128 {
362 unsafe {
363 mem::transmute(ffi::CityHashCrc128(
364 bytes.as_ref().as_ptr() as *const i8,
365 bytes.as_ref().len(),
366 ))
367 }
368 }
369
370 #[inline(always)]
371 fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u128) -> u128 {
372 unsafe {
373 mem::transmute(ffi::CityHashCrc128WithSeed(
374 bytes.as_ref().as_ptr() as *const i8,
375 bytes.as_ref().len(),
376 mem::transmute(seed),
377 ))
378 }
379 }
380 }
381
382 trivial_hasher! {
383 /// # Example
384 ///
385 /// ```
386 /// use std::hash::Hasher;
387 ///
388 /// use fasthash::{city::crc::Hasher128, FastHasher, HasherExt};
389 ///
390 /// let mut h = Hasher128::new();
391 ///
392 /// h.write(b"hello");
393 /// assert_eq!(h.finish_ext(), 321050694807308650239948771137913318383);
394 ///
395 /// h.write(b"world");
396 /// assert_eq!(h.finish_ext(), 137438709495761624905137796394169174828);
397 /// ```
398 Hasher128(Hash128) -> u128
399 }
400}
401
402/// `CityHash` 32-bit hash functions for a byte array.
403#[inline(always)]
404pub fn hash32<T: AsRef<[u8]>>(v: T) -> u32 {
405 Hash32::hash(v)
406}
407
408/// `CityHash` 32-bit hash function for a byte array.
409///
410/// For convenience, a 32-bit seed is also hashed into the result.
411#[inline(always)]
412pub fn hash32_with_seed<T: AsRef<[u8]>>(v: T, seed: u32) -> u32 {
413 Hash32::hash_with_seed(v, seed)
414}
415
416/// `CityHash` 64-bit hash functions for a byte array.
417#[inline(always)]
418pub fn hash64<T: AsRef<[u8]>>(v: T) -> u64 {
419 Hash64::hash(v)
420}
421
422/// `CityHash` 64-bit hash function for a byte array.
423///
424/// For convenience, a 64-bit seed is also hashed into the result.
425#[inline(always)]
426pub fn hash64_with_seed<T: AsRef<[u8]>>(v: T, seed: u64) -> u64 {
427 Hash64::hash_with_seed(v, seed)
428}
429
430/// `CityHash` 64-bit hash function for a byte array.
431///
432/// For convenience, two seeds are also hashed into the result.
433#[inline(always)]
434pub fn hash64_with_seeds<T: AsRef<[u8]>>(v: T, seed0: u64, seed1: u64) -> u64 {
435 Hash64::hash_with_seeds(v, seed0, seed1)
436}
437
438cfg_if! {
439 if #[cfg(any(feature = "sse42", target_feature = "sse4.2"))] {
440 /// `CityHash` 128-bit hash function for a byte array using HW CRC instruction.
441 ///
442 /// That require SSE4.2 instructions to be available.
443 #[inline(always)]
444 pub fn hash128<T: AsRef<[u8]>>(v: T) -> u128 {
445 crc::Hash128::hash(v)
446 }
447
448 /// `CityHash` 128-bit hash function for a byte array using HW CRC instruction.
449 ///
450 /// For convenience, a 128-bit seed is also hashed into the result.
451 /// That require SSE4.2 instructions to be available.
452 #[inline(always)]
453 pub fn hash128_with_seed<T: AsRef<[u8]>>(v: T, seed: u128) -> u128 {
454 crc::Hash128::hash_with_seed(v, seed)
455 }
456 } else {
457 /// `CityHash` 128-bit hash function for a byte array.
458 #[inline(always)]
459 pub fn hash128<T: AsRef<[u8]>>(v: T) -> u128 {
460 Hash128::hash(v)
461 }
462
463 /// `CityHash` 128-bit hash function for a byte array.
464 ///
465 /// For convenience, a 128-bit seed is also hashed into the result.
466 #[inline(always)]
467 pub fn hash128_with_seed<T: AsRef<[u8]>>(v: T, seed: u128) -> u128 {
468 Hash128::hash_with_seed(v, seed)
469 }
470 }
471}