fasthash_fork/
city.rs

1//! `CityHash`, a family of hash functions for strings.
2//!
3//! by Geoff Pike and Jyrki Alakuijala
4//!
5//! https://github.com/google/cityhash
6//!
7//! Introduction
8//! ============
9//! `CityHash` provides hash functions for strings.  The functions mix the
10//! input bits thoroughly but are not suitable for cryptography.  See
11//! "Hash Quality," below, for details on how `CityHash` was tested and so on.
12//!
13//! We provide reference implementations in C++, with a friendly MIT license.
14//!
15//! `CityHash32` returns a 32-bit hash.
16//!
17//! `CityHash64` and similar return a 64-bit hash.
18//!
19//! `CityHash128` and similar return a 128-bit hash and are tuned for
20//! strings of at least a few hundred bytes.  Depending on your compiler
21//! and hardware, it's likely faster than `CityHash64` on sufficiently long
22//! strings.  It's slower than necessary on shorter strings, but we expect
23//! that case to be relatively unimportant.
24//!
25//! `CityHashCrc128` and similar are variants of `CityHash128` that depend
26//! on `_mm_crc32_u64()`, an intrinsic that compiles to a CRC32 instruction
27//! on some CPUs.  However, none of the functions we provide are CRCs.
28//!
29//! `CityHashCrc256` is a variant of `CityHashCrc128` that also depends
30//! on `_mm_crc32_u64()`.  It returns a 256-bit hash.
31//!
32//! All members of the `CityHash` family were designed with heavy reliance
33//! on previous work by Austin Appleby, Bob Jenkins, and others.
34//! For example, `CityHash32` has many similarities with `Murmur3a`.
35//!
36//! Performance on long strings: 64-bit CPUs
37//! ========================================
38//!
39//! We are most excited by the performance of `CityHash64` and its variants on
40//! short strings, but long strings are interesting as well.
41//!
42//! `CityHash` is intended to be fast, under the constraint that it hash very
43//! well.  For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
44//! designed as a hash function and shouldn't be used as one.  `CityHashCrc128`
45//! is not a CRC, but it uses the CRC32 machinery.
46//!
47//! On a single core of a 2.67GHz Intel Xeon X5550, `CityHashCrc256` peaks at about
48//! 5 to 5.5 bytes/cycle.  The other `CityHashCrc` functions are wrappers around
49//! `CityHashCrc256` and should have similar performance on long strings.
50//! (`CityHashCrc256` in v1.0.3 was even faster, but we decided it wasn't as thorough
51//! as it should be.)  `CityHash128` peaks at about 4.3 bytes/cycle.  The fastest
52//! Murmur variant on that hardware, `Murmur3F`, peaks at about 2.4 bytes/cycle.
53//! We expect the peak speed of `CityHash128` to dominate `CityHash64`, which is
54//! aimed more toward short strings or use in hash tables.
55//!
56//! For long strings, a new function by Bob Jenkins, `SpookyHash`, is just
57//! slightly slower than `CityHash128` on Intel x86-64 CPUs, but noticeably
58//! faster on AMD x86-64 CPUs.  For hashing long strings on AMD CPUs
59//! and/or CPUs without the CRC instruction, `SpookyHash` may be just as
60//! good or better than any of the `CityHash` variants.
61//!
62//! Performance on short strings: 64-bit CPUs
63//! =========================================
64//!
65//! For short strings, e.g., most hash table keys, `CityHash64` is faster than
66//! `CityHash128`, and probably faster than all the aforementioned functions,
67//! depending on the mix of string lengths.  Here are a few results from that
68//! same hardware, where we (unrealistically) tested a single string length over
69//! and over again:
70//!
71//! Hash              Results
72//! ------------------------------------------------------------------------------
73//! `CityHash64` v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
74//! `Murmur2` (64-bit)  6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
75//! `Murmur3F`          14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
76//!
77//! We don't have `CityHash64` benchmarks results for v1.1, but we expect the
78//! numbers to be similar.
79//!
80//! Performance: 32-bit CPUs
81//! ========================
82//!
83//! `CityHash32` is the newest variant of `CityHash`.  It is intended for
84//! 32-bit hardware in general but has been mostly tested on x86.  Our benchmarks
85//! suggest that `Murmur3` is the nearest competitor to `CityHash32` on x86.
86//! We don't know of anything faster that has comparable quality.  The speed rankings
87//! in our testing: `CityHash32` > `Murmur3`f > `Murmur3`a (for long strings), and
88//! `CityHash32` > `Murmur3a` > `Murmur3f` (for short strings).
89//!
90//! Limitations
91//! ===========
92//!
93//! 1) `CityHash32` is intended for little-endian 32-bit code, and everything else in
94//! the current version of `CityHash` is intended for little-endian 64-bit CPUs.
95//!
96//! All functions that don't use the CRC32 instruction should work in
97//! little-endian 32-bit or 64-bit code.  `CityHash` should work on big-endian CPUs
98//! as well, but we haven't tested that very thoroughly yet.
99//!
100//! 2) `CityHash` is fairly complex.  As a result of its complexity, it may not
101//! perform as expected on some compilers.  For example, preliminary reports
102//! suggest that some Microsoft compilers compile `CityHash` to assembly that's
103//! 10-20% slower than it could be.
104//!
105//! # Example
106//!
107//! ```
108//! use std::hash::{Hash, Hasher};
109//!
110//! use fasthash::{city, CityHasher};
111//!
112//! fn hash<T: Hash>(t: &T) -> u64 {
113//!     let mut s: CityHasher = Default::default();
114//!     t.hash(&mut s);
115//!     s.finish()
116//! }
117//!
118//! let h = city::hash64(b"hello world\xff");
119//!
120//! assert_eq!(h, hash(&"hello world"));
121//! ```
122//!
123use std::mem;
124
125use crate::ffi;
126
127use crate::hasher::FastHash;
128
129/// `CityHash` 32-bit hash functions
130///
131/// # Example
132///
133/// ```
134/// use fasthash::{city::Hash32, FastHash};
135///
136/// assert_eq!(Hash32::hash(b"hello"), 2039911270);
137/// assert_eq!(Hash32::hash_with_seed(b"hello", 123), 3366460263);
138/// assert_eq!(Hash32::hash(b"helloworld"), 4037657980);
139/// ```
140#[derive(Clone, Default)]
141pub struct Hash32;
142
143impl FastHash for Hash32 {
144    type Hash = u32;
145    type Seed = u32;
146
147    #[inline(always)]
148    fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u32) -> u32 {
149        unsafe {
150            ffi::CityHash32WithSeed(
151                bytes.as_ref().as_ptr() as *const i8,
152                bytes.as_ref().len(),
153                seed,
154            )
155        }
156    }
157}
158
159trivial_hasher! {
160    /// # Example
161    ///
162    /// ```
163    /// use std::hash::Hasher;
164    ///
165    /// use fasthash::{city::Hasher32, FastHasher};
166    ///
167    /// let mut h = Hasher32::new();
168    ///
169    /// h.write(b"hello");
170    /// assert_eq!(h.finish(), 2039911270);
171    ///
172    /// h.write(b"world");
173    /// assert_eq!(h.finish(), 4037657980);
174    /// ```
175    Hasher32(Hash32) -> u32
176}
177
178/// `CityHash` 64-bit hash functions
179///
180/// # Example
181///
182/// ```
183/// use fasthash::{city::Hash64, FastHash};
184///
185/// assert_eq!(Hash64::hash(b"hello"), 2578220239953316063);
186/// assert_eq!(
187///     Hash64::hash_with_seed(b"hello", 123),
188///     11802079543206271427
189/// );
190/// assert_eq!(
191///     Hash64::hash_with_seeds(b"hello", 123, 456),
192///     13699505624668345539
193/// );
194/// assert_eq!(Hash64::hash(b"helloworld"), 16622738483577116029);
195/// ```
196#[derive(Clone, Default)]
197pub struct Hash64;
198
199impl Hash64 {
200    /// Hash functions for a byte array.
201    /// For convenience, seeds are also hashed into the result.
202    #[inline(always)]
203    pub fn hash_with_seeds<T: AsRef<[u8]>>(bytes: T, seed0: u64, seed1: u64) -> u64 {
204        unsafe {
205            ffi::CityHash64WithSeeds(
206                bytes.as_ref().as_ptr() as *const i8,
207                bytes.as_ref().len(),
208                seed0,
209                seed1,
210            )
211        }
212    }
213}
214
215impl FastHash for Hash64 {
216    type Hash = u64;
217    type Seed = u64;
218
219    #[inline(always)]
220    fn hash<T: AsRef<[u8]>>(bytes: T) -> u64 {
221        unsafe { ffi::CityHash64(bytes.as_ref().as_ptr() as *const i8, bytes.as_ref().len()) }
222    }
223
224    /// Hash functions for a byte array.
225    /// For convenience, a seed is also hashed into the result.
226    #[inline(always)]
227    fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u64) -> u64 {
228        unsafe {
229            ffi::CityHash64WithSeed(
230                bytes.as_ref().as_ptr() as *const i8,
231                bytes.as_ref().len(),
232                seed,
233            )
234        }
235    }
236}
237
238trivial_hasher! {
239    /// # Example
240    ///
241    /// ```
242    /// use std::hash::Hasher;
243    ///
244    /// use fasthash::{city::Hasher64, FastHasher};
245    ///
246    /// let mut h = Hasher64::new();
247    ///
248    /// h.write(b"hello");
249    /// assert_eq!(h.finish(), 2578220239953316063);
250    ///
251    /// h.write(b"world");
252    /// assert_eq!(h.finish(), 16622738483577116029);
253    /// ```
254    Hasher64(Hash64) -> u64
255}
256
257/// `CityHash` 128-bit hash functions
258///
259/// # Example
260///
261/// ```
262/// use fasthash::{city::Hash128, FastHash};
263///
264/// assert_eq!(
265///     Hash128::hash(b"hello"),
266///     321050694807308650239948771137913318383,
267/// );
268/// assert_eq!(
269///     Hash128::hash_with_seed(b"hello", 123),
270///     191203071519574338941297548675763958113
271/// );
272/// assert_eq!(
273///     Hash128::hash(b"helloworld"),
274///     137438709495761624905137796394169174828
275/// );
276/// ```
277#[derive(Clone, Default)]
278pub struct Hash128;
279
280impl FastHash for Hash128 {
281    type Hash = u128;
282    type Seed = u128;
283
284    #[inline(always)]
285    fn hash<T: AsRef<[u8]>>(bytes: T) -> u128 {
286        unsafe {
287            mem::transmute(ffi::CityHash128(
288                bytes.as_ref().as_ptr() as *const i8,
289                bytes.as_ref().len(),
290            ))
291        }
292    }
293
294    #[inline(always)]
295    fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u128) -> u128 {
296        unsafe {
297            mem::transmute(ffi::CityHash128WithSeed(
298                bytes.as_ref().as_ptr() as *const i8,
299                bytes.as_ref().len(),
300                mem::transmute(seed),
301            ))
302        }
303    }
304}
305
306trivial_hasher! {
307    /// # Example
308    ///
309    /// ```
310    /// use std::hash::Hasher;
311    ///
312    /// use fasthash::{city::Hasher128, FastHasher, HasherExt};
313    ///
314    /// let mut h = Hasher128::new();
315    ///
316    /// h.write(b"hello");
317    /// assert_eq!(h.finish_ext(), 321050694807308650239948771137913318383);
318    ///
319    /// h.write(b"world");
320    /// assert_eq!(h.finish_ext(), 137438709495761624905137796394169174828);
321    /// ```
322    Hasher128(Hash128) -> u128
323}
324
325/// `CityHash` hash functions using HW CRC instruction.
326#[cfg(any(feature = "sse42", target_feature = "sse4.2"))]
327pub mod crc {
328    use std::mem;
329
330    use crate::FastHash;
331
332    /// `CityHash` 128-bit hash functions using HW CRC instruction.
333    ///
334    /// # Example
335    ///
336    /// ```
337    /// use fasthash::{city::crc::Hash128, FastHash};
338    ///
339    /// assert_eq!(
340    ///     Hash128::hash(b"hello"),
341    ///     321050694807308650239948771137913318383
342    /// );
343    /// assert_eq!(
344    ///     Hash128::hash_with_seed(b"hello", 123),
345    ///     191203071519574338941297548675763958113
346    /// );
347    /// assert_eq!(
348    ///     Hash128::hash(b"helloworld"),
349    ///     137438709495761624905137796394169174828
350    /// );
351    /// ```
352
353    #[derive(Clone, Default)]
354    pub struct Hash128;
355
356    impl FastHash for Hash128 {
357        type Hash = u128;
358        type Seed = u128;
359
360        #[inline(always)]
361        fn hash<T: AsRef<[u8]>>(bytes: T) -> u128 {
362            unsafe {
363                mem::transmute(ffi::CityHashCrc128(
364                    bytes.as_ref().as_ptr() as *const i8,
365                    bytes.as_ref().len(),
366                ))
367            }
368        }
369
370        #[inline(always)]
371        fn hash_with_seed<T: AsRef<[u8]>>(bytes: T, seed: u128) -> u128 {
372            unsafe {
373                mem::transmute(ffi::CityHashCrc128WithSeed(
374                    bytes.as_ref().as_ptr() as *const i8,
375                    bytes.as_ref().len(),
376                    mem::transmute(seed),
377                ))
378            }
379        }
380    }
381
382    trivial_hasher! {
383        /// # Example
384        ///
385        /// ```
386        /// use std::hash::Hasher;
387        ///
388        /// use fasthash::{city::crc::Hasher128, FastHasher, HasherExt};
389        ///
390        /// let mut h = Hasher128::new();
391        ///
392        /// h.write(b"hello");
393        /// assert_eq!(h.finish_ext(), 321050694807308650239948771137913318383);
394        ///
395        /// h.write(b"world");
396        /// assert_eq!(h.finish_ext(), 137438709495761624905137796394169174828);
397        /// ```
398        Hasher128(Hash128) -> u128
399    }
400}
401
402/// `CityHash` 32-bit hash functions for a byte array.
403#[inline(always)]
404pub fn hash32<T: AsRef<[u8]>>(v: T) -> u32 {
405    Hash32::hash(v)
406}
407
408/// `CityHash` 32-bit hash function for a byte array.
409///
410/// For convenience, a 32-bit seed is also hashed into the result.
411#[inline(always)]
412pub fn hash32_with_seed<T: AsRef<[u8]>>(v: T, seed: u32) -> u32 {
413    Hash32::hash_with_seed(v, seed)
414}
415
416/// `CityHash` 64-bit hash functions for a byte array.
417#[inline(always)]
418pub fn hash64<T: AsRef<[u8]>>(v: T) -> u64 {
419    Hash64::hash(v)
420}
421
422/// `CityHash` 64-bit hash function for a byte array.
423///
424/// For convenience, a 64-bit seed is also hashed into the result.
425#[inline(always)]
426pub fn hash64_with_seed<T: AsRef<[u8]>>(v: T, seed: u64) -> u64 {
427    Hash64::hash_with_seed(v, seed)
428}
429
430/// `CityHash` 64-bit hash function for a byte array.
431///
432/// For convenience, two seeds are also hashed into the result.
433#[inline(always)]
434pub fn hash64_with_seeds<T: AsRef<[u8]>>(v: T, seed0: u64, seed1: u64) -> u64 {
435    Hash64::hash_with_seeds(v, seed0, seed1)
436}
437
438cfg_if! {
439    if #[cfg(any(feature = "sse42", target_feature = "sse4.2"))] {
440        /// `CityHash` 128-bit hash function for a byte array using HW CRC instruction.
441        ///
442        /// That require SSE4.2 instructions to be available.
443        #[inline(always)]
444        pub fn hash128<T: AsRef<[u8]>>(v: T) -> u128 {
445            crc::Hash128::hash(v)
446        }
447
448        /// `CityHash` 128-bit hash function for a byte array using HW CRC instruction.
449        ///
450        /// For convenience, a 128-bit seed is also hashed into the result.
451        /// That require SSE4.2 instructions to be available.
452        #[inline(always)]
453        pub fn hash128_with_seed<T: AsRef<[u8]>>(v: T, seed: u128) -> u128 {
454            crc::Hash128::hash_with_seed(v, seed)
455        }
456    } else {
457        /// `CityHash` 128-bit hash function for a byte array.
458        #[inline(always)]
459        pub fn hash128<T: AsRef<[u8]>>(v: T) -> u128 {
460            Hash128::hash(v)
461        }
462
463        /// `CityHash` 128-bit hash function for a byte array.
464        ///
465        /// For convenience, a 128-bit seed is also hashed into the result.
466        #[inline(always)]
467        pub fn hash128_with_seed<T: AsRef<[u8]>>(v: T, seed: u128) -> u128 {
468            Hash128::hash_with_seed(v, seed)
469        }
470    }
471}