Module fasthash::city [−][src]
CityHash
, a family of hash functions for strings.
by Geoff Pike and Jyrki Alakuijala
https://github.com/google/cityhash
Introduction
CityHash
provides hash functions for strings. The functions mix the
input bits thoroughly but are not suitable for cryptography. See
"Hash Quality," below, for details on how CityHash
was tested and so on.
We provide reference implementations in C++, with a friendly MIT license.
CityHash32
returns a 32-bit hash.
CityHash64
and similar return a 64-bit hash.
CityHash128
and similar return a 128-bit hash and are tuned for
strings of at least a few hundred bytes. Depending on your compiler
and hardware, it's likely faster than CityHash64
on sufficiently long
strings. It's slower than necessary on shorter strings, but we expect
that case to be relatively unimportant.
CityHashCrc128
and similar are variants of CityHash128
that depend
on _mm_crc32_u64()
, an intrinsic that compiles to a CRC32 instruction
on some CPUs. However, none of the functions we provide are CRCs.
CityHashCrc256
is a variant of CityHashCrc128
that also depends
on _mm_crc32_u64()
. It returns a 256-bit hash.
All members of the CityHash
family were designed with heavy reliance
on previous work by Austin Appleby, Bob Jenkins, and others.
For example, CityHash32
has many similarities with Murmur3a
.
Performance on long strings: 64-bit CPUs
We are most excited by the performance of CityHash64
and its variants on
short strings, but long strings are interesting as well.
CityHash
is intended to be fast, under the constraint that it hash very
well. For CPUs with the CRC32 instruction, CRC is speedy, but CRC wasn't
designed as a hash function and shouldn't be used as one. CityHashCrc128
is not a CRC, but it uses the CRC32 machinery.
On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256
peaks at about
5 to 5.5 bytes/cycle. The other CityHashCrc
functions are wrappers around
CityHashCrc256
and should have similar performance on long strings.
(CityHashCrc256
in v1.0.3 was even faster, but we decided it wasn't as thorough
as it should be.) CityHash128
peaks at about 4.3 bytes/cycle. The fastest
Murmur variant on that hardware, Murmur3F
, peaks at about 2.4 bytes/cycle.
We expect the peak speed of CityHash128
to dominate CityHash64
, which is
aimed more toward short strings or use in hash tables.
For long strings, a new function by Bob Jenkins, SpookyHash
, is just
slightly slower than CityHash128
on Intel x86-64 CPUs, but noticeably
faster on AMD x86-64 CPUs. For hashing long strings on AMD CPUs
and/or CPUs without the CRC instruction, SpookyHash
may be just as
good or better than any of the CityHash
variants.
Performance on short strings: 64-bit CPUs
For short strings, e.g., most hash table keys, CityHash64
is faster than
CityHash128
, and probably faster than all the aforementioned functions,
depending on the mix of string lengths. Here are a few results from that
same hardware, where we (unrealistically) tested a single string length over
and over again:
Hash Results
CityHash64
v1.0.3 7ns for 1 byte, or 6ns for 8 bytes, or 9ns for 64 bytes
Murmur2
(64-bit) 6ns for 1 byte, or 6ns for 8 bytes, or 15ns for 64 bytes
Murmur3F
14ns for 1 byte, or 15ns for 8 bytes, or 23ns for 64 bytes
We don't have CityHash64
benchmarks results for v1.1, but we expect the
numbers to be similar.
Performance: 32-bit CPUs
CityHash32
is the newest variant of CityHash
. It is intended for
32-bit hardware in general but has been mostly tested on x86. Our benchmarks
suggest that Murmur3
is the nearest competitor to CityHash32
on x86.
We don't know of anything faster that has comparable quality. The speed rankings
in our testing: CityHash32
> Murmur3
f > Murmur3
a (for long strings), and
CityHash32
> Murmur3a
> Murmur3f
(for short strings).
Limitations
CityHash32
is intended for little-endian 32-bit code, and everything else in the current version ofCityHash
is intended for little-endian 64-bit CPUs.
All functions that don't use the CRC32 instruction should work in
little-endian 32-bit or 64-bit code. CityHash
should work on big-endian CPUs
as well, but we haven't tested that very thoroughly yet.
CityHash
is fairly complex. As a result of its complexity, it may not perform as expected on some compilers. For example, preliminary reports suggest that some Microsoft compilers compileCityHash
to assembly that's 10-20% slower than it could be.
Example
use std::hash::{Hash, Hasher}; use fasthash::{city, CityHasher}; fn hash<T: Hash>(t: &T) -> u64 { let mut s: CityHasher = Default::default(); t.hash(&mut s); s.finish() } let h = city::hash64(b"hello world\xff"); assert_eq!(h, hash(&"hello world"));
Structs
CityHash32 |
|
CityHash64 |
|
CityHash128 |
|
CityHashCrc128 |
|
CityHasher32 |
An implementation of |
CityHasher64 |
An implementation of |
CityHasher128 |
An implementation of |
CityHasherCrc128 |
An implementation of |
Functions
hash32 |
|
hash64 |
|
hash128 |
|
hash128_with_seed |
|
hash32_with_seed |
|
hash64_with_seed |
|
hash64_with_seeds |
|