Module fasthash::farm

FarmHash, a family of hash functions.

by Geoff Pike

https://github.com/google/farmhash

Introduction

FarmHash provides hash functions for strings and other data. The functions mix the input bits thoroughly but are not suitable for cryptography. See "Hash Quality" in the upstream FarmHash README for details on how FarmHash was tested.

We provide reference implementations in C++, with a friendly MIT license.

All members of the FarmHash family were designed with heavy reliance on previous work by Jyrki Alakuijala, Austin Appleby, Bob Jenkins, and others.

Recommended Usage

We believe the typical hash function is used mostly for in-memory hash tables and similar structures. That use case tolerates hash functions that differ across platforms and change from time to time. For this case, we recommend wrapper functions in a .h file with comments such as "may change from time to time, may differ on different platforms, and may change depending on NDEBUG."
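A minimal Rust sketch of that advice, under stated assumptions: `table_hash` is a hypothetical wrapper name, and its body uses the standard library's `DefaultHasher` purely as a placeholder. In a real program the body could call this module's `hash64` instead, and callers would not need to change.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash for in-memory hash tables only.
///
/// May change from time to time, may differ on different platforms,
/// and may change between debug and release builds. Never persist it.
pub fn table_hash(bytes: &[u8]) -> u64 {
    // Hypothetical placeholder body: std's DefaultHasher, whose algorithm is
    // itself documented as unspecified and subject to change between releases.
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

fn main() {
    // Within one build the value is stable, which is all a HashMap needs.
    assert_eq!(table_hash(b"hello"), table_hash(b"hello"));
    println!("{:#018x}", table_hash(b"hello"));
}
```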

Some projects may also require a forever-fixed, portable hash function. Again we recommend wrapper functions in a .h file, but in this case their comments would instead promise that the output never changes and is the same on every platform.
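For the forever-fixed case, here is a sketch of such a wrapper together with the golden-value check that keeps it honest. The name `fixed_hash64` is hypothetical, and the body is 64-bit FNV-1a, swapped in only because it fits in a few lines; it is not FarmHash. The point is the pinned expected values: if the function's output ever changes, the assertions fail.

```rust
/// Forever-fixed, portable hash.
///
/// The output for a given input must never change, on any platform.
/// Do not swap the algorithm: any change breaks persisted values.
pub fn fixed_hash64(bytes: &[u8]) -> u64 {
    // Stand-in algorithm: 64-bit FNV-1a (not FarmHash). It consumes one byte
    // at a time, so host endianness and word size cannot affect the result.
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut h = OFFSET_BASIS;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(PRIME);
    }
    h
}

fn main() {
    // Golden values: if either assertion ever fails, the "fixed" hash changed.
    assert_eq!(fixed_hash64(b""), 0xcbf2_9ce4_8422_2325);
    assert_eq!(fixed_hash64(b"a"), 0xaf63_dc4c_8601_ec8c);
}
```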

We have provided a sample of these wrapper functions in src/farmhash.h. Our hope is that most people will need nothing more than src/farmhash.h and src/farmhash.cc. Those two files are a usable and relatively portable library. (One portability snag: if your compiler doesn't have __builtin_expect, you may need to define FARMHASH_NO_BUILTIN_EXPECT.) For those who prefer a configure script (perhaps because they want to "make install" later), FarmHash has one, but for many people it's best to ignore it.

Note that the wrapper functions such as Hash() in src/farmhash.h can select one of several hash functions. The selection is done at compile time, based on your machine architecture (e.g., sizeof(size_t)) and the availability of vector instructions (e.g., SSE4.1).
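The C++ code makes this choice with the preprocessor. A rough Rust analogue uses `#[cfg]` attributes and the `cfg!` macro. The function names below are hypothetical and return labels rather than hashes, so this only illustrates how compile-time selection works, not how farmhash.cc is actually organized.

```rust
// Exactly one of these two definitions is compiled, chosen by the target's
// pointer width (Rust's counterpart of checking sizeof(size_t)).
#[cfg(target_pointer_width = "64")]
fn selected_variant() -> &'static str {
    "64-bit variant"
}

#[cfg(not(target_pointer_width = "64"))]
fn selected_variant() -> &'static str {
    "32-bit variant"
}

fn vector_path_enabled() -> bool {
    // cfg! is resolved at compile time from the target and any
    // -C target-feature flags; it does not inspect the running CPU.
    cfg!(target_feature = "sse4.1")
}

fn main() {
    println!(
        "{} (SSE4.1 path compiled in: {})",
        selected_variant(),
        vector_path_enabled()
    );
}
```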

To get the best performance from FarmHash, you will need to think a bit about when to use compiler flags that allow vector instructions and such: -maes, -msse4.2, -mavx, etc., or their equivalents for other compilers. These g++ flags make g++ emit more types of machine instructions than it otherwise would. For example, if you are confident that you will only be using FarmHash on systems with SSE4.2 and/or AES, you may communicate that to the compiler as explained in src/farmhash.cc. If not, use -maes, -mavx, etc., when you can, and the appropriate choices will be made via conditional compilation in src/farmhash.cc.
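For Rust code you compile yourself, the counterpart of those g++ flags is `-C target-feature=+sse4.2,+aes` (or `-C target-cpu=native`), typically passed through RUSTFLAGS. The hypothetical helper `feature_report` below contrasts what was enabled at compile time with what the running CPU actually supports, which is the distinction behind choosing such flags. It reports only on x86 targets.

```rust
/// Returns (feature, enabled at compile time, detected at run time).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn feature_report() -> Vec<(&'static str, bool, bool)> {
    vec![
        // cfg! reflects compiler flags; is_x86_feature_detected! asks the CPU.
        ("sse4.2", cfg!(target_feature = "sse4.2"), is_x86_feature_detected!("sse4.2")),
        ("aes", cfg!(target_feature = "aes"), is_x86_feature_detected!("aes")),
    ]
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn feature_report() -> Vec<(&'static str, bool, bool)> {
    Vec::new() // these x86 feature names do not apply to other architectures
}

fn main() {
    for (name, compiled, detected) in feature_report() {
        println!("{name}: enabled at compile time = {compiled}, detected on this CPU = {detected}");
    }
}
```

If a feature shows as enabled at compile time but absent on the CPU, the binary was built for a more capable machine than the one running it, which is exactly the risk of promising the compiler too much.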

It may be beneficial to try -O3 or other compiler flags as well. I have also found that feedback-directed optimization (FDO) improves the speed of FarmHash.

Further Details

The above instructions will produce a single source-level library that includes multiple hash functions. It will use conditional compilation, and perhaps GCC's multiversioning, to select among the functions. In addition, "make all check" will create an object file using your chosen compiler, and test it. The object file won't necessarily contain all the code that would be used if you were to compile the code on other platforms. The downside of this is obvious: the paths not tested may not actually work if and when you try them. The FarmHash developers try hard to prevent such problems; please let us know if you find bugs.

To aid your cross-platform testing, for each relevant platform you may compile your program that uses farmhash.cc with the preprocessor flag FARMHASHSELFTEST equal to 1. This causes a FarmHash self test to run at program startup; the self test writes output to stdout and then calls std::exit(). You can see this in action by running "make check": see src/farm-test.cc for details.

There's also a trivial workaround to force particular functions to be used: modify the wrapper functions in src/farmhash.h. You can prevent choices from being made via conditional compilation or multiversioning by choosing FarmHash variants with names like farmhashaa::Hash32, farmhashab::Hash64, etc.: those compute the same hash function regardless of conditional compilation, multiversioning, or endianness. Consult their comments and ifdefs to learn their requirements; for example, they are not all guaranteed to work on all platforms.

Known Issues

1) FarmHash was developed with little-endian architectures in mind. It should work on big-endian too, but less work has gone into optimizing for those platforms. To make FarmHash work properly on big-endian platforms you may need to modify the wrapper .h file and/or your compiler flags to arrange for FARMHASH_BIG_ENDIAN to be defined, though there is logic that tries to figure it out automatically.
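As we understand src/farmhash.cc, its word-loading helpers (such as Fetch64) return little-endian values and byte-swap them on hosts where FARMHASH_BIG_ENDIAN is defined, so a wrong setting silently changes the output. The sketch below shows the same idea in Rust with `u64::from_le_bytes`; `fetch64` is a hypothetical name, and `cfg!(target_endian = "big")` plays the role of the macro.

```rust
/// Reads the first 8 bytes as a little-endian u64, whatever the host byte order.
/// Hypothetical helper, named after FarmHash's Fetch64 for illustration only.
fn fetch64(bytes: &[u8]) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&bytes[..8]); // panics if fewer than 8 bytes
    u64::from_le_bytes(word)
}

fn main() {
    let input: [u8; 8] = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08];
    // Same value on little- and big-endian hosts.
    assert_eq!(fetch64(&input), 0x0807_0605_0403_0201);
    // Rust's counterpart of the FARMHASH_BIG_ENDIAN switch:
    println!("host is big-endian: {}", cfg!(target_endian = "big"));
}
```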

2) FarmHash's implementation is fairly complex.

3) The techniques described in dev/INSTRUCTIONS to let hash function developers regenerate src/*.cc from dev/* are hacky and not so portable.

Example

use std::hash::{Hash, Hasher};

use fasthash::{farm, FarmHasher};

fn hash<T: Hash>(t: &T) -> u64 {
    let mut s: FarmHasher = Default::default();
    t.hash(&mut s);
    s.finish()
}

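// A str is hashed as its bytes followed by a 0xff terminator byte, so feeding
// "hello world" through FarmHasher equals hash64 over these raw bytes.
let h = farm::hash64(b"hello world\xff");

assert_eq!(h, hash(&"hello world"));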

Structs

FarmHash32

FarmHash 32-bit hash functions

FarmHash64

FarmHash 64-bit hash functions

FarmHash128

FarmHash 128-bit hash functions

FarmHasher32

An implementation of std::hash::Hasher.

FarmHasher64

An implementation of std::hash::Hasher.

FarmHasher128

An implementation of std::hash::Hasher and fasthash::HasherExt.

Functions

fingerprint32

FarmHash 32-bit fingerprint function for a byte array.

fingerprint64

FarmHash 64-bit fingerprint function for a byte array.

fingerprint128

FarmHash 128-bit fingerprint function for a byte array.

hash32

FarmHash 32-bit hash function for a byte array.

hash64

FarmHash 64-bit hash function for a byte array.

hash128

FarmHash 128-bit hash function for a byte array.

hash128_with_seed

FarmHash 128-bit hash function for a byte array. For convenience, a 128-bit seed is also hashed into the result.

hash32_with_seed

FarmHash 32-bit hash function for a byte array. For convenience, a 32-bit seed is also hashed into the result.

hash64_with_seed

FarmHash 64-bit hash function for a byte array. For convenience, a 64-bit seed is also hashed into the result.

hash64_with_seeds

FarmHash 64-bit hash function for a byte array. For convenience, two seeds are also hashed into the result.