Crate blake2b_simd
source ·Expand description
An implementation of the BLAKE2b hash with:
- 100% stable Rust.
- An AVX2 implementation ported from Samuel Neves’ implementation. This implementation is faster than any hash function provided by OpenSSL. See the Performance section below.
- A portable, safe implementation for other platforms.
- Dynamic CPU feature detection. Binaries for x86 include the AVX2 implementation by default and call it if the processor supports it at runtime.
- All the features from the the BLAKE2 spec, like adjustable length, keying, and associated data for tree hashing.
- A clone of the Coreutils
b2sum
command line utility, provided as a sub-crate.b2sum
includes command line flags for all the BLAKE2 associated data features. no_std
support. Thestd
Cargo feature is on by default, for CPU feature detection and for implementingstd::io::Write
.- An implementation of the parallel BLAKE2bp variant. This implementation is single-threaded,
but it’s twice as fast as BLAKE2b, because it uses AVX2 more efficiently. It’s available on
the command line as
b2sum --blake2bp
. - Support for computing multiple BLAKE2b hashes in parallel, matching the throughput of
BLAKE2bp. See
update4
andfinalize4
.
Example
use blake2b_simd::{blake2b, Params};
let expected = "ca002330e69d3e6b84a46a56a6533fd79d51d97a3bb7cad6c2ff43b354185d6d\
c1e723fb3db4ae0737e120378424c714bb982d9dc5bbd7a0ab318240ddd18f8d";
let hash = blake2b(b"foo");
assert_eq!(expected, &hash.to_hex());
let hash = Params::new()
.hash_length(16)
.key(b"The Magic Words are Squeamish Ossifrage")
.personal(b"L. P. Waterhouse")
.to_state()
.update(b"foo")
.update(b"bar")
.update(b"baz")
.finalize();
assert_eq!("ee8ff4e9be887297cf79348dc35dab56", &hash.to_hex());
An example using the included b2sum
command line utility:
$ cd b2sum
$ cargo build --release
Finished release [optimized] target(s) in 0.04s
$ echo hi | ./target/release/b2sum --length 256
de9543b2ae1b2b87434a730727db17f5ac8b8c020b84a5cb8c5fbcc1423443ba -
Performance
The AVX2 implementation in this crate is a port of Samuel Neves’ implementation, which is also included in libsodium. Most of the credit for performance goes to him. To run small benchmarks yourself, first install OpenSSL and libsodium on your machine, then:
cd benches/cargo_bench
cargo +nightly bench
The benches/benchmark_gig
sub-crate allocates a 1 GB array and repeatedly hashes it to
measure throughput. A similar C program, benches/bench_libsodium.c
, does the same thing using
libsodium’s implementation of BLAKE2b. Here are the results from my laptop:
- Intel Core i5-8250U, Arch Linux, kernel version 4.18.16
- libsodium version 1.0.16, gcc 8.2.1,
gcc -O3 -lsodium benches/bench_libsodium.c
(via the helper scriptbenches/bench_libsodium.sh
) - rustc 1.31.0-nightly (f99911a4a 2018-10-23),
cargo +nightly run --release
╭───────────────────────┬────────────╮
│ blake2b_simd BLAKE2bp │ 2.069 GB/s │
│ blake2b_simd update4 │ 2.057 GB/s │
│ blake2b_simd AVX2 │ 1.005 GB/s │
│ libsodium AVX2 │ 0.939 GB/s │
│ blake2b_simd portable │ 0.771 GB/s │
│ libsodium portable │ 0.743 GB/s │
╰───────────────────────┴────────────╯
The benches/bench_b2sum.py
script benchmarks b2sum
against several Coreutils hashes, on a
1 GB file of random data. Here are the results from my laptop:
╭───────────────────────────────┬────────────╮
│ blake2b_simd b2sum --blake2bp │ 1.423 GB/s │
│ blake2b_simd b2sum │ 0.810 GB/s │
│ coreutils sha1sum │ 0.802 GB/s │
│ coreutils b2sum │ 0.660 GB/s │
│ coreutils md5sum │ 0.600 GB/s │
│ coreutils sha512sum │ 0.593 GB/s │
╰───────────────────────────────┴────────────╯
The benches/count_cycles
sub-crate (cargo +nightly run --release
) measures a long message
throughput of 1.8 cycles per byte for BLAKE2b, and 0.9 cycles per byte for BLAKE2bp and
update4
.
Modules
Structs
State
that exposes all the various BLAKE2 features.Constants
BLOCKBYTES
, or else their apparent throughput will be low.