turboshake
TurboSHAKE: A Family of eXtendable Output Functions based on round-reduced (12 rounds) Keccak[1600] Permutation
Overview
TurboSHAKE is a family of extendable output functions (Xofs) built on the round-reduced (i.e. 12-round) Keccak-p[1600, 12] permutation. Keccak-p[1600, 12] has previously been used in the fast parallel hashing algorithm KangarooTwelve (more @ https://keccak.team/kangarootwelve.html). Recently a formal specification describing TurboSHAKE was released (more @ https://ia.cr/2023/342). It exposes the underlying primitive of KangarooTwelve (also known as K12, see https://blake12.org) so that post-quantum public key cryptosystems (such as Kyber, Dilithium etc., being standardized by NIST) can benefit from it (more @ https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/5HveEPBsbxY).
Here I'm maintaining a Rust library which implements the TurboSHAKE{128, 256} Xofs, s.t. one can absorb arbitrarily many bytes into the sponge state, finalize the sponge and squeeze arbitrarily many bytes out of it. It also exposes (not by default, controlled by the Rust feature gate "dev") a raw API for the keccak-p[1600, 12] permutation and the sponge operations, i.e. absorption, finalization and squeezing. Other features (such as "simdx2" or "simdx4") expose an advanced Keccak-p[1600, 12] permutation implementation which uses {128, 256}-bit SIMD registers to apply 2 or 4 keccak permutations in parallel. See the usage section below for more info on how to use these.
Prerequisites
Rust nightly toolchain; see https://rustup.rs for the installation guide.
Note: The nightly toolchain is required because I use the portable_simd feature (more @ https://doc.rust-lang.org/std/simd/struct.Simd.html) for the SIMD implementation of the Keccak-p[1600, 12] permutation. See the rust-toolchain file to understand how the toolchain version is overridden in this crate.
# When developing this library, I was using
I advise you to also use cargo-criterion for running the benchmark executable. Read more about it @ https://crates.io/crates/cargo-criterion. You can install it with cargo install cargo-criterion.
Testing
To ensure functional correctness of the TurboSHAKE{128, 256} implementation, I use test vectors from section 4 (on page 9) and Appendix A (on page 17) of https://datatracker.ietf.org/doc/draft-irtf-cfrg-kangarootwelve. Issue the following command to run the test cases
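The test vectors in that draft build their message inputs with a helper the draft calls ptn(n): the byte pattern 00 01 02 .. F9 FA repeated as often as needed and truncated to n bytes. A minimal Rust sketch of that helper follows; the function name mirrors the draft's notation and is not claimed to be a function of this crate.

```rust
/// Returns the first `n` bytes of the repeating pattern 0x00, 0x01, ..., 0xFA.
/// The pattern is 251 (0xFB) bytes long, so byte `i` is simply `i mod 251`.
fn ptn(n: usize) -> Vec<u8> {
    (0..n).map(|i| (i % 0xFB) as u8).collect()
}
```

The draft's inputs use lengths that are powers of 17, so for example ptn(17) and ptn(17 * 17) would be the first two non-empty messages to feed into absorb().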
To ensure that the {2, 4}x SIMD parallel Keccak-p[1600, 12] permutation is correctly implemented, I've added some test cases. Issue the following command to run them
RUSTFLAGS="-C opt-level=3 -C target-cpu=native"
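One way such a check can be structured is to run the batched permutation on interleaved states and compare each slot against a one-state run. The sketch below is an illustration under stated assumptions, not this crate's code: keccak_p12_xn and batched_matches_scalar are hypothetical names, and plain [u64; N] arrays stand in for the SIMD vectors that portable_simd would provide, so it builds on a stable toolchain.

```rust
// Round constants for the last 12 rounds of Keccak-f[1600].
const RC: [u64; 12] = [
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
// Rotation offsets and lane order for the combined rho + pi step.
const ROTC: [u32; 24] = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PILN: [usize; 24] = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];

/// Applies Keccak-p[1600, 12] to N independent states at once.
/// `st[i][k]` holds lane `i` of state `k`; a SIMD version would keep each
/// `[u64; N]` in a single vector register (assumption of this sketch).
fn keccak_p12_xn<const N: usize>(st: &mut [[u64; N]; 25]) {
    for &rc in RC.iter() {
        // theta: compute column parities, then mix them into every lane
        let mut bc = [[0u64; N]; 5];
        for i in 0..5 {
            for k in 0..N {
                bc[i][k] = st[i][k] ^ st[i + 5][k] ^ st[i + 10][k] ^ st[i + 15][k] ^ st[i + 20][k];
            }
        }
        for i in 0..5 {
            for k in 0..N {
                let t = bc[(i + 4) % 5][k] ^ bc[(i + 1) % 5][k].rotate_left(1);
                for j in (0..25).step_by(5) {
                    st[j + i][k] ^= t;
                }
            }
        }
        // rho + pi: rotate each lane and move it to its new position
        let mut t = st[1];
        for i in 0..24 {
            let j = PILN[i];
            let tmp = st[j];
            for k in 0..N {
                st[j][k] = t[k].rotate_left(ROTC[i]);
            }
            t = tmp;
        }
        // chi: non-linear step applied row by row
        for j in (0..25).step_by(5) {
            let row = [st[j], st[j + 1], st[j + 2], st[j + 3], st[j + 4]];
            for i in 0..5 {
                for k in 0..N {
                    st[j + i][k] = row[i][k] ^ (!row[(i + 1) % 5][k] & row[(i + 2) % 5][k]);
                }
            }
        }
        // iota: add the round constant to lane 0 of every state
        for k in 0..N {
            st[0][k] ^= rc;
        }
    }
}

/// Returns true when one 2-way batched call matches two 1-way calls.
fn batched_matches_scalar(a: [u64; 25], b: [u64; 25]) -> bool {
    let mut x2 = [[0u64; 2]; 25];
    let (mut xa, mut xb) = ([[0u64; 1]; 25], [[0u64; 1]; 25]);
    for i in 0..25 {
        x2[i] = [a[i], b[i]];
        xa[i] = [a[i]];
        xb[i] = [b[i]];
    }
    keccak_p12_xn(&mut x2);
    keccak_p12_xn(&mut xa);
    keccak_p12_xn(&mut xb);
    (0..25).all(|i| x2[i] == [xa[i][0], xb[i][0]])
}
```

Because both sides use the same generic routine, this only demonstrates that the N slots stay independent of each other. A stronger test would compare against a separately written scalar permutation or against known-answer vectors.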
Benchmarking
Issue the following commands to benchmark the round-reduced Keccak-p[1600, 12] permutation and the TurboSHAKE{128, 256} Xof, for variable input and output sizes.
Note: When benchmarking on x86, x86_64, aarch64 or loongarch64 targets, CPU cycles and cycles/byte metrics are reported, while for other targets the default wall-clock timer of criterion.rs is used for reporting time and throughput. I found https://github.com/pornin/crrl/blob/73b33c1efc73d637f3084d197353991a22c10366/benches/util.rs pretty useful for obtaining CPU cycles when benchmarking Rust functions. But I'm using criterion.rs as the benchmark harness, hence I decided to go with the https://crates.io/crates/criterion-cycles-per-byte plugin, which is much easier to integrate. I had to patch it for my use case; the patches live in the add-memfence branch of my fork of criterion-cycles-per-byte (see my commits @ https://github.com/itzmeanjan/criterion-cycles-per-byte/commits/add-memfence).
Note: In case you're running benchmarks on an aarch64 target, consider reading https://github.com/itzmeanjan/criterion-cycles-per-byte/blob/d2f5bf8638640962a9b301966dbb3e65fbc6f283/src/lib.rs#L63-L70.
Warning: When benchmarking, make sure you've disabled CPU frequency scaling, otherwise the numbers you see can be pretty misleading. I found https://github.com/google/benchmark/blob/b40db869/docs/reducing_variance.md helpful.
# In case you didn't install `cargo-criterion`, you've to execute benchmark with
# `$ RUSTFLAGS="-C opt-level=3 -C target-cpu=native" cargo bench ...`
# When interested in TurboSHAKE{128, 256} Xof
RUSTFLAGS="-C opt-level=3 -C target-cpu=native"
# When interested in scalar Keccak-p[1600, 12] permutation
RUSTFLAGS="-C opt-level=3 -C target-cpu=native"
# When interested in 2x SIMD parallel Keccak-p[1600, 12] permutation
RUSTFLAGS="-C opt-level=3 -C target-cpu=native"
# When interested in 4x SIMD parallel Keccak-p[1600, 12] permutation
RUSTFLAGS="-C opt-level=3 -C target-cpu=native"
On 12th Gen Intel(R) Core(TM) i7-1260P
TurboSHAKE{128, 256} Xof
Scalar Keccak-p[1600, 12] Permutation
2x SIMD parallel Keccak-p[1600, 12] Permutation
4x SIMD parallel Keccak-p[1600, 12] Permutation
On ARM Cortex-A72 (Raspberry Pi 4B)
TurboSHAKE{128, 256} Xof
Scalar Keccak-p[1600, 12] Permutation
2x SIMD parallel Keccak-p[1600, 12] Permutation
4x SIMD parallel Keccak-p[1600, 12] Permutation
Usage
Using the TurboSHAKE{128, 256} Xof API is fairly easy
- Add turboshake to Cargo.toml as your project dependency, with the proper feature flags for your intended use case (or none at all if you're only using it for the TurboSHAKE Xof).
[dependencies]
# If only interested in using TurboSHAKE{128, 256} Xof API, do
# either
turboshake = { git = "https://github.com/itzmeanjan/turboshake" }
# or
turboshake = "0.1.9"
# If interested in using underlying keccak-p[1600, 12] permutation and sponge (developer) API
turboshake = { version = "0.1.9", features = ["dev"] }
# or if interested in using underlying 2x SIMD parallel keccak-p[1600, 12] permutation API
turboshake = { version = "0.1.9", features = ["dev", "simdx2"] }
# or if interested in using underlying 4x SIMD parallel keccak-p[1600, 12] permutation API
turboshake = { version = "0.1.9", features = ["dev", "simdx4"] }
- Create a TurboSHAKE{128, 256} Xof object.
use turboshake;
- Absorb an N (>= 0) -byte message into the sponge state by invoking absorb() M (> 1) -many times.
hasher.absorb;
hasher.absorb;
hasher.absorb;
- When all message bytes are consumed, finalize the sponge state by calling finalize().
// Note, one needs to pass a domain separator constant byte in the finalization step.
// You can use 0x1f ( i.e. default domain separator value ) if you're not using
// multiple instances of TurboSHAKE. Consider reading section 1 ( top of page 2 )
// of TurboSHAKE specification https://eprint.iacr.org/2023/342.pdf.
hasher. DEFAULT_DOMAIN_SEPARATOR }>;
- Now the sponge is ready to be squeezed, i.e. you can read arbitrarily many bytes by invoking squeeze() arbitrarily many times.
hasher.squeeze;
hasher.squeeze;
- Finally, you can reset the state of the sponge and restart the whole absorb -> finalize -> squeeze cycle.
hasher.reset;
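The absorb -> finalize -> squeeze -> reset cycle above can be sketched from scratch as follows. This is a minimal, self-contained TurboSHAKE128 illustration written against the TurboSHAKE specification (168-byte rate, 12-round permutation, domain separator byte followed by a final 0x80 padding byte). Sponge, keccak_p12 and hash are hypothetical names chosen for this sketch, not this crate's types or functions, and the code is not tuned or reviewed for production use.

```rust
// Round constants for the last 12 rounds of Keccak-f[1600].
const RC: [u64; 12] = [
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
// Rotation offsets and lane order for the combined rho + pi step.
const ROTC: [u32; 24] = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PILN: [usize; 24] = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];
const RATE: usize = 168; // TurboSHAKE128 rate in bytes: (1600 - 2 * 128) / 8

// Scalar Keccak-p[1600, 12] over 25 little-endian 64-bit lanes.
fn keccak_p12(st: &mut [u64; 25]) {
    for &rc in RC.iter() {
        let mut bc = [0u64; 5]; // theta
        for i in 0..5 {
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        }
        for i in 0..5 {
            let t = bc[(i + 4) % 5] ^ bc[(i + 1) % 5].rotate_left(1);
            for j in (0..25).step_by(5) {
                st[j + i] ^= t;
            }
        }
        let mut t = st[1]; // rho + pi
        for i in 0..24 {
            let tmp = st[PILN[i]];
            st[PILN[i]] = t.rotate_left(ROTC[i]);
            t = tmp;
        }
        for j in (0..25).step_by(5) { // chi
            let row = [st[j], st[j + 1], st[j + 2], st[j + 3], st[j + 4]];
            for i in 0..5 {
                st[j + i] = row[i] ^ (!row[(i + 1) % 5] & row[(i + 2) % 5]);
            }
        }
        st[0] ^= rc; // iota
    }
}

// Hypothetical sponge type for illustration; not the crate's own type.
struct Sponge {
    st: [u64; 25],
    pos: usize,      // byte offset inside the current rate block
    finalized: bool, // false while absorbing, true while squeezing
}

impl Sponge {
    fn new() -> Self {
        Sponge { st: [0; 25], pos: 0, finalized: false }
    }
    // XOR one byte into the state at byte offset `idx`.
    fn xor_byte(&mut self, idx: usize, b: u8) {
        self.st[idx / 8] ^= (b as u64) << (8 * (idx % 8));
    }
    fn absorb(&mut self, msg: &[u8]) {
        assert!(!self.finalized, "absorb() called after finalize()");
        for &b in msg {
            self.xor_byte(self.pos, b);
            self.pos += 1;
            if self.pos == RATE {
                keccak_p12(&mut self.st);
                self.pos = 0;
            }
        }
    }
    // `domain_sep` must lie in 0x01..=0x7f; 0x1f is the default value.
    fn finalize(&mut self, domain_sep: u8) {
        assert!(!self.finalized && (0x01..=0x7f).contains(&domain_sep));
        self.xor_byte(self.pos, domain_sep);
        self.xor_byte(RATE - 1, 0x80);
        keccak_p12(&mut self.st);
        self.pos = 0;
        self.finalized = true;
    }
    fn squeeze(&mut self, out: &mut [u8]) {
        assert!(self.finalized, "squeeze() called before finalize()");
        for o in out.iter_mut() {
            if self.pos == RATE {
                keccak_p12(&mut self.st);
                self.pos = 0;
            }
            *o = (self.st[self.pos / 8] >> (8 * (self.pos % 8))) as u8;
            self.pos += 1;
        }
    }
    fn reset(&mut self) {
        *self = Sponge::new();
    }
}

// One-shot helper: the whole absorb -> finalize -> squeeze cycle.
fn hash(msg: &[u8], domain_sep: u8, out: &mut [u8]) {
    let mut s = Sponge::new();
    s.absorb(msg);
    s.finalize(domain_sep);
    s.squeeze(out);
}
```

In this sketch, splitting the input across several absorb() calls, or the output across several squeeze() calls, yields the same byte stream as a single call, which is what makes the API incremental. Passing a different domain separator byte to finalize() gives a different output stream for the same message.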
I maintain two examples demonstrating use of the TurboSHAKE{128, 256} Xof API.
You should be able to run those examples with the following commands
# or
I also maintain an example showing usage of the keccak-p[1600, 12] permutation, hidden behind the "dev" feature-gate, in keccak.rs. Run that example by issuing
In case you're planning to use the {2, 4}x SIMD parallel Keccak-p[1600, 12] permutation, which is hidden behind the dev and simdx{2,4} feature-gates, consider looking at simd_keccak.rs. You can run that example by issuing