AES based pseudo-random number generator
This crate implements pseudo-random number generators (PRNG) using the AES block cipher in counter (CTR) mode.
Features
- Based on well-established cryptographic principles.
- Optimized for low latency and high throughput.
- Passes rigorous statistic tests (
practrandandTESTu01's Big Crush). - Provides the
RandomandJumptraits for common functionality. - Supports the traits provided by the
rand_corecrate. - Support for secure initialization is provided by the
getrandomcrate. - Support for no_std.
Crate features
getrandom: Provides secure seeding functionality based on thegetrandomcrate.getrandom: Implements the traits provided by therand_corecrate.tls: Provides thread local based utility functions for easy random number generation.
This crate is no_std compatible when disabling the default features.
Usage
Add this to your Cargo.toml:
[]
= "0.1"
Then you can use it like this:
use ;
Counter implementation details
Uses either a 64-bit or 128-bit counter, which can be randomly seeded at initialization. In case of the 64-bit counter we use a 128-bit variable and will seed the whole variable at initialization. The higher 64-bit acts as a nonce in this case. The counter wraps on overflow, ensuring continuous operation.
Security Note
While based on well-established cryptographic primitives, this PRNG is not intended for cryptographic key generation
or other sensitive cryptographic operations, simply because safe, automatic re-seeding is not provided. We tested its
statistical qualities by running versions with reduced rounds against practrand and TESTu01's Big Crush.
A version with just 3 rounds of AES encryption rounds passes the practrand tests with at least 16 TB.
TESTu01's Big Crush requires at least 5 rounds to be successfully cleared. AES-128 uses 10 rounds, whereas
AES-256 uses 14 rounds.
Supported Architectures
We provide a software implementation of AES in case there is no hardware accelerated AES provided. We provide hardware accelerated versions for the following architectures:
- aarch64: Support since Cortex-A53 (2012).
- riscv64: Must support the scalar based cryptography extension (zk).
- x86_64: Support since Intel's Westmere (2010) and AMD's Bulldozer (2011).
riscv64 needs nightly Rust, since the AES intrinsics are not marked as stable yet.
Optimal performance
We provide runtime detection for the hardware accelerated AES instruction set for all supported
platforms. Should the executing CPU not support hardware accelerated AES, a software fallback
is provided. But we highly recommend to enable the specific target feature on compile time,
since the AES instruction sets are available on modern desktop CPU for at least 10 years.
Enabling the target feature enables the compiler to more aggressively inline and provides
much better performance. The runtime detection is not supported in no_std.
Use the following target features for optimal performance:
- aarch64: "aes" (using the cryptographic extension)
- riscv64: "zkne" (using the scalar based cryptography extension)
- x86_64: "aes" (using AES-NI)
Example in .cargo/config.toml:
[]
= ["-C", "target-feature=+aes"]
[]
= ["-C", "target-feature=+zkne"]
[]
= ["-C", "target-feature=+aes"]
Benchmark
Following benchmarks were made with version v0.1.0 and enabled hardware AES target features.
For aarch64: M1 Pro (14' MacBook Pro, 2021)
Latency/TLS time: [1.1510 ns 1.1522 ns 1.1536 ns]
Latency/Aes128Ctr64 time: [1.1830 ns 1.1895 ns 1.1964 ns]
Latency/Aes128Ctr128 time: [1.3711 ns 1.3747 ns 1.3792 ns]
Latency/Aes256Ctr64 time: [1.8304 ns 1.8478 ns 1.8678 ns]
Latency/Aes256Ctr128 time: [2.0468 ns 2.0474 ns 2.0480 ns]
Latency/ChaCha8 time: [5.9017 ns 5.9146 ns 5.9327 ns]
Latency/ChaCha12 time: [8.3993 ns 8.4023 ns 8.4063 ns]
Latency/ChaCha20 time: [13.577 ns 13.721 ns 13.917 ns]
Latency/Lcg128Xsl64 time: [1.6307 ns 1.6323 ns 1.6344 ns]
Latency/Mcg128Xsl64 time: [1.2409 ns 1.2417 ns 1.2427 ns]
Throughput/TLS thrpt: [12.655 GiB/s 12.657 GiB/s 12.659 GiB/s]
Throughput/Aes128Ctr64 thrpt: [12.601 GiB/s 12.625 GiB/s 12.644 GiB/s]
Throughput/Aes128Ct128 thrpt: [11.315 GiB/s 11.349 GiB/s 11.380 GiB/s]
Throughput/Aes256Ctr64 thrpt: [8.4748 GiB/s 8.4896 GiB/s 8.5014 GiB/s]
Throughput/Aes256Ct128 thrpt: [7.4943 GiB/s 7.5061 GiB/s 7.5158 GiB/s]
Throughput/ChaCha8 thrpt: [1.3391 GiB/s 1.3402 GiB/s 1.3410 GiB/s]
Throughput/ChaCha12 thrpt: [935.03 MiB/s 938.98 MiB/s 942.22 MiB/s]
Throughput/ChaCha20 thrpt: [578.33 MiB/s 580.03 MiB/s 581.44 MiB/s]
Throughput/Lcg128Xsl64 thrpt: [4.5148 GiB/s 4.5335 GiB/s 4.5529 GiB/s]
Throughput/Mcg128Xsl64 thrpt: [5.9140 GiB/s 5.9424 GiB/s 5.9664 GiB/s]
For x86_64: AMD Ryzen 9 5950X
Latency/TLS time: [1.0577 ns 1.0592 ns 1.0616 ns]
Latency/Aes128Ctr64 time: [1.0680 ns 1.0695 ns 1.0712 ns]
Latency/Aes128Ctr128 time: [1.1573 ns 1.1582 ns 1.1592 ns]
Latency/Aes256Ctr64 time: [1.6956 ns 1.7074 ns 1.7252 ns]
Latency/Aes256Ctr128 time: [1.7394 ns 1.7410 ns 1.7427 ns]
Latency/ChaCha8 time: [1.4317 ns 1.4338 ns 1.4363 ns]
Latency/ChaCha12 time: [1.9144 ns 1.9162 ns 1.9182 ns]
Latency/ChaCha20 time: [2.9326 ns 2.9349 ns 2.9375 ns]
Latency/Lcg128Xsl64 time: [1.1708 ns 1.1717 ns 1.1727 ns]
Latency/Mcg128Xsl64 time: [863.11 ps 863.58 ps 864.11 ps]
Throughput/TLS thrpt: [14.118 GiB/s 14.130 GiB/s 14.141 GiB/s]
Throughput/Aes128Ctr64 thrpt: [14.012 GiB/s 14.034 GiB/s 14.054 GiB/s]
Throughput/Aes128Ct128 thrpt: [13.347 GiB/s 13.357 GiB/s 13.367 GiB/s]
Throughput/Aes256Ctr64 thrpt: [8.8344 GiB/s 8.8416 GiB/s 8.8483 GiB/s]
Throughput/Aes256Ct128 thrpt: [7.7617 GiB/s 7.7838 GiB/s 7.8044 GiB/s]
Throughput/ChaCha8 thrpt: [6.6074 GiB/s 6.6142 GiB/s 6.6203 GiB/s]
Throughput/ChaCha12 thrpt: [4.5944 GiB/s 4.5981 GiB/s 4.6015 GiB/s]
Throughput/ChaCha20 thrpt: [2.8697 GiB/s 2.8727 GiB/s 2.8753 GiB/s]
Throughput/Lcg128Xsl64 thrpt: [5.2476 GiB/s 5.3558 GiB/s 5.4613 GiB/s]
Throughput/Mcg128Xsl64 thrpt: [7.6245 GiB/s 7.7582 GiB/s 7.8882 GiB/s]
Acknowledgement
The software based fixsliced implementations of AES-128 and AES-256 is a copy of the
aes crate written by the RustCrypto team. Author of the original C implementation
is Alexandre Adomnicai.
We don't use the AES crate directly, simply because it doesn't inline very well, and we can provide also better inner
mutability this way (since we optimize the fast path, when hardware based AES is available at compile time).
Licence
Licensed under the Apache License, Version 2.0.