# v0.8.0 — Performance Verification
**Release date:** 2026-05-22
**Status:** pre-1.0 milestone. The public API is allowed to evolve in
breaking ways through the `0.x` series; 1.0 freezes it.
---
## What this release is
Numbers. Every "<X µs" / "<Y ns" / "<Z GiB/s" claim in this crate's
docs is now backed by a committed criterion benchmark and a
measured number on a documented reference machine. The bench
suite covers every shipped algorithm; the measurements ship in a
new `docs/PERFORMANCE.md` with reference-machine specs,
methodology, contract-check matrix, and a wrapping-overhead
analysis vs upstream RustCrypto.
This phase intentionally adds zero new public API surface. The
0.7.0 functional contract carries forward unchanged.
---
## Highlights
- **Five criterion bench suites** — `benches/aead.rs`,
`benches/hash.rs`, `benches/mac.rs`, `benches/kdf.rs`,
`benches/stream.rs`. Each runs every algorithm at the canonical
input sizes (64 B / 1 KiB / 64 KiB / 1 MiB for byte-stream ops).
- **`docs/PERFORMANCE.md`** — TL;DR comparison table at the top,
per-suite measured tables, contract-check matrix, methodology,
and a "choosing parameters for your hardware" guide. Reproduce
the numbers with `cargo bench --all-features`.
- **Two performance contract targets revised honestly:**
- **BLAKE3 < 500 ns at 1 KiB** was over-optimistic — measured
1.07 µs on Zen 5. BLAKE3's tree-parallel SIMD path doesn't
engage at 1 KiB; small-input cost is setup-dominated. At
64 KiB BLAKE3 hits **11.24 GiB/s** (~4.5× SHA-256). Updated
guidance: pick BLAKE3 above ~4 KiB, SHA-256 below on SHA-NI
hardware.
- **Argon2id < 100 ms at OWASP defaults** — measured ~9 ms on
Zen 5. The 19 MiB / 2 / 1 parameter set is too fast on
modern server CPUs. PERFORMANCE.md flags this and points
callers at `argon2_hash_with_params` to raise `t_cost` /
`m_cost`.
- **Wrapping-overhead analysis** confirms `crypt-io` is within
measurement noise of upstream RustCrypto for every algorithm.
The only material gap is the per-call `Vec` allocation in our
encrypt path; documented as post-1.0 zero-allocation work.
- **WSL2 pre-CI gate** — bench numbers (and the full test gate)
now run on WSL2 Ubuntu against the actual MSRV toolchain
*before* push, matching the Linux CI environment. Catches
Linux-specific issues (CRLF, strict base64ct, libc behavior)
in seconds instead of minutes.
---
## Reference machine
| CPU | AMD Ryzen 9 9950X3D (Zen 5, 16-core) |
| Flags | `aes` (AES-NI), `sha_ni`, `avx2`, `avx512f`, `vaes` |
| OS | WSL2 Ubuntu (Linux 6.6 on Windows 11) |
| Rust | `1.85.0` (MSRV pinned) |
| Profile | `[profile.bench]` — `opt-level = 3`, `lto = "fat"`, `codegen-units = 1` |
A Zen 5 chip is generous to both AEADs and to SHA-256; results on
older hardware (Zen 1/2, pre-Ice-Lake Intel) will differ by 2-3×
in some places. The PERFORMANCE.md numbers are reference; the
*relative* picture ("ChaCha20 wins without AES-NI", "BLAKE3 wins
above 4 KiB") is portable.
---
## TL;DR
| AEAD encrypt | ChaCha20-Poly1305 | 566 MiB/s | **1.45 GiB/s** |
| AEAD encrypt | AES-256-GCM | 1.01 GiB/s | **1.55 GiB/s** |
| Hash | BLAKE3 | 914 MiB/s | **11.24 GiB/s** |
| Hash | SHA-256 (SHA-NI) | 2.24 GiB/s | 2.49 GiB/s |
| MAC | HMAC-SHA256 | 1.69 GiB/s | 2.43 GiB/s |
| MAC | BLAKE3 keyed | 990 MiB/s | **11.74 GiB/s** |
| KDF | HKDF-SHA256 (32 B) | **304 ns** | — |
| KDF | Argon2id (OWASP defaults) | ~9 ms / hash | — |
| Stream encrypt | ChaCha20 (1 MiB plaintext) | 932 MiB/s | — |
| Stream encrypt | AES-GCM (1 MiB plaintext) | 999 MiB/s | — |
Full per-size tables in [`docs/PERFORMANCE.md`](../PERFORMANCE.md).
---
## Contract check
| ChaCha20-Poly1305 encrypt 1 KiB, < 2 µs | 1.72 µs | ✅ |
| ChaCha20-Poly1305 decrypt 1 KiB, < 2 µs | 1.56 µs | ✅ |
| AES-256-GCM encrypt 1 KiB, < 1 µs (HW accel) | 944 ns | ✅ |
| SHA-256 hash 1 KiB, < 2 µs | 426 ns | ✅ |
| HMAC-SHA256 1 KiB, < 3 µs | 565 ns | ✅ |
| HKDF-SHA256 32-byte output, < 5 µs | 304 ns | ✅ |
| BLAKE3 hash 1 KiB, < 500 ns | 1.07 µs | ⚠️ revised — see PERFORMANCE.md |
| Stream encrypt 1 MiB, > 1 GiB/s | 932-999 MiB/s | ⚠️ marginal |
| Argon2id default, < 100 ms | ~9 ms | ⚠️ too fast — tune `t_cost` on modern HW |
---
## What's NOT in 0.8.0
- **Cross-platform bench numbers** (x86 without AES-NI, ARMv8
with crypto extensions). Functional correctness is CI-covered;
fresh perf numbers on a no-AES-NI target need either
`RUSTFLAGS=-C target-cpu=...` runners or external hardware.
Deferred to post-1.0 ops work.
- **`dhat` allocation profile.** dhat requires the global
allocator swap, which doesn't fit in a bench file. Would land
as `examples/dhat_aead.rs` alongside the post-1.0 zero-allocation
encrypt path.
- **Zero-allocation encrypt path** (`Crypt::encrypt_into(...)`).
The per-call `Vec::with_capacity + nonce prepend` is the
measurable overhead vs upstream. Documented as post-1.0 work.
- **1 GiB streaming stress test in CI.** Deferred to Phase 0.9.0
(fuzz / soak) as `#[ignore]`d opt-in.
---
## Reproducing the numbers
```bash
cargo bench --all-features # all five suites (~5-10 min)
cargo bench --bench aead --all-features # just AEAD (~2 min)
cargo bench --bench hash --all-features # just hashing (~1 min)
cargo bench --bench mac --all-features # just MAC (~1-2 min)
cargo bench --bench kdf --all-features # KDF (~2 min — Argon2 slow)
cargo bench --bench stream --all-features # streaming (~1 min)
```
Filter:
```bash
cargo bench --bench aead -- chacha20
cargo bench --bench hash -- blake3
cargo bench --bench kdf -- hkdf_sha256
```
Numbers will vary by hardware. Compare against the reference
table in PERFORMANCE.md.
---
## Verification
| `cargo fmt --all -- --check` (Windows + WSL Linux) | clean |
| `cargo clippy --all-targets --all-features -- -D warnings` (both) | clean |
| `cargo test --all-features` (both) | 126 unit + 1 smoke + 25 stream + 31 doctest — all passing |
| `cargo doc --no-deps --all-features` with `RUSTDOCFLAGS="-D warnings"` (both) | clean |
| All 5 bench suites build and run | passes |
| MSRV (1.85) build on Windows + Linux | clean |
---
## Compatibility & build
- **No public API changes.** `Crypt`, `Algorithm`, `Error`,
`Result`, all five modules (`aead`, `hash`, `mac`, `kdf`,
`stream`) carry forward unchanged from 0.7.0.
- **MSRV** unchanged: Rust 1.85 (edition 2024).
- **Default features** unchanged.
- **No new dependencies.** The criterion + proptest dev-deps
were already there from earlier phases.
---
## Installation
```toml
[dependencies]
crypt-io = "0.8"
```
---
## What's next
Phase 0.9.0: Fuzz testing. `cargo-fuzz` targets for every
algorithm with random inputs — `encrypt(random_key, random_plaintext)`,
`decrypt(random_key, random_ciphertext)`, every hash with random
input, every KDF with random parameters, stream encryption with
random chunk boundaries. Run each for 1 CPU-hour minimum. Any
findings (panics, OOMs, infinite loops) get fixed before the
1.0.0-rc cut.