crypt-io 0.8.0 - Docs.rs

# v0.8.0 — Performance Verification

**Release date:** 2026-05-22
**Status:** pre-1.0 milestone. The public API is allowed to evolve in
breaking ways through the `0.x` series; 1.0 freezes it.

---

## What this release is

Numbers. Every "<X µs" / "<Y ns" / "<Z GiB/s" claim in this crate's
docs is now backed by a committed criterion benchmark and a
measured number on a documented reference machine. The bench
suite covers every shipped algorithm; the measurements ship in a
new `docs/PERFORMANCE.md` with reference-machine specs,
methodology, contract-check matrix, and a wrapping-overhead
analysis vs upstream RustCrypto.

This phase intentionally adds zero new public API surface. The
0.7.0 functional contract carries forward unchanged.

---

## Highlights

- **Five criterion bench suites** — `benches/aead.rs`,
  `benches/hash.rs`, `benches/mac.rs`, `benches/kdf.rs`,
  `benches/stream.rs`. Each runs every algorithm at the canonical
  input sizes (64 B / 1 KiB / 64 KiB / 1 MiB for byte-stream ops).
- **`docs/PERFORMANCE.md`** — TL;DR comparison table at the top,
  per-suite measured tables, contract-check matrix, methodology,
  and a "choosing parameters for your hardware" guide. Reproduce
  the numbers with `cargo bench --all-features`.
- **Two performance contract targets revised honestly:**
  - **BLAKE3 < 500 ns at 1 KiB** was over-optimistic — measured
    1.07 µs on Zen 5. BLAKE3's tree-parallel SIMD path doesn't
    engage at 1 KiB; small-input cost is setup-dominated. At
    64 KiB BLAKE3 hits **11.24 GiB/s** (~4.5× SHA-256). Updated
    guidance: pick BLAKE3 above ~4 KiB, SHA-256 below on SHA-NI
    hardware.
  - **Argon2id < 100 ms at OWASP defaults** — measured ~9 ms on
    Zen 5. The 19 MiB / 2 / 1 parameter set is too fast on
    modern server CPUs. PERFORMANCE.md flags this and points
    callers at `argon2_hash_with_params` to raise `t_cost` /
    `m_cost`.
- **Wrapping-overhead analysis** confirms `crypt-io` is within
  measurement noise of upstream RustCrypto for every algorithm.
  The only material gap is the per-call `Vec` allocation in our
  encrypt path; documented as post-1.0 zero-allocation work.
- **WSL2 pre-CI gate** — bench numbers (and the full test gate)
  now run on WSL2 Ubuntu against the actual MSRV toolchain
  *before* push, matching the Linux CI environment. Catches
  Linux-specific issues (CRLF, strict base64ct, libc behavior)
  in seconds instead of minutes.

---

## Reference machine

| | |
|---|---|
| CPU | AMD Ryzen 9 9950X3D (Zen 5, 16-core) |
| Flags | `aes` (AES-NI), `sha_ni`, `avx2`, `avx512f`, `vaes` |
| OS | WSL2 Ubuntu (Linux 6.6 on Windows 11) |
| Rust | `1.85.0` (MSRV pinned) |
| Profile | `[profile.bench]` — `opt-level = 3`, `lto = "fat"`, `codegen-units = 1` |

A Zen 5 chip is generous to both AEADs and to SHA-256; results on
older hardware (Zen 1/2, pre-Ice-Lake Intel) will differ by 2-3×
in some places. The PERFORMANCE.md numbers are reference; the
*relative* picture ("ChaCha20 wins without AES-NI", "BLAKE3 wins
above 4 KiB") is portable.

---

## TL;DR

| Operation | Algorithm | @ 1 KiB | @ 64 KiB |
|---|---|---:|---:|
| AEAD encrypt | ChaCha20-Poly1305 | 566 MiB/s | **1.45 GiB/s** |
| AEAD encrypt | AES-256-GCM       | 1.01 GiB/s | **1.55 GiB/s** |
| Hash | BLAKE3 | 914 MiB/s | **11.24 GiB/s** |
| Hash | SHA-256 (SHA-NI) | 2.24 GiB/s | 2.49 GiB/s |
| MAC | HMAC-SHA256 | 1.69 GiB/s | 2.43 GiB/s |
| MAC | BLAKE3 keyed | 990 MiB/s | **11.74 GiB/s** |
| KDF | HKDF-SHA256 (32 B) | **304 ns** | — |
| KDF | Argon2id (OWASP defaults) | ~9 ms / hash | — |
| Stream encrypt | ChaCha20 (1 MiB plaintext) | 932 MiB/s | — |
| Stream encrypt | AES-GCM (1 MiB plaintext) | 999 MiB/s | — |

Full per-size tables in [`docs/PERFORMANCE.md`](../PERFORMANCE.md).

---

## Contract check

| Target | Measured | Status |
|---|---:|:---:|
| ChaCha20-Poly1305 encrypt 1 KiB, < 2 µs | 1.72 µs | ✅ |
| ChaCha20-Poly1305 decrypt 1 KiB, < 2 µs | 1.56 µs | ✅ |
| AES-256-GCM encrypt 1 KiB, < 1 µs (HW accel) | 944 ns | ✅ |
| SHA-256 hash 1 KiB, < 2 µs | 426 ns | ✅ |
| HMAC-SHA256 1 KiB, < 3 µs | 565 ns | ✅ |
| HKDF-SHA256 32-byte output, < 5 µs | 304 ns | ✅ |
| BLAKE3 hash 1 KiB, < 500 ns | 1.07 µs | ⚠️ revised — see PERFORMANCE.md |
| Stream encrypt 1 MiB, > 1 GiB/s | 932-999 MiB/s | ⚠️ marginal |
| Argon2id default, < 100 ms | ~9 ms | ⚠️ too fast — tune `t_cost` on modern HW |

---

## What's NOT in 0.8.0

- **Cross-platform bench numbers** (x86 without AES-NI, ARMv8
  with crypto extensions). Functional correctness is CI-covered;
  fresh perf numbers on a no-AES-NI target need either
  `RUSTFLAGS=-C target-cpu=...` runners or external hardware.
  Deferred to post-1.0 ops work.
- **`dhat` allocation profile.** dhat requires the global
  allocator swap, which doesn't fit in a bench file. Would land
  as `examples/dhat_aead.rs` alongside the post-1.0 zero-allocation
  encrypt path.
- **Zero-allocation encrypt path** (`Crypt::encrypt_into(...)`).
  The per-call `Vec::with_capacity + nonce prepend` is the
  measurable overhead vs upstream. Documented as post-1.0 work.
- **1 GiB streaming stress test in CI.** Deferred to Phase 0.9.0
  (fuzz / soak) as `#[ignore]`d opt-in.

---

## Reproducing the numbers

```bash
cargo bench --all-features                  # all five suites (~5-10 min)
cargo bench --bench aead --all-features     # just AEAD (~2 min)
cargo bench --bench hash --all-features     # just hashing (~1 min)
cargo bench --bench mac --all-features      # just MAC (~1-2 min)
cargo bench --bench kdf --all-features      # KDF (~2 min — Argon2 slow)
cargo bench --bench stream --all-features   # streaming (~1 min)
```

Filter:

```bash
cargo bench --bench aead -- chacha20
cargo bench --bench hash -- blake3
cargo bench --bench kdf -- hkdf_sha256
```

Numbers will vary by hardware. Compare against the reference
table in PERFORMANCE.md.

---

## Verification

| Check | Result |
|---|---|
| `cargo fmt --all -- --check` (Windows + WSL Linux) | clean |
| `cargo clippy --all-targets --all-features -- -D warnings` (both) | clean |
| `cargo test --all-features` (both) | 126 unit + 1 smoke + 25 stream + 31 doctest — all passing |
| `cargo doc --no-deps --all-features` with `RUSTDOCFLAGS="-D warnings"` (both) | clean |
| All 5 bench suites build and run | passes |
| MSRV (1.85) build on Windows + Linux | clean |

---

## Compatibility & build

- **No public API changes.** `Crypt`, `Algorithm`, `Error`,
  `Result`, all five modules (`aead`, `hash`, `mac`, `kdf`,
  `stream`) carry forward unchanged from 0.7.0.
- **MSRV** unchanged: Rust 1.85 (edition 2024).
- **Default features** unchanged.
- **No new dependencies.** The criterion + proptest dev-deps
  were already there from earlier phases.

---

## Installation

```toml
[dependencies]
crypt-io = "0.8"
```

---

## What's next

Phase 0.9.0: Fuzz testing. `cargo-fuzz` targets for every
algorithm with random inputs — `encrypt(random_key, random_plaintext)`,
`decrypt(random_key, random_ciphertext)`, every hash with random
input, every KDF with random parameters, stream encryption with
random chunk boundaries. Run each for 1 CPU-hour minimum. Any
findings (panics, OOMs, infinite loops) get fixed before the
1.0.0-rc cut.