# X.509 PKI Pipeline (x509bench)
End-to-end X.509 PKI pipeline: CA self-signing, subscriber certificate
issuance (parallel), CRL construction and signing, OCSP response
construction and signing (parallel), database persistence, and signature
verification. A fresh ephemeral CA is generated for each batch size so
serial numbers always start at 1.
All results are from: Lenovo ThinkPad P1 Gen 5, 12th Gen i7-12800H, 64 GB RAM,
Linux 6.15.8-200.fc42.x86_64. Release build, mimalloc global allocator, Rayon
thread pool (16 logical cores).
All OpenSSL backend figures were re-run on 2026-04-17 after four successive
optimisations to the OpenSSL backend:
1. **Algorithm handle cache (`alg_cache`)** — replaces per-call
`EVP_MD_fetch`/`EVP_CIPHER_fetch` with a single atomic refcount increment
on all calls after the first.
2. **Public-key parse cache (`BackendPublicKey`)** — `from_der` / `from_pem`
now cache the parsed `EVP_PKEY` handle alongside the SPKI DER bytes.
`verify_signature` reuses the cached handle (O(1) `EVP_PKEY_up_ref`) rather
than calling `d2i_PUBKEY` on every item in the verification loop. The bench
`PrivateCA` was also updated to store a pre-initialised `BackendPublicKey`
so all 1 024 parallel `cert_verify` / `ocsp_verify` calls share a single
parsed key.
3. **Single-shot ML-DSA signing (`sign_into`)** — OpenSSL 3.5's
`EVP_DigestSign(ctx, NULL, &siglen, data, len)` for ML-DSA runs the full
signing computation (it does not merely return the fixed output length).
The original `Signer::sign_oneshot` called `EVP_DigestSign` twice — once
with a null output pointer for the size query and once with the actual
buffer — doubling every ML-DSA signing operation. A new
`Signer::sign_into(data, buf)` method in native-ossl calls `EVP_DigestSign`
once with a pre-allocated buffer sized by the FIPS 204 fixed lengths
(2 420 B for ML-DSA-44, 3 309 B for ML-DSA-65, 4 627 B for ML-DSA-87),
eliminating the redundant computation. ML-DSA-44 `cert_gen` improves by
~34% and ML-DSA-65 by ~20%.
4. **`MessageVerifier` for ML-DSA verification** — replaces the generic
`Verifier::verify_oneshot` (which uses `EVP_DigestVerify`) with
`MessageVerifier::verify` (`EVP_PKEY_verify_message_init` +
`EVP_PKEY_verify_message`). This eliminates the MD dispatch layer for
ML-DSA, which is a no-pre-hash algorithm. ML-DSA-65 `ocsp_verify`
at batch=1 024 improves by ~48% (36.57 ms → 19.14 ms); `cert_verify`
improves by ~27% (31.75 ms → 23.15 ms). ML-DSA-44 verification shows
higher run-to-run variance at these batch sizes.
NSS figures are from 2026-04-09 (neither cache affects the NSS backend).
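The cost of the two-call pattern in item 3 can be reproduced with a mock. This is a minimal sketch, not the native-ossl code: `MockSigner` and its `digest_sign` method are hypothetical stand-ins that only count how often the expensive computation runs, and `sign_oneshot` / `sign_into` merely mirror the method names mentioned above. The mock assumes the behaviour reported in item 3, namely that the size-query form (no output buffer) still performs the full signing work. Only the 3 309-byte ML-DSA-65 signature length is taken from the text.

```rust
use std::cell::Cell;

/// FIPS 204 fixed signature length for ML-DSA-65, as quoted above.
const ML_DSA_65_SIG_LEN: usize = 3309;

/// Hypothetical stand-in for an ML-DSA signing context. It only counts how
/// often the (expensive) signing computation runs.
struct MockSigner {
    full_computations: Cell<u32>,
}

impl MockSigner {
    fn new() -> Self {
        MockSigner { full_computations: Cell::new(0) }
    }

    /// Assumed behaviour: the full computation runs even when `out` is
    /// `None` (the size-query form). Returns the signature length.
    fn digest_sign(&self, out: Option<&mut [u8]>, data: &[u8]) -> usize {
        self.full_computations.set(self.full_computations.get() + 1);
        if let Some(buf) = out {
            // Placeholder "signature": not real cryptography.
            let fill = data.iter().fold(0u8, |acc, b| acc ^ b);
            buf[..ML_DSA_65_SIG_LEN].fill(fill);
        }
        ML_DSA_65_SIG_LEN
    }

    /// Two-call pattern: size query first, then the real call.
    fn sign_oneshot(&self, data: &[u8]) -> Vec<u8> {
        let len = self.digest_sign(None, data);
        let mut sig = vec![0u8; len];
        self.digest_sign(Some(&mut sig[..]), data);
        sig
    }

    /// Single-call pattern: the caller supplies a buffer of the known size.
    fn sign_into(&self, data: &[u8], buf: &mut [u8]) -> usize {
        self.digest_sign(Some(buf), data)
    }
}

fn main() {
    let two_call = MockSigner::new();
    let sig = two_call.sign_oneshot(b"tbs");
    println!("sign_oneshot: {} bytes, {} computations",
             sig.len(), two_call.full_computations.get());

    let one_call = MockSigner::new();
    let mut buf = vec![0u8; ML_DSA_65_SIG_LEN];
    let n = one_call.sign_into(b"tbs", &mut buf);
    println!("sign_into: {} bytes, {} computations",
             n, one_call.full_computations.get());
}
```

Under that assumption the two-call path performs two computations per signature and the pre-sized path performs one, which is the doubling that `sign_into` removes.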
## Running the Benchmark
```bash
# OpenSSL backend (default)
cargo build --release -p synta-bench --features bench-x509-sqlite --bin x509bench
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ecdsa-p256 --min-seconds 20 --db x509bench-openssl-ecdsa-p256.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ed25519 --min-seconds 20 --db x509bench-openssl-ed25519.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-44 --min-seconds 20 --db x509bench-openssl-ml-dsa-44.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-65 --min-seconds 20 --db x509bench-openssl-ml-dsa-65.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa2048 --min-seconds 20 --db x509bench-openssl-rsa2048.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa3072 --min-seconds 20 --db x509bench-openssl-rsa3072.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa4096 --min-seconds 20 --db x509bench-openssl-rsa4096.sqlite
# NSS backend (cert/CRL/OCSP signing and verification route through NSS)
cargo build --release -p synta-bench --features bench-x509-sqlite-nss --bin x509bench
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ecdsa-p256 --min-seconds 20 --db x509bench-nss-ecdsa-p256.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ed25519 --min-seconds 20 --db x509bench-nss-ed25519.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-44 --min-seconds 20 --db x509bench-nss-ml-dsa-44.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-65 --min-seconds 20 --db x509bench-nss-ml-dsa-65.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa2048 --min-seconds 20 --db x509bench-nss-rsa2048.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa3072 --min-seconds 20 --db x509bench-nss-rsa3072.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa4096 --min-seconds 20 --db x509bench-nss-rsa4096.sqlite
```
Each table below shows both the OpenSSL and NSS backends side by side at
batch=64 and batch=1024. Database operations (insert/read) are unaffected by
backend choice and are shown for completeness. Throughput columns are in
thousands of items per second (K/s), i.e. items/second ÷ 1000.
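The throughput figure can be recomputed from a batch size and a batch time. The helper below is a small sketch; the function name `throughput_k_per_s` is an assumption for illustration and is not part of x509bench.

```rust
/// Throughput in thousands of items per second (the "K/s" figure),
/// given a batch size and the measured batch time in milliseconds.
fn throughput_k_per_s(items: u32, batch_ms: f64) -> f64 {
    // items / seconds / 1000 is the same as items / milliseconds.
    f64::from(items) / batch_ms
}

fn main() {
    // ECDSA P-256 `cert_gen`, OpenSSL backend, batch=1024: 6.65 ms.
    println!("{:.1} K/s", throughput_k_per_s(1024, 6.65));
}
```

Recomputing from the rounded millisecond values in the tables can differ slightly from the printed throughput, most visibly at batch=64 where times are only a few hundredths of a millisecond.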
## Results: ECDSA P-256
**Configuration:** ECDSA P-256 for both CA and subscriber keys.
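| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 0.05 ms | 21.5 | 0.08 ms | 13.3 | 0.19 ms | 5.2 | 0.26 ms | 3.9 | 1 cert/batch |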
| `cert_gen` | 0.51 ms | 124.4 | 6.65 ms | 154.0 | 1.86 ms | 34.5 | 25.27 ms | 40.5 | Rayon parallel |
| `db_insert_certs` | 0.71 ms | 90.5 | 4.04 ms | 253.4 | 0.18 ms | 365.5 | 2.56 ms | 399.4 | SQLite WAL |
| `cert_verify` | 0.71 ms | 90.4 | 8.47 ms | 120.8 | 2.77 ms | 23.1 | 45.57 ms | 22.5 | Rayon parallel |
| `db_read_certs` | 0.08 ms | 763.1 | 0.82 ms | 1,242 | 0.07 ms | 937.0 | 0.79 ms | 1,299 | SQLite read |
| `crl_build` | 0.04 ms | 25.1 | 0.72 ms | 1.4 | 0.05 ms | 20.4 | 0.70 ms | 1.4 | 1 CRL covering N serials |
| `crl_sign` | 0.07 ms | 13.4 | 0.82 ms | 1.2 | 0.23 ms | 4.4 | 0.99 ms | 1.0 | 1 CRL/batch |
| `db_insert_crl` | 0.07 ms | 14.3 | 0.18 ms | 5.5 | 0.05 ms | 18.7 | 0.20 ms | 5.0 | SQLite WAL |
| `crl_verify` | 0.07 ms | 14.1 | 0.13 ms | 7.7 | 0.43 ms | 2.3 | 0.52 ms | 1.9 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 54.5 | 0.04 ms | 26.2 | 0.02 ms | 54.1 | 0.03 ms | 31.0 | SQLite read |
| `ocsp_build` | 0.12 ms | 544.5 | 1.20 ms | 856.3 | 0.10 ms | 628.3 | 0.61 ms | 1,685 | Rayon; TBS DER only |
| `ocsp_sign` | 0.45 ms | 143.0 | 4.11 ms | 249.4 | 1.61 ms | 39.6 | 21.77 ms | 47.0 | Rayon parallel |
| `db_insert_ocsp` | 0.38 ms | 167.4 | 3.29 ms | 311.6 | 0.18 ms | 360.5 | 2.46 ms | 416.0 | SQLite WAL |
| `ocsp_verify` | 0.78 ms | 81.8 | 9.28 ms | 110.4 | 3.35 ms | 19.1 | 46.65 ms | 22.0 | Rayon parallel |
| `db_read_ocsp` | 0.10 ms | 627.6 | 0.82 ms | 1,256 | 0.10 ms | 648.1 | 1.34 ms | 766.8 | SQLite read |
## Results: Ed25519
**Configuration:** Ed25519 for both CA and subscriber keys.
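| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 0.06 ms | 17.7 | 0.13 ms | 8.0 | 0.08 ms | 13.1 | 0.08 ms | 11.8 | 1 cert/batch |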
| `cert_gen` | 1.27 ms | 50.4 | 15.64 ms | 65.5 | 1.88 ms | 34.0 | 15.98 ms | 64.1 | Rayon parallel |
| `db_insert_certs` | 0.62 ms | 103.4 | 4.01 ms | 255.4 | 0.56 ms | 113.7 | 2.33 ms | 440.4 | SQLite WAL |
| `cert_verify` | 1.63 ms | 39.2 | 15.31 ms | 66.9 | 1.90 ms | 33.7 | 8.36 ms | 122.4 | Rayon parallel |
| `db_read_certs` | 0.19 ms | 337.4 | 2.13 ms | 480.1 | 0.17 ms | 385.3 | 0.68 ms | 1,515 | SQLite read |
| `crl_build` | 0.11 ms | 9.2 | 1.29 ms | 0.8 | 0.10 ms | 9.9 | 0.64 ms | 1.6 | 1 CRL covering N serials |
| `crl_sign` | 0.23 ms | 4.4 | 1.33 ms | 0.8 | 0.27 ms | 3.7 | 0.85 ms | 1.2 | 1 CRL/batch |
| `db_insert_crl` | 0.11 ms | 8.8 | 0.22 ms | 4.5 | 0.10 ms | 10.2 | 0.16 ms | 6.2 | SQLite WAL |
| `crl_verify` | 0.24 ms | 4.2 | 0.28 ms | 3.6 | 0.06 ms | 16.6 | 0.15 ms | 6.8 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 47.7 | 0.05 ms | 18.9 | 0.02 ms | 60.2 | 0.03 ms | 29.1 | SQLite read |
| `ocsp_build` | 0.23 ms | 276.6 | 0.93 ms | 1,095 | 0.17 ms | 369.8 | 0.46 ms | 2,249 | Rayon; TBS DER only |
| `ocsp_sign` | 0.34 ms | 189.2 | 7.02 ms | 146.0 | 0.80 ms | 79.9 | 11.01 ms | 93.0 | Rayon parallel |
| `db_insert_ocsp` | 0.43 ms | 148.8 | 4.17 ms | 245.8 | 0.30 ms | 210.3 | 2.04 ms | 502.1 | SQLite WAL |
| `ocsp_verify` | 1.29 ms | 49.7 | 15.09 ms | 67.9 | 1.86 ms | 34.4 | 8.12 ms | 126.1 | Rayon parallel |
| `db_read_ocsp` | 0.19 ms | 328.7 | 1.28 ms | 801.3 | 0.27 ms | 234.3 | 0.73 ms | 1,407 | SQLite read |
## Results: ML-DSA-44
**Configuration:** ML-DSA-44 for both CA and subscriber keys.
ML-DSA-44 certificates are ~4,069 bytes DER; OCSP responses with ML-DSA-44 signatures
are roughly 2,700 bytes each.
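| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 0.47 ms | 2.1 | 0.45 ms | 2.2 | 0.27 ms | 3.7 | 0.41 ms | 2.4 | 1 cert/batch |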
| `cert_gen` | 7.54 ms | 8.5 | 71.43 ms | 14.3 | 11.63 ms | 5.5 | 198.11 ms | 5.2 | Rayon parallel |
| `db_insert_certs` | 2.91 ms | 22.0 | 32.42 ms | 31.6 | 1.16 ms | 55.0 | 27.04 ms | 37.9 | SQLite WAL |
| `cert_verify` | 0.87 ms | 73.6 | 15.11 ms | 67.8 | 0.94 ms | 68.1 | 15.21 ms | 67.3 | Rayon parallel |
| `db_read_certs` | 0.20 ms | 326.9 | 2.18 ms | 469.1 | 0.28 ms | 229.7 | 2.11 ms | 484.8 | SQLite read |
| `crl_build` | 0.07 ms | 14.1 | 0.62 ms | 1.6 | 0.10 ms | 9.8 | 0.62 ms | 1.6 | 1 CRL covering N serials |
| `crl_sign` | 0.75 ms | 1.3 | 1.16 ms | 0.9 | 0.81 ms | 1.2 | 1.15 ms | 0.9 | 1 CRL/batch |
| `db_insert_crl` | 0.41 ms | 2.5 | 2.86 ms | 0.3 | 0.09 ms | 11.7 | 1.09 ms | 0.9 | SQLite WAL |
| `crl_verify` | 0.11 ms | 8.8 | 0.18 ms | 5.5 | 0.13 ms | 7.9 | 0.30 ms | 3.3 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 42.8 | 0.03 ms | 29.0 | 0.02 ms | 52.8 | 0.04 ms | 26.4 | SQLite read |
| `ocsp_build` | 0.19 ms | 335.3 | 0.57 ms | 1,808 | 0.07 ms | 909.9 | 0.44 ms | 2,310 | Rayon; TBS DER only |
| `ocsp_sign` | 6.35 ms | 10.1 | 58.87 ms | 17.4 | 11.34 ms | 5.6 | 169.52 ms | 6.0 | Rayon parallel |
| `db_insert_ocsp` | 1.22 ms | 52.5 | 26.13 ms | 39.2 | 0.59 ms | 108.3 | 13.89 ms | 73.7 | SQLite WAL |
| `ocsp_verify` | 0.83 ms | 77.0 | 14.65 ms | 69.9 | 0.86 ms | 74.3 | 12.47 ms | 82.1 | Rayon parallel |
| `db_read_ocsp` | 0.14 ms | 467.2 | 1.29 ms | 792.2 | 0.11 ms | 593.5 | 1.16 ms | 880.3 | SQLite read |
## Results: ML-DSA-65
**Configuration:** ML-DSA-65 for both CA and subscriber keys.
ML-DSA-65 certificates are ~5,521 bytes DER; OCSP responses with ML-DSA-65 signatures
are roughly 3,700 bytes each, compared to ~400 bytes for ECDSA P-256.
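| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 0.94 ms | 1.1 | 0.71 ms | 1.4 | 0.71 ms | 1.4 | 0.85 ms | 1.2 | 1 cert/batch |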
| `cert_gen` | 12.31 ms | 5.2 | 106.14 ms | 9.6 | 20.64 ms | 3.1 | 313.03 ms | 3.3 | Rayon parallel |
| `db_insert_certs` | 1.62 ms | 39.4 | 42.12 ms | 24.3 | 1.60 ms | 40.0 | 32.75 ms | 31.3 | SQLite WAL |
| `cert_verify` | 1.86 ms | 34.3 | 23.15 ms | 44.2 | 2.85 ms | 22.4 | 24.11 ms | 42.5 | Rayon parallel |
| `db_read_certs` | 0.27 ms | 234.0 | 3.48 ms | 294.2 | 0.57 ms | 113.2 | 2.79 ms | 366.8 | SQLite read |
| `crl_build` | 0.08 ms | 12.3 | 0.53 ms | 1.9 | 0.06 ms | 17.9 | 0.72 ms | 1.4 | 1 CRL covering N serials |
| `crl_sign` | 1.20 ms | 0.8 | 1.42 ms | 0.7 | 1.45 ms | 0.7 | 1.39 ms | 0.7 | 1 CRL/batch |
| `db_insert_crl` | 0.39 ms | 2.6 | 2.28 ms | 0.4 | 0.09 ms | 11.7 | 1.54 ms | 0.7 | SQLite WAL |
| `crl_verify` | 0.19 ms | 5.2 | 0.24 ms | 4.2 | 0.29 ms | 3.4 | 0.59 ms | 1.7 | 1 CRL/batch |
| `db_read_crl` | 0.03 ms | 37.3 | 0.03 ms | 29.1 | 0.04 ms | 25.9 | 0.06 ms | 17.0 | SQLite read |
| `ocsp_build` | 0.22 ms | 289.7 | 0.55 ms | 1,847 | 0.25 ms | 261.1 | 0.95 ms | 1,074 | Rayon; TBS DER only |
| `ocsp_sign` | 9.96 ms | 6.4 | 86.79 ms | 11.8 | 20.30 ms | 3.2 | 299.44 ms | 3.4 | Rayon parallel |
| `db_insert_ocsp` | 1.65 ms | 38.9 | 25.38 ms | 40.4 | 0.61 ms | 104.2 | 15.71 ms | 65.2 | SQLite WAL |
| `ocsp_verify` | 1.76 ms | 36.4 | 19.14 ms | 53.5 | 1.55 ms | 41.3 | 28.89 ms | 35.4 | Rayon parallel |
| `db_read_ocsp` | 0.21 ms | 303.4 | 1.27 ms | 806.7 | 0.16 ms | 407.4 | 1.19 ms | 861.3 | SQLite read |
## Results: RSA-2048
**Configuration:** RSA-2048 for both CA and subscriber keys.
`cert_gen` includes RSA key pair generation for each subscriber certificate
(~200–400 ms/key pair single-threaded), which dominates the batch time.
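| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 0.96 ms | 1.0 | 1.27 ms | 0.8 | 4.55 ms | 0.22 | 4.32 ms | 0.23 | 1 cert/batch |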
| `cert_gen` | 449.02 ms | 0.1 | 9,097 ms | 0.1 | 588.26 ms | 0.11 | 8,427 ms | 0.12 | Rayon; incl. key gen |
| `db_insert_certs` | 0.38 ms | 169.3 | 23.02 ms | 44.5 | 0.59 ms | 109 | 15.73 ms | 65 | SQLite WAL |
| `cert_verify` | 0.29 ms | 221.7 | 3.37 ms | 303.6 | 1.53 ms | 42 | 11.75 ms | 87 | Rayon parallel |
| `db_read_certs` | 0.09 ms | 719.5 | 1.17 ms | 871.6 | 0.12 ms | 521 | 1.68 ms | 608 | SQLite read |
| `crl_build` | 0.04 ms | 25.7 | 1.24 ms | 0.8 | 0.07 ms | 13 | 1.11 ms | 0.90 | 1 CRL covering N serials |
| `crl_sign` | 0.52 ms | 1.9 | 2.02 ms | 0.5 | 2.65 ms | 0.38 | 4.02 ms | 0.25 | 1 CRL/batch |
| `db_insert_crl` | 0.08 ms | 12.2 | 1.17 ms | 0.9 | 0.08 ms | 12 | 1.18 ms | 0.85 | SQLite WAL |
| `crl_verify` | 0.02 ms | 46.3 | 0.14 ms | 7.2 | 0.08 ms | 12 | 0.12 ms | 8.5 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 59.9 | 0.06 ms | 16.0 | 0.03 ms | 35 | 0.05 ms | 20 | SQLite read |
| `ocsp_build` | 0.24 ms | 263.4 | 0.71 ms | 1,444 | 0.29 ms | 218 | 0.82 ms | 1,253 | Rayon; TBS DER only |
| `ocsp_sign` | 4.14 ms | 15.5 | 115.49 ms | 8.9 | 54.42 ms | 1.2 | 749.29 ms | 1.4 | Rayon parallel |
| `db_insert_ocsp` | 0.20 ms | 314.4 | 3.96 ms | 258.3 | 0.33 ms | 194 | 2.63 ms | 389 | SQLite WAL |
| `ocsp_verify` | 0.22 ms | 293.5 | 3.77 ms | 271.6 | 1.25 ms | 51 | 13.06 ms | 78 | Rayon parallel |
| `db_read_ocsp` | 0.12 ms | 523.4 | 1.14 ms | 895.0 | 0.17 ms | 371 | 1.73 ms | 593 | SQLite read |
## Results: RSA-3072
**Configuration:** RSA-3072 for both CA and subscriber keys.
RSA-3072 key generation takes roughly 4× longer per key than RSA-2048.
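| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 2.42 ms | 0.4 | 2.71 ms | 0.4 | 9.28 ms | 0.11 | 10.02 ms | 0.100 | 1 cert/batch |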
| `cert_gen` | 2073.09 ms | 0.03 | 30,197 ms | 0.03 | 2,022 ms | 0.032 | 30,444 ms | 0.034 | Rayon; incl. key gen |
| `db_insert_certs` | 0.61 ms | 104.6 | 6.12 ms | 167.2 | 0.50 ms | 127 | 5.40 ms | 189 | SQLite WAL |
| `cert_verify` | 0.69 ms | 93.2 | 5.45 ms | 188.0 | 1.23 ms | 52 | 8.37 ms | 122 | Rayon parallel |
| `db_read_certs` | 0.21 ms | 301.2 | 1.03 ms | 994.0 | 0.18 ms | 360 | 0.94 ms | 1,088 | SQLite read |
| `crl_build` | 0.08 ms | 11.8 | 0.81 ms | 1.2 | 0.09 ms | 12 | 0.66 ms | 1.5 | 1 CRL covering N serials |
| `crl_sign` | 2.47 ms | 0.4 | 2.90 ms | 0.3 | 9.20 ms | 0.11 | 5.55 ms | 0.18 | 1 CRL/batch |
| `db_insert_crl` | 0.05 ms | 19.5 | 0.15 ms | 6.9 | 0.10 ms | 9.8 | 0.14 ms | 7.2 | SQLite WAL |
| `crl_verify` | 0.06 ms | 17.4 | 0.08 ms | 12.1 | 0.14 ms | 7.2 | 0.11 ms | 9.3 | 1 CRL/batch |
| `db_read_crl` | 0.04 ms | 22.2 | 0.03 ms | 34.9 | 0.04 ms | 26 | 0.03 ms | 32 | SQLite read |
| `ocsp_build` | 0.28 ms | 228.3 | 0.69 ms | 1,485 | 0.39 ms | 165 | 0.66 ms | 1,553 | Rayon; TBS DER only |
| `ocsp_sign` | 18.98 ms | 3.4 | 300.85 ms | 3.4 | 95.27 ms | 0.67 | 1,074 ms | 0.95 | Rayon parallel |
| `db_insert_ocsp` | 0.36 ms | 177.4 | 22.16 ms | 46.2 | 0.35 ms | 185 | 2.97 ms | 344 | SQLite WAL |
| `ocsp_verify` | 0.72 ms | 89.4 | 6.84 ms | 149.7 | 1.42 ms | 45 | 8.93 ms | 115 | Rayon parallel |
| `db_read_ocsp` | 0.10 ms | 657.0 | 1.27 ms | 806.9 | 0.13 ms | 496 | 0.83 ms | 1,237 | SQLite read |
## Results: RSA-4096
**Configuration:** RSA-4096 for both CA and subscriber keys.
RSA-4096 key generation dominates: ~1–2 s/key pair single-threaded.
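| Operation | OpenSSL time (batch=64) | OpenSSL K/s (batch=64) | OpenSSL time (batch=1024) | OpenSSL K/s (batch=1024) | NSS time (batch=64) | NSS K/s (batch=64) | NSS time (batch=1024) | NSS K/s (batch=1024) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| `ca_self_sign` | 4.24 ms | 0.2 | 4.47 ms | 0.2 | 13.71 ms | 0.073 | 15.20 ms | 0.066 | 1 cert/batch |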
| `cert_gen` | 4,212 ms | 0.02 | 89,390 ms | 0.01 | 4,032 ms | 0.016 | 69,943 ms | 0.015 | Rayon; incl. key gen |
| `db_insert_certs` | 0.42 ms | 151.0 | 4.60 ms | 222.4 | 0.40 ms | 160 | 5.54 ms | 185 | SQLite WAL |
| `cert_verify` | 0.64 ms | 99.4 | 6.85 ms | 149.4 | 1.18 ms | 54 | 12.34 ms | 83 | Rayon parallel |
| `db_read_certs` | 0.09 ms | 692.9 | 0.84 ms | 1,225 | 0.11 ms | 601 | 1.05 ms | 978 | SQLite read |
| `crl_build` | 0.04 ms | 24.1 | 0.56 ms | 1.8 | 0.05 ms | 21 | 0.70 ms | 1.4 | 1 CRL covering N serials |
| `crl_sign` | 3.79 ms | 0.3 | 4.15 ms | 0.2 | 8.10 ms | 0.12 | 10.25 ms | 0.098 | 1 CRL/batch |
| `db_insert_crl` | 0.04 ms | 24.1 | 0.11 ms | 9.1 | 0.06 ms | 16 | 0.13 ms | 7.5 | SQLite WAL |
| `crl_verify` | 0.07 ms | 15.3 | 0.08 ms | 12.0 | 0.13 ms | 7.5 | 0.17 ms | 5.8 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 55.0 | 0.03 ms | 31.1 | 0.02 ms | 46 | 0.03 ms | 35 | SQLite read |
| `ocsp_build` | 0.22 ms | 287.8 | 0.60 ms | 1,699 | 0.29 ms | 223 | 0.60 ms | 1,709 | Rayon; TBS DER only |
| `ocsp_sign` | 37.26 ms | 1.7 | 667.60 ms | 1.5 | 111.96 ms | 0.57 | 2,253 ms | 0.45 | Rayon parallel |
| `db_insert_ocsp` | 0.34 ms | 186.4 | 5.11 ms | 200.2 | 0.27 ms | 240 | 3.74 ms | 274 | SQLite WAL |
| `ocsp_verify` | 0.99 ms | 64.3 | 12.59 ms | 81.3 | 1.20 ms | 53 | 16.42 ms | 62 | Rayon parallel |
| `db_read_ocsp` | 0.10 ms | 667.4 | 1.27 ms | 806.4 | 0.11 ms | 565 | 1.08 ms | 947 | SQLite read |
## Backend Comparison: OpenSSL vs NSS (batch=1024)
Signing and verification operations only. Database operations are identical
across backends. Ratio > 1 means NSS is slower; ratio < 1 means NSS is faster.
| Algorithm | Operation | OpenSSL | NSS | NSS / OpenSSL |
|---|---|---|---|---|
| ECDSA P-256 | `cert_gen` (sign) | 6.65 ms | 25.27 ms | **3.8× slower** |
| ECDSA P-256 | `cert_verify` | 8.47 ms | 45.57 ms | **5.4× slower** |
| ECDSA P-256 | `crl_sign` | 0.82 ms | 0.99 ms | 1.2× |
| ECDSA P-256 | `crl_verify` | 0.13 ms | 0.52 ms | **4.0× slower** |
| ECDSA P-256 | `ocsp_sign` | 4.11 ms | 21.77 ms | **5.3× slower** |
| ECDSA P-256 | `ocsp_verify` | 9.28 ms | 46.65 ms | **5.0× slower** |
| Ed25519 | `cert_gen` (sign) | 15.64 ms | 15.98 ms | ≈ equal |
| Ed25519 | `cert_verify` | 15.31 ms | 8.36 ms | **0.55× (NSS faster)** |
| Ed25519 | `crl_sign` | 1.33 ms | 0.85 ms | **0.64× (NSS faster)** |
| Ed25519 | `crl_verify` | 0.28 ms | 0.15 ms | **0.54× (NSS faster)** |
| Ed25519 | `ocsp_sign` | 7.02 ms | 11.01 ms | **1.6× slower** |
| Ed25519 | `ocsp_verify` | 15.09 ms | 8.12 ms | **0.54× (NSS faster)** |
| ML-DSA-44 | `cert_gen` (sign) | 71.43 ms | 198.11 ms | **2.8× slower** |
| ML-DSA-44 | `cert_verify` | 15.11 ms | 15.21 ms | ≈ equal |
| ML-DSA-44 | `ocsp_sign` | 58.87 ms | 169.52 ms | **2.9× slower** |
| ML-DSA-44 | `ocsp_verify` | 14.65 ms | 12.47 ms | **0.85× (NSS faster)** |
| ML-DSA-65 | `cert_gen` (sign) | 106.14 ms | 313.03 ms | **2.9× slower** |
| ML-DSA-65 | `cert_verify` | 23.15 ms | 24.11 ms | 1.0× |
| ML-DSA-65 | `ocsp_sign` | 86.79 ms | 299.44 ms | **3.5× slower** |
| ML-DSA-65 | `ocsp_verify` | 19.14 ms | 28.89 ms | **1.5× (NSS slower)** |
| RSA-2048 | `cert_gen` (incl. keygen) | 9,097 ms | 8,427 ms | ≈ equal |
| RSA-2048 | `cert_verify` | 3.37 ms | 11.75 ms | **3.5× slower** |
| RSA-2048 | `crl_sign` | 2.02 ms | 4.02 ms | **2.0× slower** |
| RSA-2048 | `crl_verify` | 0.14 ms | 0.12 ms | ≈ equal |
| RSA-2048 | `ocsp_sign` | 115.49 ms | 749.29 ms | **6.5× slower** |
| RSA-2048 | `ocsp_verify` | 3.77 ms | 13.06 ms | **3.5× slower** |
| RSA-3072 | `cert_gen` (incl. keygen) | 30,197 ms | 30,444 ms | ≈ equal |
| RSA-3072 | `cert_verify` | 5.45 ms | 8.37 ms | **1.5× slower** |
| RSA-3072 | `crl_sign` | 2.90 ms | 5.55 ms | **1.9× slower** |
| RSA-3072 | `crl_verify` | 0.08 ms | 0.11 ms | 1.4× |
| RSA-3072 | `ocsp_sign` | 300.85 ms | 1,074 ms | **3.6× slower** |
| RSA-3072 | `ocsp_verify` | 6.84 ms | 8.93 ms | 1.3× |
| RSA-4096 | `cert_gen` (incl. keygen) | 89,390 ms | 69,943 ms | **0.78× (NSS faster)** |
| RSA-4096 | `cert_verify` | 6.85 ms | 12.34 ms | **1.8× slower** |
| RSA-4096 | `crl_sign` | 4.15 ms | 10.25 ms | **2.5× slower** |
| RSA-4096 | `crl_verify` | 0.08 ms | 0.17 ms | 2.1× |
| RSA-4096 | `ocsp_sign` | 667.60 ms | 2,253 ms | **3.4× slower** |
| RSA-4096 | `ocsp_verify` | 12.59 ms | 16.42 ms | 1.3× |
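The last column is the NSS time divided by the OpenSSL time. A short sketch of that calculation follows; the function names and the ±10% band used for "≈ equal" are assumptions made for this example, and the table itself does not apply a strict threshold.

```rust
/// NSS time divided by OpenSSL time; above 1.0 means NSS is slower.
fn nss_over_openssl(openssl_ms: f64, nss_ms: f64) -> f64 {
    nss_ms / openssl_ms
}

/// Formats a ratio in the style of the table's last column. The 10% band
/// for "≈ equal" is an assumption chosen for this sketch.
fn describe(ratio: f64) -> String {
    if (ratio - 1.0).abs() <= 0.10 {
        "≈ equal".to_string()
    } else if ratio > 1.0 {
        format!("{:.1}× slower", ratio)
    } else {
        format!("{:.2}× (NSS faster)", ratio)
    }
}

fn main() {
    // ECDSA P-256 `cert_gen`, Ed25519 `cert_verify`, Ed25519 `cert_gen`.
    for (openssl_ms, nss_ms) in [(6.65, 25.27), (15.31, 8.36), (15.64, 15.98)] {
        println!("{}", describe(nss_over_openssl(openssl_ms, nss_ms)));
    }
}
```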
## Algorithm Comparison: OpenSSL Backend (batch=1024)
`cert_gen` for RSA keys includes subscriber key pair generation and dominates;
all other algorithms generate keys at CA setup time only.
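| Operation | ECDSA P-256 | Ed25519 | RSA-2048 | RSA-4096 | ML-DSA-44 | ML-DSA-65 |
|---|---|---|---|---|---|---|
| `ca_self_sign` | 0.08 ms | 0.13 ms | 1.27 ms | 4.47 ms | 0.45 ms | 0.71 ms |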
| `cert_gen` | 6.65 ms | 15.64 ms | 9,097 ms† | 89,390 ms† | 71.43 ms | 106.14 ms |
| `cert_verify` | 8.47 ms | 15.31 ms | 3.37 ms | 6.85 ms | 15.11 ms | 23.15 ms |
| `crl_sign` | 0.82 ms | 1.33 ms | 2.02 ms | 4.15 ms | 1.16 ms | 1.42 ms |
| `crl_verify` | 0.13 ms | 0.28 ms | 0.14 ms | 0.08 ms | 0.18 ms | 0.24 ms |
| `ocsp_sign` | 4.11 ms | 7.02 ms | 115.49 ms | 667.60 ms | 58.87 ms | 86.79 ms |
| `ocsp_verify` | 9.28 ms | 15.09 ms | 3.77 ms | 12.59 ms | 14.65 ms | 19.14 ms |
| `db_insert_certs` | 4.04 ms | 4.01 ms | 23.02 ms | 4.60 ms | 32.42 ms | 42.12 ms |
| `db_insert_ocsp` | 3.29 ms | 4.17 ms | 3.96 ms | 5.11 ms | 26.13 ms | 25.38 ms |
| `ocsp_build` | 1.20 ms | 0.93 ms | 0.71 ms | 0.60 ms | 0.57 ms | 0.55 ms |
† RSA `cert_gen` includes RSA key pair generation per subscriber certificate.
## Migration Impact: rust-openssl fork → native-ossl (OpenSSL backend, batch=1024)
Baseline collected at commit `c42c2f8` (last commit on the rust-openssl fork,
2026-04-17) using the same `--min-seconds 20` methodology. The fork used a
PQC-patched rust-openssl crate (`github.com/abbra/rust-openssl`, branch
`pqc-prs`) that bundled a custom OpenSSL build with ML-DSA (Dilithium) support
compiled into it. The current native-ossl crate links against the system
OpenSSL 3.x, which is built with the distribution's compiler flags. Current
figures also include the `alg_cache` and `BackendPublicKey` pkey cache
optimisations. RSA `cert_gen` (dominated by key generation, ±30% thermal
variance) is excluded.
| Algorithm | Operation | rust-openssl fork | native-ossl | Change |
|---|---|---|---|---|
| ECDSA P-256 | `cert_gen` | 6.77 ms | 6.65 ms | ≈0% |
| ECDSA P-256 | `cert_verify` | 9.96 ms | 8.47 ms | **−15%** |
| ECDSA P-256 | `ocsp_sign` | 3.58 ms | 4.11 ms | +15% |
| ECDSA P-256 | `ocsp_verify` | 9.77 ms | 9.28 ms | −5% |
| Ed25519 | `cert_gen` | 19.85 ms | 15.64 ms | **−21%** |
| Ed25519 | `cert_verify` | 17.94 ms | 15.31 ms | **−15%** |
| Ed25519 | `ocsp_sign` | 7.36 ms | 7.02 ms | −5% |
| Ed25519 | `ocsp_verify` | 17.67 ms | 15.09 ms | **−15%** |
| ML-DSA-44 | `cert_gen` | 80.05 ms | 71.43 ms | **−11%** |
| ML-DSA-44 | `cert_verify` | 12.60 ms | 15.11 ms | +20% |
| ML-DSA-44 | `ocsp_sign` | 62.90 ms | 58.87 ms | **−6%** |
| ML-DSA-44 | `ocsp_verify` | 12.37 ms | 14.65 ms | +18% |
| ML-DSA-65 | `cert_gen` | 186.05 ms | 106.14 ms | **−43%** |
| ML-DSA-65 | `cert_verify` | 31.03 ms | 23.15 ms | **−25%** |
| ML-DSA-65 | `ocsp_sign` | 150.16 ms | 86.79 ms | **−42%** |
| ML-DSA-65 | `ocsp_verify` | 30.33 ms | 19.14 ms | **−37%** |
| RSA-2048 | `cert_verify` | 5.93 ms | 3.37 ms | **−43%** |
| RSA-2048 | `ocsp_sign` | 115.19 ms | 115.49 ms | ≈0% |
| RSA-2048 | `ocsp_verify` | 6.17 ms | 3.77 ms | **−39%** |
| RSA-3072 | `cert_verify` | 6.64 ms | 5.45 ms | **−18%** |
| RSA-3072 | `ocsp_sign` | 262.44 ms | 300.85 ms | **+15% regression** |
| RSA-3072 | `ocsp_verify` | 7.23 ms | 6.84 ms | −5% |
| RSA-4096 | `cert_verify` | 8.75 ms | 6.85 ms | **−22%** |
| RSA-4096 | `ocsp_sign` | 570.83 ms | 667.60 ms | **+17% regression** |
| RSA-4096 | `ocsp_verify` | 9.59 ms | 12.59 ms | **+31% regression** |
**Key observations:**
- **Ed25519 and ECDSA P-256 verification** improve under native-ossl + pkey
cache. Ed25519 `cert_gen` is 21% faster; `cert_verify` is 15% faster. The
pkey cache eliminates the repeated `d2i_PUBKEY` round-trip in the Rayon
parallel verification loop.
- **ML-DSA signing improves** after the `sign_into` fix: ML-DSA-44 `cert_gen` is
11% faster than the fork baseline; ML-DSA-65 `cert_gen` is 43% faster. The
root cause of the former regression was that OpenSSL 3.5's `EVP_DigestSign` for
ML-DSA, when called with a NULL output pointer (as in `sign_oneshot`'s size
query), runs the full signing computation rather than returning the fixed output
length. Both the rust-openssl fork and native-ossl link the same system
libcrypto; the difference was purely in the Rust binding layer. The fix uses
FIPS 204 fixed lengths (2 420 B / 3 309 B / 4 627 B) to pre-allocate the output
buffer and calls `EVP_DigestSign` only once.
- **ML-DSA verification improves with `MessageVerifier`**: ML-DSA-65 `cert_verify`
is −25% and `ocsp_verify` is −37% vs the fork baseline. ML-DSA-44 verification
shows higher batch-to-batch variance and mixed results (`cert_verify` +20%,
`ocsp_verify` +18% vs fork); the thermal sensitivity of the 1024-item Rayon
workload makes these numbers less stable than the signing figures.
- **RSA private-key operations (sign) regress slightly** for RSA-3072 and
RSA-4096 (`ocsp_sign` +15–17%). RSA-2048 `ocsp_sign` is unchanged (~115 ms).
The overhead scales with key size rather than being a fixed per-call cost.
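- **RSA `cert_verify` improves substantially** (−18% to −43%) due to the pkey
cache removing `d2i_PUBKEY` from the parallel verification hot path. RSA-2048
`ocsp_verify` improves similarly (−39%), while RSA-3072 `ocsp_verify` gains
only 5%. RSA public-key verification with e=65537 needs just 17 modular
multiplications (16 squarings and one multiply), so it is fast enough
(~3–9 µs/cert) that the former re-parse was a dominant fraction of per-call cost.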
- **RSA-4096 `ocsp_verify` regresses 31%** under native-ossl despite the pkey
cache improvement that is visible in `cert_verify` (−22%). The divergence
between the two verification paths for RSA-4096 is not yet explained.
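The Change column in the table above is a signed percentage relative to the fork baseline. A minimal sketch of that calculation, with a hypothetical helper name:

```rust
/// Signed percentage change from the fork baseline to the current figure.
/// Negative values mean the current build is faster.
fn percent_change(baseline_ms: f64, current_ms: f64) -> f64 {
    (current_ms - baseline_ms) / baseline_ms * 100.0
}

fn main() {
    // ML-DSA-65 `cert_gen`: 186.05 ms on the fork, 106.14 ms now.
    println!("{:+.0}%", percent_change(186.05, 106.14));
    // RSA-4096 `ocsp_verify`: 9.59 ms on the fork, 12.59 ms now.
    println!("{:+.0}%", percent_change(9.59, 12.59));
}
```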
## Analysis
### Backend: OpenSSL vs NSS
**NSS signing overhead** is significant for every algorithm except Ed25519, where
`cert_gen` is on par and `crl_sign` is faster under NSS. The NSS backend routes
every signing operation through the PKCS#11 interface via `SEC_SignData` (RSA, ECDSA,
ML-DSA) or `PK11_Sign` (Ed25519), which includes per-operation token lookup and
mechanism dispatch. For ECDSA P-256, this adds roughly 3.8–5.3× overhead over
OpenSSL's direct `EVP_DigestSign` path. For ML-DSA-44/65, the signing overhead is
2.8–3.5× (smaller in relative terms because ML-DSA signing itself is expensive).
**Ed25519 verification favours NSS**: `cert_verify` and `ocsp_verify` are
*faster* under NSS for Ed25519 (NSS ~1.8× faster). NSS verifies Ed25519 via
`PK11_Verify`, which dispatches directly to the softokn `CKM_EDDSA` mechanism.
ML-DSA verification is now close to parity at batch=1024: for ML-DSA-44,
`cert_verify` is equal and `ocsp_verify` is ~1.2× faster under NSS, while for
ML-DSA-65 OpenSSL matches NSS on `cert_verify` and is ~1.5× faster on
`ocsp_verify`. The earlier NSS lead on ML-DSA verification narrowed significantly
once the public-key parse cache was introduced (see below).
**The pkey cache narrows the OpenSSL/NSS gap for ML-DSA verification.** Before the
`BackendPublicKey` cache, OpenSSL called `d2i_PUBKEY` on every verification, parsing
the SPKI DER, which wraps a 1 312-byte (ML-DSA-44) or 1 952-byte (ML-DSA-65) public
key, for each of the 1 024 parallel items. With the cache, the parsed `EVP_PKEY`
handle is cloned via `EVP_PKEY_up_ref` (one atomic refcount increment) on each call.
When the cache was introduced, ML-DSA-44 `ocsp_verify` at batch=1 024 improved by
~41% (30 ms → 17.8 ms) and `cert_verify` by ~21% (29.7 ms → 23.4 ms). The same cache
also improved RSA-3072 `cert_verify` by ~49% (10.6 ms → 5.5 ms): RSA public-key
verification with e=65537 needs only 17 modular multiplications (16 squarings and
one multiply), so the former re-parse represented a large fraction of total call
time. NSS presumably keeps its own parsed key handle internally, so these
improvements bring the two backends closer together.
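The parse-once, share-by-refcount pattern can be sketched with standard-library types. This is an illustration under stated assumptions, not the `BackendPublicKey` implementation: `ParsedKey`, `parse_spki`, `CachedPublicKey`, and `verify_batch` are hypothetical names, `Arc::clone` stands in for `EVP_PKEY_up_ref`, plain threads stand in for the Rayon pool, and the `verify` body is a placeholder rather than a signature check.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Counts how many times the (expensive) parse step runs.
static PARSE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a parsed key handle such as `EVP_PKEY`.
struct ParsedKey {
    spki_len: usize,
}

/// Stand-in for `d2i_PUBKEY`: pretend this is costly.
fn parse_spki(der: &[u8]) -> ParsedKey {
    PARSE_CALLS.fetch_add(1, Ordering::Relaxed);
    ParsedKey { spki_len: der.len() }
}

/// Caches the parsed handle next to the DER bytes. `Clone` only bumps the
/// `Arc` reference counts, analogous to `EVP_PKEY_up_ref`.
#[derive(Clone)]
struct CachedPublicKey {
    der: Arc<Vec<u8>>,
    parsed: Arc<ParsedKey>,
}

impl CachedPublicKey {
    fn from_der(der: &[u8]) -> Self {
        CachedPublicKey {
            der: Arc::new(der.to_vec()),
            parsed: Arc::new(parse_spki(der)),
        }
    }

    /// Placeholder check; a real backend would verify a signature here.
    fn verify(&self, _msg: &[u8]) -> bool {
        self.parsed.spki_len == self.der.len()
    }
}

/// Verifies `items` messages on `workers` threads, sharing one parsed key.
fn verify_batch(key: &CachedPublicKey, items: usize, workers: usize) -> usize {
    let handles: Vec<thread::JoinHandle<usize>> = (0..workers)
        .map(|w| {
            let key = key.clone(); // refcount bump, no re-parse
            thread::spawn(move || {
                (w..items)
                    .step_by(workers)
                    .filter(|i| key.verify(&i.to_le_bytes()))
                    .count()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let key = CachedPublicKey::from_der(&[0u8; 64]);
    let ok = verify_batch(&key, 1024, 16);
    println!("{} verified, {} parse call(s)", ok, PARSE_CALLS.load(Ordering::Relaxed));
}
```

However many items are verified, the parse counter advances once per `from_der` call; without the cache it would advance once per item.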
**ECDSA P-256 verification** is substantially slower under NSS (5.0–5.4× at
batch=1024). `VFY_VerifyDataWithAlgorithmID` routes through the PKCS#11
`CKF_VERIFY` path, adding per-verification overhead compared to OpenSSL's
direct EVP layer.
**The x509bench signing overhead** for NSS reflects per-certificate signer
initialization: each `cert_gen` task imports the private key via
`PK11_ImportDERPrivateKeyInfoAndReturnKey` before signing. Reusing a single
`NssSigner` across multiple certificates in the same batch would eliminate this
overhead. The signing operations themselves (ECDSA P-256 at ~4 µs/sign,
ML-DSA-65 at ~90 µs/sign) are comparable between backends; the extra latency
is PKCS#11 setup cost, not cryptographic computation.
### OpenSSL Backend: ECDSA P-256
`cert_gen` and `ocsp_sign` use Rayon parallel iteration across all logical
cores. Throughput rises from 124.4 K/s to 154.0 K/s between batch=64 and
batch=1024 as the thread pool becomes more fully saturated. `cert_verify`
shows a similar pattern (90.4 → 120.8 K/s).
`crl_build` and `crl_sign` always cover exactly one CRL per batch, regardless
of batch size. The CRL TBS DER grows proportionally with the number of revoked
serial entries, so `crl_sign` throughput falls from 13.4 K/s (batch=64) to
1.2 K/s (batch=1024); the extra time goes into DER encoding and P-256 signing
of the larger TBS blob.
`ocsp_build` (TBS DER construction only) reaches 856 K/s at batch=1024 as
full Rayon parallelism is achieved. `ocsp_verify` at batch=1024 reaches
110.4 K/s.
SQLite inserts use a single `prepare()` before the transaction loop so the
SQL parse cost is paid once per batch rather than once per row.
### OpenSSL Backend: Ed25519
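Ed25519 `cert_gen` throughput at batch=1024 (65.5 K/s) is about 2.4× lower than
ECDSA P-256 (154.0 K/s); part of that gap is attributed to additional subscriber
key generation overhead in the Ed25519 path. `ocsp_sign`, which involves no key
generation, shows a smaller gap (7.02 ms vs 4.11 ms). Ed25519 is a one-shot
algorithm with no pre-hash step, whereas ECDSA P-256 signs a digest of the data.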
### OpenSSL Backend: ML-DSA-44 and ML-DSA-65
**Signing is the dominant cost.** `cert_gen` at batch=1024 is 10.7× slower for
ML-DSA-44 (71.43 ms vs 6.65 ms for ECDSA P-256) and 16.0× slower for ML-DSA-65
(106.14 ms). ML-DSA signing involves large polynomial matrix operations that stress
the L2/L3 cache, limiting effective Rayon parallelism across 16 cores. The current
figures reflect all four optimisations (see introduction). The `sign_into` fast path
(a single `EVP_DigestSign` call with a FIPS 204 pre-allocated buffer) is used for the
common case. The `MessageSigner::sign_oneshot` path (via `EVP_PKEY_sign_message_init`)
is available when a FIPS 204 §5.2 context string is set, but it is ~13–21% slower due
to internal update+final dispatch and is not exercised by the bench.
**`ocsp_sign` is the second most expensive Rayon operation** after `cert_gen`:
58.87 ms (ML-DSA-44) and 86.79 ms (ML-DSA-65) at batch=1024, vs 4.11 ms for ECDSA
P-256. Each OCSP response requires one ML-DSA signing operation, and 1024 parallel
signs put heavy pressure on the cache.
**Verification** at batch=1024 uses `MessageVerifier` (`EVP_PKEY_verify_message_init`
+ `EVP_PKEY_verify_message`), which eliminates the MD dispatch layer for ML-DSA's
no-pre-hash algorithm. ML-DSA-65 `cert_verify` is 23.15 ms (−25% vs the
rust-openssl fork baseline of 31.03 ms) and `ocsp_verify` is 19.14 ms (−37% vs
30.33 ms). ML-DSA-44 verification numbers show higher run-to-run variance at these
batch sizes: `cert_verify` is 15.11 ms and `ocsp_verify` is 14.65 ms.
**`ocsp_build`** (pure DER encoding, no crypto) is similarly fast for all algorithms
(0.55–1.20 ms at batch=1024) because synta's encoder splices the ML-DSA signature
BIT STRING as a zero-copy `BitStringRef` slice.
**Database throughput is I/O-bound.** ML-DSA-44 certificates are ~4 KB and
ML-DSA-65 certificates are ~5.5 KB each, vs ~700 bytes for ECDSA P-256.
SQLite WAL write time scales roughly with byte volume.
### OpenSSL Backend: RSA-2048, RSA-3072, RSA-4096
**RSA `cert_gen` is dominated by key pair generation**, not by signing. Each
subscriber certificate requires a fresh RSA key pair: ~200–400 ms for RSA-2048,
~1–2 s for RSA-3072, and ~2–4 s for RSA-4096, single-threaded. Rayon parallelizes
across 16 cores, but the absolute batch times remain extreme (9.1 s, 30.2 s, and
89.4 s at batch=1024 for RSA-2048/3072/4096 respectively). These numbers are not
comparable to other algorithms for signing performance — they measure key generation
speed. RSA key generation also drives significant thermal load; observed times can
vary by ±30% across runs depending on sustained CPU frequency.
**RSA verification is fast** due to the small public exponent (e=65537). `cert_verify`
at batch=1024 is 3.37 ms for RSA-2048 and 6.85 ms for RSA-4096, both faster than
ECDSA P-256 (8.47 ms). A modular exponentiation with e=65537 (2^16 + 1) takes 16
squarings and one multiplication, which is much cheaper than the ECDSA scalar
point multiplication.
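The operation count for e=65537 can be checked with a small square-and-multiply sketch. It uses 64-bit operands purely to count steps; real RSA works on multi-precision integers, and `mod_pow_counted` and `OpCount` are hypothetical names introduced for this example.

```rust
/// Counts of modular operations performed by one exponentiation.
#[derive(Debug, PartialEq)]
struct OpCount {
    squarings: u32,
    multiplications: u32,
}

/// Left-to-right binary (square-and-multiply) exponentiation on small
/// integers. `u64` operands with a `u128` intermediate are enough to
/// count the operations without overflow.
fn mod_pow_counted(base: u64, exp: u64, modulus: u64) -> (u64, OpCount) {
    assert!(exp > 0 && modulus > 1);
    let m = u128::from(modulus);
    let b = u128::from(base) % m;
    let mut acc = b; // accounts for the leading 1 bit of the exponent
    let mut ops = OpCount { squarings: 0, multiplications: 0 };
    let top_bit = 63 - exp.leading_zeros();
    for bit in (0..top_bit).rev() {
        acc = acc * acc % m;
        ops.squarings += 1;
        if (exp >> bit) & 1 == 1 {
            acc = acc * b % m;
            ops.multiplications += 1;
        }
    }
    (acc as u64, ops)
}

fn main() {
    let (_, ops) = mod_pow_counted(2, 65_537, 1_000_003);
    println!("{:?}", ops);
}
```

Because 65537 has only two set bits, the loop performs 16 squarings and a single extra multiplication, 17 modular multiplications in total. A private exponent of the same size as the modulus needs thousands of such steps.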
**The pkey cache had the largest relative impact on RSA verification.** RSA-3072
`cert_verify` improved by ~49% (10.6 ms → 5.45 ms) and RSA-4096 by ~26%
(9.31 ms → 6.85 ms). Because RSA public-key verification is so quick
(~3–7 µs/cert, much faster than ML-DSA), the former `d2i_PUBKEY` overhead
represented a large fraction of total call time.
**RSA `ocsp_sign` is the most expensive non-keygen operation.** Each OCSP response
requires one RSA private-key operation (full modular exponentiation with the private
exponent d). At batch=1024, `ocsp_sign` takes 115 ms (RSA-2048), 301 ms (RSA-3072),
and 668 ms (RSA-4096). The cost grows roughly as O(key_bits^2.5): doubling the key
size from 2048 to 4096 bits multiplies the time by ~5.8×, and 2^2.5 ≈ 5.7.
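The scaling exponent implied by two timings can be estimated from a log ratio. The sketch below applies that to the `ocsp_sign` figures quoted above; `scaling_exponent` is a hypothetical helper, and the result is an empirical fit to three data points rather than a complexity bound.

```rust
/// Empirical exponent k such that time is proportional to key_bits^k
/// between two measurements (a, b).
fn scaling_exponent(bits_a: f64, ms_a: f64, bits_b: f64, ms_b: f64) -> f64 {
    (ms_b / ms_a).ln() / (bits_b / bits_a).ln()
}

fn main() {
    // OpenSSL `ocsp_sign` at batch=1024: 115.49 ms, 300.85 ms, 667.60 ms.
    let k_2048_4096 = scaling_exponent(2048.0, 115.49, 4096.0, 667.60);
    let k_2048_3072 = scaling_exponent(2048.0, 115.49, 3072.0, 300.85);
    println!("2048 -> 4096: k = {:.2}", k_2048_4096);
    println!("2048 -> 3072: k = {:.2}", k_2048_3072);
}
```

With these inputs the exponent comes out at about 2.5 for the 2048 to 4096 step and about 2.4 for the 2048 to 3072 step.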
**NSS overhead is highly asymmetric for RSA.** For RSA-2048, `ocsp_sign` is 6.5×
slower under NSS (749 ms vs 115 ms). The private-key operation itself takes only
~80 µs at 2048 bits, so the PKCS#11 per-call setup cost — token lookup, mechanism
dispatch, `C_Sign` call — represents a large fraction of the total. At RSA-4096,
where the private-key operation takes ~800 µs, the same PKCS#11 overhead is
proportionally smaller, reducing the ratio to 3.4× (2,253 ms vs 668 ms).
**NSS `cert_gen` (including key generation) is unexpectedly comparable or faster**
for RSA-3072 and RSA-4096. RSA key generation uses OpenSSL directly (not routed
through `NssSigner`), so the backend choice does not affect key generation time.
The minor variance reflects Rayon scheduling randomness across long-running tasks
— not a real backend difference.
**Database performance scales with DER blob size.** RSA-2048 certificates are ~800
bytes (smaller than ML-DSA-44), so `db_insert_certs` is fast. RSA-4096 certificates
are ~1.7 KB. OCSP response DER is smaller for RSA (~300 bytes) than for
ML-DSA-65 (~3.7 KB), so `db_insert_ocsp` is comparable to ECDSA P-256 for RSA.