synta 0.1.9

ASN.1 parser, decoder, and encoder library with DER/BER support and C FFI
# X.509 PKI Pipeline (x509bench)

End-to-end X.509 PKI pipeline: CA self-signing, subscriber certificate
issuance (parallel), CRL construction and signing, OCSP response
construction and signing (parallel), database persistence, and signature
verification. A fresh ephemeral CA is generated for each batch size so
serial numbers always start at 1.

All results are from: Lenovo ThinkPad P1 Gen 5, 12th Gen i7-12800H, 64 GB RAM,
Linux 6.15.8-200.fc42.x86_64. Release build, mimalloc global allocator, Rayon
thread pool (16 logical cores).

All OpenSSL backend figures were re-run on 2026-04-17 after four successive
optimisations to the OpenSSL backend:

1. **Algorithm handle cache (`alg_cache`)** — replaces per-call
   `EVP_MD_fetch`/`EVP_CIPHER_fetch` with a single atomic refcount increment
   on all calls after the first.
2. **Public-key parse cache (`BackendPublicKey`)**: `from_der` / `from_pem`
   now cache the parsed `EVP_PKEY` handle alongside the SPKI DER bytes.
   `verify_signature` reuses the cached handle (O(1) `EVP_PKEY_up_ref`) rather
   than calling `d2i_PUBKEY` on every item in the verification loop.  The bench
   `PrivateCA` was also updated to store a pre-initialised `BackendPublicKey`
   so all 1 024 parallel `cert_verify` / `ocsp_verify` calls share a single
   parsed key.
3. **Single-shot ML-DSA signing (`sign_into`)** — OpenSSL 3.5's
   `EVP_DigestSign(ctx, NULL, &siglen, data, len)` for ML-DSA runs the full
   signing computation (it does not merely return the fixed output length).
   The original `Signer::sign_oneshot` called `EVP_DigestSign` twice — once
   with a null output pointer for the size query and once with the actual
   buffer — doubling every ML-DSA signing operation.  A new
   `Signer::sign_into(data, buf)` method in native-ossl calls `EVP_DigestSign`
   once with a pre-allocated buffer sized by the FIPS 204 fixed lengths
   (2 420 B for ML-DSA-44, 3 309 B for ML-DSA-65, 4 627 B for ML-DSA-87),
   eliminating the redundant computation.  ML-DSA-44 `cert_gen` improves by
   ~34% and ML-DSA-65 by ~20%.
4. **`MessageVerifier` for ML-DSA verification** — replaces the generic
   `Verifier::verify_oneshot` (which uses `EVP_DigestVerify`) with
   `MessageVerifier::verify` (`EVP_PKEY_verify_message_init` +
   `EVP_PKEY_verify_message`).  This eliminates the MD dispatch layer for
   ML-DSA, which is a no-pre-hash algorithm.  ML-DSA-65 `ocsp_verify`
   at batch=1 024 improves by ~48% (36.57 ms → 19.14 ms); `cert_verify`
   improves by ~27% (31.75 ms → 23.15 ms).  ML-DSA-44 verification shows
   higher run-to-run variance at these batch sizes.
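
The effect of optimisation 3 can be illustrated with a small model. This is a minimal sketch, not the native-ossl implementation: `MockSigner`, `raw_sign`, and `sign_two_call` are hypothetical names, and the "signature" is placeholder bytes. The only facts taken from the text are the FIPS 204 signature lengths and the observation that the size-query call also runs the full signing computation.

```rust
/// ML-DSA parameter sets named in the text.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MlDsa {
    MlDsa44,
    MlDsa65,
    MlDsa87,
}

impl MlDsa {
    /// Fixed signature length in bytes (FIPS 204), as quoted in the text.
    pub const fn signature_len(self) -> usize {
        match self {
            MlDsa::MlDsa44 => 2420,
            MlDsa::MlDsa65 => 3309,
            MlDsa::MlDsa87 => 4627,
        }
    }
}

/// Hypothetical stand-in for a backend signer. It only counts how often
/// the expensive signing computation would run.
pub struct MockSigner {
    pub alg: MlDsa,
    pub sign_calls: usize,
}

impl MockSigner {
    pub fn new(alg: MlDsa) -> Self {
        Self { alg, sign_calls: 0 }
    }

    /// Models the behaviour described in the text: the computation runs on
    /// every call, even when `out` is `None` (the size query).
    fn raw_sign(&mut self, _data: &[u8], out: Option<&mut [u8]>) -> usize {
        self.sign_calls += 1;
        let len = self.alg.signature_len();
        if let Some(buf) = out {
            buf[..len].fill(0xAB); // placeholder bytes, not a real signature
        }
        len
    }

    /// Two-call pattern: query the size, allocate, then sign.
    pub fn sign_two_call(&mut self, data: &[u8]) -> Vec<u8> {
        let len = self.raw_sign(data, None);
        let mut sig = vec![0u8; len];
        self.raw_sign(data, Some(sig.as_mut_slice()));
        sig
    }

    /// Single-call pattern: the caller sizes the buffer from the fixed length.
    pub fn sign_into(&mut self, data: &[u8], buf: &mut [u8]) -> usize {
        assert!(buf.len() >= self.alg.signature_len(), "buffer too small");
        self.raw_sign(data, Some(buf))
    }
}
```

In this model `sign_two_call` increments `sign_calls` twice per signature and `sign_into` once, which is the halving the text describes.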

NSS figures are from 2026-04-09 (neither cache affects the NSS backend).

## Running the Benchmark

```bash
# OpenSSL backend (default)
cargo build --release -p synta-bench --features bench-x509-sqlite --bin x509bench

./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ecdsa-p256 --min-seconds 20 --db x509bench-openssl-ecdsa-p256.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ed25519   --min-seconds 20 --db x509bench-openssl-ed25519.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-44 --min-seconds 20 --db x509bench-openssl-ml-dsa-44.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-65 --min-seconds 20 --db x509bench-openssl-ml-dsa-65.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa2048  --min-seconds 20 --db x509bench-openssl-rsa2048.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa3072  --min-seconds 20 --db x509bench-openssl-rsa3072.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa4096  --min-seconds 20 --db x509bench-openssl-rsa4096.sqlite

# NSS backend (cert/CRL/OCSP signing and verification route through NSS)
cargo build --release -p synta-bench --features bench-x509-sqlite-nss --bin x509bench

./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ecdsa-p256 --min-seconds 20 --db x509bench-nss-ecdsa-p256.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ed25519   --min-seconds 20 --db x509bench-nss-ed25519.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-44 --min-seconds 20 --db x509bench-nss-ml-dsa-44.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo ml-dsa-65 --min-seconds 20 --db x509bench-nss-ml-dsa-65.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa2048  --min-seconds 20 --db x509bench-nss-rsa2048.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa3072  --min-seconds 20 --db x509bench-nss-rsa3072.sqlite
./target/release/x509bench bench --sizes 1,2,4,8,16,32,64,128,256,512,1024 --ca-key-algo rsa4096  --min-seconds 20 --db x509bench-nss-rsa4096.sqlite
```

Each table below shows both the OpenSSL and NSS backends side by side at
batch=64 and batch=1024. Database operations (insert/read) are unaffected by
backend choice and are shown for completeness. Throughput is items/second ÷ 1000.
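
As a worked example of the throughput column, the helper below converts a batch size and a batch time into K items/second. The function name is illustrative and not part of x509bench.

```rust
/// Throughput in thousands of items per second, given the number of items
/// in the batch and the batch wall-clock time in milliseconds.
pub fn throughput_k_per_s(items: u64, batch_ms: f64) -> f64 {
    let items_per_second = items as f64 / (batch_ms / 1000.0);
    items_per_second / 1000.0
}
```

For instance, 1024 items in 6.65 ms gives about 154.0 K/s. Recomputing from the rounded times in the tables can differ slightly from the printed throughput (64 items in 0.51 ms gives about 125.5 K/s against the 124.4 K/s shown), presumably because the tables round the times.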

## Results: ECDSA P-256

**Configuration:** ECDSA P-256 for both CA and subscriber keys.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 0.05 ms | 21.5 | 0.08 ms | 13.3 | 0.19 ms | 5.2 | 0.26 ms | 3.9 | 1 cert/batch |
| `cert_gen` | 0.51 ms | 124.4 | 6.65 ms | 154.0 | 1.86 ms | 34.5 | 25.27 ms | 40.5 | Rayon parallel |
| `db_insert_certs` | 0.71 ms | 90.5 | 4.04 ms | 253.4 | 0.18 ms | 365.5 | 2.56 ms | 399.4 | SQLite WAL |
| `cert_verify` | 0.71 ms | 90.4 | 8.47 ms | 120.8 | 2.77 ms | 23.1 | 45.57 ms | 22.5 | Rayon parallel |
| `db_read_certs` | 0.08 ms | 763.1 | 0.82 ms | 1,242 | 0.07 ms | 937.0 | 0.79 ms | 1,299 | SQLite read |
| `crl_build` | 0.04 ms | 25.1 | 0.72 ms | 1.4 | 0.05 ms | 20.4 | 0.70 ms | 1.4 | 1 CRL covering N serials |
| `crl_sign` | 0.07 ms | 13.4 | 0.82 ms | 1.2 | 0.23 ms | 4.4 | 0.99 ms | 1.0 | 1 CRL/batch |
| `db_insert_crl` | 0.07 ms | 14.3 | 0.18 ms | 5.5 | 0.05 ms | 18.7 | 0.20 ms | 5.0 | SQLite WAL |
| `crl_verify` | 0.07 ms | 14.1 | 0.13 ms | 7.7 | 0.43 ms | 2.3 | 0.52 ms | 1.9 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 54.5 | 0.04 ms | 26.2 | 0.02 ms | 54.1 | 0.03 ms | 31.0 | SQLite read |
| `ocsp_build` | 0.12 ms | 544.5 | 1.20 ms | 856.3 | 0.10 ms | 628.3 | 0.61 ms | 1,685 | Rayon; TBS DER only |
| `ocsp_sign` | 0.45 ms | 143.0 | 4.11 ms | 249.4 | 1.61 ms | 39.6 | 21.77 ms | 47.0 | Rayon parallel |
| `db_insert_ocsp` | 0.38 ms | 167.4 | 3.29 ms | 311.6 | 0.18 ms | 360.5 | 2.46 ms | 416.0 | SQLite WAL |
| `ocsp_verify` | 0.78 ms | 81.8 | 9.28 ms | 110.4 | 3.35 ms | 19.1 | 46.65 ms | 22.0 | Rayon parallel |
| `db_read_ocsp` | 0.10 ms | 627.6 | 0.82 ms | 1,256 | 0.10 ms | 648.1 | 1.34 ms | 766.8 | SQLite read |

## Results: Ed25519

**Configuration:** Ed25519 for both CA and subscriber keys.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 0.06 ms | 17.7 | 0.13 ms | 8.0 | 0.08 ms | 13.1 | 0.08 ms | 11.8 | 1 cert/batch |
| `cert_gen` | 1.27 ms | 50.4 | 15.64 ms | 65.5 | 1.88 ms | 34.0 | 15.98 ms | 64.1 | Rayon parallel |
| `db_insert_certs` | 0.62 ms | 103.4 | 4.01 ms | 255.4 | 0.56 ms | 113.7 | 2.33 ms | 440.4 | SQLite WAL |
| `cert_verify` | 1.63 ms | 39.2 | 15.31 ms | 66.9 | 1.90 ms | 33.7 | 8.36 ms | 122.4 | Rayon parallel |
| `db_read_certs` | 0.19 ms | 337.4 | 2.13 ms | 480.1 | 0.17 ms | 385.3 | 0.68 ms | 1,515 | SQLite read |
| `crl_build` | 0.11 ms | 9.2 | 1.29 ms | 0.8 | 0.10 ms | 9.9 | 0.64 ms | 1.6 | 1 CRL covering N serials |
| `crl_sign` | 0.23 ms | 4.4 | 1.33 ms | 0.8 | 0.27 ms | 3.7 | 0.85 ms | 1.2 | 1 CRL/batch |
| `db_insert_crl` | 0.11 ms | 8.8 | 0.22 ms | 4.5 | 0.10 ms | 10.2 | 0.16 ms | 6.2 | SQLite WAL |
| `crl_verify` | 0.24 ms | 4.2 | 0.28 ms | 3.6 | 0.06 ms | 16.6 | 0.15 ms | 6.8 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 47.7 | 0.05 ms | 18.9 | 0.02 ms | 60.2 | 0.03 ms | 29.1 | SQLite read |
| `ocsp_build` | 0.23 ms | 276.6 | 0.93 ms | 1,095 | 0.17 ms | 369.8 | 0.46 ms | 2,249 | Rayon; TBS DER only |
| `ocsp_sign` | 0.34 ms | 189.2 | 7.02 ms | 146.0 | 0.80 ms | 79.9 | 11.01 ms | 93.0 | Rayon parallel |
| `db_insert_ocsp` | 0.43 ms | 148.8 | 4.17 ms | 245.8 | 0.30 ms | 210.3 | 2.04 ms | 502.1 | SQLite WAL |
| `ocsp_verify` | 1.29 ms | 49.7 | 15.09 ms | 67.9 | 1.86 ms | 34.4 | 8.12 ms | 126.1 | Rayon parallel |
| `db_read_ocsp` | 0.19 ms | 328.7 | 1.28 ms | 801.3 | 0.27 ms | 234.3 | 0.73 ms | 1,407 | SQLite read |

## Results: ML-DSA-44

**Configuration:** ML-DSA-44 for both CA and subscriber keys.
ML-DSA-44 certificates are ~4,069 bytes DER; OCSP responses with ML-DSA-44 signatures
are roughly 2,700 bytes each.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 0.47 ms | 2.1 | 0.45 ms | 2.2 | 0.27 ms | 3.7 | 0.41 ms | 2.4 | 1 cert/batch |
| `cert_gen` | 7.54 ms | 8.5 | 71.43 ms | 14.3 | 11.63 ms | 5.5 | 198.11 ms | 5.2 | Rayon parallel |
| `db_insert_certs` | 2.91 ms | 22.0 | 32.42 ms | 31.6 | 1.16 ms | 55.0 | 27.04 ms | 37.9 | SQLite WAL |
| `cert_verify` | 0.87 ms | 73.6 | 15.11 ms | 67.8 | 0.94 ms | 68.1 | 15.21 ms | 67.3 | Rayon parallel |
| `db_read_certs` | 0.20 ms | 326.9 | 2.18 ms | 469.1 | 0.28 ms | 229.7 | 2.11 ms | 484.8 | SQLite read |
| `crl_build` | 0.07 ms | 14.1 | 0.62 ms | 1.6 | 0.10 ms | 9.8 | 0.62 ms | 1.6 | 1 CRL covering N serials |
| `crl_sign` | 0.75 ms | 1.3 | 1.16 ms | 0.9 | 0.81 ms | 1.2 | 1.15 ms | 0.9 | 1 CRL/batch |
| `db_insert_crl` | 0.41 ms | 2.5 | 2.86 ms | 0.3 | 0.09 ms | 11.7 | 1.09 ms | 0.9 | SQLite WAL |
| `crl_verify` | 0.11 ms | 8.8 | 0.18 ms | 5.5 | 0.13 ms | 7.9 | 0.30 ms | 3.3 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 42.8 | 0.03 ms | 29.0 | 0.02 ms | 52.8 | 0.04 ms | 26.4 | SQLite read |
| `ocsp_build` | 0.19 ms | 335.3 | 0.57 ms | 1,808 | 0.07 ms | 909.9 | 0.44 ms | 2,310 | Rayon; TBS DER only |
| `ocsp_sign` | 6.35 ms | 10.1 | 58.87 ms | 17.4 | 11.34 ms | 5.6 | 169.52 ms | 6.0 | Rayon parallel |
| `db_insert_ocsp` | 1.22 ms | 52.5 | 26.13 ms | 39.2 | 0.59 ms | 108.3 | 13.89 ms | 73.7 | SQLite WAL |
| `ocsp_verify` | 0.83 ms | 77.0 | 14.65 ms | 69.9 | 0.86 ms | 74.3 | 12.47 ms | 82.1 | Rayon parallel |
| `db_read_ocsp` | 0.14 ms | 467.2 | 1.29 ms | 792.2 | 0.11 ms | 593.5 | 1.16 ms | 880.3 | SQLite read |

## Results: ML-DSA-65

**Configuration:** ML-DSA-65 for both CA and subscriber keys.
ML-DSA-65 certificates are ~5,521 bytes DER; OCSP responses with ML-DSA-65 signatures
are roughly 3,700 bytes each, compared to ~400 bytes for ECDSA P-256.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 0.94 ms | 1.1 | 0.71 ms | 1.4 | 0.71 ms | 1.4 | 0.85 ms | 1.2 | 1 cert/batch |
| `cert_gen` | 12.31 ms | 5.2 | 106.14 ms | 9.6 | 20.64 ms | 3.1 | 313.03 ms | 3.3 | Rayon parallel |
| `db_insert_certs` | 1.62 ms | 39.4 | 42.12 ms | 24.3 | 1.60 ms | 40.0 | 32.75 ms | 31.3 | SQLite WAL |
| `cert_verify` | 1.86 ms | 34.3 | 23.15 ms | 44.2 | 2.85 ms | 22.4 | 24.11 ms | 42.5 | Rayon parallel |
| `db_read_certs` | 0.27 ms | 234.0 | 3.48 ms | 294.2 | 0.57 ms | 113.2 | 2.79 ms | 366.8 | SQLite read |
| `crl_build` | 0.08 ms | 12.3 | 0.53 ms | 1.9 | 0.06 ms | 17.9 | 0.72 ms | 1.4 | 1 CRL covering N serials |
| `crl_sign` | 1.20 ms | 0.8 | 1.42 ms | 0.7 | 1.45 ms | 0.7 | 1.39 ms | 0.7 | 1 CRL/batch |
| `db_insert_crl` | 0.39 ms | 2.6 | 2.28 ms | 0.4 | 0.09 ms | 11.7 | 1.54 ms | 0.7 | SQLite WAL |
| `crl_verify` | 0.19 ms | 5.2 | 0.24 ms | 4.2 | 0.29 ms | 3.4 | 0.59 ms | 1.7 | 1 CRL/batch |
| `db_read_crl` | 0.03 ms | 37.3 | 0.03 ms | 29.1 | 0.04 ms | 25.9 | 0.06 ms | 17.0 | SQLite read |
| `ocsp_build` | 0.22 ms | 289.7 | 0.55 ms | 1,847 | 0.25 ms | 261.1 | 0.95 ms | 1,074 | Rayon; TBS DER only |
| `ocsp_sign` | 9.96 ms | 6.4 | 86.79 ms | 11.8 | 20.30 ms | 3.2 | 299.44 ms | 3.4 | Rayon parallel |
| `db_insert_ocsp` | 1.65 ms | 38.9 | 25.38 ms | 40.4 | 0.61 ms | 104.2 | 15.71 ms | 65.2 | SQLite WAL |
| `ocsp_verify` | 1.76 ms | 36.4 | 19.14 ms | 53.5 | 1.55 ms | 41.3 | 28.89 ms | 35.4 | Rayon parallel |
| `db_read_ocsp` | 0.21 ms | 303.4 | 1.27 ms | 806.7 | 0.16 ms | 407.4 | 1.19 ms | 861.3 | SQLite read |

## Results: RSA-2048

**Configuration:** RSA-2048 for both CA and subscriber keys.
`cert_gen` includes RSA key pair generation for each subscriber certificate
(~200–400 ms/key pair single-threaded), which dominates the batch time.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 0.96 ms | 1.0 | 1.27 ms | 0.8 | 4.55 ms | 0.22 | 4.32 ms | 0.23 | 1 cert/batch |
| `cert_gen` | 449.02 ms | 0.1 | 9,097 ms | 0.1 | 588.26 ms | 0.11 | 8,427 ms | 0.12 | Rayon; incl. key gen |
| `db_insert_certs` | 0.38 ms | 169.3 | 23.02 ms | 44.5 | 0.59 ms | 109 | 15.73 ms | 65 | SQLite WAL |
| `cert_verify` | 0.29 ms | 221.7 | 3.37 ms | 303.6 | 1.53 ms | 42 | 11.75 ms | 87 | Rayon parallel |
| `db_read_certs` | 0.09 ms | 719.5 | 1.17 ms | 871.6 | 0.12 ms | 521 | 1.68 ms | 608 | SQLite read |
| `crl_build` | 0.04 ms | 25.7 | 1.24 ms | 0.8 | 0.07 ms | 13 | 1.11 ms | 0.90 | 1 CRL covering N serials |
| `crl_sign` | 0.52 ms | 1.9 | 2.02 ms | 0.5 | 2.65 ms | 0.38 | 4.02 ms | 0.25 | 1 CRL/batch |
| `db_insert_crl` | 0.08 ms | 12.2 | 1.17 ms | 0.9 | 0.08 ms | 12 | 1.18 ms | 0.85 | SQLite WAL |
| `crl_verify` | 0.02 ms | 46.3 | 0.14 ms | 7.2 | 0.08 ms | 12 | 0.12 ms | 8.5 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 59.9 | 0.06 ms | 16.0 | 0.03 ms | 35 | 0.05 ms | 20 | SQLite read |
| `ocsp_build` | 0.24 ms | 263.4 | 0.71 ms | 1,444 | 0.29 ms | 218 | 0.82 ms | 1,253 | Rayon; TBS DER only |
| `ocsp_sign` | 4.14 ms | 15.5 | 115.49 ms | 8.9 | 54.42 ms | 1.2 | 749.29 ms | 1.4 | Rayon parallel |
| `db_insert_ocsp` | 0.20 ms | 314.4 | 3.96 ms | 258.3 | 0.33 ms | 194 | 2.63 ms | 389 | SQLite WAL |
| `ocsp_verify` | 0.22 ms | 293.5 | 3.77 ms | 271.6 | 1.25 ms | 51 | 13.06 ms | 78 | Rayon parallel |
| `db_read_ocsp` | 0.12 ms | 523.4 | 1.14 ms | 895.0 | 0.17 ms | 371 | 1.73 ms | 593 | SQLite read |

## Results: RSA-3072

**Configuration:** RSA-3072 for both CA and subscriber keys.
RSA-3072 key generation takes roughly 4× longer per key than RSA-2048.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 2.42 ms | 0.4 | 2.71 ms | 0.4 | 9.28 ms | 0.11 | 10.02 ms | 0.100 | 1 cert/batch |
| `cert_gen` | 2,073.09 ms | 0.03 | 30,197 ms | 0.03 | 2,022 ms | 0.032 | 30,444 ms | 0.034 | Rayon; incl. key gen |
| `db_insert_certs` | 0.61 ms | 104.6 | 6.12 ms | 167.2 | 0.50 ms | 127 | 5.40 ms | 189 | SQLite WAL |
| `cert_verify` | 0.69 ms | 93.2 | 5.45 ms | 188.0 | 1.23 ms | 52 | 8.37 ms | 122 | Rayon parallel |
| `db_read_certs` | 0.21 ms | 301.2 | 1.03 ms | 994.0 | 0.18 ms | 360 | 0.94 ms | 1,088 | SQLite read |
| `crl_build` | 0.08 ms | 11.8 | 0.81 ms | 1.2 | 0.09 ms | 12 | 0.66 ms | 1.5 | 1 CRL covering N serials |
| `crl_sign` | 2.47 ms | 0.4 | 2.90 ms | 0.3 | 9.20 ms | 0.11 | 5.55 ms | 0.18 | 1 CRL/batch |
| `db_insert_crl` | 0.05 ms | 19.5 | 0.15 ms | 6.9 | 0.10 ms | 9.8 | 0.14 ms | 7.2 | SQLite WAL |
| `crl_verify` | 0.06 ms | 17.4 | 0.08 ms | 12.1 | 0.14 ms | 7.2 | 0.11 ms | 9.3 | 1 CRL/batch |
| `db_read_crl` | 0.04 ms | 22.2 | 0.03 ms | 34.9 | 0.04 ms | 26 | 0.03 ms | 32 | SQLite read |
| `ocsp_build` | 0.28 ms | 228.3 | 0.69 ms | 1,485 | 0.39 ms | 165 | 0.66 ms | 1,553 | Rayon; TBS DER only |
| `ocsp_sign` | 18.98 ms | 3.4 | 300.85 ms | 3.4 | 95.27 ms | 0.67 | 1,074 ms | 0.95 | Rayon parallel |
| `db_insert_ocsp` | 0.36 ms | 177.4 | 22.16 ms | 46.2 | 0.35 ms | 185 | 2.97 ms | 344 | SQLite WAL |
| `ocsp_verify` | 0.72 ms | 89.4 | 6.84 ms | 149.7 | 1.42 ms | 45 | 8.93 ms | 115 | Rayon parallel |
| `db_read_ocsp` | 0.10 ms | 657.0 | 1.27 ms | 806.9 | 0.13 ms | 496 | 0.83 ms | 1,237 | SQLite read |

## Results: RSA-4096

**Configuration:** RSA-4096 for both CA and subscriber keys.
RSA-4096 key generation dominates: ~1–2 s/key pair single-threaded.

| Operation | OpenSSL B64 | Tput K/s | OpenSSL B1024 | Tput K/s | NSS B64 | Tput K/s | NSS B1024 | Tput K/s | Notes |
|-----------|-------------|----------|---------------|----------|---------|----------|-----------|----------|-------|
| `ca_self_sign` | 4.24 ms | 0.2 | 4.47 ms | 0.2 | 13.71 ms | 0.073 | 15.20 ms | 0.066 | 1 cert/batch |
| `cert_gen` | 4,212 ms | 0.02 | 89,390 ms | 0.01 | 4,032 ms | 0.016 | 69,943 ms | 0.015 | Rayon; incl. key gen |
| `db_insert_certs` | 0.42 ms | 151.0 | 4.60 ms | 222.4 | 0.40 ms | 160 | 5.54 ms | 185 | SQLite WAL |
| `cert_verify` | 0.64 ms | 99.4 | 6.85 ms | 149.4 | 1.18 ms | 54 | 12.34 ms | 83 | Rayon parallel |
| `db_read_certs` | 0.09 ms | 692.9 | 0.84 ms | 1,225 | 0.11 ms | 601 | 1.05 ms | 978 | SQLite read |
| `crl_build` | 0.04 ms | 24.1 | 0.56 ms | 1.8 | 0.05 ms | 21 | 0.70 ms | 1.4 | 1 CRL covering N serials |
| `crl_sign` | 3.79 ms | 0.3 | 4.15 ms | 0.2 | 8.10 ms | 0.12 | 10.25 ms | 0.098 | 1 CRL/batch |
| `db_insert_crl` | 0.04 ms | 24.1 | 0.11 ms | 9.1 | 0.06 ms | 16 | 0.13 ms | 7.5 | SQLite WAL |
| `crl_verify` | 0.07 ms | 15.3 | 0.08 ms | 12.0 | 0.13 ms | 7.5 | 0.17 ms | 5.8 | 1 CRL/batch |
| `db_read_crl` | 0.02 ms | 55.0 | 0.03 ms | 31.1 | 0.02 ms | 46 | 0.03 ms | 35 | SQLite read |
| `ocsp_build` | 0.22 ms | 287.8 | 0.60 ms | 1,699 | 0.29 ms | 223 | 0.60 ms | 1,709 | Rayon; TBS DER only |
| `ocsp_sign` | 37.26 ms | 1.7 | 667.60 ms | 1.5 | 111.96 ms | 0.57 | 2,253 ms | 0.45 | Rayon parallel |
| `db_insert_ocsp` | 0.34 ms | 186.4 | 5.11 ms | 200.2 | 0.27 ms | 240 | 3.74 ms | 274 | SQLite WAL |
| `ocsp_verify` | 0.99 ms | 64.3 | 12.59 ms | 81.3 | 1.20 ms | 53 | 16.42 ms | 62 | Rayon parallel |
| `db_read_ocsp` | 0.10 ms | 667.4 | 1.27 ms | 806.4 | 0.11 ms | 565 | 1.08 ms | 947 | SQLite read |

## Backend Comparison: OpenSSL vs NSS (batch=1024)

Signing and verification operations only. Database operations are identical
across backends. Ratio > 1 means NSS is slower; ratio < 1 means NSS is faster.

| Algorithm | Operation | OpenSSL | NSS | Ratio |
|-----------|-----------|---------|-----|-------|
| ECDSA P-256 | `cert_gen` (sign) | 6.65 ms | 25.27 ms | **3.8× slower** |
| ECDSA P-256 | `cert_verify` | 8.47 ms | 45.57 ms | **5.4× slower** |
| ECDSA P-256 | `crl_sign` | 0.82 ms | 0.99 ms | 1.2× |
| ECDSA P-256 | `crl_verify` | 0.13 ms | 0.52 ms | **4.0× slower** |
| ECDSA P-256 | `ocsp_sign` | 4.11 ms | 21.77 ms | **5.3× slower** |
| ECDSA P-256 | `ocsp_verify` | 9.28 ms | 46.65 ms | **5.0× slower** |
| Ed25519 | `cert_gen` (sign) | 15.64 ms | 15.98 ms | ≈ equal |
| Ed25519 | `cert_verify` | 15.31 ms | 8.36 ms | **0.55× (NSS faster)** |
| Ed25519 | `crl_sign` | 1.33 ms | 0.85 ms | **0.64× (NSS faster)** |
| Ed25519 | `crl_verify` | 0.28 ms | 0.15 ms | **0.54× (NSS faster)** |
| Ed25519 | `ocsp_sign` | 7.02 ms | 11.01 ms | **1.6× slower** |
| Ed25519 | `ocsp_verify` | 15.09 ms | 8.12 ms | **0.54× (NSS faster)** |
| ML-DSA-44 | `cert_gen` (sign) | 71.43 ms | 198.11 ms | **2.8× slower** |
| ML-DSA-44 | `cert_verify` | 15.11 ms | 15.21 ms | ≈ equal |
| ML-DSA-44 | `ocsp_sign` | 58.87 ms | 169.52 ms | **2.9× slower** |
| ML-DSA-44 | `ocsp_verify` | 14.65 ms | 12.47 ms | **0.85× (NSS faster)** |
| ML-DSA-65 | `cert_gen` (sign) | 106.14 ms | 313.03 ms | **2.9× slower** |
| ML-DSA-65 | `cert_verify` | 23.15 ms | 24.11 ms | 1.0× |
| ML-DSA-65 | `ocsp_sign` | 86.79 ms | 299.44 ms | **3.5× slower** |
| ML-DSA-65 | `ocsp_verify` | 19.14 ms | 28.89 ms | **1.5× (NSS slower)** |
| RSA-2048 | `cert_gen` (incl. keygen) | 9,097 ms | 8,427 ms | ≈ equal |
| RSA-2048 | `cert_verify` | 3.37 ms | 11.75 ms | **3.5× slower** |
| RSA-2048 | `crl_sign` | 2.02 ms | 4.02 ms | **2.0× slower** |
| RSA-2048 | `crl_verify` | 0.14 ms | 0.12 ms | ≈ equal |
| RSA-2048 | `ocsp_sign` | 115.49 ms | 749.29 ms | **6.5× slower** |
| RSA-2048 | `ocsp_verify` | 3.77 ms | 13.06 ms | **3.5× slower** |
| RSA-3072 | `cert_gen` (incl. keygen) | 30,197 ms | 30,444 ms | ≈ equal |
| RSA-3072 | `cert_verify` | 5.45 ms | 8.37 ms | **1.5× slower** |
| RSA-3072 | `crl_sign` | 2.90 ms | 5.55 ms | **1.9× slower** |
| RSA-3072 | `crl_verify` | 0.08 ms | 0.11 ms | 1.4× |
| RSA-3072 | `ocsp_sign` | 300.85 ms | 1,074 ms | **3.6× slower** |
| RSA-3072 | `ocsp_verify` | 6.84 ms | 8.93 ms | 1.3× |
| RSA-4096 | `cert_gen` (incl. keygen) | 89,390 ms | 69,943 ms | **0.78× (NSS faster)** |
| RSA-4096 | `cert_verify` | 6.85 ms | 12.34 ms | **1.8× slower** |
| RSA-4096 | `crl_sign` | 4.15 ms | 10.25 ms | **2.5× slower** |
| RSA-4096 | `crl_verify` | 0.08 ms | 0.17 ms | 2.1× |
| RSA-4096 | `ocsp_sign` | 667.60 ms | 2,253 ms | **3.4× slower** |
| RSA-4096 | `ocsp_verify` | 12.59 ms | 16.42 ms | 1.3× |
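
The Ratio column is the NSS time divided by the OpenSSL time. A minimal sketch of that calculation follows; the function names and the label format are assumptions made for illustration and do not reproduce the "≈ equal" entries.

```rust
/// Ratio of NSS time to OpenSSL time; values above 1 mean NSS is slower.
pub fn nss_ratio(openssl_ms: f64, nss_ms: f64) -> f64 {
    nss_ms / openssl_ms
}

/// Formats a ratio in the style used by the table above.
pub fn describe(ratio: f64) -> String {
    if ratio < 1.0 {
        format!("{:.2}× (NSS faster)", ratio)
    } else {
        format!("{:.1}× slower", ratio)
    }
}
```

With the ECDSA P-256 `cert_gen` figures (6.65 ms and 25.27 ms) this yields a ratio of 3.8.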

## Algorithm Comparison: OpenSSL Backend (batch=1024)

`cert_gen` for RSA keys includes subscriber key pair generation and dominates;
all other algorithms generate keys at CA setup time only.

| Operation | ECDSA P-256 | Ed25519 | RSA-2048 | RSA-4096 | ML-DSA-44 | ML-DSA-65 |
|-----------|-------------|---------|----------|----------|-----------|-----------|
| `ca_self_sign` | 0.08 ms | 0.13 ms | 1.27 ms | 4.47 ms | 0.45 ms | 0.71 ms |
| `cert_gen` | 6.65 ms | 15.64 ms | 9,097 ms† | 89,390 ms† | 71.43 ms | 106.14 ms |
| `cert_verify` | 8.47 ms | 15.31 ms | 3.37 ms | 6.85 ms | 15.11 ms | 23.15 ms |
| `crl_sign` | 0.82 ms | 1.33 ms | 2.02 ms | 4.15 ms | 1.16 ms | 1.42 ms |
| `crl_verify` | 0.13 ms | 0.28 ms | 0.14 ms | 0.08 ms | 0.18 ms | 0.24 ms |
| `ocsp_sign` | 4.11 ms | 7.02 ms | 115.49 ms | 667.60 ms | 58.87 ms | 86.79 ms |
| `ocsp_verify` | 9.28 ms | 15.09 ms | 3.77 ms | 12.59 ms | 14.65 ms | 19.14 ms |
| `db_insert_certs` | 4.04 ms | 4.01 ms | 23.02 ms | 4.60 ms | 32.42 ms | 42.12 ms |
| `db_insert_ocsp` | 3.29 ms | 4.17 ms | 3.96 ms | 5.11 ms | 26.13 ms | 25.38 ms |
| `ocsp_build` | 1.20 ms | 0.93 ms | 0.71 ms | 0.60 ms | 0.57 ms | 0.55 ms |

† RSA `cert_gen` includes RSA key pair generation per subscriber certificate.

## Migration Impact: rust-openssl fork → native-ossl (OpenSSL backend, batch=1024)

Baseline collected at commit `c42c2f8` (last commit on the rust-openssl fork,
2026-04-17) using the same `--min-seconds 20` methodology. The fork used a
PQC-patched rust-openssl crate (`github.com/abbra/rust-openssl`, branch
`pqc-prs`) that bundled a custom OpenSSL build with ML-DSA (Dilithium) support
compiled into it. The current native-ossl crate links against the system
OpenSSL 3.x, which is built with the distribution's compiler flags. Current
figures also include the `alg_cache` and `BackendPublicKey` pkey cache
optimisations. RSA `cert_gen` (dominated by key generation, ±30% thermal
variance) is excluded.

| Algorithm | Operation | rust-openssl fork | native-ossl (+caches) | Delta |
|-----------|-----------|-------------------|-----------------------|-------|
| ECDSA P-256 | `cert_gen` | 6.77 ms | 6.65 ms | ≈0% |
| ECDSA P-256 | `cert_verify` | 9.96 ms | 8.47 ms | **−15%** |
| ECDSA P-256 | `ocsp_sign` | 3.58 ms | 4.11 ms | +15% |
| ECDSA P-256 | `ocsp_verify` | 9.77 ms | 9.28 ms | −5% |
| Ed25519 | `cert_gen` | 19.85 ms | 15.64 ms | **−21%** |
| Ed25519 | `cert_verify` | 17.94 ms | 15.31 ms | **−15%** |
| Ed25519 | `ocsp_sign` | 7.36 ms | 7.02 ms | −5% |
| Ed25519 | `ocsp_verify` | 17.67 ms | 15.09 ms | **−15%** |
| ML-DSA-44 | `cert_gen` | 80.05 ms | 71.43 ms | **−11%** |
| ML-DSA-44 | `cert_verify` | 12.60 ms | 15.11 ms | +20% |
| ML-DSA-44 | `ocsp_sign` | 62.90 ms | 58.87 ms | **−6%** |
| ML-DSA-44 | `ocsp_verify` | 12.37 ms | 14.65 ms | +18% |
| ML-DSA-65 | `cert_gen` | 186.05 ms | 106.14 ms | **−43%** |
| ML-DSA-65 | `cert_verify` | 31.03 ms | 23.15 ms | **−25%** |
| ML-DSA-65 | `ocsp_sign` | 150.16 ms | 86.79 ms | **−42%** |
| ML-DSA-65 | `ocsp_verify` | 30.33 ms | 19.14 ms | **−37%** |
| RSA-2048 | `cert_verify` | 5.93 ms | 3.37 ms | **−43%** |
| RSA-2048 | `ocsp_sign` | 115.19 ms | 115.49 ms | ≈0% |
| RSA-2048 | `ocsp_verify` | 6.17 ms | 3.77 ms | **−39%** |
| RSA-3072 | `cert_verify` | 6.64 ms | 5.45 ms | **−18%** |
| RSA-3072 | `ocsp_sign` | 262.44 ms | 300.85 ms | **+15% regression** |
| RSA-3072 | `ocsp_verify` | 7.23 ms | 6.84 ms | −5% |
| RSA-4096 | `cert_verify` | 8.75 ms | 6.85 ms | **−22%** |
| RSA-4096 | `ocsp_sign` | 570.83 ms | 667.60 ms | **+17% regression** |
| RSA-4096 | `ocsp_verify` | 9.59 ms | 12.59 ms | **+31% regression** |
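
The Delta column is the relative change from the fork baseline to the current figure. A short sketch, with an illustrative function name:

```rust
/// Percentage change from `baseline_ms` to `current_ms`.
/// Negative values mean the current build is faster.
pub fn delta_percent(baseline_ms: f64, current_ms: f64) -> f64 {
    (current_ms - baseline_ms) / baseline_ms * 100.0
}
```

For ML-DSA-65 `cert_gen`, `delta_percent(186.05, 106.14)` is about −43; for RSA-4096 `ocsp_verify`, `delta_percent(9.59, 12.59)` is about +31.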

**Key observations:**

- **Ed25519 and ECDSA P-256 verification** improve under native-ossl + pkey
  cache. Ed25519 `cert_gen` is 21% faster; `cert_verify` is 15% faster. The
  pkey cache eliminates the repeated `d2i_PUBKEY` round-trip in the Rayon
  parallel verification loop.

- **ML-DSA signing improves** after the `sign_into` fix: ML-DSA-44 `cert_gen` is
  11% faster than the fork baseline; ML-DSA-65 `cert_gen` is 43% faster.  The
  root cause of the former regression was that OpenSSL 3.5's `EVP_DigestSign` for
  ML-DSA, when called with a NULL output pointer (as in `sign_oneshot`'s size
  query), runs the full signing computation rather than returning the fixed output
  length.  Both the rust-openssl fork and native-ossl link the same system
  libcrypto; the difference was purely in the Rust binding layer.  The fix uses
  FIPS 204 fixed lengths (2 420 B / 3 309 B / 4 627 B) to pre-allocate the output
  buffer and calls `EVP_DigestSign` only once.

- **ML-DSA verification improves with `MessageVerifier`**: ML-DSA-65 `cert_verify`
  is −25% and `ocsp_verify` is −37% vs the fork baseline.  ML-DSA-44 verification
  shows higher batch-to-batch variance and mixed results (cert_verify +20%,
  ocsp_verify +18% vs fork) — the thermal sensitivity of the 1024-item Rayon
  workload makes these numbers less stable than the signing figures.

- **RSA private-key operations (sign) regress slightly** for RSA-3072 and
  RSA-4096 (`ocsp_sign` +15–17%). RSA-2048 `ocsp_sign` is unchanged (~115 ms).
  The overhead scales with key size rather than being a fixed per-call cost.

- **RSA verification improves substantially** (−18% to −43%) due to the pkey
  cache removing `d2i_PUBKEY` from the parallel verification hot path. RSA public
  verification (e=65537: 16 squarings plus one multiplication) is fast enough (~3–9 µs/cert) that the
  former re-parse was a dominant fraction of per-call cost.

- **RSA-4096 `ocsp_verify` regresses 31%** under native-ossl despite the pkey
  cache improvement that is visible in `cert_verify` (−22%). The divergence
  between the two verification paths for RSA-4096 is not yet explained.
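
The cost of RSA public verification mentioned above follows from binary exponentiation: with e=65537 (binary `1` followed by fifteen zeros and a final `1`) the exponentiation needs 16 squarings and one multiplication, 17 modular multiplications in total. The sketch below counts these operations and includes a small left-to-right `mod_pow` for reference. It works on `u64` values for illustration only and is not how a cryptographic library implements RSA.

```rust
/// Number of (squarings, multiplications) that left-to-right binary
/// exponentiation needs for exponent `e`, ignoring leading zero bits.
pub fn square_multiply_cost(e: u64) -> (u32, u32) {
    assert!(e > 0, "exponent must be positive");
    let bits = 64 - e.leading_zeros();
    let squarings = bits - 1;
    let multiplies = e.count_ones() - 1;
    (squarings, multiplies)
}

/// Reference left-to-right square-and-multiply: base^e mod m.
pub fn mod_pow(base: u64, e: u64, m: u64) -> u64 {
    assert!(m > 0, "modulus must be positive");
    let m = m as u128;
    let b = base as u128 % m;
    let mut acc: u128 = 1 % m;
    for i in (0..64).rev() {
        acc = acc * acc % m; // square once per bit position
        if (e >> i) & 1 == 1 {
            acc = acc * b % m; // multiply when the bit is set
        }
    }
    acc as u64
}
```

`square_multiply_cost(65537)` returns `(16, 1)`. The loop in `mod_pow` also squares through leading zero bits, which is harmless because the accumulator is still 1 at that point.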

## Analysis

### Backend: OpenSSL vs NSS

**NSS signing overhead** is significant for most algorithms. The NSS backend routes
every signing operation through the PKCS#11 interface via `SEC_SignData` (RSA, ECDSA,
ML-DSA) or `PK11_Sign` (Ed25519), which includes per-operation token lookup and
mechanism dispatch. For ECDSA P-256, this adds roughly 3.8–5.3× overhead over
OpenSSL's direct `EVP_DigestSign` path. For ML-DSA-44/65, the signing overhead at
batch=1024 is 2.8–3.5×, smaller in relative terms because ML-DSA signing itself is expensive.

**Ed25519 verification favours NSS**: `cert_verify` and `ocsp_verify` are about
1.8× faster under NSS at batch=1024. NSS verifies Ed25519 via `PK11_Verify`, which
dispatches directly to the softokn `CKM_EDDSA` mechanism. ML-DSA verification is now
close between the two backends: ML-DSA-44 is roughly level on `cert_verify` and
about 1.2× faster under NSS on `ocsp_verify`, while ML-DSA-65 is level on
`cert_verify` and about 1.5× faster under OpenSSL on `ocsp_verify`. The gap has
narrowed significantly compared to before the public-key parse cache was introduced
(see below).

**The pkey cache narrows the OpenSSL/NSS gap for ML-DSA verification.** Before the
`BackendPublicKey` cache, OpenSSL called `d2i_PUBKEY` on every verification, parsing
the 1 344-byte (ML-DSA-44) or 1 952-byte (ML-DSA-65) SPKI DER for each of the 1 024
parallel items. With the cache, the parsed `EVP_PKEY` handle is cloned via
`EVP_PKEY_up_ref` (one atomic refcount) on each call. ML-DSA-65 `ocsp_verify` at
batch=1 024 improved by ~41% (30 ms → 17.8 ms); `cert_verify` improved by ~21%
(29.7 ms → 23.4 ms). The same cache also improved RSA-3072 `cert_verify` by ~49%
(10.6 ms → 5.5 ms), because RSA public verification (e=65537: 16 squarings plus one multiplication) is fast
enough that the former re-parse represented a large fraction of total call time. NSS
presumably keeps its own parsed key handle internally, so these improvements bring the
two backends closer together.
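
The parse-once, share-many pattern behind the pkey cache can be sketched with `Arc`, whose clone is a single atomic reference-count increment, much like `EVP_PKEY_up_ref`. Everything here is a hypothetical stand-in: `ParsedKey`, `parse_spki`, `CachedPublicKey`, and `verify_batch` are not the crate's types, and no real parsing or verification takes place.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts how many times the mock SPKI parser runs.
pub static PARSE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a parsed key handle (an `EVP_PKEY` in the text).
pub struct ParsedKey {
    pub spki_len: usize,
}

/// Mock parser: records the call and keeps only the input length.
fn parse_spki(der: &[u8]) -> ParsedKey {
    PARSE_CALLS.fetch_add(1, Ordering::SeqCst);
    ParsedKey { spki_len: der.len() }
}

/// Keeps the SPKI DER bytes alongside the handle parsed from them.
#[derive(Clone)]
pub struct CachedPublicKey {
    der: Arc<[u8]>,
    parsed: Arc<ParsedKey>,
}

impl CachedPublicKey {
    /// Parses once, at construction time.
    pub fn from_der(der: &[u8]) -> Self {
        Self {
            der: Arc::from(der),
            parsed: Arc::new(parse_spki(der)),
        }
    }

    /// Hands out the cached handle; cloning an `Arc` is one atomic increment.
    pub fn handle(&self) -> Arc<ParsedKey> {
        Arc::clone(&self.parsed)
    }

    pub fn der(&self) -> &[u8] {
        &self.der
    }
}

/// Mock verification loop: every item reuses the cached handle.
pub fn verify_batch(key: &CachedPublicKey, items: usize) -> usize {
    (0..items)
        .map(|_| key.handle())
        .filter(|handle| handle.spki_len > 0)
        .count()
}
```

Running `verify_batch` over 1 024 items increments `PARSE_CALLS` zero times; the only parse happens in `from_der`.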

**ECDSA P-256 verification** is substantially slower under NSS (5.0–5.4× at
batch=1024). `VFY_VerifyDataWithAlgorithmID` routes through the PKCS#11
`CKF_VERIFY` path, adding per-verification overhead compared to OpenSSL's
direct EVP layer.

**The x509bench signing overhead** for NSS reflects per-certificate signer
initialization: each `cert_gen` task imports the private key via
`PK11_ImportDERPrivateKeyInfoAndReturnKey` before signing. Reusing a single
`NssSigner` across multiple certificates in the same batch would eliminate this
overhead. The signing operations themselves (ECDSA P-256 at ~4 µs/sign,
ML-DSA-65 at ~90 µs/sign) are comparable between backends; the extra latency
is PKCS#11 setup cost, not cryptographic computation.
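
One way to picture the signer reuse suggested above is to create one signer per
worker chunk rather than one per certificate. The sketch below uses only `std`
scoped threads; `Signer`, `sign_batch`, and the toy "signature" are hypothetical and
involve no real cryptography or NSS calls. In a Rayon-based bench, the
`ParallelIterator::map_init` adaptor might serve the same purpose.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical signer; `new` stands in for the costly private-key import.
struct Signer;

impl Signer {
    fn new(inits: &AtomicUsize) -> Self {
        inits.fetch_add(1, Ordering::Relaxed); // count each setup
        Signer
    }

    /// Placeholder "signature": a deterministic tag, not cryptography.
    fn sign(&self, tbs: &[u8]) -> u64 {
        tbs.iter()
            .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u64))
    }
}

/// Signs every item with one signer per chunk; returns (signatures, setups).
fn sign_batch(items: &[Vec<u8>], workers: usize) -> (Vec<u64>, usize) {
    let inits = AtomicUsize::new(0);
    // Chunk length is at least 1 because `chunks(0)` would panic.
    let chunk_len = items.len().div_ceil(workers.max(1)).max(1);
    let sigs = thread::scope(|scope| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| {
                let inits = &inits;
                scope.spawn(move || {
                    let signer = Signer::new(inits); // paid once per chunk
                    chunk.iter().map(|tbs| signer.sign(tbs)).collect::<Vec<u64>>()
                })
            })
            .collect();
        // Joining in spawn order keeps signatures aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<u64>>()
    });
    (sigs, inits.load(Ordering::Relaxed))
}
```

With 1024 items and 16 workers, the setup runs 16 times instead of 1024, while the
output order still matches the input order.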

### OpenSSL Backend: ECDSA P-256

`cert_gen` and `ocsp_sign` use Rayon parallel iteration across all logical
cores. Throughput rises from 124.4 K/s to 154.0 K/s between batch=64 and
batch=1024 as the thread pool becomes more fully saturated. `cert_verify`
shows a similar pattern (90.4 → 120.8 K/s).
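
The K/s figures and the millisecond batch times quoted elsewhere in this document are
two views of the same measurement. A small helper (hypothetical names, shown only to
make the conversion explicit) relates them: items per millisecond is numerically equal
to thousands of items per second.

```rust
/// Throughput in thousands of items per second for a batch finished in `wall_ms`.
fn throughput_k_per_s(batch: u32, wall_ms: f64) -> f64 {
    // items / ms == (1000 items) / s, so no extra scaling factor is needed.
    batch as f64 / wall_ms
}

/// Inverse conversion: batch wall time in ms implied by a throughput in K/s.
fn wall_ms(batch: u32, k_per_s: f64) -> f64 {
    batch as f64 / k_per_s
}
```

For example, the 6.65 ms ECDSA P-256 `cert_gen` time at batch=1024 corresponds to
about 154.0 K/s, and 120.8 K/s for `cert_verify` corresponds to about 8.47 ms.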

`crl_build` and `crl_sign` always cover exactly one CRL per batch, regardless
of batch size. The CRL TBS DER grows proportionally with the number of revoked
serial entries, so throughput falls from 13.4 K/s (batch=64) to 1.2 K/s
(batch=1024) — all growth is in DER encoding plus P-256 signing of the larger
TBS blob.

`ocsp_build` (TBS DER construction only) reaches 856 K/s at batch=1024 as
full Rayon parallelism is achieved. `ocsp_verify` at batch=1024 reaches
110.4 K/s.

SQLite inserts use a single `prepare()` before the transaction loop so the
SQL parse cost is paid once per batch rather than once per row.

### OpenSSL Backend: Ed25519

Ed25519 signing throughput at batch=1024 is 65.5 K/s, against 154.0 K/s for ECDSA
P-256; the two are similar once the additional subscriber key generation overhead in
Ed25519 is accounted for. Both are signed with a single one-shot call; Ed25519 has no
pre-hash step, whereas ECDSA hashes the message before signing.

### OpenSSL Backend: ML-DSA-44 and ML-DSA-65

**Signing is the dominant cost.** `cert_gen` at batch=1024 is 10.7× slower for
ML-DSA-44 (71.43 ms vs 6.65 ms for ECDSA P-256) and 16.0× slower for ML-DSA-65
(106.14 ms). ML-DSA signing involves large polynomial matrix operations that stress
the L2/L3 cache, limiting effective Rayon parallelism across 16 cores. The current
figures reflect all four optimisations (see introduction). The common case uses the
`sign_into` fast path (a single `EVP_DigestSign` call with a pre-allocated FIPS 204
signature buffer). The `MessageSigner::sign_oneshot` path (via
`EVP_PKEY_sign_message_init`) is available when a FIPS 204 §5.2 context string is
set, but it is ~13–21% slower due to internal update+final dispatch and is not
exercised by the bench.

**`ocsp_sign` is the most expensive Rayon operation after `cert_gen`**: 58.87 ms
(ML-DSA-44) and 86.79 ms (ML-DSA-65) at batch=1024, vs 4.11 ms for ECDSA P-256. Each
OCSP response requires one ML-DSA signing operation, and 1024 parallel signs put
heavy pressure on the cache.

**Verification** at batch=1024 uses `MessageVerifier` (`EVP_PKEY_verify_message_init`
+ `EVP_PKEY_verify_message`), which eliminates the MD dispatch layer for ML-DSA's
no-pre-hash algorithm. ML-DSA-65 `cert_verify` is 23.15 ms (−25% vs the
rust-openssl fork baseline of 31.03 ms) and `ocsp_verify` is 19.14 ms (−37% vs
30.33 ms). ML-DSA-44 verification numbers show higher run-to-run variance at these
batch sizes: `cert_verify` is 15.11 ms and `ocsp_verify` is 14.65 ms.

**`ocsp_build`** (pure DER encoding, no crypto) is similarly fast for all algorithms
(0.60–1.20 ms at batch=1024) because synta's encoder splices the ML-DSA signature
BIT STRING as a zero-copy `BitStringRef` slice.

**Database throughput is I/O-bound.** ML-DSA-44 certificates are ~4 KB and
ML-DSA-65 certificates are ~5.5 KB each, vs ~700 bytes for ECDSA P-256.
SQLite WAL write time scales roughly with byte volume.

### OpenSSL Backend: RSA-2048, RSA-3072, RSA-4096

**RSA `cert_gen` is dominated by key pair generation**, not by signing. Each
subscriber certificate requires a fresh RSA key pair: ~200–400 ms for RSA-2048,
~1–2 s for RSA-3072, and ~2–4 s for RSA-4096, single-threaded. Rayon parallelizes
across 16 cores, but the absolute batch times remain extreme (9.1 s, 30.2 s, and
89.4 s at batch=1024 for RSA-2048/3072/4096 respectively). These numbers are not
comparable to other algorithms for signing performance — they measure key generation
speed. RSA key generation also drives significant thermal load; observed times can
vary by ±30% across runs depending on sustained CPU frequency.

**RSA verification is fast** due to the small public exponent (e=65537). `cert_verify`
at batch=1024 is 3.37 ms for RSA-2048 and 6.85 ms for RSA-4096, both faster than
ECDSA P-256 (8.47 ms). The single modular exponentiation with e=65537 = 2^16 + 1
(16 squarings plus one multiplication) is much cheaper than the ECDSA scalar point
multiplication.
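
The operation count for e=65537 follows from its binary form: 2^16 + 1 has only two
set bits, so left-to-right square-and-multiply needs 17 modular multiplications in
total (16 squarings and one multiply). The sketch below counts them on toy 64-bit
operands; `pow_mod_counted` is a hypothetical helper, not how OpenSSL implements RSA
(real implementations use big integers and Montgomery arithmetic).

```rust
/// Left-to-right square-and-multiply on toy operands.
/// Returns (base^exp mod modulus, squarings, multiplications).
fn pow_mod_counted(base: u64, exp: u64, modulus: u64) -> (u64, u32, u32) {
    assert!(exp > 0 && modulus > 1);
    let m = modulus as u128;
    let b = base as u128 % m;
    let mut acc = b; // accounts for the leading 1 bit of the exponent
    let (mut squarings, mut multiplications) = (0u32, 0u32);
    let top = 63 - exp.leading_zeros(); // index of the highest set bit
    for bit in (0..top).rev() {
        acc = acc * acc % m; // one squaring per remaining exponent bit
        squarings += 1;
        if (exp >> bit) & 1 == 1 {
            acc = acc * b % m; // extra multiply only where the bit is set
            multiplications += 1;
        }
    }
    (acc as u64, squarings, multiplications)
}
```

Calling `pow_mod_counted(3, 65537, 1_000_000_007)` reports 16 squarings and 1
multiplication. A private exponent of comparable size to the modulus has roughly half
its bits set, which is why signing costs so much more than verification.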

**The pkey cache had the largest relative impact on RSA verification.** RSA-3072
`cert_verify` improved by ~49% (10.6 ms → 5.45 ms) and RSA-4096 by ~26%
(9.31 ms → 6.85 ms). Because RSA public verification (~3–7 µs/cert) is quick
relative to ML-DSA, the former `d2i_PUBKEY` overhead represented a large fraction of
total call time.

**RSA `ocsp_sign` is the most expensive non-keygen operation.** Each OCSP response
requires one RSA private-key operation (full modular exponentiation with the private
exponent d). At batch=1024, `ocsp_sign` takes 115 ms (RSA-2048), 301 ms (RSA-3072),
and 668 ms (RSA-4096). The cost grows roughly as O(key_bits^2.5): the ~2.6× increase
from RSA-2048 to RSA-3072 implies an exponent of ~2.4, and the ~5.8× increase from
RSA-2048 to RSA-4096 implies ~2.5.
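
The scaling exponent implied by the quoted `ocsp_sign` timings can be checked with a
two-point power-law fit. This is a rough sketch: `scaling_exponent` is a hypothetical
helper, and it assumes time ∝ key_bits^k with no fixed per-call overhead, which the
batch timings only approximate.

```rust
/// Exponent k such that time ∝ key_bits^k, estimated from two measurements.
fn scaling_exponent(bits_a: f64, ms_a: f64, bits_b: f64, ms_b: f64) -> f64 {
    // From t_b / t_a = (bits_b / bits_a)^k, take logarithms of both sides.
    (ms_b / ms_a).ln() / (bits_b / bits_a).ln()
}
```

Using the figures above, `scaling_exponent(2048.0, 115.0, 3072.0, 301.0)` is about
2.37 and `scaling_exponent(2048.0, 115.0, 4096.0, 668.0)` is about 2.54, so the
measured growth sits between quadratic and cubic in the key size.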

**NSS overhead is highly asymmetric for RSA.** For RSA-2048, `ocsp_sign` is 6.5×
slower under NSS (749 ms vs 115 ms). The private-key operation itself takes only
~80 µs at 2048 bits, so the PKCS#11 per-call setup cost — token lookup, mechanism
dispatch, `C_Sign` call — represents a large fraction of the total. At RSA-4096,
where the private-key operation takes ~800 µs, the same PKCS#11 overhead is
proportionally smaller, reducing the ratio to 3.4× (2,253 ms vs 668 ms).

**NSS `cert_gen` (including key generation) is unexpectedly comparable or faster**
for RSA-3072 and RSA-4096. RSA key generation uses OpenSSL directly (not routed
through `NssSigner`), so the backend choice does not affect key generation time.
The minor variance reflects Rayon scheduling randomness across long-running tasks
— not a real backend difference.

**Database performance scales with DER blob size.** RSA-2048 certificates are ~800
bytes (smaller than ML-DSA-44), so `db_insert_certs` is fast. RSA-4096 certificates
are ~1.7 KB. OCSP response DER is smaller for RSA (~300 bytes) than for
ML-DSA-65 (~3.7 KB), so RSA `db_insert_ocsp` performance is comparable to that of
ECDSA P-256.