# Performance

## Test Environment

- **Hardware:** Lenovo ThinkPad P1 Gen 5, 12th Gen Intel(R) Core(TM) i7-12800H, 64 GB RAM
- **Platform:** Linux 6.15.8-200.fc42.x86_64
- **Benchmark tool:** Criterion.rs 0.8 (Rust); `time.perf_counter` with warmup (Python)
- **Samples:** 100 per benchmark (20 for whole-store benchmarks); 3 s warmup, 5 s measurement
- **Build:** `--release` with full optimizations
- **Test vectors:** PyCA cryptography PKITS (traditional RSA/ECDSA), IETF LAMPS (ML-DSA, ML-KEM), Mozilla CA roots (NSS), Common CA Database (CCADB)
- **Run date:** 2026-03-08
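
The warmup-then-sample scheme listed above can be sketched with the standard library alone. This is only an illustration of the idea, not how Criterion.rs works internally: `measure` and `median` are hypothetical helpers, and Criterion additionally batches many iterations per sample and applies statistical analysis.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` repeatedly for `warmup`, then record `samples` individual timings.
/// Hypothetical helper for illustration; Criterion.rs does considerably more.
pub fn measure<T>(mut f: impl FnMut() -> T, warmup: Duration, samples: usize) -> Vec<Duration> {
    // Warmup phase: results are discarded, only caches and predictors benefit.
    let start = Instant::now();
    while start.elapsed() < warmup {
        black_box(f());
    }
    // Measurement phase: one timing per sample.
    (0..samples)
        .map(|_| {
            let t0 = Instant::now();
            black_box(f());
            t0.elapsed()
        })
        .collect()
}

/// Median of the recorded samples (upper median for even-length input).
pub fn median(mut xs: Vec<Duration>) -> Option<Duration> {
    xs.sort();
    xs.get(xs.len() / 2).copied()
}
```

A single call that takes a few hundred nanoseconds is close to timer resolution, which is one reason a real harness times batches of iterations rather than individual calls.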

Five benchmark suites cover different aspects of performance:

- **Library comparison** (`synta-bench/benches/comparison.rs`) — synta vs six other X.509
  parsing implementations (three pure-Rust, three C-backed) on identical input, measuring
  both parse-only and parse+all-fields operation profiles.
- **Element vs Typed API** (`synta-bench/benches/comparison_typed.rs`) — three parsing modes
  within synta itself: O(1) lazy `Element` capture, full recursive `Element` traversal, and
  typed `Certificate` decoding via derive macros.
- **Bindings overhead** (`synta-bench/benches/bindings.rs`) — the additional cost of each
  of synta's language binding layers (Rust typed, Rust element, C FFI) on top of the shared
  `synta-certificate` backend.
- **CA store benchmarks** (`synta-bench/benches/mozilla_ca_certs.rs`,
  `synta-bench/benches/ccadb_certs.rs`) — synta vs NSS, rust-openssl, and ossl on the
  real-world certificate stores shipped in production operating systems.
- **Python benchmark** (`python/bench_certificate.py`) — synta's PyO3 binding vs
  `cryptography.x509`. Run separately: `cd synta-python && maturin develop --release && cd ..
  && python python/bench_certificate.py`.

The `ossl` crate is part of the [Kryoptic](https://github.com/latchset/kryoptic) project
and provides partial OpenSSL bindings; where its API does not cover what certificate parsing
needs, the benchmark falls back to direct unsafe C FFI into the library. `rust-openssl` is
separate, using the `openssl` crate's safe Rust bindings.

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*

- [Test Environment](#test-environment)
- [Quick Reference](#quick-reference)
- [Library Comparison — Parse Only](#library-comparison-parse-only)
  - [Implementations](#implementations)
  - [Traditional X.509 Certificates](#traditional-x509-certificates)
  - [Post-Quantum (ML-DSA) Certificates](#post-quantum-ml-dsa-certificates)
  - [Certificate Size Scalability](#certificate-size-scalability)
- [Library Comparison — Parse + All Fields](#library-comparison-parse-all-fields)
  - [Traditional X.509 Certificates](#traditional-x509-certificates-1)
  - [Post-Quantum (ML-DSA) Certificates](#post-quantum-ml-dsa-certificates-1)
- [Element vs Typed API](#element-vs-typed-api)
- [Bindings Overhead](#bindings-overhead)
  - [Traditional X.509 — Parse Only](#traditional-x509-parse-only)
  - [Traditional X.509 — Parse + All Fields](#traditional-x509-parse-all-fields)
  - [Post-Quantum (ML-DSA) — Parse Only](#post-quantum-ml-dsa-parse-only)
  - [Post-Quantum (ML-DSA) — Parse + All Fields](#post-quantum-ml-dsa-parse-all-fields)
- [PKCS#7 and PKCS#12 Certificate Extraction](#pkcs7-and-pkcs12-certificate-extraction)
  - [Test Inputs](#test-inputs)
  - [Rust-Level Results (Criterion, release build)](#rust-level-results-criterion-release-build)
  - [Python vs cryptography (bench_pkcs.py, CPython 3.14+)](#python-vs-cryptography-bench_pkcspy-cpython-314)
  - [Why These Numbers Differ](#why-these-numbers-differ)
  - [Reproducing](#reproducing)
- [Real-World CA Store Benchmarks](#real-world-ca-store-benchmarks)
  - [Mozilla NSS Root Store (`mozilla_ca_certs`)](#mozilla-nss-root-store-mozilla_ca_certs)
  - [CCADB V4 All Certificate Information (`ccadb_certs`)](#ccadb-v4-all-certificate-information-ccadb_certs)
  - [ML-DSA Synthetic CA Hierarchy (`mldsa_certs`)](#ml-dsa-synthetic-ca-hierarchy-mldsa_certs)
  - [Throughput Results](#throughput-results)
  - [Single-Certificate Performance (Hot Cache)](#single-certificate-performance-hot-cache)
  - [Per-Field Access Latency](#per-field-access-latency)
  - [Trust Hierarchy Construction](#trust-hierarchy-construction)
  - [Why C Libraries Are Slower](#why-c-libraries-are-slower)
- [ASN.1 Primitive Performance](#asn1-primitive-performance)
  - [Tag/Length Parsing](#taglength-parsing)
  - [Integer Encode/Decode](#integer-encodedecode)
  - [Constrained INTEGER — Native Primitive Types](#constrained-integer-native-primitive-types)
  - [OctetString Encode/Decode](#octetstring-encodedecode)
  - [Sequence Encode/Decode](#sequence-encodedecode)
  - [Derive Macro Overhead](#derive-macro-overhead)
- [Memory Usage](#memory-usage)
- [Benchmark Methodology](#benchmark-methodology)
  - [Setup](#setup)
  - [Test Certificates](#test-certificates)
  - [Measurement Scope](#measurement-scope)
  - [Reproducing](#reproducing-1)
- [Recommendations](#recommendations)
  - [When to choose synta](#when-to-choose-synta)
  - [When to choose x509-parser](#when-to-choose-x509-parser)
  - [When to choose cryptography-x509](#when-to-choose-cryptography-x509)
- [See Also](#see-also)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

---

## Quick Reference

Average over 5 PyCA PKITS traditional certificates (914–968 bytes):

| Library           | Parse-only   | Parse+fields | vs synta (parse-only) | vs synta (parse+fields) |
| ----------------- | ------------ | ------------ | --------------------- | ----------------------- |
| **synta**         | **0.48 µs**  | **1.32 µs**  | —                     | —                       |
| cryptography-x509 | 1.45 µs      | 1.43 µs      | 3.0× slower           | 1.1× slower             |
| x509-parser       | 2.01 µs      | 1.99 µs      | 4.2× slower           | 1.5× slower             |
| x509-cert         | 3.16 µs      | 3.15 µs      | 6.6× slower           | 2.4× slower             |
| NSS               | 7.90 µs      | 7.99 µs      | 16× slower            | 6.1× slower             |
| rust-openssl      | 15.4 µs      | 15.1 µs      | 32× slower            | 11× slower              |
| ossl              | 16.1 µs      | 15.8 µs      | 33× slower            | 12× slower              |

Parse+fields accesses every named field: serial number, issuer/subject DNs, signature
algorithm OID, signature bytes, validity period, public key algorithm OID, public key bytes,
and version. The parse+fields speedup is the fair end-to-end comparison: synta's parse-only
advantage is large because most fields are stored as zero-copy slices whose decoding is
deferred until access, while most of the other libraries materialise their fields eagerly at
parse time.

CA store throughput (parse-only, all certs in each dataset):

| Dataset                   | synta              | NSS          | rust-openssl | ossl         |
| ------------------------- | ------------------ | ------------ | ------------ | ------------ |
| Mozilla 180 root CAs      | **88 µs** (2.0 M/sec) | 1.58 ms (18×) | 3.55 ms (40×) | 3.62 ms (41×) |
| CCADB 9,898 certs         | **5.10 ms** (1.9 M/sec) | 106 ms (21×) | 203 ms (40×) | 214 ms (42×) |
| ML-DSA synth 9,889 certs  | **5.78 ms** (1.71 M/sec) | 103 ms (18×) | 239 ms (41×) | 256 ms (44×) |
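
The throughput and "N×" figures in the table above follow from the batch times by simple division. A minimal sketch of that arithmetic, with hypothetical helper names:

```rust
/// Certificates parsed per second, given a batch size and the time for the whole batch.
pub fn certs_per_sec(count: u32, batch_seconds: f64) -> f64 {
    f64::from(count) / batch_seconds
}

/// How many times slower `other` is than `baseline` (both in the same unit).
pub fn slowdown(other: f64, baseline: f64) -> f64 {
    other / baseline
}
```

For the Mozilla row, `certs_per_sec(180, 88e-6)` gives roughly 2.05 million certificates per second, and `slowdown(1.58e-3, 88e-6)` gives roughly 18 for NSS.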

---

## Library Comparison — Parse Only

```bash
cargo bench -p synta-bench --bench comparison --features bench-compare
```

(`library_comparison` and `post_quantum_comparison` Criterion groups)

### Implementations

Each library takes a fundamentally different approach to the same problem:

| Implementation        | Parse strategy                                                                     |
| --------------------- | ---------------------------------------------------------------------------------- |
| **synta**             | Typed RFC 5280 parse; issuer/subject/extensions stored as `RawDer<'a>` (borrowed byte span — no DN traversal, no allocation at parse time) |
| **cryptography-x509** | PyCA Rust core; deferred-everything — raw DER byte offsets, decode only on first field access |
| **x509-parser**       | nom-based; fully eager typed parse — every DN, extension, and value decoded during parse |
| **x509-cert**         | RustCrypto; fully eager typed parse — same approach as x509-parser but using the `der` crate |
| **NSS**               | `CERT_NewTempCertificate`; formats issuer/subject Distinguished Names into C strings *at parse time*; arena allocation |
| **rust-openssl**      | OpenSSL `d2i_X509` via the `openssl` crate's safe Rust bindings                   |
| **ossl**              | OpenSSL `d2i_X509` via partial Rust FFI (Kryoptic project)                        |

The dominant cost in X.509 parsing is Distinguished Name traversal: a certificate's issuer
and subject each contain a SEQUENCE OF SET OF SEQUENCE with per-attribute OID lookup. synta
defers this entirely by storing the Name as a `RawDer<'a>` — a pointer+length into the
original input with no decoding. cryptography-x509 takes a similar deferred approach. The
nom-based and RustCrypto libraries decode Names eagerly. NSS goes further and formats them
into C strings, which is the dominant fraction of its 16× parse overhead.
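
The deferral described above can be illustrated with a small tag-length-value reader that captures a borrowed span instead of decoding it. This is a sketch under stated assumptions, not synta's implementation: `RawSpan` and `read_tlv` are hypothetical names, only single-byte tags are handled, and long-form lengths are not checked for DER minimality.

```rust
/// Hypothetical stand-in for a borrowed DER element; not synta's actual `RawDer<'a>`.
#[derive(Debug, PartialEq)]
pub struct RawSpan<'a> {
    pub tag: u8,
    /// Content octets, borrowed from the input buffer (no copy, no decoding).
    pub content: &'a [u8],
}

/// Read one tag-length-value header and return the element plus the unread remainder.
pub fn read_tlv(input: &[u8]) -> Option<(RawSpan<'_>, &[u8])> {
    let (&tag, rest) = input.split_first()?;
    let (&first, rest) = rest.split_first()?;
    let (len, rest) = if first < 0x80 {
        // Short form: the length fits in the low 7 bits.
        (usize::from(first), rest)
    } else {
        // Long form: the low 7 bits give the number of length octets that follow.
        let n = usize::from(first & 0x7f);
        if n == 0 || n > std::mem::size_of::<usize>() || rest.len() < n {
            return None;
        }
        let (len_bytes, rest) = rest.split_at(n);
        let len = len_bytes
            .iter()
            .fold(0usize, |acc, &b| (acc << 8) | usize::from(b));
        (len, rest)
    };
    if rest.len() < len {
        return None;
    }
    // The content is never inspected here; it is only sliced out of the input.
    let (content, remaining) = rest.split_at(len);
    Some((RawSpan { tag, content }, remaining))
}
```

Capturing a Name this way costs one header read regardless of how many attributes the Name contains; the per-attribute work is paid later, and only if the caller asks for a formatted DN.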

### Traditional X.509 Certificates

Five PKITS end-entity certificates (AllCertificatesNoPoliciesTest2,
AllCertificatesSamePoliciesTest×2, AllCertificatesanyPolicyTest11, AnyPolicyTest14EE),
914–968 bytes each:

| Certificate              | synta       | cryptography-x509 | x509-parser | x509-cert   | NSS        |
| ------------------------ | ----------- | ----------------- | ----------- | ----------- | ---------- |
| cert_00 (NoPolicies)     | 483.55 ns   | 1426.4 ns         | 1826.3 ns   | 3032.1 ns   | 7883.6 ns  |
| cert_01 (SamePolicies-1) | 485.60 ns   | 1466.8 ns         | 2094.8 ns   | 3238.4 ns   | 7941.7 ns  |
| cert_02 (SamePolicies-2) | 484.66 ns   | 1448.2 ns         | 2135.0 ns   | 3183.5 ns   | 8017.1 ns  |
| cert_03 (anyPolicy)      | 480.88 ns   | 1441.0 ns         | 1981.2 ns   | 3180.3 ns   | 7908.4 ns  |
| cert_04 (AnyPolicyEE)    | 476.91 ns   | 1441.1 ns         | 1990.7 ns   | 3147.8 ns   | 7755.7 ns  |
| **Average**              | **482 ns**  | **1445 ns**       | **2006 ns** | **3156 ns** | **7901 ns**|

rust-openssl and ossl averaged **15.4 µs** and **16.1 µs** respectively across the five
certs (not shown per-cert to keep the table readable).

synta is **3.0× faster** than cryptography-x509, **4.2× faster** than x509-parser,
**6.6× faster** than x509-cert, and **16× faster** than NSS.

The variation across certs (476–486 ns for synta) reflects differences in extension lists:
certs with more policy extensions contain more bytes in the `extensions` `RawDer` field's
tag+length header, which is the only part synta reads at parse time.

### Post-Quantum (ML-DSA) Certificates

| Certificate | Size    | synta       | cryptography-x509 | x509-parser | x509-cert   | NSS        |
| ----------- | ------- | ----------- | ----------------- | ----------- | ----------- | ---------- |
| ML-DSA-44   | 3,992 B | 462.52 ns   | 1237.0 ns         | 1720.8 ns   | 2610.9 ns   | 7118.6 ns  |
| ML-DSA-65   | 5,521 B | 462.37 ns   | 1238.3 ns         | 1698.1 ns   | 2658.0 ns   | 7168.1 ns  |
| ML-DSA-87   | 7,479 B | 463.95 ns   | 1236.6 ns         | 1707.0 ns   | 2678.0 ns   | 7263.5 ns  |
| **Average** |         | **463 ns**  | **1237 ns**       | **1709 ns** | **2649 ns** | **7183 ns**|

rust-openssl and ossl ranged over 13.9–17.0 µs and 14.5–17.8 µs respectively, growing with
certificate size.

**Parse time is size-independent for synta**: the large ML-DSA signature BIT STRING
(2,420–4,627 bytes) is stored as a `BitStringRef<'a>` — a borrowed pointer+length into the
input buffer — with no copying and no content decoding. synta reads the same tag+length
fields regardless of the payload size, so a 7 KB ML-DSA-87 certificate parses as fast as a
900 B traditional one.
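
One way to see why header work does not grow with the payload: under DER length encoding, all three ML-DSA signature sizes need the same number of header octets. The helper below is a hypothetical illustration that assumes a single-byte tag.

```rust
/// Number of header octets (tag + length) in front of a DER value with
/// `content_len` content octets, assuming a single-byte tag.
pub fn der_header_len(content_len: usize) -> usize {
    if content_len < 0x80 {
        2 // tag + short-form length octet
    } else {
        // tag + (0x80 | n) + n length octets, where n is the minimal byte count
        let bits = usize::BITS - content_len.leading_zeros();
        let n = ((bits + 7) / 8) as usize;
        2 + n
    }
}
```

A BIT STRING carries one leading unused-bits octet, so signatures of 2,420, 3,309, and 4,627 bytes have content lengths of 2,421, 3,310, and 4,628; `der_header_len` returns 4 for each of them.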

Among the comparison libraries, rust-openssl and ossl grow most visibly with certificate
size (see the ranges above). x509-parser and x509-cert also decode content eagerly, but in
this table they stay within about 3% across the three ML-DSA sizes. cryptography-x509 is
size-independent like synta because it also defers content decoding. NSS copies the full DER
buffer into its arena even though it doesn't decode the signature content, so it grows
slightly with certificate size.

### Certificate Size Scalability

Three representative sizes from the PKITS corpus (parse-only), now including all seven
libraries:

| Size           | synta     | cryptography-x509 | x509-parser | x509-cert  | NSS        | rust-openssl | ossl       |
| -------------- | --------- | ----------------- | ----------- | ---------- | ---------- | ------------ | ---------- |
| Small (914 B)  | 476.20 ns | 1480.3 ns         | 1985.8 ns   | 3148.0 ns  | 7917.4 ns  | 14977 ns     | 15715 ns   |
| Medium (933 B) | 483.16 ns | 1508.3 ns         | 2379.8 ns   | 3496.5 ns  | 8320.2 ns  | 15581 ns     | 15903 ns   |
| Large (968 B)  | 483.97 ns | 1629.0 ns         | 2812.1 ns   | 3866.9 ns  | 8171.3 ns  | 15503 ns     | 16241 ns   |
| **Growth**     | **+2%**   | **+10%**          | **+42%**    | **+23%**   | **+3%**    | **+4%**      | **+3%**    |

synta grows only 2% over a 6% certificate-size increase. x509-parser grows 42% over the
same range — nom parsers traverse content bytes proportionally, so more bytes in the
certificate body mean more decode work. x509-cert grows 23% for the same reason. cryptography-x509
is mostly deferred so it grows only 10%, dominated by the overhead of computing more DER
byte offsets. NSS, rust-openssl, and ossl are dominated by fixed C-library overhead
(locking, arena allocation, FFI transitions) that dwarfs any content growth at these sizes.

---

## Library Comparison — Parse + All Fields

```bash
cargo bench -p synta-bench --bench comparison --features bench-compare
```

(`library_comparison_fields` and `post_quantum_comparison_fields` Criterion groups)

This profile parses the certificate and then reads every named field: serial number, issuer
DN (`format_dn()`), subject DN (`format_dn()`), signature algorithm OID
(`identify_signature_algorithm()`), signature bytes, notBefore, notAfter, public key
algorithm OID (`identify_public_key_algorithm()`), public key bytes, and version.

The key insight is how parse-only cost and field-access cost combine for each library:

- **synta**: parse-only is fast because Names are `RawDer` (no decode); field access triggers
  `format_dn()` (~400 ns each) and `identify_*()` (5–6 ns, `&'static str` return).
  Total parse+fields is dominated by the two `format_dn()` calls.
- **cryptography-x509**: parse-only records raw byte offsets for every field, so parse and
  field-access costs nearly collapse — parse+fields (1.43 µs) is almost the same as
  parse-only (1.45 µs). This is the deferred-everything architecture.
- **x509-parser and x509-cert**: eagerly decode everything at parse time, so field access is
  a free struct read. Parse+fields ≈ parse-only for them too.
- **NSS**: formats DNs to C strings at parse time, so field access is also free. The cost
  is entirely in the parse step.

### Traditional X.509 Certificates

| Certificate              | synta        | cryptography-x509 | x509-parser  | x509-cert    | NSS          |
| ------------------------ | ------------ | ----------------- | ------------ | ------------ | ------------ |
| cert_00 (NoPolicies)     | 1333.7 ns    | 1386.7 ns         | 1815.9 ns    | 2990.6 ns    | 7940.3 ns    |
| cert_01 (SamePolicies-1) | 1348.8 ns    | 1441.0 ns         | 2033.4 ns    | 3174.3 ns    | 7963.8 ns    |
| cert_02 (SamePolicies-2) | 1338.6 ns    | 1440.1 ns         | 2120.1 ns    | 3205.6 ns    | 8206.8 ns    |
| cert_03 (anyPolicy)      | 1362.4 ns    | 1468.3 ns         | 2006.2 ns    | 3194.5 ns    | 7902.4 ns    |
| cert_04 (AnyPolicyEE)    | 1232.9 ns    | 1424.7 ns         | 1968.6 ns    | 3168.1 ns    | 7913.1 ns    |
| **Average**              | **1323 ns**  | **1432 ns**       | **1989 ns**  | **3147 ns**  | **7985 ns**  |

rust-openssl and ossl averaged **15.1 µs** and **15.8 µs** respectively.

The gap between synta (1.32 µs) and cryptography-x509 (1.43 µs) is tighter here than in
parse-only (3.0×) because synta's field access includes two `format_dn()` calls (~800 ns
combined), whereas cryptography-x509 pays almost nothing extra at access time (its offsets
were computed at parse time). synta leads by ~8% overall.
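
The decomposition above can be checked against the table averages. The helpers below are hypothetical and only restate the arithmetic; the ~400 ns per `format_dn()` call is the approximate figure quoted earlier, so the remainder is indicative at best.

```rust
/// Split an average parse+fields time (ns) into its parts.
/// Returns `(field_access_total, left_over_after_two_format_dn_calls)`.
pub fn field_access_breakdown(parse_only: f64, parse_fields: f64, format_dn: f64) -> (f64, f64) {
    let field_access = parse_fields - parse_only;
    (field_access, field_access - 2.0 * format_dn)
}

/// Relative lead of `fast` over `slow`, in percent of `fast`.
pub fn lead_percent(fast: f64, slow: f64) -> f64 {
    (slow / fast - 1.0) * 100.0
}
```

With synta's averages (482 ns parse-only, 1323 ns parse+fields), field access accounts for about 841 ns, of which the two `format_dn()` calls take about 800 ns, leaving roughly 40 ns for all other accessors. `lead_percent(1323.0, 1432.0)` gives about 8.2.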

### Post-Quantum (ML-DSA) Certificates

| Certificate | synta        | cryptography-x509 | x509-parser  | x509-cert    | NSS          |
| ----------- | ------------ | ----------------- | ------------ | ------------ | ------------ |
| ML-DSA-44   | 1030.9 ns    | 1256.4 ns         | 1732.2 ns    | 2666.0 ns    | 7286.9 ns    |
| ML-DSA-65   | 1124.9 ns    | 1237.5 ns         | 1690.5 ns    | 2664.2 ns    | 7222.1 ns    |
| ML-DSA-87   | 1102.6 ns    | 1226.5 ns         | 1727.2 ns    | 2696.6 ns    | 7284.6 ns    |
| **Average** | **1086 ns**  | **1240 ns**       | **1717 ns**  | **2675 ns**  | **7265 ns**  |

synta's ML-DSA parse+fields (1.09 µs) is faster than its traditional parse+fields (1.32 µs)
because ML-DSA test certificates have shorter Distinguished Names (one attribute each in
issuer and subject vs multiple attributes in PKITS certs). The signature BIT STRING
(2,420–4,627 bytes for ML-DSA) is accessed as a zero-copy slice with no size-dependent
cost. synta leads cryptography-x509 by about 14% here (1.09 µs vs 1.24 µs), down from its
2.7× lead in parse-only (463 ns vs 1237 ns).

---

## Element vs Typed API

```bash
cargo bench -p synta-bench --bench comparison_typed
```

(`parsing_comparison` and `roundtrip` Criterion groups)

Three parsing modes are measured for an apples-to-apples comparison within synta:

| API                          | Mode           | Operation                                   | Avg time     |
| ---------------------------- | -------------- | ------------------------------------------- | ------------ |
| `Element` (`element_lazy`)   | Lazy O(1)      | Outer SEQUENCE tag+length only              | ~20 ns       |
| `Element` (`element_eager`)  | Full traversal | Recursive decode of every nested element    | ~1.59 µs     |
| `Certificate` (`typed`)      | Full RFC 5280  | Typed decode + named field construction     | ~481 ns      |
| `Element` (roundtrip)        | Lazy capture   | Lazy decode + re-emit raw captured bytes    | ~104 ns      |
| `Certificate` (roundtrip)    | Full cycle     | Full decode + re-encode all fields          | ~2.16 µs     |

**`element_lazy`** (~20 ns) is O(1): the decoder reads the outer SEQUENCE tag and length,
captures a borrowed slice of the content bytes, and returns. No child elements are decoded.
This is the cost of "entering" the certificate without inspecting it — useful as a baseline
for how fast the parser can recognise a DER boundary.

**`element_eager`** (~1.59 µs) is the fair comparison with typed decoding: a `traverse()`
function recursively matches every `Sequence`, `Set`, and `Tagged` variant and iterates
their children until all leaf values have been visited — the same set of DER elements that
typed decoding processes. At ~1.59 µs it is **3.3× slower** than typed decoding. The
overhead comes from runtime branching on the element variant (`match el { ... }` at every
node) and the per-element `Result`-wrapping iterator protocol, repeated across all nesting
levels. Typed decoding
replaces this with monomorphised, inlined call sites generated at compile time from the
`#[derive(Asn1Sequence)]` macro, so no dispatch decisions remain at runtime.

**`typed`** (~481 ns) performs the same DER traversal as `element_eager`, but through
specialised code paths generated at compile time, which also construct named struct fields
and validate encoding constraints as they decode.
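
The contrast between generic traversal and fixed-shape decoding can be sketched on a toy scale. Everything below is hypothetical (`Node`, `count_leaves`, `Pair`, `decode_pair`) and reproduces neither synta's `Element` type nor the code its derive macros generate; it only shows where the per-node `match` sits in one style and why it is absent in the other.

```rust
/// Decode one DER header; returns (tag, content, remainder). Single-byte tags only.
fn split_tlv(input: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let (&tag, rest) = input.split_first()?;
    let (&first, mut rest) = rest.split_first()?;
    let mut len = usize::from(first);
    if first >= 0x80 {
        let n = usize::from(first & 0x7f);
        if n == 0 || n > std::mem::size_of::<usize>() || rest.len() < n {
            return None;
        }
        len = rest[..n].iter().fold(0usize, |acc, &b| (acc << 8) | usize::from(b));
        rest = &rest[n..];
    }
    if rest.len() < len {
        return None;
    }
    Some((tag, &rest[..len], &rest[len..]))
}

/// Hypothetical dynamically typed element.
pub enum Node<'a> {
    Constructed(&'a [u8]), // SEQUENCE, SET, or constructed tagged value
    Primitive(&'a [u8]),   // leaf value
}

fn classify(tag: u8, content: &[u8]) -> Node<'_> {
    // Bit 6 (0x20) of the identifier octet marks a constructed encoding.
    if tag & 0x20 != 0 { Node::Constructed(content) } else { Node::Primitive(content) }
}

/// Generic style: visit every element; the `match` runs once per node at runtime.
pub fn count_leaves(mut input: &[u8]) -> Option<usize> {
    let mut leaves = 0;
    while !input.is_empty() {
        let (tag, content, rest) = split_tlv(input)?;
        leaves += match classify(tag, content) {
            Node::Constructed(children) => count_leaves(children)?,
            Node::Primitive(_) => 1,
        };
        input = rest;
    }
    Some(leaves)
}

/// Typed style for one fixed shape: SEQUENCE { INTEGER, OCTET STRING }.
pub struct Pair<'a> {
    pub number: &'a [u8],
    pub payload: &'a [u8],
}

/// The field order is known when the code is written, so no variant match is needed.
pub fn decode_pair(input: &[u8]) -> Option<Pair<'_>> {
    let (tag, body, _) = split_tlv(input)?;
    if tag != 0x30 {
        return None;
    }
    let (t1, number, body) = split_tlv(body)?;
    let (t2, payload, body) = split_tlv(body)?;
    (t1 == 0x02 && t2 == 0x04 && body.is_empty()).then_some(Pair { number, payload })
}
```

Both functions read the same headers; the typed version simply knows in advance which tag to expect at each position, which is what lets a compiler inline and specialise the calls.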

**Roundtrip comparison** reveals another dimension:
- `element` roundtrip (~104 ns): lazy decode captures the certificate's raw bytes in O(1);
  the encoder re-emits them unchanged. The entire certificate is treated as an opaque blob.
- `typed` roundtrip (~2.16 µs): full decode constructs every named field, then re-encodes
  each field individually. The 20× difference vs element roundtrip reflects the cost of
  re-serialising a structured type rather than copying raw bytes.

> **Implication for the `bindings` benchmark**: `rust_element` performs the equivalent of
> `element_eager` — a full recursive traversal.  Its parse-only cost (~1.67 µs for
> traditional certs) is higher than `typed` for the same reason — generic traversal overhead
> — not because of the binding layer itself.

---

## Bindings Overhead

```bash
cargo bench -p synta-bench --bench bindings --features bench-bindings
```

All three binding layers parse through the same `synta-certificate` backend; the numbers
isolate the cost introduced by each layer's API contract.

| Layer          | API                                          | Parse depth                         |
| -------------- | -------------------------------------------- | ----------------------------------- |
| `rust_typed`   | `synta_certificate::Certificate`             | Full RFC 5280                       |
| `rust_element` | `synta::Element` + full recursive traversal  | Equivalent to `element_eager`       |
| `c_ffi`        | `synta_certificate_parse_der` (`synta-ffi`)  | Full RFC 5280, copies fields to C-owned buffers |

`c_ffi` does everything `rust_typed` does, then additionally copies every field value into
a heap-allocated `Vec<u8>` to produce an opaque `SyntaCertificate` struct that can cross
the C ABI boundary safely. This means up to 14 allocations per certificate, each touching
heap memory. The owned-buffer copy cost grows with the size of large fields (signature,
public key), which is why `c_ffi` is more expensive on ML-DSA certs than on traditional ones.
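
The owned-copy step can be pictured as converting a borrowed view into a struct of `Vec<u8>` fields. The types below are hypothetical and show only two fields; they are not the actual `SyntaCertificate` layout in `synta-ffi`.

```rust
/// Borrowed view: fields point into the caller's DER buffer.
pub struct BorrowedCert<'a> {
    pub signature: &'a [u8],
    pub public_key: &'a [u8],
}

/// Owned form that can outlive the input buffer: each field is its own heap
/// allocation, so construction cost scales with the field sizes.
pub struct OwnedCert {
    pub signature: Vec<u8>,
    pub public_key: Vec<u8>,
}

impl BorrowedCert<'_> {
    /// Copy every field into an owned buffer (one allocation per non-empty field).
    pub fn to_owned_cert(&self) -> OwnedCert {
        OwnedCert {
            signature: self.signature.to_vec(),
            public_key: self.public_key.to_vec(),
        }
    }

    /// Total bytes that `to_owned_cert` has to copy.
    pub fn bytes_to_copy(&self) -> usize {
        self.signature.len() + self.public_key.len()
    }
}
```

The borrowed view costs the same for any field size, while `bytes_to_copy` grows with the signature, which matches the direction of the ML-DSA results for `c_ffi`.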

### Traditional X.509 — Parse Only

| Binding        | cert_00    | cert_01    | cert_02    | cert_03    | cert_04    | Avg          |
| -------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 489.84 ns  | 473.68 ns  | 495.69 ns  | 499.09 ns  | 488.81 ns  | **489 ns**   |
| `rust_element` | 1552.5 ns  | 1693.8 ns  | 1702.7 ns  | 1687.7 ns  | 1693.8 ns  | **1666 ns**  |
| `c_ffi`        | 1746.7 ns  | 1813.0 ns  | 1788.2 ns  | 1816.6 ns  | 1834.5 ns  | **1800 ns**  |

`rust_element` is ~8% faster than `c_ffi` at parse time because `rust_element` holds
borrowed slices from the input buffer with no allocation, whereas `c_ffi` copies each field
into an owned buffer. The gap is relatively small here because the traditional cert's
signature and public key are only ~71 and ~294 bytes respectively.

### Traditional X.509 — Parse + All Fields

| Binding        | cert_00    | cert_01    | cert_02    | cert_03    | cert_04    | Avg          |
| -------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 1329.5 ns  | 1347.2 ns  | 1337.2 ns  | 1321.9 ns  | 1242.1 ns  | **1316 ns**  |
| `rust_element` | 1584.6 ns  | 1663.9 ns  | 1682.0 ns  | 1707.0 ns  | 1628.1 ns  | **1653 ns**  |
| `c_ffi`        | 2274.0 ns  | 2261.7 ns  | 2357.5 ns  | 2271.1 ns  | 2337.5 ns  | **2300 ns**  |

`rust_typed` is fastest: `identify_*()` returns `&'static str` for OID names with no
allocation; `issuer_raw` and `subject_raw` are zero-copy slices requiring only a pointer
read; `format_dn()` allocates but is the same cost across all bindings. `rust_element`'s
parse+fields cost converges with its parse-only cost because the field-access traversal is
already part of the recursive decode. `c_ffi` adds owned buffer copies on top of
`rust_typed`'s parse+fields cost, reaching 2.3 µs.

### Post-Quantum (ML-DSA) — Parse Only

| Binding        | ML-DSA-44  | ML-DSA-65  | ML-DSA-87  | Avg          |
| -------------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 472.77 ns  | 491.67 ns  | 475.44 ns  | **480 ns**   |
| `rust_element` | 1277.6 ns  | 1274.0 ns  | 1330.0 ns  | **1294 ns**  |
| `c_ffi`        | 1915.4 ns  | 1962.8 ns  | 2014.3 ns  | **1964 ns**  |

`rust_element` is faster on ML-DSA (1.29 µs) than on traditional certs (1.67 µs) because
ML-DSA certs have shorter Distinguished Names — less traversal work despite larger overall
cert size. `c_ffi` is slower on ML-DSA (1.96 µs vs 1.80 µs) because copying the
signature field (2,420–4,627 bytes) touches significantly more cache lines than copying a
71-byte RSA signature.

### Post-Quantum (ML-DSA) — Parse + All Fields

| Binding        | ML-DSA-44  | ML-DSA-65  | ML-DSA-87  | Avg          |
| -------------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 1030.2 ns  | 1064.7 ns  | 1126.4 ns  | **1074 ns**  |
| `rust_element` | 1293.0 ns  | 1301.8 ns  | 1279.2 ns  | **1291 ns**  |
| `c_ffi`        | 2468.4 ns  | 2607.7 ns  | 2642.1 ns  | **2573 ns**  |

`rust_typed` ML-DSA parse+fields (1.07 µs) is faster than traditional (1.32 µs) because
the shorter ML-DSA Distinguished Names reduce `format_dn()` cost. `rust_element` trails
`rust_typed` by only ~20% on ML-DSA (vs ~25% on traditional), for the same reason.
`c_ffi`'s ML-DSA cost grows 12% over its traditional cost, driven by the larger signature
buffer copy.

---

## PKCS#7 and PKCS#12 Certificate Extraction

```bash
cargo bench -p synta-bench --bench pkcs_formats
```

These benchmarks measure the cost of extracting X.509 certificates from PKCS#7 SignedData
blobs and PKCS#12 PFX archives — the two container formats used for CA bundles, trust store
imports, and inter-system certificate transfer.

### Test Inputs

| Name | Format | Size | Certs |
|------|--------|------|-------|
| `amazon_roots` | PKCS#7 DER | 1,848 B | 2 |
| `pem_isrg` | PKCS#7 PEM | 1,992 B | 1 |
| `unencrypted_3certs` | PKCS#12 DER | 3,539 B | 3 |
| `unencrypted_1cert_with_key` | PKCS#12 DER | 756 B | 1 cert + private key |

### Rust-Level Results (Criterion, release build)

| Benchmark | Time |
|-----------|------|
| `pkcs7/synta/amazon_roots` | **845 ns** |
| `pkcs7/synta/pem_isrg` | **3.62 µs** |
| `pkcs12/synta/unencrypted_3certs` | **2.39 µs** |
| `pkcs12/synta/unencrypted_1cert_with_key` | **1.41 µs** |

### Python vs cryptography (bench_pkcs.py, CPython 3.14+)

| Operation | `synta` | `cryptography` | Speedup |
|-----------|---------|----------------|---------|
| PKCS#7 DER (amazon_roots) | **1.55 µs** | 48.3 µs | ~31× |
| PKCS#7 PEM (pem_isrg) | **4.47 µs** | 37.4 µs | ~8× |
| PKCS#12 unencrypted (3 certs) | **2.11 µs** | 159.7 µs | ~76× |
| PKCS#12 unencrypted (1 cert + key) | **1.06 µs** | — | — |

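For the PKCS#7 inputs, the PyO3 boundary adds ~0.7–0.85 µs over the Rust-level times (DER:
845 ns vs 1.55 µs; PEM: 3.62 µs vs 4.47 µs); in the PEM case the base-64 decode dominates
the parse cost for both layers. The PKCS#12 rows do not follow this pattern: the Python
timings (2.11 µs and 1.06 µs) are lower than the Rust-level ones (2.39 µs and 1.41 µs), so
the boundary cost cannot be read off by subtraction there.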

### Why These Numbers Differ

**PKCS#7 DER:** synta walks the SignedData SEQUENCE with a single-pass forward scan, collecting
raw DER certificate byte spans with no intermediate allocation per certificate. The ~845 ns
Rust / ~1.55 µs Python cost grows sub-linearly with the number of embedded certificates.
`cryptography` constructs a full `PKCS7` object plus a Python list of `x509.Certificate`
objects, allocating Python heap objects for each embedded cert.

**PKCS#7 PEM:** both synta and `cryptography` must base-64 decode the PEM armor before the
DER parse. The PEM decode alone accounts for ~3 µs, which is why the PEM ratio (8×) is lower
than the DER ratio (31×). The DER parse cost after decoding is the same as the DER case.

**PKCS#12:** synta uses a pure-Rust PKCS#12 parser that identifies certificate bags in a single
forward pass through the `PFX → AuthenticatedSafe → SafeContents` nesting. No MAC verification
or key decryption is performed when only certificate extraction is requested. `cryptography`
calls OpenSSL `PKCS12_parse()`, which verifies the integrity MAC, decrypts the full archive
(even when the password is absent / empty), and constructs key objects — all mandatory steps in
the OpenSSL PKCS#12 API regardless of what the caller requests.

### Reproducing

```bash
# Rust (Criterion)
cargo bench -p synta-bench --bench pkcs_formats

# Python vs cryptography
python python/bench_pkcs.py
```

---

## Real-World CA Store Benchmarks

```bash
BENCH_CA_FEATURES=bench-nss,bench-ossl,bench-openssl \
  ./contrib/ci/local-ci.sh bench-ca-roots
```

These benchmarks test synta against the CA certificate databases that ship in production
operating systems. Unlike the PKITS comparison above, which uses five small identical-format
certs, the CA stores contain hundreds to thousands of certificates from many different CAs,
covering a wide range of DN complexity, extension sets, key types, and DER sizes. They
measure sustained throughput under realistic diversity.

### Mozilla NSS Root Store (`mozilla_ca_certs`)

180 root CA certificates from Mozilla's `certdata.txt` — the same trust anchor set shipped
by Fedora's `ca-certificates` package and embedded in the Mozilla NSS library. All 180 certs
are self-signed root CAs with diverse key types (RSA 2048/4096, ECDSA P-256/P-384) and DN
structures. The median cert by DER size is "Entrust.net Premium 2048 Secure Server CA"
(1,070 bytes); the benchmark uses this cert for single-certificate and field-access
sub-benchmarks to get stable results that are not sensitive to certificate-size outliers.

### CCADB V4 All Certificate Information (`ccadb_certs`)

9,898 certificates from the Common CA Database (CCADB), covering the full multi-level
hierarchy used by Mozilla, Chrome, Apple, and Microsoft:

| Depth | Count | Description                            |
| ----: | ----: | -------------------------------------- |
|     0 |   919 | Root CAs (self-signed)                 |
|     1 | 6,627 | Intermediates issued directly by roots |
|     2 | 2,212 | Two levels deep                        |
|     3 |   137 | Three levels deep                      |
|     4 |     3 | Four levels deep                       |

Intermediate CA certificates tend to have more complex DNs and more extensions than the root
CAs in the Mozilla store. The CCADB median cert is "Bayerische SSL-CA-2014-01" (10,432 bytes).

### ML-DSA Synthetic CA Hierarchy (`mldsa_certs`)

9,889 certificates generated by `tests/vectors/generate_mldsa_certs.py`, mirroring the
CCADB trust hierarchy with post-quantum signatures. Each CCADB certificate's subject DN
and full extension set are preserved; only the algorithm, key, and signature are replaced
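with ML-DSA-65 or ML-DSA-87 (alternating by certificate index across the full run). The
hierarchy depth structure mirrors CCADB; the counts below are identical to the CCADB table
and sum to 9,898, so they do not subtract the nine skipped certificates described after the
table: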

| Depth | Count | Description                            |
| ----: | ----: | -------------------------------------- |
|     0 |   919 | Root CAs (self-signed)                 |
|     1 | 6,627 | Intermediates issued directly by roots |
|     2 | 2,212 | Two levels deep                        |
|     3 |   137 | Three levels deep                      |
|     4 |     3 | Four levels deep                       |

Nine CCADB certificates were skipped: OpenSSL's `x509 -x509toreq -copy_extensions copy`
step failed to convert them to CSR form, typically because those certs use non-standard DER
encodings or critical extensions that the `x509toreq` pipeline cannot copy into a
PKCS#10 request. (The failures are in OpenSSL's cert→CSR conversion; synta parses all
9,898 original CCADB certs without error.) This leaves 9,889 of the original 9,898 certs
in the synthetic database.

The median cert by DER size is "TrustCor Basic Secure Site (CA1)" (6,705 bytes). ML-DSA
certs range from 5,530 B to 16,866 B; the distribution is shifted left relative to the
CCADB RSA/ECDSA median (10,432 B) because the smallest CCADB certs (compact root CAs with
few extensions) become the new median position after ML-DSA key replacement enlarges all
certs uniformly.

Run:

```bash
SYNTA_CERT_DB=mldsa cargo bench -p synta-bench --bench ccadb_certs
```

### Throughput Results

| Benchmark                     | Library      | Dataset                     | Time          | Throughput      |
| ----------------------------- | ------------ | --------------------------- | ------------- | --------------- |
| `synta_parse_all`             | synta        | Mozilla (180 certs)         | **87.8 µs**   | **2.0 M/sec**   |
| `nss_parse_all`               | NSS          | Mozilla (180 certs)         | 1.577 ms      | 114 K/sec       |
| `openssl_parse_all`           | rust-openssl | Mozilla (180 certs)         | 3.552 ms      | 50.7 K/sec      |
| `ossl_parse_all`              | ossl         | Mozilla (180 certs)         | 3.617 ms      | 49.8 K/sec      |
| `synta_parse_and_access`      | synta        | Mozilla (180 certs)         | **261 µs**    | **690 K/sec**   |
| `synta_build_trust_chain`     | synta        | Mozilla (180 certs)         | **11.6 µs**   | —               |
| `synta_parse_all`             | synta        | CCADB (9,898 certs)         | **5.10 ms**   | **1.94 M/sec**  |
| `nss_parse_all`               | NSS          | CCADB (9,898 certs)         | 106 ms        | 93 K/sec        |
| `openssl_parse_all`           | rust-openssl | CCADB (9,898 certs)         | 203 ms        | 48.8 K/sec      |
| `ossl_parse_all`              | ossl         | CCADB (9,898 certs)         | 214 ms        | 46.3 K/sec      |
| `synta_parse_and_access`      | synta        | CCADB (9,898 certs)         | **16.1 ms**   | **615 K/sec**   |
| `synta_parse_roots`           | synta        | CCADB (919 roots)           | **457.7 µs**  | **2.01 M/sec**  |
| `synta_parse_intermediates`   | synta        | CCADB (8,979 intermediates) | **4.735 ms**  | **1.90 M/sec**  |
| `synta_build_dependency_tree` | synta        | CCADB (9,898 certs)         | **559 µs**    | —               |
| `synta_parse_all`             | synta        | ML-DSA synth (9,889 certs)  | **5.78 ms**   | **1.71 M/sec**  |
| `nss_parse_all`               | NSS          | ML-DSA synth (9,889 certs)  | 103 ms        | 96.4 K/sec      |
| `openssl_parse_all`           | rust-openssl | ML-DSA synth (9,889 certs)  | 239 ms        | 41.4 K/sec      |
| `ossl_parse_all`              | ossl         | ML-DSA synth (9,889 certs)  | 256 ms        | 38.6 K/sec      |
| `synta_parse_and_access`      | synta        | ML-DSA synth (9,889 certs)  | **17.5 ms**   | **566 K/sec**   |
| `synta_parse_roots`           | synta        | ML-DSA synth (919 roots)    | **463 µs**    | **1.98 M/sec**  |
| `synta_parse_intermediates`   | synta        | ML-DSA synth (8,970 ints.)  | **5.10 ms**   | **1.76 M/sec**  |
| `synta_build_dependency_tree` | synta        | ML-DSA synth (9,889 certs)  | **549 µs**    | —               |

NSS is **18–21× slower** than synta across all three datasets; rust-openssl is **40–41×
slower** and ossl is **41–44× slower**. All three C-backed libraries successfully parse
ML-DSA certificates (NSS 3.120+ and OpenSSL 3.4+ support ML-DSA natively). NSS's absolute
parse time is nearly identical across CCADB traditional certs (106 ms) and ML-DSA synthetic
certs (103 ms) — confirming that NSS's dominant cost is eager DN formatting at parse time,
which depends on DN attribute count rather than the signature algorithm. The slightly lower
relative slowdown for NSS on ML-DSA (18× vs 21×) is almost entirely because synta is slower
on ML-DSA (5.78 ms vs 5.10 ms); NSS itself is only about 3% faster (103 ms vs 106 ms).

synta's throughput stays at ~1.7–2.0 M certs/sec across all three datasets (180 to 9,898
certs), which is consistent with linear O(n) scaling in the number of certificates. Parse
rate is slightly lower for the ML-DSA synthetic
hierarchy (1.71 M/sec) than for the CCADB traditional hierarchy (1.94 M/sec) because the
larger ML-DSA SubjectPublicKeyInfo and signature BIT STRING fields add bytes to the
tag+length-header scan that synta performs at parse time. The intermediates-only
sub-benchmark is slightly lower than roots-only in each dataset (1.76 M/sec vs 1.98 M/sec
for ML-DSA; 1.90 M/sec vs 2.01 M/sec for CCADB) because intermediate CAs tend to have
more complex DNs and extension lists.

### Single-Certificate Performance (Hot Cache)

Each benchmark suite includes a `per_cert` sub-benchmark that repeatedly parses a single
median-sized certificate whose DER bytes remain in the L1/L2 cache. This isolates parse
throughput from dataset-iteration overhead and memory access patterns.

| Benchmark                    | Cert size | Parse       | Parse + access |
| ---------------------------- | --------- | ----------- | -------------- |
| Mozilla `synta_per_cert`     | 1,070 B   | **487 ns**  | **1,611 ns**   |
| CCADB `synta_per_cert`       | 10,432 B  | **520 ns**  | **1,634 ns**   |
| ML-DSA `synta_per_cert`      | 6,705 B   | **523 ns**  | **1,952 ns**   |

The Mozilla cert (1,070 B) parses 7% faster than the CCADB median (10,432 B) because the
larger cert has more bytes in tag+length headers of its extension list and SubjectPublicKeyInfo
— the only parts synta reads at parse time. The ML-DSA synthetic median (6,705 B) parses in
523 ns — nearly identical to the CCADB median (520 ns) despite different overall cert sizes
— because synta's parse-time work is bounded by the count of tag+length headers, not the
byte length of opaque fields such as the signature BIT STRING or public key BIT STRING.

The parse+access cost for the ML-DSA median (1,952 ns) is higher than the CCADB median
(1,634 ns) because the ML-DSA median cert ("TrustCor Basic Secure Site (CA1)") has a more
complex subject DN (324 ns vs 292 ns) and longer validity strings (231 ns vs 206 ns) than
"Bayerische SSL-CA-2014-01" — the ML-DSA key replacement shifts which cert lands at the
median position, so the two medians are not the same underlying CA.

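For the Mozilla and CCADB stores, the hot-cache per-cert times differ from the whole-store
average (total time divided by certificate count) by at most ~5 ns, which indicates that
dataset-iteration memory access contributes negligible overhead to those whole-store
benchmarks. The ML-DSA synthetic store shows a larger gap: roughly 585 ns per cert
whole-store vs 523 ns hot-cache.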
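
The whole-store averages can be cross-checked by dividing each `synta_parse_all` time in the
Throughput Results table by its certificate count. The following is a minimal sketch of that
arithmetic; the names `per_cert_ns`, `STORES`, and `report` are hypothetical, and the inputs
are copied from the tables in this section rather than measured here.

```rust
/// Average per-certificate parse time in nanoseconds for a whole-store run.
/// `total_ns` is the measured `synta_parse_all` time, `certs` the store size.
fn per_cert_ns(total_ns: f64, certs: u32) -> f64 {
    total_ns / f64::from(certs)
}

/// (dataset, whole-store time in ns, cert count, hot-cache per-cert ns),
/// all taken from the tables in this section.
const STORES: [(&str, f64, u32, f64); 3] = [
    ("Mozilla", 87_800.0, 180, 487.0),
    ("CCADB", 5_100_000.0, 9_898, 520.0),
    ("ML-DSA synth", 5_780_000.0, 9_889, 523.0),
];

/// Returns one summary line per dataset comparing the two figures.
fn report() -> Vec<String> {
    STORES
        .iter()
        .map(|&(name, total, n, hot)| {
            let avg = per_cert_ns(total, n);
            format!(
                "{name}: whole-store {avg:.0} ns/cert, hot-cache {hot:.0} ns, gap {:.0} ns",
                avg - hot
            )
        })
        .collect()
}
```

With these inputs the Mozilla and CCADB averages come out at about 488 ns and 515 ns per
cert, within a few nanoseconds of the hot-cache figures.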

### Per-Field Access Latency

Pre-parsed certificate, single field read, no allocation unless noted:

| Field                                          | Mozilla (1,070 B) | CCADB (10,432 B) | ML-DSA (6,705 B) | Notes                           |
| ---------------------------------------------- | ----------------: | ---------------: | ---------------: | ------------------------------- |
| `issuer_raw` / `subject_raw`                   |    4.1 / 4.1 ns  |    4.2 / 4.1 ns  |    4.5 / 4.4 ns  | Zero-copy slice                 |
| `public_key_bytes` / `signature_bytes`         |    4.1 / 4.1 ns  |    4.2 / 4.2 ns  |    4.6 / 4.4 ns  | Zero-copy slice                 |
| `signature_algorithm` / `public_key_algorithm` |    5.9 / 5.4 ns  |    5.9 / 5.5 ns  |    6.3 / 6.4 ns  | OID → `&'static str`            |
| `serial_number`                                |        10.9 ns   |        6.8 ns    |        7.5 ns    | Integer → i64, length-dependent |
| `validity`                                     |        180 ns    |        206 ns    |        231 ns    | Two time-string allocations     |
| `issuer_dn`                                    |        401 ns    |        224 ns    |        246 ns    | `format_dn()` → `String`        |
| `subject_dn`                                   |        404 ns    |        292 ns    |        324 ns    | `format_dn()` → `String`        |

Zero-copy fields (`issuer_raw`, `subject_raw`, `public_key_bytes`, `signature_bytes`) cost
~4–5 ns — the price of reading a pointer and length from a struct field. The slightly higher
cost for CCADB and ML-DSA fields vs Mozilla is within measurement noise.

`identify_signature_algorithm()` and `identify_public_key_algorithm()` match the OID
component array against a static table and return `&'static str` — no allocation, no string
formatting. The ~5–6 ns cost is a few comparisons and a pointer return.
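
A static-table lookup of this kind can be sketched as follows. This is an illustration of the
technique only: `SIG_ALGS` and `identify_oid` are hypothetical names, not synta's API, and the
table holds just a handful of well-known signature OIDs.

```rust
/// Static table mapping OID component arrays to algorithm names.
/// Entries are standard OIDs; the table itself is illustrative only.
const SIG_ALGS: &[(&[u32], &str)] = &[
    (&[1, 2, 840, 113549, 1, 1, 11], "sha256WithRSAEncryption"),
    (&[1, 2, 840, 10045, 4, 3, 2], "ecdsa-with-SHA256"),
    (&[2, 16, 840, 1, 101, 3, 4, 3, 18], "id-ml-dsa-65"),
    (&[2, 16, 840, 1, 101, 3, 4, 3, 19], "id-ml-dsa-87"),
];

/// Returns the static name for an OID, or `None` if it is not in the table.
/// No allocation: the result borrows from the constant table.
fn identify_oid(components: &[u32]) -> Option<&'static str> {
    SIG_ALGS
        .iter()
        .find(|(oid, _)| *oid == components)
        .map(|&(_, name)| name)
}
```

Because the returned `&'static str` points into read-only data, a caller can store or compare
it without copying.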

`serial_number` cost depends on the integer's byte length: the Entrust Mozilla cert carries
a 16-byte serial number (parsed via `SmallVec<[u8; 16]>`), while the CCADB and ML-DSA
synthetic medians have shorter serials. At 10.9, 6.8, and 7.5 ns respectively, all are
negligible.

`validity` (~180–231 ns) allocates two strings: the `notBefore` and `notAfter` values (each
a UTCTime or GeneralizedTime) are formatted from their raw DER bytes into owned `String`s.
These two formatting calls account for essentially all of the cost; the `YYMMDDHHMMSSZ` to
RFC 3339 conversion is the dominant work.

DN formatting via `format_dn()` is the most variable cost: it walks the Name DER bytes,
decodes each level of the SEQUENCE OF SET OF SEQUENCE structure, maps each attribute OID to
its name, and formats the result
into an owned `String`. The Mozilla cert's issuer DN is more complex (multiple attributes,
longer values: 401 ns) than the CCADB median (224 ns) or the ML-DSA synthetic median
(246 ns). The ML-DSA synthetic median's subject DN (324 ns) is slightly more expensive
than the CCADB median (292 ns) because a different cert occupies the median position after
key replacement. `format_dn()` cost is proportional to the DN's attribute count and string
lengths.

Each `mozilla_ca_cert_fields`, `ccadb_cert_fields`, and `mldsa_cert_fields` benchmark uses
the **median-sized** certificate by DER byte length. Atypically small certs (e.g. the
889-byte GlobalSign Root CA) amplify cache-line alignment effects and produce misleading
per-field regressions across branches.

### Trust Hierarchy Construction

- **Mozilla** `build_trust_chain` (11.6 µs): builds a `HashMap<subject_bytes, index>` keyed
  on the DER Name bytes pre-extracted from each certificate's `CKA_SUBJECT` entry in
  `certdata.txt`. The Name bytes are identical to `issuer_raw.as_bytes()` on any certificate
  issued by that CA, so chain lookup requires no re-parsing. 180 entries complete in 11.6 µs
  — dominated by hash computation over 10–200 byte keys.
- **CCADB** `build_dependency_tree` (559 µs): builds a `HashMap<sha256_fingerprint, index>`
  and resolves each certificate's `Parent SHA-256 Fingerprint` CSV field over 9,898 entries.
  The majority of the cost is hashing the 9,898 SHA-256 fingerprint strings used as map
  keys; actual certificate parsing is a small fraction.
- **ML-DSA synthetic** `build_dependency_tree` (549 µs): identical structure to CCADB —
  same SHA-256 fingerprint `HashMap`, same parent-resolution logic — over 9,889 entries.
  Time is nearly identical to CCADB (549 µs vs 559 µs) because the cost is dominated by
  SHA-256 string hashing over the CSV fingerprint values, independent of certificate content
  or algorithm.
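
The subject-keyed lookup described for the Mozilla store can be sketched with the standard
library alone. In this sketch `CertNames` and `build_parent_links` are hypothetical stand-ins
(the real benchmark types are not shown in this document); the point is that matching an
issuer to its parent is a byte-slice hash lookup with no re-parsing.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed certificate: only the raw DER Name
/// bytes of subject and issuer are needed for chain lookup.
struct CertNames<'a> {
    subject_raw: &'a [u8],
    issuer_raw: &'a [u8],
}

/// Builds a subject-bytes -> index map, then resolves each certificate's
/// issuer to the index of its parent (`None` if the issuer is not present).
fn build_parent_links(certs: &[CertNames<'_>]) -> Vec<Option<usize>> {
    let mut by_subject: HashMap<&[u8], usize> = HashMap::with_capacity(certs.len());
    for (idx, cert) in certs.iter().enumerate() {
        by_subject.insert(cert.subject_raw, idx);
    }
    certs
        .iter()
        .map(|cert| by_subject.get(cert.issuer_raw).copied())
        .collect()
}
```

A self-signed root resolves to its own index, since its issuer bytes equal its subject bytes.
The same shape applies to the fingerprint-keyed maps if the key type is swapped for the
fingerprint string.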

### Why C Libraries Are Slower

`CERT_NewTempCertificate` (NSS) and OpenSSL's `d2i_X509` perform significantly more work
per certificate than synta:

1. **Eager DN formatting** — NSS formats the issuer and subject Distinguished Names into
   internal C strings during `CERT_NewTempCertificate`, even when the caller never reads
   them. Distinguished Name formatting is the single most expensive operation in certificate
   parsing; doing it unconditionally at parse time accounts for roughly 80% of NSS's total
   parse cost. OpenSSL decodes DN structure eagerly as well.

2. **Arena and heap allocation** — each NSS certificate allocates a `PLArena` block and
   copies the full DER buffer into it (`copyDER = 1`). OpenSSL allocates from the C heap.
   These allocations are additional work beyond decoding.

3. **Library state and locking** — NSS acquires internal locks on every
   `CERT_NewTempCertificate` call to update the certificate cache, even when the resulting
   certificate is marked as temporary. This serialises concurrent parsing in multi-threaded
   applications.

4. **FFI boundary costs** — the `rust-openssl` and `ossl` measurements include the overhead
   of crossing from Rust into the C library via `extern "C"` calls and pointer marshalling.

synta defers all of (1): `issuer` and `subject` are stored as `RawDer<'a>` (borrowed byte
spans) and decoded only when the caller calls `format_dn()`. There is no locking, no arena,
and no FFI boundary.

---

## ASN.1 Primitive Performance

```bash
cargo bench -p synta-bench --bench encoding
cargo bench -p synta-bench --bench derive_performance
cargo bench -p synta-bench --bench constrained_integers
```

### Tag/Length Parsing

| Operation                | Time     |
| ------------------------ | -------- |
| Short length (1-byte)    | 6.09 ns  |
| Long length (multi-byte) | 6.95 ns  |

Tag+length parsing is the inner loop of the DER decoder. The 0.86 ns difference between
short and long lengths reflects the cost of the extra branch and multi-byte length assembly
in the BER/DER long-form path. Both paths are branch-predicted and cache-resident in
production workloads.
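
For reference, the short-form and long-form length paths look roughly like this. The function
below is a minimal sketch following the X.690 DER rules, not synta's decoder; the name
`parse_der_length` is hypothetical.

```rust
/// Parses a DER length at the start of `input`.
/// Returns `(length, bytes_consumed)`, or `None` if malformed or truncated.
fn parse_der_length(input: &[u8]) -> Option<(usize, usize)> {
    let first = *input.first()?;
    if first < 0x80 {
        // Short form: the byte itself is the length.
        return Some((usize::from(first), 1));
    }
    // Long form: the low 7 bits give the number of length bytes that follow.
    let n = usize::from(first & 0x7f);
    if n == 0 || n > std::mem::size_of::<usize>() {
        return None; // indefinite length (BER only) or too large for usize
    }
    let bytes = input.get(1..1 + n)?;
    if bytes[0] == 0 {
        return None; // DER requires the minimal number of length bytes
    }
    let len = bytes
        .iter()
        .fold(0usize, |acc, &b| (acc << 8) | usize::from(b));
    if len < 0x80 {
        return None; // DER requires short form for lengths below 128
    }
    Some((len, 1 + n))
}
```

The long-form path adds one branch and a small fold over at most a few bytes, which matches
the sub-nanosecond difference in the table.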

### Integer Encode/Decode

| Operation                  | Time    |
| -------------------------- | ------- |
| Encode small (42)          | 31.3 ns |
| Encode medium (`i64::MAX`) | 34.4 ns |
| Encode large (`i128::MAX`) | 31.5 ns |
| Decode small               | 13.2 ns |
| Decode medium              | 13.4 ns |
| Roundtrip integer_42       | 43.9 ns |

Decode cost (~13 ns) is nearly independent of integer size because the decoder reads the
tag+length and then slices the content bytes into a `SmallVec<[u8; 16]>` — a copy of at
most 16 bytes on the stack. Encode cost varies slightly by value because the encoder must
determine the minimum byte representation and handle the sign extension byte for negative
values.
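
The minimum-representation step can be illustrated for `i64` values. This is a sketch of the
general DER rule, assuming a plain `Vec<u8>` output; `encode_der_integer` is a hypothetical
name and does not reflect how synta's `Integer` is implemented.

```rust
/// Encodes `value` as a DER INTEGER (tag 0x02) using the minimal
/// two's-complement content bytes.
fn encode_der_integer(value: i64) -> Vec<u8> {
    let be = value.to_be_bytes();
    // Drop leading bytes that are pure sign extension: 0x00 followed by a
    // byte with the top bit clear, or 0xFF followed by one with it set.
    let mut start = 0;
    while start < be.len() - 1 {
        let (cur, next) = (be[start], be[start + 1]);
        let redundant = (cur == 0x00 && (next & 0x80) == 0)
            || (cur == 0xff && (next & 0x80) != 0);
        if !redundant {
            break;
        }
        start += 1;
    }
    let content = &be[start..];
    let mut out = Vec::with_capacity(2 + content.len());
    out.push(0x02); // INTEGER tag
    out.push(content.len() as u8); // at most 8, so short-form length
    out.extend_from_slice(content);
    out
}
```

For example, `encode_der_integer(42)` yields `02 01 2a`, while `encode_der_integer(128)`
yields `02 02 00 80`: the leading `00` is kept so the value is not read as negative.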

### Constrained INTEGER — Native Primitive Types

```bash
cargo bench -p synta-bench --bench constrained_integers
```

When a schema declares `INTEGER (lo..hi)`, `synta-codegen` selects the smallest native
Rust primitive (`u8`, `u16`, `u32`, `u64`, `i8`, `i16`, `i32`, `i64`) that covers the
constraint range, instead of the general-purpose `Integer` wrapper. This benchmark
measures the memory and runtime cost difference between the two representations using
four constrained newtype examples with non-trivial ranges:

- `ConstrainedU8` — `INTEGER (0..200)`, stored as `u8`
- `ConstrainedU16` — `INTEGER (0..10000)`, stored as `u16`
- `ConstrainedI16` — `INTEGER (-1000..1000)`, stored as `i16`
- `ConstrainedI64` — `INTEGER (-1000000000..1000000000)`, stored as `i64`

The primary benefit is memory layout; the trade-off is a slightly slower decode path due
to extra validation steps that enforce the declared constraint at decode time.

**Struct size:**

| Field type                 | Size |
| -------------------------- | ---- |
| `Integer` (unconstrained)  | 32 B |
| `u8` (0..=200)             | 1 B  |
| `u16` (0..=10 000)         | 2 B  |
| `i16` (−1 000..=1 000)     | 2 B  |
| `i64` (−1e9..=1e9)         | 8 B  |

A struct with three `Integer` fields occupies 96 B; the equivalent with `u8` + `u16` +
`i64` fields occupies 16 B — 6× smaller. In schemas with many integer fields (e.g.,
Kerberos KDC-REQ-BODY, SNMP PDUs) this significantly reduces cache pressure when
processing large volumes of messages.
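
The 16 B figure follows from Rust's layout rules and can be checked with `size_of`. The
sketch below uses a hypothetical `ConstrainedFields` struct; the 32 B size of `Integer` is
taken from the table above as a constant, not measured, and the result assumes a typical
64-bit target where `i64` is 8-byte aligned.

```rust
use std::mem::size_of;

/// Hypothetical struct mirroring the `u8` + `u16` + `i64` example.
#[allow(dead_code)]
struct ConstrainedFields {
    a: u8,
    b: u16,
    c: i64,
}

/// Size of `Integer` as reported in the struct-size table (not measured here).
const INTEGER_SIZE_FROM_TABLE: usize = 32;

/// Size of the constrained struct in bytes, padding included:
/// 1 + 2 + 8 = 11 bytes of fields, rounded up to the 8-byte alignment.
fn constrained_size() -> usize {
    size_of::<ConstrainedFields>()
}

/// Ratio between three `Integer` fields and the constrained struct.
fn size_ratio() -> usize {
    3 * INTEGER_SIZE_FROM_TABLE / constrained_size()
}
```

The padding matters: the fields alone total 11 bytes, so the 6× ratio comes from 96 B against
the padded 16 B, not against 11 B.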

**Decode overhead per field** (run date: 2026-03-08):

| Type                         | 1-byte wire | 2/4-byte wire |
| ---------------------------- | ----------- | ------------- |
| `Integer` (unconstrained)    | 13.7 ns     | 14.2 ns       |
| `u8_constrained` (0..=200)   | 20.6 ns     | 20.9 ns       |
| `u16_constrained` (0..=10k)  | 21.9 ns     | 22.5 ns       |
| `i16_constrained` (±1 000)   | 21.9 ns     | 23.0 ns       |
| `i64_constrained` (±1e9)     | 20.7 ns     | 22.0 ns       |

Each constrained decode adds ~7–9 ns over raw `Integer`: one `as_i64()` call (sign-extend
bytes to `i64`), one `try_from()` narrowing cast for sub-`i64` types, and one range check
in `new()`. The overhead is roughly uniform across widths; the dominant factor is the extra
function calls, not the value size or wire width.
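
The three steps can be sketched for the `INTEGER (0..200)` case. This is an illustration
under stated assumptions, not the code `synta-codegen` emits: the error type, the free
function `as_i64`, and `decode_constrained_u8` are hypothetical, and the input is assumed to
be the INTEGER content bytes with tag and length already stripped.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype for `INTEGER (0..200)` stored as `u8`.
#[derive(Debug, PartialEq)]
struct ConstrainedU8(u8);

#[derive(Debug, PartialEq)]
enum DecodeError {
    Empty,
    TooLong,
    OutOfRange,
}

impl ConstrainedU8 {
    /// Step 3: range check enforcing the declared constraint.
    fn new(v: u8) -> Result<Self, DecodeError> {
        if v <= 200 {
            Ok(Self(v))
        } else {
            Err(DecodeError::OutOfRange)
        }
    }
}

/// Step 1: sign-extend big-endian two's-complement content bytes to `i64`.
fn as_i64(content: &[u8]) -> Result<i64, DecodeError> {
    if content.is_empty() {
        return Err(DecodeError::Empty);
    }
    if content.len() > 8 {
        return Err(DecodeError::TooLong);
    }
    let fill: u8 = if content[0] & 0x80 != 0 { 0xff } else { 0x00 };
    let mut buf = [fill; 8];
    buf[8 - content.len()..].copy_from_slice(content);
    Ok(i64::from_be_bytes(buf))
}

/// Runs all three steps: widen, narrow with `try_from`, then range-check.
fn decode_constrained_u8(content: &[u8]) -> Result<ConstrainedU8, DecodeError> {
    let wide = as_i64(content)?;
    // Step 2: narrowing cast; fails for negatives and values above 255.
    let narrow = u8::try_from(wide).map_err(|_| DecodeError::OutOfRange)?;
    ConstrainedU8::new(narrow)
}
```

Note that a value such as 201 passes the narrowing cast but is rejected by the range check,
which is why both steps are needed for constraints that do not fill the primitive's range.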

**Encode overhead per field:**

| Type                   | Time    |
| ---------------------- | ------- |
| `Integer` (baseline)   | 39.5 ns |
| `u8_constrained`       | 41.9 ns |
| `u16_constrained`      | 42.9 ns |
| `i16_constrained`      | 41.7 ns |
| `i64_constrained`      | 42.6 ns |

Encode adds ~3 ns: the constrained path calls `Integer::from_i64(self.0 as i64)` to
create a temporary `Integer` before encoding, whereas raw `Integer` encodes its stored
bytes directly. Both paths avoid heap allocation because `Integer` uses a 16-byte
inline `SmallVec`.

**Three-field struct decode/encode:**

| Struct                               | Decode   | Encode    |
| ------------------------------------ | -------- | --------- |
| `IntegerStruct` (3×`Integer`)        | 62.5 ns  | 87.0 ns   |
| `ConstrainedStruct` (u8 + u16 + i64) | 80.3 ns  | 105.2 ns  |

The decode overhead scales roughly linearly: the 17.8 ns difference across 3 fields is ~6 ns
per field, close to the per-field figure measured in isolation. For certificate parsing (one
struct at a time, hot cache) the extra ~18 ns is negligible; for bulk message processing
where many structs are simultaneously live, the 6× struct-size reduction materially improves
cache efficiency.

### OctetString Encode/Decode

| Size       | Encode  | Decode  |
| ---------- | ------- | ------- |
| 16 bytes   | 31.3 ns | 19.1 ns |
| 64 bytes   | 69.3 ns | 19.5 ns |
| 256 bytes  | 73.5 ns | 22.2 ns |
| 1024 bytes | 83.0 ns | 26.9 ns |

**Decode is nearly constant-time** with respect to payload size: `OctetStringRef<'a>`
borrows a slice of the input buffer with no copy. The small growth from 16-byte to
1024-byte decode (19.1 → 26.9 ns) is from cache-line effects on the returned slice
struct, not from reading the content bytes.

**Encode** grows with payload size because the encoder must copy the bytes into the output
buffer. The encode path uses `OctetStringRef` internally to avoid a redundant allocation
before copying, which accounts for the significant improvement in the 1024-byte case
compared to earlier measurements that used an owned `OctetString` (which added a heap
allocation before the copy).

### Sequence Encode/Decode

| Operation                  | Time     |
| -------------------------- | -------- |
| Encode simple (3 elements) | 87.9 ns  |
| Encode nested (2 levels)   | 140.8 ns |
| Decode simple (3 elements) | 12.8 ns  |
| Roundtrip complex sequence | 149.2 ns |

**Sequence decode is O(1)**: `Sequence` captures raw content bytes as a borrowed slice at
decode time. The 12.8 ns covers only tag+length parsing and content-slice setup — no
elements are decoded. Elements are decoded lazily on first iteration.
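
The lazy-capture idea can be sketched with borrowed slices. The types below (`Tlv`,
`LazySequence`) and the helper `read_tlv` are hypothetical and simplified (single-byte tags,
definite lengths of at most four length bytes, no DER minimality checks); they are not
synta's `Sequence` implementation.

```rust
/// Borrowed view of one TLV element: tag byte plus content slice.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Tlv<'a> {
    tag: u8,
    content: &'a [u8],
}

/// Reads one TLV from the front of `input`; returns it and the remainder.
fn read_tlv(input: &[u8]) -> Option<(Tlv<'_>, &[u8])> {
    let (&tag, rest) = input.split_first()?;
    let (&first, rest) = rest.split_first()?;
    let (len, rest) = if first < 0x80 {
        (usize::from(first), rest)
    } else {
        let n = usize::from(first & 0x7f);
        if n == 0 || n > 4 || rest.len() < n {
            return None;
        }
        let (len_bytes, rest) = rest.split_at(n);
        let len = len_bytes
            .iter()
            .fold(0usize, |acc, &b| (acc << 8) | usize::from(b));
        (len, rest)
    };
    if rest.len() < len {
        return None;
    }
    let (content, rest) = rest.split_at(len);
    Some((Tlv { tag, content }, rest))
}

/// Lazy SEQUENCE: decoding checks only the outer header and borrows the body.
struct LazySequence<'a> {
    content: &'a [u8],
}

impl<'a> LazySequence<'a> {
    /// O(1) in the number of children: no child element is parsed here.
    fn decode(input: &'a [u8]) -> Option<Self> {
        let (tlv, _rest) = read_tlv(input)?;
        if tlv.tag != 0x30 {
            return None;
        }
        Some(Self { content: tlv.content })
    }

    /// Children are parsed one at a time, only when the iterator is advanced.
    fn iter(&self) -> impl Iterator<Item = Tlv<'a>> + 'a {
        let mut rest = self.content;
        std::iter::from_fn(move || {
            let (tlv, next) = read_tlv(rest)?;
            rest = next;
            Some(tlv)
        })
    }
}
```

In this sketch a malformed child simply ends iteration; a production decoder would surface
an error instead.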

Sequence encode uses a backpatching strategy: the encoder writes a placeholder length, encodes
all child elements, then patches the length field. The nested (2-level) encode (140.8 ns)
is about 1.6× the simple encode (87.9 ns) because both the outer and inner sequences
require length-field backpatching.
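
A backpatching encoder can be sketched as follows. This is one possible variant, assuming a
one-byte placeholder that is widened when the content turns out to need a long-form length;
the document does not specify how synta sizes its placeholder, and `encode_sequence` is a
hypothetical name.

```rust
/// Appends a SEQUENCE to `out`, calling `body` to encode the children.
/// A one-byte placeholder length is written first and patched afterwards.
fn encode_sequence<F: FnOnce(&mut Vec<u8>)>(out: &mut Vec<u8>, body: F) {
    out.push(0x30); // SEQUENCE tag
    let len_pos = out.len();
    out.push(0x00); // placeholder, optimistic short form
    body(out);
    let content_len = out.len() - len_pos - 1;
    if content_len < 0x80 {
        // Short form: patch the placeholder in place.
        out[len_pos] = content_len as u8;
    } else {
        // Long form: patch the count byte, then insert the length bytes,
        // which shifts the already-encoded content to the right.
        let be = content_len.to_be_bytes();
        let skip = be.iter().take_while(|&&b| b == 0).count();
        let len_bytes = &be[skip..];
        out[len_pos] = 0x80 | len_bytes.len() as u8;
        let insert_at = len_pos + 1;
        out.splice(insert_at..insert_at, len_bytes.iter().copied());
    }
}
```

Nesting is just a recursive call from inside `body`, so a two-level structure performs the
patch step twice: once for the inner sequence and once for the outer one.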

The roundtrip complex sequence (149.2 ns) is dominated by its encode half: the decode half
is O(1), so it adds little more than the tag+length parse to the total wall-clock time that
Criterion measures.

### Derive Macro Overhead

| Operation | Manual   | Derived  | Overhead |
| --------- | -------- | -------- | -------- |
| Encode    | 77.3 ns  | 77.4 ns  | ~0%      |
| Decode    | 62.9 ns  | 64.1 ns  | +2%      |
| Roundtrip | 128.4 ns | 134.0 ns | +4%      |

Derive macros generate code that is indistinguishable from hand-written implementations
within Criterion's measurement noise. The +2% decode overhead and +4% roundtrip overhead
are within the confidence interval of the measurement and should not be treated as
meaningful regressions. The compiler fully inlines and specialises the generated trait
implementations.

---

## Memory Usage

- **Stack per parse:** ~2 KB (traditional certs), ~4 KB (post-quantum)
- **Heap allocations (parse-only):** zero for Distinguished Names, OIDs, BIT STRINGs, and
  OCTET STRINGs — all stored as borrowed slices from the input buffer
- **Heap allocations (parse+fields):** two `String` allocations for `format_dn()` (issuer
  and subject), plus string-type copies for string attributes within each DN
- **L1 cache hit rate:** > 95% for certificate parsing; the hot-cache per-cert benchmark
  (487–520 ns) is within 5% of the whole-store average (488–515 ns per cert, derived from
  the Mozilla and CCADB `synta_parse_all` rows)

---

## Benchmark Methodology

### Setup

- **Tool:** Criterion.rs 0.8 (Rust), `time.perf_counter` (Python)
- **Criterion samples:** 100 per benchmark, 20 for whole-store benchmarks
- **Warmup:** 3 s; measurement window: 5 s (adaptive for whole-store)
- **Build:** `--release` profile, full optimisations, no debug symbols
- **CPU isolation:** benchmarks run on an otherwise idle system; no explicit CPU pinning

### Test Certificates

- **Traditional:** PyCA cryptography PKITS (RSA-2048/ECDSA, 914–968 bytes)
- **Post-quantum:** IETF LAMPS ML-DSA reference certs (3,992–7,479 bytes)
- **CA store:** Mozilla NSS `certdata.txt` (180 root CAs); CCADB V4 all-certs download
  (9,898 root + intermediate CAs, multiple decade-spanning CSV endpoints); ML-DSA synthetic
  CA hierarchy generated by `tests/vectors/generate_mldsa_certs.py` (9,889 certs mirroring
  the CCADB hierarchy with ML-DSA-65/87 signatures, requires OpenSSL 3.4+)

### Measurement Scope

| Benchmark                          | What is timed                                              | What is excluded                              |
| ---------------------------------- | ---------------------------------------------------------- | --------------------------------------------- |
| Library comparison (parse-only)    | DER decoding + ASN.1 struct population                     | File I/O, PEM decode, signature verification  |
| Library comparison (parse+fields)  | Parse + all named field reads                              | —                                             |
| Element vs Typed (lazy)            | Outer SEQUENCE tag+length only                             | Child element decoding                        |
| Element vs Typed (eager)           | Full recursive decode of all elements                      | File I/O, PEM decode                          |
| Element vs Typed (typed)           | Full RFC 5280 typed decode                                 | File I/O, PEM decode                          |
| Bindings overhead                  | Per binding layer parse / parse+fields                     | —                                             |
| PKCS#7/12 (Rust)                   | Certificate extraction from container (DER in, `Vec<DER>` out) | File I/O, signature verification          |
| PKCS#7/12 (Python)                 | `pem_to_der()` / `pkcs12_certs_from_der()` call           | File I/O                                      |
| CA store (whole-store)             | Iterating + parsing all certs in dataset                   | Dataset loading (done once before loop)       |
| CA store (per-cert)                | Single-cert parse, bytes hot in L1/L2                      | —                                             |
| Python parse-only                  | `Certificate.from_der()` call                              | GIL amortised over loop                       |
| Python parse+fields                | `from_der()` + all field accesses, new cert per iteration  | —                                             |
| Python field access (warm)         | All getters on a pre-parsed cert; caches already populated | Parse cost, first-access decode               |

### Reproducing

```bash
# Library comparison (parse-only + parse+fields, traditional + ML-DSA)
BENCH_COMPARE_FEATURES=bench-compare ./contrib/ci/local-ci.sh bench-compare

# Element vs Typed API (no extra feature flag needed)
cargo bench -p synta-bench --bench comparison_typed

# Bindings overhead (rust_typed, rust_element, c_ffi)
./contrib/ci/local-ci.sh bench-bindings

# CA store benchmarks with C library comparisons (CCADB and Mozilla)
BENCH_CA_FEATURES=bench-nss,bench-ossl,bench-openssl \
  ./contrib/ci/local-ci.sh bench-ca-roots

# ML-DSA synthetic CA hierarchy (generate first if not present)
python3 tests/vectors/generate_mldsa_certs.py   # requires OpenSSL 3.4+
SYNTA_CERT_DB=mldsa BENCH_CA_FEATURES=bench-nss,bench-ossl,bench-openssl \
  ./contrib/ci/local-ci.sh bench-ca-roots

# Python benchmark (X.509 certificate parsing)
cd synta-python && maturin develop --release && cd ..
python python/bench_certificate.py             # parse-only, parse+fields
python python/bench_certificate.py --per-field # also per-field getter breakdown

# Python benchmark (PKCS#7 / PKCS#12 certificate extraction)
python python/bench_pkcs.py
```

Criterion writes HTML reports to `target/criterion/`. Use `--show-results` with
`local-ci.sh` to display a summary table after the run completes.

---

## Recommendations

### When to choose synta

- **Parse-only throughput** (TLS chain checking, CT log scanning, bulk certificate filtering):
  synta is fastest by 3× over the next-best pure-Rust library and 16–33× over C libraries.
- **Parse + all fields**: synta leads all pure-Rust implementations; access is structured
  (named fields, typed return values) rather than offset-based.
- **Post-quantum certificates**: parse time is size-independent — a 7 KB ML-DSA-87 cert
  parses as fast as a 900 B RSA cert due to zero-copy `BitStringRef<'a>` for large payloads.
- **No C dependencies**: all pure Rust; no linking to OpenSSL, NSS, or libtasn1.

**Best practices for maximum performance:**

1. Use typed structures with derive macros (`#[derive(Asn1Sequence)]`) rather than generic
   `Element` — 3.3× faster than equivalent `element_eager` traversal.
2. Use `identify_signature_algorithm()` and `identify_public_key_algorithm()` for OID names
   — returns `&'static str` with no allocation.
3. Use `format_dn()` only when the string representation is actually needed — it allocates.
   Use `issuer_raw()` / `subject_raw()` for byte-level comparison or caching.
4. Use zero-copy types (`BitStringRef<'a>`, `OctetStringRef<'a>`, `RawDer<'a>`) for large
   fields to avoid allocation at parse time.

### When to choose x509-parser

- Need typed access to certificate extensions as an indexed, named collection.
- Need a mature, widely deployed pure-Rust implementation with broad ecosystem adoption.

### When to choose cryptography-x509

- Already using the PyCA `cryptography` Python package and need its full API (signature
  verification, extension parsing, key operations, PEM/DER serialisation).
- Python-first workflow where cryptography ecosystem compatibility matters more than
  parse throughput.

---

## See Also

- [POST_QUANTUM_OIDS.md](POST_QUANTUM_OIDS.md) — post-quantum cryptography OID reference
- [../synta-bench/README.md](../synta-bench/README.md) — benchmark suite documentation
- [../python/bench_certificate.py](../python/bench_certificate.py) — Python X.509 benchmark script
- [../python/bench_pkcs.py](../python/bench_pkcs.py) — Python PKCS#7/12 benchmark script
- [../README.md](../README.md) — main documentation with quick start and examples