
# Performance


Measured with `python/bench_certificate.py`. Run after `maturin develop --release` or
`maturin build --release --interpreter python3` and installing the wheel.
The benchmark uses Criterion's **Linear** sampling mode (100 samples, linearly
increasing iteration counts; see `python/criterion_compat.py`), so timings are
directly comparable to the Rust Criterion benchmarks in `synta-bench`.
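
As a rough illustration of what Linear sampling means, the sketch below times a callable with linearly increasing iteration counts and estimates the per-iteration cost as a least-squares slope through the origin. It is not the code in `python/criterion_compat.py`; the names `linear_sample` and `slope_through_origin`, the `step` parameter, and the choice of estimator are assumptions made for the example.

```python
import time


def linear_sample(func, samples=100, step=1):
    """Time `func` with linearly increasing iteration counts.

    Sample i (1-based) runs i * step iterations.
    Returns a list of (iterations, elapsed_ns) pairs.
    """
    points = []
    for i in range(1, samples + 1):
        iters = i * step
        start = time.perf_counter_ns()
        for _ in range(iters):
            func()
        points.append((iters, time.perf_counter_ns() - start))
    return points


def slope_through_origin(points):
    """Least-squares slope of y = m * x, i.e. estimated ns per iteration."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx
```

Calling `slope_through_origin(linear_sample(lambda: parse(der)))` with a hypothetical `parse` callable would give one per-call estimate in nanoseconds.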

> **Note:** Always verify the installed `.so` is a release build after
> `maturin develop --release` — the file size should be ≈1.4 MB (release)
> not ≈20 MB (debug). If in doubt, copy `target/release/lib_synta.so`
> manually to `python/synta/_synta.abi3.so`.
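
The size check in the note can be scripted. This is a minimal sketch: the 5 MiB cut-off is an assumed threshold chosen to sit between the ≈1.4 MB and ≈20 MB figures, and `looks_like_release_build` is a hypothetical helper, not part of synta.

```python
import os

# Assumed cut-off between a ≈1.4 MB release build and a ≈20 MB debug build.
RELEASE_MAX_BYTES = 5 * 1024 * 1024


def looks_like_release_build(path, max_bytes=RELEASE_MAX_BYTES):
    """Return True if the shared object at `path` is small enough to be a release build."""
    return os.path.getsize(path) <= max_bytes
```

For example, `looks_like_release_build("python/synta/_synta.abi3.so")` could be run from the repository root after `maturin develop --release`.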

## Parse-only (call only, no field access)

Two Python parse-only modes are benchmarked, corresponding to the two Rust variants:

| Certificate set | `synta` lazy (`from_der`) | `synta_full` eager (`full_from_der`) | `cryptography.x509` | Rust `rust_shallow` | Rust `rust_typed` |
|----------------|---------------------------|--------------------------------------|---------------------|---------------------|-------------------|
| Traditional (~900 B, 5 certs) | **0.16 µs** | 0.72–0.76 µs | 1.62–1.68 µs | 0.019–0.020 µs | 0.50–0.51 µs |
| ML-DSA-44 (3,992 B) | **0.17 µs** | 0.74 µs | 1.55 µs | 0.020 µs | 0.54 µs |
| ML-DSA-65 (5,521 B) | **0.17 µs** | 0.73 µs | 1.55 µs | 0.019 µs | 0.49 µs |
| ML-DSA-87 (7,479 B) | **0.17 µs** | 0.73 µs | 1.54 µs | 0.020 µs | 0.49 µs |

**Comparison note:** `synta` (`from_der`) and Rust `rust_shallow` perform the same
4-operation envelope scan. The ~0.14 µs gap between them (0.16 µs vs 0.019 µs) is
pure PyO3 overhead: GIL acquisition, `Py<PyBytes>` setup, and constructing a
`PyCertificate` struct with 17 `OnceLock` fields. `synta_full` (`full_from_der`) and
Rust `rust_typed` both do a complete RFC 5280 decode; the ~0.23 µs gap is the same
PyO3 overhead amortised over the full parse.

`cryptography.x509` is also lazy at parse time, so for parse-only workloads the fair
comparison is `synta` vs `cryptography.x509`: **~10× faster** for traditional certs,
**~9× faster** for ML-DSA certs. For full-decode workloads the apples-to-apples
comparison is `synta_full` vs `cryptography.x509`: **~2.1–2.3× faster**.
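
The speedup figures can be re-derived from the table. The sketch below does this for the traditional-cert row only; `speedup_range` is a hypothetical helper, and the timings are copied from the parse-only table.

```python
def speedup_range(slow, fast):
    """Return (lowest, highest) speedup for (low, high) timing ranges in µs."""
    return slow[0] / fast[1], slow[1] / fast[0]


# Traditional certs, parse-only, values copied from the table above.
crypto = (1.62, 1.68)
synta_lazy = (0.16, 0.16)
synta_full = (0.72, 0.76)

for name, fast in (("synta", synta_lazy), ("synta_full", synta_full)):
    low, high = speedup_range(crypto, fast)
    print(f"{name}: {low:.1f}x to {high:.1f}x faster than cryptography.x509")
```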

`from_der` holds a `Py<PyBytes>` reference to the caller's bytes object — no copy of
the DER data. Parse time is flat across ML-DSA cert sizes: the large signature BIT
STRING is not decoded at all during the shallow scan.

## Parse + access all fields (cold cache — fresh cert each iteration)

| Certificate set | `synta` (PyO3, 19 fields) | `cryptography.x509` (9 fields) |
|----------------|---------------------------|--------------------------------|
| Traditional (~900 B, 5 certs) | **3.30–3.42 µs** | 15.90–16.29 µs |
| ML-DSA-44 (3,992 B) | **3.71 µs** | 13.41 µs |
| ML-DSA-65 (5,521 B) | **3.63 µs** | 13.14 µs |
| ML-DSA-87 (7,479 B) | **3.71 µs** | 13.38 µs |

`synta` is **~4.8× faster** than `cryptography.x509` for traditional certs (parse+fields,
19 vs 9 fields). ML-DSA certs measure **~3.6× faster**: the LAMPS WG test certs are
simpler (fewer extensions) than the NIST PKITS certs, so `cryptography.x509` processes
fewer extension fields, which reduces its parse+fields time. The ML-DSA rows require
`cryptography` ≥ 44.0 (FIPS 204) for `public_key()` decoding; older versions raise
`UnsupportedAlgorithm`.

## Field access only (warm cache — same cert, repeated access)

| Certificate set | `synta` (PyO3, 19 fields) | `cryptography.x509` stock | `cryptography.x509` perf-opt |
|----------------|---------------------------|---------------------------|------------------------------|
| Traditional (~900 B, 5 certs) | **0.41–0.43 µs** | ~14.8 µs | 0.91–0.95 µs |
| ML-DSA-44 (3,992 B) | **0.41 µs** | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-65 (5,521 B) | **0.41 µs** | ~11.2 µs | 2.21–2.28 µs |
| ML-DSA-87 (7,479 B) | **0.42 µs** | ~11.1 µs | 2.21–2.28 µs |

`synta` warm-cache access (~0.42 µs) is 19 `clone_ref` calls, or roughly 22 ns per field.

**Stock `cryptography.x509`** memoises only `extensions` (via `PyOnceLock`). The other
8 fields (`subject`, `issuer`, `serial_number`, `not_valid_before_utc`,
`not_valid_after_utc`, `signature`, `signature_hash_algorithm`, `public_key`) are
re-derived from the zero-copy in-memory ASN.1 structure on every Python access, each
allocating a new Python object. Warm and cold times are therefore close (~11–15 µs).
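
The difference between a memoised getter and one that is re-derived on every access can be modelled in pure Python. `CertView` below is a toy stand-in, not either library's real class, and the hex string is a placeholder for a decoded field.

```python
from functools import cached_property


class CertView:
    """Toy model of a parsed certificate with one cached and one uncached getter."""

    def __init__(self, der: bytes):
        self._der = der
        self.derivations = 0  # counts how often the field is rebuilt

    def _derive_subject(self) -> str:
        self.derivations += 1
        return self._der.hex()  # placeholder for real ASN.1 decoding

    @property
    def subject_uncached(self) -> str:
        # Re-derived on every access.
        return self._derive_subject()

    @cached_property
    def subject_cached(self) -> str:
        # Derived once, then served from the instance dictionary.
        return self._derive_subject()
```

Accessing `subject_uncached` twice increments `derivations` twice, while any number of `subject_cached` accesses increments it once and returns the same object each time.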

**`cryptography` [PR #14441](https://github.com/pyca/cryptography/pull/14441)** adds
`OnceLock` caching for `issuer`, `subject`, `public_key`, and `signature_algorithm` on
`Certificate`, and caching on `CertificationRequest`, `CertificateList`, and
`OCSPResponse`. This brings traditional warm access from ~14.8 µs down to
**0.91–0.95 µs** — a ~16× improvement. synta remains **~2.2× faster** for traditional
warm access and **~5× faster** for ML-DSA (larger key objects have higher `PyBytes` copy
cost even on the cached path). Parse-only and cold parse+fields times are unaffected by
the caching changes.

## PKCS#7 and PKCS#12 extraction

Measured with `python/bench_pkcs.py`. Benchmark IDs match the Rust Criterion IDs in
`synta-bench/benches/pkcs_formats.rs` exactly (`pkcs7/{synta,cryptography}/value`,
`pkcs12/{synta,cryptography}/value`).

| Benchmark ID | Input | `synta` | `cryptography` | Speedup (Py `synta` vs `cryptography`) |
|---|---|---|---|---|
| `pkcs7/…/amazon_roots` | 1,848 B DER/BER, 2 certs | **814 ns** (Rust) / **1.27 µs** (Py) | 41.7 µs | ~33× |
| `pkcs7/…/pem_isrg` | 1,992 B PEM, 1 cert | **4.13 µs** (Rust) / **4.14 µs** (Py) | 32.1 µs | ~8× |
| `pkcs12/…/unencrypted_3certs` | 3,539 B, 3 certs | **1.13 µs** (Rust) / **1.80 µs** (Py) | 134 µs | ~75× |
| `pkcs12/…/unencrypted_1cert_with_key` | 756 B, 1 cert + key | **667 ns** (Rust) / **0.96 µs** (Py) | — | — |

The `unencrypted_1cert_with_key` vector (`cert-none-key-none.p12`) uses a non-standard
format that `cryptography` cannot parse; it is benchmarked as synta-only to exercise
the key-bag-skipping code path.

The ~0.3–0.7 µs gap between Rust and Python times on the DER and PKCS#12 rows is the
usual PyO3 overhead: GIL acquisition, `PyBytes` allocation for each returned certificate,
and `PyList` construction. The `pem_isrg` Python time (4.14 µs) is nearly identical to
the Rust time (4.13 µs) because PEM decoding and base64 allocation are included in the
timed loop and dominate both measurements.
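
To make the PEM step concrete, here is a minimal standard-library sketch of the work that precedes DER parsing: stripping the armour lines and base64-decoding the body. `pem_to_der` and its default `PKCS7` label are assumptions for illustration, not synta's implementation.

```python
import base64


def pem_to_der(pem: str, label: str = "PKCS7") -> bytes:
    """Extract and base64-decode the first PEM block with the given label."""
    begin = f"-----BEGIN {label}-----"
    end = f"-----END {label}-----"
    start = pem.index(begin) + len(begin)
    stop = pem.index(end, start)
    # Remove line breaks and other whitespace before decoding.
    return base64.b64decode("".join(pem[start:stop].split()))
```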

**`cryptography` comparison note:** The `amazon_roots` DER benchmark triggers a
`UserWarning` from `cryptography` — the file uses BER indefinite-length encoding
(`0x30 0x80…`), which cryptography handles via an internal fallback with a deprecation
warning. synta accepts BER transparently with no warning.
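
A quick way to tell whether an input will take this path is to inspect its first two bytes. The helper below is a hypothetical sketch that only looks at the outermost element.

```python
def uses_indefinite_length(data: bytes) -> bool:
    """True if the outermost element is a SEQUENCE with BER indefinite length."""
    # 0x30 is the SEQUENCE tag; a length octet of 0x80 means "indefinite".
    return len(data) >= 2 and data[0] == 0x30 and data[1] == 0x80
```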

## Architecture notes

**Parse-only:** `synta.Certificate.from_der()` holds a `Py<PyBytes>` strong
reference to the caller's bytes object (no copy). It performs only a 4-operation shallow
envelope scan (outer SEQUENCE tag+length, TBSCertificate SEQUENCE tag+length) — equivalent
to `synta_certificate::validate_envelope()` in Rust — and records the TBS byte range.
The full recursive `Certificate::decode()` is deferred to the first getter call.
`synta.Certificate.full_from_der()` performs the same shallow scan and then immediately
triggers the full decode, so all 19 field caches are warm before any Python code accesses them.
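
The shallow scan can be sketched in pure Python as four header reads: tag and length of the outer SEQUENCE, then tag and length of the TBSCertificate SEQUENCE. This is an illustrative model under simplifying assumptions (single-byte tags, definite lengths only); `read_tag_length` and `scan_envelope` are hypothetical names, not synta's API.

```python
SEQUENCE = 0x30


def read_tag_length(buf: bytes, pos: int):
    """Read one single-byte tag and a definite length starting at `pos`.

    Returns (tag, content_start, content_length).
    """
    if pos + 2 > len(buf):
        raise ValueError("truncated header")
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:  # short form: the byte is the length
        return tag, pos, first
    n = first & 0x7F  # long form: n length bytes follow
    if n == 0 or pos + n > len(buf):
        raise ValueError("indefinite or truncated length")
    return tag, pos + n, int.from_bytes(buf[pos:pos + n], "big")


def scan_envelope(der: bytes):
    """Return the (start, end) byte range of the TBSCertificate, header included."""
    tag, body, length = read_tag_length(der, 0)  # outer Certificate
    if tag != SEQUENCE or body + length != len(der):
        raise ValueError("bad outer SEQUENCE")
    tag, tbs_body, tbs_len = read_tag_length(der, body)  # TBSCertificate
    if tag != SEQUENCE or tbs_body + tbs_len > len(der):
        raise ValueError("bad TBSCertificate SEQUENCE")
    return body, tbs_body + tbs_len
```

Nothing inside the TBSCertificate or after it is touched, which is consistent with parse time staying flat as the signature grows.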

**Parse+fields:** All 19 getters cache their Python object in an
`OnceLock<Py<T>>`. String-returning getters (`issuer`, `subject`, `signature_algorithm`,
`signature_algorithm_oid`, `public_key_algorithm`, `public_key_algorithm_oid`, `not_before`,
`not_after`) store `Py<PyString>`. Bytes-returning getters (`signature_value`, `public_key`,
`tbs_bytes`, `issuer_raw_der`, `subject_raw_der`) store `Py<PyBytes>`. Optional getters
(`signature_algorithm_params`, `public_key_algorithm_params`, `extensions_der`) store
`Option<Py<PyBytes>>`. `to_der` skips the lock entirely — direct `clone_ref` of the
stored `Py<PyBytes>` that was passed to `from_der`.

**`#[pyclass(frozen)]`:** `PyCertificate` is declared `frozen`, which removes the
8-byte PyO3 borrow-tracking field (`borrow_flag: Cell<isize>`) from every instance
and eliminates per-getter `Cell::get/set` calls. This gives a **~26% speedup** on the
warm-path field-access benchmark.

**ML-DSA parse+fields:** The first-call `PyBytes` copies of `signature_value`
(2,420–4,627 bytes) and `public_key` (1,312–2,592 bytes) add ~0.3 µs over traditional
certs (3.63–3.71 µs vs 3.30–3.42 µs). Subsequent accesses use `clone_ref` at
~0.41 µs total regardless of cert size.

For the full binding-layer comparison (Rust `rust_shallow` / `rust_typed` / `rust_element` /
`c_ffi` / Python `synta` / `synta_full` / `cryptography_x509`, parse-only and parse+fields)
see the [Performance book](../../../perf/src/quick-reference.md).

See also [Development](development.md) for how to run benchmarks.