synta 0.2.5

ASN.1 parser, decoder, and encoder library with DER/BER support and C FFI
# Element vs Typed API and Bindings Overhead

## Element vs Typed API

```bash
cargo bench -p synta-bench --bench comparison_typed
```

This bench contains the `parsing_comparison` and `roundtrip` Criterion groups.

Three parsing modes and two roundtrip modes are measured for a like-for-like comparison within synta:

| API                          | Mode           | Operation                                   | Avg time     |
| ---------------------------- | -------------- | ------------------------------------------- | ------------ |
| `Element` (`element_lazy`)   | Lazy O(1)      | Outer SEQUENCE tag+length only              | ~20 ns       |
| `Element` (`element_eager`)  | Full traversal | Recursive decode of every nested element    | ~1.59 µs     |
| `Certificate` (`typed`)      | Full RFC 5280  | Typed decode + named field construction     | ~481 ns      |
| `Element` (roundtrip)        | Lazy capture   | Lazy decode + re-emit raw captured bytes    | ~104 ns      |
| `Certificate` (roundtrip)    | Full cycle     | Full decode + re-encode all fields          | ~2.16 µs     |

**`element_lazy`** (~20 ns) is O(1): the decoder reads the outer SEQUENCE tag and length,
captures a borrowed slice of the content bytes, and returns. No child elements are decoded.
This is the cost of "entering" the certificate without inspecting it — useful as a baseline
for how fast the parser can recognise a DER boundary.
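
To make the O(1) claim concrete, here is a minimal sketch of a lazy header read. It is not synta's implementation: `read_header` is a hypothetical helper, and it assumes single-byte tags and definite lengths of at most four length bytes.

```rust
/// Hypothetical helper (not synta's API): read one DER tag+length header and
/// return (tag, content), where `content` borrows from `input`.
fn read_header(input: &[u8]) -> Option<(u8, &[u8])> {
    let (&tag, rest) = input.split_first()?;
    // This sketch handles single-byte tags only (tag number < 31).
    if tag & 0x1f == 0x1f {
        return None;
    }
    let (&first, rest) = rest.split_first()?;
    let (len, rest) = if first & 0x80 == 0 {
        // Short form: the byte itself is the length.
        (first as usize, rest)
    } else {
        // Long form: the low 7 bits give the number of length bytes that follow.
        let n = (first & 0x7f) as usize;
        if n == 0 || n > 4 || rest.len() < n {
            return None; // indefinite or oversized lengths are rejected here
        }
        let len = rest[..n].iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        (len, &rest[n..])
    };
    // Borrow the content bytes; nothing inside them is decoded.
    rest.get(..len).map(|content| (tag, content))
}

fn main() {
    // SEQUENCE { INTEGER 5, NULL }
    let der = [0x30, 0x05, 0x02, 0x01, 0x05, 0x05, 0x00];
    let (tag, content) = read_header(&der).expect("valid header");
    assert_eq!(tag, 0x30);
    assert_eq!(content, &der[2..]);
    println!("tag = {tag:#04x}, content length = {}", content.len());
}
```

The work done is independent of how many elements are nested inside the content, which is why a measurement of this kind serves as a floor for the other modes.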

**`element_eager`** (~1.59 µs) is the fair comparison with typed decoding: a `traverse()`
function recursively matches every `Sequence`, `Set`, and `Tagged` variant and iterates
their children until all leaf values have been visited — the same set of DER elements that
typed decoding processes. At ~1.59 µs it is about **3.3× slower** than typed decoding
(~481 ns). The overhead comes from branching on the element variant at runtime
(`match el { ... }` at every node) and from the per-element `Result`-wrapping iterator
protocol, repeated across all nesting levels. Typed decoding replaces this with
monomorphised, inlined call sites generated at compile time from the
`#[derive(Asn1Sequence)]` macro, so the decode path is fixed at compile time rather than
chosen per node at runtime.
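
The shape of such a generic traversal can be sketched as follows. This is an illustration under stated assumptions, not synta's `traverse()`: the hypothetical `read_tlv` and `count_leaves` work on raw DER and branch on the constructed bit instead of matching `Element` variants, and they support only single-byte tags and short length fields.

```rust
/// Hypothetical TLV reader (not synta's API): returns (tag, content, rest).
/// Handles single-byte tags and at most four length bytes.
fn read_tlv(input: &[u8]) -> Result<(u8, &[u8], &[u8]), &'static str> {
    let (&tag, rest) = input.split_first().ok_or("missing tag")?;
    if tag & 0x1f == 0x1f {
        return Err("multi-byte tag");
    }
    let (&first, rest) = rest.split_first().ok_or("missing length")?;
    let (len, rest) = if first & 0x80 == 0 {
        (first as usize, rest)
    } else {
        let n = (first & 0x7f) as usize;
        if n == 0 || n > 4 || rest.len() < n {
            return Err("unsupported length");
        }
        let len = rest[..n].iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        (len, &rest[n..])
    };
    if rest.len() < len {
        return Err("truncated content");
    }
    let (content, rest) = rest.split_at(len);
    Ok((tag, content, rest))
}

/// Visit every element under `input` and return the number of leaves.
/// The branch on the constructed bit runs once per node, at runtime.
fn count_leaves(mut input: &[u8]) -> Result<usize, &'static str> {
    let mut leaves = 0;
    while !input.is_empty() {
        let (tag, content, rest) = read_tlv(input)?;
        if tag & 0x20 != 0 {
            // Constructed (SEQUENCE, SET, explicit tags): recurse into children.
            leaves += count_leaves(content)?;
        } else {
            // Primitive: a leaf value.
            leaves += 1;
        }
        input = rest;
    }
    Ok(leaves)
}

fn main() {
    // SEQUENCE { INTEGER 5, SEQUENCE { NULL, BOOLEAN TRUE } }
    let der = [0x30, 0x0a, 0x02, 0x01, 0x05, 0x30, 0x05, 0x05, 0x00, 0x01, 0x01, 0xff];
    assert_eq!(count_leaves(&der), Ok(3));
}
```

Every node pays for a `Result` check and a branch whose outcome is only known from the input bytes. A derived typed decoder knows the expected field order in advance, so it does not need this per-node decision.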

**`typed`** (~481 ns) performs the same DER traversal as `element_eager`, but through
specialised code paths generated at compile time. These paths also construct named struct
fields and validate encoding constraints.

**Roundtrip comparison** reveals another dimension:
- `element` roundtrip (~104 ns): lazy decode captures the certificate's raw bytes in O(1);
  the encoder re-emits them unchanged. The entire certificate is treated as an opaque blob.
- `typed` roundtrip (~2.16 µs): full decode constructs every named field, then re-encodes
  each field individually. The ~21× difference vs element roundtrip (2.16 µs / 104 ns)
  reflects the cost of re-serialising a structured type rather than copying raw bytes.
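
The two roundtrip styles can be contrasted on a toy type. The sketch below is illustrative only and uses hypothetical functions, not synta's encoder: the "typed" side handles a SEQUENCE OF INTEGER where every value is in `0..=127` and the total content length fits the short form.

```rust
/// Opaque roundtrip: the captured encoding is copied back out unchanged.
fn reemit_raw(raw: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(raw);
}

/// Structured decode for a toy type: SEQUENCE OF INTEGER, each in 0..=127.
/// Only short-form lengths are accepted in this sketch.
fn decode_small_ints(der: &[u8]) -> Option<Vec<u8>> {
    let (&tag, rest) = der.split_first()?;
    let (&len, body) = rest.split_first()?;
    if tag != 0x30 || len >= 0x80 || body.len() != len as usize {
        return None;
    }
    // Each element must be exactly: INTEGER tag, length 1, one value byte.
    body.chunks(3)
        .map(|c| match c {
            [0x02, 0x01, v] if *v < 0x80 => Some(*v),
            _ => None,
        })
        .collect()
}

/// Structured encode: every field is written out again individually.
/// Assumes values are in 0..=127 and that `values.len() * 3` is below 128.
fn encode_small_ints(values: &[u8], out: &mut Vec<u8>) {
    out.push(0x30);
    out.push((values.len() * 3) as u8);
    for &v in values {
        out.extend_from_slice(&[0x02, 0x01, v]);
    }
}

fn main() {
    // SEQUENCE { INTEGER 5, INTEGER 7 }
    let der = [0x30, 0x06, 0x02, 0x01, 0x05, 0x02, 0x01, 0x07];

    let mut raw_out = Vec::new();
    reemit_raw(&der, &mut raw_out);

    let values = decode_small_ints(&der).expect("valid toy encoding");
    let mut typed_out = Vec::new();
    encode_small_ints(&values, &mut typed_out);

    // Both paths reproduce the input; only the amount of work differs.
    assert_eq!(raw_out, der);
    assert_eq!(typed_out, der);
}
```

The raw path is a single copy regardless of structure, while the structured path does work proportional to the number of fields in both directions.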

> **Implication for the `bindings` benchmark**: `rust_element` performs the equivalent of
> `element_eager` — a full recursive traversal.  Its parse-only cost (~1.67 µs for
> traditional certs) is higher than `typed` for the same reason — generic traversal overhead
> — not because of the binding layer itself.

## Bindings Overhead

```bash
cargo bench -p synta-bench --bench bindings --features bench-bindings
```

All three binding layers parse with the same `synta-certificate` backend; the numbers
isolate the cost introduced by each layer's API contract.

| Layer          | API                                         | Parse depth                                     |
| -------------- | ------------------------------------------- | ----------------------------------------------- |
| `rust_typed`   | `synta_certificate::Certificate`            | Full RFC 5280                                   |
| `rust_element` | `synta::Element` + full recursive traversal | Equivalent to `element_eager`                   |
| `c_ffi`        | `synta_certificate_parse_der` (`synta-ffi`) | Full RFC 5280, copies fields to C-owned buffers |

`c_ffi` does everything `rust_typed` does, then additionally copies every field value into
its own heap-allocated `Vec<u8>` to produce an opaque `SyntaCertificate` struct that can
cross the C ABI boundary safely. This means up to 14 heap allocations per certificate. The
owned-buffer copy cost grows with the size of large fields (signature, public key), which
is why `c_ffi` is more expensive on ML-DSA certs than on traditional ones.
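
The difference between a borrowed view and an owned copy can be sketched as follows. The type and field names are hypothetical and far simpler than the real `SyntaCertificate`; the point is only that each owned field costs one allocation plus a copy proportional to its length.

```rust
/// Borrowed view: fields are slices into the caller's input buffer.
struct BorrowedFields<'a> {
    signature: &'a [u8],
    public_key: &'a [u8],
}

/// Owned copy: each field lives in its own heap allocation, so the struct
/// no longer depends on the lifetime of the input buffer.
struct OwnedFields {
    signature: Vec<u8>,
    public_key: Vec<u8>,
}

impl BorrowedFields<'_> {
    /// One allocation and one copy per field; the copy grows with field size.
    fn to_owned_fields(&self) -> OwnedFields {
        OwnedFields {
            signature: self.signature.to_vec(),
            public_key: self.public_key.to_vec(),
        }
    }
}

fn main() {
    // Stand-in input: 8 "public key" bytes followed by 4 "signature" bytes.
    let input: Vec<u8> = (0u8..12).collect();
    let borrowed = BorrowedFields {
        public_key: &input[..8],
        signature: &input[8..],
    };
    let owned = borrowed.to_owned_fields();

    // The owned copy survives after the input buffer is gone.
    drop(input);
    assert_eq!(owned.signature, vec![8, 9, 10, 11]);
    assert_eq!(owned.public_key.len(), 8);
}
```

A handle that crosses the C ABI cannot carry Rust lifetimes, which is one reason an FFI layer would choose the owned form despite its cost.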

### Traditional X.509 — Parse Only

| Binding        | cert_00    | cert_01    | cert_02    | cert_03    | cert_04    | Avg          |
| -------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 489.84 ns  | 473.68 ns  | 495.69 ns  | 499.09 ns  | 488.81 ns  | **489 ns**   |
| `rust_element` | 1552.5 ns  | 1693.8 ns  | 1702.7 ns  | 1687.7 ns  | 1693.8 ns  | **1666 ns**  |
| `c_ffi`        | 1746.7 ns  | 1813.0 ns  | 1788.2 ns  | 1816.6 ns  | 1834.5 ns  | **1800 ns**  |

`rust_element` is ~7% faster than `c_ffi` at parse time (1666 ns vs 1800 ns; put the other
way, `c_ffi` takes ~8% longer) because `rust_element` holds borrowed slices from the input
buffer with no allocation, whereas `c_ffi` copies each field into an owned buffer. The gap
is relatively small here because the traditional cert's signature and public key are only
~71 and ~294 bytes respectively.
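
The averages and the relative gap can be re-derived from the table rows. The sketch below only repeats that arithmetic; `mean` and `relative_overhead` are helper names introduced for illustration.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// How much longer `other` takes than `base`, as a fraction of `base`.
fn relative_overhead(base: f64, other: f64) -> f64 {
    other / base - 1.0
}

fn main() {
    // Per-certificate parse-only timings in nanoseconds, copied from the table.
    let rust_element = [1552.5, 1693.8, 1702.7, 1687.7, 1693.8];
    let c_ffi = [1746.7, 1813.0, 1788.2, 1816.6, 1834.5];

    let (e, c) = (mean(&rust_element), mean(&c_ffi));
    println!("rust_element avg: {e:.0} ns");
    println!("c_ffi avg:        {c:.0} ns");
    println!("c_ffi takes {:.1}% longer", relative_overhead(e, c) * 100.0);
    println!("rust_element is {:.1}% faster", (1.0 - e / c) * 100.0);
}
```

The two percentages differ because they use different baselines, so it helps to state which layer a percentage is relative to.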

### Traditional X.509 — Parse + All Fields

| Binding        | cert_00    | cert_01    | cert_02    | cert_03    | cert_04    | Avg          |
| -------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 1329.5 ns  | 1347.2 ns  | 1337.2 ns  | 1321.9 ns  | 1242.1 ns  | **1316 ns**  |
| `rust_element` | 1584.6 ns  | 1663.9 ns  | 1682.0 ns  | 1707.0 ns  | 1628.1 ns  | **1653 ns**  |
| `c_ffi`        | 2274.0 ns  | 2261.7 ns  | 2357.5 ns  | 2271.1 ns  | 2337.5 ns  | **2300 ns**  |

`rust_typed` is fastest: `identify_*()` returns `&'static str` for OID names with no
allocation; `issuer_raw` and `subject_raw` are zero-copy slices requiring only a pointer
read; `format_dn()` allocates but is the same cost across all bindings. `rust_element`'s
parse+fields cost converges with its parse-only cost because the field-access traversal is
already part of the recursive decode. `c_ffi` adds owned buffer copies on top of
`rust_typed`'s parse+fields cost, reaching 2.3 µs.

### Post-Quantum (ML-DSA) — Parse Only

| Binding        | ML-DSA-44  | ML-DSA-65  | ML-DSA-87  | Avg          |
| -------------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 472.77 ns  | 491.67 ns  | 475.44 ns  | **480 ns**   |
| `rust_element` | 1277.6 ns  | 1274.0 ns  | 1330.0 ns  | **1294 ns**  |
| `c_ffi`        | 1915.4 ns  | 1962.8 ns  | 2014.3 ns  | **1964 ns**  |

`rust_element` is faster on ML-DSA (1.29 µs) than on traditional certs (1.67 µs) because
ML-DSA certs have shorter Distinguished Names — less traversal work despite larger overall
cert size. `c_ffi` is slower on ML-DSA (1.96 µs vs 1.80 µs) because copying the
signature field (2,420–4,627 bytes) touches significantly more cache lines than copying the
~71-byte signature of a traditional cert.

### Post-Quantum (ML-DSA) — Parse + All Fields

| Binding        | ML-DSA-44  | ML-DSA-65  | ML-DSA-87  | Avg          |
| -------------- | ---------- | ---------- | ---------- | ------------ |
| `rust_typed`   | 1030.2 ns  | 1064.7 ns  | 1126.4 ns  | **1074 ns**  |
| `rust_element` | 1293.0 ns  | 1301.8 ns  | 1279.2 ns  | **1291 ns**  |
| `c_ffi`        | 2468.4 ns  | 2607.7 ns  | 2642.1 ns  | **2573 ns**  |

`rust_typed` ML-DSA parse+fields (1.07 µs) is faster than traditional (1.32 µs) because
the shorter ML-DSA Distinguished Names reduce `format_dn()` cost. `rust_element` trails
`rust_typed` by only ~20% on ML-DSA (vs ~26% on traditional), again because of the shorter
Distinguished Names. `c_ffi`'s ML-DSA cost is ~12% higher than its traditional cost
(2573 ns vs 2300 ns), driven by the larger signature buffer copy.