# Element vs Typed API and Bindings Overhead
## Element vs Typed API
```bash
cargo bench -p synta-bench --bench comparison_typed
```
(`parsing_comparison` and `roundtrip` Criterion groups)
Three parsing modes and two roundtrip modes are measured for an apples-to-apples
comparison within synta:

| Type (benchmark) | Mode | Work performed | Time |
|---|---|---|---|
| `Element` (`element_lazy`) | Lazy O(1) | Outer SEQUENCE tag+length only | ~20 ns |
| `Element` (`element_eager`) | Full traversal | Recursive decode of every nested element | ~1.59 µs |
| `Certificate` (`typed`) | Full RFC 5280 | Typed decode + named field construction | ~481 ns |
| `Element` (roundtrip) | Lazy capture | Lazy decode + re-emit raw captured bytes | ~104 ns |
| `Certificate` (roundtrip) | Full cycle | Full decode + re-encode all fields | ~2.16 µs |

**`element_lazy`** (~20 ns) is O(1): the decoder reads the outer SEQUENCE tag and length,
captures a borrowed slice of the content bytes, and returns. No child elements are decoded.
This is the cost of "entering" the certificate without inspecting it — useful as a baseline
for how fast the parser can recognise a DER boundary.
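
To make the O(1) claim concrete, here is a minimal sketch of reading one DER header and borrowing the content slice without looking inside it. The `Tlv` struct and `read_tlv` function are hypothetical names for illustration, not synta's API, and the sketch assumes single-byte tags and definite lengths only.

```rust
/// Borrowed view of one DER element (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Tlv<'a> {
    pub tag: u8,
    pub content: &'a [u8],
    pub rest: &'a [u8],
}

/// Read one tag+length header and borrow the content without decoding it.
/// The work done is independent of how many elements are nested inside.
pub fn read_tlv(input: &[u8]) -> Option<Tlv<'_>> {
    let (&tag, after_tag) = input.split_first()?;
    // High-tag-number form (low five bits all set) is out of scope for this sketch.
    if tag & 0x1f == 0x1f {
        return None;
    }
    let (&first, after_len) = after_tag.split_first()?;
    let (len, body) = if first & 0x80 == 0 {
        // Short form: the byte is the length itself.
        (first as usize, after_len)
    } else {
        // Long form: the low seven bits give the number of length bytes that follow.
        let n = (first & 0x7f) as usize;
        if n == 0 || n > std::mem::size_of::<usize>() || after_len.len() < n {
            return None; // indefinite, oversized, or truncated length field
        }
        let (len_bytes, body) = after_len.split_at(n);
        let len = len_bytes
            .iter()
            .fold(0usize, |acc, &b| (acc << 8) | b as usize);
        (len, body)
    };
    if body.len() < len {
        return None; // content is truncated
    }
    let (content, rest) = body.split_at(len);
    Some(Tlv { tag, content, rest })
}
```

No allocation happens and no child element is touched: the returned slices point into the caller's buffer.
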
**`element_eager`** (~1.59 µs) is the fair comparison with typed decoding: a `traverse()`
function recursively matches every `Sequence`, `Set`, and `Tagged` variant and iterates
their children until all leaf values have been visited — the same set of DER elements that
typed decoding processes. At ~1.59 µs it is **3.3× slower** than typed decoding. The
overhead is runtime dynamic dispatch (`match el { ... }` at every node) and the per-element
`Result`-wrapping iterator protocol, repeated across all nesting levels. Typed decoding
replaces this with monomorphised, inlined call sites generated at compile time from the
`#[derive(Asn1Sequence)]` macro, so no dispatch decisions remain at runtime.
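
The shape of such a generic traversal can be sketched as follows. The `Element` enum here is a simplified, hypothetical stand-in (children held in a `Vec`, no `Result`-yielding iterator), not synta's actual definition; it only illustrates the runtime `match` at every node.

```rust
/// Simplified, hypothetical element tree; synta's real `Element` may differ.
pub enum Element<'a> {
    Primitive(&'a [u8]),
    Sequence(Vec<Element<'a>>),
    Set(Vec<Element<'a>>),
    Tagged(Box<Element<'a>>),
}

/// Visit every node and count the leaf values.
/// The variant is only known at runtime, so each node costs one branch on `el`.
pub fn traverse(el: &Element<'_>) -> usize {
    match el {
        Element::Primitive(_) => 1,
        Element::Sequence(children) | Element::Set(children) => children
            .iter()
            .map(|child| traverse(child))
            .sum::<usize>(),
        Element::Tagged(inner) => traverse(inner),
    }
}
```

A derived typed decoder already knows which field comes next, so the equivalent code has no `match` on the element kind.
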
**`typed`** (~481 ns) performs the same DER traversal as `element_eager`, but through
specialised code paths generated at compile time. These paths also construct the named
struct fields and validate encoding constraints as they decode.
**Roundtrip comparison** reveals another dimension:
- `element` roundtrip (~104 ns): lazy decode captures the certificate's raw bytes in O(1);
the encoder re-emits them unchanged. The entire certificate is treated as an opaque blob.
- `typed` roundtrip (~2.16 µs): full decode constructs every named field, then re-encodes
each field individually. The ~21× difference vs element roundtrip reflects the cost of
re-serialising a structured type rather than copying raw bytes.
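
The two roundtrip strategies can be contrasted with a small sketch. The function names and the `(tag, content)` field representation are assumptions made for illustration; a real typed encoder works from struct fields rather than a slice of pairs.

```rust
/// Append a DER definite-length header for `len` content bytes under `tag`.
fn write_header(out: &mut Vec<u8>, tag: u8, len: usize) {
    out.push(tag);
    if len < 0x80 {
        // Short form: the length fits in one byte.
        out.push(len as u8);
    } else {
        // Long form: a count byte followed by the minimal big-endian length.
        let bytes = len.to_be_bytes();
        let skip = bytes.iter().take_while(|&&b| b == 0).count();
        out.push(0x80 | (bytes.len() - skip) as u8);
        out.extend_from_slice(&bytes[skip..]);
    }
}

/// Lazy roundtrip: the captured bytes are already valid DER, so copy them once.
pub fn reemit_raw(raw: &[u8]) -> Vec<u8> {
    raw.to_vec()
}

/// Structured roundtrip: encode each field's header and content, then wrap
/// the result in an outer SEQUENCE (tag 0x30).
pub fn reencode_fields(fields: &[(u8, &[u8])]) -> Vec<u8> {
    let mut body = Vec::new();
    for &(tag, content) in fields {
        write_header(&mut body, tag, content.len());
        body.extend_from_slice(content);
    }
    let mut out = Vec::new();
    write_header(&mut out, 0x30, body.len());
    out.extend_from_slice(&body);
    out
}
```

Both paths produce the same bytes for well-formed input; the structured path simply does per-field work to get there.
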
> **Implication for the `bindings` benchmark**: `rust_element` performs the equivalent of
> `element_eager` — a full recursive traversal. Its parse-only cost (~1.67 µs for
> traditional certs) is higher than `typed` for the same reason — generic traversal overhead
> — not because of the binding layer itself.
## Bindings Overhead
```bash
cargo bench -p synta-bench --bench bindings --features bench-bindings
```
All three binding layers use the same `synta-certificate` backend for parsing; the numbers
isolate the cost introduced by each layer's API contract.

| Binding | Entry point | Work performed |
|---|---|---|
| `rust_typed` | `synta_certificate::Certificate` | Full RFC 5280 |
| `rust_element` | `synta::Element` + full recursive traversal | Equivalent to `element_eager` |
| `c_ffi` | `synta_certificate_parse_der` (`synta-ffi`) | Full RFC 5280, copies fields to C-owned buffers |

`c_ffi` does everything `rust_typed` does, then additionally copies every field value into
a heap-allocated `Vec<u8>` to produce an opaque `SyntaCertificate` struct that can cross
the C ABI boundary safely. This means up to 14 allocations per certificate, each touching
heap memory. The owned-buffer copy cost grows with the size of large fields (signature,
public key), which is why `c_ffi` is more expensive on ML-DSA certs than on traditional ones.
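
The difference between a borrowed view and the owned copy needed at the C boundary can be sketched like this. `BorrowedCert`, `OwnedCert`, and `to_owned_cert` are hypothetical names with only two fields shown; the real `SyntaCertificate` layout is not reproduced here.

```rust
/// Hypothetical zero-copy view: every field borrows from the input buffer.
pub struct BorrowedCert<'a> {
    pub signature: &'a [u8],
    pub public_key: &'a [u8],
}

/// Hypothetical FFI-safe form: every field owns its bytes, so the struct
/// stays valid after the caller frees or reuses the input buffer.
pub struct OwnedCert {
    pub signature: Vec<u8>,
    pub public_key: Vec<u8>,
}

impl BorrowedCert<'_> {
    /// One heap allocation and one copy per field; the copy cost scales
    /// with the field length.
    pub fn to_owned_cert(&self) -> OwnedCert {
        OwnedCert {
            signature: self.signature.to_vec(),
            public_key: self.public_key.to_vec(),
        }
    }

    /// Total number of bytes that `to_owned_cert` has to copy.
    pub fn bytes_copied(&self) -> usize {
        self.signature.len() + self.public_key.len()
    }
}
```

With a signature of several kilobytes, `bytes_copied` is dominated by that one field, which matches the trend in the ML-DSA tables.
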
### Traditional X.509 — Parse Only

| Binding | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Mean |
|---|---|---|---|---|---|---|
| `rust_typed` | 489.84 ns | 473.68 ns | 495.69 ns | 499.09 ns | 488.81 ns | **489 ns** |
| `rust_element` | 1552.5 ns | 1693.8 ns | 1702.7 ns | 1687.7 ns | 1693.8 ns | **1666 ns** |
| `c_ffi` | 1746.7 ns | 1813.0 ns | 1788.2 ns | 1816.6 ns | 1834.5 ns | **1800 ns** |

`rust_element` is ~8% faster than `c_ffi` at parse time because `rust_element` holds
borrowed slices from the input buffer with no allocation, whereas `c_ffi` copies each field
into an owned buffer. The gap is relatively small here because the traditional cert's
signature and public key are only ~71 and ~294 bytes respectively.
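
As a quick check on the arithmetic, the sketch below recomputes the two means from the individual timings in the table and derives the relative speed. The helper names are illustrative; "percent faster" is taken here to mean the ratio of the slower time to the faster one, minus one.

```rust
/// Individual parse-only timings from the table, in nanoseconds.
pub const RUST_ELEMENT_NS: [f64; 5] = [1552.5, 1693.8, 1702.7, 1687.7, 1693.8];
pub const C_FFI_NS: [f64; 5] = [1746.7, 1813.0, 1788.2, 1816.6, 1834.5];

/// Arithmetic mean of a non-empty set of timings.
pub fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// How much faster `fast_ns` is than `slow_ns`, as a percentage.
pub fn percent_faster(fast_ns: f64, slow_ns: f64) -> f64 {
    (slow_ns / fast_ns - 1.0) * 100.0
}
```

Calling `percent_faster(mean(&RUST_ELEMENT_NS), mean(&C_FFI_NS))` gives a value close to 8, in line with the figure quoted above.
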
### Traditional X.509 — Parse + All Fields

| Binding | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Mean |
|---|---|---|---|---|---|---|
| `rust_typed` | 1329.5 ns | 1347.2 ns | 1337.2 ns | 1321.9 ns | 1242.1 ns | **1316 ns** |
| `rust_element` | 1584.6 ns | 1663.9 ns | 1682.0 ns | 1707.0 ns | 1628.1 ns | **1653 ns** |
| `c_ffi` | 2274.0 ns | 2261.7 ns | 2357.5 ns | 2271.1 ns | 2337.5 ns | **2300 ns** |

`rust_typed` is fastest: `identify_*()` returns `&'static str` for OID names with no
allocation; `issuer_raw` and `subject_raw` are zero-copy slices requiring only a pointer
read; `format_dn()` allocates but is the same cost across all bindings. `rust_element`'s
parse+fields cost converges with its parse-only cost because the field-access traversal is
already part of the recursive decode. `c_ffi` adds owned buffer copies on top of
`rust_typed`'s parse+fields cost, reaching 2.3 µs.
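
The allocation difference between a static-name lookup and DN formatting can be illustrated with a minimal sketch. Both functions below are hypothetical stand-ins written for this example; they are not the signatures of synta's `identify_*()` or `format_dn()`.

```rust
/// Hypothetical OID-to-name lookup. The returned string lives in the
/// binary's read-only data, so the call never allocates.
pub fn identify_signature_algorithm(oid: &str) -> Option<&'static str> {
    match oid {
        "1.2.840.113549.1.1.11" => Some("sha256WithRSAEncryption"),
        "1.2.840.10045.4.3.2" => Some("ecdsa-with-SHA256"),
        _ => None,
    }
}

/// Hypothetical DN formatter. Building a `String` allocates, and the cost
/// grows with the number and length of the name attributes.
pub fn format_dn(attributes: &[(&str, &str)]) -> String {
    attributes
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect::<Vec<String>>()
        .join(", ")
}
```

A shorter Distinguished Name means fewer attributes to format, which is consistent with the lower parse+fields times reported for certificates with short names.
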
### Post-Quantum (ML-DSA) — Parse Only

| Binding | Run 1 | Run 2 | Run 3 | Mean |
|---|---|---|---|---|
| `rust_typed` | 472.77 ns | 491.67 ns | 475.44 ns | **480 ns** |
| `rust_element` | 1277.6 ns | 1274.0 ns | 1330.0 ns | **1294 ns** |
| `c_ffi` | 1915.4 ns | 1962.8 ns | 2014.3 ns | **1964 ns** |

`rust_element` is faster on ML-DSA (1.29 µs) than on traditional certs (1.67 µs) because
ML-DSA certs have shorter Distinguished Names — less traversal work despite larger overall
cert size. `c_ffi` is slower on ML-DSA (1.96 µs vs 1.80 µs) because copying the
signature field (2,420–4,627 bytes) touches significantly more cache lines than copying the
traditional cert's ~71-byte signature.
### Post-Quantum (ML-DSA) — Parse + All Fields

| Binding | Run 1 | Run 2 | Run 3 | Mean |
|---|---|---|---|---|
| `rust_typed` | 1030.2 ns | 1064.7 ns | 1126.4 ns | **1074 ns** |
| `rust_element` | 1293.0 ns | 1301.8 ns | 1279.2 ns | **1291 ns** |
| `c_ffi` | 2468.4 ns | 2607.7 ns | 2642.1 ns | **2573 ns** |

`rust_typed` ML-DSA parse+fields (1.07 µs) is faster than traditional (1.32 µs) because
the shorter ML-DSA Distinguished Names reduce `format_dn()` cost. `rust_element` trails
`rust_typed` by only ~20% on ML-DSA (vs ~25% on traditional), for the same reason.
`c_ffi`'s ML-DSA cost grows 12% over its traditional cost, driven by the larger signature
buffer copy.