synta 0.2.6

ASN.1 parser, decoder, and encoder library with DER/BER support and C FFI
# Decoder


## Constructor

```python
Decoder(data: bytes, encoding: Encoding)
```

Creates a streaming decoder over `data`, interpreted under the given
`encoding` (for example `synta.Encoding.DER`).  The internal position starts
at 0, and each `decode_*` call advances it past the decoded element.
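
The position-advancing behaviour can be pictured with a small pure-Python sketch. This is not synta's implementation: `read_tlv` is a hypothetical helper that assumes single-byte identifiers and definite lengths, and it only illustrates how each read moves the offset past one complete element.

```python
def read_tlv(data: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Return (identifier_octet, value, next_pos) for the TLV starting at pos.

    Hypothetical helper for illustration only: single-byte identifiers and
    definite lengths are assumed.
    """
    if pos + 2 > len(data):
        raise EOFError("no complete TLV header at this position")
    tag = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:
        length = first                       # short form: length in one octet
    else:
        n = first & 0x7F                     # long form: n length octets follow
        if n == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    end = pos + length
    if end > len(data):
        raise EOFError("truncated element")
    return tag, data[pos:end], end


data = b"\x02\x01\x2a\x01\x01\xff"           # INTEGER 42 followed by BOOLEAN TRUE
pos = 0
elements = []
while pos < len(data):                       # same idea as looping on is_empty()
    tag, value, pos = read_tlv(data, pos)
    elements.append((tag, value))
```

After the loop, `pos` equals `len(data)`, which corresponds to the state in which `is_empty()` would report `True`.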

## Primitive decode methods

| Method | Returns | ASN.1 type | Tag |
|---|---|---|---|
| `decode_integer()` | `Integer` | INTEGER | `0x02` |
| `decode_octet_string()` | `OctetString` | OCTET STRING | `0x04` |
| `decode_oid()` | `ObjectIdentifier` | OBJECT IDENTIFIER | `0x06` |
| `decode_bit_string()` | `BitString` | BIT STRING | `0x03` |
| `decode_boolean()` | `Boolean` | BOOLEAN | `0x01` |
| `decode_utc_time()` | `UtcTime` | UTCTime | `0x17` |
| `decode_generalized_time()` | `GeneralizedTime` | GeneralizedTime | `0x18` |
| `decode_null()` | `Null` | NULL | `0x05` |
| `decode_real()` | `Real` | REAL | `0x09` |
| `decode_utf8_string()` | `Utf8String` | UTF8String | `0x0c` |
| `decode_printable_string()` | `PrintableString` | PrintableString | `0x13` |
| `decode_ia5_string()` | `IA5String` | IA5String | `0x16` |
| `decode_numeric_string()` | `NumericString` | NumericString | `0x12` |
| `decode_teletex_string()` | `TeletexString` | TeletexString / T61String | `0x14` |
| `decode_visible_string()` | `VisibleString` | VisibleString | `0x1a` |
| `decode_general_string()` | `GeneralString` | GeneralString | `0x1b` |
| `decode_universal_string()` | `UniversalString` | UniversalString | `0x1c` |
| `decode_bmp_string()` | `BmpString` | BMPString | `0x1e` |
| `decode_any()` | any Python object | any element | varies |
| `decode_any_str()` | `str` | any string type | varies |

### `decode_any()` dispatch table

`decode_any()` dispatches on the tag at the current position:

| ASN.1 Type | Python value |
|---|---|
| BOOLEAN | `Boolean` |
| INTEGER | `Integer` |
| BIT STRING | `BitString` |
| OCTET STRING | `OctetString` |
| NULL | `Null` |
| OBJECT IDENTIFIER | `ObjectIdentifier` |
| UTF8String | `Utf8String` |
| PrintableString | `PrintableString` |
| IA5String | `IA5String` |
| NumericString | `NumericString` |
| TeletexString | `TeletexString` |
| VisibleString | `VisibleString` |
| GeneralString | `GeneralString` |
| UniversalString | `UniversalString` |
| BMPString | `BmpString` |
| UTCTime | `UtcTime` |
| GeneralizedTime | `GeneralizedTime` |
| SEQUENCE / SET | `list` of the above |
| Tagged | `TaggedElement` |
| Unknown universal | `RawElement` |

### `decode_any_str()` encoding table

`decode_any_str()` reads one TLV and decodes it as a native Python `str`,
applying the correct encoding for each of the nine ASN.1 string types:

| Tag (decimal) | Type | Decoding |
|-----|------|----------|
| 12 | UTF8String | UTF-8 (lossy) |
| 18 | NumericString | UTF-8 |
| 19 | PrintableString | UTF-8 |
| 20 | TeletexString / T61String | Latin-1 (each byte → U+0000–U+00FF) |
| 22 | IA5String | UTF-8 |
| 26 | VisibleString | UTF-8 |
| 27 | GeneralString | UTF-8 |
| 28 | UniversalString | UCS-4 big-endian |
| 30 | BMPString | UCS-2 big-endian |

Raises `ValueError` for any other tag; raises `EOFError` if the decoder is
empty.  This is the single-call replacement for the duck-typing probe on
`decode_any()`:

```python
# Before — three-way probe:
val = decoder.decode_any()
if hasattr(val, 'as_str'):
    s = val.as_str()
elif hasattr(val, 'to_bytes'):
    s = val.to_bytes().decode('latin-1')
else:
    raise ValueError(f"not a string: {type(val)}")

# After — one call, correct encoding for all nine types:
s = decoder.decode_any_str()
```
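
For readers who want to see what the encoding table amounts to, here is a minimal pure-Python sketch of the per-tag mapping. It is an illustration, not synta's code: `STRING_CODECS` and `decode_string_value` are hypothetical names, strict error handling for the rows not marked "lossy" is an assumption, and Python's `utf-16-be` codec is used as a stand-in for UCS-2 (the two agree for characters in the Basic Multilingual Plane).

```python
# Universal tag number -> (Python codec, error handler), following the table above.
# Strict handling for the rows not marked "lossy" is an assumption of this sketch.
STRING_CODECS = {
    12: ("utf-8", "replace"),     # UTF8String (lossy)
    18: ("utf-8", "strict"),      # NumericString
    19: ("utf-8", "strict"),      # PrintableString
    20: ("latin-1", "strict"),    # TeletexString / T61String
    22: ("utf-8", "strict"),      # IA5String
    26: ("utf-8", "strict"),      # VisibleString
    27: ("utf-8", "strict"),      # GeneralString
    28: ("utf-32-be", "strict"),  # UniversalString (UCS-4 big-endian)
    30: ("utf-16-be", "strict"),  # BMPString (UCS-2 big-endian, approximated)
}


def decode_string_value(tag_number: int, value: bytes) -> str:
    """Decode the value bytes of a string element according to its tag number."""
    try:
        codec, errors = STRING_CODECS[tag_number]
    except KeyError:
        raise ValueError(f"tag {tag_number} is not a string type") from None
    return value.decode(codec, errors)


teletex = decode_string_value(20, b"caf\xe9")       # Latin-1: byte 0xE9 -> U+00E9
bmp = decode_string_value(30, b"\x00h\x00i")        # two bytes per character
```

The sketch also shows why the Latin-1 fallback in the "before" probe is only right for TeletexString: applying it to UniversalString or BMPString value bytes would yield interleaved NUL characters.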

## Structured / container decode methods

| Method | Signature | Returns | Description |
|---|---|---|---|
| `decode_sequence` | `()` | `Decoder` | Consume a SEQUENCE TLV; return child decoder over its contents. |
| `decode_set` | `()` | `Decoder` | Consume a SET TLV; return child decoder over its contents. |
| `decode_explicit_tag` | `(tag_num: int)` | `Decoder` | Strip an explicit context-specific tag `[tag_num]`; return child decoder over the content. |
| `decode_implicit_tag` | `(tag_num: int, tag_class: str)` | `Decoder` | Strip an implicit tag; return child decoder over the **value bytes only** (no tag/length). `tag_class` is `"Context"`, `"Application"`, `"Private"`, or `"Universal"`. |
| `decode_raw_tlv` | `()` | `bytes` | Read the next complete TLV (tag + length + value) as raw bytes and advance past it. |
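
The difference between the explicit and implicit rows is easiest to see at the byte level. The following pure-Python sketch does not call synta; `strip_header` is a hypothetical helper that handles only complete short-form TLVs, and the two byte strings are hand-built encodings of the value 5.

```python
explicit = bytes.fromhex("a0 03 02 01 05")   # [0] EXPLICIT INTEGER 5
implicit = bytes.fromhex("80 01 05")         # [0] IMPLICIT INTEGER 5


def strip_header(tlv: bytes) -> bytes:
    """Return the value bytes of a complete short-form TLV (hypothetical helper)."""
    if len(tlv) < 2 or tlv[1] >= 0x80 or len(tlv) != 2 + tlv[1]:
        raise ValueError("sketch handles only complete short-form TLVs")
    return tlv[2:]


# Explicit tagging wraps a full inner TLV, so the content still starts with 0x02.
explicit_content = strip_header(explicit)    # b"\x02\x01\x05"
# Implicit tagging replaces the inner tag, so only the bare value bytes remain.
implicit_content = strip_header(implicit)    # b"\x05"

explicit_int = int.from_bytes(strip_header(explicit_content), "big", signed=True)
implicit_int = int.from_bytes(implicit_content, "big", signed=True)
```

This is why the child decoder from `decode_explicit_tag` can be read with an ordinary `decode_*` call, while the child from `decode_implicit_tag` on a primitive type holds value bytes that the caller has to interpret.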

## Introspection helpers

| Method | Returns | Description |
|---|---|---|
| `peek_tag()` | `tuple[int, str, bool]` | `(tag_number, tag_class, is_constructed)` — does **not** advance the position. Raises `EOFError` if no data remains. |
| `remaining_bytes()` | `bytes` | All bytes from the current position to the end. Useful after `decode_implicit_tag` to retrieve bare primitive value bytes. |
| `is_empty()` | `bool` | `True` when the current position equals the data length. |
| `position()` | `int` | Current byte offset. |
| `remaining()` | `int` | Number of bytes left. |
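
As an illustration of where the three fields returned by `peek_tag()` come from, the sketch below splits a single identifier octet by hand. It is not part of synta: `split_identifier` is a hypothetical helper, and it assumes the low-tag-number form (tag numbers 0 to 30), which covers every tag listed on this page.

```python
TAG_CLASSES = ("Universal", "Application", "Context", "Private")


def split_identifier(octet: int) -> tuple[int, str, bool]:
    """Split an identifier octet into (tag_number, tag_class, is_constructed)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("identifier octet must be in range 0..255")
    tag_class = TAG_CLASSES[octet >> 6]      # bits 8-7 select the class
    is_constructed = bool(octet & 0x20)      # bit 6 marks constructed encodings
    tag_number = octet & 0x1F                # bits 5-1 hold the tag number
    if tag_number == 0x1F:
        raise ValueError("high-tag-number form is not handled in this sketch")
    return tag_number, tag_class, is_constructed


sequence_tag = split_identifier(0x30)        # SEQUENCE
context_zero = split_identifier(0xA0)        # [0], constructed
```

Here `sequence_tag` is `(16, "Universal", True)` and `context_zero` is `(0, "Context", True)`, the same shape as the tuple described for `peek_tag()`.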

## Full class stub

```python
class Decoder:
    def __init__(self, data: bytes, encoding: Encoding) -> None: ...

    # Primitive types
    def decode_integer(self) -> Integer: ...
    def decode_octet_string(self) -> OctetString: ...
    def decode_oid(self) -> ObjectIdentifier: ...
    def decode_bit_string(self) -> BitString: ...
    def decode_boolean(self) -> Boolean: ...
    def decode_real(self) -> Real: ...
    def decode_null(self) -> Null: ...
    def decode_utc_time(self) -> UtcTime: ...
    def decode_generalized_time(self) -> GeneralizedTime: ...

    # String types
    def decode_utf8_string(self) -> Utf8String: ...
    def decode_printable_string(self) -> PrintableString: ...
    def decode_ia5_string(self) -> IA5String: ...
    def decode_numeric_string(self) -> NumericString: ...       # tag 18
    def decode_teletex_string(self) -> TeletexString: ...      # tag 20
    def decode_visible_string(self) -> VisibleString: ...      # tag 26
    def decode_general_string(self) -> GeneralString: ...      # tag 27
    def decode_universal_string(self) -> UniversalString: ...  # tag 28
    def decode_bmp_string(self) -> BmpString: ...              # tag 30

    # Constructed / tagged
    def decode_sequence(self) -> Decoder: ...
    # Reads a SEQUENCE TLV, advances past it, and returns a new Decoder over
    # the content bytes.  Raises ValueError if the next element is not a SEQUENCE.

    def decode_explicit_tag(self, tag_num: int) -> Decoder: ...
    # Reads an explicit context-specific tag [tag_num], advances past it, and
    # returns a new Decoder over the tagged content.
    # Raises ValueError if the tag number does not match.

    def decode_set(self) -> Decoder: ...
    # Reads a SET TLV (tag 0x31), advances past it, and returns a new Decoder
    # over the content bytes.  Raises ValueError if the next element is not a SET.

    def decode_implicit_tag(self, tag_num: int, tag_class: str) -> Decoder: ...
    # Strips an implicit tag of the given number and class and returns a new
    # Decoder over the raw value bytes.  tag_class must be "Universal",
    # "Context", "Application", or "Private".  Raises ValueError on mismatch.
    # The caller must know the original type and call the appropriate decode_*
    # method on the returned Decoder.
    #
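    # Example for a constructed type, [0] IMPLICIT SEQUENCE { INTEGER }.  The
    # returned Decoder covers the SEQUENCE contents, so each element is a full TLV:
    #   inner = decoder.decode_implicit_tag(0, "Context")
    #   value = inner.decode_integer()
    # For primitive implicit types, read the bare value with remaining_bytes().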

    def peek_tag(self) -> tuple[int, str, bool]: ...
    # Returns (tag_number, tag_class, is_constructed) of the next element without
    # consuming any bytes.  Raises EOFError if the decoder is empty.
    # Use for CHOICE dispatch or optional-field detection:
    #   tag_num, tag_class, _ = decoder.peek_tag()
    #   if tag_class == "Context" and tag_num == 0:
    #       version = decoder.decode_explicit_tag(0)

    def decode_raw_tlv(self) -> bytes: ...
    # Reads the complete next TLV (tag + length + value bytes) as a bytes object
    # and advances past it.  Useful when the element type is unknown or when
    # decoding should be deferred:
    #   tlv = decoder.decode_raw_tlv()
    #   inner = synta.Decoder(tlv, synta.Encoding.DER)

    def remaining_bytes(self) -> bytes: ...
    # Returns all remaining bytes from the current position without advancing.
    # Primarily useful after decode_implicit_tag() for **primitive** implicit
    # types whose raw value bytes cannot be decoded with the decode_* methods
    # (those expect a full TLV, but implicit stripping leaves only the value):
    #
    #   # Decode dNSName [2] IMPLICIT IA5String
    #   child = decoder.decode_implicit_tag(2, "Context")
    #   dns_name = child.remaining_bytes().decode("ascii")
    #
    #   # Decode iPAddress [7] IMPLICIT OCTET STRING
    #   child = decoder.decode_implicit_tag(7, "Context")
    #   ip_bytes = child.remaining_bytes()   # 4 or 16 raw bytes

    # Dynamic decoding
    def decode_any(self) -> object: ...
    # Returns a typed Python object.  Sequence/Set → list.
    # Tagged elements → TaggedElement.
    # Unknown universal tags → RawElement.

    def decode_any_str(self) -> str: ...
    # Decode any ASN.1 string type as a Python str (correct encoding per type).
    # Raises ValueError for non-string tags; EOFError if empty.

    # State
    def is_empty(self) -> bool: ...
    def position(self) -> int: ...
    def remaining(self) -> int: ...
```
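
The `remaining_bytes()` comments in the stub stop at raw bytes. A possible next step, sketched here with the standard library only, is to turn those bytes into readable values. `format_dns_name` and `format_ip_address` are hypothetical helpers, not part of synta, and the inputs stand in for what `child.remaining_bytes()` would return.

```python
import ipaddress


def format_dns_name(name_bytes: bytes) -> str:
    """Interpret the value bytes of an IMPLICIT IA5String as ASCII text."""
    return name_bytes.decode("ascii")


def format_ip_address(ip_bytes: bytes) -> str:
    """Interpret 4 raw bytes as IPv4 or 16 raw bytes as IPv6."""
    if len(ip_bytes) not in (4, 16):
        raise ValueError(f"expected 4 or 16 bytes, got {len(ip_bytes)}")
    return str(ipaddress.ip_address(ip_bytes))


# Stand-ins for child.remaining_bytes() after decode_implicit_tag(...).
dns_name = format_dns_name(b"example.com")
ipv4 = format_ip_address(bytes([192, 0, 2, 1]))
ipv6 = format_ip_address(bytes(15) + b"\x01")
```

Checking the length before calling `ipaddress.ip_address` keeps malformed input from surfacing as a less specific error.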

## Usage examples

### Decoding ASN.1 data

```python
import synta

# Decode an INTEGER
data = b'\x02\x01\x2A'  # DER-encoded INTEGER 42
decoder = synta.Decoder(data, synta.Encoding.DER)
integer = decoder.decode_integer()
print(integer.to_int())  # Output: 42

# Decode an OBJECT IDENTIFIER
oid_data = b'\x06\x09\x2a\x86\x48\x86\xf7\x0d\x01\x01\x01'
decoder = synta.Decoder(oid_data, synta.Encoding.DER)
oid = decoder.decode_oid()
print(str(oid))  # Output: 1.2.840.113549.1.1.1

# Decode an OCTET STRING
octet_data = b'\x04\x05hello'
decoder = synta.Decoder(octet_data, synta.Encoding.DER)
octet_string = decoder.decode_octet_string()
print(octet_string.to_bytes())  # Output: b'hello'

# Decode a NULL
null_data = b'\x05\x00'
decoder = synta.Decoder(null_data, synta.Encoding.DER)
null = decoder.decode_null()

# Decode a REAL (IEEE 754 double)
real_data = b'\x09\x01\x40'  # PLUS-INFINITY
decoder = synta.Decoder(real_data, synta.Encoding.DER)
r = decoder.decode_real()
import math
assert math.isinf(float(r))

# Decode any element dynamically
data = b'\x02\x01\x2A'
decoder = synta.Decoder(data, synta.Encoding.DER)
obj = decoder.decode_any()  # Returns Integer, OctetString, list (Sequence/Set), etc.
```
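
To check by hand that the OID bytes in the example above really spell `1.2.840.113549.1.1.1`, the content octets can be decoded with a few lines of plain Python. This sketch is independent of synta; `oid_from_content` is a hypothetical helper that follows the standard base-128 subidentifier encoding.

```python
def oid_from_content(content: bytes) -> str:
    """Decode OBJECT IDENTIFIER content octets into dotted-decimal form."""
    if not content or content[-1] & 0x80:
        raise ValueError("empty or truncated OID content")
    subids = []
    value = 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)   # accumulate 7 bits per octet
        if not byte & 0x80:                    # high bit clear: subidentifier ends
            subids.append(value)
            value = 0
    # The first subidentifier packs the first two arcs as 40 * arc1 + arc2.
    first = min(subids[0] // 40, 2)
    arcs = [first, subids[0] - 40 * first, *subids[1:]]
    return ".".join(map(str, arcs))


oid_tlv = b"\x06\x09\x2a\x86\x48\x86\xf7\x0d\x01\x01\x01"
dotted = oid_from_content(oid_tlv[2:])         # skip the tag and length octets
```

With these bytes, `dotted` is `"1.2.840.113549.1.1.1"`, matching the output shown for `str(oid)`.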

### Decoding SEQUENCE structures

Use `decode_sequence()` to enter a SEQUENCE and get a child `Decoder`
positioned over the content bytes.  Iterate with typed `decode_*` methods
and `is_empty()`.

```python
import synta

# Encoded SEQUENCE { INTEGER 42, BOOLEAN TRUE }
data = b'\x30\x06\x02\x01\x2a\x01\x01\xff'

decoder = synta.Decoder(data, synta.Encoding.DER)
child = decoder.decode_sequence()    # advances past the outer SEQUENCE TLV
assert decoder.is_empty()

while not child.is_empty():
    obj = child.decode_any()         # INTEGER, then BOOLEAN

# Decode an explicit context tag [1] wrapping an INTEGER
tagged_data = b'\xa1\x03\x02\x01\x63'  # [1] EXPLICIT INTEGER 99 (minimal-length encoding)
decoder = synta.Decoder(tagged_data, synta.Encoding.DER)
child = decoder.decode_explicit_tag(1)   # raises ValueError if tag != [1]
integer = child.decode_integer()
assert integer.to_int() == 99
```