Module text

DVB-SI text decoding — ETSI EN 300 468 Annex A.

Covers the full Annex A Table A.3 selector set:

- no selector (first byte 0x20–0xFF): the default Latin table (Figure A.1, an ISO 6937 superset; see iso_6937_single);
- 0x01–0x0B (one-byte selectors) and 0x10 (three-byte form): ISO 8859-n;
- 0x11: UCS-2 big-endian;
- 0x12: KS X 1001 Korean, decoded as EUC-KR;
- 0x13: GB-2312 Simplified Chinese, decoded via GBK (a GB-2312 superset);
- 0x14: Big5 Traditional Chinese;
- 0x15: UTF-8;
- 0x1F: the encoding_type_id escape. No ids are registered for broadcast use, so it yields U+FFFD.

Reserved selectors (0x08, 0x0C–0x0F, 0x16–0x1E) yield U+FFFD per byte.
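A minimal sketch of how the leading byte(s) of a text field could be classified against Table A.3. `Charset` and `classify` are hypothetical names for illustration, not this crate's API. Checking that the 0x10 part number or the 0x1F id is actually assigned is left out.

```rust
/// Hypothetical result of inspecting the selector (EN 300 468 Table A.3).
#[derive(Debug, PartialEq)]
enum Charset {
    DefaultLatin,       // no selector: first byte is 0x20..=0xFF (or input is empty)
    Iso8859(u8),        // ISO 8859 part number
    Ucs2Be,             // 0x11
    KsX1001,            // 0x12
    Gb2312,             // 0x13
    Big5,               // 0x14
    Utf8,               // 0x15
    EncodingTypeId(u8), // 0x1F followed by encoding_type_id
    Reserved,
}

/// Returns the charset and how many selector bytes precede the text body.
fn classify(bytes: &[u8]) -> (Charset, usize) {
    match bytes {
        [] => (Charset::DefaultLatin, 0),
        [b, ..] if *b >= 0x20 => (Charset::DefaultLatin, 0),
        // 0x08 would be ISO 8859-12, which does not exist, so it is reserved.
        [0x08, ..] => (Charset::Reserved, 1),
        // One-byte selectors 0x01..=0x0B map to ISO 8859-5 ..= ISO 8859-15.
        [b @ 0x01..=0x0B, ..] => (Charset::Iso8859(*b + 4), 1),
        // Three-byte form: 0x10, 0x00, part number (part validity not checked here).
        [0x10, 0x00, part, ..] => (Charset::Iso8859(*part), 3),
        [0x11, ..] => (Charset::Ucs2Be, 1),
        [0x12, ..] => (Charset::KsX1001, 1),
        [0x13, ..] => (Charset::Gb2312, 1),
        [0x14, ..] => (Charset::Big5, 1),
        [0x15, ..] => (Charset::Utf8, 1),
        [0x1F, id, ..] => (Charset::EncodingTypeId(*id), 2),
        // 0x00, 0x0C..=0x0F, 0x16..=0x1E, and truncated 0x10 / 0x1F forms.
        _ => (Charset::Reserved, 1),
    }
}
```

Returning the selector length alongside the charset lets a caller slice off the body without re-parsing the header.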

Glyph mappings are pinned to EN 300 468 V1.19.1 (2025-02) Figure A.1 “Character code table 00 - Latin alphabet with Unicode equivalents” (PDF p. 159, vendored at specs/etsi_en_300_468_v01.19.01_dvb_si.pdf; transcription in dvb-si/docs/en_300_468.md).

DvbText wraps the raw wire bytes and decodes only on demand: parsing stays zero-copy, and decoding happens only when you call DvbText::decode, format the value with Display, or serialize it with serde:

```rust
use dvb_si::text::{DvbText, LangCode};

// Leading 0x15 is the Annex A UTF-8 selector; "café" follows.
let name = DvbText::new(&[0x15, b'c', b'a', b'f', 0xC3, 0xA9]);
assert_eq!(name.decode(), "café");
assert_eq!(name.raw(), &[0x15, b'c', b'a', b'f', 0xC3, 0xA9]); // selector kept

// A selector-less default-Latin (ISO 6937) sequence: combining acute + e → é.
assert_eq!(DvbText::new(&[0xC2, b'e']).decode(), "é");

// LangCode is 3 raw bytes (ISO 639-2 / ISO 3166) decoded lossily on demand.
assert_eq!(LangCode(*b"fre").as_str(), "fre");
```
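In the default table, ISO 6937 non-spacing diacritics (0xC1–0xCF) come before their base letter, whereas Unicode combining marks come after it. That reordering is why `[0xC2, b'e']` above decodes to "é". The hypothetical `decode_6937_subset` below sketches the reordering for five diacritics only. It also treats 0x20–0x7E as plain ASCII, which is an assumption of this sketch; Figure A.1 is the authority for the exact glyphs. It emits the decomposed form (`e` followed by U+0301). Reaching a precomposed U+00E9 would take an extra NFC step, for example the `nfc()` adapter from the `unicode-normalization` crate.

```rust
/// Hypothetical: Unicode combining mark for a few ISO 6937 diacritic bytes.
fn combining_mark(b: u8) -> Option<char> {
    match b {
        0xC1 => Some('\u{0300}'), // grave
        0xC2 => Some('\u{0301}'), // acute
        0xC3 => Some('\u{0302}'), // circumflex
        0xC4 => Some('\u{0303}'), // tilde
        0xC8 => Some('\u{0308}'), // diaeresis
        _ => None,
    }
}

/// Decodes printable ASCII plus the diacritics above. Each diacritic byte is
/// held until its base letter has been pushed, then the mark is appended.
/// Anything unsupported becomes U+FFFD.
fn decode_6937_subset(bytes: &[u8]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if let Some(mark) = combining_mark(b) {
            match bytes.get(i + 1) {
                // Wire order: diacritic, base. Unicode order: base, mark.
                Some(&base) if (0x20..0x7F).contains(&base) => {
                    out.push(base as char);
                    out.push(mark);
                    i += 2;
                }
                // Diacritic at end of input or before a non-letter byte.
                _ => {
                    out.push('\u{FFFD}');
                    i += 1;
                }
            }
        } else if (0x20..0x7F).contains(&b) {
            out.push(b as char);
            i += 1;
        } else {
            out.push('\u{FFFD}');
            i += 1;
        }
    }
    out
}
```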

Structs

DvbText
Borrowed DVB-encoded text (EN 300 468 Annex A). Wraps the raw selector + body bytes; decoding happens only on DvbText::decode / Display / serde — never in the parse hot path.
LangCode
ISO 639-2 language code or ISO 3166 country code — 3 raw bytes.
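The zero-copy, decode-on-demand design described for DvbText and LangCode can be sketched as a borrowed newtype whose `Display` impl does the decoding. `LazyText` and `Lang` are hypothetical stand-ins, not the crate's types. For brevity, only the 0x15 UTF-8 selector and printable ASCII are handled here.

```rust
use std::borrow::Cow;
use std::fmt;

/// Hypothetical borrowed text field: constructing it only stores the slice.
#[derive(Clone, Copy)]
struct LazyText<'a>(&'a [u8]);

impl<'a> LazyText<'a> {
    /// Raw wire bytes, selector included.
    fn raw(&self) -> &'a [u8] {
        self.0
    }
}

impl fmt::Display for LazyText<'_> {
    // Decoding happens here, only when the value is formatted.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            [0x15, body @ ..] => f.write_str(&String::from_utf8_lossy(body)),
            bytes => {
                for &b in bytes {
                    let c = if (0x20..0x7F).contains(&b) { b as char } else { '\u{FFFD}' };
                    write!(f, "{c}")?;
                }
                Ok(())
            }
        }
    }
}

/// Hypothetical 3-byte language or country code, decoded lossily on demand.
#[derive(Clone, Copy)]
struct Lang([u8; 3]);

impl Lang {
    fn as_str(&self) -> Cow<'_, str> {
        String::from_utf8_lossy(&self.0)
    }
}
```

Because the wrapper is `Copy` and holds only a slice, a parser can hand these out freely; the decoding cost is paid only by callers that format or serialize the text.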

Functions

decode
Convenience wrapper returning Cow::Borrowed for pure-ASCII input, Cow::Owned otherwise.
decode_dvb_string
Decode a DVB text payload (e.g. short_event_descriptor event_name_char) into an owned UTF-8 String. The first byte may be a charset indicator per ETSI EN 300 468 Annex A Table A.3.
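A sketch of the `Cow` contract described for `decode`, assuming "pure-ASCII input" means a selector-less payload whose bytes are all printable ASCII. `decode_sketch` is a hypothetical name. It handles only the 0x15 (UTF-8) and 0x11 (UCS-2 BE) selectors and maps everything else to U+FFFD per byte, so it is not a substitute for decode_dvb_string.

```rust
use std::borrow::Cow;

/// Hypothetical decoder: borrows for selector-less printable ASCII,
/// allocates for everything else.
fn decode_sketch(bytes: &[u8]) -> Cow<'_, str> {
    // Fast path: every byte is 0x20..=0x7E, so the slice is already valid UTF-8.
    // Any selector byte (< 0x20) fails this check.
    if bytes.iter().all(|b| (0x20..0x7F).contains(b)) {
        return Cow::Borrowed(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8"));
    }
    match bytes {
        // into_owned() keeps the "Owned otherwise" contract even for valid UTF-8.
        [0x15, body @ ..] => Cow::Owned(String::from_utf8_lossy(body).into_owned()),
        [0x11, body @ ..] => Cow::Owned(
            body.chunks(2)
                .map(|pair| match pair {
                    // Each UCS-2 unit is one BMP code point; lone surrogates are invalid.
                    [hi, lo] => char::from_u32(u32::from(u16::from_be_bytes([*hi, *lo])))
                        .unwrap_or('\u{FFFD}'),
                    // Trailing odd byte.
                    _ => '\u{FFFD}',
                })
                .collect::<String>(),
        ),
        _ => Cow::Owned(
            bytes
                .iter()
                .map(|&b| if (0x20..0x7F).contains(&b) { b as char } else { '\u{FFFD}' })
                .collect::<String>(),
        ),
    }
}
```

The fast path is what makes the borrowed case possible: only when the input bytes are already valid `str` data can the function hand back a slice of the input. Calling `String::from_utf8_lossy(body)` without `into_owned()` would also borrow valid UTF-8 bodies, but that would no longer match the contract stated above.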