Crate codes_iana_charset

This package contains an implementation of the IANA CHARSET specification.

These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. These names are expressed in ANSI_X3.4-1968, which is commonly called US-ASCII or simply ASCII. The character set most commonly used in the Internet, and used especially in protocol standards, is US-ASCII; its use is strongly encouraged. The use of the name US-ASCII is also encouraged.

The character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters.
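The naming rules above can be sketched with the standard library alone. This is an illustration, not this crate's implementation: the helper names `is_valid_charset_name` and `same_charset_name` are hypothetical, and rejecting empty names and the space character is an assumption about what "printable" means here.

```rust
/// Maximum length of a charset name, per the registry text above.
const MAX_NAME_LEN: usize = 40;

/// Returns true if `name` is 1 to 40 printable US-ASCII characters.
/// Assumption: the empty string and the space character are rejected.
fn is_valid_charset_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= MAX_NAME_LEN
        // Any non-ASCII byte fails this check, so the byte length
        // tested above equals the character count for accepted names.
        && name.bytes().all(|b| b.is_ascii_graphic())
}

/// Compares two charset names without regard to ASCII letter case.
fn same_charset_name(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}
```

With these helpers, `same_charset_name("latin1", "LATIN1")` holds, while a 41-character name fails `is_valid_charset_name`.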

The MIBenum value is a unique value for use in MIBs to identify coded character sets.

The value space for MIBenum values has been divided into three regions. The first region (3-999) consists of coded character sets that have been standardized by some standard setting organization. This region is intended for standards that do not have subset implementations. The second region (1000-1999) is for the Unicode and ISO/IEC 10646 coded character sets together with a specification of a (set of) sub-repertoires that may occur. The third region (>1999) is intended for vendor specific coded character sets.
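The three regions map directly onto numeric ranges. The following is a minimal sketch of that classification; the `MibRegion` enum and `mib_region` function are hypothetical names used for illustration and are not part of this crate. Values below 3 fall outside the regions described above, so the sketch returns `None` for them.

```rust
/// The three MIBenum regions described in the text above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum MibRegion {
    /// 3-999: standardized coded character sets.
    Standardized,
    /// 1000-1999: Unicode and ISO/IEC 10646 coded character sets.
    Unicode,
    /// Above 1999: vendor-specific coded character sets.
    VendorSpecific,
}

/// Classifies a MIBenum value, or returns `None` for values below 3,
/// which the text above does not assign to any region.
fn mib_region(mib_enum: u32) -> Option<MibRegion> {
    match mib_enum {
        3..=999 => Some(MibRegion::Standardized),
        1000..=1999 => Some(MibRegion::Unicode),
        2000..=u32::MAX => Some(MibRegion::VendorSpecific),
        _ => None,
    }
}
```

For example, 106 (UTF-8) falls in the first region, 1015 (UTF-16) in the second, and 2252 (windows-1252) in the third.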

§Example

use codes_iana_charset as charset;

let latin_1 = charset::CHARSET_4;
assert_eq!(latin_1.id(), 4);
assert_eq!(latin_1.name(), "ISO_8859-1:1987");
assert_eq!(
    latin_1.source(),
    "[ISO-IR: International Register of Escape Sequences] Note: The current registration authority is IPSJ/ITSCJ, Japan.",
);
assert_eq!(latin_1.preferred_alias(), Some("ISO-8859-1"));
assert_eq!(latin_1.aliases(), &[
    "iso-ir-100",
    "ISO_8859-1",
    "ISO-8859-1",
    "latin1",
    "l1",
    "IBM819",
    "CP819",
    "csISOLatin1"
]);
assert_eq!(latin_1.reference(), Some("[RFC1345][Keld_Simonsen]"));

Note that the implementation of FromStr takes into account all aliases.

use codes_iana_charset as charset;
use std::str::FromStr;

let latin_1 = charset::CHARSET_4;

let iso_8859_1 = charset::CharacterSetCode::from_str("ISO_8859-1").unwrap();

assert_eq!(latin_1, iso_8859_1);

let some_charset = charset::CharacterSetCode::try_from(4).unwrap();

assert_eq!(some_charset, iso_8859_1);
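To show what alias-aware parsing might look like, here is a self-contained sketch that does not depend on this crate. The `Mib` type is a hypothetical stand-in for a MIBenum wrapper, the table covers only the Latin-1 entry shown in the example above, and case-insensitive matching is an assumption based on the registry's naming rules rather than a documented property of this crate's FromStr implementation.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a MIBenum wrapper; not this crate's type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Mib(u32);

/// Name and aliases for MIBenum 4, copied from the example above.
const LATIN_1_NAMES: &[&str] = &[
    "ISO_8859-1:1987",
    "iso-ir-100",
    "ISO_8859-1",
    "ISO-8859-1",
    "latin1",
    "l1",
    "IBM819",
    "CP819",
    "csISOLatin1",
];

impl FromStr for Mib {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: names and aliases match regardless of ASCII case.
        if LATIN_1_NAMES.iter().any(|n| n.eq_ignore_ascii_case(s)) {
            Ok(Mib(4))
        } else {
            Err(format!("unknown character set name: {}", s))
        }
    }
}
```

In this sketch, both the registered name and any alias resolve to the same value, so `"latin1".parse::<Mib>()` and `Mib::from_str("ISO_8859-1")` each yield `Ok(Mib(4))`.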

§Features

Structs§

CharacterSetCode: This type encapsulates the numeric MIBenum value for IANA-defined character sets.

Enums§

CharacterSetCodeError: Common Error type, mainly used for FromStr failures.

Constants§

ALL_CODES: Provides an array of all defined CharacterSetCode codes, useful for queries.
CHARSET_3: US-ASCII
CHARSET_4: ISO_8859-1:1987
CHARSET_5: ISO_8859-2:1987
CHARSET_6: ISO_8859-3:1988
CHARSET_7: ISO_8859-4:1988
CHARSET_8: ISO_8859-5:1988
CHARSET_9: ISO_8859-6:1987
CHARSET_10: ISO_8859-7:1987
CHARSET_11: ISO_8859-8:1988
CHARSET_12: ISO_8859-9:1989
CHARSET_13: ISO-8859-10
CHARSET_14: ISO_6937-2-add
CHARSET_15: JIS_X0201
CHARSET_16: JIS_Encoding
CHARSET_17: Shift_JIS
CHARSET_18: Extended_UNIX_Code_Packed_Format_for_Japanese
CHARSET_19: Extended_UNIX_Code_Fixed_Width_for_Japanese
CHARSET_20: BS_4730
CHARSET_21: SEN_850200_C
CHARSET_22: IT
CHARSET_23: ES
CHARSET_24: DIN_66003
CHARSET_25: NS_4551-1
CHARSET_26: NF_Z_62-010
CHARSET_27: ISO-10646-UTF-1
CHARSET_28: ISO_646.basic:1983
CHARSET_29: INVARIANT
CHARSET_30: ISO_646.irv:1983
CHARSET_31: NATS-SEFI
CHARSET_32: NATS-SEFI-ADD
CHARSET_33: NATS-DANO
CHARSET_34: NATS-DANO-ADD
CHARSET_35: SEN_850200_B
CHARSET_36: KS_C_5601-1987
CHARSET_37: ISO-2022-KR
CHARSET_38: EUC-KR
CHARSET_39: ISO-2022-JP
CHARSET_40: ISO-2022-JP-2
CHARSET_41: JIS_C6220-1969-jp
CHARSET_42: JIS_C6220-1969-ro
CHARSET_43: PT
CHARSET_44: greek7-old
CHARSET_45: latin-greek
CHARSET_46: NF_Z_62-010_(1973)
CHARSET_47: Latin-greek-1
CHARSET_48: ISO_5427
CHARSET_49: JIS_C6226-1978
CHARSET_50: BS_viewdata
CHARSET_51: INIS
CHARSET_52: INIS-8
CHARSET_53: INIS-cyrillic
CHARSET_54: ISO_5427:1981
CHARSET_55: ISO_5428:1980
CHARSET_56: GB_1988-80
CHARSET_57: GB_2312-80
CHARSET_58: NS_4551-2
CHARSET_59: videotex-suppl
CHARSET_60: PT2
CHARSET_61: ES2
CHARSET_62: MSZ_7795.3
CHARSET_63: JIS_C6226-1983
CHARSET_64: greek7
CHARSET_65: ASMO_449
CHARSET_66: iso-ir-90
CHARSET_67: JIS_C6229-1984-a
CHARSET_68: JIS_C6229-1984-b
CHARSET_69: JIS_C6229-1984-b-add
CHARSET_70: JIS_C6229-1984-hand
CHARSET_71: JIS_C6229-1984-hand-add
CHARSET_72: JIS_C6229-1984-kana
CHARSET_73: ISO_2033-1983
CHARSET_74: ANSI_X3.110-1983
CHARSET_75: T.61-7bit
CHARSET_76: T.61-8bit
CHARSET_77: ECMA-cyrillic
CHARSET_78: CSA_Z243.4-1985-1
CHARSET_79: CSA_Z243.4-1985-2
CHARSET_80: CSA_Z243.4-1985-gr
CHARSET_81: ISO_8859-6-E
CHARSET_82: ISO_8859-6-I
CHARSET_83: T.101-G2
CHARSET_84: ISO_8859-8-E
CHARSET_85: ISO_8859-8-I
CHARSET_86: CSN_369103
CHARSET_87: JUS_I.B1.002
CHARSET_88: IEC_P27-1
CHARSET_89: JUS_I.B1.003-serb
CHARSET_90: JUS_I.B1.003-mac
CHARSET_91: greek-ccitt
CHARSET_92: NC_NC00-10:81
CHARSET_93: ISO_6937-2-25
CHARSET_94: GOST_19768-74
CHARSET_95: ISO_8859-supp
CHARSET_96: ISO_10367-box
CHARSET_97: latin-lap
CHARSET_98: JIS_X0212-1990
CHARSET_99: DS_2089
CHARSET_100: us-dk
CHARSET_101: dk-us
CHARSET_102: KSC5636
CHARSET_103: UNICODE-1-1-UTF-7
CHARSET_104: ISO-2022-CN
CHARSET_105: ISO-2022-CN-EXT
CHARSET_106: UTF-8
CHARSET_109: ISO-8859-13
CHARSET_110: ISO-8859-14
CHARSET_111: ISO-8859-15
CHARSET_112: ISO-8859-16
CHARSET_113: GBK
CHARSET_114: GB18030
CHARSET_115: OSD_EBCDIC_DF04_15
CHARSET_116: OSD_EBCDIC_DF03_IRV
CHARSET_117: OSD_EBCDIC_DF04_1
CHARSET_118: ISO-11548-1
CHARSET_119: KZ-1048
CHARSET_1000: ISO-10646-UCS-2
CHARSET_1001: ISO-10646-UCS-4
CHARSET_1002: ISO-10646-UCS-Basic
CHARSET_1003: ISO-10646-Unicode-Latin1
CHARSET_1004: ISO-10646-J-1
CHARSET_1005: ISO-Unicode-IBM-1261
CHARSET_1006: ISO-Unicode-IBM-1268
CHARSET_1007: ISO-Unicode-IBM-1276
CHARSET_1008: ISO-Unicode-IBM-1264
CHARSET_1009: ISO-Unicode-IBM-1265
CHARSET_1010: UNICODE-1-1
CHARSET_1011: SCSU
CHARSET_1012: UTF-7
CHARSET_1013: UTF-16BE
CHARSET_1014: UTF-16LE
CHARSET_1015: UTF-16
CHARSET_1016: CESU-8
CHARSET_1017: UTF-32
CHARSET_1018: UTF-32BE
CHARSET_1019: UTF-32LE
CHARSET_1020: BOCU-1
CHARSET_1021: UTF-7-IMAP
CHARSET_2000: ISO-8859-1-Windows-3.0-Latin-1
CHARSET_2001: ISO-8859-1-Windows-3.1-Latin-1
CHARSET_2002: ISO-8859-2-Windows-Latin-2
CHARSET_2003: ISO-8859-9-Windows-Latin-5
CHARSET_2004: hp-roman8
CHARSET_2005: Adobe-Standard-Encoding
CHARSET_2006: Ventura-US
CHARSET_2007: Ventura-International
CHARSET_2008: DEC-MCS
CHARSET_2009: IBM850
CHARSET_2010: IBM852
CHARSET_2011: IBM437
CHARSET_2012: PC8-Danish-Norwegian
CHARSET_2013: IBM862
CHARSET_2014: PC8-Turkish
CHARSET_2015: IBM-Symbols
CHARSET_2016: IBM-Thai
CHARSET_2017: HP-Legal
CHARSET_2018: HP-Pi-font
CHARSET_2019: HP-Math8
CHARSET_2020: Adobe-Symbol-Encoding
CHARSET_2021: HP-DeskTop
CHARSET_2022: Ventura-Math
CHARSET_2023: Microsoft-Publishing
CHARSET_2024: Windows-31J
CHARSET_2025: GB2312
CHARSET_2026: Big5
CHARSET_2027: macintosh
CHARSET_2028: IBM037
CHARSET_2029: IBM038
CHARSET_2030: IBM273
CHARSET_2031: IBM274
CHARSET_2032: IBM275
CHARSET_2033: IBM277
CHARSET_2034: IBM278
CHARSET_2035: IBM280
CHARSET_2036: IBM281
CHARSET_2037: IBM284
CHARSET_2038: IBM285
CHARSET_2039: IBM290
CHARSET_2040: IBM297
CHARSET_2041: IBM420
CHARSET_2042: IBM423
CHARSET_2043: IBM424
CHARSET_2044: IBM500
CHARSET_2045: IBM851
CHARSET_2046: IBM855
CHARSET_2047: IBM857
CHARSET_2048: IBM860
CHARSET_2049: IBM861
CHARSET_2050: IBM863
CHARSET_2051: IBM864
CHARSET_2052: IBM865
CHARSET_2053: IBM868
CHARSET_2054: IBM869
CHARSET_2055: IBM870
CHARSET_2056: IBM871
CHARSET_2057: IBM880
CHARSET_2058: IBM891
CHARSET_2059: IBM903
CHARSET_2060: IBM904
CHARSET_2061: IBM905
CHARSET_2062: IBM918
CHARSET_2063: IBM1026
CHARSET_2064: EBCDIC-AT-DE
CHARSET_2065: EBCDIC-AT-DE-A
CHARSET_2066: EBCDIC-CA-FR
CHARSET_2067: EBCDIC-DK-NO
CHARSET_2068: EBCDIC-DK-NO-A
CHARSET_2069: EBCDIC-FI-SE
CHARSET_2070: EBCDIC-FI-SE-A
CHARSET_2071: EBCDIC-FR
CHARSET_2072: EBCDIC-IT
CHARSET_2073: EBCDIC-PT
CHARSET_2074: EBCDIC-ES
CHARSET_2075: EBCDIC-ES-A
CHARSET_2076: EBCDIC-ES-S
CHARSET_2077: EBCDIC-UK
CHARSET_2078: EBCDIC-US
CHARSET_2079: UNKNOWN-8BIT
CHARSET_2080: MNEMONIC
CHARSET_2081: MNEM
CHARSET_2082: VISCII
CHARSET_2083: VIQR
CHARSET_2084: KOI8-R
CHARSET_2085: HZ-GB-2312
CHARSET_2086: IBM866
CHARSET_2087: IBM775
CHARSET_2088: KOI8-U
CHARSET_2089: IBM00858
CHARSET_2090: IBM00924
CHARSET_2091: IBM01140
CHARSET_2092: IBM01141
CHARSET_2093: IBM01142
CHARSET_2094: IBM01143
CHARSET_2095: IBM01144
CHARSET_2096: IBM01145
CHARSET_2097: IBM01146
CHARSET_2098: IBM01147
CHARSET_2099: IBM01148
CHARSET_2100: IBM01149
CHARSET_2101: Big5-HKSCS
CHARSET_2102: IBM1047
CHARSET_2103: PTCP154
CHARSET_2104: Amiga-1251
CHARSET_2105: KOI7-switched
CHARSET_2106: BRF
CHARSET_2107: TSCII
CHARSET_2108: CP51932
CHARSET_2109: windows-874
CHARSET_2250: windows-1250
CHARSET_2251: windows-1251
CHARSET_2252: windows-1252
CHARSET_2253: windows-1253
CHARSET_2254: windows-1254
CHARSET_2255: windows-1255
CHARSET_2256: windows-1256
CHARSET_2257: windows-1257
CHARSET_2258: windows-1258
CHARSET_2259: TIS-620
CHARSET_2260: CP50220
IANA_CHARSET: An instance of the Standard struct defined in the codes_agency package that describes the IANA Character Sets specification.