Module ucm

This module contains a UniCode Mapping (.ucm) file format parser and all of the data files available in the Unicode Consortium’s icu-data repository. For a list, see KNOWN_CHARSETS.

Most uses of this library should look like this:

use icu_data::ucm::{request_mapping_file, parser::parse as parse_ucm};

let f = request_mapping_file("java-EUC_JP-1.3_P").unwrap(); // holds the .ucm file as a String
let enc = parse_ucm(&f).unwrap(); // holds an `Encoding`
/* ... */

If you only need particular encodings, each one is also available as a static in the mappings module. They are all lazy_static values, so each is evaluated only when first used. Evaluation runs the same code as above, unwrap calls included, so it can panic. All of them evaluate without error on my machine, so a panic should only occur if Brotli decompression or tar metadata parsing fails.

Example:

use icu_data::ucm::mappings;
assert_eq!(mappings::JAVA_EUC_JP_1_3_P.codepoints.len(), 13139);
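
To make the "only evaluated when used" behaviour concrete, here is a minimal sketch using only the standard library. It is not this crate's code: `std::sync::OnceLock` stands in for lazy_static, and `TABLE`, `build_table`, `table` and `INIT_RUNS` are hypothetical names invented for the illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (hypothetical) expensive initialiser runs.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for one entry of `mappings`; empty until first use.
static TABLE: OnceLock<Vec<u32>> = OnceLock::new();

// Stand-in for "decompress, parse, unwrap". A panic in here would
// surface at the first access, not at program start.
fn build_table() -> Vec<u32> {
    INIT_RUNS.fetch_add(1, Ordering::SeqCst);
    (0x41..=0x5A).collect() // the code points for 'A'..='Z'
}

// Accessor that initialises the static on first call and reuses it after.
fn table() -> &'static Vec<u32> {
    TABLE.get_or_init(build_table)
}

fn main() {
    // Nothing has been built before the first access.
    assert_eq!(INIT_RUNS.load(Ordering::SeqCst), 0);
    assert_eq!(table().len(), 26);
    // A second access reuses the cached value.
    assert_eq!(table().len(), 26);
    assert_eq!(INIT_RUNS.load(Ordering::SeqCst), 1);
}
```

The practical consequence is the same as for the statics in mappings: the cost, and any panic, is deferred to the first place the value is touched.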

Modules§

mappings
Lazily evaluated statics, holding an Encoding for each encoding
parser
A .ucm file format (UniCode Mapping) Pest parser

Structs§

Codepoint
This represents a CHARMAP row in a .ucm (UniCode Mapping) file.
Encoding
This represents a single .ucm (UniCode Mapping) file.

Enums§

EquivalenceType
The “equivalence type” of the Unicode codepoint to the bytestring in the Encoding. The equivalence types are defined by the Unicode consortium as such:
IcuDataError
Error type. You should only ever expect to see UnknownMappingRequested unless you’re doing development on the library.
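
For orientation, a CHARMAP row in a .ucm file pairs a Unicode code point with a byte sequence and a precision flag, for example `<U3000> \xA1\xA1 |0`. The sketch below parses one such row with the standard library only. It is an illustration, not this crate's Pest grammar: `Row`, `Precision` and `parse_row` are hypothetical names, the variant names are assumptions rather than the real EquivalenceType variants, and the flag meanings in the comments follow my understanding of ICU's converter documentation.

```rust
/// Hypothetical stand-in for `EquivalenceType`; variant names are assumed.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Precision {
    Roundtrip,       // |0: maps in both directions
    Fallback,        // |1: Unicode to bytes only
    Substitution,    // |2: unmappable, uses the subchar1 substitution
    ReverseFallback, // |3: bytes to Unicode only
    GoodOneWay,      // |4: "good" one-way mapping, Unicode to bytes
}

/// Hypothetical stand-in for `Codepoint`; field names are assumed.
#[derive(Debug, PartialEq)]
struct Row {
    codepoint: u32,
    bytes: Vec<u8>,
    precision: Precision,
}

fn parse_row(line: &str) -> Option<Row> {
    // Drop a trailing `#` comment, then split on whitespace.
    let line = line.split('#').next()?.trim();
    let mut parts = line.split_whitespace();

    // `<U3000>` -> 0x3000
    let hex = parts.next()?.strip_prefix("<U")?.strip_suffix('>')?;
    let codepoint = u32::from_str_radix(hex, 16).ok()?;

    // `\xA1\xA1` -> [0xA1, 0xA1]; the token must start with `\x`.
    let mut pieces = parts.next()?.split("\\x");
    if pieces.next() != Some("") {
        return None;
    }
    let bytes = pieces
        .map(|h| u8::from_str_radix(h, 16).ok())
        .collect::<Option<Vec<u8>>>()?;
    if bytes.is_empty() {
        return None;
    }

    // This sketch requires an explicit precision flag.
    let precision = match parts.next()? {
        "|0" => Precision::Roundtrip,
        "|1" => Precision::Fallback,
        "|2" => Precision::Substitution,
        "|3" => Precision::ReverseFallback,
        "|4" => Precision::GoodOneWay,
        _ => return None,
    };

    Some(Row { codepoint, bytes, precision })
}

fn main() {
    let row = parse_row("<U3000> \\xA1\\xA1 |0").expect("valid row");
    assert_eq!(row.codepoint, 0x3000);
    assert_eq!(row.bytes, vec![0xA1, 0xA1]);
    assert_eq!(row.precision, Precision::Roundtrip);
}
```

A real .ucm file also has a header and CHARMAP / END CHARMAP delimiters, which this sketch ignores. Use the parser module for actual files.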

Statics§

KNOWN_CHARSETS
This is a list of all of the encodings known to request_mapping_file, given without the .ucm extension.

Traits§

PestParser
A trait with a single method that parses strings.

Functions§

request_mapping_file
Given the name of an encoding known to ICU, return its raw UCM data as a String.