A library for parsing the Unicode character database.
Modules
- extracted - Types for parsing files in the extracted subdirectory of the Unicode Character Database download.
Structs
- Age - A single row in the DerivedAge.txt file.
- ArabicShaping - Represents a single row in the ArabicShaping.txt file.
- BidiMirroring - Represents a single row in the BidiMirroring.txt file.
- CaseFold - A single row in the CaseFolding.txt file.
- Codepoint - A single Unicode codepoint.
- CodepointIter - An iterator over a range of Unicode codepoints.
- CodepointRange - A range of Unicode codepoints. The range is inclusive; both ends of the range are guaranteed to be valid codepoints.
- CoreProperty - A single row in the DerivedCoreProperties.txt file.
- DerivedNormalizationProperty - A single row in the DerivedNormalizationProps.txt file.
- EastAsianWidth - A single row in the EastAsianWidth.txt file, describing the value of the East_Asian_Width property.
- EmojiProperty - A single row in the emoji-data.txt file.
- Error - Represents any kind of error that can occur while parsing the UCD.
- GraphemeClusterBreak - A single row in the auxiliary/GraphemeBreakProperty.txt file.
- GraphemeClusterBreakTest - A single row in the auxiliary/GraphemeBreakTest.txt file.
- JamoShortName - A single row in the Jamo.txt file.
- LineBreakTest - A single row in the auxiliary/LineBreakTest.txt file.
- NameAlias - A single row in the NameAliases.txt file.
- Property - A single row in the PropList.txt file.
- PropertyAlias - A single row in the PropertyAliases.txt file.
- PropertyValueAlias - A single row in the PropertyValueAliases.txt file.
- Script - A single row in the Scripts.txt file.
- ScriptExtension - A single row in the ScriptExtensions.txt file.
- SentenceBreak - A single row in the auxiliary/SentenceBreakProperty.txt file.
- SentenceBreakTest - A single row in the auxiliary/SentenceBreakTest.txt file.
- SpecialCaseMapping - A single row in the SpecialCasing.txt file.
- UcdLineParser - A line-oriented parser for a particular UCD file.
- UnicodeData - Represents a single row in the UnicodeData.txt file.
- UnicodeDataDecomposition - Represents a decomposition mapping of a single row in the UnicodeData.txt file.
- UnicodeDataExpander - An iterator adapter that expands rows in UnicodeData.txt.
- WordBreak - A single row in the auxiliary/WordBreakProperty.txt file.
- WordBreakTest - A single row in the auxiliary/WordBreakTest.txt file.
Enums
- CaseStatus - The status of a particular case mapping.
- Codepoints - A representation of either a single codepoint or a range of codepoints.
- ErrorKind - The kind of error that occurred while parsing the UCD.
- NameAliasLabel - The label of a name alias.
- UnicodeDataDecompositionTag - The formatting tag on a decomposition mapping.
- UnicodeDataNumeric - A numeric value corresponding to characters with Numeric_Type=Numeric.
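A codepoint field in a UCD data file takes one of two shapes: a single hexadecimal value such as 00AD, or an inclusive range such as 0041..005A. The following sketch models that choice with a standalone enum over plain `u32` values instead of this crate's Codepoint and CodepointRange types. It enforces the guarantees that the CodepointRange description states: both ends must be valid codepoints (at most U+10FFFF) and the range is inclusive. The enum layout and the `parse` and `iter` methods are hypothetical and chosen only for illustration.

```rust
use std::ops::RangeInclusive;

/// Largest valid Unicode codepoint. Surrogates are still codepoints, so they are accepted.
const MAX_CODEPOINT: u32 = 0x10FFFF;

/// Hypothetical stand-in for a codepoints field: either one codepoint or an
/// inclusive range whose two ends are both valid codepoints.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Codepoints {
    Single(u32),
    Range { start: u32, end: u32 },
}

/// Parses one hexadecimal codepoint and checks that it is at most U+10FFFF.
fn parse_one(s: &str) -> Result<u32, String> {
    let n = u32::from_str_radix(s.trim(), 16).map_err(|e| format!("{:?}: {}", s, e))?;
    if n > MAX_CODEPOINT {
        return Err(format!("{:X} exceeds U+10FFFF", n));
    }
    Ok(n)
}

impl Codepoints {
    /// Parses `XXXX` or `XXXX..YYYY`, the two forms used in UCD data files.
    pub fn parse(s: &str) -> Result<Codepoints, String> {
        match s.split_once("..") {
            None => parse_one(s).map(Codepoints::Single),
            Some((a, b)) => {
                let (start, end) = (parse_one(a)?, parse_one(b)?);
                if start > end {
                    return Err(format!("reversed range {:X}..{:X}", start, end));
                }
                Ok(Codepoints::Range { start, end })
            }
        }
    }

    /// Every codepoint covered, with both ends included.
    pub fn iter(&self) -> RangeInclusive<u32> {
        match *self {
            Codepoints::Single(cp) => cp..=cp,
            Codepoints::Range { start, end } => start..=end,
        }
    }
}
```

Because the range is inclusive, 0041..005A covers 26 codepoints rather than 25.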
Traits
- UcdFile - Describes a single UCD file.
- UcdFileByCodepoint - Describes a single UCD file where every record in the file is associated with one or more codepoints.
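One way to picture a trait that "describes a single UCD file" is this: each record type names its file and knows how to parse one data line, while a single generic driver skips blank lines and `#` comments for every file. The following is a self-contained sketch of that idea. It is not this crate's actual UcdFile trait. `RowFile`, `RELATIVE_PATH`, `parse_row`, `parse_rows`, and `ScriptRow` are all hypothetical names, and the sample lines follow the Scripts.txt layout.

```rust
/// Hypothetical trait: one record type per UCD file, one record per data line.
pub trait RowFile: Sized {
    /// File path relative to the UCD root.
    const RELATIVE_PATH: &'static str;
    /// Parses the data part of a line (the `#` comment is already removed).
    fn parse_row(data: &str) -> Result<Self, String>;
}

/// Generic driver: skips blank and comment-only lines, parses the rest,
/// and prefixes any error with the file name and 1-based line number.
pub fn parse_rows<D: RowFile>(text: &str) -> Result<Vec<D>, String> {
    let mut rows = Vec::new();
    for (i, line) in text.lines().enumerate() {
        // Everything after the first '#' is a comment.
        let data = line.split('#').next().unwrap_or("");
        if data.trim().is_empty() {
            continue;
        }
        let row = D::parse_row(data)
            .map_err(|e| format!("{}:{}: {}", D::RELATIVE_PATH, i + 1, e))?;
        rows.push(row);
    }
    Ok(rows)
}

/// Hypothetical record for Scripts.txt: an inclusive range and a script name.
#[derive(Debug, PartialEq)]
pub struct ScriptRow {
    pub start: u32,
    pub end: u32,
    pub script: String,
}

impl RowFile for ScriptRow {
    const RELATIVE_PATH: &'static str = "Scripts.txt";

    fn parse_row(data: &str) -> Result<Self, String> {
        let (cps, script) = data.split_once(';').ok_or("missing ';'")?;
        let hex = |s: &str| u32::from_str_radix(s.trim(), 16).map_err(|e| e.to_string());
        // The first field is either `XXXX` or `XXXX..YYYY`; range validation
        // (start <= end, at most U+10FFFF) is omitted here for brevity.
        let (start, end) = match cps.split_once("..") {
            Some((a, b)) => (hex(a)?, hex(b)?),
            None => {
                let cp = hex(cps)?;
                (cp, cp)
            }
        };
        Ok(ScriptRow { start, end, script: script.trim().to_string() })
    }
}
```

Keeping the comment and blank-line handling in one generic function means each new file format only has to supply its path and a per-line parser.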
Functions
- parse - Parse a particular file in the UCD into a sequence of rows.
- parse_by_codepoint - Parse a particular file in the UCD into a map from codepoint to the record.
- parse_many_by_codepoint - Parse a particular file in the UCD into a map from codepoint to all records associated with that codepoint.
- ucd_directory_version - Given a path, ucd_dir, pointing at the root of a UCD directory, attempts to determine its Unicode version.
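The difference between parse_by_codepoint and parse_many_by_codepoint matters for files in which one codepoint appears in several rows. In PropList.txt, for example, U+0009..U+000D are listed under both White_Space and Pattern_White_Space. The following plain-Rust sketch shows the two map shapes, built from records that have already been parsed. `Rec`, `by_codepoint`, and `many_by_codepoint` are hypothetical names. Letting a later record overwrite an earlier one in `by_codepoint` is an assumption of this sketch, not documented crate behavior.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical already-parsed record: an inclusive codepoint range and a value.
pub struct Rec {
    pub codepoints: RangeInclusive<u32>,
    pub value: &'static str,
}

/// One value per codepoint. In this sketch, a later record for the same
/// codepoint silently replaces an earlier one.
pub fn by_codepoint(recs: &[Rec]) -> BTreeMap<u32, &'static str> {
    let mut map = BTreeMap::new();
    for rec in recs {
        for cp in rec.codepoints.clone() {
            map.insert(cp, rec.value);
        }
    }
    map
}

/// Every value for each codepoint, kept in file order.
pub fn many_by_codepoint(recs: &[Rec]) -> BTreeMap<u32, Vec<&'static str>> {
    let mut map: BTreeMap<u32, Vec<&'static str>> = BTreeMap::new();
    for rec in recs {
        for cp in rec.codepoints.clone() {
            map.entry(cp).or_default().push(rec.value);
        }
    }
    map
}
```

With the PropList.txt-style rows above, the single-value map would keep only one property for U+0009, while the multi-value map keeps both.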