A library for parsing the Unicode character database.
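Most UCD data files share one line format: semicolon-separated fields, a # that starts a trailing comment, and blank or comment-only lines that carry no data. The following minimal sketch shows only that first tokenizing step. split_ucd_line is a hypothetical helper, not part of this library's API, and it assumes # never appears inside a data field.

```rust
/// Hypothetical helper (not part of this library): splits one line of a
/// UCD data file into trimmed fields. Returns None for blank lines and
/// comment-only lines. Assumes '#' never occurs inside a data field.
pub fn split_ucd_line(line: &str) -> Option<Vec<&str>> {
    // Everything after the first '#' is a comment.
    let data = match line.find('#') {
        Some(i) => &line[..i],
        None => line,
    };
    let data = data.trim();
    if data.is_empty() {
        return None;
    }
    // Fields are separated by ';' and may be padded with spaces.
    Some(data.split(';').map(str::trim).collect())
}
```

Empty fields are kept rather than dropped, because in files such as UnicodeData.txt an empty field is meaningful.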
Modules
- extracted - Types for parsing files in the extracted subdirectory of the Unicode Character Database download.
Structs
- Age - A single row in the DerivedAge.txt file.
- ArabicShaping - Represents a single row in the ArabicShaping.txt file.
- BidiMirroring - Represents a single row in the BidiMirroring.txt file.
- CaseFold - A single row in the CaseFolding.txt file.
- Codepoint - A single Unicode codepoint.
- CodepointIter - An iterator over a range of Unicode codepoints.
- CodepointRange - A range of Unicode codepoints. The range is inclusive; both ends of the range are guaranteed to be valid codepoints.
- CoreProperty - A single row in the DerivedCoreProperties.txt file.
- DerivedNormalizationProperty - A single row in the DerivedNormalizationProps.txt file.
- EastAsianWidth - A single row in the EastAsianWidth.txt file, describing the value of the East_Asian_Width property.
- EmojiProperty - A single row in the emoji-data.txt file.
- Error - Represents any kind of error that can occur while parsing the UCD.
- GraphemeClusterBreak - A single row in the auxiliary/GraphemeBreakProperty.txt file.
- GraphemeClusterBreakTest - A single row in the auxiliary/GraphemeBreakTest.txt file.
- JamoShortName - A single row in the Jamo.txt file.
- LineBreakTest - A single row in the auxiliary/LineBreakTest.txt file.
- NameAlias - A single row in the NameAliases.txt file.
- Property - A single row in the PropList.txt file.
- PropertyAlias - A single row in the PropertyAliases.txt file.
- PropertyValueAlias - A single row in the PropertyValueAliases.txt file.
- Script - A single row in the Scripts.txt file.
- ScriptExtension - A single row in the ScriptExtensions.txt file.
- SentenceBreak - A single row in the auxiliary/SentenceBreakProperty.txt file.
- SentenceBreakTest - A single row in the auxiliary/SentenceBreakTest.txt file.
- SpecialCaseMapping - A single row in the SpecialCasing.txt file.
- UcdLineParser - A line-oriented parser for a particular UCD file.
- UnicodeData - Represents a single row in the UnicodeData.txt file.
- UnicodeDataDecomposition - Represents a decomposition mapping of a single row in the UnicodeData.txt file.
- UnicodeDataExpander - An iterator adapter that expands rows in UnicodeData.txt, turning each codepoint range written as a pair of "<..., First>" and "<..., Last>" rows into one row per codepoint.
- WordBreak - A single row in the auxiliary/WordBreakProperty.txt file.
- WordBreakTest - A single row in the auxiliary/WordBreakTest.txt file.
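In UnicodeData.txt, large blocks of codepoints with identical properties (CJK ideographs, Hangul syllables, and so on) are listed as just two rows whose names end in ", First>" and ", Last>". The sketch below illustrates the expansion idea behind UnicodeDataExpander using a deliberately simplified, hypothetical Row type that keeps only the codepoint and name; the real struct carries every field, and the real adapter's exact treatment of names and other fields may differ from this sketch, which simply copies the First row.

```rust
use std::iter::Peekable;
use std::ops::RangeInclusive;

/// Hypothetical, simplified stand-in for a UnicodeData.txt row:
/// only the codepoint and name fields are kept.
#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub codepoint: u32,
    pub name: String,
}

impl Row {
    /// True for rows such as "<CJK Ideograph Extension A, First>".
    fn is_range_start(&self) -> bool {
        self.name.starts_with('<') && self.name.ends_with(", First>")
    }

    /// True for rows such as "<CJK Ideograph Extension A, Last>".
    fn is_range_end(&self) -> bool {
        self.name.starts_with('<') && self.name.ends_with(", Last>")
    }
}

/// Hypothetical iterator adapter: each First/Last pair becomes one row per codepoint.
pub struct Expander<I: Iterator<Item = Row>> {
    rows: Peekable<I>,
    /// Template row and the codepoints still to emit for the current range.
    pending: Option<(Row, RangeInclusive<u32>)>,
}

impl<I: Iterator<Item = Row>> Expander<I> {
    pub fn new(rows: I) -> Self {
        Expander { rows: rows.peekable(), pending: None }
    }
}

impl<I: Iterator<Item = Row>> Iterator for Expander<I> {
    type Item = Row;

    fn next(&mut self) -> Option<Row> {
        // Finish expanding the current range before reading more input.
        if let Some((template, range)) = &mut self.pending {
            if let Some(cp) = range.next() {
                return Some(Row { codepoint: cp, ..template.clone() });
            }
        }
        self.pending = None;

        let row = self.rows.next()?;
        let next_is_end = self.rows.peek().map_or(false, Row::is_range_end);
        if !(row.is_range_start() && next_is_end) {
            // Ordinary row (or an unpaired First row): pass it through unchanged.
            return Some(row);
        }
        // Consume the Last row; every codepoint in start..=end copies the First row.
        let end = self.rows.next()?.codepoint;
        let start = row.codepoint;
        self.pending = Some((row, start..=end));
        self.next()
    }
}
```

Because the expansion is lazy, a caller can stream the file and see one row per codepoint without first materializing the large ranges in memory.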
Enums
- CaseStatus - The status of a particular case mapping.
- Codepoints - A representation of either a single codepoint or a range of codepoints.
- ErrorKind - The kind of error that occurred while parsing the UCD.
- NameAliasLabel - The label of a name alias.
- UnicodeDataDecompositionTag - The formatting tag on a decomposition mapping.
- UnicodeDataNumeric - A numeric value corresponding to characters with Numeric_Type=Numeric.
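Many UCD files begin each row with either a single hexadecimal codepoint (00AA) or an inclusive range (0041..005A), which is the distinction the Codepoints enum captures. Here is a minimal sketch of parsing that field, assuming "valid" simply means a value no greater than U+10FFFF (surrogates included, since the UCD lists them). ParsedCodepoints and parse_codepoints are hypothetical names, not this library's types.

```rust
use std::ops::RangeInclusive;

/// Hypothetical mirror of the "single codepoint or range" field.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ParsedCodepoints {
    Single(u32),
    /// Inclusive on both ends, like CodepointRange.
    Range(u32, u32),
}

/// Parses one hexadecimal codepoint, rejecting values above U+10FFFF.
fn parse_codepoint(s: &str) -> Result<u32, String> {
    let n = u32::from_str_radix(s.trim(), 16)
        .map_err(|e| format!("invalid codepoint {:?}: {}", s, e))?;
    if n > 0x10FFFF {
        return Err(format!("codepoint {:X} is out of range", n));
    }
    Ok(n)
}

/// Parses "XXXX" as a single codepoint and "XXXX..YYYY" as an inclusive range.
pub fn parse_codepoints(s: &str) -> Result<ParsedCodepoints, String> {
    match s.split_once("..") {
        None => parse_codepoint(s).map(ParsedCodepoints::Single),
        Some((start, end)) => {
            let (start, end) = (parse_codepoint(start)?, parse_codepoint(end)?);
            if start > end {
                return Err(format!("invalid range {:X}..{:X}", start, end));
            }
            Ok(ParsedCodepoints::Range(start, end))
        }
    }
}

impl ParsedCodepoints {
    /// Every codepoint covered, including both ends of a range.
    pub fn iter(self) -> RangeInclusive<u32> {
        match self {
            ParsedCodepoints::Single(cp) => cp..=cp,
            ParsedCodepoints::Range(start, end) => start..=end,
        }
    }
}
```

Treating a single codepoint as a one-element inclusive range lets callers iterate over both variants uniformly.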
Traits
- UcdFile - Describes a single UCD file.
- UcdFileByCodepoint - Describes a single UCD file where every record in the file is associated with one or more codepoints.
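A description of a UCD file plausibly needs two pieces of information: where the file lives relative to the UCD root and how to parse one row, plus, for the by-codepoint variant, which codepoints a row covers. The sketch below mirrors that split with hypothetical traits (UcdFileSketch, UcdFileByCodepointSketch) and a toy row type for a Scripts.txt-style file. It does not reproduce the library's actual traits, whose methods and bounds may differ.

```rust
use std::ops::RangeInclusive;
use std::path::{Path, PathBuf};
use std::str::FromStr;

/// Hypothetical analogue of UcdFile: a row type that knows which file it
/// comes from and can parse itself from one line of that file.
pub trait UcdFileSketch: FromStr<Err = String> {
    /// Path of the file relative to the root of the UCD directory.
    fn relative_file_path() -> &'static Path;

    /// Full path of the file inside a given UCD directory.
    fn file_path<P: AsRef<Path>>(ucd_dir: P) -> PathBuf {
        ucd_dir.as_ref().join(Self::relative_file_path())
    }
}

/// Hypothetical analogue of UcdFileByCodepoint: every row covers
/// one or more codepoints.
pub trait UcdFileByCodepointSketch: UcdFileSketch {
    fn codepoints(&self) -> RangeInclusive<u32>;
}

/// Toy row for a "codepoints ; Script" file such as Scripts.txt.
#[derive(Debug, PartialEq)]
pub struct ToyScriptRow {
    pub start: u32,
    pub end: u32,
    pub script: String,
}

impl FromStr for ToyScriptRow {
    type Err = String;

    fn from_str(line: &str) -> Result<Self, String> {
        // Ignore the trailing comment, then split "codepoints ; value".
        let data = line.split('#').next().unwrap_or("");
        let (cps, script) = data
            .split_once(';')
            .ok_or_else(|| format!("missing ';' in {:?}", line))?;
        let cps = cps.trim();
        // A single codepoint is treated as the one-element range cp..=cp.
        let (start, end) = cps.split_once("..").unwrap_or((cps, cps));
        let hex = |s: &str| u32::from_str_radix(s, 16).map_err(|e| e.to_string());
        Ok(ToyScriptRow {
            start: hex(start)?,
            end: hex(end)?,
            script: script.trim().to_string(),
        })
    }
}

impl UcdFileSketch for ToyScriptRow {
    fn relative_file_path() -> &'static Path {
        Path::new("Scripts.txt")
    }
}

impl UcdFileByCodepointSketch for ToyScriptRow {
    fn codepoints(&self) -> RangeInclusive<u32> {
        self.start..=self.end
    }
}
```

With the file location and row parsing attached to the row type, a generic reader only needs the UCD root directory to know which file to open and how to decode it.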
Functions
- parse - Parse a particular file in the UCD into a sequence of rows.
- parse_by_codepoint - Parse a particular file in the UCD into a map from codepoint to the record.
- parse_many_by_codepoint - Parse a particular file in the UCD into a map from codepoint to all records associated with that codepoint.
- ucd_directory_version - Given ucd_dir, a path pointing at the root of a UCD directory, attempts to determine its Unicode version.
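parse_by_codepoint and parse_many_by_codepoint differ in what happens when several rows cover the same codepoint. In PropList.txt, for instance, U+0041 is listed under both ASCII_Hex_Digit and Hex_Digit, so only a map to all records keeps both. Below is a minimal sketch of the two indexing strategies over (range, record) pairs. index_one and index_many are hypothetical helpers, and letting a later record overwrite an earlier one in index_one is this sketch's choice, not documented behavior of the library.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical: keeps every record for each codepoint
/// (the idea behind parse_many_by_codepoint).
pub fn index_many<R: Clone>(
    rows: impl IntoIterator<Item = (RangeInclusive<u32>, R)>,
) -> BTreeMap<u32, Vec<R>> {
    let mut map: BTreeMap<u32, Vec<R>> = BTreeMap::new();
    for (range, row) in rows {
        for cp in range {
            // Append, so records for the same codepoint accumulate in input order.
            map.entry(cp).or_default().push(row.clone());
        }
    }
    map
}

/// Hypothetical: keeps a single record per codepoint
/// (the idea behind parse_by_codepoint). Here a later record
/// overwrites an earlier one for the same codepoint.
pub fn index_one<R: Clone>(
    rows: impl IntoIterator<Item = (RangeInclusive<u32>, R)>,
) -> BTreeMap<u32, R> {
    let mut map = BTreeMap::new();
    for (range, row) in rows {
        for cp in range {
            map.insert(cp, row.clone());
        }
    }
    map
}
```

A BTreeMap keeps the keys in ascending codepoint order, which is convenient when the result is later turned into lookup tables.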