Crate ucd_parse

A library for parsing the Unicode Character Database (UCD).

Modules

extracted
Types for parsing files in the extracted subdirectory of the Unicode Character Database download.

Structs

Age
A single row in the DerivedAge.txt file.
ArabicShaping
A single row in the ArabicShaping.txt file.
BidiMirroring
A single row in the BidiMirroring.txt file.
CaseFold
A single row in the CaseFolding.txt file.
Codepoint
A single Unicode codepoint.
CodepointIter
An iterator over a range of Unicode codepoints.
CodepointRange
A range of Unicode codepoints. The range is inclusive; both ends of the range are guaranteed to be valid codepoints.
CoreProperty
A single row in the DerivedCoreProperties.txt file.
DerivedNormalizationProperty
A single row in the DerivedNormalizationProps.txt file.
EastAsianWidth
A single row in the EastAsianWidth.txt file, describing the value of the East_Asian_Width property.
EmojiProperty
A single row in the emoji-data.txt file.
Error
Represents any kind of error that can occur while parsing the UCD.
GraphemeClusterBreak
A single row in the auxiliary/GraphemeBreakProperty.txt file.
GraphemeClusterBreakTest
A single row in the auxiliary/GraphemeBreakTest.txt file.
JamoShortName
A single row in the Jamo.txt file.
LineBreakTest
A single row in the auxiliary/LineBreakTest.txt file.
NameAlias
A single row in the NameAliases.txt file.
Property
A single row in the PropList.txt file.
PropertyAlias
A single row in the PropertyAliases.txt file.
PropertyValueAlias
A single row in the PropertyValueAliases.txt file.
Script
A single row in the Scripts.txt file.
ScriptExtension
A single row in the ScriptExtensions.txt file.
SentenceBreak
A single row in the auxiliary/SentenceBreakProperty.txt file.
SentenceBreakTest
A single row in the auxiliary/SentenceBreakTest.txt file.
SpecialCaseMapping
A single row in the SpecialCasing.txt file.
UcdLineParser
A line oriented parser for a particular UCD file.
UnicodeData
A single row in the UnicodeData.txt file.
UnicodeDataDecomposition
The decomposition mapping of a single row in the UnicodeData.txt file.
UnicodeDataExpander
An iterator adapter that expands rows in UnicodeData.txt.
WordBreak
A single row in the auxiliary/WordBreakProperty.txt file.
WordBreakTest
A single row in the auxiliary/WordBreakTest.txt file.
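
Most of the structs above model one row of a semicolon-delimited UCD data file. As a rough illustration of what "a single row" means, the following standard-library sketch parses one line in the style of Scripts.txt. It does not use this crate and is not its implementation; `ScriptRow`, `parse_script_line`, and `parse_hex` are hypothetical names introduced only for this example.

```rust
/// Hypothetical stand-in for a parsed row; NOT the crate's `Script` type.
#[derive(Debug, PartialEq)]
struct ScriptRow {
    /// First codepoint covered by the row (inclusive).
    start: u32,
    /// Last codepoint covered by the row (inclusive).
    end: u32,
    /// The property value, e.g. "Latin".
    script: String,
}

/// Parse a hexadecimal codepoint, rejecting values above U+10FFFF.
fn parse_hex(s: &str) -> Option<u32> {
    let n = u32::from_str_radix(s.trim(), 16).ok()?;
    if n <= 0x10FFFF {
        Some(n)
    } else {
        None
    }
}

/// Parse one line of a Scripts.txt-style file.
/// Returns `None` for blank, comment-only, or malformed lines.
fn parse_script_line(line: &str) -> Option<ScriptRow> {
    // Everything after '#' is a comment; strip it before splitting fields.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    // The first field is a codepoint or a `start..end` range,
    // the second field is the property value.
    let (cps, script) = data.split_once(';')?;
    let cps = cps.trim();
    let (start, end) = match cps.split_once("..") {
        Some((a, b)) => (parse_hex(a)?, parse_hex(b)?),
        None => {
            let cp = parse_hex(cps)?;
            (cp, cp)
        }
    };
    if start > end {
        return None;
    }
    Some(ScriptRow {
        start,
        end,
        script: script.trim().to_string(),
    })
}
```

Under these assumptions, a line such as `0041..005A    ; Latin # L&  [26] ...` yields a row covering U+0041 through U+005A with the value `Latin`, while comment-only lines are skipped.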

Enums

CaseStatus
The status of a particular case mapping.
Codepoints
A representation of either a single codepoint or a range of codepoints.
ErrorKind
The kind of error that occurred while parsing the UCD.
NameAliasLabel
The label of a name alias.
UnicodeDataDecompositionTag
The formatting tag on a decomposition mapping.
UnicodeDataNumeric
A numeric value corresponding to characters with Numeric_Type=Numeric.
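
The "single codepoint or range of codepoints" idea described for Codepoints, together with the inclusive-range guarantee stated for CodepointRange, can be pictured with a small standard-library enum. This is only a sketch: `CodepointsSketch` and its variants are hypothetical and use plain `u32` values rather than the crate's own types.

```rust
use std::ops::RangeInclusive;

/// Hypothetical model of "one codepoint or an inclusive range";
/// NOT the crate's `Codepoints` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CodepointsSketch {
    Single(u32),
    Range { start: u32, end: u32 },
}

impl CodepointsSketch {
    /// Iterate over every codepoint covered, with both ends included.
    fn iter(self) -> RangeInclusive<u32> {
        match self {
            CodepointsSketch::Single(cp) => cp..=cp,
            CodepointsSketch::Range { start, end } => start..=end,
        }
    }
}
```

Treating a single codepoint as a one-element inclusive range lets callers handle both cases with the same loop.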

Traits

UcdFile
Describes a single UCD file.
UcdFileByCodepoint
Describes a single UCD file where every record in the file is associated with one or more codepoints.

Functions

parse
Parse a particular file in the UCD into a sequence of rows.
parse_by_codepoint
Parse a particular file in the UCD into a map from codepoint to the record.
parse_many_by_codepoint
Parse a particular file in the UCD into a map from codepoint to all records associated with that codepoint.
ucd_directory_version
Given a path pointing at the root of a UCD directory, attempts to determine its Unicode version.
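
The difference between a map "from codepoint to the record" and a map "from codepoint to all records" matters when several rows mention the same codepoint. The standard-library sketch below illustrates the two shapes. It does not call this crate: `Row`, `index_one`, and `index_many` are hypothetical, and both the use of `BTreeMap` and the "later row overwrites earlier row" policy are assumptions of the sketch, not documented behavior of `parse_by_codepoint`.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified record: (start, end, value), covering the
/// inclusive codepoint range `start..=end`.
type Row<'a> = (u32, u32, &'a str);

/// One value per codepoint. In this sketch a later row overwrites an
/// earlier one for the same codepoint (an assumption, see lead-in).
fn index_one<'a>(rows: &[Row<'a>]) -> BTreeMap<u32, &'a str> {
    let mut map = BTreeMap::new();
    for &(start, end, value) in rows {
        for cp in start..=end {
            map.insert(cp, value);
        }
    }
    map
}

/// All values per codepoint, kept in the order the rows were given.
fn index_many<'a>(rows: &[Row<'a>]) -> BTreeMap<u32, Vec<&'a str>> {
    let mut map: BTreeMap<u32, Vec<&'a str>> = BTreeMap::new();
    for &(start, end, value) in rows {
        for cp in start..=end {
            map.entry(cp).or_default().push(value);
        }
    }
    map
}
```

The second shape suits files where one codepoint can legitimately appear in more than one row; the first is simpler when each codepoint has at most one record.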