Crate icu_properties
icu_properties is one of the ICU4X components.
This component provides definitions of Unicode Properties and APIs for retrieving property data in an appropriate data structure.
APIs that return a UnicodeSet exist for binary properties and certain enumerated properties. See the sets module for more details.
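The examples on this page read the set through a field named inv_list, which suggests an inversion list representation. The following is a minimal sketch of how membership testing on such a list can work, using only the standard library. `InversionList` and its methods are hypothetical names for illustration and are not the ICU4X `UnicodeSet` API; the sample ranges passed to it are arbitrary and not real property data.

```rust
/// Hypothetical illustration type, not the ICU4X `UnicodeSet`.
/// Stores sorted range boundaries [start0, end0, start1, end1, ...],
/// where each range covers start..end (end exclusive).
pub struct InversionList {
    bounds: Vec<u32>,
}

impl InversionList {
    /// Builds the list from sorted, non-overlapping, non-adjacent
    /// inclusive ranges. Assumes every `end` is a valid code point,
    /// so `end + 1` cannot overflow.
    pub fn from_inclusive_ranges(ranges: &[(u32, u32)]) -> Self {
        let mut bounds = Vec::with_capacity(ranges.len() * 2);
        for &(start, end) in ranges {
            bounds.push(start);
            bounds.push(end + 1);
        }
        InversionList { bounds }
    }

    /// A code point is in the set when an odd number of boundaries
    /// are less than or equal to it. The count is found by binary search.
    pub fn contains_u32(&self, cp: u32) -> bool {
        let passed = self.bounds.partition_point(|&b| b <= cp);
        passed % 2 == 1
    }

    /// Convenience wrapper for `char` input.
    pub fn contains(&self, c: char) -> bool {
        self.contains_u32(c as u32)
    }
}
```

A lookup costs one binary search over the boundaries, so the cost grows with the number of ranges rather than the number of code points in the set.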
APIs that return a CodePointTrie exist for certain enumerated properties. See the maps module for more details.
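To give a feel for why a trie-like structure suits a code-point-to-value map, here is a simplified two-stage lookup table written with the standard library only. It is a sketch of the general multi-stage idea and makes no claim about the actual CodePointTrie layout; `TwoStageTable`, the 256-entry block size, and the `u8` value type are all assumptions chosen for brevity.

```rust
const BLOCK_SHIFT: u32 = 8;
const BLOCK_SIZE: usize = 1 << BLOCK_SHIFT;
const MAX_CODE_POINT: u32 = 0x10FFFF;

/// Hypothetical illustration type, not the ICU4X `CodePointTrie`.
pub struct TwoStageTable {
    /// For each block of 256 code points, the slot of its data block.
    index: Vec<u16>,
    /// Deduplicated data blocks, stored back to back.
    data: Vec<u8>,
    default_value: u8,
}

impl TwoStageTable {
    /// Builds the table from inclusive `(start, end, value)` ranges.
    /// Later ranges overwrite earlier ones where they overlap.
    pub fn build(default_value: u8, ranges: &[(u32, u32, u8)]) -> Self {
        let block_count = (MAX_CODE_POINT as usize + 1) / BLOCK_SIZE;
        let mut index = Vec::with_capacity(block_count);
        let mut data: Vec<u8> = Vec::new();
        for block in 0..block_count {
            let base = (block * BLOCK_SIZE) as u32;
            let last = base + BLOCK_SIZE as u32 - 1;
            let mut values = vec![default_value; BLOCK_SIZE];
            for &(start, end, value) in ranges {
                // Clamp the range to the current block.
                let (lo, hi) = (start.max(base), end.min(last));
                if lo <= hi {
                    for cp in lo..=hi {
                        values[(cp - base) as usize] = value;
                    }
                }
            }
            // Reuse an identical data block if one was already stored.
            let existing = data
                .chunks(BLOCK_SIZE)
                .position(|chunk| chunk == values.as_slice());
            let slot = match existing {
                Some(found) => found,
                None => {
                    data.extend_from_slice(&values);
                    data.len() / BLOCK_SIZE - 1
                }
            };
            index.push(slot as u16);
        }
        TwoStageTable { index, data, default_value }
    }

    /// Looks up a value with two array reads and no searching.
    pub fn get(&self, cp: u32) -> u8 {
        if cp > MAX_CODE_POINT {
            return self.default_value;
        }
        let slot = self.index[(cp >> BLOCK_SHIFT) as usize] as usize;
        let offset = cp as usize & (BLOCK_SIZE - 1);
        self.data[slot * BLOCK_SIZE + offset]
    }

    /// Number of distinct data blocks actually stored.
    pub fn data_block_count(&self) -> usize {
        self.data.len() / BLOCK_SIZE
    }
}
```

Because long runs of code points share the same value, most blocks are identical and are stored once, which keeps the table small while lookups stay constant-time.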
Examples
Property data as UnicodeSets
use icu::properties::{maps, sets, GeneralCategory};

let provider = icu_testdata::get_provider();

// A binary property as a `UnicodeSet`
let payload = sets::get_emoji(&provider)
    .expect("The data should be valid");
let data_struct = payload.get();
let emoji = &data_struct.inv_list;

assert!(emoji.contains('🎃')); // U+1F383 JACK-O-LANTERN
assert!(!emoji.contains('木')); // U+6728

// An individual enumerated property value as a `UnicodeSet`
let payload = maps::get_general_category(&provider)
    .expect("The data should be valid");
let data_struct = payload.get();
let gc = &data_struct.code_point_trie;
let line_sep = gc.get_set_for_value(GeneralCategory::LineSeparator);

assert!(line_sep.contains_u32(0x2028));
assert!(!line_sep.contains_u32(0x2029));
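The get_set_for_value call above turns a code-point-to-value map into the set of code points carrying one value. The sketch below shows one straightforward way to do that conversion with the standard library, by scanning the code point range and recording each run of matches. `ranges_for_value`, `SampleCategory`, and `sample_category` are hypothetical names; the sample lookup only knows U+2028 and U+2029 and treats everything else as `Other`, so it is not real General_Category data, and ICU4X may compute its sets differently.

```rust
/// Hypothetical stand-in for an enumerated property value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SampleCategory {
    LineSeparator,
    ParagraphSeparator,
    Other,
}

/// Toy lookup covering only two code points (an assumption for the demo).
pub fn sample_category(cp: u32) -> SampleCategory {
    match cp {
        0x2028 => SampleCategory::LineSeparator,
        0x2029 => SampleCategory::ParagraphSeparator,
        _ => SampleCategory::Other,
    }
}

/// Collects the inclusive ranges within `0..=max` for which
/// `lookup` returns `wanted`.
pub fn ranges_for_value<V: PartialEq>(
    lookup: impl Fn(u32) -> V,
    wanted: V,
    max: u32,
) -> Vec<(u32, u32)> {
    let mut ranges = Vec::new();
    let mut run_start: Option<u32> = None;
    for cp in 0..=max {
        let matches = lookup(cp) == wanted;
        match (matches, run_start) {
            // A new run of matching code points begins here.
            (true, None) => run_start = Some(cp),
            // The current run ended at the previous code point.
            (false, Some(start)) => {
                ranges.push((start, cp - 1));
                run_start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the scanned range.
    if let Some(start) = run_start {
        ranges.push((start, max));
    }
    ranges
}
```

With the toy lookup, asking for `SampleCategory::LineSeparator` over `0..=0x10FFFF` yields the single range `(0x2028, 0x2028)`, which mirrors the assertions in the example above.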
Property data as CodePointTries
use icu::properties::{maps, Script};

let provider = icu_testdata::get_provider();

let payload = maps::get_script(&provider)
    .expect("The data should be valid");
let data_struct = payload.get();
let script = &data_struct.code_point_trie;

assert_eq!(script.get('🎃' as u32), Script::Common); // U+1F383 JACK-O-LANTERN
assert_eq!(script.get('木' as u32), Script::Han); // U+6728
Modules
bidi: This module exposes tooling for running the Unicode Bidirectional Algorithm using ICU4X data.
maps: The functions in this module return a CodePointTrie representing, for each code point in the entire range of code points, the property values for a particular Unicode property.
provider: Data provider struct definitions for this ICU4X component.
script: Data and APIs for supporting both Script and Script_Extensions property values in an efficient structure.
sets: The functions in this module return a UnicodeSet containing the set of characters with a particular Unicode property.
Structs
BidiClass: Enumerated property Bidi_Class.
CanonicalCombiningClass: Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
EastAsianWidth: Enumerated property East_Asian_Width.
GeneralCategoryGroup: Groupings of multiple General_Category property values.
GraphemeClusterBreak: Enumerated property Grapheme_Cluster_Break.
LineBreak: Enumerated property Line_Break.
Script: Enumerated property Script.
SentenceBreak: Enumerated property Sentence_Break. See “Default Sentence Boundary Specification” in UAX #29 for the summary of each property value: https://www.unicode.org/reports/tr29/#Default_Sentence_Boundaries.
WordBreak: Enumerated property Word_Break.
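A grouping of multiple General_Category values, as listed above, can be modelled as a bitmask with one bit per category. The sketch below illustrates that idea with the standard library only. `ToyCategory`, `ToyCategoryGroup`, `UPPER_OR_LOWER`, and the discriminant values are hypothetical and chosen for the demo; they are not the ICU4X definitions.

```rust
/// Hypothetical subset of categories with arbitrary discriminants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyCategory {
    UppercaseLetter = 0,
    LowercaseLetter = 1,
    DecimalNumber = 2,
    LineSeparator = 3,
}

/// A set of categories stored as one bit per category.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyCategoryGroup(u32);

impl ToyCategoryGroup {
    /// A group holding exactly one category.
    pub const fn of(category: ToyCategory) -> Self {
        ToyCategoryGroup(1 << (category as u32))
    }

    /// The group containing every category from both inputs.
    pub const fn union(self, other: Self) -> Self {
        ToyCategoryGroup(self.0 | other.0)
    }

    /// Tests whether the category's bit is set in this group.
    pub const fn contains(self, category: ToyCategory) -> bool {
        (self.0 & (1 << (category as u32))) != 0
    }
}

/// Example group built at compile time from two categories.
pub const UPPER_OR_LOWER: ToyCategoryGroup =
    ToyCategoryGroup::of(ToyCategory::UppercaseLetter)
        .union(ToyCategoryGroup::of(ToyCategory::LowercaseLetter));
```

Testing a category against a group then costs a single bitwise AND, for example `UPPER_OR_LOWER.contains(ToyCategory::LowercaseLetter)`.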
Enums
EnumeratedProperty: Selection constants for Unicode properties. These constants are used to select one of the Unicode properties. See UProperty in ICU4C.
GeneralCategory: Enumerated property General_Category.