pub trait UTFStringExtensions {
// Required methods
fn count_graphemes(&self) -> usize;
fn get_grapheme(&self, index: usize) -> &str;
fn get_graphemes(&self) -> Vec<&str>;
fn get_grapheme_chunk(&self, offset: usize) -> Vec<&str>;
// Provided methods
fn take_grapheme<'a>(
&self,
graphemes: &Vec<&'a str>,
index: usize,
) -> RUMString { ... }
fn get_grapheme_window(
&self,
min: usize,
max: usize,
offset: usize,
) -> RUMString { ... }
fn get_grapheme_string(&self, end_pattern: &str, offset: usize) -> RUMString { ... }
fn find_grapheme(&self, pattern: &str, offset: usize) -> &str { ... }
fn truncate(&self, max_size: usize) -> RUMString { ... }
}
An indexing trait implemented for String and str that uses the UnicodeSegmentation facilities to enable grapheme iteration by default. This may carry some performance penalty, but it allows for native Unicode support to the best extent possible.
We also enable decoding from Encoding Standard encodings to UTF-8.
Required Methods
fn count_graphemes(&self) -> usize
fn get_grapheme(&self, index: usize) -> &str
Return a grapheme unit which could span multiple Unicode codepoints or “characters”.
Note
If the requested grapheme does not exist, this method returns a blank string.
Instead of just retrieving a code point as a character, I decided to take it a step further and support grapheme selection, so that characters in written languages like Sanskrit can be properly selected and evaluated.
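To illustrate why code point indexing is not enough, here is a minimal sketch using only the standard library. The helper `sizes` is hypothetical and not part of this trait. Both sample strings contain two code points, yet a grapheme-aware method such as get_grapheme would be expected to treat each as a single unit.

```rust
/// Hypothetical helper (not part of this trait): returns the UTF-8 byte
/// count and the Unicode code point count of `s`.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
    let accented = "e\u{301}";
    // U+0928 DEVANAGARI LETTER NA followed by U+093F DEVANAGARI VOWEL SIGN I.
    let devanagari = "\u{928}\u{93F}";

    for s in [accented, devanagari] {
        let (bytes, code_points) = sizes(s);
        // Neither count matches what a reader perceives as one character.
        println!("{}: {} bytes, {} code points", s, bytes, code_points);
    }
}
```

Indexing by `char` would split each of these strings in the middle of what a reader sees as one character, which is the case grapheme selection is meant to handle.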
Caution: This can be an extremely slow operation on large strings, since every call to this method rescans the input string to look up a grapheme. Unfortunately, this is a side effect of convenience. To improve performance, call get_graphemes() once and then call take_grapheme() on the resulting vector.
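The difference between rescanning on every lookup and segmenting once can be sketched as follows. This is a simplified stand-in, not this crate's implementation: `segment`, `lookup_rescan`, and `lookup_cached` are hypothetical names, and `segment` only attaches combining marks in U+0300..=U+036F to the preceding character, whereas the real trait relies on UnicodeSegmentation. The stand-ins also return &str, while take_grapheme returns a RUMString.

```rust
/// Hypothetical, simplified segmenter: a base character plus any following
/// combining marks in U+0300..=U+036F forms one cluster. This is NOT full
/// Unicode grapheme segmentation; it only stands in for get_graphemes().
fn segment(s: &str) -> Vec<&str> {
    let mut clusters = Vec::new();
    let mut start = 0;
    for (i, c) in s.char_indices() {
        let is_combining = ('\u{300}'..='\u{36F}').contains(&c);
        // A non-combining character after the first starts a new cluster.
        if i > start && !is_combining {
            clusters.push(&s[start..i]);
            start = i;
        }
    }
    if start < s.len() {
        clusters.push(&s[start..]);
    }
    clusters
}

/// Slow pattern: re-segments the whole string on every call, which is the
/// cost the caution above describes for get_grapheme().
fn lookup_rescan(s: &str, index: usize) -> &str {
    segment(s).get(index).copied().unwrap_or("")
}

/// Fast pattern: indexes into clusters that were computed once up front.
/// Returns a blank string when the index is out of range.
fn lookup_cached<'a>(graphemes: &[&'a str], index: usize) -> &'a str {
    graphemes.get(index).copied().unwrap_or("")
}

fn main() {
    let text = "e\u{301}a\u{300}b";
    // Segment once, then reuse the vector for every lookup.
    let graphemes = segment(text);
    for i in 0..graphemes.len() {
        assert_eq!(lookup_rescan(text, i), lookup_cached(&graphemes, i));
        println!("{}: {}", i, lookup_cached(&graphemes, i));
    }
}
```

With n clusters, looking up every index through `lookup_rescan` walks the string n times, while the cached variant walks it once and then performs constant-time vector lookups.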