Trait UTFStringExtensions

pub trait UTFStringExtensions {
    // Required methods
    fn count_graphemes(&self) -> usize;
    fn get_grapheme(&self, index: usize) -> &str;
    fn get_graphemes(&self) -> Vec<&str>;
    fn get_grapheme_chunk(&self, offset: usize) -> Vec<&str>;

    // Provided methods
    fn take_grapheme<'a>(
        &self,
        graphemes: &Vec<&'a str>,
        index: usize,
    ) -> RUMString { ... }
    fn get_grapheme_window(
        &self,
        min: usize,
        max: usize,
        offset: usize,
    ) -> RUMString { ... }
    fn get_grapheme_string(&self, end_pattern: &str, offset: usize) -> RUMString { ... }
    fn find_grapheme(&self, pattern: &str, offset: usize) -> &str { ... }
    fn truncate(&self, max_size: usize) -> RUMString { ... }
}

An indexing trait implemented for String and str. It uses the UnicodeSegmentation facilities so that iteration works on graphemes by default. This may carry some performance penalty, but it allows for native Unicode support to the best extent possible.

We also enable decoding from Encoding Standard encodings to UTF-8.

Required Methods

fn count_graphemes(&self) -> usize

fn get_grapheme(&self, index: usize) -> &str

Return a grapheme unit which could span multiple Unicode codepoints or “characters”.

Note

    If the requested grapheme does not exist, this method returns a blank string.

Instead of just retrieving a codepoint as a character, I decided to take it a step further and support grapheme selection, so that characters in written languages such as Sanskrit can be selected and evaluated properly.
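To make the difference between codepoints and graphemes concrete, here is a minimal, std-only sketch. It is not the crate's implementation: the real trait relies on UnicodeSegmentation, whereas the hypothetical `clusters` helper below only glues combining marks in U+0300..=U+036F onto the preceding character. The free function `get_grapheme` is likewise an illustrative stand-in that mirrors the documented contract of returning a blank string for a missing index.

```rust
// Simplified stand-in for real grapheme segmentation (an assumption for
// illustration): a cluster is one char plus any following combining marks
// in U+0300..=U+036F. UnicodeSegmentation covers far more cases.
fn is_combining(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

// Hypothetical helper: split `s` into approximate grapheme clusters.
fn clusters(s: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, c) in s.char_indices() {
        // A non-combining char after the first position starts a new cluster.
        if i > 0 && !is_combining(c) {
            out.push(&s[start..i]);
            start = i;
        }
    }
    if !s.is_empty() {
        out.push(&s[start..]);
    }
    out
}

// Mirrors the documented behavior: a missing index yields "".
fn get_grapheme(s: &str, index: usize) -> &str {
    clusters(s).get(index).copied().unwrap_or("")
}

fn main() {
    let s = "e\u{0301}a"; // 'e' + combining acute accent, then 'a'
    assert_eq!(s.chars().count(), 3); // three codepoints
    assert_eq!(clusters(s).len(), 2); // two user-perceived characters
    assert_eq!(get_grapheme(s, 0), "e\u{0301}");
    assert_eq!(get_grapheme(s, 1), "a");
    assert_eq!(get_grapheme(s, 5), ""); // out of range: blank string
}
```

Indexing by `char` would split the accented letter into two pieces; indexing by cluster keeps it whole, which is the behavior this method aims for.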

[!CAUTION] This can be an extremely slow operation on large strings, since every call to this method rescans the input string to look up a grapheme. Unfortunately, this is a side effect of convenience. To improve performance, call .get_graphemes() once and then call take_grapheme() on the resulting vector.
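The cost difference described above can be sketched with two lookup styles. This is an illustration under stated assumptions, not the crate's code: `clusters` is a simplified, hypothetical segmenter (combining marks in U+0300..=U+036F only), `lookup_rescanning` plays the role of get_grapheme(), and `lookup_cached` plays the role of take_grapheme() over a vector built once. Note that the real take_grapheme() returns a RUMString; the sketch returns &str to stay std-only.

```rust
// Hypothetical, simplified segmenter used only for this sketch.
fn clusters(s: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, c) in s.char_indices() {
        let combining = ('\u{0300}'..='\u{036F}').contains(&c);
        if i > 0 && !combining {
            out.push(&s[start..i]);
            start = i;
        }
    }
    if !s.is_empty() {
        out.push(&s[start..]);
    }
    out
}

// Re-segments the whole string on every call: O(n) work per lookup.
fn lookup_rescanning(s: &str, index: usize) -> &str {
    clusters(s).get(index).copied().unwrap_or("")
}

// Indexes a vector that was segmented once: O(1) work per lookup.
fn lookup_cached<'a>(graphemes: &[&'a str], index: usize) -> &'a str {
    graphemes.get(index).copied().unwrap_or("")
}

fn main() {
    let text = "a\u{0301}bc";

    // Slow pattern: one full scan of `text` per index.
    let slow: Vec<&str> = (0..3).map(|i| lookup_rescanning(text, i)).collect();

    // Faster pattern: scan once, then index the cached vector.
    let cached = clusters(text);
    let fast: Vec<&str> = (0..3).map(|i| lookup_cached(&cached, i)).collect();

    // Both patterns return the same graphemes; only the cost differs.
    assert_eq!(slow, fast);
}
```

For a string of n graphemes, walking every index with the rescanning style does on the order of n² work, while the cached style does one linear pass plus constant-time lookups.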
