Trait textwrap::word_separators::WordSeparator

pub trait WordSeparator: WordSeparatorClone + Debug {
    fn find_words<'a>(
        &self,
        line: &'a str
    ) -> Box<dyn Iterator<Item = Word<'a>> + 'a>;
}

Describes where words occur in a line of text.

The simplest approach is to say that words are separated by one or more ASCII spaces (' '). This works for Western languages without emojis. A more complex approach is to use the Unicode line breaking algorithm, which also finds break points in non-ASCII text.
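The ASCII-space approach can be sketched with the standard library alone. The helper below, `split_on_ascii_spaces`, is a hypothetical name and not part of textwrap. It only illustrates the idea that each chunk is a word followed by the run of spaces after it, and it makes no claim about how the crate implements this internally.

```rust
/// Hypothetical helper (not a textwrap API): split `line` into chunks,
/// where each chunk is a word plus the run of ASCII spaces that follows it.
fn split_on_ascii_spaces(line: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut in_whitespace = false;

    for (idx, ch) in line.char_indices() {
        // A non-space character right after a run of spaces starts a new
        // chunk, so the previous chunk ends just before it.
        if in_whitespace && ch != ' ' {
            chunks.push(&line[start..idx]);
            start = idx;
        }
        in_whitespace = ch == ' ';
    }

    // Whatever remains after the last break is the final chunk.
    if start < line.len() {
        chunks.push(&line[start..]);
    }
    chunks
}
```

Only U+0020 counts as a separator in this sketch, so tabs and non-breaking spaces stay inside a chunk.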

Line breaks occur between words. See the WordSplitter trait for options on how to handle hyphenation of individual words.

Examples

use textwrap::core::Word;
use textwrap::word_separators::{WordSeparator, AsciiSpace};

let words = AsciiSpace.find_words("Hello World!").collect::<Vec<_>>();
assert_eq!(words, vec![Word::from("Hello "), Word::from("World!")]);

Required methods

Find all words in line.
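To show the shape of an implementation without depending on the crate, here is a self-contained sketch. The local `Word` struct and `WordSeparator` trait are simplified stand-ins for the textwrap items (the `WordSeparatorClone` bound is left out), and `CommaSeparator` is a hypothetical separator invented for illustration. The point is the boxed iterator that borrows from `line` for the lifetime `'a`.

```rust
use std::fmt::Debug;

// Stand-in for textwrap::core::Word, reduced to two fields (assumption:
// the real type carries more information than this sketch needs).
#[derive(Debug, PartialEq)]
struct Word<'a> {
    word: &'a str,
    whitespace: &'a str,
}

// Stand-in for the trait shown above, without the WordSeparatorClone bound.
trait WordSeparator: Debug {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = Word<'a>> + 'a>;
}

// Hypothetical separator: every comma ends a word and stays attached to it.
#[derive(Debug)]
struct CommaSeparator;

impl WordSeparator for CommaSeparator {
    fn find_words<'a>(&self, line: &'a str) -> Box<dyn Iterator<Item = Word<'a>> + 'a> {
        // The iterator borrows only from `line`, so it can outlive `&self`.
        Box::new(
            line.split_inclusive(',')
                .map(|chunk| Word { word: chunk, whitespace: "" }),
        )
    }
}
```

Returning a boxed trait object keeps the trait usable as `dyn WordSeparator`, at the cost of one allocation per call.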

Implementations on Foreign Types

Implementors

Split line into words separated by regions of ' ' characters.

Examples

use textwrap::core::Word;
use textwrap::word_separators::{AsciiSpace, WordSeparator};

let words = AsciiSpace.find_words("Hello   World!").collect::<Vec<_>>();
assert_eq!(words, vec![Word::from("Hello   "),
                       Word::from("World!")]);

Split line into words using Unicode break properties.

This word separator uses the Unicode line breaking algorithm described in Unicode Standard Annex #14 to find legal places to break lines. There is a small difference in that the U+002D (Hyphen-Minus) and U+00AD (Soft Hyphen) don’t create a line break: to allow a line break at a hyphen, use the HyphenSplitter. Soft hyphens are not currently supported.

Examples

Unlike AsciiSpace, the Unicode line breaking algorithm will find line break opportunities between some characters with no intervening whitespace:

#[cfg(feature = "unicode-linebreak")] {
use textwrap::word_separators::{WordSeparator, UnicodeBreakProperties};
use textwrap::core::Word;

assert_eq!(UnicodeBreakProperties.find_words("Emojis: 😂😍").collect::<Vec<_>>(),
           vec![Word::from("Emojis: "),
                Word::from("😂"),
                Word::from("😍")]);

assert_eq!(UnicodeBreakProperties.find_words("CJK: 你好").collect::<Vec<_>>(),
           vec![Word::from("CJK: "),
                Word::from("你"),
                Word::from("好")]);
}

A U+2060 (Word Joiner) character can be inserted if you want to manually override the defaults and keep the characters together:

#[cfg(feature = "unicode-linebreak")] {
use textwrap::word_separators::{UnicodeBreakProperties, WordSeparator};
use textwrap::core::Word;

assert_eq!(UnicodeBreakProperties.find_words("Emojis: 😂\u{2060}😍").collect::<Vec<_>>(),
           vec![Word::from("Emojis: "),
                Word::from("😂\u{2060}😍")]);
}
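If several characters should stay together, inserting the joiner by hand gets tedious. The sketch below uses a hypothetical helper, `glue`, which is not part of textwrap. It places U+2060 between every pair of `char`s in its input. It works on Unicode scalar values rather than grapheme clusters, so it may insert joiners inside multi-codepoint emoji.

```rust
/// Hypothetical helper (not a textwrap API): insert U+2060 (Word Joiner)
/// between consecutive characters so the text offers no break opportunity.
fn glue(text: &str) -> String {
    let mut out = String::with_capacity(text.len() * 2);
    for (i, ch) in text.chars().enumerate() {
        // No joiner before the first character.
        if i > 0 {
            out.push('\u{2060}');
        }
        out.push(ch);
    }
    out
}
```

The result can be embedded in a larger string, for example `format!("Emojis: {}", glue("😂😍"))`, before it is handed to a word separator.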

The Unicode line breaking algorithm will also automatically suppress line breaks around certain punctuation characters:

#[cfg(feature = "unicode-linebreak")] {
use textwrap::word_separators::{UnicodeBreakProperties, WordSeparator};
use textwrap::core::Word;

assert_eq!(UnicodeBreakProperties.find_words("[ foo ] bar !").collect::<Vec<_>>(),
           vec![Word::from("[ foo ] "),
                Word::from("bar !")]);
}