Crate words_count

§Words Count

Counts the words and characters in a string; characters can be counted with or without whitespace.

The algorithm roughly follows the way LibreOffice counts words, so its results do not exactly match the Unicode Text Segmentation standard (UAX #29).

§Examples

use words_count::WordsCount;

assert_eq!(WordsCount {
    words: 20,
    characters: 31,
    whitespaces: 2,
    cjk: 18,
}, words_count::count("Rust是由 Mozilla 主導開發的通用、編譯型程式語言。"));

let result = words_count::count_separately("apple banana apple");

assert_eq!(2, result.len());
assert_eq!(Some(&2), result.get("apple"));
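The second example treats the result of `count_separately` as a map from each distinct word to its number of occurrences. As a rough illustration of that shape only, the following std-only sketch splits on whitespace and tallies the tokens. The name `sketch_count_separately` is hypothetical, and the sketch ignores the crate's CJK and dash rules, so it is not the crate's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical helper: tallies whitespace-separated tokens.
/// This is a simplification; it does not apply the crate's CJK or dash rules.
fn sketch_count_separately(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Insert the word with a count of 0 if it is new, then increment.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

For the input `"apple banana apple"` this sketch yields two entries, with `"apple"` mapped to 2, which agrees with the assertions above.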

Structs§

WordsCount

Constants§

NEWLINE
A WordsCount equivalent to words_count::count("\n").

Functions§

count
Counts the words in the given string. In general, every run of non-CJK characters between two whitespace characters is a word. A run of at least two dashes is also a word boundary. Each CJK character counts as an independent word.
count_separately
Counts the occurrences of each distinct word in the given string. In general, every run of non-CJK characters between two whitespace characters is a word. A run of at least two dashes is also a word boundary. Each CJK character counts as an independent word. Punctuation is not handled.
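
The word rule described for these functions can be sketched with the standard library alone. The code below is a simplified illustration, not the crate's implementation: `is_cjk` and `sketch_count_words` are hypothetical names, the CJK test covers only a few common Unicode blocks, and dropping pieces that consist solely of dashes is a choice made for this sketch.

```rust
/// Rough CJK test covering a few common blocks only (an assumption of this sketch).
fn is_cjk(c: char) -> bool {
    matches!(
        c,
        '\u{3001}'..='\u{303F}'   // CJK symbols and punctuation
        | '\u{3040}'..='\u{30FF}' // Hiragana and Katakana
        | '\u{3400}'..='\u{4DBF}' // CJK Unified Ideographs Extension A
        | '\u{4E00}'..='\u{9FFF}' // CJK Unified Ideographs
    )
}

/// Counts words under the simplified rule: whitespace and "--" separate
/// words, each CJK character is its own word, and each run of non-CJK
/// characters is one word.
fn sketch_count_words(text: &str) -> usize {
    let mut words = 0;
    for token in text.split_whitespace() {
        // Two consecutive dashes act as a boundary inside a token.
        for piece in token.split("--") {
            // Skip empty pieces and pieces made only of leftover dashes.
            if piece.trim_matches('-').is_empty() {
                continue;
            }
            let mut in_run = false; // currently inside a non-CJK run?
            for c in piece.chars() {
                if is_cjk(c) {
                    words += 1;
                    in_run = false;
                } else if !in_run {
                    words += 1;
                    in_run = true;
                }
            }
        }
    }
    words
}
```

Under this sketch, `"well-known"` counts as one word because a single dash is not a boundary, while `"one--two"` counts as two. Results can differ from the crate's `count` on other inputs, since the real CJK classification and dash handling may be broader.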