
ICU text boundary analysis support for Rust

This crate provides a Rust implementation of the ICU text boundary analysis APIs in ubrk.h. Character (grapheme cluster), word, line-break, and sentence iterators are available.


Sample usage is given below.

use rust_icu_sys as sys;
use rust_icu_ubrk as ubrk;

let text = "The lazy dog jumped over the fox.";
let mut iter =
    ubrk::UBreakIterator::try_new(sys::UBreakIteratorType::UBRK_WORD, "en", text)
        .unwrap();

assert_eq!(0, iter.first());
assert_eq!(None, iter.previous());
assert_eq!(0, iter.current());

let text_len = text.len() as i32;
assert_eq!(iter.last_boundary(), text_len);
assert_eq!(None, iter.next());
assert_eq!(iter.current(), text_len);

let word_start = text.find("jumped").unwrap() as i32;
let word_end = word_start + 6;
assert!(!iter.is_boundary(word_start + 3));
assert_eq!(word_end, iter.following(word_start + 3));
assert_eq!(word_end, iter.current());
assert_eq!(Some(word_start), iter.previous());
assert_eq!(word_start, iter.current());
assert_eq!(Some(word_end), iter.next());
assert_eq!(word_end, iter.current());
assert_eq!(word_start, iter.preceding(word_start + 3));
assert_eq!(word_start, iter.current());

// Reset to first boundary and consume `iter`.
assert_eq!(0, iter.first());
let boundaries: Vec<i32> = iter.collect();
assert_eq!(vec![3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33], boundaries);

See the ICU user guide and the C API documentation in the ubrk.h header for details.
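The boundary offsets collected in the sample can be turned back into text segments by pairing each offset with the one before it. The following is a minimal sketch that uses only the standard library; the helper names `segments` and `words` are hypothetical and not part of this crate. It assumes ASCII input, so that the offsets can be used directly as byte indices into the `&str`; for other text the offsets may need to be converted first.

```rust
/// Slices `text` at each boundary offset, starting from offset 0.
/// Assumes every offset is a valid byte index on a char boundary,
/// which holds for ASCII input such as the sample sentence.
fn segments<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    let mut start = 0usize;
    let mut out = Vec::with_capacity(boundaries.len());
    for &b in boundaries {
        let end = b as usize;
        out.push(&text[start..end]);
        start = end;
    }
    out
}

/// Keeps only the segments that contain a letter or digit,
/// dropping the whitespace and punctuation segments.
fn words<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    segments(text, boundaries)
        .into_iter()
        .filter(|s| s.chars().any(char::is_alphanumeric))
        .collect()
}

fn main() {
    let text = "The lazy dog jumped over the fox.";
    // Offsets copied from the sample above.
    let boundaries: Vec<i32> = vec![3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33];

    let segs = segments(text, &boundaries);
    assert_eq!(segs[0], "The");
    assert_eq!(segs[1], " ");

    let expected = vec!["The", "lazy", "dog", "jumped", "over", "the", "fox"];
    assert_eq!(words(text, &boundaries), expected);
}
```

Word iteration reports a boundary on both sides of every space and punctuation mark, which is why the filter step is needed to recover only the words.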


Iterator over the locales for which text breaking information is available.

Rust wrapper for the ICU UBreakIterator type.


Returned by the break iterator to indicate that all text boundaries have been returned.
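In the sample code, `next()` and `previous()` yield `None` at either end of the text, while a C-style API signals the same condition with a sentinel integer. A minimal sketch of that translation follows, assuming the sentinel is `-1` (the value of `UBRK_DONE` in ICU's C header). The names `DONE_SENTINEL`, `to_boundary`, and `raw_next` are hypothetical stand-ins, not this crate's API.

```rust
/// Hypothetical stand-in for the sentinel; ICU's C header defines UBRK_DONE as -1.
const DONE_SENTINEL: i32 = -1;

/// Maps a raw C-style return value to an `Option`, hiding the sentinel.
fn to_boundary(raw: i32) -> Option<i32> {
    if raw == DONE_SENTINEL {
        None
    } else {
        Some(raw)
    }
}

/// Toy C-style "next" over a fixed boundary list: returns the first boundary
/// after `pos`, or the sentinel when none remain.
fn raw_next(boundaries: &[i32], pos: i32) -> i32 {
    boundaries
        .iter()
        .copied()
        .find(|&b| b > pos)
        .unwrap_or(DONE_SENTINEL)
}

fn main() {
    let boundaries = [0, 3, 4, 8];
    // A boundary exists after offset 3, so the result is wrapped in `Some`.
    assert_eq!(to_boundary(raw_next(&boundaries, 3)), Some(4));
    // No boundary exists after the last one, so the sentinel becomes `None`.
    assert_eq!(to_boundary(raw_next(&boundaries, 8)), None);
}
```

Returning `Option<i32>` lets the iterator plug into Rust's `Iterator` trait, as the `collect()` call in the sample relies on.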