Crate rust_icu_ubrk
ICU text boundary analysis support for Rust
This crate provides a Rust implementation of the ICU text boundary analysis APIs in ubrk.h. Character (grapheme cluster), word, line-break, and sentence iterators are available.
Examples
Sample usage is given below.
use rust_icu_sys as sys;
use rust_icu_ubrk as ubrk;

let text = "The lazy dog jumped over the fox.";
// Create a word-break iterator over `text` for the "en" locale.
let mut iter =
    ubrk::UBreakIterator::try_new(sys::UBreakIteratorType::UBRK_WORD, "en", text)
        .unwrap();

// The start of the text is the first boundary; nothing precedes it.
assert!(iter.is_boundary(0));
assert_eq!(0, iter.first());
assert_eq!(None, iter.previous());
assert_eq!(0, iter.current());

// The end of the text is the last boundary; nothing follows it.
let text_len = text.len() as i32;
assert!(iter.is_boundary(text_len));
assert_eq!(iter.last_boundary(), text_len);
assert_eq!(None, iter.next());
assert_eq!(iter.current(), text_len);

// Both ends of the word "jumped" are boundaries; an offset inside it is not.
let word_start = text.find("jumped").unwrap() as i32;
let word_end = word_start + 6;
assert!(iter.is_boundary(word_start));
assert!(iter.is_boundary(word_end));
assert!(!iter.is_boundary(word_start + 3));

// `following` moves to the first boundary after the given offset.
assert_eq!(word_end, iter.following(word_start + 3));
assert_eq!(word_end, iter.current());
// `previous` and `next` step between adjacent boundaries.
assert_eq!(Some(word_start), iter.previous());
assert_eq!(word_start, iter.current());
assert_eq!(Some(word_end), iter.next());
assert_eq!(word_end, iter.current());
// `preceding` moves to the last boundary before the given offset.
assert_eq!(word_start, iter.preceding(word_start + 3));
assert_eq!(word_start, iter.current());

// Reset to first boundary and consume `iter`.
iter.first();
let boundaries: Vec<i32> = iter.collect();
assert_eq!(vec![3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33], boundaries);
See the ICU user guide and the C API documentation in the ubrk.h header for details.
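The boundary offsets collected at the end of the example can be turned back into pieces of the original text. The following is a minimal sketch that uses only the standard library; `segments` and `words` are hypothetical helper names, not part of this crate. It assumes the text is ASCII, as in the example, so that each offset can be used directly as a byte index into the `&str`. For non-ASCII text that assumption may not hold.

```rust
/// Splits `text` at each offset in `boundaries`, starting from offset 0.
///
/// Assumption: `text` is ASCII and every offset is non-negative, ascending,
/// and no larger than `text.len()`. Otherwise the slice operation panics.
fn segments<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    let mut start = 0usize;
    let mut out = Vec::with_capacity(boundaries.len());
    for &b in boundaries {
        let end = b as usize;
        out.push(&text[start..end]);
        start = end;
    }
    out
}

/// Keeps only the segments that contain at least one alphanumeric character,
/// dropping the whitespace and punctuation segments between words.
fn words<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    segments(text, boundaries)
        .into_iter()
        .filter(|s| s.chars().any(char::is_alphanumeric))
        .collect()
}
```

Calling `words(text, &boundaries)` with the `text` and `boundaries` values from the example gives the seven words of the sentence, without the spaces and the final period.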
Structs
Iterator over the locales for which text breaking information is available.
Rust wrapper for the ICU UBreakIterator type.
Constants
Returned by break iterator to indicate that all text boundaries have been returned.
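As a rough illustration of how a "done" sentinel of this kind relates to the `Option` values seen in the example (`None` from `next()` and `previous()`), here is a standard-library-only sketch. All names in it (`RAW_DONE`, `to_boundary`, `FakeBreaks`) are hypothetical and do not reflect the crate's actual code. The value -1 mirrors ICU's `UBRK_DONE`.

```rust
/// Hypothetical stand-in for a "done" sentinel; ICU's UBRK_DONE is -1.
const RAW_DONE: i32 = -1;

/// Maps a raw C-style return value to an `Option`: the sentinel becomes `None`.
fn to_boundary(raw: i32) -> Option<i32> {
    if raw == RAW_DONE {
        None
    } else {
        Some(raw)
    }
}

/// Toy iterator over a fixed list of boundary offsets (illustration only).
/// `pos` is the index of the current boundary within `offsets`.
struct FakeBreaks {
    offsets: Vec<i32>,
    pos: usize,
}

impl FakeBreaks {
    /// C-style "next": advances and returns the new offset,
    /// or `RAW_DONE` once no boundaries remain.
    fn raw_next(&mut self) -> i32 {
        if self.pos + 1 < self.offsets.len() {
            self.pos += 1;
            self.offsets[self.pos]
        } else {
            RAW_DONE
        }
    }
}

impl Iterator for FakeBreaks {
    type Item = i32;

    /// Rust-style "next": hides the sentinel behind `Option`.
    fn next(&mut self) -> Option<i32> {
        to_boundary(self.raw_next())
    }
}
```

With `offsets` set to `[0, 3, 4]` and `pos` at 0, collecting the iterator yields 3 and 4, then stops. This matches the shape of the example, where the boundary at the current position is not yielded again.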