Crate rust_icu_ubrk
§ICU text boundary analysis support for Rust
This crate provides a Rust wrapper around the ICU text boundary analysis APIs declared in ubrk.h. Break iterators are available for characters (grapheme clusters), words, line breaks, and sentences.
§Examples
The following example creates a word break iterator and moves between its boundaries.
```rust
use rust_icu_sys as sys;
use rust_icu_ubrk as ubrk;
let text = "The lazy dog jumped over the fox.";
// Create a word break iterator for English over `text`.
let mut iter =
ubrk::UBreakIterator::try_new(sys::UBreakIteratorType::UBRK_WORD, "en", text)
.unwrap();
// Offset 0 is the first boundary; there is nothing before it.
assert!(iter.is_boundary(0));
assert_eq!(0, iter.first());
assert_eq!(None, iter.previous());
assert_eq!(0, iter.current());
// Offsets are UTF-16 code units; for this ASCII-only text they equal byte offsets.
let text_len = text.len() as i32;
assert!(iter.is_boundary(text_len));
assert_eq!(iter.last_boundary(), text_len);
assert_eq!(None, iter.next());
assert_eq!(iter.current(), text_len);
// "jumped" spans [word_start, word_end); offsets inside it are not boundaries.
let word_start = text.find("jumped").unwrap() as i32;
let word_end = word_start + 6;
assert!(iter.is_boundary(word_start));
assert!(iter.is_boundary(word_end));
assert!(!iter.is_boundary(word_start + 3));
// following() moves to the first boundary after the given offset.
assert_eq!(word_end, iter.following(word_start + 3));
assert_eq!(word_end, iter.current());
assert_eq!(Some(word_start), iter.previous());
assert_eq!(word_start, iter.current());
assert_eq!(Some(word_end), iter.next());
assert_eq!(word_end, iter.current());
// preceding() moves to the last boundary before the given offset.
assert_eq!(word_start, iter.preceding(word_start + 3));
assert_eq!(word_start, iter.current());
// Reset to first boundary and consume `iter`.
iter.first();
let boundaries: Vec<i32> = iter.collect();
assert_eq!(vec![3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33], boundaries);
```
See the ICU user guide and the C API documentation in the ubrk.h header for details.
Structs§
- Iterator over the locales for which text breaking information is available.
- Rust wrapper for the ICU UBreakIterator type.
Constants§
- Returned by the break iterator to indicate that all text boundaries have been returned.