Crate rust_icu_ubrk

§ICU text boundary analysis support for Rust

This crate provides Rust wrappers for the ICU text boundary analysis APIs declared in ubrk.h. Break iterators are available for characters (grapheme clusters), words, line breaks, and sentences.

§Examples

The following example walks a word break iterator over a short English sentence.

use rust_icu_sys as sys;
use rust_icu_ubrk as ubrk;

let text = "The lazy dog jumped over the fox.";
let mut iter =
    ubrk::UBreakIterator::try_new(sys::UBreakIteratorType::UBRK_WORD, "en", text)
        .unwrap();

assert!(iter.is_boundary(0));
assert_eq!(0, iter.first());
assert_eq!(None, iter.previous());
assert_eq!(0, iter.current());

let text_len = text.len() as i32;
assert!(iter.is_boundary(text_len));
assert_eq!(iter.last_boundary(), text_len);
assert_eq!(None, iter.next());
assert_eq!(iter.current(), text_len);

let word_start = text.find("jumped").unwrap() as i32;
let word_end = word_start + 6;
assert!(iter.is_boundary(word_start));
assert!(iter.is_boundary(word_end));
assert!(!iter.is_boundary(word_start + 3));
assert_eq!(word_end, iter.following(word_start + 3));
assert_eq!(word_end, iter.current());
assert_eq!(Some(word_start), iter.previous());
assert_eq!(word_start, iter.current());
assert_eq!(Some(word_end), iter.next());
assert_eq!(word_end, iter.current());
assert_eq!(word_start, iter.preceding(word_start + 3));
assert_eq!(word_start, iter.current());

// Reset to the first boundary (offset 0) and consume `iter`. Iteration
// advances before yielding, so offset 0 itself is not collected.
iter.first();
let boundaries: Vec<i32> = iter.collect();
assert_eq!(vec![3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33], boundaries);
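To make the offset semantics in the example concrete, the sketch below models `is_boundary`, `following`, and `preceding` over the sample sentence's word boundaries using only the standard library: `following` yields the first boundary strictly after an offset, and `preceding` the last one strictly before it. `BoundaryModel` is a hypothetical type, not part of the crate. Unlike the real iterator, it is stateless (it does not move a current position), and it returns `Option<i32>` where the example's `following` and `preceding` return a plain `i32`.

```rust
/// Hypothetical, std-only model of break-iterator lookups over a precomputed,
/// ascending list of boundary offsets. Not part of `rust_icu_ubrk`.
struct BoundaryModel {
    boundaries: Vec<i32>, // sorted ascending; includes 0 and the text length
}

impl BoundaryModel {
    /// True if `offset` is exactly one of the boundaries.
    fn is_boundary(&self, offset: i32) -> bool {
        self.boundaries.binary_search(&offset).is_ok()
    }

    /// First boundary strictly after `offset`, or None past the last boundary.
    fn following(&self, offset: i32) -> Option<i32> {
        let idx = self.boundaries.partition_point(|&b| b <= offset);
        self.boundaries.get(idx).copied()
    }

    /// Last boundary strictly before `offset`, or None at or before the first one.
    fn preceding(&self, offset: i32) -> Option<i32> {
        let idx = self.boundaries.partition_point(|&b| b < offset);
        idx.checked_sub(1).map(|i| self.boundaries[i])
    }
}

fn main() {
    // Word boundaries of "The lazy dog jumped over the fox.", including 0.
    let model = BoundaryModel {
        boundaries: vec![0, 3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33],
    };
    let (word_start, word_end) = (13, 19); // the word "jumped"
    assert!(model.is_boundary(word_start) && model.is_boundary(word_end));
    assert!(!model.is_boundary(word_start + 3));
    assert_eq!(Some(word_end), model.following(word_start + 3));
    assert_eq!(Some(word_start), model.preceding(word_start + 3));
    assert_eq!(None, model.following(33));
    assert_eq!(None, model.preceding(0));
    println!("model agrees with the example");
}
```

Binary search keeps each lookup logarithmic; ICU computes boundaries from its rules on demand rather than from a stored list, so this only mirrors the observable results.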

See the ICU user guide and the C API documentation in the ubrk.h header for details.
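Collected boundaries can be turned into text segments by slicing between consecutive offsets. One caveat, assuming the iterator works on UTF-16 text as ICU's ubrk.h API does: offsets then count UTF-16 code units, which coincide with byte offsets only for ASCII input such as the sample sentence. The sketch below converts such offsets to byte offsets before slicing a `&str`. `utf16_to_byte_offset` and `segments` are hypothetical helpers, and the boundary values in `main` are hand-computed for illustration, not produced by ICU.

```rust
/// Hypothetical helper: maps a UTF-16 code-unit offset to a byte offset in `text`.
/// Returns None if the offset is past the end or falls inside a surrogate pair.
fn utf16_to_byte_offset(text: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in text.char_indices() {
        if units == utf16_offset {
            return Some(byte_idx);
        }
        units += ch.len_utf16();
        if units > utf16_offset {
            return None; // offset lands between the two halves of a surrogate pair
        }
    }
    // The end of the text is also a valid offset.
    if units == utf16_offset { Some(text.len()) } else { None }
}

/// Hypothetical helper: splits `text` at ascending UTF-16 boundary offsets.
fn segments<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    // Start at byte 0, since collected iterator output omits offset 0.
    let mut byte_offsets = vec![0usize];
    for &b in boundaries {
        // Negative values (such as a -1 sentinel) and unmappable offsets are skipped.
        if let Some(off) = usize::try_from(b)
            .ok()
            .and_then(|u| utf16_to_byte_offset(text, u))
        {
            // Keep offsets strictly increasing so every slice is non-empty.
            if off > *byte_offsets.last().unwrap() {
                byte_offsets.push(off);
            }
        }
    }
    byte_offsets.windows(2).map(|w| &text[w[0]..w[1]]).collect()
}

fn main() {
    // "é" is 1 UTF-16 unit but 2 UTF-8 bytes; U+1F600 is 2 UTF-16 units and 4 bytes.
    let text = "café \u{1F600} ok";
    // Hand-computed UTF-16 boundary offsets for illustration, not ICU output.
    let boundaries = [4, 5, 7, 8, 10];
    let segs = segments(text, &boundaries);
    assert_eq!(segs, ["café", " ", "\u{1F600}", " ", "ok"]);
    println!("{} segments", segs.len());
}
```

Each conversion rescans the text from the start, so this is quadratic in the worst case; a single forward pass over both the text and the sorted offsets would avoid that if performance mattered.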

§Structs

  • Iterator over the locales for which text breaking information is available.
  • Rust wrapper for the ICU UBreakIterator type.

§Constants

  • Returned by break iterator to indicate that all text boundaries have been returned.
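ICU's C header defines `UBRK_DONE` as -1, and the constant listed above appears to serve the same purpose. In the example, `previous()` and `next()` report exhaustion as `None` rather than as a raw sentinel. The sketch below shows one way such a translation can work; its local `UBRK_DONE` copy and both helper functions are assumptions for illustration, not the crate's code.

```rust
/// Local copy of ICU's UBRK_DONE value (-1 in the C header); assumed here,
/// not imported from `rust_icu_ubrk`.
const UBRK_DONE: i32 = -1;

/// Hypothetical helper: turns a raw boundary result into an Option,
/// treating UBRK_DONE as "no more boundaries".
fn boundary_or_none(raw: i32) -> Option<i32> {
    if raw == UBRK_DONE { None } else { Some(raw) }
}

/// Hypothetical helper: collects raw results up to (not including) the first
/// UBRK_DONE, the way iteration stops once `next()` returns None.
fn collect_until_done(raw: impl IntoIterator<Item = i32>) -> Vec<i32> {
    raw.into_iter().map_while(boundary_or_none).collect()
}

fn main() {
    // Simulated raw results of repeated "next boundary" calls on "The lazy dog".
    let raw = [3, 4, 8, 9, 12, UBRK_DONE];
    let boundaries = collect_until_done(raw);
    assert_eq!(boundaries, vec![3, 4, 8, 9, 12]);
    assert_eq!(boundary_or_none(UBRK_DONE), None);
    println!("{:?}", boundaries);
}
```

Mapping the sentinel to `None` lets callers use `while let` loops and iterator adapters instead of comparing against -1 by hand.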