Crate rust_icu_ubrk

ICU text boundary analysis support for Rust

This crate provides Rust bindings to the ICU text boundary analysis APIs declared in ubrk.h. Iterators over character (grapheme cluster), word, line-break, and sentence boundaries are available.

Examples

Sample usage is given below.

```rust
use rust_icu_sys as sys;
use rust_icu_ubrk as ubrk;

let text = "The lazy dog jumped over the fox.";
// Create a word-break iterator for the "en" locale over `text`.
let mut iter =
    ubrk::UBreakIterator::try_new(sys::UBreakIteratorType::UBRK_WORD, "en", text)
        .unwrap();

// The start of the text is always a boundary, with nothing before it.
assert!(iter.is_boundary(0));
assert_eq!(0, iter.first());
assert_eq!(None, iter.previous());
assert_eq!(0, iter.current());

// ICU reports offsets in UTF-16 code units; `text` is ASCII, so they equal byte offsets.
let text_len = text.len() as i32;
// The end of the text is always a boundary, with nothing after it.
assert!(iter.is_boundary(text_len));
assert_eq!(iter.last_boundary(), text_len);
assert_eq!(None, iter.next());
assert_eq!(iter.current(), text_len);

// "jumped" spans [word_start, word_end); navigate from an offset inside it.
let word_start = text.find("jumped").unwrap() as i32;
let word_end = word_start + 6;
assert!(iter.is_boundary(word_start));
assert!(iter.is_boundary(word_end));
assert!(!iter.is_boundary(word_start + 3));
// `following` moves to the first boundary after the given offset.
assert_eq!(word_end, iter.following(word_start + 3));
assert_eq!(word_end, iter.current());
assert_eq!(Some(word_start), iter.previous());
assert_eq!(word_start, iter.current());
assert_eq!(Some(word_end), iter.next());
assert_eq!(word_end, iter.current());
// `preceding` moves to the last boundary before the given offset.
assert_eq!(word_start, iter.preceding(word_start + 3));
assert_eq!(word_start, iter.current());

// Reset to first boundary and consume `iter`.
iter.first();
let boundaries: Vec<i32> = iter.collect();
assert_eq!(vec![3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33], boundaries);
```
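
Word boundaries alone do not say which segments are words: the example yields separate segments for spaces and the final period. The sketch below pairs consecutive boundaries into segments and keeps the word-like ones. It uses only the standard library and takes the offsets from the example as input. The helpers segments and words are hypothetical, not part of the crate, and the "contains an alphanumeric character" filter is a simple stand-in heuristic rather than ICU's own word classification. Slicing by these offsets is only valid when they are also byte offsets, which holds here because the text is ASCII.

```rust
/// Splits `text` into the segments delimited by consecutive boundary offsets.
///
/// Assumption: `boundaries` is sorted, starts at 0, and every offset is a
/// valid byte index into `text` (true for ASCII text, where ICU's UTF-16
/// offsets coincide with UTF-8 byte offsets).
fn segments<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    boundaries
        .windows(2)
        .map(|pair| &text[pair[0] as usize..pair[1] as usize])
        .collect()
}

/// Hypothetical heuristic: keeps segments containing at least one
/// alphanumeric character, dropping whitespace and punctuation segments.
fn words<'a>(text: &'a str, boundaries: &[i32]) -> Vec<&'a str> {
    segments(text, boundaries)
        .into_iter()
        .filter(|s| s.chars().any(char::is_alphanumeric))
        .collect()
}

fn main() {
    let text = "The lazy dog jumped over the fox.";
    // Boundary 0 followed by the offsets collected in the example above.
    let boundaries = [0, 3, 4, 8, 9, 12, 13, 19, 20, 24, 25, 28, 29, 32, 33];
    assert_eq!(
        vec!["The", "lazy", "dog", "jumped", "over", "the", "fox"],
        words(text, &boundaries)
    );
}
```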

See the ICU user guide and the C API documentation in the ubrk.h header for details.

Structs

Locales

Iterator over the locales for which text breaking information is available.

UBreakIterator

Rust wrapper for the ICU UBreakIterator type.
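
The wrapper hands the text to ICU, which works on UTF-16 strings. As far as we can tell from the crate's use of ICU's UChar API, the i32 offsets it returns are therefore UTF-16 code-unit indices, not UTF-8 byte indices. For non-ASCII text they need converting before you slice a Rust &str. The following is a minimal standard-library sketch of such a conversion. utf16_to_byte_offset is a hypothetical helper, not part of the crate.

```rust
/// Converts a UTF-16 code-unit offset into a UTF-8 byte offset into `text`.
///
/// Returns None if the offset lies past the end of `text` or inside a
/// surrogate pair (i.e. not on a character boundary).
fn utf16_to_byte_offset(text: &str, utf16_offset: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in text.char_indices() {
        if units == utf16_offset {
            return Some(byte_idx);
        }
        units += ch.len_utf16();
        if units > utf16_offset {
            // The offset falls in the middle of this character's surrogate pair.
            return None;
        }
    }
    // The end of the text is also a valid offset.
    if units == utf16_offset {
        Some(text.len())
    } else {
        None
    }
}

fn main() {
    // "é" is 2 bytes in UTF-8 but 1 code unit in UTF-16.
    assert_eq!(utf16_to_byte_offset("héllo", 2), Some(3));
    // "😀" is 4 bytes in UTF-8 but 2 code units (a surrogate pair) in UTF-16.
    assert_eq!(utf16_to_byte_offset("a😀b", 3), Some(5));
    assert_eq!(utf16_to_byte_offset("a😀b", 2), None);
}
```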

Constants

UBRK_DONE

Returned by a break iterator to indicate that all text boundaries have been returned.
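
In the C API, UBRK_DONE is defined as -1 and is returned in place of an offset when iteration runs past either end of the text. The example at the top of this page shows the Rust wrapper surfacing that case as None from previous() and next(). The following is a minimal sketch of that kind of sentinel-to-Option mapping. The constant is redeclared locally, and done_to_option is a hypothetical helper, not the crate's actual code.

```rust
/// Local stand-in for ICU's UBRK_DONE sentinel (defined as -1 in ubrk.h).
const UBRK_DONE: i32 = -1;

/// Hypothetical helper: turns a raw boundary from a C-level call such as
/// ubrk_next() into an Option, mapping the UBRK_DONE sentinel to None.
fn done_to_option(raw: i32) -> Option<i32> {
    if raw == UBRK_DONE {
        None
    } else {
        Some(raw)
    }
}

fn main() {
    assert_eq!(done_to_option(19), Some(19));
    assert_eq!(done_to_option(UBRK_DONE), None);
}
```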