Crate hyphenation

Text hyphenation in a variety of languages.

Usage

A typical import comprises the Hyphenation trait, the Standard hyphenator, and the Language enum. Together these expose the crate's core functionality and the set of available languages.

extern crate hyphenation;
use hyphenation::{Hyphenation, Standard, Language};

To begin with, we must load the Corpus for our working language.

let english_us = hyphenation::load(Language::English_US).unwrap();

Our English Corpus can now be used by hyphenators: iterators which segment text in accordance with hyphenation practices, as described by the corpus.

The simplest (and, presently, only) hyphenator is Standard:

let h: Standard = "hyphenation".hyphenate(&english_us);

The Standard hyphenator does not allocate new strings, returning slices instead.

let v: Vec<&str> = h.collect();
assert_eq!(v, vec!["hy", "phen", "ation"]);

Although hyphenation is performed word by word, Hyphenation works on full text by default for convenience.

let h2: Standard = "Word hyphenation by computer.".hyphenate(&english_us);
let v2: Vec<&str> = h2.collect();
assert_eq!(v2, vec!["Word hy", "phen", "ation by com", "puter."]);

Moreover, hyphenators expose some simple methods to render hyphenated text: punctuate() and punctuate_with(string), which mark hyphenation opportunities with soft hyphens (Unicode U+00AD SOFT HYPHEN) and with any given string, respectively.

let h3 = "anfractuous".hyphenate(&english_us);
let s3: String = h3.clone().punctuate().collect();
assert_eq!(s3, "an\u{ad}frac\u{ad}tu\u{ad}ous".to_owned());

let s4: String = h3.punctuate_with("-").collect();
assert_eq!(s4, "an-frac-tu-ous".to_owned());

If we would rather manipulate our text in other ways, we may employ opportunities(), which returns the byte indices of hyphenation opportunities within the string. (Internally, opportunities() is the fundamental method required by Hyphenation; other functionality is implemented on top of it.)

let indices = "hyphenation".opportunities(&english_us);
assert_eq!(indices, vec![2, 6]);
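
To illustrate how byte indices like these can drive other functionality, here is a minimal sketch in plain Rust that does not use the crate at all. The helper names segments and mark_with are hypothetical and are not part of this crate's API. The sketch assumes the indices are sorted and fall on character boundaries.

```rust
/// Split `text` at the given byte indices, borrowing from the original string.
/// Hypothetical helper: `breaks` is assumed to be sorted in ascending order
/// and to fall on `char` boundaries.
fn segments<'a>(text: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    let mut parts = Vec::with_capacity(breaks.len() + 1);
    let mut start = 0;
    for &i in breaks {
        parts.push(&text[start..i]);
        start = i;
    }
    // Whatever follows the last break is the final segment.
    parts.push(&text[start..]);
    parts
}

/// Join the segments with an arbitrary marker (hypothetical helper).
fn mark_with(text: &str, breaks: &[usize], mark: &str) -> String {
    segments(text, breaks).join(mark)
}

fn main() {
    let word = "hyphenation";
    let breaks: [usize; 2] = [2, 6];
    assert_eq!(segments(word, &breaks), vec!["hy", "phen", "ation"]);
    assert_eq!(mark_with(word, &breaks, "-"), "hy-phen-ation");
    assert_eq!(mark_with(word, &breaks, "\u{ad}"), "hy\u{ad}phen\u{ad}ation");
}
```

Because the segments are slices of the input, only the final joined String allocates new text.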

Reexports

pub use hyphenator::{Hyphenation, Standard};
pub use language::{Language, Corpus};
pub use load::{language as load};

Modules

exception

Data structures and methods for parsing and applying exceptions, which assign predetermined scores to specific words.

hyphenator

Hyphenating iterators.

language

Languages we can hyphenate and their default parameters, as provided by the TeX hyph-utf8 package.

load

IO operations for pattern and exception data provided by hyph-utf8 and stored in the patterns folder.

pattern

Data structures and methods for parsing and applying Knuth-Liang hyphenation patterns.
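
For readers unfamiliar with Knuth-Liang patterns, the following is a rough, self-contained sketch of the general technique, not this crate's implementation. All names here (parse_pattern, break_points, demo_patterns) are hypothetical, the pattern set is a small hand-picked illustration rather than the hyph-utf8 data, and the sketch assumes ASCII input and omits minimum-prefix and minimum-suffix limits as well as exceptions.

```rust
use std::collections::HashMap;

/// Split a pattern such as "hy3ph" into its letters ("hyph") and the score
/// attached to each gap around those letters ([0, 0, 3, 0, 0]).
fn parse_pattern(pattern: &str) -> (String, Vec<u8>) {
    let mut letters = String::new();
    let mut scores = vec![0u8];
    for c in pattern.chars() {
        match c.to_digit(10) {
            // A digit scores the gap just before the next letter.
            Some(d) => *scores.last_mut().unwrap() = d as u8,
            // A letter opens a new, initially unscored gap after it.
            None => {
                letters.push(c);
                scores.push(0);
            }
        }
    }
    (letters, scores)
}

/// A tiny illustrative pattern set (an assumption, not real language data).
fn demo_patterns() -> HashMap<String, Vec<u8>> {
    ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
        .iter()
        .map(|p| parse_pattern(p))
        .collect()
}

/// Return the byte indices in `word` where a break is allowed.
/// Assumes ASCII input, so byte offsets and letter positions coincide.
fn break_points(word: &str, patterns: &HashMap<String, Vec<u8>>) -> Vec<usize> {
    assert!(word.is_ascii(), "this sketch only handles ASCII words");
    // Dots mark the word boundaries so patterns can anchor to them.
    let dotted = format!(".{}.", word.to_ascii_lowercase());
    // scores[k] is the score of the gap just before byte k of `dotted`.
    let mut scores = vec![0u8; dotted.len() + 1];
    for start in 0..dotted.len() {
        for end in (start + 1)..=dotted.len() {
            if let Some(pat) = patterns.get(&dotted[start..end]) {
                // Overlapping patterns compete; the highest score wins.
                for (j, &s) in pat.iter().enumerate() {
                    if s > scores[start + j] {
                        scores[start + j] = s;
                    }
                }
            }
        }
    }
    // Odd scores permit a break, even scores forbid one. Shift by one to
    // account for the leading dot.
    (1..word.len()).filter(|&i| scores[i + 1] % 2 == 1).collect()
}

fn main() {
    let breaks = break_points("hyphenation", &demo_patterns());
    assert_eq!(breaks, vec![2, 6]); // hy-phen-ation
}
```

Keeping the highest score per gap is what lets a more specific pattern override a general one, for example suppressing a break that a shorter pattern would otherwise allow.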