hyphenation 0.5.0

Standard Knuth-Liang hyphenation based on the TeX UTF-8 patterns.

[dependencies]
hyphenation = "0.5.0"

Quickstart

use hyphenation::{Hyphenation, Standard};
use hyphenation::Language::{English_US};

// Load hyphenation data for American English from the pattern repository.
let english_us = hyphenation::load(English_US).unwrap();

// The byte indices of valid hyphenation points within a word.
let indices = "hyphenation".opportunities(&english_us);
assert_eq!(indices, vec![2, 6]);

// An iterator that breaks a word according to standard hyphenation practices.
let h: Standard = "hyphenation".hyphenate(&english_us);
                // hy-phen-ation

// Collect the lazy hyphenator `h` into substring slices over the original string.
let v: Vec<&str> = h.collect();
assert_eq!(v, vec!["hy", "phen", "ation"]);


// Hyphenation works with full text as well as individual words.
use hyphenation::FullTextHyphenation;

let text_indices = "Word hyphenation by computer.".fulltext_opportunities(&english_us);
assert_eq!(text_indices, vec![7, 11, 23]);

let h2: Standard = "Word hyphenation by computer.".fulltext_hyphenate(&english_us);
let v2: Vec<&str> = h2.collect();
assert_eq!(v2, vec!["Word hy", "phen", "ation by com", "puter."]);


// Mark hyphenation opportunities with soft hyphens,
// and render the result to a new String.
let h3 = "anfractuous".hyphenate(&english_us);
let s3: String = h3.punctuate().collect();
assert_eq!(s3, "an\u{ad}frac\u{ad}tu\u{ad}ous".to_owned());
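
The byte indices returned by `opportunities` and the slices yielded by `hyphenate` describe the same break points. As a rough illustration of that relationship, the sketch below rebuilds the segments from a list of indices using only the standard library. It is not the crate's implementation; `split_at_indices` and `with_soft_hyphens` are hypothetical helper names introduced here, and the indices are assumed to be sorted and to fall on `char` boundaries.

```rust
/// Split `word` at the given byte indices.
///
/// Assumption: `indices` is sorted in ascending order and every index lies
/// on a `char` boundary inside `word`; otherwise the slicing below panics.
fn split_at_indices<'a>(word: &'a str, indices: &[usize]) -> Vec<&'a str> {
    let mut segments = Vec::with_capacity(indices.len() + 1);
    let mut start = 0;
    for &i in indices {
        // Each segment runs from the previous break point up to this one.
        segments.push(&word[start..i]);
        start = i;
    }
    // The final segment runs from the last break point to the end of the word.
    segments.push(&word[start..]);
    segments
}

/// Join the segments of `word` with soft hyphens (U+00AD).
fn with_soft_hyphens(word: &str, indices: &[usize]) -> String {
    split_at_indices(word, indices).join("\u{ad}")
}
```

Called as `split_at_indices("hyphenation", &[2, 6])`, the first helper borrows slices from the original string rather than allocating new ones, which mirrors the `Vec<&str>` collected in the quickstart.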

Unicode Normalization

hyphenation operates on strings in Normalization Form C (NFC), as described in Unicode Standard Annex #15 and provided by the unicode-normalization crate.

This form is ubiquitous, and you probably need not worry about it. Nevertheless, it is best to ensure that input is in NFC when working with any of the following languages:

  • Assamese
  • Bengali
  • Church Slavonic
  • Greek (Ancient, Monotonic, Polytonic)
  • Punjabi
  • Sanskrit
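
To see why the normalization form matters, consider that a precomposed character and its decomposed equivalent look identical on screen but are different byte sequences, so byte indices computed for one do not apply to the other. The sketch below shows this for a Greek letter using only the standard library; the constant and function names are hypothetical and chosen for illustration. It does not perform normalization itself: for that, the `nfc()` iterator adaptor from the unicode-normalization crate mentioned above is the likely tool.

```rust
/// Greek small letter alpha with tonos, precomposed (the NFC form): U+03AC.
const COMPOSED: &str = "\u{03ac}";

/// The same letter decomposed: alpha (U+03B1) followed by a combining
/// acute accent (U+0301).
const DECOMPOSED: &str = "\u{03b1}\u{0301}";

/// Return the number of `char`s and the number of bytes in `s`.
fn shape(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

/// Report whether two strings are identical as byte sequences, which is the
/// comparison that matters when byte indices are involved.
fn same_bytes(a: &str, b: &str) -> bool {
    a.as_bytes() == b.as_bytes()
}
```

Here `shape(COMPOSED)` is one `char` in two bytes, while `shape(DECOMPOSED)` is two `char`s in four bytes, so any byte offsets after the accented letter shift by two between the forms.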

Pattern Data

The script used to parse, normalize, and convert the TeX hyphenation patterns may be found at ndr-qef/hyph-utf8.json.

License

hyphenation © 2016 ndr-qef, dual-licensed under the terms of either:

  • The Apache License, Version 2.0
  • The MIT license

texhyphen hyphenation patterns © their respective owners; see lic.txt files for licensing information.