Crate kl_hyphenate
A library for the hyphenation of UTF-8 strings
Usage
A typical import comprises the Hyphenator trait, the Standard dictionary type, and the Language enum. This exposes the crate's core functionality, as well as the set of available languages.
extern crate kl_hyphenate;
use kl_hyphenate::{Hyphenator, Standard, Language};
To begin with, we must initialize the hyphenation dictionary for our working language. Dictionaries come bundled with the hyphenation crate, but they must still be loaded into memory. The most convenient way to do so is the Load trait.
use kl_hyphenate::Load;
let path_to_dict = "/path/to/english-dictionary.bincode";
let en_us = Standard::from_path(Language::EnglishUS, path_to_dict)?;
Our English dictionary can now be used as a Hyphenator.
Hyphenators
As the primary interface of this library, hyphenators take care of seeking out opportunities for hyphenation within individual words.
let hyphenated = en_us.hyphenate("anfractuous");
The hyphenate method computes the indices of valid word breaks and wraps them in a small intermediate structure that can then be used to iterate over word segments.
let breaks = &hyphenated.breaks;
assert_eq!(breaks, &[2, 6, 8]);
let hyphenated_segments : Vec<&str> = hyphenated.iter().collect();
assert_eq!(hyphenated_segments, &["an-", "frac-", "tu-", "ous"]);
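To make the relationship between break indices and segments concrete, here is a minimal standalone sketch. It does not use the crate and is not its implementation; the helper names split_at_breaks and mark_hyphens are hypothetical, and the sketch assumes the breaks are sorted byte indices that fall on character boundaries.

```rust
/// Split `word` at the given byte indices, returning borrowed segments.
/// Assumption: `breaks` is sorted and every index lies on a char boundary.
fn split_at_breaks<'a>(word: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    let mut segments = Vec::with_capacity(breaks.len() + 1);
    let mut start = 0;
    for &end in breaks {
        segments.push(&word[start..end]);
        start = end;
    }
    // Whatever follows the final break is the last segment.
    segments.push(&word[start..]);
    segments
}

/// Append a hyphen to every segment except the last.
/// This allocates, since the hyphen is not part of the original word.
fn mark_hyphens(segments: &[&str]) -> Vec<String> {
    let last = segments.len().saturating_sub(1);
    segments
        .iter()
        .enumerate()
        .map(|(i, s)| if i < last { format!("{}-", s) } else { s.to_string() })
        .collect()
}
```

With the breaks [2, 6, 8] reported for "anfractuous", split_at_breaks yields "an", "frac", "tu", "ous", and mark_hyphens turns those into "an-", "frac-", "tu-", "ous".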
Both the Standard and Extended hyphenators are case-insensitive and prioritize existing soft hyphens (U+00AD) over dictionary hyphenation.
let word = "ribonuclease";
let word_shy = "ri\u{00ad}bo\u{00ad}nu\u{00ad}cle\u{00ad}ase";
let by_dictionary : Vec<&str> = en_us.hyphenate(word).into_iter().segments().collect();
let by_shy : Vec<&str> = en_us.hyphenate(word_shy).into_iter().segments().collect();
assert_eq!(by_dictionary, vec!["ri", "bonu", "cle", "ase"]);
assert_eq!(by_shy, vec!["ri", "\u{00ad}bo", "\u{00ad}nu", "\u{00ad}cle", "\u{00ad}ase"]);
assert_ne!(by_dictionary, by_shy);
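The soft-hyphen behaviour can be illustrated without the crate. The sketch below is an assumption about what "prioritize" means here, namely that a word containing any soft hyphen is broken only at those positions, with the dictionary consulted otherwise. The names soft_hyphen_breaks and breaks_for are hypothetical, and the dictionary lookup is stood in for by a caller-supplied closure.

```rust
const SOFT_HYPHEN: char = '\u{00ad}';

/// Byte indices at which a soft hyphen begins. U+00AD occupies two bytes
/// in UTF-8, so the indices are not simple letter counts.
fn soft_hyphen_breaks(word: &str) -> Vec<usize> {
    word.match_indices(SOFT_HYPHEN).map(|(i, _)| i).collect()
}

/// Prefer explicit soft hyphens; fall back to dictionary breaks otherwise.
/// `dictionary_breaks` is a stand-in for a real dictionary lookup.
fn breaks_for(word: &str, dictionary_breaks: impl FnOnce(&str) -> Vec<usize>) -> Vec<usize> {
    let shy = soft_hyphen_breaks(word);
    if shy.is_empty() {
        dictionary_breaks(word)
    } else {
        shy
    }
}
```

Breaking before each soft hyphen is consistent with the segments shown above, where every segment after the first begins with U+00AD.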
Identifying "words"
Knuth–Liang hyphenation operates at the level of individual words, but there can be ambiguity as to what constitutes a word. All hyphenation dictionaries handle the expected set of word-forming graphemes from their respective alphabets, but some also accept punctuation marks such as hyphens and apostrophes, and are thus capable of handling hyphen-joined compound words or elisions. Even so, it's generally preferable to handle punctuation at the level of segmentation, as it affords greater control over the final result (such as where to break hyphen-joined compounds, or whether to set a leading hyphen on new lines).
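As a rough illustration of handling punctuation at the segmentation level, the following sketch splits a hyphen-joined compound into its parts before any dictionary is involved. It is one possible approach, not something the crate provides; compound_parts and hyphenable_core are hypothetical helper names.

```rust
/// Split a hyphen-joined compound into parts, keeping each hyphen attached
/// to the part that precedes it, so a line can break right after it.
fn compound_parts(word: &str) -> Vec<&str> {
    word.split_inclusive('-').collect()
}

/// Strip the trailing hyphen, leaving the letters that could be handed
/// to a hyphenator as an individual word.
fn hyphenable_core(part: &str) -> &str {
    part.trim_end_matches('-')
}
```

Each core can then be hyphenated on its own, and the caller decides whether a break after an existing hyphen is allowed or whether a leading hyphen should be set on the new line.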
Re-exports
pub use hyphenator::Hyphenator;
pub use iter::Iter;
pub use load::Load;
Modules
extended | Extended Knuth–Liang hyphenation
hyphenator | Methods for hyphenation dictionaries
iter | Hyphenating iterators over strings.
load | Reading and loading hyphenation dictionaries
score | Evaluating potential hyphenation opportunities
Structs
Standard | A dictionary for standard Knuth–Liang hyphenation.
Enums
Language | The set of languages available for hyphenation.