/*! A library for the hyphenation of UTF-8 strings

## Usage

A typical import comprises the [`Hyphenator`] trait, the [`Standard`]
dictionary type, and the [`Language`] enum. This exposes the crate's core
functionality, as well as the set of available languages.

```ignore
extern crate hyphenation;

use hyphenation::{Hyphenator, Standard, Language};
```

To begin with, we must initialize the hyphenation dictionary for our working
language. Dictionaries come bundled with the `hyphenation` crate, but they
must still be loaded into memory. The most convenient way to do so is the
[`Load`] trait.

```ignore
use hyphenation::Load;

let path_to_dict = "/path/to/english-dictionary.bincode";
let en_us = Standard::from_path(Language::EnglishUS, path_to_dict)?;
```

Our English dictionary can now be used as a [`Hyphenator`].


### Hyphenators

As the primary interface of this library, hyphenators take care of seeking
out opportunities for hyphenation within individual words.

```ignore
let hyphenated = en_us.hyphenate("anfractuous");
```

The [`hyphenate`] method computes the indices of valid word breaks and wraps
them in a small intermediate structure that can then be used to [iterate]
over word segments.

```ignore
let breaks = &hyphenated.breaks;
assert_eq!(breaks, &[2, 6, 8]);

// Hyphen-marked segments are newly allocated, hence `String` rather than `&str`.
let hyphenated_segments : Vec<String> = hyphenated.iter().collect();
assert_eq!(hyphenated_segments, &["an-", "frac-", "tu-", "ous"]);
```
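
The relationship between break indices and marked segments can be
illustrated without the crate. The following is a minimal sketch, not the
library's implementation: `mark_segments` is a hypothetical helper, and it
assumes that breaks are ascending byte indices that fall on `char` boundaries.

```rust
// Hypothetical helper, not part of this crate's API: split `word` at the
// given byte indices and append a hyphen to every segment except the last.
fn mark_segments(word: &str, breaks: &[usize]) -> Vec<String> {
    let mut segments = Vec::with_capacity(breaks.len() + 1);
    let mut start = 0;
    for &end in breaks {
        // Assumption: `end` is a byte index on a `char` boundary and the
        // indices are sorted in ascending order.
        segments.push(format!("{}-", &word[start..end]));
        start = end;
    }
    // The trailing segment carries no hyphen.
    segments.push(word[start..].to_string());
    segments
}

fn main() {
    let segments = mark_segments("anfractuous", &[2, 6, 8]);
    assert_eq!(segments, ["an-", "frac-", "tu-", "ous"]);
}
```

Each marked segment is a new `String`, since the appended hyphen is not part
of the original word.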

Both the [`Standard`] and [`Extended`] hyphenators are case-insensitive and
prioritize existing soft hyphens (U+00AD) over dictionary hyphenation.

```ignore
let word = "ribonuclease";
let word_shy = "ri\u{00ad}bo\u{00ad}nu\u{00ad}cle\u{00ad}ase";

let by_dictionary : Vec<&str> = en_us.hyphenate(word).into_iter().segments().collect();
let by_shy : Vec<&str> = en_us.hyphenate(word_shy).into_iter().segments().collect();

assert_eq!(by_dictionary, vec!["ri", "bonu", "cle", "ase"]);
assert_eq!(by_shy, vec!["ri", "\u{00ad}bo", "\u{00ad}nu", "\u{00ad}cle", "\u{00ad}ase"]);
assert_ne!(by_dictionary, by_shy);
```
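
To see why the soft-hyphenated segments above retain their U+00AD
characters, consider the following sketch. It is an illustration only: both
helpers are hypothetical, and it assumes that each break is placed at the byte
index of a soft hyphen, which is consistent with the segments shown in the
example but is not taken from the crate's source.

```rust
const SOFT_HYPHEN: char = '\u{00ad}';

// Hypothetical helper: byte index of every soft hyphen in `word`.
fn soft_hyphen_indices(word: &str) -> Vec<usize> {
    word.char_indices()
        .filter(|&(_, c)| c == SOFT_HYPHEN)
        .map(|(i, _)| i)
        .collect()
}

// Hypothetical helper: borrow the slices of `word` delimited by `breaks`,
// which are assumed to be ascending byte indices on `char` boundaries.
fn split_at_indices<'a>(word: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    let mut segments = Vec::with_capacity(breaks.len() + 1);
    let mut start = 0;
    for &end in breaks {
        segments.push(&word[start..end]);
        start = end;
    }
    segments.push(&word[start..]);
    segments
}

fn main() {
    let word_shy = "ri\u{00ad}bo\u{00ad}nu\u{00ad}cle\u{00ad}ase";
    // U+00AD occupies two bytes in UTF-8, so the indices are not evenly spaced.
    let breaks = soft_hyphen_indices(word_shy);
    assert_eq!(breaks, [2, 6, 10, 15]);
    assert_eq!(
        split_at_indices(word_shy, &breaks),
        ["ri", "\u{00ad}bo", "\u{00ad}nu", "\u{00ad}cle", "\u{00ad}ase"]
    );
}
```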


## Identifying "words"

Knuth–Liang hyphenation operates at the level of individual words, but there
can be ambiguity as to what constitutes a *word*. All hyphenation dictionaries
handle the expected set of word-forming graphemes from their respective
alphabets, but some also accept punctuation marks such as hyphens and
apostrophes, and are thus capable of handling hyphen-joined compound words or
elisions. Even so, it's generally preferable to handle punctuation at the
level of segmentation, as it affords greater control over the final result
(such as where to break hyphen-joined compounds, or whether to set a leading
hyphen on new lines).
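
One way to handle punctuation at the level of segmentation is to split the
text into word-like runs before hyphenating each run separately. The sketch
below is a deliberately naive illustration, not part of this crate:
`word_runs` is a hypothetical helper, and it treats any run of alphabetic
characters as a "word", whereas a real application would more likely rely on
Unicode word segmentation.

```rust
// Hypothetical helper: split `text` into alternating runs of alphabetic and
// non-alphabetic characters. The flag is `true` for alphabetic runs, which
// are the candidates for dictionary hyphenation.
fn word_runs(text: &str) -> Vec<(bool, &str)> {
    let mut runs = Vec::new();
    let mut start = 0;
    let mut current: Option<bool> = None;
    for (i, c) in text.char_indices() {
        let is_word = c.is_alphabetic();
        match current {
            // The kind of character changed: close the run in progress.
            Some(kind) if kind != is_word => {
                runs.push((kind, &text[start..i]));
                start = i;
                current = Some(is_word);
            }
            // First character of the text.
            None => current = Some(is_word),
            // Same kind as before: the current run continues.
            _ => {}
        }
    }
    if let Some(kind) = current {
        runs.push((kind, &text[start..]));
    }
    runs
}

fn main() {
    let runs = word_runs("well-known fact");
    assert_eq!(
        runs,
        [(true, "well"), (false, "-"), (true, "known"), (false, " "), (true, "fact")]
    );
}
```

With this split, the caller decides what to do with the hyphen in
"well-known", and only the alphabetic runs are passed to a hyphenator.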


[`Hyphenator`]: hyphenator/trait.Hyphenator.html
[`Standard`]: struct.Standard.html
[`Language`]: enum.Language.html
[`Load`]: load/trait.Load.html
[`hyphenate`]: hyphenator/trait.Hyphenator.html#tymethod.hyphenate
[iterate]: iter/struct.Hyphenating.html
[`Extended`]: extended/struct.Extended.html
*/

extern crate atlatl;
extern crate bincode;
extern crate hyphenation_commons;


#[cfg(feature = "embed_all")] mod resources;
mod case_folding;
pub mod hyphenator;
pub mod extended;
pub mod iter;
pub mod load;
pub mod score;

pub use hyphenation_commons::Language;
pub use hyphenation_commons::dictionary::Standard;
pub use hyphenator::Hyphenator;
pub use iter::Iter;
pub use load::Load;