//! A simple statistical truecasing library.
//!
//! _Truecasing_ is the restoration of original letter case in text:
//! for example, turning all-uppercase or all-lowercase text into
//! text with proper sentence casing (capital first letter,
//! capitalized names, etc.).
//!
//! This crate attempts to solve this problem by gathering statistics
//! from a set of training sentences, then using those statistics
//! to truecase sentences with broken casing. It comes with a
//! command-line utility that makes training a model easy.
//!
//! # Quick usage example
//!
//! ```
//! use truecase::{Model, ModelTrainer};
//!
//! // build a statistical model from sample sentences
//! let mut trainer = ModelTrainer::new();
//! trainer.add_sentence("There are very few writers as good as Shakespeare");
//! trainer.add_sentence("You and I will have to disagree about this");
//! trainer.add_sentence("She never came back from USSR");
//! let model = trainer.into_model();
//!
//! // use gathered statistics to restore case in caseless text
//! let truecased_text = model.truecase("i don't think shakespeare was born in ussr");
//! assert_eq!(truecased_text, "I don't think Shakespeare was born in USSR");
//! ```
//!
//! # Building a model using the CLI tool
//!
//! 1. Create a file containing training sentences. Each sentence
//!    must be on its own line and have proper casing. The larger
//!    the training set, the more accurate the model will be.
//!
//! 2. Use the `truecase` CLI tool to build a model. This may take some time,
//!    depending on the size of the training set. The following command
//!    reads training data from `training_sentences.txt` and writes
//!    the model to `model.json`.
//!
//!    ```bash
//!    truecase train -i training_sentences.txt -o model.json
//!    ```
//!
//!    Run `truecase train --help` for more details.
//!

mod errors;
mod tokenizer;
mod trainer;
mod truecase;
mod utils;

// Re-export the public API at the crate root.
pub use crate::errors::{ModelLoadingError, ModelSavingError};
pub use crate::trainer::ModelTrainer;
pub use crate::truecase::Model;