Crate truecase
This is a simple statistical truecasing library.
Truecasing is the restoration of original letter casing in text: for example, turning all-uppercase or all-lowercase text into text with proper sentence casing (capitalized first letter, capitalized names, etc.).
This crate attempts to solve this problem by gathering statistics from a set of training sentences, then using those statistics to truecase sentences with broken casing. It comes with a command-line utility that makes training a model easy.
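To make the idea of "gathering statistics" concrete, here is a minimal sketch of frequency-based truecasing that uses only the standard library. It is an illustration of the general technique, not this crate's actual implementation: the CaseStats type, its methods, the whitespace tokenization, and the choice to skip the first word of each training sentence are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Toy frequency table: for each lowercased word, how often each cased
/// form was seen. Hypothetical type, not part of the truecase crate.
#[derive(Default)]
struct CaseStats {
    forms: HashMap<String, HashMap<String, usize>>,
}

impl CaseStats {
    /// Count every whitespace-separated token of a properly cased sentence.
    /// The first token is skipped (an assumption of this sketch) because its
    /// capital letter may only reflect the sentence start.
    fn add_sentence(&mut self, sentence: &str) {
        for word in sentence.split_whitespace().skip(1) {
            *self
                .forms
                .entry(word.to_lowercase())
                .or_default()
                .entry(word.to_string())
                .or_insert(0) += 1;
        }
    }

    /// Most frequent cased form of `word`, or its lowercase form if unseen.
    fn best_form(&self, word: &str) -> String {
        let lower = word.to_lowercase();
        self.forms
            .get(&lower)
            .and_then(|counts| {
                counts
                    .iter()
                    // Highest count wins; ties go to the lexicographically
                    // smaller form so the result is deterministic.
                    .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
                    .map(|(form, _)| form.clone())
            })
            .unwrap_or(lower)
    }

    /// Pick the best form for each word, then capitalize the first letter.
    fn truecase(&self, sentence: &str) -> String {
        let joined = sentence
            .split_whitespace()
            .map(|w| self.best_form(w))
            .collect::<Vec<_>>()
            .join(" ");
        let mut chars = joined.chars();
        match chars.next() {
            Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
            None => String::new(),
        }
    }
}

fn main() {
    let mut stats = CaseStats::default();
    stats.add_sentence("Yesterday we read Shakespeare in class");
    stats.add_sentence("I think Shakespeare wrote that play");
    // Prints "We read Shakespeare yesterday"
    println!("{}", stats.truecase("WE READ SHAKESPEARE YESTERDAY"));
}
```

Because the sketch splits on whitespace only, punctuation stays attached to words, and unseen words fall back to lowercase. A real model would likely handle both cases more carefully.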
Training a model using the CLI tool
Create a file containing training sentences. Each sentence must be on its own line and have proper casing. A larger training set generally yields a more accurate model.
Use the truecase CLI tool to build a model. This may take some time, depending on the size of the training set. The following command reads training data from the training_sentences.txt file and writes the model to the model.json file:

    truecase train -i training_sentences.txt -o model.json
Run truecase train --help for more details.
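Since the training file must hold one properly cased sentence per line, it can help to tidy raw text before training. The following is a minimal sketch using only the standard library. The to_training_lines helper is hypothetical and not part of this crate, and it assumes the input already has one sentence per line, so it only trims lines, collapses repeated whitespace, and drops blank lines.

```rust
/// Normalize raw text into training lines: collapse runs of whitespace
/// inside each line, trim the ends, and drop blank lines.
/// Hypothetical helper, not part of the truecase crate.
fn to_training_lines(raw: &str) -> String {
    raw.lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|line| !line.is_empty())
        .map(|line| line + "\n")
        .collect()
}

fn main() {
    let raw = "  The cat sat on the mat.  \n\nAlice   visited Paris in May.\n";
    // The cleaned text could then be saved with std::fs::write and passed
    // to the CLI tool via its -i option.
    print!("{}", to_training_lines(raw));
}
```

The helper deliberately leaves letter casing untouched, because the casing of the training sentences is exactly what the model learns from.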
Training a model from Rust
    use truecase::ModelTrainer;

    let mut trainer = ModelTrainer::new();
    // Add a single properly cased sentence.
    trainer.add_sentence("Here's a sample training sentence for truecasing");
    // Add training sentences from a file.
    trainer.add_sentences_from_file("training_data.txt")?;

    // Finish training and save the resulting model.
    let model = trainer.into_model();
    model.save_to_file("model.json")?;

See also ModelTrainer.
Using a model to truecase text
    use truecase::Model;

    let model = Model::load_from_file("model.json")?;
    let truecased_text = model.truecase("i don't think shakespeare would approve of this sample text");
    assert_eq!(truecased_text, "I don't think Shakespeare would approve of this sample text");

See also Model.
For truecasing using the CLI tool, see truecase truecase --help.
Structs
Model | Truecasing model itself.
ModelTrainer | Trainer for new truecasing models.