§Vaporetto
Vaporetto is a fast and lightweight tokenizer based on pointwise prediction.
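To make "pointwise prediction" concrete: each gap between two adjacent characters is classified on its own as boundary or non-boundary, using features of the surrounding characters and no information about neighboring decisions. The sketch below illustrates that idea with the standard library only. It is not Vaporetto's implementation or API: `predict_boundaries`, `tokenize`, `toy_weights`, the hand-written weights, and the single character-bigram feature are all assumptions made for illustration, whereas the real library takes its weights from a trained model.

```rust
use std::collections::HashMap;

/// Hand-written toy weights keyed by the character bigram around a gap
/// (hypothetical values; a real model is learned from annotated data).
fn toy_weights() -> HashMap<String, i32> {
    let mut w = HashMap::new();
    w.insert("火星".to_string(), -5); // keep these two characters together
    w.insert("星猫".to_string(), 5); // split between these two
    w.insert("猫だ".to_string(), 5); // split between these two
    w
}

/// Score every gap independently; a positive score means "word boundary".
/// Element i of the result describes the gap after character i.
fn predict_boundaries(text: &str, weights: &HashMap<String, i32>, bias: i32) -> Vec<bool> {
    let chars: Vec<char> = text.chars().collect();
    let mut boundaries = Vec::new();
    for i in 0..chars.len().saturating_sub(1) {
        // The only feature in this sketch: the bigram straddling the gap.
        let bigram: String = [chars[i], chars[i + 1]].iter().collect();
        let score = bias + weights.get(&bigram).copied().unwrap_or(0);
        boundaries.push(score > 0);
    }
    boundaries
}

/// Cut the text at the predicted boundaries.
fn tokenize(text: &str, weights: &HashMap<String, i32>, bias: i32) -> Vec<String> {
    let boundaries = predict_boundaries(text, weights, bias);
    let mut tokens = Vec::new();
    let mut current = String::new();
    for (i, c) in text.chars().enumerate() {
        current.push(c);
        // The last character has no following gap, so it always closes a token.
        if boundaries.get(i).copied().unwrap_or(true) {
            tokens.push(std::mem::take(&mut current));
        }
    }
    tokens
}

fn main() {
    let weights = toy_weights();
    println!("{:?}", tokenize("火星猫だ", &weights, -1));
}
```

Because every gap is scored independently, the loop in `predict_boundaries` needs no search over whole-sentence segmentations, which is one reason this family of tokenizers can be made fast.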
§Examples
use std::fs::File;

use vaporetto::{Model, Predictor, Sentence};

// Load a trained model and build a predictor from it.
// The second argument of Predictor::new enables tag prediction.
let f = File::open("../resources/model.bin")?;
let model = Model::read(f)?;
let predictor = Predictor::new(model, true)?;

// Output buffer and sentence object, both reused across inputs.
let mut buf = String::new();
let mut s = Sentence::default();

// First sentence: predict boundaries, fill in tags, then serialize.
s.update_raw("まぁ社長は火星猫だ")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
assert_eq!(
    "まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ",
    buf,
);

// Second sentence, reusing the same Sentence and buffer.
s.update_raw("まぁ良いだろう")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
assert_eq!(
    "まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー",
    buf,
);

Tag prediction requires crate feature tag-prediction.
Training requires crate feature train. For more details, see Trainer.
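The tokenized text asserted in the example separates tokens with spaces and joins each surface form to its tags with `/`. As a rough illustration of consuming that format downstream, the following standard-library sketch splits such a line back into surfaces and tags. `ParsedToken` and `parse_tokenized_line` are hypothetical names, not part of Vaporetto, and the sketch assumes that surfaces and tags contain no spaces, slashes, or escape sequences, which real output may not guarantee.

```rust
/// One token recovered from a line of tokenized text (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ParsedToken {
    surface: String,
    tags: Vec<String>,
}

/// Naively parse "surface/tag/tag surface/tag/tag ..." into tokens.
/// Assumption: no escaping is needed for ' ' or '/' inside a field.
fn parse_tokenized_line(line: &str) -> Vec<ParsedToken> {
    line.split(' ')
        .filter(|t| !t.is_empty())
        .map(|t| {
            let mut fields = t.split('/');
            // The first field is the surface form; the rest are tags.
            let surface = fields.next().unwrap_or("").to_string();
            let tags = fields.map(str::to_string).collect();
            ParsedToken { surface, tags }
        })
        .collect()
}

fn main() {
    let line = "まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー";
    for token in parse_tokenized_line(line) {
        println!("{} -> {:?}", token.surface, token.tags);
    }
}
```

When the Sentence object is still at hand, reading tokens through Sentence::iter_tokens() avoids re-parsing text altogether; the string form is mainly useful for logs and files.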
Modules§
- errors
- Definition of errors.
Structs§
- KyteaModel (crate feature kytea)
- Model data created by KyTea.
- Model
- Model data.
- Predictor
- Predictor created from the model.
- Sentence
- Sentence data containing boundary and tag annotations.
- Token
- Token information.
- TokenIterator
- Iterator returned by Sentence::iter_tokens().
- Trainer (crate feature train)
- Trainer.
- WordWeightRecord
- Record of weights for each word.
Enums§
- CharacterBoundary
- Boundary type.
- CharacterType
- Character type.
- SolverType (crate feature train)
- Solver type.
Constants§
- VERSION
- Version number of this library.