Vaporetto
Vaporetto is a fast and lightweight tokenizer based on pointwise prediction.
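To make the term "pointwise prediction" concrete: each gap between two adjacent characters is classified independently as a boundary or not, using only features taken from a small window around that gap. The following is a minimal, self-contained sketch of that idea and not Vaporetto's actual implementation. The function names (`boundary_score`, `tokenize`, `toy_weights`), the feature-key format, the window size, and the hand-written weights are all assumptions made for illustration; a real model learns its weights from data and uses a richer feature set.

```rust
use std::collections::HashMap;

/// Scores the gap between chars[i - 1] and chars[i] by summing the weights
/// of every character n-gram (n = 1..=max_n) that lies inside a window of
/// `window` characters on each side of the gap. A feature key has the form
/// "<offset>:<ngram>", where offset is the n-gram's start position relative
/// to the gap (hypothetical format, chosen for this sketch).
fn boundary_score(
    chars: &[char],
    i: usize,
    weights: &HashMap<String, i32>,
    window: usize,
    max_n: usize,
) -> i32 {
    let lo = i.saturating_sub(window);
    let hi = (i + window).min(chars.len());
    let mut score = 0;
    for start in lo..hi {
        for n in 1..=max_n {
            let end = start + n;
            if end > hi {
                break;
            }
            let ngram: String = chars[start..end].iter().collect();
            let key = format!("{}:{}", start as isize - i as isize, ngram);
            score += weights.get(&key).copied().unwrap_or(0);
        }
    }
    score
}

/// Decides every gap independently (pointwise): split where the score is positive.
fn tokenize(text: &str, weights: &HashMap<String, i32>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut current = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if i > 0 && boundary_score(&chars, i, weights, 2, 2) > 0 {
            tokens.push(std::mem::take(&mut current));
        }
        current.push(c);
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Hand-made toy weights, for illustration only.
fn toy_weights() -> HashMap<String, i32> {
    let mut w = HashMap::new();
    w.insert("-1:は".to_string(), 3); // 'は' right before the gap: favor a split
    w.insert("0:は".to_string(), 3); // 'は' right after the gap: favor a split
    w.insert("-1:社長".to_string(), -5); // gap in the middle of "社長": disfavor
    w
}

fn main() {
    let weights = toy_weights();
    println!("{:?}", tokenize("社長は猫", &weights)); // prints ["社長", "は", "猫"]
}
```

Because each decision depends only on local features, no search over whole-sentence segmentations is needed, which is one reason this family of tokenizers can be fast.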
Examples
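use std::fs::File;

use vaporetto::{Model, Predictor, Sentence};

// Load a trained model from disk.
let f = File::open("../resources/model.bin")?;
let model = Model::read(f)?;
// The second argument enables tag prediction.
let predictor = Predictor::new(model, true)?;

// Output buffer, reused for both sentences below.
let mut buf = String::new();

let mut s = Sentence::default();

s.update_raw("まぁ社長は火星猫だ")?;
predictor.predict(&mut s);
s.fill_tags();
// Each token is written as surface/tag/tag, with tokens separated by spaces.
s.write_tokenized_text(&mut buf);
assert_eq!(
    "まぁ/名詞/マー 社長/名詞/シャチョー は/助詞/ワ 火星/名詞/カセー 猫/名詞/ネコ だ/助動詞/ダ",
    buf,
);

// The same Sentence is reused for new input; buf is overwritten, not appended to.
s.update_raw("まぁ良いだろう")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
assert_eq!(
    "まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー",
    buf,
);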
Tag prediction requires the crate feature tag-prediction.
Training requires the crate feature train. For more details, see Trainer.
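The tokenized text in the example above has a simple shape: tokens separated by spaces, and within each token a surface form followed by its tags, separated by `/`. As a rough sketch of how downstream code might split such a line back into fields, the following standalone snippet uses only the standard library. `ParsedToken` and `parse_tokenized_text` are hypothetical names, not part of the Vaporetto API, and the sketch assumes that surfaces and tags contain no literal space or slash (it does not handle any escaping).

```rust
/// One token parsed from a line of tokenized text: a surface form plus its tags.
/// (Hypothetical helper type, not part of the vaporetto crate.)
#[derive(Debug, PartialEq)]
struct ParsedToken<'a> {
    surface: &'a str,
    tags: Vec<&'a str>,
}

/// Splits `line` on spaces into tokens, then each token on '/' into a
/// surface (first field) and zero or more tags (remaining fields).
/// Assumption: neither surfaces nor tags contain ' ' or '/' themselves.
fn parse_tokenized_text(line: &str) -> Vec<ParsedToken<'_>> {
    line.split(' ')
        .filter(|t| !t.is_empty())
        .map(|t| {
            let mut fields = t.split('/');
            let surface = fields.next().unwrap_or("");
            ParsedToken {
                surface,
                tags: fields.collect(),
            }
        })
        .collect()
}

fn main() {
    let line = "まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー";
    for tok in parse_tokenized_text(line) {
        // One token per line: surface, then its tags joined by commas.
        println!("{}\t{}", tok.surface, tok.tags.join(","));
    }
}
```

When the `Sentence` is still available, reading tokens through `Sentence::iter_tokens()` is likely preferable to re-parsing text, since it avoids the delimiter assumptions above.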
Modules
- Definition of errors.
Structs
- KyteaModel (crate feature kytea): Model data created by KyTea.
- Model data.
- Predictor created from the model.
- Sentence data containing boundary and tag annotations.
- Token information.
- Iterator returned by Sentence::iter_tokens().
- Trainer (crate feature train): Trainer.
- Record of weights for each word.
Enums
- Boundary type.
- Character type.
- SolverType (crate feature train): Solver type.
Constants
- Version number of this library.