langid-rs
An n-gram-based text classifier written in Rust.
It is not yet ready for general use: it ships without pre-trained models and lacks proper documentation.
Usage
Classifying using pre-trained models
Use the glob crate to collect the list of model files. Each model is named after its file.
extern crate langid;
extern crate glob;
use langid::Classifier;
use glob::glob;
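As a rough illustration of naming models after their files, the sketch below derives a name from a path using only the standard library. It is not part of langid: `model_name` and `model_names` are hypothetical helpers, the `.model` extension is made up for the example, and stripping the extension is an assumption (use `file_name` instead of `file_stem` if the full filename should be the model name).

```rust
use std::path::Path;

/// Hypothetical helper (not part of langid): derive a model name from a
/// file path by taking the file name without its extension.
/// Returns None if the path has no file name or it is not valid UTF-8.
fn model_name(path: &Path) -> Option<String> {
    path.file_stem()
        .and_then(|stem| stem.to_str())
        .map(|stem| stem.to_owned())
}

/// Hypothetical helper: map a list of paths to model names, skipping
/// any path that does not yield a usable name.
fn model_names(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .filter_map(|p| model_name(Path::new(p)))
        .collect()
}
```

In practice the paths would come from iterating over the result of `glob::glob` with a pattern of your choice rather than from a hard-coded slice.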
Training and classifying on the fly
extern crate langid;
use langid::Classifier;
Training
Run cargo install langid to get the langid CLI utility.
langid train [-o FILE] <FILE FILE...>
Creates a model from the input text files. The model is written to stdout, or to the file given with -o (--output).
Credits
Implements the algorithm described by William B. Cavnar and John M. Trenkle in “N-Gram-Based Text Categorization” (1994).
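For readers who have not seen the paper, the technique can be sketched roughly as follows: build a rank-ordered profile of the most frequent character n-grams for each category, then pick the category whose profile has the smallest "out-of-place" rank distance to the profile of the input text. This is a simplified, standalone sketch and not the code of this crate. The function names are hypothetical, each word is padded with a single `_` on both sides (a simplification of the padding used in the paper), and the profile size of 300 is only a suggested default.

```rust
use std::collections::HashMap;

/// Build a rank-ordered profile: the `top` most frequent character
/// n-grams (n = 1..=5) of `text`, most frequent first.
fn profile(text: &str, top: usize) -> Vec<String> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    let words = text
        .split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty());
    for word in words {
        // Pad the word so that word boundaries become part of the n-grams.
        let padded: Vec<char> = format!("_{}_", word.to_lowercase()).chars().collect();
        for n in 1..=5 {
            for gram in padded.windows(n) {
                // Skip n-grams that consist of padding only.
                if gram.iter().all(|&c| c == '_') {
                    continue;
                }
                *counts.entry(gram.iter().collect()).or_insert(0) += 1;
            }
        }
    }
    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    // Descending by count; ties are broken alphabetically for determinism.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(top).map(|(gram, _)| gram).collect()
}

/// Out-of-place distance: for every n-gram in the document profile, add
/// the difference between its rank in the document and in the model.
/// N-grams missing from the model get a fixed maximum penalty.
fn out_of_place(model: &[String], doc: &[String]) -> usize {
    let max_penalty = model.len();
    doc.iter()
        .enumerate()
        .map(|(doc_rank, gram)| match model.iter().position(|m| m == gram) {
            Some(model_rank) if model_rank > doc_rank => model_rank - doc_rank,
            Some(model_rank) => doc_rank - model_rank,
            None => max_penalty,
        })
        .sum()
}

/// Return the name of the model with the smallest distance to `text`,
/// or None if there are no models.
fn classify<'a>(models: &'a [(String, Vec<String>)], text: &str) -> Option<&'a str> {
    let doc = profile(text, 300);
    models
        .iter()
        .min_by_key(|(_, model)| out_of_place(model, &doc))
        .map(|(name, _)| name.as_str())
}
```

A model here is simply a `(name, profile)` pair, so "training" amounts to calling `profile` on a sample text for each category and collecting the results into a slice that is passed to `classify`. The linear `position` lookup keeps the sketch short; a map from n-gram to rank would be the obvious choice for larger profiles.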