langid-rs
=========

[![Build Status](https://travis-ci.org/despawnerer/langid-rs.svg?branch=master)](https://travis-ci.org/despawnerer/langid-rs)

NGram-based text classifier written in Rust.

The crate is not yet ready for general use: it ships no pre-trained models and lacks proper documentation.


Usage
-----

### Classifying using pre-trained models

Use the [glob](https://crates.io/crates/glob) crate to collect the model files. Each file's name is used as the name of its model.

```rust
extern crate langid;
extern crate glob;

use langid::Classifier;
use glob::glob;


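fn main() {
    // Collect every matching profile file, dropping entries that glob could not read.
    let paths = glob("./language_profiles/*.json").unwrap().filter_map(Result::ok);
    // Load one model per file; filenames are used as model names.
    let classifier = Classifier::from_files(paths);

    let language = classifier.classify("Sample text that you want classified.");
    println!("Sample language: {}", language);
}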
```


### Training and classifying on the fly

```rust
extern crate langid;

use langid::Classifier;


fn main() {
    let first_language_training_text = "...";
    let second_language_training_text = "...";

    // Train one model per language, each under its own name.
    let mut classifier = Classifier::new();
    classifier.train(first_language_training_text, "first");
    classifier.train(second_language_training_text, "second");

    let language = classifier.classify("Sample in the first language.");
    println!("Sample language: {}", language);
}
```


### Training

Run `cargo install langid` to get the `langid` CLI utility.

	langid train [-o FILE] <FILE FILE...>

Creates a model from the input text files and writes it to stdout, or to the file given with `-o` or `--output`.


Credits
-------

Implements the algorithm described by William B. Cavnar and John M. Trenkle in “N-Gram-Based Text Categorization” (1994).
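
For readers unfamiliar with the paper, here is a rough, standalone sketch of its core idea: rank the n-grams of each text by frequency, then compare rankings with an "out-of-place" distance. It is not this crate's code and may differ from it in many details. The names `build_profile`, `out_of_place` and `nearest` are hypothetical, and the single-underscore padding, the 300-entry cutoff and the penalty for missing n-grams are simplifying assumptions.

```rust
use std::collections::HashMap;

/// Build a ranked profile: the `max_len` most frequent n-grams (n = 1..=5),
/// most frequent first. Padding with one underscore per side is an assumption.
fn build_profile(text: &str, max_len: usize) -> Vec<String> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty())
    {
        // Mark word boundaries so that prefixes and suffixes become n-grams too.
        let padded: Vec<char> = format!("_{}_", word.to_lowercase()).chars().collect();
        for n in 1..=5 {
            // `windows` yields nothing when the word is shorter than `n`.
            for gram in padded.windows(n) {
                *counts.entry(gram.iter().collect()).or_insert(0) += 1;
            }
        }
    }
    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    // Descending by count; ties broken alphabetically to keep the order deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(max_len).map(|(gram, _)| gram).collect()
}

/// Sum, over the document's n-grams, of how far each one's rank is from its
/// rank in the category profile. Missing n-grams cost a fixed penalty
/// (here the category profile's length, which is an assumption).
fn out_of_place(doc: &[String], category: &[String]) -> usize {
    let ranks: HashMap<&str, usize> = category
        .iter()
        .enumerate()
        .map(|(rank, gram)| (gram.as_str(), rank))
        .collect();
    let max_penalty = category.len();
    doc.iter()
        .enumerate()
        .map(|(rank, gram)| match ranks.get(gram.as_str()) {
            Some(&other) => rank.abs_diff(other),
            None => max_penalty,
        })
        .sum()
}

/// Return the name of the category profile closest to the sample, if any.
fn nearest<'a>(
    sample: &str,
    categories: &[(&'a str, Vec<String>)],
    max_len: usize,
) -> Option<&'a str> {
    let doc = build_profile(sample, max_len);
    categories
        .iter()
        .min_by_key(|(_, profile)| out_of_place(&doc, profile))
        .map(|(name, _)| *name)
}

fn main() {
    // Tiny illustrative training texts; real profiles need far more data.
    let categories = vec![
        ("english", build_profile("the quick brown fox jumps over the lazy dog and the cat", 300)),
        ("german", build_profile("der schnelle braune fuchs springt über den faulen hund und die katze", 300)),
    ];
    println!("{:?}", nearest("the dog and the fox", &categories, 300));
}
```

Because only ranks are compared, profiles built from texts of very different lengths remain comparable, although training texts as short as the ones above are far too small to give reliable results.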