bayesian 0.3.0

A naive Bayesian classifier with optional TF-IDF support

bayesian

A simple, fast Naive Bayes classifier with support for TF-IDF weighting and binary serialization.

Installation

cargo add bayesian

Usage

Class labels can be any type that implements Eq + Hash + Clone (an enum is the most natural choice).

use bayesian::Classifier;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Category {
    Spam,
    Ham,
}

fn main() {
    use Category::*;

    let mut classifier = Classifier::new(vec![Spam, Ham]);

    // Train
    classifier.learn(&["buy", "cheap", "pills", "now", "offer"], &Spam);
    classifier.learn(&["free", "prize", "winner", "click", "claim"], &Spam);
    classifier.learn(&["hey", "are", "you", "coming", "to", "the", "meeting"], &Ham);
    classifier.learn(&["let", "me", "know", "if", "you", "need", "anything"], &Ham);

    // Classify
    let doc = vec!["free", "offer", "click", "now"];
    println!("{:?}", classifier.classify(&doc)); // => Spam

    // Probability scores (sum to 1.0, index matches classifier.classes())
    let probs = classifier.prob_scores(&doc);
    for (class, prob) in classifier.classes().iter().zip(&probs) {
        println!("{class:?}: {:.1}%", prob * 100.0);
    }
}
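Probabilities that sum to 1.0, as in the example above, are typically obtained by normalizing per-class log scores. The sketch below shows one common way to do that, a numerically stable softmax. It is an illustration only: `normalize_log_scores` is a hypothetical helper, not part of the crate's API, and the crate may compute its probabilities differently.

```rust
/// Convert per-class log scores into probabilities that sum to 1.0.
/// Hypothetical helper for illustration; not part of the `bayesian` API.
fn normalize_log_scores(log_scores: &[f64]) -> Vec<f64> {
    // Subtract the maximum before exponentiating so that very negative
    // log scores do not all underflow to zero.
    let max = log_scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = log_scores.iter().map(|s| (s - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    // Divide each term by the total so the results form a distribution.
    exps.iter().map(|e| e / total).collect()
}
```

Working in log space and subtracting the maximum matters for long documents, where the product of many small word probabilities would otherwise underflow to 0.0 for every class.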

TF-IDF

Plain Naive Bayes is always available. Call build_tfidf() to compute TF-IDF weights from your training data and unlock the _tfidf family of methods.

You can keep calling learn() and then call build_tfidf() again at any time. Each call recomputes the weights from scratch without discarding the raw counts.

classifier.learn(&words, &class);   // accumulates both raw counts and TF samples
classifier.build_tfidf();           // compute weights from all learned documents

let class = classifier.classify_tfidf(&doc);
let probs = classifier.prob_scores_tfidf(&doc);
let scores = classifier.log_scores_tfidf(&doc);
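For readers unfamiliar with TF-IDF, the sketch below shows one textbook formulation: term frequency (count divided by document length) multiplied by inverse document frequency, ln(N / df). This is an assumption for illustration, not a description of what build_tfidf() does internally; the crate may use a different variant such as smoothed IDF or length normalization. The function names `idf` and `tfidf` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Inverse document frequency per term: ln(N / df), where N is the number
/// of documents and df is the number of documents containing the term.
/// Hypothetical helper; the crate's actual weighting may differ.
fn idf(docs: &[Vec<&str>]) -> HashMap<String, f64> {
    let n = docs.len() as f64;
    let mut df: HashMap<String, usize> = HashMap::new();
    for doc in docs {
        // Count each term at most once per document.
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *df.entry(term.to_string()).or_insert(0) += 1;
        }
    }
    df.into_iter()
        .map(|(term, count)| (term, (n / count as f64).ln()))
        .collect()
}

/// TF-IDF weight of each distinct term in `doc`: (count / doc length) * idf.
/// Terms never seen in the corpus get weight 0.0, an arbitrary choice
/// made for this sketch.
fn tfidf(doc: &[&str], idf_weights: &HashMap<String, f64>) -> HashMap<String, f64> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for term in doc {
        *counts.entry(*term).or_insert(0) += 1;
    }
    let len = doc.len() as f64;
    counts
        .into_iter()
        .map(|(term, count)| {
            let weight = idf_weights.get(term).copied().unwrap_or(0.0);
            (term.to_string(), count as f64 / len * weight)
        })
        .collect()
}
```

Under this formulation a term that appears in every training document gets an IDF of ln(1) = 0, so it contributes nothing, while rarer terms are weighted more heavily.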

Serialization

Serialize to an in-memory binary blob or directly to a file.

// In-memory
let bytes = classifier.serialize().expect("serialize failed");
let restored = Classifier::<Category>::from_data(&bytes).expect("deserialize failed");

// File
classifier.serialize_to_file("model.bin").expect("write failed");
let restored = Classifier::<Category>::from_file("model.bin").expect("read failed");

serialize and from_data require C: serde::Serialize + serde::de::DeserializeOwned. Add #[derive(serde::Serialize, serde::Deserialize)] to your class label type.