# bayesian
A simple, fast Naive Bayes classifier with support for TF-IDF weighting and binary serialization.
## Installation
```bash
cargo add bayesian
```
## Usage
Class labels can be any type that implements `Eq + Hash + Clone` (an enum is the most natural choice).
```rust
use bayesian::Classifier;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Category {
    Spam,
    Ham,
}

fn main() {
    use Category::*;

    let mut classifier = Classifier::new(vec![Spam, Ham]);

    // Train
    classifier.learn(&["buy", "cheap", "pills", "now", "offer"], &Spam);
    classifier.learn(&["free", "prize", "winner", "click", "claim"], &Spam);
    classifier.learn(&["hey", "are", "you", "coming", "to", "the", "meeting"], &Ham);
    classifier.learn(&["let", "me", "know", "if", "you", "need", "anything"], &Ham);

    // Classify
    let doc = vec!["free", "offer", "click", "now"];
    println!("{:?}", classifier.classify(&doc)); // => Spam

    // Probability scores (sum to 1.0, index matches classifier.classes())
    let probs = classifier.prob_scores(&doc);
    for (class, prob) in classifier.classes().iter().zip(&probs) {
        println!("{class:?}: {:.1}%", prob * 100.0);
    }
}
```
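For intuition about what the scores mean, here is a minimal, standard-library-only sketch of textbook multinomial Naive Bayes with add-one (Laplace) smoothing. It is not this crate's implementation: the `Counts` type and the `counts`, `log_score`, `normalize`, and `demo_probs` functions are hypothetical names made up for illustration, and the smoothing choice is an assumption. It shows how per-class log-scores can be turned into probabilities that sum to 1.0.

```rust
use std::collections::{HashMap, HashSet};

/// Per-class statistics gathered during training (hypothetical helper type).
struct Counts {
    docs: f64,                   // number of training documents in this class
    words: HashMap<String, f64>, // occurrences of each word in this class
    total: f64,                  // total word occurrences in this class
}

/// Tallies word occurrences over all documents of one class.
fn counts(docs: &[Vec<&str>]) -> Counts {
    let mut words = HashMap::new();
    let mut total = 0.0;
    for doc in docs {
        for w in doc {
            *words.entry(w.to_string()).or_insert(0.0) += 1.0;
            total += 1.0;
        }
    }
    Counts { docs: docs.len() as f64, words, total }
}

/// Unnormalized log-score: log prior plus add-one-smoothed log likelihoods.
fn log_score(c: &Counts, total_docs: f64, vocab_size: f64, doc: &[&str]) -> f64 {
    let prior = (c.docs / total_docs).ln();
    doc.iter().fold(prior, |acc, w| {
        let count = c.words.get(*w).copied().unwrap_or(0.0);
        acc + ((count + 1.0) / (c.total + vocab_size)).ln()
    })
}

/// Converts log-scores to probabilities that sum to 1.0.
/// Subtracting the maximum first keeps `exp` from underflowing.
fn normalize(log_scores: &[f64]) -> Vec<f64> {
    let max = log_scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = log_scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Scores the README's example document; returns [P(spam), P(ham)].
fn demo_probs() -> Vec<f64> {
    let spam = counts(&[
        vec!["buy", "cheap", "pills", "now", "offer"],
        vec!["free", "prize", "winner", "click", "claim"],
    ]);
    let ham = counts(&[
        vec!["hey", "are", "you", "coming", "to", "the", "meeting"],
        vec!["let", "me", "know", "if", "you", "need", "anything"],
    ]);
    // Vocabulary size = number of distinct words seen across both classes.
    let vocab: HashSet<&String> = spam.words.keys().chain(ham.words.keys()).collect();
    let v = vocab.len() as f64;
    let total_docs = spam.docs + ham.docs;

    let doc = ["free", "offer", "click", "now"];
    normalize(&[
        log_score(&spam, total_docs, v, &doc),
        log_score(&ham, total_docs, v, &doc),
    ])
}

fn main() {
    let probs = demo_probs();
    println!("spam: {:.3}, ham: {:.3}", probs[0], probs[1]);
}
```

Working in log space and normalizing only at the end avoids multiplying many small probabilities together, which is why classifiers of this kind typically expose both log-scores and probability scores.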
## TF-IDF
Plain Naive Bayes is always available. Call `build_tfidf()` to compute TF-IDF
weights from your training data and unlock the `_tfidf` family of methods.
You can keep learning and call `build_tfidf()` again at any time: each call
recomputes the weights from scratch without discarding raw counts.
```rust
classifier.learn(&words, &class); // accumulates both raw counts and TF samples
classifier.build_tfidf(); // compute weights from all learned documents
let class = classifier.classify_tfidf(&doc);
let probs = classifier.prob_scores_tfidf(&doc);
let scores = classifier.log_scores_tfidf(&doc);
```
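The README does not spell out the exact weighting formula, so the following is only a sketch of one common TF-IDF variant: term frequency as `count / document length`, and inverse document frequency as `ln(N / df)`. The function names `idf` and `tfidf` are hypothetical and the crate may use a different formulation. The sketch illustrates why weights must be computed over the whole corpus: a term's IDF depends on how many learned documents contain it.

```rust
use std::collections::{HashMap, HashSet};

/// Inverse document frequency, ln(N / df), for every term in the corpus.
fn idf(docs: &[Vec<&str>]) -> HashMap<String, f64> {
    let n = docs.len() as f64;
    let mut df: HashMap<String, f64> = HashMap::new();
    for doc in docs {
        // Count each term at most once per document.
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *df.entry(term.to_string()).or_insert(0.0) += 1.0;
        }
    }
    df.into_iter().map(|(t, d)| (t, (n / d).ln())).collect()
}

/// TF-IDF weights for one document: (term count / document length) * idf.
/// Terms missing from the IDF table get weight 0 in this sketch.
fn tfidf(doc: &[&str], idf_table: &HashMap<String, f64>) -> HashMap<String, f64> {
    let len = doc.len() as f64;
    let mut tf: HashMap<String, f64> = HashMap::new();
    for term in doc {
        *tf.entry(term.to_string()).or_insert(0.0) += 1.0;
    }
    tf.into_iter()
        .map(|(t, c)| {
            let w = idf_table.get(&t).copied().unwrap_or(0.0);
            (t, c / len * w)
        })
        .collect()
}

fn main() {
    let docs = vec![
        vec!["free", "prize", "click"],
        vec!["free", "meeting", "notes"],
    ];
    let table = idf(&docs);
    let weights = tfidf(&docs[0], &table);
    // "free" occurs in every document, so its weight is 0;
    // "prize" occurs in only one, so it gets a positive weight.
    println!("{:?}", weights);
}
```

Under this formulation, adding a new document changes `N` and possibly `df` for many terms, which is consistent with recomputing all weights from scratch after further learning.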
## Serialization
Serialize to an in-memory binary blob or directly to a file.
```rust
// In-memory
let bytes = classifier.serialize().expect("serialize failed");
let restored = Classifier::<Category>::from_data(&bytes).expect("deserialize failed");
// File
classifier.serialize_to_file("model.bin").expect("write failed");
let restored = Classifier::<Category>::from_file("model.bin").expect("read failed");
```
`serialize` / `from_data` require `C: serde::Serialize + serde::de::DeserializeOwned`. Add
`#[derive(serde::Serialize, serde::Deserialize)]` to your class label type (the derive macros need serde's `derive` feature).