bayespam/
lib.rs

//! # bayespam
//!
//! A simple Bayesian spam classifier.
//!
//! ## About
//!
//! Bayespam is inspired by [Naive Bayes classifiers](https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering), a popular statistical technique for e-mail filtering.
//!
//! Here, the message to be identified is split into simple words, also called tokens.
//! These tokens are compared against the whole corpus of messages (spam or not) to determine how frequently each one appears in both categories.
//!
//! A probabilistic formula is used to calculate the probability that the message is spam.
//! When the probability is high enough, the classifier categorizes the message as likely spam, otherwise as likely ham.
//! The probability threshold is set to 0.8 by default.
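//!
/*!
To make the steps above concrete, here is a minimal sketch of one way such a scorer could work.
It is not bayespam's actual implementation: the `Counts` type, the `tokenize` helper, the add-one smoothing and the combining formula are all assumptions chosen for illustration.
Only the 0.8 threshold comes from the description above.

```rust
use std::collections::HashMap;

/// Assumed threshold, mirroring the 0.8 default described above.
const SPAM_THRESHOLD: f64 = 0.8;

/// Hypothetical per-category token counts (not bayespam's model type).
#[derive(Default)]
struct Counts {
    spam: HashMap<String, u32>,
    ham: HashMap<String, u32>,
}

/// Split a message into lowercase alphanumeric tokens.
fn tokenize(msg: &str) -> Vec<String> {
    msg.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

impl Counts {
    /// Record every token of `msg` in the spam or ham table.
    fn train(&mut self, msg: &str, is_spam: bool) {
        let table = if is_spam { &mut self.spam } else { &mut self.ham };
        for token in tokenize(msg) {
            *table.entry(token).or_insert(0) += 1;
        }
    }

    /// Estimated probability that a message containing `token` is spam.
    /// Add-one smoothing keeps unseen tokens at a neutral 0.5.
    fn token_spam_prob(&self, token: &str) -> f64 {
        let s = f64::from(*self.spam.get(token).unwrap_or(&0)) + 1.0;
        let h = f64::from(*self.ham.get(token).unwrap_or(&0)) + 1.0;
        s / (s + h)
    }

    /// Combine the per-token probabilities p_i into a single score:
    /// product(p_i) / (product(p_i) + product(1 - p_i)).
    fn score(&self, msg: &str) -> f64 {
        let (mut p_spam, mut p_ham) = (1.0_f64, 1.0_f64);
        for token in tokenize(msg) {
            let p = self.token_spam_prob(&token);
            p_spam *= p;
            p_ham *= 1.0 - p;
        }
        p_spam / (p_spam + p_ham)
    }

    /// A message is flagged as spam when its score exceeds the threshold.
    fn identify(&self, msg: &str) -> bool {
        self.score(msg) > SPAM_THRESHOLD
    }
}
```

This sketch ignores class priors and multiplies raw probabilities, so very long messages could underflow.
A more careful version would likely sum logarithms instead.
*/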
//!
//! ## Usage
//!
//! Add to your `Cargo.toml` manifest:
//!
//! ```toml
//! [dependencies]
//! bayespam = "1.1.0"
//! ```
//!
//! ### Use a pre-trained model
//!
//! Add a `model.json` file to your **package root**.
//! Then, you can use it to **score** and **identify** messages:
//!
//! ```
//! extern crate bayespam;
//!
//! use bayespam::classifier;
//!
//! fn main() -> Result<(), std::io::Error> {
//!     // Identify a typical spam message
//!     let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
//!     let is_spam = classifier::identify(spam)?;
//!     assert!(is_spam);
//!
//!     // Identify a typical ham message
//!     let ham = "Hi Bob, can you send me your machine learning homework?";
//!     let is_spam = classifier::identify(ham)?;
//!     assert!(!is_spam);
//!
//!     Ok(())
//! }
//! ```
//!
//! ### Train your own model
//!
//! You can train a new model from scratch:
//!
//! ```
//! extern crate bayespam;
//!
//! use bayespam::classifier::Classifier;
//!
//! fn main() {
//!     // Create a new classifier with an empty model
//!     let mut classifier = Classifier::new();
//!
//!     // Train the classifier with a new spam example
//!     let spam = "Don't forget our special promotion: -30% on men shoes, only today!";
//!     classifier.train_spam(spam);
//!
//!     // Train the classifier with a new ham example
//!     let ham = "Hi Bob, don't forget our meeting today at 4pm.";
//!     classifier.train_ham(ham);
//!
//!     // Identify a typical spam message
//!     let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
//!     let is_spam = classifier.identify(spam);
//!     assert!(is_spam);
//!
//!     // Identify a typical ham message
//!     let ham = "Hi Bob, can you send me your machine learning homework?";
//!     let is_spam = classifier.identify(ham);
//!     assert!(!is_spam);
//! }
//! ```

pub mod classifier;