bayespam
A simple Bayesian spam classifier.
About
Bayespam is inspired by naive Bayes classifiers, a popular statistical technique for e-mail filtering.
The message to classify is split into words, also called tokens.
These tokens are compared against a corpus of messages (spam and non-spam, the latter also called ham) to determine how frequently each token occurs in each category.
A probabilistic formula then computes the probability that the message is spam.
When that probability is high enough, the classifier labels the message as likely spam; otherwise, it labels it as likely ham.
The probability threshold is fixed at 0.8 by default.
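To make the steps above concrete, here is a minimal, standard-library-only sketch of one common way to build such a classifier. It is not Bayespam's actual implementation: `SketchClassifier`, `Counter`, `tokenize`, the neutral rating of 0.5 for unseen tokens, the 0.01 to 0.99 clamp, and the log-odds combination are all assumptions made for illustration. Only the 0.8 threshold comes from the description above.

```rust
use std::collections::HashMap;

/// How often a token was seen in each category.
#[derive(Default, Debug, Clone, Copy)]
struct Counter {
    ham: u32,
    spam: u32,
}

/// Hypothetical stand-in for a classifier (not the bayespam type).
#[derive(Default)]
struct SketchClassifier {
    token_table: HashMap<String, Counter>,
}

/// Default threshold mentioned in the text above.
const THRESHOLD: f64 = 0.8;
/// Assumption: tokens never seen in training get a neutral rating.
const UNKNOWN: f64 = 0.5;

/// Assumption: split on whitespace, keep alphanumeric characters, lowercase.
fn tokenize(msg: &str) -> Vec<String> {
    msg.split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|w| !w.is_empty())
        .collect()
}

impl SketchClassifier {
    /// Count every token of `msg` in the spam or ham column.
    fn train(&mut self, msg: &str, is_spam: bool) {
        for token in tokenize(msg) {
            let counter = self.token_table.entry(token).or_default();
            if is_spam { counter.spam += 1 } else { counter.ham += 1 }
        }
    }

    /// Rating of a single token: its relative frequency in spam versus ham,
    /// clamped so that one token can never decide the result alone.
    fn rate(&self, token: &str) -> f64 {
        let spam_total: u32 = self.token_table.values().map(|c| c.spam).sum();
        let ham_total: u32 = self.token_table.values().map(|c| c.ham).sum();
        match self.token_table.get(token) {
            Some(c) if spam_total > 0 && ham_total > 0 => {
                let p_spam = f64::from(c.spam) / f64::from(spam_total);
                let p_ham = f64::from(c.ham) / f64::from(ham_total);
                (p_spam / (p_spam + p_ham)).clamp(0.01, 0.99)
            }
            _ => UNKNOWN,
        }
    }

    /// Combine the token ratings into one probability by summing log-odds,
    /// which avoids underflow on long messages.
    fn score(&self, msg: &str) -> f64 {
        let log_odds: f64 = tokenize(msg)
            .iter()
            .map(|t| {
                let r = self.rate(t);
                (r / (1.0 - r)).ln()
            })
            .sum();
        1.0 / (1.0 + (-log_odds).exp())
    }

    /// A message is flagged as spam when its score exceeds the threshold.
    fn identify(&self, msg: &str) -> bool {
        self.score(msg) > THRESHOLD
    }
}
```

In this sketch, calling `train(msg, true)` on spam and `train(msg, false)` on ham fills the token table, after which `score` returns a value between 0 and 1 and `identify` compares it with the threshold. A message made only of unseen tokens scores exactly in the middle, so it is never flagged.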
Documentation
Learn more about Bayespam here: https://docs.rs/bayespam.
Usage
Add to your Cargo.toml:
[dependencies]
bayespam = "1.0.0"
Use the pre-trained model provided
extern crate bayespam;

use bayespam::classifier;

fn main() -> Result<(), std::io::Error> {
    // Score and classify a message that looks like spam
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier::score(spam)?;
    let is_spam = classifier::identify(spam)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    // Score and classify a message that looks like ham
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier::score(ham)?;
    let is_spam = classifier::identify(ham)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    Ok(())
}
$> cargo run
0.9999
true
0.0604
false
Train your own model and save it to a file as JSON
extern crate bayespam;

use bayespam::classifier::Classifier;
use std::fs::File;

fn main() -> Result<(), std::io::Error> {
    // Create a new, untrained classifier
    let mut classifier = Classifier::new();

    // Train the classifier with a new spam example
    let spam = "Don't forget our special promotion: -30% on men shoes, only today!";
    classifier.train_spam(spam);

    // Train the classifier with a new ham example
    let ham = "Hi Bob, don't forget our meeting today at 4pm.";
    classifier.train_ham(ham);

    // Score and classify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier.score(spam);
    let is_spam = classifier.identify(spam);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Score and classify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier.score(ham);
    let is_spam = classifier.identify(ham);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Save the trained model to a file as JSON
    let mut file = File::create("my_super_model.json")?;
    classifier.save(&mut file, false)?;

    Ok(())
}
$> cargo run
0.9851
true
0.0100
false
$> cat my_super_model.json
{"token_table":{"forget":{"ham":1,"spam":1},"only":{"ham":0,"spam":1},"meeting":{"ham":1,"spam":0},"our":{"ham":1,"spam":1},"dont":{"ham":1,"spam":1},"bob":{"ham":1,"spam":0},"men":{"ham":0,"spam":1},"today":{"ham":1,"spam":1},"shoes":{"ham":0,"spam":1},"special":{"ham":0,"spam":1},"promotion:":{"ham":0,"spam":1}}}
Contribution
Contributions via issues or pull requests are appreciated.
License
Bayespam is distributed under the terms of the MIT License.