
Naïve Bayes examples

Naïve Bayes

The naïve Bayes assumption is that all features in the labelled data are independent of each other given the class they correspond to. This means the probability of some input given a class can be computed as the product of the probabilities of each individual feature in that input conditioned on that class.
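For an input x with features x1, x2, … xn this factorisation is:

P(x | Ck) = P(x1 | Ck) * P(x2 | Ck) * … * P(xn | Ck)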

By Bayes’ Theorem we can relate the probability of the class given the input to the probability of the input given the class. As a classifier only needs to determine which class some input most likely belongs to, we can compare just the product of the probability of the input given a class and the probability of that class.

Bayes’ Theorem

posterior = (prior * likelihood) / evidence

P(Ck | x) = (P(Ck) * P(x | Ck)) / P(x)

P(Ck | x) ∝ P(Ck) * P(x | Ck)

where Ck is the kth class and x is the input to classify.

Classifier

Taking logs of Bayes’ rule yields

log(P(Ck | x)) ∝ log(P(Ck)) + log(P(x | Ck))

Given the naïve Bayes assumption

log(P(Ck | x)) ∝ log(P(Ck)) + Σi log(P(xi | Ck))

where the sum Σi runs over every feature xi of the input.

Then to determine the class we take the class corresponding to the largest log(P(Ck | x)).
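Equivalently, the classifier computes:

class(x) = argmax over k of [ log(P(Ck)) + Σi log(P(xi | Ck)) ]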

Computing the individual probability of a feature conditioned on a class depends on the type of the data.

For categorical data this is simply occurrences of the category within the class divided by the total occurrences within the class. In practice Laplacian smoothing (adding one occurrence of each category to the counts) may be used to avoid computing a probability of 0 when some category doesn’t have any samples for a class.
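As a worked example with hypothetical counts (not taken from the data below): if Cloudy appears 2 times among 8 StayIn rows and the weather feature has 5 categories, the smoothed estimate is (2 + 1) / (8 + 5) = 3/13 ≈ 0.23 rather than the unsmoothed 2/8 = 0.25, and a weather type with no StayIn rows gets 1/13 instead of 0.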

For continuous data we can model the feature as distributed according to a Gaussian distribution.
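A Gaussian with mean μ and variance σ², both estimated from the samples of the feature within a class, assigns the probability density:

p(x) = (1 / sqrt(2 * π * σ²)) * exp(-((x - μ)²) / (2 * σ²))

which is the density used for the wind speed feature in the example below.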

Simple Naïve Bayes Example with F-1 score analysis

Naïve Bayes can be done by hand (with a calculator), which is what the example below shows. We have a list of data about the environment and want to predict if we should go outside based on the conditions. Some of the features are categorical and others are real valued.

use easy_ml::distributions::Gaussian;
use easy_ml::linear_algebra;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Weather {
    Stormy, Rainy, Cloudy, Clear, Sunny
}

type WindSpeed = f64;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Pandemic {
    Pandemic, NoPandemic
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum Decision {
    GoOut, StayIn
}

#[derive(Clone, Copy, PartialEq, Debug)]
struct Observation {
    weather: Weather,
    wind: WindSpeed,
    pandemic: Pandemic,
    decision: Decision,
}

impl Observation {
    fn new(
        weather: Weather, wind: WindSpeed, pandemic: Pandemic, decision: Decision
    ) -> Observation {
        Observation {
            weather,
            wind,
            pandemic,
            decision
        }
    }
}

let observations = vec![
    Observation::new(Weather::Clear, 0.5, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Clear, 0.9, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Clear, 0.8, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Stormy, 0.7, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Rainy, 0.1, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Rainy, 0.5, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Rainy, 0.6, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Rainy, 0.7, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Cloudy, 0.3, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Cloudy, 0.5, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Cloudy, 0.2, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Cloudy, 0.8, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Sunny, 0.3, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Sunny, 0.9, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Sunny, 0.5, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Sunny, 0.5, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Sunny, 0.5, Pandemic::Pandemic, Decision::StayIn),
    Observation::new(Weather::Clear, 0.1, Pandemic::Pandemic, Decision::StayIn),
    Observation::new(Weather::Clear, 0.9, Pandemic::Pandemic, Decision::StayIn)
];

fn predict(
    observations: &[Observation], weather: Weather, wind: WindSpeed, pandemic: Pandemic
) -> Decision {
    let total = observations.len() as f64;
    // first compute the number of each class in the data
    let total_stay_in = observations.iter()
        .filter(|observation| observation.decision == Decision::StayIn)
        .count() as f64;
    let total_go_out = observations.iter()
        .filter(|observation| observation.decision == Decision::GoOut)
        .count() as f64;

    let weather_log_probability_stay_in = {
        // compute how many rows in the data are this weather and stay in
        let total = observations.iter()
            .filter(|observation| observation.decision == Decision::StayIn)
            .filter(|observation| observation.weather == weather)
            .count() as f64;
        // there are 5 variants for the weather and we use laplacian smoothing
        // to avoid introducing zero probabilities, +1 / +5 treats the data
        // as if there is one stay in for each weather type.
        ((total + 1.0) / (total_stay_in + 5.0)).ln()
    };

    let weather_log_probability_go_out = {
        // compute how many rows in the data are this weather and go out
        let total = observations.iter()
            .filter(|observation| observation.decision == Decision::GoOut)
            .filter(|observation| observation.weather == weather)
            .count() as f64;
        // there are 5 variants for the weather and we use laplacian smoothing
        // to avoid introducing zero probabilities, +1 / +5 treats the data
        // as if there is one go out for each weather type.
        ((total + 1.0) / (total_go_out + 5.0)).ln()
    };

    // we're modelling the wind as a Gaussian, so we get a likelihood by
    // computing the probability density at the wind speed we have; the
    // density is largest when the wind matches the mean wind for stay in
    // and tends towards 0 the further away it is. We take the log so it
    // can be summed with the other log probabilities.
    let wind_speed_model_stay_in: Gaussian<WindSpeed> = Gaussian::approximating(
        observations.iter()
            .filter(|observation| observation.decision == Decision::StayIn)
            .map(|observation| observation.wind)
    );
    let wind_log_probability_stay_in = wind_speed_model_stay_in.probability(&wind).ln();

    let wind_speed_model_go_out: Gaussian<WindSpeed> = Gaussian::approximating(
        observations.iter()
            .filter(|observation| observation.decision == Decision::GoOut)
            .map(|observation| observation.wind)
    );
    let wind_log_probability_go_out = wind_speed_model_go_out.probability(&wind).ln();

    let pandemic_log_probability_stay_in = {
        // compute how many rows in the data are this pandemic and stay in
        let total = observations.iter()
            .filter(|observation| observation.decision == Decision::StayIn)
            .filter(|observation| observation.pandemic == pandemic)
            .count() as f64;
        // there are 2 variants for the pandemic type and we use laplacian smoothing
        // to avoid introducing zero probabilities, +1 / +2 treats the data
        // as if there is one stay in for each pandemic type.
        ((total + 1.0) / (total_stay_in + 2.0)).ln()
    };

    let pandemic_log_probability_go_out = {
        // compute how many rows in the data are this pandemic and go out
        let total = observations.iter()
            .filter(|observation| observation.decision == Decision::GoOut)
            .filter(|observation| observation.pandemic == pandemic)
            .count() as f64;
        // there are 2 variants for the pandemic type and we use laplacian smoothing
        // to avoid introducing zero probabilities, +1 / +2 treats the data
        // as if there is one go out for each pandemic type.
        ((total + 1.0) / (total_go_out + 2.0)).ln()
    };

    let prior_log_probability_stay_in = (total_stay_in / total).ln();
    let prior_log_probability_go_out = (total_go_out / total).ln();

    let posterior_log_probability_stay_in = prior_log_probability_stay_in
        + weather_log_probability_stay_in
        + wind_log_probability_stay_in
        + pandemic_log_probability_stay_in;
    let posterior_log_probability_go_out = prior_log_probability_go_out
        + weather_log_probability_go_out
        + wind_log_probability_go_out
        + pandemic_log_probability_go_out;

    if posterior_log_probability_go_out > posterior_log_probability_stay_in {
        Decision::GoOut
    } else {
        Decision::StayIn
    }
}

let test_data = vec![
    Observation::new(Weather::Sunny, 0.8, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Sunny, 0.2, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Stormy, 0.2, Pandemic::NoPandemic, Decision::StayIn),
    Observation::new(Weather::Cloudy, 0.3, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Rainy, 0.8, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Rainy, 0.5, Pandemic::NoPandemic, Decision::GoOut),
    Observation::new(Weather::Stormy, 0.6, Pandemic::Pandemic, Decision::StayIn),
    Observation::new(Weather::Rainy, 0.1, Pandemic::Pandemic, Decision::StayIn),
];

let predictions = test_data.iter()
    .map(|data| predict(&observations, data.weather, data.wind, data.pandemic))
    .collect::<Vec<Decision>>();

println!("Test data and predictions\n{:?}", test_data.iter()
    .cloned()
    .zip(predictions.clone())
    .collect::<Vec<(Observation, Decision)>>());

println!("Accuracy: {:?}", test_data.iter()
    .zip(predictions.clone())
    .map(|(data, decision)| if data.decision == decision { 1.0 } else { 0.0 })
    .sum::<f64>() / (test_data.len() as f64));

// To compute Recall and Precision it is necessary to decide which class should be
// considered the positive and which the negative. For medical diagnosis the
// positive case is conventionally the condition being present. For this example
// we take GoOut to be the positive case.
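
// The four combinations of actual and predicted class form a confusion matrix:
//
//                    predicted GoOut    predicted StayIn
//   actual GoOut     true positive      false negative
//   actual StayIn    false positive     true negative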

// True Positives are where the model predicts the positive class when it is
// the correct decision, eg in this scenario, going outside when it should
// decide to go outside
let true_positives = test_data.iter()
    .cloned()
    .zip(predictions.clone())
    .filter(|(data, _)| data.decision == Decision::GoOut)
    .filter(|(_, decision)| decision == &Decision::GoOut)
    .count() as f64;

// False Positives are when the model predicts the positive class when it is
// not the correct decision, eg in this scenario, going outside when it should
// decide to stay in
let false_positives = test_data.iter()
    .cloned()
    .zip(predictions.clone())
    .filter(|(data, _)| data.decision == Decision::StayIn)
    .filter(|(_, decision)| decision == &Decision::GoOut)
    .count() as f64;

// True Negatives are when the model predicts the negative class when it is
// the correct decision, eg in this scenario, staying in when it should
// decide to stay in (say, there's a pandemic and no good reason to go outside)
let true_negatives = test_data.iter()
    .cloned()
    .zip(predictions.clone())
    .filter(|(data, _)| data.decision == Decision::StayIn)
    .filter(|(_, decision)| decision == &Decision::StayIn)
    .count() as f64;

// False Negatives are when the model predicts the negative class when it is
// not the correct decision, eg in this scenario, staying in when it should
// decide to go outside
let false_negatives = test_data.iter()
    .cloned()
    .zip(predictions.clone())
    .filter(|(data, _)| data.decision == Decision::GoOut)
    .filter(|(_, decision)| decision == &Decision::StayIn)
    .count() as f64;

// Precision measures how good the model is at identifying the positive
// class (you can trivially get 100% precision by never predicting the
// positive class, as this means you can't get a false positive).
let precision = true_positives / (true_positives + false_positives);

// Recall is the true positive rate which is how good the model is at
// identifying the positive class out of all the positive cases (you can
// trivially get 100% recall by always predicting the positive class).
let recall = true_positives / (true_positives + false_negatives);

// The F-1 score is the harmonic mean of precision and recall, combining
// them into a single accuracy-like measure.
// In this case the two classes are roughly equally likely, so the F-1
// score and Accuracy are similar. However, if the model had learned
// to always predict StayIn, then it would still have an accuracy of
// roughly 50% because of the equal likelihood, whereas its F-1 score
// would collapse as it would never identify a positive case.
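// (the F-1 score is computed as 2 * (precision * recall) / (precision + recall))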
let f1_score = linear_algebra::f1_score(precision, recall);
println!("F1-Score: {:?}", f1_score);

assert!(f1_score > 0.8);
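
As a quick usage sketch (the input here is hypothetical, not drawn from either dataset above), predict can also classify a single new observation directly:

let decision = predict(&observations, Weather::Clear, 0.4, Pandemic::Pandemic);
println!("Prediction for a clear, mildly windy day during a pandemic: {:?}", decision);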

3 Class Naïve Bayes Example

See the submodule for a 3 class Naïve Bayes example.

Modules

3 Class Naïve Bayes Example