miniboosts 0.1.0

MiniBoosts

A collection of boosting algorithms written in Rust 🦀.

[Figure: boosting comparison]

This library uses the Gurobi optimizer, so you must acquire a Gurobi license to use it.

Note that you need to put gurobi.lic in your home directory; otherwise, compilation fails. See this repository for details.

Features

So far, I have implemented the following Boosters and Weak Learners. You can combine them arbitrarily.

Classification

  • Boosters

    • AdaBoost by Freund and Schapire, 1997
    • AdaBoostV by Rätsch and Warmuth, 2005
    • TotalBoost by Warmuth, Liao, and Rätsch, 2006
    • LPBoost by Demiriz, Bennett, and Shawe-Taylor, 2002
    • SmoothBoost by Rocco A. Servedio, 2003
    • SoftBoost by Warmuth, Glocer, and Rätsch, 2007
    • ERLPBoost by Warmuth, Glocer, and Vishwanathan, 2008
    • CERLPBoost (The Corrective ERLPBoost) by Shalev-Shwartz and Singer, 2010
    • MLPBoost by Mitsuboshi, Hatano, and Takimoto, 2022
  • Weak Learners

    • DTree (Decision Tree)
    • GaussianNB (Naive Bayes), beta version
    • WLUnion, a union of multiple weak learners.

Regression

  • Weak Learner
    • RTree (Regression Tree)

Future work

  • Boosters

  • Weak Learners

    • Bag of words
    • TF-IDF
    • Two-Layer Neural Network
    • RBF-Net
  • Others

    • Parallelization
    • LP/QP solver (This work allows you to use this library without a license).

How to use

You can view the documentation by running cargo doc --open.

This library uses the DataFrame type of the polars crate, so you also need to import polars.

Add the following line to the [dependencies] section of Cargo.toml:

miniboosts = { git = "https://github.com/rmitsuboshi/miniboosts" }

Here is some sample code:

use polars::prelude::*;
use miniboosts::prelude::*;


fn main() {
    // Set file name
    let file = "/path/to/input/data.csv";

    // Read a CSV file
    // Note that each feature of `data`, except the target column,
    // must be the `f64` type with no missing values.
    let mut data = CsvReader::from_path(file)
        .unwrap()
        .has_header(true)
        .finish()
        .unwrap();


    // Pick the target class. Each element is 1 or -1 of type `i64`.
    let target: Series = data.drop_in_place(&"class").unwrap();


    // Set tolerance parameter
    let tol: f64 = 0.01;


    // Initialize Booster
    let mut booster = AdaBoost::init(&data, &target)
        .tolerance(tol); // Set the tolerance parameter.


    // Initialize Weak Learner
    // For a decision tree, the default `max_depth` is `None`,
    // so the tree can grow extremely large.
    let weak_learner = DTree::init(&data, &target)
        .max_depth(2) // Specify the max depth (default is not specified)
        .criterion(Criterion::Edge); // Choose the split rule that maximizes the edge.


    // Run boosting algorithm
    // Each booster returns a combined hypothesis.
    let f = booster.run(&weak_learner);


    // Get the batch prediction for all examples in `data`.
    let predictions = f.predict_all(&data);


    // You can also predict the label of the `i`th instance alone.
    let i = 0_usize;
    let prediction = f.predict(&data, i);
}
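
To make two quantities from the sample more concrete, here is a minimal sketch in plain Rust. It is not part of miniboosts: the function names `edge` and `training_error` are hypothetical, and it assumes that the predictions and the true labels have already been collected into slices of `i64` with values 1 or -1 (the actual return type of `predict_all` may differ). The edge follows the standard definition from the boosting literature, the weighted sum of `y_i * h(x_i)` under a distribution over the examples; whether `Criterion::Edge` computes exactly this value is an assumption.

```rust
/// Hypothetical helper, not a miniboosts API.
/// Edge of a hypothesis under the distribution `dist`:
/// the sum over examples of dist[i] * targets[i] * predictions[i].
fn edge(dist: &[f64], predictions: &[i64], targets: &[i64]) -> f64 {
    assert_eq!(dist.len(), targets.len(), "length mismatch");
    assert_eq!(predictions.len(), targets.len(), "length mismatch");
    dist.iter()
        .zip(predictions.iter().zip(targets))
        .map(|(d, (p, y))| d * (p * y) as f64)
        .sum()
}

/// Hypothetical helper, not a miniboosts API.
/// Fraction of examples whose predicted label differs from the true label.
/// Returns 0.0 for an empty sample.
fn training_error(predictions: &[i64], targets: &[i64]) -> f64 {
    assert_eq!(predictions.len(), targets.len(), "length mismatch");
    if targets.is_empty() {
        return 0.0;
    }
    let mistakes = predictions
        .iter()
        .zip(targets)
        .filter(|(p, y)| p != y)
        .count();
    mistakes as f64 / targets.len() as f64
}
```

For labels in {1, -1} and a distribution that sums to one, the edge equals 1 minus twice the weighted error, so an edge of 0 under the uniform distribution corresponds to a training error of 0.5.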

If you use boosting for soft margin optimization, initialize the booster like this:

// `data`, `target`, and `tol` are the same variables as in the sample above.
let n_sample = data.shape().0;
let nu = n_sample as f64 * 0.2;
let lpboost = LPBoost::init(&data, &target)
    .tolerance(tol)
    .nu(nu); // Set the capping parameter.

Note that the capping parameter must satisfy 1 <= nu && nu <= n_sample.
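
If you derive the capping parameter from a fraction of the sample size, it can help to check this constraint before passing the value to the booster. The following is a minimal sketch; `capping_parameter` is a hypothetical helper and not part of miniboosts.

```rust
/// Hypothetical helper, not a miniboosts API.
/// Computes nu = n_sample * fraction and checks 1 <= nu <= n_sample.
fn capping_parameter(n_sample: usize, fraction: f64) -> Result<f64, String> {
    let nu = n_sample as f64 * fraction;
    // A NaN value fails the range check and is reported as an error.
    if (1.0..=n_sample as f64).contains(&nu) {
        Ok(nu)
    } else {
        Err(format!("nu = {nu} is outside [1, {n_sample}]"))
    }
}
```

With 100 examples, a fraction of 0.25 yields nu = 25, while a fraction of 0.001 is rejected because it gives nu < 1.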