Crate miniboosts
A crate that provides some boosting algorithms.
All boosting algorithms in this crate, except LPBoost, have a theoretical iteration bound until finding a combined hypothesis.
This crate includes three types of boosting algorithms:
- Empirical risk minimizing (ERM) boosting
- Hard margin maximizing boosting
- Soft margin maximizing boosting
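For orientation, the soft-margin objective that the third family maximizes can be written in its standard LPBoost form. This is a textbook formulation stated here as background, not quoted from this crate's source; here ν is the capping parameter and m the number of training examples:

```latex
\max_{\rho,\, w,\, \xi}\; \rho \;-\; \frac{1}{\nu}\sum_{i=1}^{m}\xi_i
\qquad \text{s.t.}\qquad
y_i \sum_{j} w_j h_j(x_i) \;\ge\; \rho - \xi_i,\quad
\xi_i \ge 0,\quad
w_j \ge 0,\quad \sum_{j} w_j = 1.
```

Smaller ν penalizes margin violations more heavily; forcing all slacks ξ_i to zero recovers the hard-margin objective of the second family.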
This crate also includes some weak learners.
- Classification: DecisionTree, NeuralNetwork, GaussianNB, and BadBaseLearner (the bad base learner for LPBoost).
- Regression: RegressionTree. Note that the current implementation is not efficient.
Example
The following code shows a small example of running LPBoost.
use miniboosts::prelude::*;
// Read the training sample from the CSV file.
// We use the column named `class` as the label.
let path = "path/to/dataset.csv";
let sample = SampleReader::new()
.file(path)
.has_header(true)
.target_feature("class")
.read()
.unwrap();
// Get the number of training examples.
let n_sample = sample.shape().0 as f64;
// Initialize `LPBoost` and set the tolerance parameter as `0.01`.
// This means `booster` returns a hypothesis whose training error is
// less than `0.01` if the training examples are linearly separable.
// Note that the default tolerance parameter is set as `1 / n_sample`,
// where `n_sample = sample.shape().0` is
// the number of training examples in `sample`.
// Further, at the end of this chain,
// LPBoost calls `LPBoost::nu` to set the capping parameter
// as `0.1 * n_sample`, which means that,
// at most, `0.1 * n_sample` examples are regarded as outliers.
let booster = LPBoost::init(&sample)
.tolerance(0.01)
.nu(0.1 * n_sample);
// Set the weak learner with setting parameters.
let weak_learner = DecisionTreeBuilder::new(&sample)
.max_depth(2)
.criterion(Criterion::Entropy)
.build();
// Run `LPBoost` and obtain the resulting hypothesis `f`.
let f = booster.run(&weak_learner);
// Get the predictions on the training set.
let predictions = f.predict_all(&sample);
// Calculate the training loss.
let target = sample.target();
let training_loss = target.into_iter()
.zip(predictions)
.map(|(&y, fx)| if y as i64 == fx { 0.0 } else { 1.0 })
.sum::<f64>()
/ n_sample;
println!("Training Loss is: {training_loss}");
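The 0-1 training-loss computation at the end of the example can be checked in isolation. The following standalone sketch reimplements that zip/map/sum pattern without the crate; the function name and the toy data are illustrative, not part of miniboosts:

```rust
// Fraction of examples where the prediction disagrees with the label.
// `targets` plays the role of `sample.target()` and `predictions`
// the role of `f.predict_all(..)` in the example above.
fn zero_one_loss(targets: &[f64], predictions: &[i64]) -> f64 {
    let n = targets.len() as f64;
    targets.iter()
        .zip(predictions)
        .map(|(&y, &fx)| if y as i64 == fx { 0.0 } else { 1.0 })
        .sum::<f64>()
        / n
}

fn main() {
    let targets = vec![1.0, -1.0, 1.0, -1.0];
    let predictions = vec![1, -1, -1, -1];
    // One of four examples is misclassified, so the loss is 0.25.
    println!("Training Loss is: {}", zero_one_loss(&targets, &predictions));
}
```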
Re-exports
pub use research::Logger;
pub use research::LoggerBuilder;
pub use research::objective_functions::SoftMarginObjective;
pub use research::objective_functions::HardMarginObjective;
pub use research::objective_functions::ExponentialLoss;
Modules
- prelude: Exports the standard boosting algorithms and traits.
- research: Provides some features for research, such as measuring quantities of a boosting algorithm per iteration.
Structs
- The AdaBoost algorithm, proposed by Robert E. Schapire and Yoav Freund.
- The AdaBoostV algorithm, also known as AdaBoost_{ν}^{★}, proposed by Rätsch and Warmuth.
- The worst-case weak learner for LPBoost.
- A struct that builds BadBaseLearner.
- A hypothesis returned by BadBaseLearner. This struct is used for demonstrating the worst-case LPBoost behavior.
- The Corrective ERLPBoost algorithm.
- A struct that the boosting algorithms in this library return. You can read/write this struct via the Serde traits.
- The Decision Tree algorithm. Given a set of training examples for classification and a distribution over the set, DecisionTree outputs a decision tree classifier named DecisionTreeClassifier under the specified parameters.
- A struct that builds DecisionTree. DecisionTreeBuilder keeps parameters for constructing DecisionTree.
- Decision tree classifier. This struct is just a wrapper of Node.
- The ERLPBoost algorithm.
- The Gradient Boosting Machine.
- A factory that produces a GaussianNBClassifier for a given distribution over training examples. The struct name comes from scikit-learn.
- The Graph Separation Boosting algorithm, proposed by Robert E. Schapire and Yoav Freund.
- The LPBoost algorithm, proposed by Demiriz, Bennett, and Shawe-Taylor.
- The MLPBoost algorithm, shorthand for the Modified LPBoost algorithm.
- Naive Bayes classifier.
- A wrapper for NNHypothesis.
- A neural network hypothesis, produced by NeuralNetwork.
- A wrapper for NNHypothesis.
- The naive aggregation rule.
- A neural network weak learner. Since this is just a weak learner, a shallow network is preferred. Of course, you can use a deep network if you don’t care about running time.
- RegressionTree is the factory that generates a RegressionTreeClassifier for a given distribution over examples.
- A struct that builds RegressionTree. RegressionTreeBuilder keeps parameters for constructing RegressionTree.
- Regression Tree regressor. This struct is just a wrapper of Node.
- Struct Sample holds a batch sample with dense/sparse format.
- SmoothBoost. Variable names, such as kappa, gamma, and theta, come from the original paper. Note that SmoothBoost needs to know the weak learner guarantee gamma. See Figure 1 in the paper "Smooth Boosting and Learning with Malicious Noise" by Rocco A. Servedio.
- The SoftBoost algorithm.
- The TotalBoost algorithm, proposed by Manfred K. Warmuth, Jun Liao, and Gunnar Rätsch. A totally corrective boosting algorithm that maximizes the margin.
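As a refresher on what the ERM boosters above iterate, here is a standalone sketch of one round of the classic AdaBoost weight update. This is the textbook rule, not this crate's source; the function name and toy data are illustrative:

```rust
// One AdaBoost round: given a distribution `d` over examples, labels
// `y` in {-1, +1}, and weak-hypothesis predictions `h`, compute the
// hypothesis weight alpha and the reweighted distribution.
fn adaboost_round(d: &[f64], y: &[i64], h: &[i64]) -> (f64, Vec<f64>) {
    let m = d.len();
    // Weighted error of the weak hypothesis under `d`.
    let mut eps = 0.0;
    for i in 0..m {
        if y[i] != h[i] { eps += d[i]; }
    }
    // Hypothesis weight: alpha = 0.5 * ln((1 - eps) / eps).
    let alpha = 0.5 * ((1.0 - eps) / eps).ln();
    // Reweight: d_i <- d_i * exp(-alpha * y_i * h_i), then normalize.
    let mut next: Vec<f64> = (0..m)
        .map(|i| d[i] * (-alpha * (y[i] * h[i]) as f64).exp())
        .collect();
    let z: f64 = next.iter().sum();
    for w in next.iter_mut() { *w /= z; }
    (alpha, next)
}

fn main() {
    let d = vec![0.25; 4];
    let y = vec![1, 1, -1, -1];
    let h = vec![1, 1, 1, -1]; // misclassifies the third example
    let (alpha, d_next) = adaboost_round(&d, &y, &h);
    println!("alpha = {alpha:.4}, new weights = {d_next:?}");
}
```

After the update, the single misclassified example carries half of the total weight, which is what forces the next weak hypothesis to pay attention to it.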
Enums
- Activation functions available to neural networks.
- Splitting criteria for growing decision trees.
- FWType updates. These options correspond to the Frank-Wolfe strategies.
- An enumeration of sparse/dense features.
- Some well-known loss functions.
- The type of loss (error) function.
- Loss functions available to neural networks.
Traits
- Booster trait. The trait Booster defines the standard framework of boosting, a repeated game between a Booster and a Weak Learner.
- A trait that defines the behavior of a classifier. You only need to implement the confidence method.
- This trait defines the loss functions.
- A trait that defines the behavior of a regressor. You only need to implement the predict method.
- An interface that returns a struct of type Hypothesis.
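To illustrate the "implement only `confidence`" pattern described above, here is a self-contained analogue of such a classifier trait. The trait and struct definitions below are assumptions for illustration, not miniboosts' actual definitions; the idea is that a sign-based `predict` can be derived from `confidence` as a default method:

```rust
// Illustrative analogue of a classifier trait: implementing a single
// `confidence` method is enough; `predict` is derived from its sign.
trait Classifier {
    /// Confidence in the +1 label on one example; the sign gives the prediction.
    fn confidence(&self, x: &[f64]) -> f64;

    /// Default prediction: the sign of the confidence.
    fn predict(&self, x: &[f64]) -> i64 {
        if self.confidence(x) >= 0.0 { 1 } else { -1 }
    }
}

// A decision stump (hypothetical example): thresholds a single feature.
struct Stump { feature: usize, threshold: f64 }

impl Classifier for Stump {
    fn confidence(&self, x: &[f64]) -> f64 {
        x[self.feature] - self.threshold
    }
}

fn main() {
    let h = Stump { feature: 0, threshold: 0.5 };
    // Feature 0 exceeds the threshold, so the stump predicts +1.
    println!("{}", h.predict(&[0.9, 0.1]));
}
```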