Crate miniboosts
A crate that provides some boosting algorithms. All the boosting algorithms in this crate, except LPBoost, have a theoretical iteration bound for finding a combined hypothesis.

This crate includes three types of boosting algorithms.

This crate also includes some Weak Learners.

§Example

The following code shows a small example of running LPBoost.

use miniboosts::prelude::*;
 
// Read the training sample from the CSV file.
// We use the column named `class` as the label.
let path = "path/to/dataset.csv";
let sample = SampleReader::new()
    .file(path)
    .has_header(true)
    .target_feature("class")
    .read()
    .unwrap();
 
// Get the number of training examples.
let n_sample = sample.shape().0 as f64;
 
// Initialize `LPBoost` and set the tolerance parameter as `0.01`.
// This means `booster` returns a hypothesis whose training error is
// less than `0.01` if the training examples are linearly separable.
// Note that the default tolerance parameter is set as `1 / n_sample`,
// where `n_sample = sample.shape().0` is
// the number of training examples in `sample`.
// Further, at the end of this chain,
// LPBoost calls `LPBoost::nu` to set the capping parameter 
// as `0.1 * n_sample`, which means that, 
// at most, `0.1 * n_sample` examples are regarded as outliers.
let booster = LPBoost::init(&sample)
    .tolerance(0.01)
    .nu(0.1 * n_sample);
 
// Construct the weak learner with the desired parameters.
let weak_learner = DecisionTreeBuilder::new(&sample)
    .max_depth(2)
    .criterion(Criterion::Entropy)
    .build();
 
// Run `LPBoost` and obtain the resulting hypothesis `f`.
let f = booster.run(&weak_learner);
 
// Get the predictions on the training set.
let predictions = f.predict_all(&sample);
 
// Calculate the training loss.
let target = sample.target();
let training_loss = target.into_iter()
    .zip(predictions)
    .map(|(&y, fx)| if y as i64 == fx { 0.0 } else { 1.0 })
    .sum::<f64>()
    / n_sample;

println!("Training Loss is: {training_loss}");
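
Because every booster implements the Booster trait, a different algorithm can be swapped into the example above with minimal changes. The following is a minimal sketch, assuming AdaBoost exposes the same init/tolerance builder pattern used by LPBoost; check the AdaBoost documentation for the exact builder methods.

use miniboosts::prelude::*;
 
// Reuse `sample` and `weak_learner` from the example above and
// swap `LPBoost` for `AdaBoost`.
// The `tolerance` setter is assumed to mirror the `LPBoost` builder above.
let booster = AdaBoost::init(&sample)
    .tolerance(0.01);
 
// Run `AdaBoost` with the same decision-tree weak learner.
let f = booster.run(&weak_learner);
 
// Predictions are obtained the same way as before.
let predictions = f.predict_all(&sample);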

Re-exports§

pub use research::Logger;
pub use research::LoggerBuilder;
pub use research::CrossValidation;
pub use research::objective_functions::SoftMarginObjective;
pub use research::objective_functions::HardMarginObjective;
pub use research::objective_functions::ExponentialLoss;

Modules§

prelude
Exports the standard boosting algorithms and traits.
research
This module provides some features for research, such as measuring quantities of a boosting algorithm per iteration.

Structs§

AdaBoost
The AdaBoost algorithm proposed by Robert E. Schapire and Yoav Freund.
AdaBoostV
The AdaBoostV algorithm (also known as AdaBoost_{ν}^{★}), proposed by Rätsch and Warmuth.
BadBaseLearner
The worst-case weak learner for LPBoost.
BadBaseLearnerBuilder
A struct that builds BadBaseLearner.
BadClassifier
A hypothesis returned by BadBaseLearner. This struct is used for demonstrating the worst-case LPBoost behavior.
CERLPBoost
The Corrective ERLPBoost algorithm.
DecisionTree
The Decision Tree algorithm.
Given a set of training examples for classification and a distribution over the set, DecisionTree outputs a decision tree classifier named DecisionTreeClassifier under the specified parameters.
DecisionTreeBuilder
A struct that builds DecisionTree. DecisionTreeBuilder keeps parameters for constructing DecisionTree.
DecisionTreeClassifier
Decision tree classifier. This struct is just a wrapper of Node.
ERLPBoost
The ERLPBoost algorithm.
GBM
The Gradient Boosting Machine (GBM), proposed by Jerome H. Friedman.
GaussianNB
A factory that produces an NBayesClassifier for a given distribution over training examples. The struct name comes from scikit-learn.
GraphSepBoost
The Graph Separation Boosting algorithm.
LPBoost
The LPBoost algorithm, originally proposed by Demiriz, Bennett, and Shawe-Taylor.
MLPBoost
The MLPBoost algorithm (short for Modified LPBoost).
MadaBoost
The MadaBoost algorithm proposed by Carlos Domingo and Osamu Watanabe, 2000.
NBayesClassifier
Naive Bayes classifier.
NNClassifier
A wrapper for NNHypothesis.
NNHypothesis
A neural network hypothesis, produced by NeuralNetwork.
NNRegressor
A wrapper for NNHypothesis.
NaiveAggregation
The naive aggregation rule.
NeuralNetwork
A neural network weak learner. Since this is just a weak learner, a shallow network is preferred. Of course, you can use a deep network if you don’t care about running time.
RegressionTree
RegressionTree is the factory that generates a RegressionTreeRegressor for a given distribution over examples.
RegressionTreeBuilder
A struct that builds RegressionTree. RegressionTreeBuilder keeps parameters for constructing RegressionTree.
RegressionTreeRegressor
Regression Tree regressor. This struct is just a wrapper of Node.
Sample
Struct Sample holds a batch sample with dense/sparse format.
SampleReader
A struct that returns Sample. Using this struct, one can read a CSV/SVMLIGHT format file to Sample. Other formats are not supported yet.
SmoothBoost
SmoothBoost. Variable names, such as kappa, gamma, and theta, come from the original paper.
Note that SmoothBoost needs to know the weak learner guarantee gamma.
See Figure 1 in this paper: Smooth Boosting and Learning with Malicious Noise by Rocco A. Servedio.
SoftBoost
The SoftBoost algorithm.
TotalBoost
The TotalBoost algorithm, proposed in the following paper: Manfred K. Warmuth, Jun Liao, and Gunnar Rätsch, “Totally corrective boosting algorithms that maximize the margin.”
WeightedMajority
The struct that the boosting algorithms in this library return. You can serialize/deserialize it via the Serde traits; see the sketch after this list.
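
WeightedMajority can be stored and reloaded through Serde, as noted above. The following is a minimal sketch; using serde_json as the backend and the generic parameter WeightedMajority<DecisionTreeClassifier> are assumptions made for illustration, not something this page states.

use miniboosts::prelude::*;
 
// `f` is the `WeightedMajority` returned by `booster.run(&weak_learner)`
// in the example above.
// Write the combined hypothesis to disk (assumes `serde_json` is a dependency).
let serialized = serde_json::to_string(&f).unwrap();
std::fs::write("hypothesis.json", &serialized).unwrap();
 
// Read it back later. The hypothesis type `DecisionTreeClassifier` is assumed
// here because the example above uses a decision-tree weak learner.
let text = std::fs::read_to_string("hypothesis.json").unwrap();
let restored: WeightedMajority<DecisionTreeClassifier> =
    serde_json::from_str(&text).unwrap();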

Enums§

Activation
Activation functions available to neural networks.
Criterion
Splitting criteria for growing a decision tree.
FWType
Frank-Wolfe update types; each variant corresponds to a Frank-Wolfe strategy.
Feature
An enumeration of sparse/dense feature.
GBMLoss
Some well-known loss functions.
NNLoss
Loss functions available to neural networks.

Traits§

Booster
The trait Booster defines the standard framework of boosting, formulated as a repeated game between a Booster and a Weak Learner.
Classifier
A trait that defines the behavior of a classifier. You only need to implement the confidence method; see the sketch at the end of this page.
LossFunction
This trait defines loss functions.
Regressor
A trait that defines the behavior of a regressor. You only need to implement the predict method.
WeakLearner
An interface that returns a struct of type Hypothesis.
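
As a rough illustration of the Classifier trait, the sketch below implements confidence for a trivial constant hypothesis. The signature shown here (a Sample plus a row index, returning an f64) is an assumption based on the description above; consult the trait definition for the authoritative form.

use miniboosts::prelude::*;
 
/// A toy hypothesis that answers every example with the same confidence.
/// It learns nothing and exists only to show which method must be implemented.
struct ConstantHypothesis {
    value: f64,
}
 
impl Classifier for ConstantHypothesis {
    // Assumed signature: return a confidence value for the `row`-th example
    // of `sample`; `predict` is expected to follow from its sign.
    fn confidence(&self, _sample: &Sample, _row: usize) -> f64 {
        self.value
    }
}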