Crate miniboosts


A crate that provides some boosting algorithms. Every boosting algorithm in this crate, except LPBoost, has a theoretical iteration bound for finding a combined hypothesis.

This crate includes three types of boosting algorithms.

This crate also includes some Weak Learners.

Example

The following code shows a small example of running LPBoost with a DecisionTree weak learner.

use miniboosts::prelude::*;
 
// Read the training sample from the CSV file.
// We use the column named `class` as the label.
let path = "path/to/dataset.csv";
let sample = SampleReader::new()
    .file(path)
    .has_header(true)
    .target_feature("class")
    .read()
    .unwrap();
 
// Get the number of training examples.
let n_sample = sample.shape().0 as f64;
 
// Initialize `LPBoost` and set the tolerance parameter to `0.01`.
// This means `booster` returns a hypothesis whose training error is
// less than `0.01` if the training examples are linearly separable.
// Note that the default tolerance parameter is set to `1 / n_sample`,
// where `n_sample = sample.shape().0` is
// the number of training examples in `sample`.
// Further, at the end of this chain,
// `LPBoost::nu` is called to set the capping parameter
// to `0.1 * n_sample`, which means that,
// at most, `0.1 * n_sample` examples are regarded as outliers.
let booster = LPBoost::init(&sample)
    .tolerance(0.01)
    .nu(0.1 * n_sample);
 
// Construct the weak learner, setting its parameters.
let weak_learner = DecisionTreeBuilder::new(&sample)
    .max_depth(2)
    .criterion(Criterion::Entropy)
    .build();
 
// Run `LPBoost` and obtain the resulting hypothesis `f`.
let f = booster.run(&weak_learner);
 
// Get the predictions on the training set.
let predictions = f.predict_all(&sample);
 
// Calculate the training loss.
let target = sample.target();
let training_loss = target.into_iter()
    .zip(predictions)
    .map(|(&y, fx)| if y as i64 == fx { 0.0 } else { 1.0 })
    .sum::<f64>()
    / n_sample;

println!("Training Loss is: {training_loss}");
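
Because every booster in this crate is driven in the same way (construct it, set its parameters, then pass a weak learner to run), switching algorithms only changes the construction line. The sketch below swaps LPBoost for AdaBoost; it assumes AdaBoost exposes the same init and tolerance builder methods shown for LPBoost above, so treat those method names as an assumption and check the AdaBoost documentation.

use miniboosts::prelude::*;
 
// Read the training sample exactly as in the example above.
let sample = SampleReader::new()
    .file("path/to/dataset.csv")
    .has_header(true)
    .target_feature("class")
    .read()
    .unwrap();
 
// Assumption: `AdaBoost` follows the same `init` + `tolerance`
// builder pattern as `LPBoost` above.
let booster = AdaBoost::init(&sample)
    .tolerance(0.01);
 
// The weak learner is constructed exactly as before.
let weak_learner = DecisionTreeBuilder::new(&sample)
    .max_depth(2)
    .criterion(Criterion::Entropy)
    .build();
 
// `run` returns the combined hypothesis, just like `LPBoost::run`.
let f = booster.run(&weak_learner);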

Re-exports

Modules

  • Exports the standard boosting algorithms and traits.
  • This module provides some features for research: it measures several quantities of a boosting algorithm per iteration.

Structs

  • The AdaBoost algorithm proposed by Robert E. Schapire and Yoav Freund.
  • The AdaBoostV algorithm, proposed by Rätsch and Warmuth.
    AdaBoostV, also known as AdaBoost_{ν}^{★}, is a boosting algorithm proposed in the following paper:
  • The worst-case weak learner for LPBoost.
  • A struct that builds BadBaseLearner.
  • A hypothesis returned by BadBaseLearner. This struct is used for demonstrating the worst-case LPBoost behavior.
  • The Corrective ERLPBoost algorithm, proposed in the following paper:
  • A struct that the boosting algorithms in this library return. You can read/write this struct via the Serde traits (see the serialization sketch after this list).
  • The Decision Tree algorithm.
    Given a set of training examples for classification and a distribution over the set, DecisionTree outputs a decision tree classifier named DecisionTreeClassifier under the specified parameters.
  • A struct that builds DecisionTree. DecisionTreeBuilder keeps parameters for constructing DecisionTree.
  • Decision tree classifier. This struct is just a wrapper of Node.
  • The ERLPBoost algorithm proposed in the following paper:
  • The Gradient Boosting Machine proposed in the following paper:
  • A factory that produces a GaussianNBClassifier for a given distribution over training examples. The struct name comes from scikit-learn.
  • The Graph Separation Boosting algorithm proposed by Robert E. Schapire and Yoav Freund.
  • The LPBoost algorithm proposed by Demiriz, Bennett, and Shawe-Taylor.
    LPBoost is originally proposed in the following paper:
  • The MLPBoost algorithm, shorthand of Modified LPBoost algorithm, proposed in the following paper:
  • Naive Bayes classifier.
  • A wrapper for NNHypothesis.
  • A neural network hypothesis, produced by NeuralNetwork.
  • A wrapper for NNHypothesis.
  • The naive aggregation rule. See the following paper for example:
  • A neural network weak learner. Since this is just a weak learner, a shallow network is preferred. Of course, you can use a deep network if you don’t care about running time.
  • RegressionTree is the factory that generates a RegressionTreeClassifier for a given distribution over examples.
  • A struct that builds RegressionTree. RegressionTreeBuilder keeps parameters for constructing RegressionTree.
  • Regression Tree regressor. This struct is just a wrapper of Node.
  • Struct Sample holds a batch sample with dense/sparse format.
  • A struct that returns Sample. Using this struct, one can read a CSV/SVMLIGHT format file to Sample. Other formats are not supported yet.
  • SmoothBoost. Variable names, such as kappa, gamma, and theta, come from the original paper.
    Note that SmoothBoost needs to know the weak learner guarantee gamma.
    See Figure 1 in this paper: Smooth Boosting and Learning with Malicious Noise by Rocco A. Servedio.
  • The SoftBoost algorithm proposed in the following paper:
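
As noted in the entry above on the struct returned by the boosting algorithms, the combined hypothesis supports the Serde traits, so it can be written to disk and read back with any Serde-compatible format. The sketch below uses serde_json, which is not a dependency of this crate and must be added separately; it also assumes the returned struct implements Serialize, as its description states.

use miniboosts::prelude::*;
 
// Train a combined hypothesis as in the example above.
let sample = SampleReader::new()
    .file("path/to/dataset.csv")
    .has_header(true)
    .target_feature("class")
    .read()
    .unwrap();
let booster = LPBoost::init(&sample)
    .tolerance(0.01);
let weak_learner = DecisionTreeBuilder::new(&sample)
    .max_depth(2)
    .criterion(Criterion::Entropy)
    .build();
let f = booster.run(&weak_learner);
 
// Serialize the combined hypothesis to JSON
// (assumes the returned struct implements `serde::Serialize`,
// as stated in its description above).
let json = serde_json::to_string(&f).unwrap();
std::fs::write("hypothesis.json", &json).unwrap();
 
// Reading it back is the mirror image: `serde_json::from_str` with the
// concrete type of `f`; see that struct's documentation for its name.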

Enums

  • Activation functions available to neural networks.
  • Splitting criteria for growing decision tree.
  • FWType updates. These options correspond to the Frank-Wolfe strategies.
  • An enumeration of sparse/dense feature.
  • Some well-known loss functions.
  • The type of loss (error) function.
  • Loss functions available to Neural networks.

Traits

  • The trait Booster defines the standard framework of Boosting. Here, the standard framework is defined as a repeated game between a Booster and a Weak Learner.
  • A trait that defines the behavior of a classifier. You only need to implement the confidence method (see the sketch after this list).
  • This trait defines the loss functions.
  • A trait that defines the behavior of a regressor. You only need to implement the predict method.
  • An interface that returns a struct of type Hypothesis.
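
As the Classifier entry above notes, a custom hypothesis only needs to provide the confidence method; predict and predict_all then come from the trait's defaults. The sketch below is a guess at such an implementation for a toy constant classifier; the exact signature of confidence (assumed here to take a Sample reference and a row index and return an f64) is an assumption and should be checked against the Classifier documentation.

use miniboosts::prelude::*;
 
/// A toy hypothesis that always predicts the same sign.
struct ConstantClassifier {
    sign: f64, // +1.0 or -1.0
}
 
impl Classifier for ConstantClassifier {
    // Assumption: `confidence` receives the sample and a row index and
    // returns a real-valued confidence; the trait's default methods
    // then derive `predict` and `predict_all` from it.
    fn confidence(&self, _sample: &Sample, _row: usize) -> f64 {
        self.sign
    }
}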