forester 0.0.3

A crate for implementing various flavors of random forests and decision trees.
forester

A Rust crate for tailoring random forests and decision trees to your data set.

The aim of this project is to provide generic functionality for working with random forests. It is currently at a very early stage of development. Key elements of the API are starting to stabilize, so if you have feedback on it, now is a good time to open an issue in the repository.

Don't forget to check out the examples in the repository.

Overview

This implementation of random forests is heavily inspired by (1). In particular, models for classification, regression, and density estimation will be provided in a unified framework based on traits.

Conceptually, the crate provides two main parts:

  1. A generic framework consisting of
    • Functionality for fitting and predicting trees and forests
    • Traits that allow these functions to understand arbitrary user data
  2. Common building blocks for plugging into the framework
    • Split/Performance criteria (RMSE, GINI, ...)
    • Split Finding strategies (best random, CART, ...)
    • Ensemble combiners (aggregating, boosting - to be done)
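To make the idea of a split criterion concrete, here is a minimal sketch of Gini impurity and the weighted impurity of a candidate split. It is an illustration only: the function names `gini_impurity` and `split_impurity` are hypothetical and are not taken from forester's API.

```rust
/// Gini impurity of a set of class labels.
/// Assumes every label is in `0..n_classes`; a larger label panics on indexing.
/// (Hypothetical helper for illustration, not part of forester.)
fn gini_impurity(labels: &[usize], n_classes: usize) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    // Count how often each class occurs.
    let mut counts = vec![0usize; n_classes];
    for &label in labels {
        counts[label] += 1;
    }
    let n = labels.len() as f64;
    // 1 - sum of squared class probabilities.
    let sum_sq: f64 = counts
        .iter()
        .map(|&c| {
            let p = c as f64 / n;
            p * p
        })
        .sum();
    1.0 - sum_sq
}

/// Impurity of a split: the size-weighted mean of the impurities of both sides.
/// Lower values indicate a better split.
fn split_impurity(left: &[usize], right: &[usize], n_classes: usize) -> f64 {
    let n = (left.len() + right.len()) as f64;
    if n == 0.0 {
        return 0.0;
    }
    let weighted = left.len() as f64 * gini_impurity(left, n_classes)
        + right.len() as f64 * gini_impurity(right, n_classes);
    weighted / n
}
```

A split finding strategy would evaluate a criterion like this for each candidate split and keep the one with the lowest value.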

Usage

Most implementations of random forests work on tabular data, more or less randomly selecting which feature columns to try for a particular split. This works only with a finite set of predefined features. However, as described in (1), random forests can work with infinite-dimensional feature spaces. In other words, the parameter that identifies a feature can be a continuous value rather than a discrete column index.

An example of an infinite-dimensional feature space is a feature formed as a linear combination of two columns (see the rotational_classifier example). Which features to use and how to interpret them depends strongly on the data, so it makes little sense for the crate to provide a few arbitrary feature extraction methods. Instead, reasoning about the data is left to the users of the crate, who need to implement the SampleDescription and TrainingData traits. These traits define how features are parameterized and extracted from the data, how the final prediction in tree leaves is made, how splits are evaluated, and more.
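The following sketch illustrates what a continuously parameterized feature can look like: a linear combination of two columns whose weights are determined by an angle. It does not implement or reproduce the SampleDescription and TrainingData traits; the `Sample` type and the functions `rotated_feature` and `split_by_feature` are hypothetical names chosen for this illustration.

```rust
/// Hypothetical sample type: two feature columns and a class label.
struct Sample {
    x: [f64; 2],
    label: usize,
}

/// A feature identified by a continuous parameter `theta` (in radians)
/// instead of a column index: the projection of the point onto the
/// direction (cos(theta), sin(theta)).
fn rotated_feature(x: &[f64; 2], theta: f64) -> f64 {
    theta.cos() * x[0] + theta.sin() * x[1]
}

/// Partition samples into (left, right) by comparing the feature value
/// against a threshold. Samples with feature <= threshold go left.
fn split_by_feature<'a>(
    samples: &'a [Sample],
    theta: f64,
    threshold: f64,
) -> (Vec<&'a Sample>, Vec<&'a Sample>) {
    samples
        .iter()
        .partition(|s| rotated_feature(&s.x, theta) <= threshold)
}
```

With `theta = 0` the feature reduces to the first column, and with `theta = π/2` it is (up to rounding) the second column, so ordinary column-wise splits are special cases. Every other angle yields a different feature, which is why the set of possible features is infinite.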

Examples

Examples can be found in the repository.

Literature

  1. A. Criminisi, J. Shotton and E. Konukoglu, "Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning", Microsoft Research technical report TR-2011-114.