Module smartcore::tree::decision_tree_regressor

Regression tree for dependent variables that take continuous or ordered discrete values.

Decision Tree Regressor

The process of building a decision tree can be simplified to these two steps:

  1. Divide the predictor space \(X\) into K distinct and non-overlapping regions, \(R_1, R_2, ..., R_K\).
  2. For every observation that falls into the region \(R_k\), we make the same prediction, which is simply the mean of the response values for the training observations in \(R_k\) (see the sketch below).
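
As a quick illustration of step 2 (plain Rust, independent of SmartCore's API; the values are made up):

fn region_mean(responses: &[f64]) -> f64 {
    responses.iter().sum::<f64>() / responses.len() as f64
}

// A hypothetical region R_k holding three training responses.
let r_k = vec![88.5, 89.5, 96.2];
// Every observation that falls into R_k receives the same prediction: the mean.
println!("prediction for R_k = {:.1}", region_mean(&r_k)); // 91.4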

Regions \(R_1, R_2, ..., R_K\) are built in a way that minimizes the residual sum of squares (RSS), given by

\[RSS = \sum_{k=1}^K\sum_{i \in R_k} (y_i - \hat{y}_{R_k})^2\]

where \(\hat{y}_{R_k}\) is the mean response for the training observations within region \(R_k\).
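
The same criterion can be written out directly. A small sketch in plain Rust, assuming the observations are already grouped into their regions (this is not SmartCore's internal API):

fn rss(regions: &[Vec<f64>]) -> f64 {
    regions
        .iter()
        .map(|region| {
            let mean = region.iter().sum::<f64>() / region.len() as f64;
            // Sum of squared deviations from the region mean.
            region.iter().map(|y| (y - mean).powi(2)).sum::<f64>()
        })
        .sum()
}

// Two illustrative regions with their training responses.
let regions = vec![vec![83.0, 88.5, 88.2], vec![110.8, 112.6, 114.2]];
println!("RSS = {:.3}", rss(&regions));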

SmartCore builds the regions \(R_1, R_2, ..., R_K\) with a greedy, recursive binary splitting approach. The approach begins at the top of the tree and successively splits the predictor space, one predictor at a time. At each step of the tree-building process, the best split is made at that particular step, rather than looking ahead and picking a split that would lead to a better tree in some future step.
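
The sketch below shows the greedy idea for a single predictor: evaluate every candidate threshold and keep the one that minimizes the combined RSS of the two resulting halves. This is plain illustrative Rust, not SmartCore's internal implementation, which repeats this search over all predictors and recurses into both halves.

fn sse(ys: &[f64]) -> f64 {
    if ys.is_empty() {
        return 0.0;
    }
    let mean = ys.iter().sum::<f64>() / ys.len() as f64;
    ys.iter().map(|y| (y - mean).powi(2)).sum()
}

fn best_split(xs: &[f64], ys: &[f64]) -> (f64, f64) {
    // Track (threshold, total RSS) of the best split found so far.
    let mut best = (xs[0], f64::INFINITY);
    for &t in xs {
        let left: Vec<f64> = xs.iter().zip(ys).filter(|&(&x, _)| x <= t).map(|(_, &y)| y).collect();
        let right: Vec<f64> = xs.iter().zip(ys).filter(|&(&x, _)| x > t).map(|(_, &y)| y).collect();
        let total = sse(&left) + sse(&right);
        if total < best.1 {
            best = (t, total);
        }
    }
    best
}

// Year vs response values taken from the Longley example below.
let x = vec![1947., 1948., 1949., 1950., 1951.];
let y = vec![83.0, 88.5, 88.2, 89.5, 96.2];
let (threshold, rss) = best_split(&x, &y);
println!("best split: x <= {} with RSS {:.3}", threshold, rss);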

Example:

use smartcore::linalg::naive::dense_matrix::*;
use smartcore::tree::decision_tree_regressor::*;

// Longley dataset (https://www.statsmodels.org/stable/datasets/generated/longley.html)
let x = DenseMatrix::from_2d_array(&[
            &[234.289, 235.6, 159., 107.608, 1947., 60.323],
            &[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
            &[258.054, 368.2, 161.6, 109.773, 1949., 60.171],
            &[284.599, 335.1, 165., 110.929, 1950., 61.187],
            &[328.975, 209.9, 309.9, 112.075, 1951., 63.221],
            &[346.999, 193.2, 359.4, 113.27, 1952., 63.639],
            &[365.385, 187., 354.7, 115.094, 1953., 64.989],
            &[363.112, 357.8, 335., 116.219, 1954., 63.761],
            &[397.469, 290.4, 304.8, 117.388, 1955., 66.019],
            &[419.18, 282.2, 285.7, 118.734, 1956., 67.857],
            &[442.769, 293.6, 279.8, 120.445, 1957., 68.169],
            &[444.546, 468.1, 263.7, 121.95, 1958., 66.513],
            &[482.704, 381.3, 255.2, 123.366, 1959., 68.655],
            &[502.601, 393.1, 251.4, 125.368, 1960., 69.564],
            &[518.173, 480.6, 257.2, 127.852, 1961., 69.331],
            &[554.894, 400.7, 282.7, 130.081, 1962., 70.551],
       ]);
let y: Vec<f64> = vec![
            83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
            101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9,
       ];

let tree = DecisionTreeRegressor::fit(&x, &y, Default::default()).unwrap();

let y_hat = tree.predict(&x).unwrap(); // use the same data for prediction
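
To get a rough sense of the fit, the predictions can be compared with the training targets. The snippet below assumes the mean_squared_error helper exported from smartcore::metrics; note that this is training error, not an out-of-sample estimate:

use smartcore::metrics::mean_squared_error;

// Training error only; use a held-out set for an honest evaluation.
println!("MSE: {}", mean_squared_error(&y, &y_hat));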

Structs

DecisionTreeRegressor

Regression Tree

DecisionTreeRegressorParameters

Parameters of Regression Tree
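
Instead of Default::default(), the tree can be configured explicitly. A hedged sketch, assuming the public max_depth and min_samples_leaf fields documented on DecisionTreeRegressorParameters (values are illustrative, reusing x and y from the example above):

use smartcore::tree::decision_tree_regressor::*;

let mut params = DecisionTreeRegressorParameters::default();
params.max_depth = Some(3); // cap tree depth to limit overfitting
params.min_samples_leaf = 2; // require at least two samples in each leaf

let tree = DecisionTreeRegressor::fit(&x, &y, params).unwrap();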