Module smartcore::linear::ridge_regression

Ridge Regression

Linear regression is the standard algorithm for predicting a quantitative response \(y\) from a linear combination of explanatory variables \(X\); it assumes an approximately linear relationship between \(X\) and \(y\). Ridge regression is an extension of linear regression that adds an L2 regularization term to the loss function during training. This term encourages simpler models with smaller coefficient values.

In ridge regression the coefficients \(\beta_0, \beta_1, \dots, \beta_n\) are estimated by solving

\[\hat{\beta} = (X^TX + \alpha I)^{-1}X^Ty \]

where \(\alpha \geq 0\) is a tuning parameter that controls the strength of regularization. When \(\alpha = 0\) the penalty term has no effect and ridge regression produces the ordinary least squares estimates. However, as \(\alpha \rightarrow \infty\), the impact of the shrinkage penalty grows and the ridge regression coefficient estimates approach zero.
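For context, the estimator above is the minimizer of the standard ridge objective: the residual sum of squares plus an L2 penalty on the coefficients,

\[\hat{\beta} = \underset{\beta}{\arg\min} \left\{ \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2 \right\}\]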

SmartCore uses either SVD or Cholesky decomposition to estimate \(\hat{\beta}\). The Cholesky decomposition is more computationally efficient and more numerically stable than solving the normal equation directly, but it does not work for all data matrices. Unlike the Cholesky decomposition, every matrix has an SVD decomposition.
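The solver is configured through RidgeRegressionParameters. Below is a minimal sketch of selecting the Cholesky solver explicitly; it assumes the parameters builder exposes a with_solver method alongside with_alpha (if it does not, the public solver field of the struct can be set directly instead).

use smartcore::linalg::naive::dense_matrix::*;
use smartcore::linear::ridge_regression::*;

// Small, well-conditioned design matrix for which Cholesky works fine.
let x = DenseMatrix::from_2d_array(&[
    &[1., 1.],
    &[1., 2.],
    &[2., 2.],
    &[2., 3.],
]);
let y = vec![6., 8., 9., 11.];

// `with_solver` is assumed here; if it is unavailable, set the `solver`
// field of RidgeRegressionParameters directly. SVD is the more robust choice
// for ill-conditioned data.
let params = RidgeRegressionParameters::default()
    .with_alpha(0.5)
    .with_solver(RidgeRegressionSolverName::Cholesky);

let model = RidgeRegression::fit(&x, &y, params).unwrap();
let predictions = model.predict(&x).unwrap();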

Example:

use smartcore::linalg::naive::dense_matrix::*;
use smartcore::linear::ridge_regression::*;

// Longley dataset (https://www.statsmodels.org/stable/datasets/generated/longley.html)
let x = DenseMatrix::from_2d_array(&[
              &[234.289, 235.6, 159.0, 107.608, 1947., 60.323],
              &[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
              &[258.054, 368.2, 161.6, 109.773, 1949., 60.171],
              &[284.599, 335.1, 165.0, 110.929, 1950., 61.187],
              &[328.975, 209.9, 309.9, 112.075, 1951., 63.221],
              &[346.999, 193.2, 359.4, 113.270, 1952., 63.639],
              &[365.385, 187.0, 354.7, 115.094, 1953., 64.989],
              &[363.112, 357.8, 335.0, 116.219, 1954., 63.761],
              &[397.469, 290.4, 304.8, 117.388, 1955., 66.019],
              &[419.180, 282.2, 285.7, 118.734, 1956., 67.857],
              &[442.769, 293.6, 279.8, 120.445, 1957., 68.169],
              &[444.546, 468.1, 263.7, 121.950, 1958., 66.513],
              &[482.704, 381.3, 255.2, 123.366, 1959., 68.655],
              &[502.601, 393.1, 251.4, 125.368, 1960., 69.564],
              &[518.173, 480.6, 257.2, 127.852, 1961., 69.331],
              &[554.894, 400.7, 282.7, 130.081, 1962., 70.551],
         ]);

let y: Vec<f64> = vec![83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0,
          100.0, 101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9];

let y_hat = RidgeRegression::fit(&x, &y, RidgeRegressionParameters::default().with_alpha(0.1))
    .and_then(|lr| lr.predict(&x)).unwrap();
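The quality of the fit can then be checked by comparing the predictions against the observed responses, for example with the mean squared error. A minimal sketch, assuming the mean_squared_error helper exported from smartcore::metrics:

use smartcore::metrics::mean_squared_error;

// Compare in-sample predictions against the observed responses
// (assumes `mean_squared_error` is available in smartcore::metrics).
let mse = mean_squared_error(&y, &y_hat);
println!("Training MSE: {}", mse);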


Structs

RidgeRegression: Ridge regression
RidgeRegressionParameters: Ridge Regression parameters

Enums

RidgeRegressionSolverName: Approach to use for estimation of regression coefficients. Cholesky is more efficient but SVD is more stable.