Module smartcore::linear::linear_regression

Linear Regression

Linear regression is a straightforward approach for predicting a quantitative response \(y\) as a linear combination of explanatory variables \(X\). It assumes that the relationship between \(X\) and \(y\) is approximately linear. Formally, we can write this relationship as

\[y = \beta_0 + \sum_{i=1}^n \beta_i X_i + \epsilon\]

where \(\epsilon\) is a mean-zero random error term and the regression coefficients \(\beta_0, \beta_1, \ldots, \beta_n\) are unknown and must be estimated from the data.
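
In matrix form (a standard restatement of the model above, not specific to SmartCore), stack the observations row-wise into a design matrix \(X\) with a leading column of ones for the intercept, and collect the responses into a vector \(y\):

\[y = X\beta + \epsilon\]

The least-squares estimate \(\hat{\beta}\) is the coefficient vector that minimizes the residual sum of squares \(\Vert y - X\beta \Vert_2^2\).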

While the regression coefficients can be estimated in closed form by solving the normal equations,

\[\hat{\beta} = (X^TX)^{-1}X^Ty \]

the \((X^TX)^{-1}\) term is both computationally expensive and numerically unstable, since forming \(X^TX\) squares the condition number of \(X\). An alternative approach is to use a matrix decomposition that avoids this operation altogether. SmartCore uses the SVD and QR decompositions to find estimates of \(\hat{\beta}\). The QR decomposition is more computationally efficient and more numerically stable than evaluating the normal equations directly, but it does not work for all data matrices. Unlike the QR decomposition, every matrix has an SVD.
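
Concretely (a standard derivation, not specific to SmartCore's implementation): with the thin QR decomposition \(X = QR\), where \(Q\) has orthonormal columns and \(R\) is upper triangular, the least-squares problem reduces to a triangular system that is solved by back substitution, without ever forming \(X^TX\):

\[R\hat{\beta} = Q^Ty\]

Likewise, with the SVD \(X = U\Sigma V^T\), the estimate is \(\hat{\beta} = V\Sigma^{+}U^Ty\), where \(\Sigma^{+}\) is the pseudoinverse of \(\Sigma\). This form stays well defined even when \(X\) is rank-deficient, which is why the SVD route works for all matrices.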

Example:

use smartcore::linalg::naive::dense_matrix::*;
use smartcore::linear::linear_regression::*;

// Longley dataset (https://www.statsmodels.org/stable/datasets/generated/longley.html)
let x = DenseMatrix::from_2d_array(&[
    &[234.289, 235.6, 159.0, 107.608, 1947., 60.323],
    &[259.426, 232.5, 145.6, 108.632, 1948., 61.122],
    &[258.054, 368.2, 161.6, 109.773, 1949., 60.171],
    &[284.599, 335.1, 165.0, 110.929, 1950., 61.187],
    &[328.975, 209.9, 309.9, 112.075, 1951., 63.221],
    &[346.999, 193.2, 359.4, 113.270, 1952., 63.639],
    &[365.385, 187.0, 354.7, 115.094, 1953., 64.989],
    &[363.112, 357.8, 335.0, 116.219, 1954., 63.761],
    &[397.469, 290.4, 304.8, 117.388, 1955., 66.019],
    &[419.180, 282.2, 285.7, 118.734, 1956., 67.857],
    &[442.769, 293.6, 279.8, 120.445, 1957., 68.169],
    &[444.546, 468.1, 263.7, 121.950, 1958., 66.513],
    &[482.704, 381.3, 255.2, 123.366, 1959., 68.655],
    &[502.601, 393.1, 251.4, 125.368, 1960., 69.564],
    &[518.173, 480.6, 257.2, 127.852, 1961., 69.331],
    &[554.894, 400.7, 282.7, 130.081, 1962., 70.551],
]);

let y: Vec<f64> = vec![
    83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0,
    101.2, 104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9,
];

let lr = LinearRegression::fit(&x, &y,
    LinearRegressionParameters::default()
        .with_solver(LinearRegressionSolverName::QR)).unwrap();

let y_hat = lr.predict(&x).unwrap();
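
If the QR solver fails for a particular data matrix, the more robust SVD solver can be selected instead. Below is a minimal variation of the call above, reusing the same x and y; SVD is the other variant of LinearRegressionSolverName:

// Fall back to the SVD solver, which handles matrices the QR solver cannot
let lr_svd = LinearRegression::fit(&x, &y,
    LinearRegressionParameters::default()
        .with_solver(LinearRegressionSolverName::SVD)).unwrap();

let y_hat_svd = lr_svd.predict(&x).unwrap();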

Structs

LinearRegression: Linear Regression
LinearRegressionParameters: Linear Regression parameters

Enums

LinearRegressionSolverName: Approach to use for estimation of regression coefficients. QR is more efficient but SVD is more stable.