linreg-core
A lightweight, self-contained linear regression library written in Rust. It runs as a native Rust crate, compiles to WebAssembly for browser use, and exposes Python bindings via PyO3.
Key design principle: All linear algebra and statistical distribution functions are implemented from scratch — no external math libraries required. This keeps binary sizes small and makes the crate highly portable.
Table of Contents
| Section | Description |
|---|---|
| Features | Regression methods, model statistics, diagnostic tests |
| Rust Usage | Native Rust crate usage |
| WebAssembly Usage | Browser/JavaScript usage |
| Python Usage | Python bindings via PyO3 |
| Feature Flags | Build configuration options |
| Validation | Testing and verification |
| Implementation Notes | Technical details |
Features
Regression Methods
- OLS Regression: Coefficients, standard errors, t-statistics, p-values, confidence intervals, model selection criteria (AIC, BIC, log-likelihood)
- Ridge Regression: L2-regularized regression with optional standardization, effective degrees of freedom, model selection criteria
- Lasso Regression: L1-regularized regression via coordinate descent with automatic variable selection, convergence tracking, model selection criteria
- Elastic Net: Combined L1 + L2 regularization for variable selection with multicollinearity handling, active set convergence, model selection criteria
- LOESS: Locally estimated scatterplot smoothing for non-parametric curve fitting with configurable span, polynomial degree, and robust fitting
- Lambda Path Generation: Create regularization paths for cross-validation
Model Statistics
- Fit Metrics: R-squared, Adjusted R-squared, F-statistic, F-test p-value
- Error Metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE)
- Model Selection: Log-likelihood, AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion)
- Residuals: Raw residuals, standardized residuals, fitted values, leverage (hat matrix diagonal)
- Multicollinearity: Variance Inflation Factor (VIF) for each predictor
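As a cross-check of the model-selection numbers above, the Gaussian log-likelihood, AIC, and BIC can be computed directly from a fit's residuals. A minimal pure-Python sketch, independent of the crate:

```python
import math

def gaussian_log_likelihood(residuals):
    """Log-likelihood of an OLS fit under Gaussian errors (MLE variance RSS/n)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L (k = number of estimated parameters)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik
```

Lower AIC/BIC indicates a better trade-off between fit and parameter count; BIC penalizes extra parameters more heavily once n > e² ≈ 7.4.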
Diagnostic Tests
| Category | Tests |
|---|---|
| Linearity | Rainbow Test, Harvey-Collier Test, RESET Test |
| Heteroscedasticity | Breusch-Pagan (Koenker variant), White Test (R & Python methods) |
| Normality | Jarque-Bera, Shapiro-Wilk (n ≤ 5000), Anderson-Darling |
| Autocorrelation | Durbin-Watson, Breusch-Godfrey (higher-order) |
| Multicollinearity | Variance Inflation Factor (VIF) |
| Influence | Cook's Distance, DFBETAS, DFFITS |
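For intuition on one of these, the Durbin-Watson statistic is simple enough to sketch in a few lines of Python (values near 2 suggest no first-order autocorrelation; near 0, positive autocorrelation; near 4, negative):

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) over the regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```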
Rust Usage
Add to your `Cargo.toml`:

```toml
[dependencies]
linreg-core = { version = "0.5", default-features = false }
```
OLS Regression (Rust)
The snippet below sketches the call shape; exact signatures and field names may differ, so check the crate documentation:

```rust
use linreg_core::ols_regression;

// One predictor column and a response vector
let x = vec![vec![1.0, 2.0, 3.0, 4.0, 5.0]];
let y = vec![2.1, 3.9, 6.2, 7.8, 10.1];

let result = ols_regression(&y, &x, &["x1".to_string()]).unwrap();
println!("R² = {:.4}", result.r_squared);
println!("coefficients = {:?}", result.coefficients);
```
Ridge Regression (Rust)

A sketch of the call shape (function and constructor signatures are indicative; see the crate docs):

```rust
use linreg_core::ridge_regression;
use linreg_core::Matrix;

let x = Matrix::new(5, 1, vec![1.0, 2.0, 3.0, 4.0, 5.0]); // rows, cols, data (order indicative)
let y = vec![2.1, 3.9, 6.2, 7.8, 10.1];

// lambda = 0.5, with standardization
let result = ridge_regression(&x, &y, 0.5, true);
```

Lasso Regression (Rust)

```rust
use linreg_core::lasso_regression;
use linreg_core::Matrix;

// lambda = 0.1; coordinate descent zeroes out weak predictors (signature indicative)
let result = lasso_regression(&x, &y, 0.1);
```

Elastic Net Regression (Rust)

```rust
use linreg_core::elastic_net_regression;
use linreg_core::Matrix;

// lambda = 0.1, alpha = 0.5 mixes the L1 and L2 penalties (signature indicative)
let result = elastic_net_regression(&x, &y, 0.1, 0.5);
```
Diagnostic Tests (Rust)
A sketch (module path and function names indicative):

```rust
use linreg_core::diagnostics::*;

// Each test returns a statistic plus a p-value, e.g.:
let bp = breusch_pagan_test(&y, &x);
let dw = durbin_watson_test(&residuals);
```
Lambda Path Generation (Rust)
The names `make_lambda_path`, `LambdaPathOptions`, and `Matrix` come from the crate; the constructor arguments and option fields shown are indicative.

```rust
use linreg_core::{make_lambda_path, LambdaPathOptions};
use linreg_core::Matrix;

let x = Matrix::new(5, 1, vec![1.0, 2.0, 3.0, 4.0, 5.0]); // rows, cols, data (order indicative)
let y = vec![2.1, 3.9, 6.2, 7.8, 10.1];

let options = LambdaPathOptions {
    // e.g. number of lambdas, lambda_min ratio, alpha — see the crate docs for exact fields
    ..Default::default()
};

let lambdas = make_lambda_path(&x, &y, &options);
for &lambda in lambdas.iter() {
    // fit Ridge/Lasso/Elastic Net at each lambda, e.g. for cross-validation
}
```
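The path construction can be mirrored in a few lines of Python, assuming the glmnet convention (λ_max = max over features of |x_jᵀy| / (nα), then a geometric grid down to λ_max · ratio); this is an illustration, not the crate's code:

```python
import math

def make_lambda_path_py(x_cols, y, alpha=1.0, n_lambda=5, lambda_min_ratio=0.01):
    """Geometric grid from lambda_max (smallest lambda giving an all-zero lasso
    solution) down to lambda_max * lambda_min_ratio, largest first."""
    n = len(y)
    lambda_max = max(
        abs(sum(xi * yi for xi, yi in zip(col, y))) for col in x_cols
    ) / (n * alpha)
    step = math.log(lambda_min_ratio) / (n_lambda - 1)
    return [lambda_max * math.exp(step * i) for i in range(n_lambda)]
```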
WebAssembly Usage
Build with wasm-pack:

```bash
wasm-pack build --release --target web
```
OLS Regression (WASM)
Inputs and results cross the WASM boundary as JSON strings (exported names and result fields are indicative):

```javascript
import init, { ols_regression } from './pkg/linreg_core.js';

await init();

const result = JSON.parse(ols_regression(JSON.stringify(y), JSON.stringify(x), JSON.stringify(names)));
console.log(result.r_squared);
```
Ridge Regression (WASM)
```javascript
// Function and field names are indicative; results come back as JSON strings
const result = JSON.parse(ridge_regression(JSON.stringify(y), JSON.stringify(x), 0.5, true));
console.log(result.coefficients);
console.log(result.standard_errors);
console.log(result.effective_df);
console.log(result.aic);
console.log(result.bic);
```
Lasso Regression (WASM)
```javascript
// Function and field names are indicative
const result = JSON.parse(lasso_regression(JSON.stringify(y), JSON.stringify(x), 0.1));
console.log(result.coefficients); // zeroed coefficients are dropped variables
console.log(result.converged);
console.log(result.aic);
console.log(result.bic);
```
Elastic Net Regression (WASM)
```javascript
// Function and field names are indicative; lambda = 0.1, alpha = 0.5
const result = JSON.parse(elastic_net_regression(JSON.stringify(y), JSON.stringify(x), 0.1, 0.5));
console.log(result.coefficients);
console.log(result.converged);
console.log(result.aic);
console.log(result.bic);
```
Lambda Path Generation (WASM)
```javascript
// Names indicative; returns the lambda sequence as a JSON array
const path = JSON.parse(make_lambda_path(JSON.stringify(x), JSON.stringify(y), 100, 0.0001));
console.log(path.length);
console.log(path[0]); // largest lambda first
```
LOESS Regression (WASM)
```javascript
// Names indicative; span = 0.75, polynomial degree = 2
const result = JSON.parse(loess_regression(JSON.stringify(x), JSON.stringify(y), 0.75, 2));
console.log(result.fitted);
console.log(result.residuals);
```
Diagnostic Tests (WASM)
Inputs below are JSON-encoded arrays (`yJson`, `xJson`, `residualsJson`) and every test returns a JSON string with the statistic and p-value; the snake_case function names mirror the Rust exports but are indicative.

```javascript
// Rainbow test
const rainbow = JSON.parse(rainbow_test(yJson, xJson));
// Harvey-Collier test
const hc = JSON.parse(harvey_collier_test(yJson, xJson));
// Breusch-Pagan test
const bp = JSON.parse(breusch_pagan_test(yJson, xJson));
// White test (method selection: "r", "python", or "both")
const white = JSON.parse(white_test(yJson, xJson, "both"));
// White test - R-specific method
const whiteR = JSON.parse(white_test_r(yJson, xJson));
// White test - Python-specific method
const whitePy = JSON.parse(white_test_python(yJson, xJson));
// Jarque-Bera test
const jb = JSON.parse(jarque_bera_test(residualsJson));
// Durbin-Watson test
const dw = JSON.parse(durbin_watson_test(residualsJson));
// Shapiro-Wilk test
const sw = JSON.parse(shapiro_wilk_test(residualsJson));
// Anderson-Darling test
const ad = JSON.parse(anderson_darling_test(residualsJson));
// Cook's Distance
const cd = JSON.parse(cooks_distance(yJson, xJson));
// DFBETAS (influence on coefficients)
const dfbetas = JSON.parse(dfbetas_test(yJson, xJson));
// DFFITS (influence on fitted values)
const dffits = JSON.parse(dffits_test(yJson, xJson));
// VIF test (multicollinearity)
const vif = JSON.parse(vif_test(xJson));
console.log(vif.values);
// RESET test (functional form)
const reset = JSON.parse(reset_test(yJson, xJson));
// Breusch-Godfrey test (higher-order autocorrelation)
const bg = JSON.parse(breusch_godfrey_test(yJson, xJson, 2)); // lag order
```
Statistical Utilities (WASM)
Function names are indicative (the `calc_` prefix is a placeholder):

```javascript
// Student's t CDF: P(T <= t)
const tCDF = t_cdf(2.0, 10); // t value, degrees of freedom
// Critical t-value for two-tailed test
const tCrit = t_critical_value(0.05, 10);
// Normal inverse CDF (probit)
const zScore = normal_inverse_cdf(0.975);

// Descriptive statistics (all return JSON strings)
const mean = JSON.parse(calc_mean(dataJson));
const variance = JSON.parse(calc_variance(dataJson));
const stddev = JSON.parse(calc_std_dev(dataJson));
const median = JSON.parse(calc_median(dataJson));
const quantile = JSON.parse(calc_quantile(dataJson, 0.25));
const correlation = JSON.parse(calc_correlation(xJson, yJson));
```
CSV Parsing (WASM)
```javascript
// Function and field names indicative
const csv = "x,y\n1,2.1\n2,3.9\n3,6.2";
const parsed = JSON.parse(parse_csv(csv));
console.log(parsed.headers);
console.log(parsed.columns);
```
Helper Functions (WASM)
```javascript
// Function names indicative
const version = get_version(); // e.g., "0.5.0"
const msg = test_message();    // "Rust WASM is working!"
```
Domain Security (WASM)
Optional domain restriction via build-time environment variable:
```bash
LINREG_DOMAIN_RESTRICT=example.com,mysite.com
```
When NOT set (default), all domains are allowed.
Python Usage
Install from PyPI:

```bash
pip install linreg-core
```
Quick Start (Python)
The recommended way to use linreg-core in Python is with plain Python lists or NumPy arrays. Argument order and function names below are indicative; see the package docs for exact signatures.

```python
import linreg_core

# Works with Python lists
x = [[1.0, 2.0, 3.0, 4.0, 5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = linreg_core.ols_regression(y, x, ["x1"])

# Access attributes directly
print(result.r_squared, result.aic)

# Get a formatted summary
print(result.summary())
```

With NumPy arrays:

```python
import numpy as np
import linreg_core

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
result = linreg_core.ols_regression(y, x, ["x1"])
```

Result objects provide:

- Direct attribute access (`result.r_squared`, `result.coefficients`, `result.aic`, `result.bic`, `result.log_likelihood`)
- `summary()` method for formatted output
- `to_dict()` method for JSON serialization
OLS Regression (Python)
```python
import linreg_core

# Argument order and names indicative
x = [[1.0, 2.0, 3.0, 4.0, 5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = linreg_core.ols_regression(y, x, ["x1"])
print(result.summary())
```
Ridge Regression (Python)

```python
result = linreg_core.ridge_regression(y, x, 0.5, True)  # lambda, standardize (signature indicative)
```

Lasso Regression (Python)

```python
result = linreg_core.lasso_regression(y, x, 0.1)  # lambda (signature indicative)
```

Elastic Net Regression (Python)

```python
result = linreg_core.elastic_net_regression(y, x, 0.1, 0.5)  # lambda, alpha (signature indicative)
```

Lambda Path Generation (Python)

```python
lambdas = linreg_core.make_lambda_path(y, x)  # signature indicative
```
Diagnostic Tests (Python)
```python
# Function names below are indicative; see the package docs for exact signatures.

# Breusch-Pagan test (heteroscedasticity)
bp = linreg_core.breusch_pagan_test(y, x)
# Harvey-Collier test (linearity)
hc = linreg_core.harvey_collier_test(y, x)
# Rainbow test (linearity) - supports "r", "python", or "both" methods
rainbow = linreg_core.rainbow_test(y, x, "both")
# White test - choose method: "r", "python", or "both"
white = linreg_core.white_test(y, x, "both")
# Or use specific method functions
white_r = linreg_core.white_test_r(y, x)
white_py = linreg_core.white_test_python(y, x)
# Jarque-Bera test (normality)
jb = linreg_core.jarque_bera_test(residuals)
# Durbin-Watson test (autocorrelation)
dw = linreg_core.durbin_watson_test(residuals)
# Shapiro-Wilk test (normality)
sw = linreg_core.shapiro_wilk_test(residuals)
# Anderson-Darling test (normality)
ad = linreg_core.anderson_darling_test(residuals)
# Cook's Distance (influential observations)
cd = linreg_core.cooks_distance(y, x)
# DFBETAS (influence on each coefficient)
dfb = linreg_core.dfbetas(y, x)
# DFFITS (influence on fitted values)
dff = linreg_core.dffits(y, x)
# RESET test (model specification)
reset = linreg_core.reset_test(y, x)
# Breusch-Godfrey test (higher-order autocorrelation)
bg = linreg_core.breusch_godfrey_test(y, x, 2)  # lag order
```
Statistical Utilities (Python)
```python
# Function names indicative

# Student's t CDF
p = linreg_core.t_cdf(2.0, 10)  # t value, degrees of freedom
# Critical t-value (two-tailed)
t_crit = linreg_core.t_critical_value(0.05, 10)
# Normal inverse CDF (probit)
z = linreg_core.normal_inverse_cdf(0.975)
# Library version
v = linreg_core.version()
```
Descriptive Statistics (Python)
```python
# All return a float directly (no parsing needed); function names indicative
m = linreg_core.mean(data)
v = linreg_core.variance(data)
s = linreg_core.std_dev(data)
med = linreg_core.median(data)
q = linreg_core.quantile(data, 0.25)
r = linreg_core.correlation(x1, x2)

# Works with numpy arrays too
m = linreg_core.mean(np.asarray(data))
```
CSV Parsing (Python)
```python
# Function name indicative
csv_text = "x,y\n1,2.1\n2,3.9\n3,6.2"
parsed = linreg_core.parse_csv(csv_text)
```
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `wasm` | Yes | Enables WASM bindings and browser support |
| `python` | No | Enables Python bindings via PyO3 |
| `validation` | No | Includes test data for validation tests |
For native Rust without WASM overhead:
```toml
[dependencies]
linreg-core = { version = "0.5", default-features = false }
```
For Python bindings (built with maturin):

```bash
maturin develop --features python
```
Validation
Results are validated against R (lmtest, car, skedastic, nortest, glmnet) and Python (statsmodels, scipy, sklearn). See the verification/ directory for test scripts and reference outputs.
Running Tests
```bash
# Unit tests
cargo test

# WASM tests
wasm-pack test --node

# All tests including doctests
cargo test --all-features
```
Implementation Notes
Regularization
The Ridge and Lasso implementations follow the glmnet formulation:
minimize (1/(2n)) * Σ(yᵢ - β₀ - xᵢᵀβ)² + λ * [(1 - α) * ||β||₂² / 2 + α * ||β||₁]
- Ridge (α = 0): Closed-form solution with (X'X + λI)⁻¹X'y
- Lasso (α = 1): Coordinate descent algorithm
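To make the α = 1 case concrete, here is a minimal pure-Python coordinate descent for the objective above (no intercept; an illustration of the technique, not the crate's implementation):

```python
import math

def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0)."""
    return math.copysign(max(abs(z) - g, 0.0), z)

def lasso_cd(x_cols, y, lam, n_iter=100):
    """Minimize (1/(2n)) * sum((y - X b)^2) + lam * ||b||_1 by cyclic coordinate descent."""
    n = len(y)
    beta = [0.0] * len(x_cols)
    for _ in range(n_iter):
        for j, xj in enumerate(x_cols):
            # partial residual: leave out feature j's own contribution
            r = [y[i] - sum(beta[k] * x_cols[k][i] for k in range(len(beta)) if k != j)
                 for i in range(n)]
            # univariate least-squares fit of r on x_j, then soft-threshold
            rho = sum(xj[i] * r[i] for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / (sum(v * v for v in xj) / n)
    return beta
```

With lam = 0 this reduces to OLS on each coordinate; for lam at or above the lambda-path maximum, every coefficient is thresholded to exactly zero, which is the variable-selection behavior described above.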
Numerical Precision
- QR decomposition used throughout for numerical stability
- Anderson-Darling uses Abramowitz & Stegun 7.1.26 for normal CDF (differs from R's Cephes by ~1e-6)
- Shapiro-Wilk implements Royston's 1995 algorithm matching R's implementation
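The Abramowitz & Stegun 7.1.26 approximation mentioned above is compact enough to show in full (pure Python; maximum absolute error about 1.5e-7 in erf, consistent with the ~1e-6 gap vs R's Cephes noted here):

```python
import math

def normal_cdf_as(x):
    """Standard normal CDF via the A&S 7.1.26 rational approximation to erf."""
    # erf(z) ~= 1 - (a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5) e^{-z^2}, t = 1/(1 + p z)
    t = 1.0 / (1.0 + 0.3275911 * abs(x) / math.sqrt(2.0))
    a = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)
    poly = sum(c * t ** (i + 1) for i, c in enumerate(a))
    erf = 1.0 - poly * math.exp(-(x * x) / 2.0)  # erf(|x| / sqrt(2))
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), using symmetry for negative x
    return 0.5 * (1.0 + erf) if x >= 0 else 0.5 * (1.0 - erf)
```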
Known Limitations
- Harvey-Collier test may fail on high-VIF datasets (VIF > 5) due to numerical instability in recursive residuals
- Shapiro-Wilk limited to n <= 5000 (matching R's limitation)
- White test may differ from R on collinear datasets due to numerical precision in near-singular matrices
Disclaimer
This library is under active development and has not reached 1.0 stability. While outputs are validated against R and Python implementations, do not use this library for critical applications (medical, financial, safety-critical systems) without independent verification. See the LICENSE for full terms. The software is provided "as is" without warranty of any kind.
License
Dual-licensed under MIT or Apache-2.0.