linreg-core
A lightweight, self-contained linear regression library written in Rust. It runs as a native Rust crate, compiles to WebAssembly for browser use, and provides Python bindings via PyO3.
Key design principle: All linear algebra and statistical distribution functions are implemented from scratch — no external math libraries required. This keeps binary sizes small and makes the crate highly portable.
Live Demo Link
Table of Contents
| Section | Description |
|---|---|
| Features | Regression methods, model statistics, diagnostic tests |
| Rust Usage | Native Rust crate usage |
| WebAssembly Usage | Browser/JavaScript usage |
| Python Usage | Python bindings via PyO3 |
| Feature Flags | Build configuration options |
| Validation | Testing and verification |
| Implementation Notes | Technical details |
Features
Regression Methods
- OLS Regression: Coefficients, standard errors, t-statistics, p-values, confidence intervals, model selection criteria (AIC, BIC, log-likelihood)
- Ridge Regression: L2-regularized regression with optional standardization, effective degrees of freedom, model selection criteria
- Lasso Regression: L1-regularized regression via coordinate descent with automatic variable selection, convergence tracking, model selection criteria
- Elastic Net: Combined L1 + L2 regularization for variable selection with multicollinearity handling, active set convergence, model selection criteria
- LOESS: Locally estimated scatterplot smoothing for non-parametric curve fitting with configurable span, polynomial degree, and robust fitting
- WLS (Weighted Least Squares): Regression with observation weights for heteroscedastic data, includes confidence intervals
- K-Fold Cross Validation: Model evaluation and hyperparameter tuning for all regression types (OLS, Ridge, Lasso, Elastic Net) with customizable folds, shuffling, and seeding
- Lambda Path Generation: Create regularization paths for cross-validation
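For intuition about what the OLS method computes, here is a minimal, dependency-free Python sketch of one-predictor least squares via the normal equations. It is an independent illustration of the math, not the crate's implementation:

```python
# Minimal one-predictor OLS via the normal equations:
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)
def ols_simple(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Points on the exact line y = 1 + 2x recover the coefficients
b0, b1 = ols_simple([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```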
Model Statistics
- Fit Metrics: R-squared, Adjusted R-squared, F-statistic, F-test p-value
- Error Metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE)
- Model Selection: Log-likelihood, AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion)
- Residuals: Raw residuals, standardized residuals, fitted values, leverage (hat matrix diagonal)
- Multicollinearity: Variance Inflation Factor (VIF) for each predictor
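The model-selection quantities above relate through standard formulas: with Gaussian errors, the log-likelihood follows from the residual sum of squares, then AIC = 2k - 2·lnL and BIC = k·ln(n) - 2·lnL. A short Python sketch of these formulas (independent of the crate; `k` counts estimated parameters, and the exact parameter-counting convention is an assumption here):

```python
import math

# Gaussian log-likelihood and information criteria from a residual vector.
def information_criteria(residuals, k):
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    # Concentrated Gaussian log-likelihood with sigma^2 = RSS / n
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    return log_likelihood, aic, bic

ll, aic, bic = information_criteria([0.5, -0.5, 0.3, -0.3], k=2)
```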
Diagnostic Tests
| Category | Tests |
|---|---|
| Linearity | Rainbow Test, Harvey-Collier Test, RESET Test |
| Heteroscedasticity | Breusch-Pagan (Koenker variant), White Test (R & Python methods) |
| Normality | Jarque-Bera, Shapiro-Wilk (n ≤ 5000), Anderson-Darling |
| Autocorrelation | Durbin-Watson, Breusch-Godfrey (higher-order) |
| Multicollinearity | Variance Inflation Factor (VIF) |
| Influence | Cook's Distance, DFBETAS, DFFITS |
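As an example of what these diagnostics compute, the Durbin-Watson statistic is just a ratio of sums over the residuals: values near 2 suggest no first-order autocorrelation, values toward 0 positive autocorrelation, and values toward 4 negative autocorrelation. An independent Python sketch of the formula:

```python
# Durbin-Watson: sum of squared successive residual differences
# divided by the residual sum of squares.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```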
Rust Usage
Add to your Cargo.toml:
```toml
[dependencies]
linreg-core = { version = "0.6", default-features = false }
```
OLS Regression (Rust)
```rust
use linreg_core::ols_regression;
```
Ridge Regression (Rust)
```rust
use linreg_core::ridge_regression; // function name illustrative
use linreg_core::Matrix;
```
Lasso Regression (Rust)
```rust
use linreg_core::lasso_regression; // function name illustrative
use linreg_core::Matrix;
```
Elastic Net Regression (Rust)
```rust
use linreg_core::elastic_net_regression; // function name illustrative
use linreg_core::Matrix;
```
Diagnostic Tests (Rust)
```rust
use linreg_core::diagnostics::*; // import path illustrative
```
WLS Regression (Rust)
```rust
use linreg_core::wls_regression;
```
LOESS Regression (Rust)
```rust
use linreg_core::loess_fit;
```
Custom LOESS options:
```rust
use linreg_core::{loess_fit, LoessOptions};

// Field names illustrative; options cover span, degree, and robust fitting
let options = LoessOptions { span: 0.5, degree: 2, ..Default::default() };
let result = loess_fit(&x, &y, &options)?;
```
K-Fold Cross Validation (Rust)
Cross-validation is used for model evaluation and hyperparameter tuning. The library supports K-Fold CV for all regression types:
```rust
use linreg_core::cross_validation::*; // import path illustrative
```
CV Result fields:
- `mean_rmse`, `std_rmse`: mean and std of RMSE across folds
- `mean_mae`, `std_mae`: mean and std of MAE across folds
- `mean_r_squared`, `std_r_squared`: mean and std of R² across folds
- `mean_train_r_squared`: mean training R² (for overfitting detection)
- `fold_results`: per-fold metrics (train/test sizes, MSE, RMSE, MAE, R²)
- `fold_coefficients`: coefficients from each fold (for stability analysis)
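The fold construction itself is straightforward; this Python sketch (independent of the crate's API) shows one common way to build the shuffled, seeded K-Fold index sets the options above describe:

```python
import random

# Builds k index folds over n observations; earlier folds absorb the
# remainder when n is not divisible by k.
def kfold_indices(n, k, shuffle=False, seed=None):
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds
```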
Lambda Path Generation (Rust)
```rust
use linreg_core::{make_lambda_path, LambdaPathOptions, Matrix};

// Construction details illustrative
let x = Matrix::new(/* rows, cols, data */);
let y = vec![/* responses */];
let options = LambdaPathOptions::default();
let lambdas = make_lambda_path(&x, &y, &options);
for &lambda in lambdas.iter() {
    // fit a model at each lambda value
}
```
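Lambda paths are conventionally log-spaced from a data-driven maximum down to a small fraction of it (the glmnet convention). This Python sketch illustrates the idea under that assumption; the alpha floor mirrors glmnet's handling of alpha near zero, and the exact constants the crate uses may differ:

```python
import math

# Log-spaced path from lambda_max (the smallest lambda that zeroes all
# coefficients, derived from max_j |x_j . y|) down to ratio * lambda_max.
def lambda_path(max_abs_xty, n, alpha=1.0, n_lambda=100, ratio=1e-4):
    lam_max = max_abs_xty / (n * max(alpha, 1e-3))
    lam_min = ratio * lam_max
    step = (math.log(lam_min) - math.log(lam_max)) / (n_lambda - 1)
    return [math.exp(math.log(lam_max) + i * step) for i in range(n_lambda)]

lams = lambda_path(50.0, 100)
```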
Model Save/Load (Rust)
All trained models can be saved to disk and loaded back later:
```rust
use linreg_core::{ols_regression, load};

// Train a model (arguments illustrative)
let result = ols_regression(&x, &y)?;

// Save to file
result.save("model.json")?;

// Or with a custom name
result.save_with_name("model.json", "my model")?;

// Load back
let loaded = load("model.json")?;
```
The same save() and load() methods work for all model types: RegressionOutput, RidgeFit, LassoFit, ElasticNetFit, WlsFit, and LoessFit.
WebAssembly Usage
Build with wasm-pack, e.g. `wasm-pack build --target web`.
OLS Regression (WASM)
```js
import init, { ols_regression } from './pkg/linreg_core.js';

await init();
const result = JSON.parse(ols_regression(/* JSON-encoded inputs */));
```
Ridge Regression (WASM)
```js
// Function name illustrative; returns a JSON string
const result = JSON.parse(ridge_regression(/* JSON-encoded inputs */));
console.log(result.coefficients);
console.log(result.r_squared);
```
Lasso Regression (WASM)
```js
// Function name illustrative; returns a JSON string
const result = JSON.parse(lasso_regression(/* JSON-encoded inputs */));
console.log(result.coefficients);
```
Elastic Net Regression (WASM)
```js
// Function name illustrative; returns a JSON string
const result = JSON.parse(elastic_net_regression(/* JSON-encoded inputs */));
console.log(result.coefficients);
```
Lambda Path Generation (WASM)
```js
// Function name illustrative; returns a JSON string
const path = JSON.parse(make_lambda_path(/* JSON-encoded inputs */));
console.log(path);
```
WLS Regression (WASM)
```js
const result = JSON.parse(wls_regression(/* JSON-encoded inputs */));
console.log(result.coefficients);
console.log(result.r_squared);
```
LOESS Regression (WASM)
```js
const result = JSON.parse(loess_fit(/* JSON-encoded inputs */));
console.log(result);
```
K-Fold Cross Validation (WASM)
```js
// Function names below are illustrative

// OLS cross-validation
const ols_cv = JSON.parse(cross_validate_ols(/* JSON-encoded inputs */));
console.log(ols_cv.mean_rmse);
console.log(ols_cv.mean_r_squared);

// Ridge cross-validation
const ridge_cv = JSON.parse(cross_validate_ridge(/* ... */));

// Lasso cross-validation
const lasso_cv = JSON.parse(cross_validate_lasso(/* ... */));

// Elastic Net cross-validation
const enet_cv = JSON.parse(cross_validate_elastic_net(/* ... */));

// Access per-fold results
console.log(ols_cv.fold_results[0]);
```
Note: In WASM, boolean and seed parameters are passed as JSON strings. Use "true"/"false" for shuffle and "42" or "null" for seed.
Diagnostic Tests (WASM)
```js
// Function names below are illustrative; all tests return JSON strings

// Rainbow test
const rainbow = JSON.parse(rainbow_test(/* inputs */));

// Harvey-Collier test
const hc = JSON.parse(harvey_collier_test(/* inputs */));

// Breusch-Pagan test
const bp = JSON.parse(breusch_pagan_test(/* inputs */));

// White test (method selection: "r", "python", or "both")
const white = JSON.parse(white_test(/* inputs */, "both"));

// White test - R-specific method
const whiteR = JSON.parse(white_test_r(/* inputs */));

// White test - Python-specific method
const whitePy = JSON.parse(white_test_python(/* inputs */));

// Jarque-Bera test
const jb = JSON.parse(jarque_bera_test(/* inputs */));

// Durbin-Watson test
const dw = JSON.parse(durbin_watson_test(/* inputs */));

// Shapiro-Wilk test
const sw = JSON.parse(shapiro_wilk_test(/* inputs */));

// Anderson-Darling test
const ad = JSON.parse(anderson_darling_test(/* inputs */));

// Cook's Distance
const cd = JSON.parse(cooks_distance(/* inputs */));

// DFBETAS (influence on coefficients)
const dfbetas = JSON.parse(dfbetas_test(/* inputs */));

// DFFITS (influence on fitted values)
const dffits = JSON.parse(dffits_test(/* inputs */));

// VIF test (multicollinearity)
const vif = JSON.parse(vif_test(/* inputs */));
console.log(vif);

// RESET test (functional form)
const reset = JSON.parse(reset_test(/* inputs */));

// Breusch-Godfrey test (higher-order autocorrelation)
const bg = JSON.parse(breusch_godfrey_test(/* inputs */));
```
Statistical Utilities (WASM)
```js
// Function names below are illustrative

// Student's t CDF: P(T <= t)
const tCDF = t_cdf(2.0, 10);

// Critical t-value for two-tailed test
const tCrit = t_critical(0.05, 10);

// Normal inverse CDF (probit)
const zScore = normal_inverse_cdf(0.975);

// Descriptive statistics (all return JSON strings)
const mean = JSON.parse(calc_mean(data));
const variance = JSON.parse(calc_variance(data));
const stddev = JSON.parse(calc_std_dev(data));
const median = JSON.parse(calc_median(data));
const quantile = JSON.parse(calc_quantile(data, 0.25));
const correlation = JSON.parse(calc_correlation(x, y));
```
CSV Parsing (WASM)
```js
// Function and field names illustrative
const csv = "x,y\n1,2\n2,4";
const parsed = JSON.parse(parse_csv(csv));
console.log(parsed.headers);
console.log(parsed.data);
```
Helper Functions (WASM)
```js
// Function names illustrative
const version = get_version(); // e.g., "0.5.0"
const msg = greet();           // "Rust WASM is working!"
```
Model Serialization (WASM)
```js
// Function names below are illustrative

// Train a model
const resultJson = ols_regression(/* JSON-encoded inputs */);
const result = JSON.parse(resultJson);

// Serialize with metadata
const serialized = serialize_model(resultJson);

// Get metadata without loading the full model
const metadataJson = get_model_metadata(serialized);
const metadata = JSON.parse(metadataJson);
console.log(metadata.model_type);
console.log(metadata.created_at);

// Deserialize to get the model data back
const modelJson = deserialize_model(serialized);
const model = JSON.parse(modelJson);

// Download in the browser
const blob = new Blob([serialized], { type: 'application/json' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'model.json';
a.click();
```
Domain Security (WASM)
Optional domain restriction via build-time environment variable:
LINREG_DOMAIN_RESTRICT=example.com,mysite.com
When NOT set (default), all domains are allowed.
Python Usage
Install from PyPI: `pip install linreg-core`
Quick Start (Python)
The recommended way to use linreg-core in Python is with native types (lists or numpy arrays):
```python
import linreg_core

# Works with Python lists (argument layout illustrative)
y = [2.1, 4.2, 5.9, 8.1]
x = [[1.0], [2.0], [3.0], [4.0]]
result = linreg_core.ols_regression(y, x)

# Access attributes directly
print(result.r_squared, result.coefficients)

# Get a formatted summary
print(result.summary())
```
With NumPy arrays:
```python
import numpy as np

y = np.array([2.1, 4.2, 5.9, 8.1])
x = np.array([[1.0], [2.0], [3.0], [4.0]])
result = linreg_core.ols_regression(y, x)
```
Result objects provide:
- Direct attribute access (`result.r_squared`, `result.coefficients`, `result.aic`, `result.bic`, `result.log_likelihood`)
- A `summary()` method for formatted output
- A `to_dict()` method for JSON serialization
OLS Regression (Python)
```python
# Argument layout illustrative
result = linreg_core.ols_regression(y, x)
print(result.coefficients)
print(result.r_squared)
print(result.summary())
```
Ridge Regression (Python)
```python
result = linreg_core.ridge_regression(...)  # function name illustrative
```
Lasso Regression (Python)
```python
result = linreg_core.lasso_regression(...)  # function name illustrative
```
Elastic Net Regression (Python)
```python
result = linreg_core.elastic_net_regression(...)  # function name illustrative
```
LOESS Regression (Python)
```python
result = linreg_core.loess_fit(...)  # function name illustrative
```
Lambda Path Generation (Python)
```python
lambdas = linreg_core.make_lambda_path(...)  # arguments elided
```
Diagnostic Tests (Python)
```python
# Function names below are illustrative; arguments elided

# Breusch-Pagan test (heteroscedasticity)
bp = linreg_core.breusch_pagan(...)

# Harvey-Collier test (linearity)
hc = linreg_core.harvey_collier(...)

# Rainbow test (linearity) - supports "r", "python", or "both" methods
rainbow = linreg_core.rainbow_test(...)

# White test - choose method: "r", "python", or "both"
white = linreg_core.white_test(...)

# Or use specific method functions
white_r = linreg_core.white_test_r(...)
white_py = linreg_core.white_test_python(...)

# Jarque-Bera test (normality)
jb = linreg_core.jarque_bera(...)

# Durbin-Watson test (autocorrelation)
dw = linreg_core.durbin_watson(...)

# Shapiro-Wilk test (normality)
sw = linreg_core.shapiro_wilk(...)

# Anderson-Darling test (normality)
ad = linreg_core.anderson_darling(...)

# Cook's Distance (influential observations)
cd = linreg_core.cooks_distance(...)

# DFBETAS (influence on each coefficient)
dfbetas = linreg_core.dfbetas(...)

# DFFITS (influence on fitted values)
dffits = linreg_core.dffits(...)

# RESET test (model specification)
reset = linreg_core.reset_test(...)

# Breusch-Godfrey test (higher-order autocorrelation)
bg = linreg_core.breusch_godfrey(...)
```
Statistical Utilities (Python)
```python
# Function names below are illustrative; arguments elided

# Student's t CDF
p = linreg_core.t_cdf(...)

# Critical t-value (two-tailed)
t_crit = linreg_core.t_critical(...)

# Normal inverse CDF (probit)
z = linreg_core.normal_inverse_cdf(...)

# Library version
version = linreg_core.version()
```
Descriptive Statistics (Python)
```python
# All return a float directly (no parsing needed); function names illustrative
m = linreg_core.mean(data)
v = linreg_core.variance(data)
s = linreg_core.std_dev(data)
md = linreg_core.median(data)
q = linreg_core.quantile(data, 0.25)
r = linreg_core.correlation(x, y)

# Works with numpy arrays too
m = linreg_core.mean(np.array(data))
```
CSV Parsing (Python)
```python
csv_text = "x,y\n1,2\n2,4"
parsed = linreg_core.parse_csv(csv_text)  # function name illustrative
```
Model Save/Load (Python)
```python
# Train a model (arguments illustrative)
result = linreg_core.ols_regression(y, x)

# Save to file
linreg_core.save_model(result, "model.json")

# Load back
loaded = linreg_core.load_model("model.json")
```
The save_model() and load_model() functions work with all result types: OLSResult, RidgeResult, LassoResult, ElasticNetResult, LoessResult, and WlsResult.
Feature Flags
| Feature | Default | Description |
|---|---|---|
| `wasm` | Yes | Enables WASM bindings and browser support |
| `python` | No | Enables Python bindings via PyO3 |
| `validation` | No | Includes test data for validation tests |
For native Rust without WASM overhead:
```toml
[dependencies]
linreg-core = { version = "0.6", default-features = false }
```
Python bindings are built with maturin (e.g. `maturin develop --features python`).
Validation
Results are validated against R (lmtest, car, skedastic, nortest, glmnet) and Python (statsmodels, scipy, sklearn). See the verification/ directory for test scripts and reference outputs.
Running Tests
```bash
# Unit tests
cargo test

# WASM tests
wasm-pack test --node

# All tests including doctests
cargo test --all-features
```
Implementation Notes
Regularization
The Ridge and Lasso implementations follow the glmnet formulation:
minimize (1/(2n)) * Σ(yᵢ - β₀ - xᵢᵀβ)² + λ * [(1 - α) * ||β||₂² / 2 + α * ||β||₁]
- Ridge (α = 0): Closed-form solution with (X'X + λI)⁻¹X'y
- Lasso (α = 1): Coordinate descent algorithm
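The objective above can be sanity-checked numerically. This short Python sketch (independent of the crate) evaluates the glmnet-style elastic net objective for given residuals and coefficients, with the intercept excluded from the penalty as in the formula:

```python
# (1/(2n)) * RSS + lambda * [(1 - alpha) * ||beta||_2^2 / 2 + alpha * ||beta||_1]
def elastic_net_objective(residuals, beta, lam, alpha):
    n = len(residuals)
    loss = sum(e * e for e in residuals) / (2.0 * n)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return loss + lam * ((1.0 - alpha) * l2 / 2.0 + alpha * l1)
```

With `alpha = 0` the penalty reduces to the ridge term, and with `alpha = 1` to the lasso term.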
Numerical Precision
- QR decomposition used throughout for numerical stability
- Anderson-Darling uses Abramowitz & Stegun 7.1.26 for normal CDF (differs from R's Cephes by ~1e-6)
- Shapiro-Wilk implements Royston's 1995 algorithm matching R's implementation
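The Abramowitz & Stegun 7.1.26 approximation mentioned above can be reproduced in a few lines of Python; `erf_as` and `normal_cdf` here are an independent sketch of the approach, not the crate's internals:

```python
import math

# Abramowitz & Stegun 7.1.26: polynomial approximation to erf for x >= 0,
# absolute error around 1.5e-7.
def erf_as(x):
    p = 0.3275911
    a = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)
    t = 1.0 / (1.0 + p * x)
    poly = sum(coef * t ** (i + 1) for i, coef in enumerate(a))
    return 1.0 - poly * math.exp(-x * x)

# Standard normal CDF via Phi(x) = (1 + erf(x / sqrt(2))) / 2,
# using symmetry for negative x.
def normal_cdf(x):
    z = abs(x) / math.sqrt(2.0)
    c = 0.5 * (1.0 + erf_as(z))
    return c if x >= 0 else 1.0 - c
```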
Known Limitations
- Harvey-Collier test may fail on high-VIF datasets (VIF > 5) due to numerical instability in recursive residuals
- Shapiro-Wilk limited to n <= 5000 (matching R's limitation)
- White test may differ from R on collinear datasets due to numerical precision in near-singular matrices
Disclaimer
This library is under active development and has not reached 1.0 stability. While outputs are validated against R and Python implementations, do not use this library for critical applications (medical, financial, safety-critical systems) without independent verification. See the LICENSE for full terms. The software is provided "as is" without warranty of any kind.
Benchmarks
Benchmark results were collected on Windows with `cargo bench --no-default-features`. Times are median values.
Core Regression Benchmarks
| Benchmark | Size (n × p) | Time | Throughput |
|---|---|---|---|
| OLS Regression | 10 × 2 | 12.46 µs | 802.71 Kelem/s |
| OLS Regression | 50 × 3 | 53.72 µs | 930.69 Kelem/s |
| OLS Regression | 100 × 5 | 211.09 µs | 473.73 Kelem/s |
| OLS Regression | 500 × 10 | 7.46 ms | 67.04 Kelem/s |
| OLS Regression | 1000 × 20 | 47.81 ms | 20.91 Kelem/s |
| OLS Regression | 5000 × 50 | 2.86 s | 1.75 Kelem/s |
| Ridge Regression | 50 × 3 | 9.61 µs | 5.20 Melem/s |
| Ridge Regression | 100 × 5 | 70.41 µs | 1.42 Melem/s |
| Ridge Regression | 500 × 10 | 842.37 µs | 593.56 Kelem/s |
| Ridge Regression | 1000 × 20 | 1.38 ms | 724.71 Kelem/s |
| Ridge Regression | 5000 × 50 | 10.25 ms | 487.78 Kelem/s |
| Lasso Regression | 50 × 3 | 258.82 µs | 193.18 Kelem/s |
| Lasso Regression | 100 × 5 | 247.89 µs | 403.41 Kelem/s |
| Lasso Regression | 500 × 10 | 3.58 ms | 139.86 Kelem/s |
| Lasso Regression | 1000 × 20 | 1.54 ms | 651.28 Kelem/s |
| Lasso Regression | 5000 × 50 | 12.52 ms | 399.50 Kelem/s |
| Elastic Net Regression | 50 × 3 | 46.15 µs | 1.08 Melem/s |
| Elastic Net Regression | 100 × 5 | 358.07 µs | 279.27 Kelem/s |
| Elastic Net Regression | 500 × 10 | 1.61 ms | 310.18 Kelem/s |
| Elastic Net Regression | 1000 × 20 | 1.60 ms | 623.66 Kelem/s |
| Elastic Net Regression | 5000 × 50 | 12.57 ms | 397.77 Kelem/s |
| WLS Regression | 50 × 3 | 32.92 µs | 1.52 Melem/s |
| WLS Regression | 100 × 5 | 155.30 µs | 643.93 Kelem/s |
| WLS Regression | 500 × 10 | 6.63 ms | 75.37 Kelem/s |
| WLS Regression | 1000 × 20 | 42.68 ms | 23.43 Kelem/s |
| WLS Regression | 5000 × 50 | 2.64 s | 1.89 Kelem/s |
| LOESS Fit | 50 × 1 | 132.83 µs | 376.42 Kelem/s |
| LOESS Fit | 100 × 1 | 1.16 ms | 86.00 Kelem/s |
| LOESS Fit | 500 × 1 | 28.42 ms | 17.59 Kelem/s |
| LOESS Fit | 1000 × 1 | 113.00 ms | 8.85 Kelem/s |
| LOESS Fit | 100 × 2 | 7.10 ms | 14.09 Kelem/s |
| LOESS Fit | 500 × 2 | 1.05 s | 476.19 elem/s |
Lambda Path & Elastic Net Path Benchmarks
| Benchmark | Size (n × p) | Time | Throughput |
|---|---|---|---|
| Elastic Net Path | 100 × 5 | 198.60 ms | 503.52 elem/s |
| Elastic Net Path | 500 × 10 | 69.46 ms | 7.20 Kelem/s |
| Elastic Net Path | 1000 × 20 | 39.08 ms | 25.59 Kelem/s |
| Make Lambda Path | 100 × 5 | 1.09 µs | 91.58 Melem/s |
| Make Lambda Path | 500 × 10 | 8.10 µs | 61.70 Melem/s |
| Make Lambda Path | 1000 × 20 | 29.96 µs | 33.37 Melem/s |
| Make Lambda Path | 5000 × 50 | 424.18 µs | 11.79 Melem/s |
Diagnostic Test Benchmarks
| Benchmark | Size (n × p) | Time |
|---|---|---|
| Rainbow Test | 50 × 3 | 40.34 µs |
| Rainbow Test | 100 × 5 | 187.94 µs |
| Rainbow Test | 500 × 10 | 8.63 ms |
| Rainbow Test | 1000 × 20 | 60.09 ms |
| Rainbow Test | 5000 × 50 | 3.45 s |
| Harvey-Collier Test | 50 × 1 | 15.26 µs |
| Harvey-Collier Test | 100 × 1 | 30.32 µs |
| Harvey-Collier Test | 500 × 1 | 138.44 µs |
| Harvey-Collier Test | 1000 × 1 | 298.33 µs |
| Breusch-Pagan Test | 50 × 3 | 58.07 µs |
| Breusch-Pagan Test | 100 × 5 | 296.74 µs |
| Breusch-Pagan Test | 500 × 10 | 13.79 ms |
| Breusch-Pagan Test | 1000 × 20 | 96.49 ms |
| Breusch-Pagan Test | 5000 × 50 | 5.56 s |
| White Test | 50 × 3 | 14.31 µs |
| White Test | 100 × 5 | 44.25 µs |
| White Test | 500 × 10 | 669.40 µs |
| White Test | 1000 × 20 | 4.89 ms |
| Jarque-Bera Test | 50 × 3 | 30.13 µs |
| Jarque-Bera Test | 100 × 5 | 149.29 µs |
| Jarque-Bera Test | 500 × 10 | 6.64 ms |
| Jarque-Bera Test | 1000 × 20 | 47.89 ms |
| Jarque-Bera Test | 5000 × 50 | 2.75 s |
| Durbin-Watson Test | 50 × 3 | 31.80 µs |
| Durbin-Watson Test | 100 × 5 | 152.56 µs |
| Durbin-Watson Test | 500 × 10 | 6.87 ms |
| Durbin-Watson Test | 1000 × 20 | 48.65 ms |
| Durbin-Watson Test | 5000 × 50 | 2.76 s |
| Breusch-Godfrey Test | 50 × 3 | 71.73 µs |
| Breusch-Godfrey Test | 100 × 5 | 348.94 µs |
| Breusch-Godfrey Test | 500 × 10 | 14.77 ms |
| Breusch-Godfrey Test | 1000 × 20 | 100.08 ms |
| Breusch-Godfrey Test | 5000 × 50 | 5.64 s |
| Shapiro-Wilk Test | 10 × 2 | 2.04 µs |
| Shapiro-Wilk Test | 50 × 3 | 4.87 µs |
| Shapiro-Wilk Test | 100 × 5 | 10.67 µs |
| Shapiro-Wilk Test | 500 × 10 | 110.02 µs |
| Shapiro-Wilk Test | 1000 × 20 | 635.13 µs |
| Shapiro-Wilk Test | 5000 × 50 | 17.53 ms |
| Anderson-Darling Test | 50 × 3 | 34.02 µs |
| Anderson-Darling Test | 100 × 5 | 162.28 µs |
| Anderson-Darling Test | 500 × 10 | 6.95 ms |
| Anderson-Darling Test | 1000 × 20 | 48.15 ms |
| Anderson-Darling Test | 5000 × 50 | 2.78 s |
| Cook's Distance Test | 50 × 3 | 64.52 µs |
| Cook's Distance Test | 100 × 5 | 297.69 µs |
| Cook's Distance Test | 500 × 10 | 12.73 ms |
| Cook's Distance Test | 1000 × 20 | 94.02 ms |
| Cook's Distance Test | 5000 × 50 | 5.31 s |
| DFBETAS Test | 50 × 3 | 46.34 µs |
| DFBETAS Test | 100 × 5 | 185.52 µs |
| DFBETAS Test | 500 × 10 | 7.04 ms |
| DFBETAS Test | 1000 × 20 | 49.68 ms |
| DFFITS Test | 50 × 3 | 33.56 µs |
| DFFITS Test | 100 × 5 | 157.62 µs |
| DFFITS Test | 500 × 10 | 6.82 ms |
| DFFITS Test | 1000 × 20 | 48.35 ms |
| VIF Test | 50 × 3 | 5.36 µs |
| VIF Test | 100 × 5 | 12.68 µs |
| VIF Test | 500 × 10 | 128.04 µs |
| VIF Test | 1000 × 20 | 807.30 µs |
| VIF Test | 5000 × 50 | 26.33 ms |
| RESET Test | 50 × 3 | 77.85 µs |
| RESET Test | 100 × 5 | 359.12 µs |
| RESET Test | 500 × 10 | 14.40 ms |
| RESET Test | 1000 × 20 | 100.52 ms |
| RESET Test | 5000 × 50 | 5.67 s |
| Full Diagnostics | 100 × 5 | 2.75 ms |
| Full Diagnostics | 500 × 10 | 104.01 ms |
| Full Diagnostics | 1000 × 20 | 740.52 ms |
Linear Algebra Benchmarks
| Benchmark | Size | Time |
|---|---|---|
| Matrix Transpose | 10 × 10 | 209.50 ns |
| Matrix Transpose | 50 × 50 | 3.67 µs |
| Matrix Transpose | 100 × 100 | 14.92 µs |
| Matrix Transpose | 500 × 500 | 924.23 µs |
| Matrix Transpose | 1000 × 1000 | 5.56 ms |
| Matrix Multiply (matmul) | 10 × 10 × 10 | 1.54 µs |
| Matrix Multiply (matmul) | 50 × 50 × 50 | 144.15 µs |
| Matrix Multiply (matmul) | 100 × 100 × 100 | 1.39 ms |
| Matrix Multiply (matmul) | 200 × 200 × 200 | 11.90 ms |
| Matrix Multiply (matmul) | 1000 × 100 × 100 | 13.94 ms |
| QR Decomposition | 10 × 5 | 1.41 µs |
| QR Decomposition | 50 × 10 | 14.81 µs |
| QR Decomposition | 100 × 20 | 57.61 µs |
| QR Decomposition | 500 × 50 | 2.19 ms |
| QR Decomposition | 1000 × 100 | 19.20 ms |
| QR Decomposition | 5000 × 100 | 1.48 s |
| QR Decomposition | 10000 × 100 | 8.09 s |
| QR Decomposition | 1000 × 500 | 84.48 ms |
| SVD | 10 × 5 | 150.36 µs |
| SVD | 50 × 10 | 505.41 µs |
| SVD | 100 × 20 | 2.80 ms |
| SVD | 500 × 50 | 60.00 ms |
| SVD | 1000 × 100 | 513.35 ms |
| Matrix Invert | 5 × 5 | 877.32 ns |
| Matrix Invert | 10 × 10 | 2.48 µs |
| Matrix Invert | 20 × 20 | 5.46 µs |
| Matrix Invert | 50 × 50 | 31.94 µs |
| Matrix Invert | 100 × 100 | 141.38 µs |
| Matrix Invert | 200 × 200 | 647.03 µs |
Pressure Benchmarks (Large Datasets)
| Benchmark | Size (n) | Time |
|---|---|---|
| Pressure (OLS + all diagnostics) | 10000 | 11.28 s |
License
Dual-licensed under MIT or Apache-2.0.