# anofox-regression
A robust statistics library for regression analysis in Rust.
This library provides sklearn-style regression estimators with full statistical inference support including standard errors, t-statistics, p-values, confidence intervals, and prediction intervals.
## Features

- **Multiple Regression Methods**
  - Ordinary Least Squares (OLS)
  - Weighted Least Squares (WLS)
  - Ridge Regression (L2 regularization)
  - Elastic Net (L1 + L2 regularization)
  - Recursive Least Squares (RLS) with online learning
  - Bounded Least Squares (BLS/NNLS) with box constraints
  - Dynamic Linear Model (LmDynamic) with time-varying coefficients
  - Tweedie GLM (Gaussian, Poisson, Gamma, Inverse-Gaussian, Compound Poisson-Gamma)
  - Poisson GLM (Log, Identity, Sqrt links) with offset support
  - Negative Binomial GLM (overdispersed count data with theta estimation)
  - Binomial GLM (Logistic, Probit, Complementary log-log)
  - Augmented Linear Model (ALM) with 24+ distributions (Normal, Laplace, Student-t, Gamma, Beta, etc.)
- **Smoothing & Classification**
  - LOWESS (Locally Weighted Scatterplot Smoothing)
  - AID (Automatic Identification of Demand) classifier for demand pattern classification
- **Loss Functions**
  - MAE, MSE, RMSE, MAPE, sMAPE, MASE, pinball loss (see the sketch after this list)
- **Statistical Inference**
  - Coefficient standard errors, t-statistics, and p-values
  - Confidence intervals for coefficients
  - Prediction intervals (R-style `predict(..., interval="prediction")`)
  - Confidence intervals for mean response
- **Model Diagnostics**
  - R², Adjusted R², RMSE
  - F-statistic and p-value
  - AIC, AICc, BIC, Log-likelihood
  - Residual analysis (standardized, studentized)
  - GLM residuals (Pearson, deviance, working)
  - Leverage and influence measures (Cook's distance, DFFITS)
  - Variance Inflation Factor (VIF) for multicollinearity detection
- **Robust Handling**
  - Automatic detection of collinear/constant columns
  - Rank-deficient matrix handling
  - Edge cases (extreme weights, near-singular matrices)
  - R-compatible NA handling (na.omit, na.exclude, na.fail, na.pass)
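Most of the listed loss metrics are standard; for reference, the pinball (quantile) loss at level τ penalizes under- and over-prediction asymmetrically. The snippet below is a minimal standalone sketch of that definition and is not tied to the library's API:

```rust
/// Pinball (quantile) loss at level `tau` in (0, 1), averaged over observations:
/// tau * (y - yhat) when y >= yhat, (1 - tau) * (yhat - y) otherwise.
fn pinball_loss(y: &[f64], yhat: &[f64], tau: f64) -> f64 {
    y.iter()
        .zip(yhat)
        .map(|(&yi, &fi)| {
            let diff = yi - fi;
            if diff >= 0.0 { tau * diff } else { (tau - 1.0) * diff }
        })
        .sum::<f64>()
        / y.len() as f64
}
```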
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
anofox-regression = "0.3"
```
## Quick Start

### Basic OLS Regression

```rust
use anofox_regression::prelude::*; // import path assumed; see the crate docs for exact exports
use nalgebra::{DMatrix, DVector};

// Create sample data (100 observations, 2 features; values are illustrative)
let x = DMatrix::from_fn(100, 2, |i, j| ((i + 1) as f64).powi(j as i32 + 1));
let y = DVector::from_fn(100, |i, _| 3.0 + 2.0 * (i + 1) as f64);

// Fit OLS model (builder name assumed)
let model = OlsRegressor::builder()
    .with_intercept(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// Access results
println!("coefficients: {:?}", fitted.result().coefficients);
println!("r_squared:    {:?}", fitted.result().r_squared);
println!("std_errors:   {:?}", fitted.result().std_errors);

// Make predictions
let x_new = DMatrix::from_fn(10, 2, |i, j| ((i + 1) as f64).powi(j as i32 + 1));
let predictions = fitted.predict(&x_new);
```
### Prediction Intervals

```rust
use anofox_regression::prelude::*;

let fitted = OlsRegressor::builder() // builder name assumed
    .with_intercept(true)
    .build()
    .fit(&x, &y)
    .unwrap();

// 95% prediction intervals for new observations
// (argument order and result field names are assumptions)
let result = fitted.predict_with_interval(&x_new, IntervalType::Prediction, 0.95);
println!("fit:   {:?}", result.fit);
println!("lower: {:?}", result.lower);
println!("upper: {:?}", result.upper);

// 95% confidence intervals for mean response
let result = fitted.predict_with_interval(&x_new, IntervalType::Confidence, 0.95);
```
### Weighted Least Squares

```rust
use anofox_regression::prelude::*;

// Observation weights (illustrative: down-weight later observations)
let weights = DVector::from_fn(100, |i, _| 1.0 / (1.0 + i as f64));

let model = WlsRegressor::builder() // builder name assumed
    .with_intercept(true)
    .weights(&weights)
    .build();
let fitted = model.fit(&x, &y).unwrap();
```
### Ridge Regression

```rust
use anofox_regression::prelude::*;

let model = RidgeRegressor::builder() // builder name assumed
    .with_intercept(true)
    .lambda(1.0) // illustrative penalty strength
    .compute_inference(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// Access inference statistics
let result = fitted.result();
if let Some(se) = &result.std_errors {
    println!("standard errors: {:?}", se);
}
```
### Elastic Net

```rust
use anofox_regression::prelude::*;

let model = ElasticNetRegressor::builder() // builder name assumed
    .with_intercept(true)
    .lambda(0.5)     // illustrative penalty strength
    .alpha(0.5)      // 0 = Ridge, 1 = Lasso
    .max_iterations(1000)
    .tolerance(1e-6)
    .build();
let fitted = model.fit(&x, &y).unwrap();
println!("coefficients: {:?}", fitted.result().coefficients);
```
### Recursive Least Squares (Online Learning)

```rust
use anofox_regression::prelude::*;

let model = RlsRegressor::builder() // builder name assumed
    .with_intercept(true)
    .forgetting_factor(0.99) // recent data weighted more
    .build();
let mut fitted = model.fit(&x, &y).unwrap();

// Online update with a new observation
let x_new = DVector::from_fn(2, |i, _| (i + 1) as f64);
let y_new = 5.0;
let prediction = fitted.update(&x_new, y_new);
```
### Bounded Least Squares (NNLS)

```rust
use anofox_regression::prelude::*;

// Non-negative least squares (all coefficients >= 0)
let model = BlsRegressor::nnls().build(); // type name assumed
let fitted = model.fit(&x, &y).unwrap();

// Custom box constraints: lower <= coefficients <= upper
let model = BlsRegressor::builder()
    .lower_bounds(vec![0.0, -1.0]) // illustrative bounds
    .upper_bounds(vec![10.0, 1.0])
    .build();
let fitted = model.fit(&x, &y).unwrap();
```
### Tweedie GLM

```rust
use anofox_regression::prelude::*;

// Gamma regression with log link (insurance claims, positive continuous data)
let model = TweedieRegressor::gamma() // type name assumed
    .with_intercept(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();
println!("coefficients: {:?}", fitted.result().coefficients);
println!("deviance:     {:?}", fitted.result().deviance);

// Poisson regression (count data)
let model = TweedieRegressor::poisson()
    .with_intercept(true)
    .build();

// Compound Poisson-Gamma (zero-inflated continuous data)
let model = TweedieRegressor::builder()
    .var_power(1.5)  // between 1 (Poisson) and 2 (Gamma)
    .link_power(0.0) // log link
    .with_intercept(true)
    .build();
```
### Poisson GLM (Count Data)

```rust
use anofox_regression::prelude::*;

// Poisson regression with log link (count data)
let model = PoissonRegressor::log() // type name assumed
    .with_intercept(true)
    .compute_inference(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();
println!("coefficients: {:?}", fitted.result().coefficients);
println!("deviance:     {:?}", fitted.result().deviance);

// Predict counts
let counts = fitted.predict_count(&x_new);

// Poisson with identity link
let model = PoissonRegressor::identity()
    .with_intercept(true)
    .build();

// Poisson with sqrt link
let model = PoissonRegressor::sqrt()
    .with_intercept(true)
    .build();

// Rate modeling with offset (for exposure)
// y_i ~ Poisson(exposure_i * rate), log(E[y]) = log(exposure) + Xβ
let exposures = DVector::from_fn(100, |i, _| (i + 1) as f64);
let offset = DVector::from_fn(100, |i, _| ((i + 1) as f64).ln());
let model = PoissonRegressor::log()
    .with_intercept(true)
    .offset(&offset)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// Predict with a new offset (exposure = 2 for all new observations)
let x_new = DMatrix::from_fn(10, 2, |i, j| ((i + 1) as f64).powi(j as i32 + 1));
let new_offset = DVector::from_fn(10, |_, _| 2.0_f64.ln());
let rates = fitted.predict_with_offset(&x_new, &new_offset);
```
### Negative Binomial GLM (Overdispersed Count Data)

```rust
use anofox_regression::prelude::*;

// Negative binomial with automatic theta estimation (like MASS::glm.nb)
let model = NegativeBinomialRegressor::builder() // type name assumed
    .with_intercept(true)
    .estimate_theta(true) // estimate dispersion parameter
    .build();
let fitted = model.fit(&x, &y).unwrap();
println!("coefficients: {:?}", fitted.result().coefficients);
println!("theta:        {:?}", fitted.result().theta);
println!("deviance:     {:?}", fitted.result().deviance);

// Negative binomial with fixed theta
let model = NegativeBinomialRegressor::with_theta(2.0) // illustrative theta
    .with_intercept(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// Rate modeling with offset
let exposures = DVector::from_fn(100, |i, _| (i + 1) as f64);
let offset = DVector::from_fn(100, |i, _| ((i + 1) as f64).ln());
let model = NegativeBinomialRegressor::builder()
    .with_intercept(true)
    .offset(&offset)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// GLM residuals
let pearson = fitted.pearson_residuals();
let deviance = fitted.deviance_residuals();
```
### Binomial GLM (Logistic Regression)

```rust
use anofox_regression::prelude::*;

// Logistic regression (binary classification)
let model = BinomialRegressor::logistic() // type name assumed
    .with_intercept(true)
    .compute_inference(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// Predict probabilities
let probs = fitted.predict_probability(&x_new);

// Predict with standard errors and confidence intervals
// (result field names are assumptions)
let pred = fitted.predict_with_se(&x_new);
println!("fit:   {:?}", pred.fit);
println!("se:    {:?}", pred.se);
println!("lower: {:?}, upper: {:?}", pred.lower, pred.upper);

// Probit regression
let model = BinomialRegressor::probit()
    .with_intercept(true)
    .build();

// Complementary log-log regression
let model = BinomialRegressor::cloglog()
    .with_intercept(true)
    .build();

// GLM residuals
let pearson = fitted.pearson_residuals();
let deviance = fitted.deviance_residuals();
let working = fitted.working_residuals();
```
### Augmented Linear Model (ALM)
The ALM supports 24+ distributions for maximum likelihood regression, based on the greybox R package.
```rust
use anofox_regression::prelude::*;

// Laplace (LAD) regression - robust to outliers
// (builder and enum variant names are assumed)
let model = AlmRegressor::builder()
    .distribution(Distribution::Laplace)
    .with_intercept(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();
println!("coefficients:   {:?}", fitted.result().coefficients);
println!("log_likelihood: {:?}", fitted.result().log_likelihood);

// Student-t regression - heavy-tailed errors
let model = AlmRegressor::builder()
    .distribution(Distribution::StudentT)
    .with_intercept(true)
    .build();
let fitted = model.fit(&x, &y).unwrap();

// Log-Normal for positive skewed data
let model = AlmRegressor::builder()
    .distribution(Distribution::LogNormal)
    .with_intercept(true)
    .build();

// Beta regression for proportions in (0, 1)
let model = AlmRegressor::builder()
    .distribution(Distribution::Beta)
    .with_intercept(true)
    .build();

// Gamma for positive continuous data
let model = AlmRegressor::builder()
    .distribution(Distribution::Gamma)
    .with_intercept(true)
    .build();

// Custom link function
let model = AlmRegressor::builder()
    .distribution(Distribution::Gamma)
    .link(LinkFunction::Log)
    .with_intercept(true)
    .build();
```
### NA Handling

```rust
use anofox_regression::na_handling::*; // module path, function signature, and enum names assumed

// Process data with NA values (represented as f64::NAN)
// na.omit: remove rows containing NA
let result = process(&x, &y, NaAction::Omit).unwrap();
println!("{:?}", result.na_info);

// na.exclude: remove NA but expand output back to original length
let result = process(&x, &y, NaAction::Exclude).unwrap();
let residuals_expanded = result.na_info.expand(&residuals);

// na.fail: error if any NA is present
let result = process(&x, &y, NaAction::Fail);
assert!(result.is_err());

// na.pass: keep NA values (solver must handle them)
let result = process(&x, &y, NaAction::Pass).unwrap();
```
### Model Diagnostics

```rust
use anofox_regression::prelude::*;

let fitted = OlsRegressor::builder() // builder name assumed
    .with_intercept(true)
    .build()
    .fit(&x, &y)
    .unwrap();
let result = fitted.result();

// Goodness of fit
println!("R²:          {:?}", result.r_squared);
println!("Adjusted R²: {:?}", result.adj_r_squared);
println!("RMSE:        {:?}", result.rmse);

// F-test
println!("F-statistic: {:?}", result.f_statistic);
println!("p-value:     {:?}", result.f_pvalue);

// Information criteria
println!("AIC: {:?}", result.aic);
println!("BIC: {:?}", result.bic);

// Residual diagnostics (argument lists are assumptions)
let std_resid = standardized_residuals(&result);
let leverage = compute_leverage(&x);
let cooks_d = cooks_distance(&result, &leverage);

// Detect influential points
let influential = influential_cooks(&cooks_d);

// Variance Inflation Factor for multicollinearity
let vif = variance_inflation_factor(&x);
```
## API Reference

### Regression Result Fields

| Field | Description |
|---|---|
| `coefficients` | Estimated regression coefficients |
| `intercept` | Intercept term (if fitted) |
| `std_errors` | Standard errors of coefficients |
| `t_statistics` | t-statistics for coefficients |
| `p_values` | Two-tailed p-values |
| `conf_interval_lower`/`upper` | Confidence intervals for coefficients |
| `r_squared` | Coefficient of determination |
| `adj_r_squared` | Adjusted R² |
| `mse` | Mean squared error |
| `rmse` | Root mean squared error |
| `f_statistic` | F-statistic for overall model |
| `f_pvalue` | p-value for F-test |
| `aic` | Akaike Information Criterion |
| `aicc` | Corrected AIC |
| `bic` | Bayesian Information Criterion |
| `log_likelihood` | Log-likelihood |
| `residuals` | Model residuals |
| `fitted_values` | Predicted values on training data |
### Tweedie GLM Result Fields

| Field | Description |
|---|---|
| `deviance` | Total deviance of fitted model |
| `null_deviance` | Deviance of intercept-only model |
| `dispersion` | Estimated dispersion parameter |
| `iterations` | Number of IRLS iterations |
### Tweedie Family (`var_power`)
| var_power | Distribution | Use Case |
|---|---|---|
| 0 | Gaussian | Standard linear regression |
| 1 | Poisson | Count data |
| 1-2 | Compound Poisson-Gamma | Zero-inflated continuous (insurance, rainfall) |
| 2 | Gamma | Positive continuous |
| 3 | Inverse-Gaussian | Positive, right-skewed |
### Poisson GLM Result Fields

| Field | Description |
|---|---|
| `deviance` | Total deviance of fitted model |
| `null_deviance` | Deviance of intercept-only model |
| `dispersion` | Estimated dispersion parameter |
| `iterations` | Number of IRLS iterations |
### Poisson Link Functions
| Link | Function | Inverse | Use Case |
|---|---|---|---|
| Log | ln(μ) | exp(η) | Canonical, most common |
| Identity | μ | η | Linear relationships |
| Sqrt | √μ | η² | Alternative for count data |
### Negative Binomial GLM Result Fields

| Field | Description |
|---|---|
| `deviance` | Total deviance of fitted model |
| `null_deviance` | Deviance of intercept-only model |
| `theta` | Estimated or fixed dispersion parameter |
| `dispersion` | Estimated dispersion parameter |
| `iterations` | Number of iterations |
### Negative Binomial Parameters

| Parameter | Description |
|---|---|
| `theta` | Dispersion parameter (size). Larger = less overdispersion. |
| `estimate_theta` | If true, estimate theta via alternating ML. Default: true. |
**When to use Negative Binomial vs Poisson** (a quick overdispersion check is sketched after this list):
- Use Poisson when Var(Y) ≈ E[Y]
- Use Negative Binomial when Var(Y) > E[Y] (overdispersion)
- NB variance: V(μ) = μ + μ²/θ (approaches Poisson as θ → ∞)
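As a rough empirical check before choosing a family, compare the sample mean and variance of the counts. The snippet below is a minimal standalone sketch of that check and does not depend on the library's API:

```rust
/// Returns true if the counts look overdispersed (sample variance clearly
/// exceeds the sample mean), suggesting Negative Binomial over Poisson.
fn looks_overdispersed(y: &[f64]) -> bool {
    let n = y.len() as f64;
    let mean = y.iter().sum::<f64>() / n;
    // unbiased sample variance
    let var = y.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    var > 1.5 * mean // 1.5 is an arbitrary illustrative threshold
}
```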
### Binomial GLM Result Fields

| Field | Description |
|---|---|
| `deviance` | Total deviance of fitted model |
| `null_deviance` | Deviance of intercept-only model |
| `iterations` | Number of IRLS iterations |
### Binomial Link Functions
| Link | Function | Inverse | Use Case |
|---|---|---|---|
| Logit | log(p/(1-p)) | 1/(1+exp(-η)) | Standard logistic regression |
| Probit | Φ⁻¹(p) | Φ(η) | Dose-response, bioassay |
| Cloglog | log(-log(1-p)) | 1-exp(-exp(η)) | Asymmetric, extreme events |
### GLM Residual Types
| Type | Formula | Use Case |
|---|---|---|
| Pearson | (y - μ) / √V(μ) | Outlier detection, overdispersion |
| Deviance | sign(y - μ) × √d_i | Model fit assessment |
| Working | (y - μ) × (dη/dμ) | IRLS algorithm diagnostics |
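For a Poisson model, where the variance function is V(μ) = μ and the unit deviance is d_i = 2[y·ln(y/μ) − (y − μ)], the Pearson and deviance residuals in the table can be computed by hand as in this standalone sketch (not tied to the library's accessors):

```rust
/// Pearson and deviance residuals for a single Poisson observation,
/// following the formulas in the table above (V(mu) = mu for Poisson).
fn poisson_residuals(y: f64, mu: f64) -> (f64, f64) {
    // Pearson: (y - mu) / sqrt(V(mu))
    let pearson = (y - mu) / mu.sqrt();
    // Unit deviance: 2 * [y * ln(y / mu) - (y - mu)], with y * ln(y / mu) -> 0 as y -> 0
    let ylog = if y > 0.0 { y * (y / mu).ln() } else { 0.0 };
    let unit_dev = 2.0 * (ylog - (y - mu));
    // Deviance residual: sign(y - mu) * sqrt(d_i)
    let deviance = (y - mu).signum() * unit_dev.max(0.0).sqrt();
    (pearson, deviance)
}
```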
### ALM (Augmented Linear Model) Result Fields

| Field | Description |
|---|---|
| `log_likelihood` | Maximized log-likelihood |
| `scale` | Estimated scale parameter |
| `iterations` | Number of IRLS iterations |
### ALM Distributions
| Category | Distributions |
|---|---|
| Continuous | Normal, Laplace, Student-t, Logistic, Asymmetric Laplace, Generalised Normal, S |
| Log-transformed | Log-Normal, Log-Laplace, Log-S, Log-Generalised Normal |
| Positive | Gamma, Inverse Gaussian, Exponential, Folded Normal |
| Box-Cox | Box-Cox Normal |
| Proportions | Beta, Logit-Normal |
| Count data | Poisson, Negative Binomial, Binomial, Geometric |
| Ordinal | Cumulative Logistic, Cumulative Normal |
### ALM Link Functions

| Link | Function | Inverse | Typical Use |
|---|---|---|---|
| Identity | μ = η | η | Normal, Laplace, Student-t |
| Log | μ = exp(η) | ln(μ) | Gamma, Poisson, Log-Normal |
| Logit | μ = 1/(1+exp(-η)) | ln(μ/(1-μ)) | Beta, Binomial |
| Probit | μ = Φ(η) | Φ⁻¹(μ) | Ordinal models |
| Inverse | μ = 1/η | 1/μ | Inverse Gaussian |
| Sqrt | μ = η² | √μ | Count data (alternative) |
| Cloglog | μ = 1-exp(-exp(η)) | ln(-ln(1-μ)) | Asymmetric binary |
### Prediction Types (GLM)

| Type | Description |
|---|---|
| `PredictionType::Response` | Predictions on response scale (probabilities for binomial) |
| `PredictionType::Link` | Predictions on link scale (log-odds for logit) |
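The two scales are related by the inverse link: for a logit model, a link-scale prediction η (log-odds) maps to a response-scale probability through the logistic function. The following standalone snippet illustrates that relationship without relying on the library's API:

```rust
/// Convert a logit-link (log-odds) prediction to the response scale (probability),
/// and back, illustrating PredictionType::Link vs PredictionType::Response.
fn logit_to_response(eta: f64) -> f64 {
    1.0 / (1.0 + (-eta).exp())
}

fn response_to_logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

fn main() {
    let eta = 0.8; // prediction on the link scale
    let p = logit_to_response(eta);
    assert!((response_to_logit(p) - eta).abs() < 1e-12);
    println!("link = {eta:.3}, response = {p:.3}");
}
```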
### Interval Types

- `IntervalType::Prediction` - Prediction interval for new observations (wider)
- `IntervalType::Confidence` - Confidence interval for mean response (narrower)
### Lambda Scaling

For Ridge and Elastic Net, use `LambdaScaling::Glmnet` to match R's glmnet package:

```rust
let model = RidgeRegressor::builder() // builder name assumed
    .lambda(1.0)
    .lambda_scaling(LambdaScaling::Glmnet) // lambda * n
    .build();
```
## Examples

The `examples/` directory contains runnable examples demonstrating each regression method:

| Example | Description |
|---|---|
| `ols.rs` | Ordinary Least Squares regression |
| `wls.rs` | Weighted Least Squares regression |
| `ridge.rs` | Ridge regression (L2 regularization) |
| `elastic_net.rs` | Elastic Net (L1 + L2 regularization) |
| `rls.rs` | Recursive Least Squares with online learning |
| `bls.rs` | Bounded Least Squares / NNLS |
| `tweedie.rs` | Tweedie GLM (Gaussian, Poisson, Gamma) |
| `poisson.rs` | Poisson GLM for count data |
| `negative_binomial.rs` | Negative Binomial GLM for overdispersed counts |
| `binomial.rs` | Binomial GLM / Logistic regression |
| `alm.rs` | Augmented Linear Model (24+ distributions) |
| `lm_dynamic.rs` | Dynamic linear model with time-varying coefficients |
| `lowess.rs` | LOWESS smoothing |
| `aid.rs` | Automatic Intermittent Demand classification |
| `collinearity_intervals.rs` | Handling collinearity in prediction intervals |
Run an example with `cargo run --example ols`, substituting any example name from the table above.
## Validation

This library is validated against R's statistical functions:

- `lm()` for OLS
- `lm()` with weights for WLS
- `glmnet::glmnet()` for Ridge and Elastic Net
- `nnls::nnls()` for Non-negative Least Squares
- `statmod::tweedie()` for Tweedie GLM
- `glm(..., family=poisson)` for Poisson GLM (log, identity, sqrt links)
- `MASS::glm.nb()` for Negative Binomial GLM with theta estimation
- `glm(..., family=binomial)` for Binomial GLM (logit, probit, cloglog)
- `glm(..., family=Gamma)` for Gamma GLM
- `glm(..., family=inverse.gaussian)` for Inverse Gaussian GLM
- `glm(..., offset=...)` for rate modeling with offset terms
- `greybox::alm()` for Augmented Linear Model (Normal, Laplace, Student-t, Log-Normal, Gamma, etc.)
- `residuals(..., type="pearson"/"deviance"/"working")` for GLM residuals
- `predict(..., se.fit=TRUE)` for GLM predictions with standard errors
- `na.omit()`, `na.exclude()`, `na.fail()`, `na.pass()` for NA handling
- `predict(..., interval="prediction")` for prediction intervals
- `cooks.distance()`, `hatvalues()`, `rstandard()` for diagnostics
- `car::vif()` for variance inflation factors
All tests ensure numerical agreement with R within appropriate tolerances (708 tests total).
## Dependencies

## License
MIT License