inferust
Statistical modeling for Rust — a statsmodels-inspired library.
inferust fills the gap between Python's statsmodels / scipy.stats and the Rust ecosystem. It gives you regression summaries, hypothesis tests, descriptive stats, and correlation matrices with the same depth of output you'd expect from Python — p-values, confidence intervals, AIC/BIC, significance stars, and all.
Features
| Module | What you get | Python equivalent |
|---|---|---|
| regression::Ols | OLS with fast Cholesky and stable SVD solvers, coefficients, std errors, t-stats, p-values, R², adj-R², F-stat, AIC, BIC | statsmodels.OLS().fit() |
| hypothesis::ttest | One-sample, two-sample Welch, paired t-tests with 95% CI | scipy.stats.ttest_* |
| hypothesis::chisq | Goodness-of-fit and independence (contingency table) | scipy.stats.chisquare, chi2_contingency |
| hypothesis::anova | One-way ANOVA table (SS, MS, F, p) | scipy.stats.f_oneway |
| descriptive::Summary | mean, std, variance, min/max, quartiles, skewness, excess kurtosis | pd.Series.describe() |
| correlation | Pearson, Spearman, full correlation matrix | df.corr() |
Installation
Add to your Cargo.toml:
[dependencies]
inferust = "0.1"
Quick start
OLS Regression
use inferust::regression::Ols;
let x = vec![/* ... */];
let y = vec![/* ... */];
let result = Ols::new()
    .with_feature_names(/* ... */)
    .fit(/* ... */)
    .unwrap();
result.print_summary();
Output:
═══════════════════════════════════════════════════════════════════
OLS Regression Results
═══════════════════════════════════════════════════════════════════
Dep. variable: y Observations : 4
R² : 0.998102 Adj. R² : 0.994305
F-statistic : 262.7732 F p-value : 0.039405
AIC : 14.7316 BIC : 12.0167
───────────────────────────────────────────────────────────────────
Variable Coef Std Err t P>|t|
───────────────────────────────────────────────────────────────────
const -5.654762 5.033740 -1.1234 0.460565
hours_studied 4.130952 0.177951 23.2141 0.027430 *
prior_gpa 8.166667 1.490421 5.4793 0.115581
───────────────────────────────────────────────────────────────────
Significance codes: *** p<0.001 ** p<0.01 * p<0.05 . p<0.1
═══════════════════════════════════════════════════════════════════
Hypothesis tests
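use inferust::hypothesis::{anova, chisq, ttest};
// Paired t-test
let before = vec![/* ... */];
let after = vec![/* ... */];
ttest::paired(/* ... */).unwrap().print();
// Two-sample Welch t-test
ttest::two_sample(/* ... */).unwrap().print();
// One-way ANOVA
anova::one_way(/* ... */).unwrap().print();
// Chi-squared goodness-of-fit
chisq::goodness_of_fit(/* ... */).unwrap().print();
// Chi-squared test of independence
chisq::independence(/* ... */).unwrap().print();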
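To make the two-sample Welch test more concrete, here is a minimal std-only sketch of the statistic and the Welch-Satterthwaite degrees of freedom. The names `mean_var` and `welch_t` are hypothetical and not part of inferust's API, and the sample data is made up. The p-value step is omitted because it needs a Student's t CDF.

```rust
/// Sample mean and unbiased sample variance (n - 1 denominator).
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
/// Hypothetical helper, not inferust's implementation.
fn welch_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    // Squared standard errors of each group mean.
    let (sa, sb) = (va / na, vb / nb);
    let t = (ma - mb) / (sa + sb).sqrt();
    let df = (sa + sb).powi(2) / (sa * sa / (na - 1.0) + sb * sb / (nb - 1.0));
    (t, df)
}

fn main() {
    // Made-up samples with unequal variances.
    let a = [1.0, 2.0, 3.0, 4.0, 5.0];
    let b = [2.0, 4.0, 6.0, 8.0, 10.0];
    let (t, df) = welch_t(&a, &b);
    println!("t = {t:.4}, df = {df:.4}");
}
```

Unlike the pooled-variance t-test, the degrees of freedom here are generally not an integer, which is why the result is returned as `f64`.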
Descriptive statistics
use inferust::descriptive::Summary;
let data = vec![/* ... */];
Summary::new(/* ... */).unwrap().print();
// ─────────────────────────────
// n : 6
// mean : 6.400000
// std : 2.282176
// min : 3.600000
// 25% : 4.575000
// 50% : 6.150000
// 75% : 8.250000
// max : 9.300000
// skewness : -0.058732
// kurtosis : -1.504070
// ─────────────────────────────
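The shape statistics in that summary can be illustrated with a short sketch. It uses plain population moments with no bias correction. This README does not say which estimator inferust uses, so values from `Summary` may differ. The function name `skew_excess_kurtosis` and the data are hypothetical.

```rust
/// Moment-based skewness and excess kurtosis (population moments, no bias
/// correction). Hypothetical helper; inferust may use a different estimator.
fn skew_excess_kurtosis(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    // p-th central moment of the data.
    let m = |p: i32| xs.iter().map(|x| (x - mean).powi(p)).sum::<f64>() / n;
    let (m2, m3, m4) = (m(2), m(3), m(4));
    (m3 / m2.powf(1.5), m4 / (m2 * m2) - 3.0)
}

fn main() {
    // Made-up symmetric data: skewness is zero, tails are lighter than normal.
    let (skew, kurt) = skew_excess_kurtosis(&[1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("skewness = {skew:.4}, excess kurtosis = {kurt:.4}");
}
```

Subtracting 3 makes a normal distribution score zero, so a negative value indicates lighter tails than normal.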
Correlation
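use inferust::correlation;
let r = correlation::pearson(/* ... */).unwrap();
let rs = correlation::spearman(/* ... */).unwrap();
let matrix = correlation::correlation_matrix(/* ... */).unwrap();
correlation::print_correlation_matrix(/* ... */);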
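For readers new to the two coefficients, the following std-only sketch shows how they relate: Spearman is Pearson applied to ranks. The functions below are illustrative stand-ins with assumed signatures, not the crate's own implementation, and the data is made up.

```rust
/// Pearson product-moment correlation of two equal-length slices.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (a, b) in x.iter().zip(y) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    sxy / (sxx * syy).sqrt()
}

/// 1-based ranks; tied values share the average of their positions.
/// Panics on NaN input because of the `unwrap` in the comparator.
fn ranks(v: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    idx.sort_by(|&i, &j| v[i].partial_cmp(&v[j]).unwrap());
    let mut r = vec![0.0; v.len()];
    let mut i = 0;
    while i < idx.len() {
        // Extend j to the end of the current run of tied values.
        let mut j = i;
        while j + 1 < idx.len() && v[idx[j + 1]] == v[idx[i]] {
            j += 1;
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            r[idx[k]] = avg;
        }
        i = j + 1;
    }
    r
}

/// Spearman rank correlation: Pearson correlation of the ranks.
fn spearman(x: &[f64], y: &[f64]) -> f64 {
    pearson(&ranks(x), &ranks(y))
}

fn main() {
    // Made-up monotonic but non-linear relationship (y = x^2).
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    let y = [1.0, 4.0, 9.0, 16.0, 25.0];
    println!("pearson = {:.4}, spearman = {:.4}", pearson(&x, &y), spearman(&x, &y));
}
```

On this data Spearman is exactly 1 because the relationship is monotonic, while Pearson is slightly below 1 because it is not linear.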
OLS builder options
use inferust::regression::{Ols, OlsSolver};
Ols::new()                          // intercept on by default
    .with_feature_names(/* ... */)  // label columns
    .with_solver(/* ... */)         // default fast path
    .no_intercept()                 // force through origin
    .fit(/* ... */)
    .unwrap();
Ols::new()
    .stable()                       // SVD solver for tougher designs
    .fit(/* ... */)
    .unwrap();
OlsResult also exposes .predict(&x) for out-of-sample predictions and all raw fields (coefficients, residuals, r_squared, p_values, etc.) for programmatic use.
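Conceptually, an out-of-sample prediction from a linear model is the intercept plus the dot product of the coefficients with the new feature row. The sketch below redoes that arithmetic by hand with the coefficients printed in the OLS summary earlier in this README. `predict_row` is a hypothetical helper, not the signature of `OlsResult::predict`, and the new observation is made up.

```rust
/// Linear prediction: intercept plus the dot product of coefficients and features.
/// Hypothetical helper, not inferust's API.
fn predict_row(intercept: f64, coefs: &[f64], row: &[f64]) -> f64 {
    intercept + coefs.iter().zip(row).map(|(c, x)| c * x).sum::<f64>()
}

fn main() {
    // Coefficients copied from the printed OLS summary (const, hours_studied, prior_gpa).
    let intercept = -5.654762;
    let coefs = [4.130952, 8.166667];
    // Made-up new observation: 10 hours studied, prior GPA of 3.5.
    let y_hat = predict_row(intercept, &coefs, &[10.0, 3.5]);
    println!("predicted y = {y_hat:.3}");
}
```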
Solver strategy
inferust defaults to a Cholesky solve of the normal system for full-rank, well-conditioned OLS problems. This avoids the extra work of forming a full inverse for coefficient estimation and is the fastest path for typical dense data.
For tougher or poorly conditioned designs, call .stable() or .with_solver(OlsSolver::Svd) to use the SVD path. The test suite includes statsmodels-derived reference values for coefficients, standard errors, t/p-values, R², F-statistics, AIC, and BIC.
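The Cholesky path described above can be sketched in plain Rust. This is an illustration of the technique only: inferust itself relies on nalgebra, and the function name `ols_cholesky`, the row-major `Vec<Vec<f64>>` layout, and the explicit intercept column are assumptions made for this example. It forms XᵀX and Xᵀy, factors XᵀX = LLᵀ, and solves two triangular systems without ever building an inverse.

```rust
/// Solve the OLS normal equations (XᵀX) b = Xᵀy via Cholesky factorization.
/// Returns None if XᵀX is not positive definite (for example, rank-deficient X).
/// Illustrative sketch, not inferust's implementation.
fn ols_cholesky(x: &[Vec<f64>], y: &[f64]) -> Option<Vec<f64>> {
    let p = x[0].len();
    // Form A = XᵀX and c = Xᵀy.
    let mut a = vec![vec![0.0_f64; p]; p];
    let mut c = vec![0.0_f64; p];
    for (row, &yi) in x.iter().zip(y) {
        for i in 0..p {
            c[i] += row[i] * yi;
            for j in 0..p {
                a[i][j] += row[i] * row[j];
            }
        }
    }
    // Factor A = L Lᵀ with L lower triangular.
    let mut l = vec![vec![0.0_f64; p]; p];
    for i in 0..p {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            if i == j {
                let d = a[i][i] - s;
                if d <= 0.0 {
                    return None;
                }
                l[i][j] = d.sqrt();
            } else {
                l[i][j] = (a[i][j] - s) / l[j][j];
            }
        }
    }
    // Forward substitution: L z = c.
    let mut z = vec![0.0_f64; p];
    for i in 0..p {
        let s: f64 = (0..i).map(|k| l[i][k] * z[k]).sum();
        z[i] = (c[i] - s) / l[i][i];
    }
    // Back substitution: Lᵀ b = z.
    let mut b = vec![0.0_f64; p];
    for i in (0..p).rev() {
        let s: f64 = (i + 1..p).map(|k| l[k][i] * b[k]).sum();
        b[i] = (z[i] - s) / l[i][i];
    }
    Some(b)
}

fn main() {
    // Made-up data lying exactly on y = 1 + 2x; the first column is the intercept.
    let x = vec![vec![1.0, 0.0], vec![1.0, 1.0], vec![1.0, 2.0], vec![1.0, 3.0]];
    let y = [1.0, 3.0, 5.0, 7.0];
    println!("{:?}", ols_cholesky(&x, &y));
}
```

Forming XᵀX squares the condition number of X, which is the usual reason to fall back to an SVD-based solve when the design is poorly conditioned. The `None` branch shows where a rank-deficient design fails in this sketch.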
Benchmarks
The repository includes reproducible OLS benchmark scripts for comparing inferust with Python statsmodels on deterministic synthetic data. Build and run the Rust benchmark in release mode:
Run the Python comparison after installing numpy, scipy, and statsmodels:
On the current local benchmark machine, the 10,000 row × 8 feature case measured approximately:
| Engine | Solver | Median fit time |
|---|---|---|
| inferust | Cholesky | 0.769 ms |
| inferust | SVD | 2.474 ms |
| statsmodels | default OLS | 2.492 ms |
Benchmark results vary by machine and BLAS/LAPACK configuration, so treat these as a local smoke test rather than a universal claim. The checksum printed by each script is useful for confirming both implementations fit equivalent data.
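If you want to take a similar measurement on your own workload, a median-of-runs timer is easy to sketch with the standard library. This is not the repository's benchmark script. `median_ms` and `time_median_ms` are hypothetical helpers, and the closure in `main` is a stand-in workload rather than a model fit.

```rust
use std::time::Instant;

/// Median of a list of timings in milliseconds.
fn median_ms(mut samples: Vec<f64>) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len();
    if n % 2 == 1 {
        samples[n / 2]
    } else {
        (samples[n / 2 - 1] + samples[n / 2]) / 2.0
    }
}

/// Run `f` `runs` times and return the median wall-clock time in milliseconds.
fn time_median_ms<F: FnMut()>(runs: usize, mut f: F) -> f64 {
    let samples: Vec<f64> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64() * 1e3
        })
        .collect();
    median_ms(samples)
}

fn main() {
    // Stand-in workload; replace with the model fit being measured.
    let ms = time_median_ms(21, || {
        let s: f64 = (0..100_000).map(|i| (i as f64).sqrt()).sum();
        std::hint::black_box(s); // keep the optimizer from discarding the work
    });
    println!("median: {ms:.3} ms");
}
```

The median is less sensitive than the mean to occasional slow runs caused by other activity on the machine. Build with `--release` so the timing reflects optimized code.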
Error handling
All fallible functions return inferust::Result<T> (an alias for Result<T, InferustError>):
use inferust::InferustError;
match result { /* ... */ }
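The pattern of a crate-level `Result` alias over a single error enum can be sketched with the standard library alone. Everything below is a hypothetical stand-in: `StatsError`, its variants, `StatsResult`, and `mean_paired_diff` are made up for illustration and are not `InferustError` or any inferust function. The real crate derives its error type with thiserror, per the Dependencies section.

```rust
use std::fmt;

// Hypothetical stand-in for the crate's error type; the variants are assumptions.
#[derive(Debug, PartialEq)]
enum StatsError {
    EmptyInput,
    LengthMismatch { left: usize, right: usize },
}

impl fmt::Display for StatsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StatsError::EmptyInput => write!(f, "input is empty"),
            StatsError::LengthMismatch { left, right } => {
                write!(f, "length mismatch: {left} vs {right}")
            }
        }
    }
}

impl std::error::Error for StatsError {}

/// Alias in the same spirit as `inferust::Result<T>`.
type StatsResult<T> = Result<T, StatsError>;

/// Mean of the pairwise differences a[i] - b[i], validating inputs first.
fn mean_paired_diff(a: &[f64], b: &[f64]) -> StatsResult<f64> {
    if a.is_empty() || b.is_empty() {
        return Err(StatsError::EmptyInput);
    }
    if a.len() != b.len() {
        return Err(StatsError::LengthMismatch { left: a.len(), right: b.len() });
    }
    let sum: f64 = a.iter().zip(b).map(|(x, y)| x - y).sum();
    Ok(sum / a.len() as f64)
}

fn main() {
    // Match on specific variants first, then fall back to the Display message.
    match mean_paired_diff(&[1.0, 2.0], &[1.0]) {
        Ok(d) => println!("mean difference: {d}"),
        Err(StatsError::LengthMismatch { left, right }) => {
            eprintln!("inputs differ in length: {left} vs {right}")
        }
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Matching on variants lets callers recover from specific failures, where a blanket `.unwrap()` like the ones in the quick-start snippets would panic.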
Dependencies
| Crate | Purpose |
|---|---|
| nalgebra | Matrix operations for OLS normal equations; no LAPACK required |
| statrs | Student's t, F, and χ² distributions for p-values and confidence intervals |
| thiserror | Ergonomic error types |
Roadmap
- Logistic regression (GLM with logit link)
- Ridge / Lasso regularization
- Durbin-Watson and Breusch-Pagan diagnostic tests
- Tukey HSD post-hoc test (after ANOVA)
- Time-series: ARIMA / ACF / PACF
- Weighted OLS
Contributions welcome — open an issue or PR!
License
MIT — see LICENSE.