Greeners: High-Performance Econometrics in Rust
Greeners is a lightning-fast, type-safe econometrics library written in pure Rust. It provides a comprehensive suite of estimators for Cross-Sectional, Time-Series, and Panel Data analysis, leveraging linear algebra backends (LAPACK/BLAS) for maximum performance.
Designed for academic research, heavy simulations, and production-grade economic modeling.
🎊 v1.0.2 STABLE RELEASE: Named Variables & Enhanced Data Loading
Greeners v1.0.2 brings human-readable variable names in regression output and flexible data loading from multiple sources!
🆕 Multiple Data Loading Options (NEW in v1.0.2)
Load data from CSV, JSON, URLs, or use the Builder pattern - just like pandas/polars!
// NOTE: paths, URLs, and column names below are illustrative — adapt to your data.

// 1. CSV from URL (reproducible research!)
let df = DataFrame::from_csv_url("https://raw.githubusercontent.com/user/repo/main/data.csv")?;

// 2. JSON from local file (column or record oriented)
let df = DataFrame::from_json("data.json")?;

// 3. JSON from URL (API integration)
let df = DataFrame::from_json_url("https://api.example.com/data.json")?;

// 4. Builder pattern (most convenient!)
let df = DataFrame::builder()
    .add_column("x", vec![1.0, 2.0, 3.0])
    .add_column("y", vec![2.0, 4.0, 6.0])
    .build()?;

// 5. CSV from local file (classic)
let df = DataFrame::from_csv("data.csv")?;
Why this matters:
- ✅ Reproducible research - Load datasets directly from GitHub/URLs
- ✅ API integration - Fetch data from web services
- ✅ Flexible formats - CSV, JSON (column/record oriented)
- ✅ Pandas-like - Familiar syntax for data scientists
- ✅ Type-safe - Column types and shapes are validated when the DataFrame is built; parse failures surface as `Result` errors instead of panics
📖 See examples/dataframe_loading.rs for all loading methods.
Named Variables in Results (NEW in v1.0.2)
No more generic x0, x1, x2 in regression output! All models now display actual variable names from your Formula:
use greeners::prelude::*; // import path illustrative

let formula = parse("wage ~ education + experience + female")?; // names come from your data
let result = OLS::from_formula(&formula, &df)?;
println!("{}", result); // prints the regression table below
Before (v1.0.1):

OLS Regression Results
====================================
Variable      Coef   Std Err       t    P>|t|
const         5.23      0.45   11.62    0.000
x0            2.15      0.12   17.92    0.000   <- Generic names
x1            0.08      0.02    4.00    0.000
x2           -1.20      0.25   -4.80    0.000
Now (v1.0.2):

OLS Regression Results
====================================
Variable      Coef   Std Err       t    P>|t|
const         5.23      0.45   11.62    0.000
education     2.15      0.12   17.92    0.000   <- Actual variable names!
experience    0.08      0.02    4.00    0.000
female       -1.20      0.25   -4.80    0.000
Applies to ALL models:
- ✅ OLS, WLS, Cochrane-Orcutt (FGLS)
- ✅ IV/2SLS (Instrumental Variables)
- ✅ Logit/Probit (Binary Choice)
- ✅ Quantile Regression (all quantiles)
- ✅ Panel Data (Fixed Effects, Random Effects, Between)
- ✅ GMM (Generalized Method of Moments)
- ✅ Difference-in-Differences
Comprehensive Test Coverage
v1.0.2 includes 143 unit tests covering all major functionality:
- 62 new tests added in v1.0.2 across 7 test modules
- Full coverage of IV/2SLS, Panel Data, DiD, FGLS, Quantile Regression
- Diagnostic tests (VIF, Breusch-Pagan, Jarque-Bera, Durbin-Watson)
- GMM specification tests (J-statistic, overidentification)
- Model selection and information criteria
Run tests locally:

cargo test
Code Quality Improvements
- Applied clippy lints for idiomatic Rust (25+ improvements)
- Replaced `.iter().cloned().collect()` with `.to_vec()` for better performance
- Modern range checks using `.contains()` instead of manual comparisons
- Cleaner, more maintainable codebase
🎉 v1.0.1: Specification Tests
Greeners reaches production stability with comprehensive specification tests for diagnosing regression assumptions!
Specification Tests (NEW in v1.0.1)
Diagnose violations of classical regression assumptions and identify appropriate remedies:
use greeners::prelude::*; // import path illustrative

// Estimate model (formula and signatures illustrative)
let formula = parse("y ~ x1 + x2")?;
let model = OLS::from_formula(&formula, &df)?;
let (x, _names) = df.to_design_matrix(&formula)?;
let residuals = model.residuals;
let fitted = model.fitted_values;

// 1. White Test for Heteroskedasticity
let (stat, p_value) = white_test(&residuals, &x)?;
if p_value < 0.05 {
    println!("Heteroskedasticity detected -> use robust SEs (HC3/HC4)");
}

// 2. RESET Test for Functional Form Misspecification
let (stat, p_value) = reset_test(&residuals, &fitted, &x)?;
if p_value < 0.05 {
    println!("Possible omitted variables or wrong functional form");
}

// 3. Breusch-Godfrey Test for Autocorrelation
let (stat, p_value) = breusch_godfrey_test(&residuals, &x, 2)?; // 2 lags, illustrative
if p_value < 0.05 {
    println!("Autocorrelation detected -> use Newey-West SEs");
}

// 4. Goldfeld-Quandt Test for Heteroskedasticity
let (stat, p_value) = goldfeld_quandt_test(&residuals, &x)?;
When to Use:
- White Test → General heteroskedasticity test (any form)
- RESET Test → Detect omitted variables or wrong functional form
- Breusch-Godfrey → Detect autocorrelation in time series/panel data
- Goldfeld-Quandt → Test heteroskedasticity when you suspect specific ordering
Remedies:
- Heteroskedasticity → `CovarianceType::HC3` or `HC4`
- Autocorrelation → `CovarianceType::NeweyWest(lags)`
- Misspecification → Add `I(x^2)`, `x1*x2` interactions

Stata/R/Python Equivalents:
- Stata: `estat hettest`, `estat ovtest`, `estat bgodfrey`
- R: `lmtest::bptest()`, `lmtest::resettest()`, `lmtest::bgtest()`
- Python: `statsmodels.stats.diagnostic.het_white()`
📖 See examples/specification_tests.rs for comprehensive demonstration.
✨ NEW: R/Python-Style Formula API
Greeners now supports R/Python-style formula syntax (like statsmodels and lm()), making model specification intuitive and concise:
use greeners::prelude::*; // import path illustrative

// Python equivalent: smf.ols('y ~ x1 + x2', data=df).fit(cov_type='HC1')
let formula = parse("y ~ x1 + x2")?;
let result = OLS::from_formula(&formula, &df)?; // add CovarianceType::HC1 for robust SEs (signature illustrative)
All estimators support formulas: OLS, WLS, DiD, IV/2SLS, Logit/Probit, Quantile Regression, Panel Data (FE/RE/Between), and more!
📖 See FORMULA_API.md for complete documentation and examples.
🚀 NEW in v0.9.0: Panel Diagnostics & Model Selection
Greeners now provides comprehensive tools for panel data model selection and information criteria-based model comparison - essential for rigorous empirical research!
Model Selection & Comparison
Compare multiple models using AIC/BIC with automatic ranking and Akaike weights for model averaging:
use greeners::prelude::*; // import path illustrative

// Estimate competing models (formulas illustrative)
let model1 = OLS::from_formula(&parse("y ~ x1 + x2 + x3")?, &df)?; // Full Model
let model2 = OLS::from_formula(&parse("y ~ x1 + x2")?, &df)?;     // Restricted
let model3 = OLS::from_formula(&parse("y ~ x1")?, &df)?;          // Simple

// Compare models
let models = vec![("Full Model", &model1), ("Restricted", &model2), ("Simple", &model3)];
let comparison = compare_models(&models)?;
print_comparison(&comparison);

// Calculate Akaike weights for model averaging
let aic_values: Vec<f64> = comparison.iter().map(|m| m.aic).collect();
let weights = akaike_weights(&aic_values);
Output:
=============================== Model Comparison ===============================
Model | AIC | BIC | Rank(AIC) | Rank(BIC)
--------------------------------------------------------------------------------
Full Model | 183.83 | 191.48 | 1 | 1
Restricted | 184.77 | 190.50 | 2 | 2
Simple | 188.19 | 192.01 | 3 | 3
📊 AKAIKE WEIGHTS:
Δ_AIC < 2: Substantial support
Δ_AIC 4-7: Considerably less support
Δ_AIC > 10: Essentially no support
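Akaike weights are also easy to compute outside the library; the following is a minimal pure-Rust sketch of the formula (independent of the Greeners API), fed with the AIC values from the comparison table above:

```rust
/// Akaike weight of model i: w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2),
/// where Δ_i = AIC_i - min_j AIC_j.
fn akaike_weights(aic: &[f64]) -> Vec<f64> {
    let min = aic.iter().cloned().fold(f64::INFINITY, f64::min);
    let rel: Vec<f64> = aic.iter().map(|a| (-(a - min) / 2.0).exp()).collect();
    let total: f64 = rel.iter().sum();
    rel.iter().map(|r| r / total).collect()
}

fn main() {
    // AIC values from the comparison table above
    let w = akaike_weights(&[183.83, 184.77, 188.19]);
    println!("Akaike weights: {:?}", w); // the Full Model carries most of the weight
}
```

The weights sum to one by construction, so they can be read directly as relative support for each candidate model.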
Panel Diagnostics Tests
Test whether pooled OLS is appropriate or if panel data methods (Fixed/Random Effects) are needed:
Breusch-Pagan LM Test for Random Effects
use greeners::prelude::*; // import path illustrative

// Estimate pooled OLS
let formula = parse("y ~ x1 + x2")?;
let model_pooled = OLS::from_formula(&formula, &df)?;
let (x, _names) = df.to_design_matrix(&formula)?;
let residuals = model_pooled.residuals;

// Test for random effects (entity ids and signature illustrative)
let (lm_stat, p_value) = breusch_pagan_lm(&residuals, &entity_ids, n_periods)?;

// Interpretation:
// H₀: σ²_u = 0 (no panel effects, pooled OLS adequate)
// H₁: σ²_u > 0 (random effects needed)
// If p < 0.05 → Use Random Effects or Fixed Effects
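The statistic itself is simple for a balanced panel; here is a standalone sketch of the formula (not the library's own implementation):

```rust
/// Breusch-Pagan LM statistic for random effects on a balanced panel:
/// LM = NT / (2(T-1)) * [ Σ_i (Σ_t e_it)² / Σ_it e_it² - 1 ]²,  ~ χ²(1) under H₀.
/// `resid[i]` holds the pooled-OLS residuals of entity i over T periods.
fn bp_lm_statistic(resid: &[Vec<f64>]) -> f64 {
    let n = resid.len() as f64;
    let t = resid[0].len() as f64;
    // Sum of squared within-entity residual sums: large when residuals persist per entity
    let sum_sq_entity: f64 = resid.iter().map(|e| e.iter().sum::<f64>().powi(2)).sum();
    let sum_sq: f64 = resid.iter().flatten().map(|e| e * e).sum();
    (n * t) / (2.0 * (t - 1.0)) * (sum_sq_entity / sum_sq - 1.0).powi(2)
}

fn main() {
    // Residuals that persist within each entity → large LM → panel effects present
    let persistent = vec![vec![1.0, 1.0, 1.0], vec![-1.0, -1.0, -1.0]];
    println!("LM = {:.2}", bp_lm_statistic(&persistent)); // 6.00
}
```

The statistic grows with the share of residual variation that is common within entities, which is exactly the σ²_u > 0 alternative described above.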
F-Test for Fixed Effects
// Test if firm fixed effects are significant (signature illustrative)
let (f_stat, p_value) = f_test_fixed_effects(&model_pooled, &df, &firm_ids)?;
// Interpretation:
// H₀: All firm effects are zero (pooled OLS adequate)
// H₁: Firm effects exist (use fixed effects)
// If p < 0.05 → Use Fixed Effects model
Summary Statistics
Quick descriptive statistics for initial data exploration:
use greeners::SummaryStats; // import path illustrative

let stats = describe(&data); // per-variable descriptive statistics
// Returns: (mean, std, min, Q25, median, Q75, max, n_obs)

// Pretty-print summary table (names and data illustrative)
let summary_data = vec![("x1", &x1), ("x2", &x2)];
print_summary(&summary_data);
Stata/R/Python Equivalents:
- Stata: `estat ic` (AIC/BIC), `xttest0` (BP LM), `testparm` (F-test)
- R: `AIC()`, `BIC()`, `plm::plmtest()`, `plm::pFtest()`
- Python: `statsmodels` information criteria, `linearmodels.panel` diagnostics
📖 See examples/panel_model_selection.rs for comprehensive demonstration with panel data workflow.
🌟 NEW in v0.5.0: Marginal Effects for Binary Choice Models
After estimating Logit/Probit models, coefficients alone are hard to interpret (they're on log-odds/z-score scale). Marginal effects translate these to probability changes - essential for policy analysis and substantive interpretation!
Average Marginal Effects (AME) - RECOMMENDED
use greeners::prelude::*; // import path illustrative

// Estimate Logit model (formula illustrative)
let formula = parse("admitted ~ gpa + gre + rank")?;
let result = Logit::from_formula(&formula, &df)?;

// Get design matrix
let (x, names) = df.to_design_matrix(&formula)?;

// Calculate Average Marginal Effects (AME)
let ame = result.average_marginal_effects(&x)?;
// Interpretation: AME[gpa] = 0.15 means:
// "A 1-point increase in GPA increases admission probability by 15 percentage points"
// (averaged across all students in the sample)
Why AME?
- ✅ Accounts for heterogeneity across observations
- ✅ More robust to non-linearities
- ✅ Standard in modern econometrics (Stata, R, Python)
- ✅ Easy to interpret: probability changes, not log-odds
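For intuition, the AME of a logit slope is just the coefficient scaled by the average logistic density over the sample. A self-contained sketch of that formula (pure Rust, not the library's implementation):

```rust
fn logistic(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Average marginal effect of regressor k in a logit model:
/// AME_k = β_k · (1/n) Σ_i Λ(x_i'β)(1 − Λ(x_i'β)).
fn logit_ame(x: &[Vec<f64>], beta: &[f64], k: usize) -> f64 {
    let avg_density = x
        .iter()
        .map(|xi| {
            let xb: f64 = xi.iter().zip(beta).map(|(a, b)| a * b).sum();
            let p = logistic(xb);
            p * (1.0 - p) // logistic density at x_i'β
        })
        .sum::<f64>()
        / x.len() as f64;
    beta[k] * avg_density
}

fn main() {
    // Two observations at the 50/50 point (x'β = 0): density is exactly 0.25
    let x = vec![vec![1.0, 0.0], vec![1.0, 0.0]];
    let beta = [0.0, 2.0]; // intercept 0, slope 2
    println!("AME = {:.2}", logit_ame(&x, &beta, 1)); // 2.0 * 0.25 = 0.50
}
```

Because the logistic density never exceeds 0.25, the AME of a slope is always bounded by a quarter of its coefficient — one reason raw logit coefficients overstate probability effects.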
Marginal Effects at Means (MEM)
// Calculate Marginal Effects at Means (MEM)
let mem = result.marginal_effects_at_means(&x)?;
// Interpretation: Effect for "average" student
// ⚠️ Less robust than AME - can evaluate at impossible values (e.g., average of dummies)
Predictions
// Predict admission probabilities for new students (x_new illustrative)
let probs = result.predict_proba(&x_new);
// Example: probs[0] = 0.85 → 85% chance of admission
Logit vs Probit Comparison
// Both models give similar marginal effects
let logit_result = Logit::from_formula(&formula, &df)?;
let probit_result = Probit::from_formula(&formula, &df)?;
let ame_logit = logit_result.average_marginal_effects(&x)?;
let ame_probit = probit_result.average_marginal_effects(&x)?;
// Typically: ame_logit ≈ ame_probit (differences < 1-2 percentage points)
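The familiar ≈1.6 rule of thumb relating logit and probit coefficients comes directly from the two link densities at zero (φ(0) ≈ 0.399 versus Λ(0)(1 − Λ(0)) = 0.25), which is why the coefficients differ in scale while the marginal effects nearly coincide. A quick check:

```rust
use std::f64::consts::PI;

/// Ratio of the probit to the logit link density at z = 0.
/// Marginal effects scale with these densities, so β_logit ≈ 1.6 · β_probit
/// tends to produce near-identical marginal effects.
fn density_ratio_at_zero() -> f64 {
    let probit_density = 1.0 / (2.0 * PI).sqrt(); // standard normal pdf at 0 ≈ 0.3989
    let logit_density = 0.25; // Λ(0) · (1 − Λ(0)) = 0.5 · 0.5
    probit_density / logit_density
}

fn main() {
    println!("β_logit ≈ {:.2} × β_probit", density_ratio_at_zero());
}
```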
Stata/R/Python Equivalents:
- Stata: `margins, dydx(*)` (AME) or `margins, dydx(*) atmeans` (MEM)
- R: `mfx::logitmfx()` or `margins::margins()`
- Python: `statsmodels.discrete.discrete_model.Logit(...).get_margeff()`
📖 See examples/marginal_effects.rs for comprehensive demonstration with college admission data.
Two-Way Clustered Standard Errors
For panel data with clustering along two dimensions (e.g., firms × time):
// Panel data: 4 firms × 6 time periods (ids illustrative)
let firm_ids: Vec<usize> = (0..4).flat_map(|f| vec![f; 6]).collect();
let time_ids: Vec<usize> = (0..4).flat_map(|_| 0..6).collect();

// Two-way clustering (Cameron-Gelbach-Miller, 2011); variant name illustrative
let result = OLS::from_formula(&formula, &df, CovarianceType::TwoWayCluster(&firm_ids, &time_ids))?;
// Formula: V = V_firm + V_time - V_intersection
// Accounts for BOTH within-firm AND within-time correlation
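The combination rule in the comment above can be written out directly. A toy sketch with plain 2×2 arrays standing in for precomputed one-way covariance matrices (all values hypothetical):

```rust
/// Cameron–Gelbach–Miller (2011) two-way clustered covariance:
/// V̂ = V̂_firm + V̂_time − V̂_{firm∩time}.
fn two_way_variance(
    v_firm: [[f64; 2]; 2],
    v_time: [[f64; 2]; 2],
    v_both: [[f64; 2]; 2],
) -> [[f64; 2]; 2] {
    let mut v = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            // add both one-way corrections, subtract the double-counted intersection
            v[i][j] = v_firm[i][j] + v_time[i][j] - v_both[i][j];
        }
    }
    v
}

fn main() {
    let v = two_way_variance(
        [[2.0, 0.5], [0.5, 2.0]], // clustered by firm
        [[1.0, 0.2], [0.2, 1.0]], // clustered by time
        [[0.5, 0.1], [0.1, 0.5]], // clustered by firm × time cells
    );
    println!("V[0][0] = {}", v[0][0]); // 2.0 + 1.0 - 0.5 = 2.5
}
```

Subtracting the intersection term prevents observations that share both a firm and a period from being counted twice.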
When to use:
- ✅ Panel data (firms/countries over time)
- ✅ Correlation within entities AND within time periods
- ✅ More robust than one-way clustering
- ✅ Standard in modern panel data econometrics
Stata equivalent: `reghdfe y x, vce(cluster firm_id time_id)`
📖 See examples/two_way_clustering.rs for complete comparison of non-robust vs one-way vs two-way clustering.
🎊 NEW in v0.4.0: Categorical Variables & Polynomial Terms
Categorical Variable Encoding
Automatic dummy variable creation with R/Python syntax:
// Categorical variable: creates dummies, drops first level
let formula = parse("y ~ x + C(region)")?;
let result = OLS::from_formula(&formula, &df)?;
// If region has values [0, 1, 2, 3] → creates 3 dummies (drops 0 as reference)
How it works:
- `C(var)` detects unique values in the variable
- Creates K-1 dummy variables (drops first category as reference)
- Essential for categorical data (regions, industries, treatment groups)
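The expansion itself is mechanical; a standalone sketch of the K−1 dummy encoding that `C(var)` performs (independent of the library internals):

```rust
/// Expand a categorical column into K−1 dummy columns, dropping the first
/// (lowest) level as the reference category.
fn encode_dummies(values: &[i64]) -> Vec<Vec<f64>> {
    let mut levels: Vec<i64> = values.to_vec();
    levels.sort_unstable();
    levels.dedup();
    levels[1..] // drop the reference level
        .iter()
        .map(|lvl| values.iter().map(|v| if v == lvl { 1.0 } else { 0.0 }).collect())
        .collect()
}

fn main() {
    // region ∈ {0, 1, 2, 3} → 3 dummy columns; region 0 is the reference
    let dummies = encode_dummies(&[0, 1, 2, 3, 0, 2]);
    println!("{} dummy columns", dummies.len()); // 3
}
```

Rows at the reference level are all zeros across the dummies, so the intercept absorbs the reference category's mean.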
Polynomial Terms
Non-linear relationships made easy:
// Quadratic model: captures diminishing returns
let formula = parse("y ~ x + I(x^2)")?;

// Cubic model: more flexible
let formula = parse("y ~ x + I(x^2) + I(x^3)")?;

// Alternative syntax (Python-style; form illustrative)
let formula = parse("y ~ x + I(x**2)")?;
Use cases:
- Production functions (diminishing returns)
- Wage curves (experience effects)
- Growth models (non-linear dynamics)
Combine with interactions:
// Region-specific quadratic effects (formula illustrative)
let formula = parse("y ~ x * C(region) + I(x^2) * C(region)")?;
🆕 NEW in v0.2.0: Clustered Standard Errors & Advanced Diagnostics
Clustered Standard Errors
Critical for panel data and hierarchical structures where observations are grouped:
// Panel data: firms over time (ids illustrative)
let cluster_ids = vec![1, 1, 1, 2, 2, 2, 3, 3, 3]; // Firm IDs
let result = OLS::from_formula(&formula, &df, CovarianceType::Cluster(&cluster_ids))?; // variant name illustrative
Use clustered SE when:
- Panel data (repeated observations per entity)
- Hierarchical data (students in schools, patients in hospitals)
- Experimental data with treatment clusters
- Geographic clustering (observations in regions/countries)
Advanced Diagnostics
New diagnostic tools for model validation:
use greeners::Diagnostics; // import path illustrative

// Multicollinearity detection (signatures illustrative)
let vif = vif(&x)?;                   // Variance Inflation Factor
let cond_num = condition_number(&x)?; // Condition Number

// Influential observations
let leverage = leverage(&x)?;                  // Hat values
let cooks_d = cooks_distance(&x, &residuals)?; // Cook's Distance

// Assumption testing (already available)
let (jb_stat, jb_p) = jarque_bera(&residuals)?;       // Normality
let (bp_stat, bp_p) = breusch_pagan(&residuals, &x)?; // Heteroskedasticity
let dw_stat = durbin_watson(&residuals);              // Autocorrelation
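As a concrete example of one of these diagnostics, the Durbin-Watson statistic is a one-liner over the residual series; a pure-Rust sketch (not the library's code):

```rust
/// Durbin–Watson statistic: DW = Σ_t (e_t − e_{t−1})² / Σ_t e_t².
/// DW ≈ 2 suggests no first-order autocorrelation; values near 0 indicate
/// positive, and values near 4 negative, autocorrelation.
fn durbin_watson(resid: &[f64]) -> f64 {
    let num: f64 = resid.windows(2).map(|w| (w[1] - w[0]).powi(2)).sum();
    let den: f64 = resid.iter().map(|e| e * e).sum();
    num / den
}

fn main() {
    let smooth = [1.0, 1.1, 0.9, 1.0]; // slowly varying residuals → DW well below 2
    println!("DW = {:.3}", durbin_watson(&smooth));
}
```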
🎉 NEW in v0.3.0: Interactions, HC2/HC3, and Predictions
Interaction Terms
Model interaction effects with R/Python syntax:
// Full interaction: x1 * x2 expands to x1 + x2 + x1:x2
let formula = parse("y ~ x1 * x2")?;
let result = OLS::from_formula(&formula, &df)?;

// Interaction only: just the product term
let formula2 = parse("y ~ x1:x2")?;
Use cases:
- Differential effects by groups (e.g., education returns by gender)
- Treatment effect heterogeneity
- Testing moderation/mediation hypotheses
Enhanced Robust Standard Errors
// HC2: Leverage-adjusted (more efficient with small samples); signatures illustrative
let result_hc2 = OLS::from_formula(&formula, &df, CovarianceType::HC2)?;

// HC3: Jackknife (most robust - RECOMMENDED for small samples)
let result_hc3 = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
Comparison:
- HC1: White (1980) with n/(n-k) small-sample correction
- HC2: Adjusts for leverage: e_i²/(1-h_i)
- HC3: Jackknife-style: e_i²/(1-h_i)² - most conservative & robust
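The three corrections differ only in how each squared residual is weighted before entering the sandwich "meat" matrix. A small sketch of the per-observation weights, where h_i is the leverage from the hat matrix:

```rust
/// Per-observation contributions to the HC "meat" matrix:
/// HC1 rescales e² by n/(n−k); HC2 divides by (1−h); HC3 divides by (1−h)².
fn hc_weights(e_sq: f64, h: f64, n: usize, k: usize) -> (f64, f64, f64) {
    let hc1 = e_sq * n as f64 / (n - k) as f64;
    let hc2 = e_sq / (1.0 - h);
    let hc3 = e_sq / (1.0 - h).powi(2);
    (hc1, hc2, hc3)
}

fn main() {
    // A high-leverage point (h = 0.5) is up-weighted far more under HC3
    let (hc1, hc2, hc3) = hc_weights(1.0, 0.5, 10, 2);
    println!("HC1 = {hc1}, HC2 = {hc2}, HC3 = {hc3}"); // 1.25, 2, 4
}
```

Because HC3 squares the leverage correction, it penalizes influential observations the hardest, which is exactly why it is the conservative small-sample choice.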
Post-Estimation Predictions
// Out-of-sample predictions (values illustrative)
let x_new = Array2::from_shape_vec((2, 3), vec![1.0, 2.0, 3.0, 1.0, 4.0, 5.0])?;
let predictions = result.predict(&x_new);
// In-sample fitted values
let fitted = result.fitted_values;
// Residuals
let resid = result.residuals;
🚀 Features
Cross-Sectional & General
- OLS & GLS: Robust standard errors (White, Newey-West).
- IV / 2SLS: Instrumental Variables for endogeneity correction.
- Quantile Regression: Robust estimation via Iteratively Reweighted Least Squares (IRLS).
- Discrete Choice: Logit and Probit models (Newton-Raphson MLE).
- Diagnostics: R-squared, F-Test, T-Test, Confidence Intervals.
Time Series (Macroeconometrics)
- Unit Root Tests: Augmented Dickey-Fuller (ADF).
- VAR (Vector Autoregression): Multivariate modeling with Information Criteria (AIC/BIC).
- VARMA: Hannan-Rissanen algorithm for ARMA structures.
- VECM (Cointegration): Johansen Procedure (Eigenvalue decomposition) for I(1) systems.
- Impulse Response Functions (IRF): Orthogonalized structural shocks.
Panel Data
- Fixed Effects (Within): Absorbs individual heterogeneity.
- Random Effects: Swamy-Arora GLS estimator.
- Between Estimator: Long-run cross-sectional relationships.
- Dynamic Panel: Arellano-Bond (Difference GMM) to solve Nickell Bias.
- Panel Threshold: Hansen (1999) non-linear regime switching models.
- Testing: Hausman Test for FE vs RE.
Systems of Equations
- SUR: Seemingly Unrelated Regressions (Zellner).
- 3SLS: Three-Stage Least Squares (System IV).
System Requirements (Pre-requisites)
A BLAS/LAPACK provider (e.g. OpenBLAS) must be installed on the system. Typical commands (adapt to your setup):

Debian / Ubuntu / Pop!_OS:
sudo apt-get install libopenblas-dev

Fedora / RHEL / CentOS:
sudo dnf install openblas-devel

Arch Linux / Manjaro:
sudo pacman -S openblas

macOS:
brew install openblas
📦 Installation
Add this to your Cargo.toml:
[dependencies]
greeners = "1.0.2"
ndarray = "0.15"
# Note: You must have a BLAS/LAPACK provider installed on your system
ndarray-linalg = { version = "0.14", features = ["openblas"] }
🎯 Quick Start
Loading Data (Multiple Options!)
Greeners provides flexible data loading similar to pandas/polars - from local files, URLs, or manual construction:
1. CSV from Local File
use greeners::prelude::*; // import path illustrative

let df = DataFrame::from_csv("data.csv")?; // path illustrative
2. CSV from URL (NEW!)
// Load data directly from GitHub or any URL (URL illustrative)
let df = DataFrame::from_csv_url("https://raw.githubusercontent.com/user/repo/main/data.csv")?;
// Perfect for reproducible research and shared datasets!
3. JSON from Local File (NEW!)
// Column-oriented JSON (like pandas.to_json(orient='columns')); file names illustrative
// { "x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0] }
let df = DataFrame::from_json("data_columns.json")?;

// Or record-oriented JSON (like pandas.to_json(orient='records'))
// [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}]
let df = DataFrame::from_json("data_records.json")?;
4. JSON from URL (NEW!)
// Load JSON directly from APIs or URLs (URL illustrative)
let df = DataFrame::from_json_url("https://api.example.com/data.json")?;
5. Builder Pattern (NEW!)
// Most convenient for manual data construction (names and values illustrative)
let df = DataFrame::builder()
    .add_column("wage", vec![12.0, 15.5, 9.8])
    .add_column("education", vec![12.0, 16.0, 10.0])
    .add_column("experience", vec![5.0, 8.0, 2.0])
    .build()?;

let formula = parse("wage ~ education + experience")?;
let result = OLS::from_formula(&formula, &df)?;
Supported formats:
- ✅ CSV (local files)
- ✅ CSV (URLs) - requires internet connection
- ✅ JSON (local files) - both column and record oriented
- ✅ JSON (URLs) - perfect for API integration
- ✅ Builder pattern - convenient manual construction
- ✅ HashMap - traditional programmatic construction
📖 See examples/dataframe_loading.rs for comprehensive demonstration of all loading methods.
Using Formula API (R/Python Style)
use greeners::prelude::*; // import path illustrative
use ndarray::Array1;
use std::collections::HashMap;
Traditional Matrix API
use greeners::prelude::*; // import path illustrative
use ndarray::{Array1, Array2};
📚 Formula API Examples
Difference-in-Differences
use greeners::prelude::*; // import path illustrative

// Python: smf.ols('outcome ~ treated + post + treated:post', data=df).fit(cov_type='HC1')
let formula = parse("outcome ~ treated + post + treated:post")?;
let result = OLS::from_formula(&formula, &df)?; // HC1 covariance option illustrative
Instrumental Variables (2SLS)
use greeners::prelude::*; // import path illustrative

// Endogenous equation: y ~ x1 + x_endog
// Instruments: z1, z2
let endog_formula = parse("y ~ x1 + x_endog")?;
let instrument_formula = parse("x_endog ~ z1 + z2")?; // first stage (form illustrative)
let result = IV::from_formula(&endog_formula, &instrument_formula, &df)?;
Logit/Probit
use greeners::prelude::*; // import path illustrative

// Binary choice models (formula illustrative)
let formula = parse("employed ~ education + experience")?;
let logit_result = Logit::from_formula(&formula, &df)?;
let probit_result = Probit::from_formula(&formula, &df)?;
Panel Data (Fixed Effects)
use greeners::prelude::*; // import path illustrative

let formula = parse("y ~ x1 + x2")?;
let result = FixedEffects::from_formula(&formula, &df, &entity_ids)?; // estimator entry point illustrative
Quantile Regression
use greeners::prelude::*; // import path illustrative

// Median regression (tau = 0.5)
let formula = parse("y ~ x1 + x2")?;
let result = QuantileRegression::from_formula(&formula, &df, 0.5)?; // signature illustrative
🔧 Formula Syntax
- Basic: `y ~ x1 + x2 + x3` (with intercept)
- No intercept: `y ~ x1 + x2 - 1` or `y ~ 0 + x1 + x2`
- Intercept only: `y ~ 1`
All formulas follow R/Python syntax for familiarity and ease of use.
📖 Documentation
- FORMULA_API.md - Complete formula API guide with Python/R equivalents
- examples/ - Working examples for all estimators
  - `dataframe_loading.rs` - Load data from CSV, JSON, URLs, or Builder pattern (NEW!)
  - `csv_formula_example.rs` - Load CSV files and run regressions
  - `formula_example.rs` - General formula API demonstration
  - `did_formula_example.rs` - Difference-in-Differences with formulas
  - `quickstart_formula.rs` - Quick start example
  - `marginal_effects.rs` - Logit/Probit marginal effects (AME/MEM)
  - `specification_tests.rs` - White, RESET, Breusch-Godfrey, Goldfeld-Quandt tests
  - `panel_model_selection.rs` - Panel diagnostics and model comparison
Run examples:

cargo run --example dataframe_loading
🎯 Why Greeners?
- Familiar Syntax: R/Python-style formulas make transition seamless
- Type Safety: Rust's type system catches errors at compile time
- Performance: Native speed with BLAS/LAPACK backends
- Comprehensive: Full suite of econometric estimators
- Production Ready: Memory safe, no garbage collection pauses