Greeners: High-Performance Econometrics in Rust
Greeners is a lightning-fast, type-safe econometrics library written in pure Rust. It provides a comprehensive suite of estimators for Cross-Sectional, Time-Series, and Panel Data analysis, leveraging linear algebra backends (LAPACK/BLAS) for maximum performance.
Designed for academic research, heavy simulations, and production-grade economic modeling.
v1.3.0 MAJOR FEATURE RELEASE: Complete Data Handling & Time Series
Greeners v1.3.0 brings pandas-like DataFrame capabilities and essential time series operations for econometric analysis - all while maintaining 100% backward compatibility with v1.0.2!
Three Major Feature Sets (NEW in v1.3.0)
1. String Column Support
Store free-form text data alongside numerical columns:
```rust
use greeners::DataFrame; // illustrative import path

// Argument values are illustrative; see examples/string_features.rs for the exact API.
let customers = DataFrame::builder()
    .add_int("id", vec![1, 2, 3])
    .add_string("name", vec!["Alice Johnson", "Bob Smith", "Carol Lee"])
    .add_string("email", vec!["alice@example.com", "bob@example.com", "carol@example.com"])
    .add_column("balance", vec![1250.0, 320.5, 87.25])
    .build()?;

// Access string data
let names = customers.get_string("name")?;
println!("{}", names[0]); // "Alice Johnson"
```
String vs Categorical:
- String columns: Free text, unique values (names, emails, addresses, comments)
- Categorical columns: Repeated categories, encoded as integers (regions, groups)
See examples/string_features.rs for a comprehensive demonstration.
2. Missing Data & Null Support
Complete toolkit for handling missing values - just like pandas!
```rust
use greeners::DataFrame; // illustrative import path

// Arguments are illustrative; see examples/missing_data_features.rs for the exact API.
// Detect missing values
let mask = df.isna()?;                              // Boolean mask
let n_missing = df.count_na();                      // Count

// Remove missing data
let clean = df.dropna()?;                           // Drop any row with NaN
let clean_subset = df.dropna_subset(&["income"])?;  // Drop if specific cols missing

// Fill missing values
let filled = df.fillna(0.0)?;                       // Fill with constant
let forward = df.fillna_ffill()?;                   // Forward fill (carry last valid)
let backward = df.fillna_bfill()?;                  // Backward fill (carry next valid)
let smooth = df.interpolate()?;                     // Linear interpolation
```
Comprehensive workflow:
- Detect: `isna()`, `notna()`, `count_na()` for investigation
- Handle: `dropna()` for complete-case analysis
- Impute: `fillna()`, `ffill()`, `bfill()`, `interpolate()` for treatment
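The detect/handle/impute steps are conceptually simple; here is a minimal, library-free sketch of forward fill and null counting, using `Option<f64>` to stand in for missing cells (the representation is an assumption, purely for illustration):

```rust
// Forward fill: carry the last valid observation forward.
fn ffill(x: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut last: Option<f64> = None;
    x.iter()
        .map(|v| {
            if v.is_some() {
                last = *v;
            }
            last // leading gaps stay None until the first valid value appears
        })
        .collect()
}

// Count missing cells (what a per-column count_na reports).
fn count_na(x: &[Option<f64>]) -> usize {
    x.iter().filter(|v| v.is_none()).count()
}

fn main() {
    let x = [Some(1.0), None, None, Some(4.0), None];
    println!("{:?}", ffill(&x)); // [Some(1.0), Some(1.0), Some(1.0), Some(4.0), Some(4.0)]
    println!("{}", count_na(&x)); // 3
}
```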
See examples/missing_data_features.rs for the complete workflow.
3. Time Series Operations
Essential operations for econometric time series analysis:
```rust
use greeners::DataFrame; // illustrative import path

// Column names and arguments are illustrative.
// Stock price data
let df = DataFrame::builder()
    .add_column("date", vec![1.0, 2.0, 3.0, 4.0])
    .add_column("price", vec![100.0, 102.0, 101.0, 105.0])
    .build()?;

// Lag operator - create lagged variables
let with_lag = df.lag("price", 1)?; // Previous day's price → price_lag_1
// Essential for AR models: y_t = β₀ + β₁·y_{t-1} + ε_t

// Lead operator - forward-looking variables
let with_lead = df.lead("price", 1)?; // Next day's price → price_lead_1
// Essential for lead-lag analysis and Granger causality

// First differences - achieve stationarity
let stationary = df.diff("price", 1)?; // Δprice_t = price_t - price_{t-1} → price_diff_1
// Essential for unit root tests and I(1) processes

// Percentage changes - returns calculation
let returns = df.pct_change("price", 1)?; // (price_t - price_{t-1}) / price_{t-1} → price_pct_1
// Standard in finance for asset returns

// Chain operations for complete analysis
let analysis = df
    .lag("price", 1)?
    .diff("price", 1)?
    .pct_change("price", 1)?;
// Creates: price_lag_1, price_diff_1, price_pct_1
```
Use cases:
- Finance: Returns (`pct_change`), momentum strategies
- Econometrics: AR models (`lag`), stationarity testing (`diff`), GDP growth
- Machine Learning: Time series feature engineering (multiple lags)
Mathematical relationships:
- `lag(x, n)[t] = x[t-n]`
- `lead(x, n)[t] = x[t+n]`
- `diff(x, n)[t] = x[t] - x[t-n]`
- `pct_change(x, n)[t] = (x[t] - x[t-n]) / x[t-n]`
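For intuition, the four operators can be hand-rolled on a plain `Vec<f64>`; this sketch is independent of the DataFrame API, with `None` marking positions where the operator is undefined:

```rust
// Hand-rolled counterparts of the DataFrame operators, on a plain slice.
fn lag(x: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..x.len()).map(|t| if t >= n { Some(x[t - n]) } else { None }).collect()
}

fn lead(x: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..x.len()).map(|t| if t + n < x.len() { Some(x[t + n]) } else { None }).collect()
}

fn diff(x: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..x.len()).map(|t| if t >= n { Some(x[t] - x[t - n]) } else { None }).collect()
}

fn pct_change(x: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..x.len()).map(|t| if t >= n { Some((x[t] - x[t - n]) / x[t - n]) } else { None }).collect()
}

fn main() {
    let price = [100.0, 102.0, 101.0, 105.0];
    println!("{:?}", lag(&price, 1));        // [None, Some(100.0), Some(102.0), Some(101.0)]
    println!("{:?}", lead(&price, 1));       // [Some(102.0), Some(101.0), Some(105.0), None]
    println!("{:?}", diff(&price, 1));       // [None, Some(2.0), Some(-1.0), Some(4.0)]
    println!("{:?}", pct_change(&price, 1)); // [None, Some(0.02), ...]
}
```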
See examples/time_series_features.rs for 11 practical examples.
Why v1.3.0 Matters
Before v1.3.0:
- Greeners = Powerful econometric estimators + basic DataFrame
- Missing data? Manual handling required
- Time series? Use `shift()` and manual calculations
- Text data? Not supported
Now v1.3.0:
- Greeners = Complete data analysis platform with pandas-like capabilities
- String columns ✅ Missing data toolkit ✅ Time series ops ✅
- Full workflow: Load → Clean → Transform → Model → Predict
- Only Rust library with comprehensive econometrics + DataFrame
- 100% backward compatible - all v1.0.2 code works unchanged!
Migration from v1.0.2
100% backward compatible - zero breaking changes!
All v1.0.2 code works unchanged. New capabilities are purely additive:
```rust
// Paths, formulas, and arguments are illustrative.
// Your existing v1.0.2 code
let df = DataFrame::from_csv("data.csv")?;
let formula = Formula::parse("y ~ x1 + x2")?;
let result = OLS::from_formula(&formula, &df)?;
// ✅ Still works perfectly!

// New v1.3.0 capabilities (additive)
let df_with_strings = df.add_string("name", names)?; // NEW
let clean_df = df.dropna()?;                         // NEW
let with_lags = df.lag("y", 1)?;                     // NEW
```
v1.0.2 STABLE RELEASE: Named Variables & Enhanced Data Loading
Greeners v1.0.2 brings human-readable variable names in regression output and flexible data loading from multiple sources!
Multiple Data Loading Options (NEW in v1.0.2)
Load data from CSV, JSON, URLs, or use the Builder pattern - just like pandas/polars!
```rust
// Paths and URLs are illustrative.
// 1. CSV from URL (reproducible research!)
let df = DataFrame::from_csv_url("https://raw.githubusercontent.com/user/repo/main/data.csv")?;

// 2. JSON from local file (column or record oriented)
let df = DataFrame::from_json("data.json")?;

// 3. JSON from URL (API integration)
let df = DataFrame::from_json_url("https://example.com/api/data.json")?;

// 4. Builder pattern (most convenient!)
let df = DataFrame::builder()
    .add_column("x", vec![1.0, 2.0, 3.0])
    .add_column("y", vec![2.0, 4.0, 6.0])
    .build()?;

// 5. CSV from local file (classic)
let df = DataFrame::from_csv("data.csv")?;
```
Why this matters:
- ✅ Reproducible research - Load datasets directly from GitHub/URLs
- ✅ API integration - Fetch data from web services
- ✅ Flexible formats - CSV, JSON (column/record oriented)
- ✅ Pandas-like - Familiar syntax for data scientists
- ✅ Type-safe - Loaded columns are accessed through Rust's type system
See examples/dataframe_loading.rs for all loading methods.
Named Variables in Results (NEW in v1.0.2)
No more generic x0, x1, x2 in regression output! All models now display actual variable names from your Formula:
```rust
use greeners::{DataFrame, Formula, OLS}; // illustrative imports

let formula = Formula::parse("wage ~ education + experience + female")?;
let result = OLS::from_formula(&formula, &df)?;
println!("{}", result); // exact printing API may differ; output shown below
```
Before (v1.0.1):

```
OLS Regression Results
====================================
Variable     Coef    Std Err    t       P>|t|
const        5.23    0.45       11.62   0.000
x0           2.15    0.12       17.92   0.000   <- Generic names
x1           0.08    0.02       4.00    0.000
x2          -1.20    0.25      -4.80    0.000
```

Now (v1.0.2):

```
OLS Regression Results
====================================
Variable     Coef    Std Err    t       P>|t|
const        5.23    0.45       11.62   0.000
education    2.15    0.12       17.92   0.000   <- Actual variable names!
experience   0.08    0.02       4.00    0.000
female      -1.20    0.25      -4.80    0.000
```
Applies to ALL models:
- ✅ OLS, WLS, Cochrane-Orcutt (FGLS)
- ✅ IV/2SLS (Instrumental Variables)
- ✅ Logit/Probit (Binary Choice)
- ✅ Quantile Regression (all quantiles)
- ✅ Panel Data (Fixed Effects, Random Effects, Between)
- ✅ GMM (Generalized Method of Moments)
- ✅ Difference-in-Differences
Comprehensive Test Coverage
v1.3.0 includes 102 unit tests covering all major functionality:
- 17 new tests added in v1.3.0 for time series operations
- Full coverage of OLS, IV/2SLS, Panel Data, DiD, FGLS, Quantile Regression
- String columns, Missing data, Time series operations
- Diagnostic tests (VIF, Breusch-Pagan, Jarque-Bera, Durbin-Watson)
- GMM specification tests (J-statistic, overidentification)
- Model selection and information criteria
Run tests locally:

```bash
cargo test
```
Code Quality Improvements
- Applied clippy lints for idiomatic Rust (25+ improvements)
- Replaced `.iter().cloned().collect()` with `.to_vec()` for better performance
- Modern range checks using `.contains()` instead of manual comparisons
- Cleaner, more maintainable codebase
v1.0.1: Specification Tests
Greeners reaches production stability with comprehensive specification tests for diagnosing regression assumptions!
Specification Tests (NEW in v1.0.1)
Diagnose violations of classical regression assumptions and identify appropriate remedies:
```rust
// Imports and exact signatures are illustrative; see examples/specification_tests.rs.
// Estimate model
let model = OLS::from_formula(&formula, &df)?;
let (x, y) = df.to_design_matrix(&formula)?;
let residuals = model.residuals;
let fitted = model.fitted_values;

// 1. White Test for Heteroskedasticity
let (stat, p_value) = white_test(&residuals, &x)?;
if p_value < 0.05 {
    // heteroskedasticity detected -> switch to robust (HC3/HC4) standard errors
}

// 2. RESET Test for Functional Form Misspecification
let (stat, p_value) = reset_test(&y, &x, &fitted)?;
if p_value < 0.05 {
    // misspecification detected -> add polynomial or interaction terms
}

// 3. Breusch-Godfrey Test for Autocorrelation
let (stat, p_value) = breusch_godfrey_test(&residuals, &x, 2)?;
if p_value < 0.05 {
    // autocorrelation detected -> use Newey-West standard errors
}

// 4. Goldfeld-Quandt Test for Heteroskedasticity
let (stat, p_value) = goldfeld_quandt_test(&y, &x)?;
```
When to Use:
- White Test → General heteroskedasticity test (any form)
- RESET Test → Detect omitted variables or wrong functional form
- Breusch-Godfrey → Detect autocorrelation in time series/panel data
- Goldfeld-Quandt → Test heteroskedasticity when you suspect a specific ordering
Remedies:
- Heteroskedasticity → `CovarianceType::HC3` or `HC4`
- Autocorrelation → `CovarianceType::NeweyWest(lags)`
- Misspecification → Add `I(x^2)`, `x1*x2` interactions
Stata/R/Python Equivalents:
- Stata: `estat hettest`, `estat ovtest`, `estat bgodfrey`
- R: `lmtest::bptest()`, `lmtest::resettest()`, `lmtest::bgtest()`
- Python: `statsmodels.stats.diagnostic.het_white()`
See examples/specification_tests.rs for a comprehensive demonstration.
✨ NEW: R/Python-Style Formula API
Greeners now supports R/Python-style formula syntax (like statsmodels and lm()), making model specification intuitive and concise:
```rust
use greeners::{Formula, OLS, CovarianceType}; // illustrative imports

// Python equivalent: smf.ols('y ~ x1 + x2', data=df).fit(cov_type='HC1')
let formula = Formula::parse("y ~ x1 + x2")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC1)?;
```
All estimators support formulas: OLS, WLS, DiD, IV/2SLS, Logit/Probit, Quantile Regression, Panel Data (FE/RE/Between), and more!
See FORMULA_API.md for complete documentation and examples.
Panel Diagnostics & Model Selection
Greeners now provides comprehensive tools for panel data model selection and information criteria-based model comparison - essential for rigorous empirical research!
Model Selection & Comparison
Compare multiple models using AIC/BIC with automatic ranking and Akaike weights for model averaging:
```rust
// Names and signatures are illustrative; see examples/panel_model_selection.rs.
// Estimate competing models
let model1 = OLS::from_formula(&Formula::parse("y ~ x1 + x2 + x3")?, &df)?;
let model2 = OLS::from_formula(&Formula::parse("y ~ x1 + x2")?, &df)?;
let model3 = OLS::from_formula(&Formula::parse("y ~ x1")?, &df)?;

// Compare models
let models = vec![("Full Model", &model1), ("Restricted", &model2), ("Simple", &model3)];
let comparison = compare_models(&models);
print_comparison(&comparison);

// Calculate Akaike weights for model averaging
let aic_values: Vec<f64> = comparison.iter().map(|m| m.aic).collect();
let weights = akaike_weights(&aic_values);
```
Output:

```
=============================== Model Comparison ===============================
Model          |   AIC    |   BIC    | Rank(AIC) | Rank(BIC)
--------------------------------------------------------------------------------
Full Model     |  183.83  |  191.48  |     1     |     1
Restricted     |  184.77  |  190.50  |     2     |     2
Simple         |  188.19  |  192.01  |     3     |     3
```

AKAIKE WEIGHTS:
- Δ_AIC < 2: Substantial support
- Δ_AIC 4-7: Considerably less support
- Δ_AIC > 10: Essentially no support
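The weights themselves follow the standard formula: Δᵢ = AICᵢ − AIC_min and wᵢ = exp(−Δᵢ/2) / Σⱼ exp(−Δⱼ/2). A standalone sketch, independent of the library's `akaike_weights` signature:

```rust
// Akaike weights from a list of AIC values.
// Lower AIC -> higher weight; weights sum to 1 across the model set.
fn akaike_weights(aic: &[f64]) -> Vec<f64> {
    let min = aic.iter().cloned().fold(f64::INFINITY, f64::min);
    let rel_lik: Vec<f64> = aic.iter().map(|a| (-(a - min) / 2.0).exp()).collect();
    let total: f64 = rel_lik.iter().sum();
    rel_lik.iter().map(|r| r / total).collect()
}

fn main() {
    // AIC values from the comparison table above
    let w = akaike_weights(&[183.83, 184.77, 188.19]);
    println!("{:?}", w); // the lowest-AIC model gets the largest weight
    assert!((w.iter().sum::<f64>() - 1.0).abs() < 1e-12);
}
```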
Panel Diagnostics Tests
Test whether pooled OLS is appropriate or if panel data methods (Fixed/Random Effects) are needed:
Breusch-Pagan LM Test for Random Effects
```rust
// Signatures are illustrative; see examples/panel_model_selection.rs.
// Estimate pooled OLS
let model_pooled = OLS::from_formula(&formula, &df)?;
let (x, y) = df.to_design_matrix(&formula)?;
let residuals = model_pooled.residuals;

// Test for random effects
let (lm_stat, p_value) = breusch_pagan_lm(&residuals, &entity_ids, n_periods)?;

// Interpretation:
// H₀: σ²_u = 0 (no panel effects, pooled OLS adequate)
// H₁: σ²_u > 0 (random effects needed)
// If p < 0.05 → Use Random Effects or Fixed Effects
```
F-Test for Fixed Effects
```rust
// Test if firm fixed effects are significant (arguments illustrative)
let (f_stat, p_value) = f_test_fixed_effects(&y, &x, &firm_ids)?;

// Interpretation:
// H₀: All firm effects are zero (pooled OLS adequate)
// H₁: Firm effects exist (use fixed effects)
// If p < 0.05 → Use Fixed Effects model
```
Summary Statistics
Quick descriptive statistics for initial data exploration:
```rust
use greeners::SummaryStats; // illustrative import path

// Column argument is illustrative
let stats = describe(&df.get("x")?);
// Returns: (mean, std, min, Q25, median, Q75, max, n_obs)

// Pretty-print summary table
let summary_data = vec![("x", stats)];
print_summary(&summary_data);
```
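The statistics `describe` reports are standard; a self-contained sketch for a single column (quantile conventions differ between libraries, so the median choice here is just one common option):

```rust
// Mean, sample standard deviation, min, median, max for one column.
fn describe(x: &[f64]) -> (f64, f64, f64, f64, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    // Sample variance (n-1 denominator)
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let mut sorted = x.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let m = sorted.len();
    // Median: middle element, or average of the two middles for even length
    let median = if m % 2 == 1 {
        sorted[m / 2]
    } else {
        (sorted[m / 2 - 1] + sorted[m / 2]) / 2.0
    };
    (mean, var.sqrt(), sorted[0], median, sorted[m - 1])
}

fn main() {
    let (mean, std, min, median, max) = describe(&[1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("mean={mean} std={std:.4} min={min} median={median} max={max}");
    // mean=3 std=1.5811 min=1 median=3 max=5
}
```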
Stata/R/Python Equivalents:
- Stata: `estat ic` (AIC/BIC), `xttest0` (BP LM), `testparm` (F-test)
- R: `AIC()`, `BIC()`, `plm::plmtest()`, `plm::pFtest()`
- Python: `statsmodels` information criteria, `linearmodels.panel` diagnostics
See examples/panel_model_selection.rs for a comprehensive demonstration with a panel data workflow.
Marginal Effects for Binary Choice Models
After estimating Logit/Probit models, coefficients alone are hard to interpret (they're on log-odds/z-score scale). Marginal effects translate these to probability changes - essential for policy analysis and substantive interpretation!
Average Marginal Effects (AME) - RECOMMENDED
```rust
use greeners::{Formula, Logit}; // illustrative imports

// Variable names are illustrative (college admission data).
// Estimate Logit model
let formula = Formula::parse("admitted ~ gpa + test_score")?;
let result = Logit::from_formula(&formula, &df)?;

// Get design matrix
let (x, y) = df.to_design_matrix(&formula)?;

// Calculate Average Marginal Effects (AME)
let ame = result.average_marginal_effects(&x)?;

// Interpretation: AME[gpa] = 0.15 means:
// "A 1-point increase in GPA increases admission probability by 15 percentage points"
// (averaged across all students in the sample)
```
Why AME?
- ✅ Accounts for heterogeneity across observations
- ✅ More robust to non-linearities
- ✅ Standard in modern econometrics (Stata, R, Python)
- ✅ Easy to interpret: probability changes, not log-odds
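For the logit case the AME has a closed form: with pᵢ = Λ(xᵢ'β), the AME of regressor j is (1/n) Σᵢ βⱼ·pᵢ·(1−pᵢ). A standalone sketch (coefficients and design-matrix rows are made up):

```rust
// Logistic CDF
fn logistic(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

// Average marginal effect of regressor j in a logit model:
// AME_j = (1/n) * sum_i beta_j * p_i * (1 - p_i)
fn ame_logit(beta: &[f64], rows: &[Vec<f64>], j: usize) -> f64 {
    let n = rows.len() as f64;
    rows.iter()
        .map(|x| {
            let xb: f64 = x.iter().zip(beta).map(|(xi, bi)| xi * bi).sum();
            let p = logistic(xb);
            beta[j] * p * (1.0 - p)
        })
        .sum::<f64>()
        / n
}

fn main() {
    // Two observations: intercept column plus one regressor (values made up)
    let beta = [0.0, 1.0];
    let rows = vec![vec![1.0, 0.0], vec![1.0, 0.0]];
    // x'beta = 0 for both rows, so p = 0.5 and AME = 1.0 * 0.5 * 0.5 = 0.25
    println!("{}", ame_logit(&beta, &rows, 1)); // 0.25
}
```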
Marginal Effects at Means (MEM)
```rust
// Calculate Marginal Effects at Means (MEM)
let mem = result.marginal_effects_at_means(&x)?;

// Interpretation: Effect for the "average" student
// ⚠️ Less robust than AME - can evaluate at impossible values (e.g., average of dummies)
```
Predictions
```rust
// Predict admission probabilities for new students (x_new is illustrative)
let probs = result.predict_proba(&x_new);
// Example: probs[0] = 0.85 → 85% chance of admission
```
Logit vs Probit Comparison
```rust
// Both models give similar marginal effects
let logit_result = Logit::from_formula(&formula, &df)?;
let probit_result = Probit::from_formula(&formula, &df)?;

let ame_logit = logit_result.average_marginal_effects(&x)?;
let ame_probit = probit_result.average_marginal_effects(&x)?;
// Typically: ame_logit ≈ ame_probit (differences < 1-2 percentage points)
```
Stata/R/Python Equivalents:
- Stata: `margins, dydx(*)` (AME) or `margins, dydx(*) atmeans` (MEM)
- R: `mfx::logitmfx()` or `margins::margins()`
- Python: `statsmodels.discrete.discrete_model.Logit(...).get_margeff()`
See examples/marginal_effects.rs for a comprehensive demonstration with college admission data.
Two-Way Clustered Standard Errors
For panel data with clustering along two dimensions (e.g., firms × time):
```rust
// Panel data: 4 firms × 6 time periods (IDs and signature illustrative)
let firm_ids = vec![0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3];
let time_ids = vec![0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5];

// Two-way clustering (Cameron-Gelbach-Miller, 2011)
let result = OLS::from_formula(&formula, &df, CovarianceType::TwoWayClustered(firm_ids, time_ids))?;

// Formula: V = V_firm + V_time - V_intersection
// Accounts for BOTH within-firm AND within-time correlation
```
When to use:
- ✅ Panel data (firms/countries over time)
- ✅ Correlation within entities AND within time periods
- ✅ More robust than one-way clustering
- ✅ Standard in modern panel data econometrics
Stata equivalent: `reghdfe y x, vce(cluster firm_id time_id)`
See examples/two_way_clustering.rs for a complete comparison of non-robust vs one-way vs two-way clustering.
Categorical Variables & Polynomial Terms
Categorical Variable Encoding
Automatic dummy variable creation with R/Python syntax:
```rust
// Categorical variable: creates dummies, drops first level
let formula = Formula::parse("y ~ x1 + C(region)")?;
let result = OLS::from_formula(&formula, &df)?;
// If region has values [0, 1, 2, 3] → creates 3 dummies (drops 0 as reference)
```
How it works:
- `C(var)` detects unique values in the variable
- Creates K-1 dummy variables (drops first category as reference)
- Essential for categorical data (regions, industries, treatment groups)
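What `C(var)` does can be sketched in a few lines of plain Rust, independent of the library (integer-coded levels, first level dropped as the reference category):

```rust
// Build K-1 dummy columns from an integer-coded categorical variable,
// dropping the first (sorted) level as the reference category.
fn dummies_drop_first(x: &[i64]) -> Vec<Vec<f64>> {
    let mut levels: Vec<i64> = x.to_vec();
    levels.sort_unstable();
    levels.dedup();
    levels[1..]
        .iter()
        .map(|level| x.iter().map(|v| if v == level { 1.0 } else { 0.0 }).collect())
        .collect()
}

fn main() {
    // region takes values [0, 1, 2, 3] -> 3 dummy columns, 0 is the reference
    let region = [0, 1, 2, 3, 1, 0];
    let d = dummies_drop_first(&region);
    assert_eq!(d.len(), 3);
    println!("{:?}", d[0]); // indicator for region == 1: [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
}
```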
Polynomial Terms
Non-linear relationships made easy:
```rust
// Formula spellings are illustrative; see FORMULA_API.md for the exact power syntax.
// Quadratic model: captures diminishing returns
let formula = Formula::parse("y ~ x + I(x^2)")?;

// Cubic model: more flexible
let formula = Formula::parse("y ~ x + I(x^2) + I(x^3)")?;

// Alternative syntax (Python-style)
let formula = Formula::parse("y ~ x + I(x**2)")?;
```
Use cases:
- Production functions (diminishing returns)
- Wage curves (experience effects)
- Growth models (non-linear dynamics)
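Conceptually, `I(x^2)` just appends an extra column of squared values to the design matrix. A minimal, library-free sketch of that expansion:

```rust
// Expand one regressor into polynomial columns [x, x^2, ..., x^degree],
// which is what I(x^2) / I(x^3) terms contribute to the design matrix.
fn poly_columns(x: &[f64], degree: i32) -> Vec<Vec<f64>> {
    (1..=degree)
        .map(|p| x.iter().map(|v| v.powi(p)).collect())
        .collect()
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let cols = poly_columns(&x, 3);
    println!("{:?}", cols[0]); // linear column:  [1.0, 2.0, 3.0]
    println!("{:?}", cols[1]); // squared column: [1.0, 4.0, 9.0]
    println!("{:?}", cols[2]); // cubic column:   [1.0, 8.0, 27.0]
}
```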
Combine with interactions:

```rust
// Region-specific quadratic effects (formula illustrative)
let formula = Formula::parse("y ~ x*C(region) + I(x^2)*C(region)")?;
```
Clustered Standard Errors & Advanced Diagnostics
Clustered Standard Errors
Critical for panel data and hierarchical structures where observations are grouped:
```rust
// Panel data: firms over time (cluster IDs and signature illustrative)
let cluster_ids = vec![0, 0, 0, 1, 1, 1, 2, 2, 2]; // Firm IDs
let result = OLS::from_formula(&formula, &df, CovarianceType::Clustered(cluster_ids))?;
```
Use clustered SE when:
- Panel data (repeated observations per entity)
- Hierarchical data (students in schools, patients in hospitals)
- Experimental data with treatment clusters
- Geographic clustering (observations in regions/countries)
Advanced Diagnostics
New diagnostic tools for model validation:
```rust
use greeners::Diagnostics; // illustrative import path

// Arguments are illustrative.
// Multicollinearity detection
let vif = vif(&x)?;                            // Variance Inflation Factor
let cond_num = condition_number(&x)?;          // Condition Number

// Influential observations
let leverage = leverage(&x)?;                  // Hat values
let cooks_d = cooks_distance(&x, &residuals)?; // Cook's Distance

// Assumption testing (already available)
let (jb_stat, jb_p) = jarque_bera(&residuals)?;       // Normality
let (bp_stat, bp_p) = breusch_pagan(&residuals, &x)?; // Heteroskedasticity
let dw_stat = durbin_watson(&residuals);              // Autocorrelation
```
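As a reference point for the diagnostics above, the Durbin-Watson statistic is easy to compute by hand: DW = Σₜ(eₜ − eₜ₋₁)² / Σₜ eₜ², with values near 2 indicating no first-order autocorrelation:

```rust
// Durbin-Watson statistic for a residual series.
// ~2 => no first-order autocorrelation, <2 => positive, >2 => negative.
fn durbin_watson(e: &[f64]) -> f64 {
    let num: f64 = e.windows(2).map(|w| (w[1] - w[0]).powi(2)).sum();
    let den: f64 = e.iter().map(|v| v * v).sum();
    num / den
}

fn main() {
    // Perfectly alternating residuals => strong negative autocorrelation
    let alternating = [1.0, -1.0, 1.0, -1.0];
    println!("{}", durbin_watson(&alternating)); // 3
    // Identical consecutive residuals => strong positive autocorrelation
    let persistent = [1.0, 1.0, 1.0, 1.0];
    println!("{}", durbin_watson(&persistent)); // 0
}
```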
Interactions, HC2/HC3, and Predictions
Interaction Terms
Model interaction effects with R/Python syntax:
```rust
// Full interaction: x1 * x2 expands to x1 + x2 + x1:x2
let formula = Formula::parse("y ~ x1 * x2")?;
let result = OLS::from_formula(&formula, &df)?;

// Interaction only: just the product term
let formula2 = Formula::parse("y ~ x1:x2")?;
```
Use cases:
- Differential effects by groups (e.g., education returns by gender)
- Treatment effect heterogeneity
- Testing moderation/mediation hypotheses
Enhanced Robust Standard Errors
```rust
// Covariance-type argument is illustrative.
// HC2: Leverage-adjusted (more efficient with small samples)
let result_hc2 = OLS::from_formula(&formula, &df, CovarianceType::HC2)?;

// HC3: Jackknife (most robust - RECOMMENDED for small samples)
let result_hc3 = OLS::from_formula(&formula, &df, CovarianceType::HC3)?;
```
Comparison:
- HC1: White (1980), uses n/(n-k) correction
- HC2: Adjusts for leverage: σ²/(1-h_i)
- HC3: Jackknife: σ²/(1-h_i)² - Most conservative & robust
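The three estimators differ only in how each squared residual is rescaled before entering the sandwich formula; a scalar sketch of the per-observation weights (e = residual, h = leverage):

```rust
// Per-observation weights used by the HC1/HC2/HC3 sandwich estimators:
//   HC1: e^2 * n/(n-k)    (global degrees-of-freedom correction)
//   HC2: e^2 / (1 - h)    (leverage adjustment)
//   HC3: e^2 / (1 - h)^2  (jackknife-style, most conservative)
fn hc_weights(e: f64, h: f64, n: f64, k: f64) -> (f64, f64, f64) {
    let e2 = e * e;
    (e2 * n / (n - k), e2 / (1.0 - h), e2 / ((1.0 - h) * (1.0 - h)))
}

fn main() {
    // A high-leverage observation: HC2 and especially HC3 inflate its contribution
    let (hc1, hc2, hc3) = hc_weights(1.0, 0.5, 10.0, 2.0);
    println!("HC1={hc1} HC2={hc2} HC3={hc3}"); // HC1=1.25 HC2=2 HC3=4
}
```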
Post-Estimation Predictions
```rust
use ndarray::Array2;

// Out-of-sample predictions (shape and values illustrative)
let x_new = Array2::from_shape_vec((2, 3), vec![1.0, 2.0, 3.0, 1.0, 4.0, 5.0])?;
let predictions = result.predict(&x_new);

// In-sample fitted values
let fitted = result.fitted_values;

// Residuals
let resid = result.residuals;
```
Features
Cross-Sectional & General
- OLS & GLS: Robust standard errors (White, Newey-West).
- IV / 2SLS: Instrumental Variables for endogeneity correction.
- Quantile Regression: Robust estimation via Iteratively Reweighted Least Squares (IRLS).
- Discrete Choice: Logit and Probit models (Newton-Raphson MLE).
- Diagnostics: R-squared, F-Test, T-Test, Confidence Intervals.
Time Series (Macroeconometrics)
- Unit Root Tests: Augmented Dickey-Fuller (ADF).
- VAR (Vector Autoregression): Multivariate modeling with Information Criteria (AIC/BIC).
- VARMA: Hannan-Rissanen algorithm for ARMA structures.
- VECM (Cointegration): Johansen Procedure (Eigenvalue decomposition) for I(1) systems.
- Impulse Response Functions (IRF): Orthogonalized structural shocks.
Panel Data
- Fixed Effects (Within): Absorbs individual heterogeneity.
- Random Effects: Swamy-Arora GLS estimator.
- Between Estimator: Long-run cross-sectional relationships.
- Dynamic Panel: Arellano-Bond (Difference GMM) to solve Nickell Bias.
- Panel Threshold: Hansen (1999) non-linear regime switching models.
- Testing: Hausman Test for FE vs RE.
Systems of Equations
- SUR: Seemingly Unrelated Regressions (Zellner).
- 3SLS: Three-Stage Least Squares (System IV).
System Requirements (Pre-requisites)
Debian / Ubuntu / Pop!_OS:

```bash
# Typical OpenBLAS/LAPACK packages; adjust to your setup
sudo apt-get install libopenblas-dev liblapack-dev
```

Fedora / RHEL / CentOS:

```bash
sudo dnf install openblas-devel lapack-devel
```

Arch Linux / Manjaro:

```bash
sudo pacman -S openblas lapack
```

macOS:

```bash
brew install openblas
```
📦 Installation
Add this to your Cargo.toml:
```toml
# Crate names on the left are reconstructed; check crates.io for current versions.
[dependencies]
greeners = "1.3.0"
ndarray = "0.17"
# Note: You must have a BLAS/LAPACK provider installed on your system
ndarray-linalg = { version = "0.18", features = ["openblas-system"] }
```
🎯 Quick Start
Loading Data (Multiple Options!)
Greeners provides flexible data loading similar to pandas/polars - from local files, URLs, or manual construction:
1. CSV from Local File
```rust
use greeners::DataFrame; // illustrative import path

let df = DataFrame::from_csv("data.csv")?;
```
2. CSV from URL (NEW!)
```rust
// Load data directly from GitHub or any URL (URL illustrative)
let df = DataFrame::from_csv_url("https://raw.githubusercontent.com/user/repo/main/data.csv")?;
// Perfect for reproducible research and shared datasets!
```
3. JSON from Local File (NEW!)
```rust
// Column-oriented JSON (like pandas.to_json(orient='columns'))
// { "x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0] }
let df = DataFrame::from_json("data_columns.json")?;

// Or record-oriented JSON (like pandas.to_json(orient='records'))
// [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}]
let df = DataFrame::from_json("data_records.json")?;
```
4. JSON from URL (NEW!)
```rust
// Load JSON directly from APIs or URLs (URL illustrative)
let df = DataFrame::from_json_url("https://example.com/api/data.json")?;
```
5. Builder Pattern (NEW!)
```rust
// Most convenient for manual data construction (values illustrative)
let df = DataFrame::builder()
    .add_column("y", vec![1.0, 2.0, 3.0, 4.0])
    .add_column("x1", vec![2.0, 4.0, 6.0, 8.0])
    .add_column("x2", vec![1.0, 3.0, 5.0, 7.0])
    .build()?;

let formula = Formula::parse("y ~ x1 + x2")?;
let result = OLS::from_formula(&formula, &df)?;
```
Supported formats:
- ✅ CSV (local files)
- ✅ CSV (URLs) - requires internet connection
- ✅ JSON (local files) - both column and record oriented
- ✅ JSON (URLs) - perfect for API integration
- ✅ Builder pattern - convenient manual construction
- ✅ HashMap - traditional programmatic construction
See examples/dataframe_loading.rs for a comprehensive demonstration of all loading methods.
Using Formula API (R/Python Style)
```rust
// Illustrative imports; full example in examples/formula_example.rs
use greeners::{DataFrame, Formula, OLS};
use ndarray::Array1;
use std::collections::HashMap;
```
Traditional Matrix API
```rust
// Illustrative imports; see examples/ for a full matrix-API walkthrough
use greeners::OLS;
use ndarray::{Array1, Array2};
```
Formula API Examples
Difference-in-Differences
```rust
use greeners::{Formula, OLS, CovarianceType}; // illustrative imports

// Python: smf.ols('outcome ~ treated + post + treated:post', data=df).fit(cov_type='HC1')
let formula = Formula::parse("outcome ~ treated + post + treated:post")?;
let result = OLS::from_formula(&formula, &df, CovarianceType::HC1)?;
```
Instrumental Variables (2SLS)
```rust
use greeners::{Formula, IV}; // illustrative imports

// Endogenous equation: y ~ x1 + x_endog
// Instruments: z1, z2
let endog_formula = Formula::parse("y ~ x1 + x_endog")?;
let instrument_formula = Formula::parse("~ z1 + z2")?; // instrument-list spelling is illustrative
let result = IV::from_formula(&endog_formula, &instrument_formula, &df)?;
```
Logit/Probit
```rust
use greeners::{Formula, Logit, Probit}; // illustrative imports

// Binary choice models
let formula = Formula::parse("y ~ x1 + x2")?;
let logit_result = Logit::from_formula(&formula, &df)?;
let probit_result = Probit::from_formula(&formula, &df)?;
```
Panel Data (Fixed Effects)
```rust
use greeners::{Formula, FixedEffects}; // illustrative imports

let formula = Formula::parse("y ~ x1 + x2")?;
let result = FixedEffects::from_formula(&formula, &df, &entity_ids)?; // entity_ids illustrative
```
Quantile Regression
```rust
use greeners::{Formula, QuantileRegression}; // illustrative imports

// Median regression (tau = 0.5)
let formula = Formula::parse("y ~ x1 + x2")?;
let result = QuantileRegression::from_formula(&formula, &df, 0.5)?;
```
🔧 Formula Syntax
- Basic: `y ~ x1 + x2 + x3` (with intercept)
- No intercept: `y ~ x1 + x2 - 1` or `y ~ 0 + x1 + x2`
- Intercept only: `y ~ 1`
All formulas follow R/Python syntax for familiarity and ease of use.
Documentation
- FORMULA_API.md - Complete formula API guide with Python/R equivalents
- examples/ - Working examples for all estimators
  - `string_features.rs` - String column support (NEW v1.3.0!)
  - `missing_data_features.rs` - Missing data toolkit (NEW v1.3.0!)
  - `time_series_features.rs` - Time series operations: lag, lead, diff, pct_change (NEW v1.3.0!)
  - `dataframe_loading.rs` - Load data from CSV, JSON, URLs, or Builder pattern
  - `csv_formula_example.rs` - Load CSV files and run regressions
  - `formula_example.rs` - General formula API demonstration
  - `did_formula_example.rs` - Difference-in-Differences with formulas
  - `quickstart_formula.rs` - Quick start example
  - `marginal_effects.rs` - Logit/Probit marginal effects (AME/MEM)
  - `specification_tests.rs` - White, RESET, Breusch-Godfrey, Goldfeld-Quandt tests
  - `panel_model_selection.rs` - Panel diagnostics and model comparison
Run examples:
```bash
# NEW v1.3.0 examples
cargo run --example string_features
cargo run --example missing_data_features
cargo run --example time_series_features

# Other examples
cargo run --example dataframe_loading
cargo run --example formula_example
```
🎯 Why Greeners?
- Familiar Syntax: R/Python-style formulas make transition seamless
- Type Safety: Rust's type system catches errors at compile time
- Performance: Native speed with BLAS/LAPACK backends
- Comprehensive: Full suite of econometric estimators
- Production Ready: Memory safe, no garbage collection pauses