anofox-statistics
A comprehensive statistical hypothesis testing library for Rust, validated against R.
This library provides a wide range of statistical tests commonly used in data analysis, all validated against R's implementations to ensure numerical accuracy.
Features
- Math Primitives
  - Mean, variance, standard deviation, median
  - Trimmed mean (robust to outliers)
  - Skewness and kurtosis (Fisher's definition, matching R's e1071)
- Parametric Tests
  - T-tests (Welch, Student, Paired) with all alternatives
  - Yuen's test (robust t-test using trimmed means)
  - Brown-Forsythe test (homogeneity of variances)
- Nonparametric Tests
  - Ranking with average tie handling
  - Mann-Whitney U test (Wilcoxon rank-sum)
  - Wilcoxon signed-rank test (paired)
  - Kruskal-Wallis test (k-sample)
  - Brunner-Munzel test (robust rank-based test for stochastic equality)
- Distributional Tests
  - Shapiro-Wilk normality test (Royston AS R94)
  - D'Agostino's K-squared test (omnibus normality test using skewness and kurtosis)
- Resampling Methods
  - Permutation engine with custom statistics
  - Permutation t-test
  - Stationary bootstrap (for dependent data)
  - Circular block bootstrap
- Modern Distribution Tests
  - Energy distance test (univariate and multivariate)
  - Maximum Mean Discrepancy (MMD) with multiple kernels (Gaussian, Linear, Polynomial, Laplacian)
- Forecast Evaluation
  - Diebold-Mariano test for comparing predictive accuracy
  - Clark-West test for nested model comparison
  - Superior Predictive Ability (SPA) test for multiple model comparison
  - MSPE-Adjusted SPA test for multiple nested models (Clark-West + bootstrap)
  - Model Confidence Set (Hansen, Lunde, & Nason, 2011)
Installation
Add to your Cargo.toml:
[dependencies]
anofox-statistics = "0.2"
Quick Start
T-Tests
use ;
let group1 = vec!;
let group2 = vec!;
// Welch t-test (unequal variances)
let result = t_test
.expect;
println!;
println!;
println!;
// Student t-test (equal variances assumed)
let result = t_test?;
// Paired t-test
let result = t_test?;
Yuen's Robust T-Test
use yuen_test;
// 20% trimmed means (robust to outliers)
let result = yuen_test?;
println!;
println!;
Brown-Forsythe Test
use brown_forsythe;
let groups = vec!;
let result = brown_forsythe?;
println!;
println!;
Nonparametric Tests
use ;
// Ranking
let data = vec!;
let ranks = rank?;
// Mann-Whitney U test
let result = mann_whitney_u?;
// Wilcoxon signed-rank test (paired)
let result = wilcoxon_signed_rank?;
// Kruskal-Wallis test
let result = kruskal_wallis?;
// Brunner-Munzel test (robust alternative to Mann-Whitney)
let result = brunner_munzel?;
println!;
Normality Tests
use ;
let data = vec!;
// Shapiro-Wilk test
let result = shapiro_wilk?;
println!;
println!;
// D'Agostino's K-squared test (omnibus test using skewness and kurtosis)
let result = dagostino_k_squared?;
println!;
println!;
Resampling Methods
use ;
// Permutation t-test
let result = permutation_t_test?;
println!;
// Stationary bootstrap for time series
let bootstrap = new?;
let samples: = bootstrap.take.collect;
// Circular block bootstrap
let bootstrap = new?;
Modern Distribution Tests
use ;
// Energy distance test
let result = energy_distance_test?;
println!;
println!;
// Maximum Mean Discrepancy with Gaussian kernel
let result = mmd_test?;
println!;
println!;
// MMD with automatic bandwidth selection
let result = mmd_test?;
Forecast Evaluation
use ;
// Forecast errors from two competing models
let errors_model1 = vec!;
let errors_model2 = vec!;
// Diebold-Mariano test
let result = diebold_mariano?;
println!;
println!;
// Clark-West test for nested models (e.g., AR(1) vs AR(2))
let restricted_errors = vec!; // Benchmark/restricted model
let unrestricted_errors = vec!; // Alternative/unrestricted model
let result = clark_west?;
println!;
println!;
// Superior Predictive Ability test (compare benchmark vs multiple models)
let benchmark_losses = vec!;
let model_losses = vec!;
let result = spa_test?;
println!;
println!;
// MSPE-Adjusted SPA for multiple nested models
// Combines Clark-West adjustment with bootstrap for multiple testing
let benchmark_errors = vec!;
let nested_model_errors = vec!;
let result = mspe_adjusted_spa?;
println!;
println!;
// Model Confidence Set - identify the set of best models
let losses = vec!;
let result = model_confidence_set?;
println!;
println!;
API Reference
Parametric Tests
| Function | Description |
|---|---|
| `t_test(x, y, kind, alternative)` | T-test (Welch, Student, or Paired) |
| `yuen_test(x, y, trim)` | Yuen's trimmed mean t-test |
| `brown_forsythe(groups)` | Brown-Forsythe test for homogeneity of variances |
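
To make the Welch variant concrete, here is a minimal standard-library sketch of the Welch t statistic and its Welch-Satterthwaite degrees of freedom. It is not this crate's implementation. The names `mean`, `sample_variance`, and `welch_statistic` are local to the sketch and hypothetical. The p-value step is omitted because it needs a t-distribution CDF.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(data: &[f64]) -> f64 {
    data.iter().sum::<f64>() / data.len() as f64
}

/// Unbiased sample variance (divides by n - 1); needs at least two values.
fn sample_variance(data: &[f64]) -> f64 {
    let m = mean(data);
    data.iter().map(|v| (v - m).powi(2)).sum::<f64>() / (data.len() as f64 - 1.0)
}

/// Hypothetical helper: Welch t statistic and Welch-Satterthwaite degrees of freedom.
/// Returns None if either sample has fewer than two observations
/// or if both samples have zero variance.
fn welch_statistic(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    if x.len() < 2 || y.len() < 2 {
        return None;
    }
    let (nx, ny) = (x.len() as f64, y.len() as f64);
    // Per-sample contribution to the squared standard error.
    let vx = sample_variance(x) / nx;
    let vy = sample_variance(y) / ny;
    let se2 = vx + vy;
    if se2 == 0.0 {
        return None;
    }
    let t = (mean(x) - mean(y)) / se2.sqrt();
    let df = se2 * se2 / (vx * vx / (nx - 1.0) + vy * vy / (ny - 1.0));
    Some((t, df))
}
```

Calling `welch_statistic(&[1.0, 2.0, 3.0, 4.0, 5.0], &[2.0, 4.0, 6.0, 8.0, 10.0])` yields a negative t (the first sample has the smaller mean) and a non-integer df below the pooled value of 8, which is the characteristic Welch behavior when variances differ.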
Nonparametric Tests
| Function | Description |
|---|---|
| `rank(data)` | Compute ranks with average tie handling |
| `mann_whitney_u(x, y)` | Mann-Whitney U test (Wilcoxon rank-sum) |
| `wilcoxon_signed_rank(x, y)` | Wilcoxon signed-rank test for paired samples |
| `kruskal_wallis(groups)` | Kruskal-Wallis H test for k independent samples |
| `brunner_munzel(x, y)` | Brunner-Munzel test for stochastic equality |
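
"Average tie handling" means that tied values all receive the mean of the ranks they jointly occupy. The following sketch shows that rule and the Mann-Whitney U statistic built on it. It uses only the standard library and assumes the input contains no NaN values. `average_ranks` and `mann_whitney_u_statistic` are hypothetical names for this illustration, not the crate's functions.

```rust
/// Hypothetical helper: 1-based ranks where tied values share the
/// average of the ranks they span. Assumes no NaN in `data`.
fn average_ranks(data: &[f64]) -> Vec<f64> {
    // Indices of `data` sorted by value.
    let mut order: Vec<usize> = (0..data.len()).collect();
    order.sort_by(|&a, &b| data[a].total_cmp(&data[b]));

    let mut ranks = vec![0.0; data.len()];
    let mut i = 0;
    while i < order.len() {
        // Extend j to the end of the run of values equal to data[order[i]].
        let mut j = i;
        while j + 1 < order.len() && data[order[j + 1]] == data[order[i]] {
            j += 1;
        }
        // Sorted positions i..=j hold ranks i+1..=j+1; assign their average.
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for &k in &order[i..=j] {
            ranks[k] = avg;
        }
        i = j + 1;
    }
    ranks
}

/// Hypothetical helper: U statistic for the first sample, computed as the
/// rank sum of `x` in the pooled sample minus n1 * (n1 + 1) / 2.
fn mann_whitney_u_statistic(x: &[f64], y: &[f64]) -> f64 {
    let pooled: Vec<f64> = x.iter().chain(y.iter()).copied().collect();
    let ranks = average_ranks(&pooled);
    let rank_sum_x: f64 = ranks[..x.len()].iter().sum();
    let n1 = x.len() as f64;
    rank_sum_x - n1 * (n1 + 1.0) / 2.0
}
```

For `[10.0, 20.0, 20.0, 30.0]` the two tied values occupy ranks 2 and 3, so each receives 2.5. The U statistic here corresponds to the first sample; turning it into a p-value (exact or normal approximation) is a separate step not shown.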
Distributional Tests
| Function | Description |
|---|---|
| `shapiro_wilk(data)` | Shapiro-Wilk test for normality |
| `dagostino_k_squared(data)` | D'Agostino's K-squared omnibus normality test |
Resampling Methods
| Function | Description |
|---|---|
| `permutation_t_test(x, y, n_permutations, seed)` | Permutation-based t-test |
| `PermutationEngine::new(x, y, seed)` | Generic permutation testing engine |
| `StationaryBootstrap::new(data, expected_length, seed)` | Stationary bootstrap for dependent data |
| `CircularBlockBootstrap::new(data, block_length, seed)` | Circular block bootstrap |
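
The idea behind the circular block bootstrap can be sketched as index generation: draw random block starts, copy `block_length` consecutive positions, and wrap around the end of the series. The code below is an illustration under stated assumptions, not the crate's `CircularBlockBootstrap`. It uses a small SplitMix64 generator as a stand-in for the `rand` crate so that it needs no dependencies, and `circular_block_indices` is a hypothetical name.

```rust
/// Tiny SplitMix64 generator, used only so this sketch has no external
/// dependencies; a real implementation would likely use the `rand` crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: indices of one circular block bootstrap resample
/// of a series of length `n`. Blocks wrap around the end of the series,
/// and the final block is truncated so the result has exactly `n` entries.
fn circular_block_indices(n: usize, block_length: usize, seed: u64) -> Vec<usize> {
    assert!(n > 0 && block_length > 0, "n and block_length must be positive");
    let mut rng = SplitMix64(seed);
    let mut indices = Vec::with_capacity(n);
    while indices.len() < n {
        // Modulo reduction has a tiny bias, which is acceptable for a sketch.
        let start = (rng.next_u64() % n as u64) as usize;
        for offset in 0..block_length {
            if indices.len() == n {
                break;
            }
            indices.push((start + offset) % n);
        }
    }
    indices
}
```

A resample is then `indices.iter().map(|&i| data[i])`. Keeping observations in contiguous blocks is what preserves short-range dependence in the resampled series, and the same seed always reproduces the same indices.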
Modern Distribution Tests
| Function | Description |
|---|---|
| `energy_distance_test(x, y, n_permutations, seed)` | Energy distance two-sample test |
| `mmd_test(x, y, kernel, n_permutations, seed)` | Maximum Mean Discrepancy test |
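
As a rough illustration of the statistic behind an energy distance test, the univariate sample energy distance can be written as 2·E|X−Y| − E|X−X'| − E|Y−Y'|, with each expectation replaced by a mean over all pairs. The sketch below computes only that statistic (the V-statistic form, including pairs of an observation with itself). The permutation step that turns it into a p-value is not shown. `mean_abs_diff` and `energy_distance` are hypothetical names, and both inputs are assumed non-empty.

```rust
/// Mean of |a_i - b_j| over all pairs; assumes both slices are non-empty.
fn mean_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    let mut total = 0.0;
    for &ai in a {
        for &bj in b {
            total += (ai - bj).abs();
        }
    }
    total / (a.len() * b.len()) as f64
}

/// Hypothetical helper: univariate sample energy distance (V-statistic form).
/// It is zero when the two samples are identical and grows as they separate.
fn energy_distance(x: &[f64], y: &[f64]) -> f64 {
    2.0 * mean_abs_diff(x, y) - mean_abs_diff(x, x) - mean_abs_diff(y, y)
}
```

For two point masses such as `[0.0, 0.0]` and `[1.0, 1.0]`, the within-sample terms vanish and the statistic is twice the distance between the points.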
Forecast Evaluation
| Function | Description |
|---|---|
| `diebold_mariano(e1, e2, loss, h)` | Diebold-Mariano test for predictive accuracy |
| `clark_west(e1, e2, h)` | Clark-West test for nested model comparison |
| `spa_test(benchmark, models, n_bootstrap, block_length, seed)` | Superior Predictive Ability test |
| `mspe_adjusted_spa(benchmark, models, n_bootstrap, block_length, seed)` | MSPE-Adjusted SPA for multiple nested models |
| `model_confidence_set(losses, alpha, statistic, n_bootstrap, block_length, seed)` | Model Confidence Set (Hansen et al., 2011) |
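
To show what the `e1`, `e2`, and `h` arguments of a Diebold-Mariano test feed into, here is a minimal sketch of the textbook statistic under squared-error loss: the mean loss differential divided by the square root of its long-run variance over n, where the long-run variance sums autocovariances up to lag h − 1. This is an assumption-laden illustration, not the crate's `diebold_mariano`. It fixes the loss to squared error, applies no small-sample correction, and `diebold_mariano_statistic` is a hypothetical name.

```rust
/// Hypothetical helper: Diebold-Mariano statistic with squared-error loss
/// for forecast horizon `h`. Returns None for empty or mismatched inputs,
/// an invalid horizon, or a non-positive long-run variance estimate.
fn diebold_mariano_statistic(e1: &[f64], e2: &[f64], h: usize) -> Option<f64> {
    let n = e1.len();
    if n == 0 || e2.len() != n || h == 0 || h > n {
        return None;
    }
    // Loss differential d_t = e1_t^2 - e2_t^2.
    let d: Vec<f64> = e1.iter().zip(e2).map(|(a, b)| a * a - b * b).collect();
    let nf = n as f64;
    let d_bar = d.iter().sum::<f64>() / nf;

    // Sample autocovariance of d at lag k (divides by n).
    let autocov = |k: usize| -> f64 {
        (k..n).map(|t| (d[t] - d_bar) * (d[t - k] - d_bar)).sum::<f64>() / nf
    };

    // Long-run variance: gamma_0 + 2 * sum of gamma_k for k = 1..h-1.
    let long_run_var = autocov(0) + 2.0 * (1..h).map(|k| autocov(k)).sum::<f64>();
    if long_run_var <= 0.0 {
        return None;
    }
    Some(d_bar / (long_run_var / nf).sqrt())
}
```

A positive value indicates that the first model has the larger average squared error. The statistic is compared against a standard normal (or t) reference distribution to obtain a p-value, which is outside this sketch.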
Math Primitives
| Function | Description |
|---|---|
| `mean(data)` | Arithmetic mean |
| `variance(data)` | Sample variance |
| `std_dev(data)` | Sample standard deviation |
| `median(data)` | Median |
| `trimmed_mean(data, trim)` | Trimmed mean |
| `skewness(data)` | Sample skewness (Fisher's, type 2) |
| `kurtosis(data)` | Sample excess kurtosis (Fisher's, type 2) |
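
For readers unfamiliar with the "type 2" convention, the sketch below spells out the bias-adjusted formulas that R's e1071 uses for `type = 2`: with g1 = m3 / m2^1.5 and g2 = m4 / m2² − 3 from the central moments, skewness is g1·√(n(n−1)) / (n−2) and excess kurtosis is ((n+1)·g2 + 6)·(n−1) / ((n−2)(n−3)). The function names are hypothetical and this is not the crate's code.

```rust
/// Central moment of the given order, dividing by n (not n - 1).
fn central_moment(data: &[f64], order: i32) -> f64 {
    let n = data.len() as f64;
    let m = data.iter().sum::<f64>() / n;
    data.iter().map(|v| (v - m).powi(order)).sum::<f64>() / n
}

/// Hypothetical helper: type-2 sample skewness.
/// Needs at least 3 observations and non-zero variance.
fn skewness_type2(data: &[f64]) -> Option<f64> {
    if data.len() < 3 {
        return None;
    }
    let n = data.len() as f64;
    let m2 = central_moment(data, 2);
    if m2 == 0.0 {
        return None;
    }
    let g1 = central_moment(data, 3) / m2.powf(1.5);
    Some(g1 * (n * (n - 1.0)).sqrt() / (n - 2.0))
}

/// Hypothetical helper: type-2 sample excess kurtosis.
/// Needs at least 4 observations and non-zero variance.
fn kurtosis_type2(data: &[f64]) -> Option<f64> {
    if data.len() < 4 {
        return None;
    }
    let n = data.len() as f64;
    let m2 = central_moment(data, 2);
    if m2 == 0.0 {
        return None;
    }
    let g2 = central_moment(data, 4) / (m2 * m2) - 3.0;
    Some(((n + 1.0) * g2 + 6.0) * (n - 1.0) / ((n - 2.0) * (n - 3.0)))
}
```

On the symmetric sample `[1.0, 2.0, 3.0, 4.0, 5.0]` the skewness is zero and the excess kurtosis is −1.2, which makes a convenient sanity check against other software that uses the same convention.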
Validation
This library is developed using Test-Driven Development (TDD) with R as the oracle (ground truth). All implementations are validated against R's statistical functions:
- `t.test()` for t-tests
- `WRS2::yuen()` for Yuen's test
- `car::leveneTest()` for Brown-Forsythe
- `wilcox.test()` for Mann-Whitney and Wilcoxon tests
- `kruskal.test()` for Kruskal-Wallis
- `lawstat::brunner.munzel.test()` for Brunner-Munzel
- `shapiro.test()` for Shapiro-Wilk
- `moments::agostino.test()` and `moments::anscombe.test()` for D'Agostino
- `e1071::skewness()` and `e1071::kurtosis()` for skewness/kurtosis
- `forecast::dm.test()` for Diebold-Mariano
All tests ensure numerical agreement with R within appropriate tolerances.
Dependencies
- statrs - Statistical distributions
- thiserror - Error handling
- rand - Random number generation for resampling
License
MIT License