scirs2-stats
Comprehensive statistical computing for Rust — part of the SciRS2 scientific computing ecosystem.
scirs2-stats is the statistical backbone of SciRS2, providing a production-ready, pure-Rust implementation of probability distributions, hypothesis testing, Bayesian inference, survival analysis, MCMC sampling, Gaussian processes, copulas, and much more. The API mirrors SciPy's stats module where sensible, while going considerably further in v0.3.0 with nonparametric Bayes, causal inference, sequential Monte Carlo, and advanced time-series-oriented statistics.
Overview
Modern statistical workflows demand more than descriptive statistics and p-values. scirs2-stats covers:
- Classical statistics: descriptive measures, 100+ distributions, hypothesis tests, regression
- Bayesian inference: conjugate priors, MCMC (MH, HMC, NUTS, Gibbs, slice), SMC/particle filters, hierarchical models, Bayesian networks
- Survival analysis: Kaplan-Meier, Nelson-Aalen, Cox PH, AFT models, competing risks
- Nonparametric Bayes: Dirichlet process mixtures, Chinese restaurant process, Indian buffet process
- Causal inference: causal graphs (DAGs), do-calculus queries, cointegration tests
- Dependence modelling: copulas (Frank, Clayton, Gumbel, Student-t, vine), Bayesian copulas
- Advanced distributions: GPD, stable, von Mises-Fisher, truncated, Tweedie
- Multiple testing & effect sizes: FDR control, Bonferroni, BH, BY, Holm; Cohen's d, eta-squared, omega-squared
Feature List (v0.3.0)
Descriptive Statistics
- Mean, median, trimmed mean, geometric mean, harmonic mean
- Variance, standard deviation, MAD, IQR, range, coefficient of variation, Gini coefficient
- Skewness, kurtosis (Fisher and Pearson), moments
- Pearson, Spearman, Kendall tau, partial correlation, intraclass correlation (ICC)
- Quantiles, percentiles, box-plot statistics, winsorized statistics
Probability Distributions (100+)
- Continuous: Normal, Uniform, Student-t, Chi-square, F, Gamma, Beta, Exponential, Laplace, Logistic, Cauchy, Pareto, Weibull, Lognormal, Rayleigh, Gumbel, Extreme Value, ...
- Discrete: Poisson, Bernoulli, Binomial, Geometric, Hypergeometric, Negative Binomial, ...
- Multivariate: Multivariate Normal, Multivariate-t, Dirichlet, Wishart, Inverse-Wishart, Multinomial
- Circular: von Mises, wrapped Cauchy, wrapped Normal
- Heavy-tailed / special:
- Generalized Pareto Distribution (GPD) with MLE and PWM fitting
- Stable distributions (alpha-stable) with characteristic function
- von Mises-Fisher distribution on the sphere
- Truncated distributions (any base distribution truncated to an interval)
- Tweedie distribution (compound Poisson-Gamma, power variance)
Hypothesis Testing
- Parametric: one-/two-/paired-sample t-tests, one-way ANOVA, Tukey HSD
- Nonparametric: Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, Friedman
- Normality: Shapiro-Wilk, Anderson-Darling, D'Agostino's K²
- Goodness-of-fit: Kolmogorov-Smirnov (one- and two-sample), Chi-square
- Homogeneity: Levene, Bartlett, Brown-Forsythe
- Multiple testing corrections: Bonferroni, Benjamini-Hochberg (BH), Benjamini-Yekutieli (BY), Holm, Hochberg
- Effect size measures: Cohen's d, Cohen's f², eta-squared, partial eta-squared, omega-squared, Cramer's V, epsilon-squared
Regression Analysis
- Simple and multiple linear regression, polynomial regression
- Ridge (L2), Lasso (L1), Elastic Net regularized regression
- Robust regression: RANSAC, Huber, Theil-Sen
- Stepwise selection, cross-validation utilities, AIC/BIC model criteria
- Residual analysis, influence measures, VIF calculation
Bayesian Statistics & MCMC
- Conjugate priors (Beta-Binomial, Gamma-Poisson, Normal-Normal, ...)
- Markov Chain Monte Carlo:
- Metropolis-Hastings with adaptive proposals
- Hamiltonian Monte Carlo (HMC)
- No-U-Turn Sampler (NUTS)
- Gibbs sampling
- Slice sampling
- Sequential Monte Carlo (SMC) / particle filters: Bootstrap particle filter, auxiliary particle filter, resample-move, tempering
- Hierarchical Bayesian models
- Bayesian networks (DAG structure, exact and approximate inference)
- Variational inference utilities
Gaussian Processes
- Squared-exponential, Matern (1/2, 3/2, 5/2), rational quadratic, periodic, linear kernels
- Kernel composition (sum, product, scale)
- GP regression and classification
- Sparse GP (inducing-point methods: FITC, VFE)
- Deep GP (stacked latent GP layers)
- Hyperparameter optimization via marginal likelihood
Survival Analysis
- Kaplan-Meier estimator with confidence bands
- Nelson-Aalen cumulative hazard estimator
- Cox proportional hazards model (partial likelihood, Breslow baseline)
- Accelerated Failure Time (AFT) models (Weibull, log-normal, log-logistic parametrizations)
- Competing risks analysis (cause-specific and sub-distribution hazard models)
- Log-rank test and generalized Wilcoxon test
- Restricted mean survival time (RMST) estimation
Copulas & Dependence Modelling
- Parametric copulas: Frank, Clayton, Gumbel, Gaussian, Student-t
- Vine copulas (C-vine, D-vine, R-vine structures)
- Copula fitting (MLE, canonical ML), tail dependence coefficients
- Conditional simulation from fitted copulas
Mixture Models
- Gaussian Mixture Models (GMM) with EM and variational EM
- Finite mixture models (general distributions)
- Nonparametric Bayesian mixtures:
- Dirichlet Process Mixture Models (DPMM)
- Chinese Restaurant Process (CRP) samplers
- Indian Buffet Process (IBP) for latent feature models
Causal Inference
- Causal graph representation (DAGs, CPDAGs, MAGs)
- D-separation and Markov blanket queries
- Cointegration testing (Engle-Granger, Johansen trace and eigenvalue tests)
- Structural equation models (SEM) — basic linear
- Causal impact estimation via Bayesian structural time series
Time-Series-Oriented Statistics
- Dynamic Factor Models (DFM) with EM fitting
- Time-Varying Parameter VAR (TVP-VAR) with Kalman filter
- Hidden Markov Models (HMM) — Baum-Welch, Viterbi
- Stationarity tests: ADF, KPSS, Phillips-Perron, DFGLS, Zivot-Andrews
- Spectral density estimation: periodogram, Welch, multitaper
Compositional & Spatial Data
- Compositional data analysis: Aitchison geometry, ALR/CLR/ILR transforms, Dirichlet MLE
- Spatial statistics: variogram estimation and modelling, Kriging (ordinary, simple, universal), Moran's I
- Spatial scan statistics, point pattern analysis (K-function, L-function, Ripley)
Panel Data & Hierarchical Models
- Panel data models: fixed effects, random effects (GLS), pooled OLS
- Hierarchical linear models (HLM / multilevel models)
- Cross-sectional dependence tests, Hausman test
Quasi-Monte Carlo Sampling
- Sobol sequences, Halton sequences, Faure sequences
- Latin hypercube sampling (LHS) with optimization
- Scrambled nets (Owen's scrambling)
Quick Start
Add to your Cargo.toml:
[]
= "0.3.0"
Basic Descriptive Statistics
use array;
use ;
let data = array!;
let m = mean.unwrap; // 5.5
let med = median.unwrap; // 5.5
let s = std.unwrap; // sample std dev
let sk = skew.unwrap;
let kurt = kurtosis.unwrap;
println!;
Hypothesis Testing with Multiple Testing Correction
use array;
use ;
let group_a = array!;
let group_b = array!;
let result = ttest_ind.unwrap;
println!;
// Multiple testing correction over a collection of p-values
let p_values = vec!;
let corrected = correct.unwrap;
println!;
Sequential Monte Carlo (Particle Filter)
use ;
let config = BootstrapConfig ;
let pf = new;
// Feed observations and propagate particles
let estimates = pf.filter.unwrap;
Survival Analysis
use ;
// Kaplan-Meier estimator
let times = vec!;
let events = vec!;
let km = fit.unwrap;
println!;
// Nelson-Aalen cumulative hazard
let na = fit.unwrap;
println!;
Copula Modelling
use ;
let clayton = new; // theta=2
let = ;
println!;
let samples = clayton.sample.unwrap;
Gaussian Process Regression
use ;
let kernel = new; // length_scale=1.0, variance=1.0
let mut gp = new;
gp.fit.unwrap;
let = gp.predict.unwrap;
API Overview
| Module | Description |
|---|---|
descriptive |
Descriptive statistics: mean, variance, skewness, quantiles |
distributions |
100+ probability distributions with pdf/cdf/rvs |
distributions::multivariate |
Multivariate Normal, Dirichlet, Wishart, von Mises-Fisher |
distributions::generalized_pareto |
GPD fitting and inference |
distributions::stable |
Alpha-stable distributions |
distributions::truncated |
Truncated wrappers for any distribution |
distributions::tweedie |
Tweedie / compound Poisson-Gamma |
tests |
Parametric and nonparametric hypothesis tests |
multiple_testing |
Bonferroni, BH, BY, Holm corrections |
effect_size |
Cohen's d, eta-squared, omega-squared, Cramer's V |
regression |
Linear, ridge, lasso, elastic net, robust regression |
bayesian |
Conjugate priors, variational inference utilities |
mcmc |
Metropolis-Hastings, HMC, NUTS, Gibbs, slice, SMC |
mcmc::smc |
Sequential Monte Carlo / particle filters |
gaussian_process |
GP regression and classification |
gaussian_process::advanced |
Sparse GP, deep GP, non-Gaussian likelihoods |
survival |
Kaplan-Meier, Nelson-Aalen, Cox PH, AFT, competing risks |
copula |
Frank, Clayton, Gumbel, Gaussian, Student-t, vine copulas |
mixture |
GMM, finite mixtures, Dirichlet process mixtures |
nonparametric_bayes |
CRP, IBP, stick-breaking constructions |
causal |
Causal DAGs, d-separation, do-calculus, SEM |
causal_graph |
Graph-based causal inference utilities |
cointegration |
Engle-Granger, Johansen tests |
hmm |
Hidden Markov Models (Baum-Welch, Viterbi) |
bayesian_network |
Bayesian network inference |
dynamic_factor |
Dynamic factor models (DFM) |
tvp_var |
Time-varying parameter VAR |
information |
Mutual information, KL divergence, entropy |
stationarity |
ADF, KPSS, Phillips-Perron, DFGLS, Zivot-Andrews |
spectral_density |
Periodogram, Welch, multitaper PSD |
compositional |
Aitchison geometry, ALR/CLR/ILR transforms |
spatial |
Variogram, Kriging, Moran's I, K-function |
spatial_stats |
Spatial scan statistics, point processes |
panel |
Fixed/random effects panel models |
hierarchical |
Hierarchical linear models |
extreme_value |
GEV, GPD, block maxima, peaks-over-threshold |
nonparametric |
Kernel density estimation, rank tests |
robust |
Robust location and scale estimators |
resampling |
Bootstrap (simple, stratified, block), jackknife, permutation |
sampling |
QMC: Sobol, Halton, LHS, scrambled nets |
random |
Random variate generation utilities |
Feature Flags
| Flag | Description |
|---|---|
parallel |
Enable Rayon-based parallel computation (recommended) |
simd |
SIMD-accelerated inner loops via scirs2-core |
serde |
Serialization support via serde / oxicode |
python |
Python interop layer |
Default features: none (pure Rust, no C/Fortran dependencies).
Links
License
Apache License 2.0. See LICENSE for details.