scirs2-stats 0.1.1

Statistical functions module for SciRS2 (scirs2-stats)

scirs2-stats

Production-ready statistical functions module for the SciRS2 ecosystem (v0.1.0), providing comprehensive statistical computing capabilities modeled after SciPy's stats module but optimized for Rust's performance and safety guarantees. Following the SciRS2 POLICY, this module ensures ecosystem consistency through scirs2-core abstractions.

Overview

SciRS2-stats is a mature, production-ready statistical computing library that provides:

  • High-performance statistical algorithms optimized for Rust's memory safety and concurrency
  • Comprehensive API compatibility with SciPy's stats module for easy migration
  • Zero-copy operations where possible for maximum performance
  • Thread-safe implementations leveraging Rust's ownership system
  • Extensive test coverage with 280+ tests ensuring reliability

Features

  • Descriptive statistics
    • Basic measures: mean, median, variance, standard deviation
    • Advanced statistics: skewness, kurtosis, moments
    • Correlation measures: Pearson, Spearman, Kendall tau, partial correlation, point-biserial
    • Dispersion measures: mean absolute deviation, median absolute deviation, IQR, range, coefficient of variation
  • Statistical distributions
    • Continuous: Normal, Uniform, Student's t, Chi-square, F, Gamma, Beta, Exponential, Laplace, Logistic, Cauchy, Pareto, Weibull
    • Discrete: Poisson, Binomial, Hypergeometric, Bernoulli, Geometric, Negative Binomial
    • Multivariate: Multivariate Normal, Multivariate t, Dirichlet, Wishart, InverseWishart, Multinomial
  • Statistical tests
    • Parametric tests: t-tests (one-sample, two-sample, paired), ANOVA
    • Non-parametric tests: Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, Friedman
    • Normality tests: Shapiro-Wilk, Anderson-Darling, D'Agostino's K²
    • Goodness-of-fit tests: Chi-square, Kolmogorov-Smirnov
  • Random number generation
  • Regression models
    • Linear, polynomial, and stepwise regression
    • Robust regression methods (RANSAC, Huber, Theil-Sen)
    • Regularized models (Ridge, Lasso, Elastic Net)
  • Sampling techniques (bootstrap, stratified sampling)
  • Contingency table functions

Installation

Add scirs2-stats to your Cargo.toml:

[dependencies]
scirs2-stats = "0.1.0"
ndarray = "0.16.1"

This version is a stable release ahead of 1.0, offering production-ready functionality with a stable API and zero-warning code quality.

Requirements

  • Rust 1.65 or later
  • rand 0.9.0 or later
  • ndarray 0.16.0 or later

Usage Examples

Descriptive Statistics

use ndarray::array;
use scirs2_stats::{mean, median, std, var, skew, kurtosis};

let data = array![1.0, 2.0, 3.0, 4.0, 5.0];

// Calculate basic statistics
let mean_val = mean(&data.view()).unwrap();
let median_val = median(&data.view()).unwrap();
let var_val = var(&data.view(), 0).unwrap();  // ddof = 0 for population variance
let std_val = std(&data.view(), 0).unwrap();  // ddof = 0 for population standard deviation

// Advanced statistics
let skewness = skew(&data.view(), false).unwrap();  // bias = false
let kurt = kurtosis(&data.view(), true, false).unwrap();  // fisher = true, bias = false
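
The ddof argument in the example above controls the divisor used for the variance. The following is a minimal std-only sketch of that idea, not the crate's implementation; the function name `variance` and its `Option` return type are assumptions made for illustration.

```rust
/// Variance with a configurable "delta degrees of freedom" (ddof).
/// ddof = 0 divides by n (population variance);
/// ddof = 1 divides by n - 1 (sample variance).
/// Returns None when the divisor would not be positive.
fn variance(data: &[f64], ddof: usize) -> Option<f64> {
    let n = data.len();
    if n <= ddof {
        return None;
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    // Sum of squared deviations from the mean
    let sum_sq: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
    Some(sum_sq / (n - ddof) as f64)
}
```

For the data `[1.0, 2.0, 3.0, 4.0, 5.0]`, the sum of squared deviations is 10, so this sketch gives 2.0 with ddof = 0 and 2.5 with ddof = 1.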

Statistical Distributions

use scirs2_stats::distributions;

// Normal distribution
let normal = distributions::norm(0.0f64, 1.0).unwrap();
let pdf = normal.pdf(0.0);
let cdf = normal.cdf(1.96);
let samples = normal.rvs(100).unwrap();

// Poisson distribution
let poisson = distributions::poisson(3.0f64, 0.0).unwrap();
let pmf = poisson.pmf(2.0);
let cdf = poisson.cdf(4.0);
let samples = poisson.rvs(100).unwrap();
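
To make the quantities returned by `pdf`, `pmf`, and `cdf` concrete, here is a std-only sketch of the underlying formulas for the normal density and the Poisson mass and cumulative functions. The helper names (`normal_pdf`, `poisson_pmf`, `poisson_cdf`) are hypothetical and do not come from scirs2-stats; the crate may compute these values differently.

```rust
use std::f64::consts::PI;

/// Density of a normal distribution with the given mean and standard deviation.
fn normal_pdf(x: f64, mean: f64, std_dev: f64) -> f64 {
    let z = (x - mean) / std_dev;
    (-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt())
}

/// Poisson probability mass e^{-mu} * mu^k / k!,
/// built up iteratively to avoid computing large factorials.
fn poisson_pmf(k: u32, mu: f64) -> f64 {
    let mut p = (-mu).exp();
    for i in 1..=k {
        p *= mu / i as f64;
    }
    p
}

/// Poisson cumulative probability P(X <= k), summed term by term.
fn poisson_cdf(k: u32, mu: f64) -> f64 {
    (0..=k).map(|i| poisson_pmf(i, mu)).sum()
}
```

With these formulas, the standard normal density at 0 is about 0.3989, and for a Poisson distribution with mean 3 the mass at 2 is about 0.2240 and the cumulative probability at 4 is about 0.8153.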

Correlation Measures

use ndarray::array;
use scirs2_stats::{pearson_r, spearman_r, kendall_tau, corrcoef};

// Calculate Pearson correlation coefficient (linear correlation)
let x = array![1.0, 2.0, 3.0, 4.0, 5.0];
let y = array![5.0, 4.0, 3.0, 2.0, 1.0];

let r = pearson_r(&x.view(), &y.view()).unwrap();
println!("Pearson correlation: {}", r);  // Should be -1.0 (perfect negative correlation)

// Spearman rank correlation (monotonic relationship)
let rho = spearman_r(&x.view(), &y.view()).unwrap();
println!("Spearman correlation: {}", rho);

// Kendall tau rank correlation
let tau = kendall_tau(&x.view(), &y.view(), "b").unwrap();
println!("Kendall tau correlation: {}", tau);

// Correlation matrix for multiple variables
let data = array![
    [1.0, 5.0, 10.0],
    [2.0, 4.0, 9.0],
    [3.0, 3.0, 8.0],
    [4.0, 2.0, 7.0],
    [5.0, 1.0, 6.0]
];

let corr_matrix = corrcoef(&data.view(), "pearson").unwrap();
println!("Correlation matrix:\n{:?}", corr_matrix);
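
As a cross-check on the -1.0 result above, the Pearson coefficient can be computed by hand from sums of deviations. This is a std-only sketch under the usual textbook definition; the function name `pearson` and the choice to return `None` for mismatched or constant inputs are assumptions, not the behavior of `pearson_r`.

```rust
/// Pearson correlation: covariance of x and y divided by the product
/// of their standard deviations (the 1/n factors cancel).
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    let (mut sxy, mut sxx, mut syy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mean_x, b - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None; // constant input: the correlation is undefined
    }
    Some(sxy / (sxx * syy).sqrt())
}
```

For `x = [1, 2, 3, 4, 5]` and `y = [5, 4, 3, 2, 1]` the cross-deviation sum is -10 and both squared-deviation sums are 10, which gives -10 / sqrt(100) = -1.0.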

Dispersion Measures

use ndarray::array;
use scirs2_stats::{
    mean_abs_deviation, median_abs_deviation, iqr, data_range, coef_variation
};

let data = array![1.0, 2.0, 3.0, 4.0, 5.0, 100.0];  // Note the outlier

// Mean absolute deviation (from mean)
let mad = mean_abs_deviation(&data.view(), None).unwrap();
println!("Mean absolute deviation: {}", mad);

// Median absolute deviation (robust to outliers)
let median_ad = median_abs_deviation(&data.view(), None, None).unwrap();
println!("Median absolute deviation: {}", median_ad);

// Scaled median absolute deviation (to match std. dev. in normal distributions)
let scaled_mad = median_abs_deviation(&data.view(), None, Some(1.4826)).unwrap();
println!("Scaled median absolute deviation: {}", scaled_mad);

// Interquartile range (Q3 - Q1)
let iqr_val = iqr(&data.view(), None).unwrap();
println!("Interquartile range: {}", iqr_val);

// Range (max - min)
let range_val = data_range(&data.view()).unwrap();
println!("Range: {}", range_val);

// Coefficient of variation (std/mean)
let cv = coef_variation(&data.view(), 1).unwrap(); // 1 = sample
println!("Coefficient of variation: {}", cv);
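
The outlier in the example data shows why the median-based measure is called robust. The sketch below recomputes both deviations with the standard library only; the helper names (`median`, `mean_abs_dev`, `median_abs_dev`) and the explicit `scale` parameter are illustrative assumptions and do not mirror the crate's signatures.

```rust
/// Median of a slice (average of the two middle values for even lengths).
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Mean of absolute deviations from the mean (sensitive to outliers).
fn mean_abs_dev(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    Some(values.iter().map(|v| (v - mean).abs()).sum::<f64>() / n)
}

/// Median of absolute deviations from the median, times a scale factor.
/// A scale of 1.4826 makes it comparable to the standard deviation
/// for normally distributed data.
fn median_abs_dev(values: &[f64], scale: f64) -> Option<f64> {
    let center = median(values)?;
    let deviations: Vec<f64> = values.iter().map(|v| (v - center).abs()).collect();
    median(&deviations).map(|m| m * scale)
}
```

For `[1.0, 2.0, 3.0, 4.0, 5.0, 100.0]`, the mean absolute deviation is about 26.94, while the unscaled median absolute deviation is 1.5 (about 2.22 after scaling by 1.4826), so the single outlier barely affects the median-based measure.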

Statistical Tests

use ndarray::array;
use scirs2_stats::{
    ttest_1samp, ttest_ind, ttest_rel, kstest, shapiro, 
    shapiro_wilk, anderson_darling, dagostino_k2,
    wilcoxon, kruskal_wallis, friedman
};

// Parametric tests
let data = array![5.1, 4.9, 6.2, 5.7, 5.5, 5.1, 5.2, 5.0];
let (t_stat, p_value) = ttest_1samp(&data.view(), 5.0).unwrap();
println!("One-sample t-test: t={}, p={}", t_stat, p_value);

let group1 = array![5.1, 4.9, 6.2, 5.7, 5.5];
let group2 = array![4.8, 5.2, 5.1, 4.7, 4.9];
let (t_stat, p_value) = ttest_ind(&group1.view(), &group2.view(), true).unwrap();
println!("Two-sample t-test: t={}, p={}", t_stat, p_value);

// Normality tests
let (w_stat, p_value) = shapiro_wilk(&data.view()).unwrap();
println!("Shapiro-Wilk test: W={}, p={}", w_stat, p_value);

let (a2_stat, p_value) = anderson_darling(&data.view()).unwrap();
println!("Anderson-Darling test: A²={}, p={}", a2_stat, p_value);

// Non-parametric tests
let before = array![125.0, 115.0, 130.0, 140.0, 140.0];
let after = array![110.0, 122.0, 125.0, 120.0, 140.0];
let (w, p_value) = wilcoxon(&before.view(), &after.view(), "wilcox", true).unwrap();
println!("Wilcoxon signed-rank test: W={}, p={}", w, p_value);

// Kruskal-Wallis test for independent samples
let group3 = array![2.8, 3.4, 3.7, 2.2, 2.0];
let samples = vec![group1.view(), group2.view(), group3.view()];
let (h, p_value) = kruskal_wallis(&samples).unwrap();
println!("Kruskal-Wallis test: H={}, p={}", h, p_value);

// Friedman test for repeated measures
let repeated_data = array![
    [7.0, 9.0, 8.0],
    [6.0, 5.0, 7.0],
    [9.0, 7.0, 6.0],
    [8.0, 5.0, 6.0]
];
let (chi2, p_value) = friedman(&repeated_data.view()).unwrap();
println!("Friedman test: Chi²={}, p={}", chi2, p_value);
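
For readers who want to see where the one-sample t statistic comes from, here is a std-only sketch of the textbook formula t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation. The function name `t_statistic_1samp` is hypothetical, and the sketch deliberately omits the p-value, which requires the Student's t CDF that the crate provides.

```rust
/// One-sample t statistic: how many standard errors the sample mean
/// lies from the hypothesized population mean.
fn t_statistic_1samp(data: &[f64], popmean: f64) -> Option<f64> {
    let n = data.len();
    if n < 2 {
        return None; // the sample variance needs at least two observations
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    // Sample variance (divisor n - 1)
    let sample_var =
        data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    if sample_var == 0.0 {
        return None; // all values identical: the statistic is undefined
    }
    let std_err = (sample_var / n as f64).sqrt();
    Some((mean - popmean) / std_err)
}
```

Applied to the eight observations in the example above with a hypothesized mean of 5.0, the sample mean is 5.3375 and the sample variance is 0.19125, which gives a t statistic of roughly 2.18.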

Random Number Generation

use scirs2_stats::random::{uniform, randn, randint, choice};
use ndarray::array;

// Generate uniform random numbers between 0 and 1
let uniform_samples = uniform(0.0, 1.0, 10, Some(42)).unwrap();

// Generate standard normal random numbers
let normal_samples = randn(10, Some(123)).unwrap();

// Generate random integers between 1 and 100
let int_samples = randint(1, 101, 5, Some(456)).unwrap();

// Randomly choose elements from an array
let options = array!["apple", "banana", "cherry", "date", "elderberry"];
let choices = choice(&options.view(), 3, false, None, Some(789)).unwrap();

Statistical Sampling

use scirs2_stats::sampling;
use ndarray::array;

// Create an array
let data = array![1.0, 2.0, 3.0, 4.0, 5.0];

// Generate bootstrap samples
let bootstrap_samples = sampling::bootstrap(&data.view(), 10, Some(42)).unwrap();

// Generate a random permutation
let permutation = sampling::permutation(&data.view(), Some(123)).unwrap();
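
Bootstrap resampling draws n values with replacement from the original n observations, and a fixed seed makes the draws reproducible. The sketch below illustrates this with the standard library only. The `Lcg` generator is a simple stand-in chosen to avoid external dependencies, and the nested `Vec` return shape is an assumption; neither reflects how `sampling::bootstrap` is implemented or what it returns.

```rust
/// Tiny linear congruential generator, used only to keep the sketch std-only.
/// It is not suitable for serious statistical work.
struct Lcg(u64);

impl Lcg {
    /// Advance the state and return an index in 0..n (n must be non-zero).
    fn next_index(&mut self, n: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the high bits, which are better distributed than the low bits
        ((self.0 >> 33) as usize) % n
    }
}

/// Draw `n_samples` bootstrap samples, each the same length as `data`,
/// by sampling indices uniformly with replacement.
fn bootstrap(data: &[f64], n_samples: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = Lcg(seed);
    (0..n_samples)
        .map(|_| {
            (0..data.len())
                .map(|_| data[rng.next_index(data.len())])
                .collect()
        })
        .collect()
}
```

Calling `bootstrap(&data, 10, 42)` twice yields identical samples because the generator state depends only on the seed, and every resampled value is one of the original observations.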

Production Readiness

This release (0.1.0) represents a production-ready state with:

  • Comprehensive functionality: All core statistical operations implemented
  • Extensive testing: 280+ tests with 99.6% pass rate
  • API stability: Stable public API ready for production use
  • Performance optimized: Benchmarked against SciPy for competitive performance
  • Memory safe: Leverages Rust's ownership system for memory safety
  • Documentation complete: Comprehensive API documentation with examples

Roadmap to 1.0

The next major release (1.0.0) will focus on:

  • API stabilization and final polish
  • Additional optimization passes
  • Extended integration testing
  • Performance benchmarking suite

License

This project is dual-licensed; you may use it under either license. See the LICENSE file for details.

See Also

See the TODO.md file for planned enhancements and the development roadmap.