StatOxide: High-performance statistical computing in Rust with Python bindings
StatOxide is a modern, high-performance statistical computing library written in Rust, with comprehensive Python bindings via PyO3. It is designed for data scientists, statisticians, and researchers who need both performance and productivity.
Features
Core Data Structures
- Series: Columnar data with metadata (name, dtype, levels)
- DataFrame: Tabular data structure with column operations
- Formula: R-style formula parsing for model specification
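To illustrate what R-style formula parsing involves, here is a minimal plain-Python sketch. The `parse_formula` helper is hypothetical and is not StatOxide's API; it assumes only `+`-separated terms and treats a `0` term as "no intercept", which covers a small subset of R's formula syntax.

```python
def parse_formula(formula: str) -> dict:
    """Split an R-style formula such as 'y ~ x1 + x2' into its parts.

    Hypothetical helper for illustration only: supports '+'-separated
    terms and a '0' term that drops the intercept.
    """
    if formula.count("~") != 1:
        raise ValueError("formula must contain exactly one '~'")
    lhs, rhs = (side.strip() for side in formula.split("~"))
    if not lhs:
        raise ValueError("formula needs a response on the left of '~'")
    # Terms on the right-hand side are separated by '+'.
    terms = [t.strip() for t in rhs.split("+") if t.strip()]
    intercept = "0" not in terms
    # '0' and '1' only control the intercept; they are not predictors.
    predictors = [t for t in terms if t not in ("0", "1")]
    return {"response": lhs, "predictors": predictors, "intercept": intercept}


spec = parse_formula("y ~ x1 + x2")
print(spec["response"], spec["predictors"], spec["intercept"])
```

A real parser would also need interactions (`x1:x2`, `x1*x2`), transformations, and grouping terms for mixed models; those are left out here to keep the idea visible.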
Statistical Functions
- Descriptive Statistics: Mean, variance, skewness, kurtosis, quantiles
- Probability Distributions: 12 continuous + 6 discrete distributions
- Statistical Tests: t-test, chi-square, ANOVA, correlation tests
- Correlation Measures: Pearson, Spearman, Kendall tau
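As a reference for what the descriptive statistics and Pearson correlation listed above compute, the following standard-library sketch spells out the formulas. The function names `describe` and `pearson` are illustrative and are not taken from StatOxide; the choice of moment-based skewness and excess kurtosis is an assumption, since libraries differ in the estimators they use.

```python
import math


def describe(xs: list[float]) -> dict:
    """Return mean, sample variance, skewness, and excess kurtosis."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    # Central moments (divided by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    constant = m2 == 0
    return {
        "mean": mean,
        "variance": m2 * n / (n - 1),  # unbiased sample variance
        "skewness": float("nan") if constant else m3 / m2 ** 1.5,
        "kurtosis": float("nan") if constant else m4 / m2 ** 2 - 3.0,  # excess
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least two")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(describe([1, 2, 3, 4, 5]))
print(pearson([1, 2, 3], [2, 4, 6]))
```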
Statistical Models
- Linear Models: OLS, Ridge, Lasso, Elastic Net with proper inference
- Generalized Linear Models: Logistic, Poisson, Gamma, Negative Binomial regression
- Mixed Effects Models: Linear mixed models and GLMMs, estimated with the EM algorithm
- Robust Statistics: M-estimators, S-estimators, MM-estimators
- Nonparametric Methods: Kernel regression, local regression, smoothing splines
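For readers new to the terminology, OLS with inference means reporting uncertainty alongside the fitted coefficients. The sketch below does this for a single predictor using only the standard library. `simple_ols` is a hypothetical name and does not reflect how StatOxide implements or exposes its linear models.

```python
import math


def simple_ols(x: list[float], y: list[float]) -> dict:
    """Fit y = intercept + slope * x by least squares.

    Also returns the standard error of the slope, the basic ingredient
    for t-tests and confidence intervals on the coefficient.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": math.sqrt(sigma2 / sxx),
    }


fit = simple_ols([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(fit)
```

Dividing `slope` by `se_slope` gives the t-statistic for testing whether the slope is zero.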
Time Series Analysis
- Core Structures: TimeSeries with datetime indexing
- ARIMA Models: AR, MA, ARMA, ARIMA, SARIMA
- GARCH Models: ARCH, GARCH for volatility modeling
- Decomposition: STL, moving averages, Hodrick-Prescott filter
- Forecasting: Point forecasts, prediction intervals
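The simplest member of the ARIMA family is the AR(1) model, and it shows how point forecasts are produced by iterating the fitted recursion. This is a standard-library sketch under stated assumptions: `fit_ar1` and `forecast_ar1` are hypothetical names, and the parameters are estimated by least squares on lagged values rather than by whatever estimator StatOxide uses.

```python
def fit_ar1(series: list[float]) -> tuple[float, float]:
    """Estimate (const, phi) in x[t] = const + phi * x[t-1] + e[t]."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mp, mc = sum(prev) / n, sum(curr) / n
    sxx = sum((p - mp) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("series must not be constant")
    sxy = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    phi = sxy / sxx
    return mc - phi * mp, phi


def forecast_ar1(last: float, const: float, phi: float, steps: int) -> list[float]:
    """Iterate the AR(1) recursion to get point forecasts."""
    forecasts = []
    value = last
    for _ in range(steps):
        value = const + phi * value
        forecasts.append(value)
    return forecasts


series = [0.0, 1.0, 1.5, 1.75, 1.875]
const, phi = fit_ar1(series)
print(forecast_ar1(series[-1], const, phi, steps=2))
```

Prediction intervals would additionally need an estimate of the innovation variance, which is omitted here.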
Utilities
- Linear Algebra: Matrix operations, solvers, decompositions
- Random Generation: Distributions, bootstrap, train-test split
- Data Validation: Type checking, missing value detection
- Numerical Methods: Softmax, standardization, normalization
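The numerical methods named above are short enough to spell out. The following is a plain-Python sketch, not StatOxide code: the function names are illustrative, and using the sample standard deviation (n - 1) in `standardize` is an assumption.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    if not scores:
        raise ValueError("scores must not be empty")
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # largest exponent is 0
    total = sum(exps)
    return [e / total for e in exps]


def standardize(xs: list[float]) -> list[float]:
    """Rescale to zero mean and unit sample standard deviation."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if sd == 0:
        raise ValueError("cannot standardize constant data")
    return [(x - mean) / sd for x in xs]


def min_max_normalize(xs: list[float]) -> list[float]:
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(xs), max(xs)
    if lo == hi:
        raise ValueError("cannot normalize constant data")
    return [(x - lo) / (hi - lo) for x in xs]


print(softmax([1000.0, 1001.0, 1002.0]))  # no overflow despite large inputs
print(standardize([1.0, 2.0, 3.0]))
print(min_max_normalize([2.0, 4.0, 6.0]))
```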
Python API
StatOxide provides a complete Python interface through PyO3 bindings:
- Core data structures
- Statistical functions
- Formula parsing
- Models
- Mixed effects models
- Time series
- Utilities
Architecture
StatOxide is organized as a multi-crate Rust workspace:
statoxide/
├── Cargo.toml              # Workspace configuration
├── crates/
│   ├── so-core/            # Core data structures & formula parsing
│   ├── so-linalg/          # Linear algebra abstraction
│   ├── so-stats/           # Statistical functions & distributions
│   ├── so-models/          # Statistical models (regression, GLM, mixed effects, etc.)
│   ├── so-tsa/             # Time series analysis
│   ├── so-utils/           # Utility functions
│   └── so-python/          # Python bindings (PyO3)
├── assets/logo.png         # Project logo
├── LICENSE-MIT             # MIT license
└── LICENSE-APACHE-2.0      # Apache 2.0 license
Installation
Prerequisites
- Rust Toolchain: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Python Development Files:
  - Ubuntu/Debian: sudo apt-get install python3-dev python3.11-dev
  - macOS: brew install python@3.11
- Maturin (recommended): pip install maturin
Building from Source
- Clone the repository
- Build the Python bindings with maturin
Direct Cargo Build
The shared library will be at target/release/libso_python.so.
Testing
Rust Tests
cargo test --all
Python Tests
After installation:
Documentation
- API Reference: Run cargo doc --all --no-deps --open for Rust documentation
- Python Docstrings: All Python functions include detailed docstrings
- Examples: See crates/so-python/test_api.py for usage examples
Design Principles
- Performance: Leverage Rust's zero-cost abstractions and LLVM optimizations
- Safety: Memory safety guarantees without garbage collection
- Interoperability: Seamless Python integration with minimal overhead
- Modularity: Independent crates for clear separation of concerns
- API Consistency: Familiar interfaces inspired by R, pandas, and statsmodels
Development Status
| Module | Status | Notes |
|---|---|---|
| so-core | Complete | Data structures, formula parsing |
| so-linalg | Complete | Linear algebra abstraction |
| so-stats | Complete | Statistical functions & distributions |
| so-models | Complete | Regression, GLM, mixed effects, robust, nonparametric |
| so-tsa | Complete | ARIMA, GARCH, decomposition, forecasting |
| so-utils | Complete | Random generation, validation, numerical methods |
| so-python | Complete | Full Python bindings implemented |
License
StatOxide is dual-licensed under both:
- MIT License: See LICENSE-MIT for details
- Apache License 2.0: See LICENSE-APACHE-2.0 for details
You may use StatOxide under either license at your option.
Acknowledgments
- R and statsmodels for statistical API inspiration
- pandas for DataFrame design patterns
- PyO3 team for excellent Rust-Python interop
- ndarray and faer for numerical computing foundations
Contributing
Contributions are welcome!
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: cargo test --all
- Submit a pull request
Support
- Issues: GitHub Issues
- Repository: GitHub Repository