rs-stats

A statistics library written in pure Rust, covering descriptive statistics, probability distributions, hypothesis testing, regression, and decision trees.
rs-stats is designed to be intuitive, efficient, and reliable for both simple and complex statistical analysis. It is aimed at data scientists, researchers, and developers working with statistical models.
🎯 Key Features
- Panic-Free Error Handling: All functions return StatsResult<T> instead of panicking, making the library production-ready and safe
- Comprehensive Error Types: The custom StatsError enum provides detailed error information for all failure cases
- Type-Safe: Leverages Rust's type system for compile-time safety
Installation
Add rs-stats to your Cargo.toml:
[dependencies]
rs-stats = "2.0.0"
Or use cargo add:
cargo add rs-stats
Usage Examples
Basic Statistical Functions
use rs_stats::prob::{average, variance, population_std_dev, std_err};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];

    // Each call returns a StatsResult, so `?` propagates any error.
    let mean = average(&data)?;
    let var = variance(&data)?;
    let std_dev = population_std_dev(&data)?;
    let std_error = std_err(&data)?;

    println!("Mean: {}", mean);
    println!("Variance: {}", var);
    println!("Population Standard Deviation: {}", std_dev);
    println!("Standard Error: {}", std_error);

    Ok(())
}
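For readers who want to see what these quantities are, here is a minimal standard-library sketch of the textbook definitions. It is not the rs-stats implementation: the names `sample_mean`, `variance_with_ddof`, and `standard_error` are hypothetical, and this README does not state whether the library's `variance` divides by n or by n - 1, so the sketch makes the divisor explicit.

```rust
/// Arithmetic mean; returns None for empty input instead of dividing by zero.
fn sample_mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    Some(data.iter().sum::<f64>() / data.len() as f64)
}

/// Variance with an explicit divisor: `ddof = 0` divides by n (population),
/// `ddof = 1` divides by n - 1 (sample).
fn variance_with_ddof(data: &[f64], ddof: usize) -> Option<f64> {
    if data.len() <= ddof {
        return None;
    }
    let m = sample_mean(data)?;
    let sum_sq: f64 = data.iter().map(|x| (x - m).powi(2)).sum();
    Some(sum_sq / (data.len() - ddof) as f64)
}

/// Standard error of the mean: sample standard deviation divided by sqrt(n).
fn standard_error(data: &[f64]) -> Option<f64> {
    let sample_var = variance_with_ddof(data, 1)?;
    Some(sample_var.sqrt() / (data.len() as f64).sqrt())
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0];
    println!("mean: {:?}", sample_mean(&data));
    println!("population variance: {:?}", variance_with_ddof(&data, 0));
    println!("sample variance: {:?}", variance_with_ddof(&data, 1));
    println!("standard error: {:?}", standard_error(&data));
}
```

For the data above, the population variance is 2.0 and the sample variance is 2.5, which is a quick way to check which convention a given function follows.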
Working with Distributions
use rs_stats::distributions::normal_distribution::{normal_pdf, normal_cdf, normal_inverse_cdf};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let x = 1.96;

    // Density and cumulative probability of the standard normal (mean 0, std dev 1) at x.
    let density = normal_pdf(x, 0.0, 1.0)?;
    println!("PDF at {}: {}", x, density);

    let cumulative = normal_cdf(x, 0.0, 1.0)?;
    println!("CDF at {}: {}", x, cumulative);

    // Inverse CDF: the value below which a fraction p of the distribution lies.
    let p = 0.975;
    let quantile = normal_inverse_cdf(p, 0.0, 1.0)?;
    println!("{}th percentile: {}", p * 100.0, quantile);

    Ok(())
}
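As a rough illustration of what the PDF and CDF calls compute, the following standard-library sketch evaluates the normal density in closed form and approximates the CDF through the error function. The function names are hypothetical, and the error function uses the Abramowitz and Stegun 7.1.26 polynomial approximation, which is an assumption of this sketch and not necessarily the method rs-stats uses.

```rust
use std::f64::consts::PI;

/// Density of a normal distribution at x; None unless sigma is strictly positive.
fn normal_pdf_sketch(x: f64, mu: f64, sigma: f64) -> Option<f64> {
    if sigma.is_nan() || sigma <= 0.0 {
        return None;
    }
    let z = (x - mu) / sigma;
    Some((-0.5 * z * z).exp() / (sigma * (2.0 * PI).sqrt()))
}

/// Error function via Abramowitz and Stegun formula 7.1.26
/// (absolute error of roughly 1.5e-7).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// CDF of a normal distribution: 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))).
fn normal_cdf_sketch(x: f64, mu: f64, sigma: f64) -> Option<f64> {
    if sigma.is_nan() || sigma <= 0.0 {
        return None;
    }
    let z = (x - mu) / (sigma * 2.0_f64.sqrt());
    Some(0.5 * (1.0 + erf_approx(z)))
}

fn main() {
    let x = 1.96;
    println!("pdf: {:?}", normal_pdf_sketch(x, 0.0, 1.0));
    println!("cdf: {:?}", normal_cdf_sketch(x, 0.0, 1.0));
}
```

The approximation is accurate to about six or seven decimal places, which is enough to see that the CDF at 1.96 is close to 0.975.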
Hypothesis Testing
use rs_stats::hypothesis_tests::t_test::{one_sample_t_test, two_sample_t_test};
// chi_square_goodness_of_fit is exported from the same module as chi_square_independence.
use rs_stats::hypothesis_tests::chi_square_test::chi_square_independence;
use rs_stats::hypothesis_tests::anova::one_way_anova;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One-sample t-test against a hypothesized mean of 5.0.
    let sample = vec![5.1, 5.2, 4.9, 5.0, 5.3];
    let result = one_sample_t_test(&sample, 5.0)?;
    println!("One-sample t-test p-value: {}", result.p_value);

    // Two-sample t-test.
    let sample1 = vec![5.1, 5.2, 4.9, 5.0, 5.3];
    let sample2 = vec![4.8, 4.9, 5.0, 4.7, 4.9];
    let result = two_sample_t_test(&sample1, &sample2, true)?;
    println!("Two-sample t-test p-value: {}", result.p_value);

    // One-way ANOVA across three groups, passed as slices.
    let groups = vec![
        vec![5.1, 5.2, 4.9, 5.0, 5.3],
        vec![4.8, 4.9, 5.0, 4.7, 4.9],
        vec![5.2, 5.3, 5.1, 5.4, 5.2],
    ];
    let groups_refs: Vec<&[f64]> = groups.iter().map(|g| g.as_slice()).collect();
    let result = one_way_anova(&groups_refs)?;
    println!("ANOVA p-value: {}", result.p_value);

    // Chi-square test of independence on a 2x2 contingency table.
    let observed = vec![
        vec![45, 55],
        vec![60, 40],
    ];
    // The statistic and degrees of freedom are returned too; only the p-value is printed here.
    let (_chi_sq, _df, p_value) = chi_square_independence(&observed)?;
    println!("Chi-square independence test p-value: {}", p_value);

    Ok(())
}
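To make the one-sample test less of a black box, here is a small standard-library sketch of the t statistic itself: the difference between the sample mean and the hypothesized mean, divided by the estimated standard error. The function name is hypothetical, and the sketch stops short of the p-value, which needs the Student t distribution's CDF.

```rust
/// Returns (t statistic, degrees of freedom) for a one-sample t-test.
/// None when there are fewer than two observations or the sample variance is zero.
fn one_sample_t_statistic(sample: &[f64], mu0: f64) -> Option<(f64, usize)> {
    let n = sample.len();
    if n < 2 {
        return None;
    }
    let nf = n as f64;
    let mean = sample.iter().sum::<f64>() / nf;
    // Unbiased sample variance (divides by n - 1).
    let var = sample.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (nf - 1.0);
    if var == 0.0 {
        return None;
    }
    let std_err = (var / nf).sqrt();
    Some(((mean - mu0) / std_err, n - 1))
}

fn main() {
    let sample = [5.1, 5.2, 4.9, 5.0, 5.3];
    match one_sample_t_statistic(&sample, 5.0) {
        Some((t, df)) => println!("t = {:.4}, df = {}", t, df),
        None => println!("not enough variation to compute t"),
    }
}
```

For the sample used in the example above, the statistic is about 1.414 with 4 degrees of freedom.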
Regression Analysis
use rs_stats::regression::linear_regression::LinearRegression;
use rs_stats::regression::multiple_linear_regression::MultipleLinearRegression;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Simple linear regression.
    let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];

    let mut model = LinearRegression::new();
    model.fit(&x, &y)?;

    println!("Slope: {}", model.slope);
    println!("Intercept: {}", model.intercept);
    println!("R-squared: {}", model.r_squared);

    let prediction = model.predict(6.0);
    println!("Prediction for x=6: {}", prediction);

    match model.confidence_interval(6.0, 0.95) {
        Ok((lower, upper)) => {
            println!("95% confidence interval: ({}, {})", lower, upper);
        }
        Err(e) => {
            println!("Could not calculate confidence interval: {}", e);
        }
    }

    // Multiple linear regression: one row per observation.
    let x_multi = vec![
        vec![1.0, 2.0],
        vec![2.0, 1.0],
        vec![3.0, 3.0],
        vec![4.0, 2.0],
    ];
    let y_multi = vec![9.0, 8.0, 16.0, 15.0];

    let mut multi_model = MultipleLinearRegression::new();
    multi_model.fit(&x_multi, &y_multi)?;

    println!("Coefficients: {:?}", multi_model.coefficients);
    println!("R-squared: {}", multi_model.r_squared);
    println!("Adjusted R-squared: {}", multi_model.adjusted_r_squared);

    let new_observation = vec![5.0, 4.0];
    let prediction = multi_model.predict(&new_observation);
    println!("Prediction for new observation: {}", prediction);

    // Save the fitted model to disk and load it back.
    multi_model.save("model.json")?;
    let loaded_model = MultipleLinearRegression::load("model.json")?;
    println!("Loaded coefficients: {:?}", loaded_model.coefficients);

    Ok(())
}
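The slope, intercept, and R-squared reported for the simple model follow from the ordinary least-squares formulas. The sketch below works them out with the standard library only; `fit_line` is a hypothetical helper, not part of rs-stats, and it treats constant x or constant y as degenerate input.

```rust
/// Ordinary least squares for y = slope * x + intercept.
/// Returns (slope, intercept, r_squared), or None for mismatched lengths,
/// fewer than two points, or constant x or y.
fn fit_line(x: &[f64], y: &[f64]) -> Option<(f64, f64, f64)> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let x_mean = x.iter().sum::<f64>() / n;
    let y_mean = y.iter().sum::<f64>() / n;

    // Centered sums of squares and cross-products.
    let mut sxx = 0.0;
    let mut syy = 0.0;
    let mut sxy = 0.0;
    for (xi, yi) in x.iter().zip(y.iter()) {
        let dx = xi - x_mean;
        let dy = yi - y_mean;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }

    let slope = sxy / sxx;
    let intercept = y_mean - slope * x_mean;
    // For a single predictor, R-squared equals the squared correlation.
    let r_squared = (sxy * sxy) / (sxx * syy);
    Some((slope, intercept, r_squared))
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    let y = [2.0, 4.0, 6.0, 8.0, 10.0];
    if let Some((slope, intercept, r2)) = fit_line(&x, &y) {
        println!("slope = {}, intercept = {}, R^2 = {}", slope, intercept, r2);
    }
}
```

Because the example data lie exactly on y = 2x, the fit has slope 2, intercept 0, and an R-squared of 1.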
Decision Trees
use rs_stats::regression::decision_tree::{DecisionTree, TreeType, SplitCriterion};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut recovery_time_tree = DecisionTree::<f64, f64>::new(
        TreeType::Regression,
        SplitCriterion::Mse,
        5,
        2,
        1,
    );

    let patient_features = vec![
        vec![45.0, 3.0, 28.5, 2.0, 7.0],
        vec![62.0, 4.0, 31.2, 3.0, 8.0],
        vec![38.0, 2.0, 24.3, 1.0, 5.0],
    ];
    let recovery_days = vec![14.0, 28.0, 10.0];

    recovery_time_tree.fit(&patient_features, &recovery_days)?;

    let new_patient = vec![
        vec![55.0, 3.0, 27.0, 2.0, 6.0],
    ];
    let predicted_recovery_days = recovery_time_tree.predict(&new_patient)?;
    println!("Predicted recovery days: {:?}", predicted_recovery_days);

    let mut diabetes_risk_tree = DecisionTree::<u8, f64>::new(
        TreeType::Classification,
        SplitCriterion::Gini,
        4,
        2,
        1,
    );

    let medical_features = vec![
        vec![85.0, 22.0, 120.0, 35.0, 0.0],
        vec![140.0, 31.0, 145.0, 52.0, 1.0],
        vec![165.0, 34.0, 155.0, 48.0, 1.0],
    ];
    let diabetes_status = vec![0, 1, 1];

    diabetes_risk_tree.fit(&medical_features, &diabetes_status)?;

    println!("Tree Structure:\n{}", diabetes_risk_tree.tree_structure());
    println!("Tree Summary:\n{}", diabetes_risk_tree.summary());

    let importance = diabetes_risk_tree.feature_importances();
    println!("Feature Importance: {:?}", importance);

    Ok(())
}
The Decision Tree implementation supports:
- Both regression and classification tasks
- Multiple split criteria (MSE, MAE for regression; Gini, Entropy for classification)
- Generic types with appropriate trait bounds
- Parallel processing for optimal performance
- Tree visualization and interpretation tools
- Feature importance calculation
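The split criteria named in the list can be illustrated with a short standard-library sketch of two of them, Gini impurity for classification and mean squared error for regression. The helper names are hypothetical and the code follows the textbook definitions; it makes no claim about how rs-stats evaluates splits internally.

```rust
use std::collections::HashMap;

/// Gini impurity: 1 minus the sum of squared class proportions (0.0 for a pure or empty node).
fn gini(labels: &[u8]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    let n = labels.len() as f64;
    1.0 - counts.values().map(|&c| (c as f64 / n).powi(2)).sum::<f64>()
}

/// Mean squared error of a node that predicts the mean of its targets.
fn mse(targets: &[f64]) -> f64 {
    if targets.is_empty() {
        return 0.0;
    }
    let n = targets.len() as f64;
    let mean = targets.iter().sum::<f64>() / n;
    targets.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / n
}

/// Impurity of a candidate split: child Gini values weighted by child size.
fn split_gini(left: &[u8], right: &[u8]) -> f64 {
    let total = (left.len() + right.len()) as f64;
    if total == 0.0 {
        return 0.0;
    }
    (left.len() as f64 * gini(left) + right.len() as f64 * gini(right)) / total
}

fn main() {
    let labels = [0u8, 1, 1];
    println!("Gini before split: {:.4}", gini(&labels));
    println!("Gini after split: {:.4}", split_gini(&[0], &[1, 1]));
    println!("MSE of targets: {:.4}", mse(&[14.0, 28.0, 10.0]));
}
```

A tree builder would typically pick the split that lowers the weighted impurity the most; here the split separating label 0 from the two 1s brings the Gini impurity from 4/9 down to 0.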
Error Handling
rs-stats uses a custom error handling system that makes the library panic-free and production-ready. All functions return StatsResult<T>, which is a type alias for Result<T, StatsError>.
Error Types
The StatsError enum provides detailed error information:
use rs_stats::{StatsError, StatsResult};

// Propagate library errors with `?` from a function that returns StatsResult.
fn analyze_data(data: &[f64]) -> StatsResult<f64> {
    let mean = rs_stats::prob::average(data)?;
    let variance = rs_stats::prob::variance(data)?;
    Ok(mean + variance)
}

fn main() {
    // Empty input is reported as an error value instead of a panic.
    match analyze_data(&[]) {
        Ok(result) => println!("Result: {}", result),
        Err(StatsError::EmptyData { message }) => {
            println!("Error: {}", message);
        }
        Err(StatsError::ConversionError { message }) => {
            println!("Conversion error: {}", message);
        }
        Err(e) => println!("Other error: {}", e),
    }
}
Common Error Variants
- InvalidInput: Invalid input parameters
- ConversionError: Type conversion failures
- EmptyData: Empty data arrays
- DimensionMismatch: Mismatched array dimensions
- NumericalError: Numerical computation errors
- NotFitted: Model not fitted before prediction
- InvalidParameter: Invalid parameter values
- IndexOutOfBounds: Array index out of bounds
- MathematicalError: Mathematical operation errors
All errors implement std::error::Error and can be easily converted to strings for logging or user-facing messages.
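The general pattern described in this section, an error enum plus a Result alias, can be sketched with the standard library alone. `DemoError`, `DemoResult`, and `checked_mean` below are hypothetical stand-ins written for illustration. They mirror the shape of StatsError and StatsResult<T> as described above but are not the library's definitions.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error enum with struct-like variants carrying a message.
#[derive(Debug, PartialEq)]
enum DemoError {
    EmptyData { message: String },
    InvalidInput { message: String },
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::EmptyData { message } => write!(f, "empty data: {}", message),
            DemoError::InvalidInput { message } => write!(f, "invalid input: {}", message),
        }
    }
}

// Implementing std::error::Error lets `?` convert into Box<dyn Error>.
impl Error for DemoError {}

// Hypothetical alias, analogous to a Result<T, StatsError> alias.
type DemoResult<T> = Result<T, DemoError>;

fn checked_mean(data: &[f64]) -> DemoResult<f64> {
    if data.is_empty() {
        return Err(DemoError::EmptyData {
            message: "cannot average an empty slice".to_string(),
        });
    }
    if data.iter().any(|x| !x.is_finite()) {
        return Err(DemoError::InvalidInput {
            message: "data contains NaN or infinity".to_string(),
        });
    }
    Ok(data.iter().sum::<f64>() / data.len() as f64)
}

fn run() -> Result<(), Box<dyn Error>> {
    // `?` converts DemoError into Box<dyn Error> automatically.
    let m = checked_mean(&[1.0, 2.0, 3.0])?;
    println!("mean = {}", m);
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("failed: {}", e);
    }
    // Display gives a string suitable for logs or user-facing messages.
    if let Err(e) = checked_mean(&[]) {
        println!("logged: {}", e);
    }
}
```

Callers that need to react to specific failures can match on the variants, while callers that only log can rely on the Display output.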
Documentation
For detailed API documentation, run:
cargo doc --open
Testing
The library includes a comprehensive test suite. Run the tests with:
cargo test
Contributing
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin feature/my-new-feature
- Submit a pull request
Before submitting your PR, please make sure:
- All tests pass
- Code follows the project's style and conventions
- New features include appropriate documentation and tests
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- The Rust community for their excellent documentation and support
- Contributors to the project
- Various statistical references and research papers that informed the implementations