# Statify
A lightweight and versatile statistics library for Rust that provides essential statistical functions for data analysis.
## Features
- **Descriptive Statistics**: Mean, median, mode, variance, standard deviation (both sample and population)
- **Distribution Metrics**: Percentiles, quartiles, interquartile range (IQR)
- **Range Statistics**: Min, max, range, sum
- **Correlation Analysis**: Pearson correlation coefficient and covariance
- **Normalization**: Min-max normalization, standard normalization, custom range scaling
- **Linear Regression**: Simple linear regression with slope, intercept, R², and predictions
- **Normal Distribution**: Probability density function (PDF) and cumulative distribution function (CDF)
- **Advanced Metrics**: Skewness, kurtosis, coefficient of variation, standard error
- **Standardization**: Z-scores for individual values or entire datasets
- **Type Support**: Works with both `f64` and `f32` floating-point types
- **Error Handling**: Fallible operations return a `Result` carrying a descriptive `StatsError` variant
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
statify = "0.1.0"
```
## Usage
The library extends `Vec<f64>` and `Vec<f32>` with the `Stats` trait, making it simple to calculate statistics on your data:
```rust
use statify::Stats;
fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
    // Descriptive statistics
    let mean = data.mean().unwrap();
    let median = data.median().unwrap();
    let std_dev = data.std_dev().unwrap();
    println!("Mean: {}", mean);
    println!("Median: {}", median);
    println!("Standard Deviation: {}", std_dev);
    // Percentiles and quartiles
    let q1 = data.quartile_1().unwrap();
    let q3 = data.quartile_3().unwrap();
    let iqr = data.iqr().unwrap();
    println!("Q1: {}, Q3: {}, IQR: {}", q1, q3, iqr);
}
```
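As a rough illustration of how an extension trait of this kind can attach methods to `Vec<f64>`, here is a minimal sketch. `MiniStats` and its `Option` return type are hypothetical stand-ins chosen for this example; they are not Statify's actual `Stats` trait or `StatsResult<T>`. The sketch also shows the only difference between the sample and population variants: the divisor.

```rust
/// Hypothetical stand-in for an extension trait; not Statify's real definition.
trait MiniStats {
    fn mean(&self) -> Option<f64>;
    /// Sample variance: squared deviations divided by n - 1.
    fn variance(&self) -> Option<f64>;
    /// Population variance: squared deviations divided by n.
    fn variance_pop(&self) -> Option<f64>;
}

impl MiniStats for Vec<f64> {
    fn mean(&self) -> Option<f64> {
        if self.is_empty() {
            return None;
        }
        Some(self.iter().sum::<f64>() / self.len() as f64)
    }

    fn variance(&self) -> Option<f64> {
        // At least two values are needed for the n - 1 divisor.
        if self.len() < 2 {
            return None;
        }
        let m = self.mean()?;
        let ss: f64 = self.iter().map(|x| (x - m).powi(2)).sum();
        Some(ss / (self.len() - 1) as f64)
    }

    fn variance_pop(&self) -> Option<f64> {
        let m = self.mean()?;
        let ss: f64 = self.iter().map(|x| (x - m).powi(2)).sum();
        Some(ss / self.len() as f64)
    }
}
```

With `let data = vec![1.0, 2.0, 3.0, 4.0];` in scope, `data.variance()` divides the summed squared deviations by 3, while `data.variance_pop()` divides by 4.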
### Correlation and Covariance
```rust
use statify::{correlation, covariance};
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];
let corr = correlation(&x, &y).unwrap();
let cov = covariance(&x, &y).unwrap();
println!("Correlation: {}", corr);
println!("Covariance: {}", cov);
```
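For readers who want to see the arithmetic behind these two numbers, the following is a minimal sketch of the textbook formulas. The function names are hypothetical, and the `n - 1` divisor is an assumption: this README does not state whether Statify's `covariance` is the sample or the population form.

```rust
/// Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1).
/// Returns None for mismatched lengths or fewer than two points.
fn covariance_sketch(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let s: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    Some(s / (n - 1.0))
}

/// Pearson r = cov(x, y) / (sd_x * sd_y).
/// Returns None when either input has zero variance.
fn correlation_sketch(x: &[f64], y: &[f64]) -> Option<f64> {
    let cov = covariance_sketch(x, y)?;
    let var_x = covariance_sketch(x, x)?;
    let var_y = covariance_sketch(y, y)?;
    let denom = (var_x * var_y).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(cov / denom)
    }
}
```

Because `y` is exactly `2 * x` in the example above, the Pearson coefficient is 1 under any divisor convention; only the covariance value depends on whether `n` or `n - 1` is used.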
### Z-Scores
```rust
use statify::{z_score, z_scores};
// Single value z-score
let score = z_score(75.0, 50.0, 10.0).unwrap();
println!("Z-score: {}", score);
// Z-scores for entire dataset
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let scores = z_scores(&data).unwrap();
println!("Z-scores: {:?}", scores);
```
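The standard score is `(value - mean) / std_dev`. The sketch below spells that out with hypothetical function names. It assumes the sample standard deviation (divisor `n - 1`) for the whole-dataset version, which may or may not match what Statify's `z_scores` uses.

```rust
/// z = (value - mean) / std_dev; None when std_dev is zero.
fn z_score_sketch(value: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if std_dev == 0.0 {
        return None;
    }
    Some((value - mean) / std_dev)
}

/// Standard scores for every element, using the sample standard deviation
/// (an assumption made for this sketch).
fn z_scores_sketch(data: &[f64]) -> Option<Vec<f64>> {
    if data.len() < 2 {
        return None;
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let sd = var.sqrt();
    // Collecting into Option<Vec<_>> yields None if any single score is None.
    data.iter().map(|&x| z_score_sketch(x, mean, sd)).collect()
}
```

For the single-value call above, `(75.0 - 50.0) / 10.0` gives a score of 2.5.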
### Normalization
```rust
use statify::{normalize_min_max, normalize_standard, normalize_range};
let data = vec![10.0, 20.0, 30.0, 40.0, 50.0];
// Min-max normalization (0 to 1)
let normalized = normalize_min_max(&data).unwrap();
println!("Min-max: {:?}", normalized);
// Standard normalization (z-scores)
let standardized = normalize_standard(&data).unwrap();
println!("Standardized: {:?}", standardized);
// Custom range normalization (-1 to 1)
let custom = normalize_range(&data, -1.0, 1.0).unwrap();
println!("Scaled to [-1, 1]: {:?}", custom);
```
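Min-max scaling maps each value to `new_min + (x - min) / (max - min) * (new_max - new_min)`, and the 0 to 1 case is the same formula with a fixed target range. The following is a minimal sketch of that mapping with hypothetical function names. Returning `None` for empty or constant data is a choice made for this example, not a statement about Statify's behavior.

```rust
/// Linearly rescale `data` so its minimum maps to `new_min` and its maximum to `new_max`.
fn normalize_range_sketch(data: &[f64], new_min: f64, new_max: f64) -> Option<Vec<f64>> {
    if data.is_empty() {
        return None;
    }
    let min = data.iter().copied().fold(f64::INFINITY, f64::min);
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let span = max - min;
    // Constant data has no spread to rescale.
    if span == 0.0 {
        return None;
    }
    Some(
        data.iter()
            .map(|x| new_min + (x - min) / span * (new_max - new_min))
            .collect(),
    )
}

/// The 0-to-1 case is the general formula with a fixed target range.
fn normalize_min_max_sketch(data: &[f64]) -> Option<Vec<f64>> {
    normalize_range_sketch(data, 0.0, 1.0)
}
```

For the dataset above, 20.0 sits a quarter of the way between 10.0 and 50.0, so it maps to 0.25 on the 0 to 1 scale and to -0.5 on the -1 to 1 scale.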
### Linear Regression
```rust
use statify::linear_regression;
let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.1, 3.9, 6.2, 7.8, 10.1];
let result = linear_regression(&x, &y).unwrap();
println!("Slope: {}", result.slope);
println!("Intercept: {}", result.intercept);
println!("R²: {}", result.r_squared);
// Make predictions
let prediction = result.predict(6.0);
println!("Predicted y for x=6: {}", prediction);
```
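To make the three reported quantities concrete, here is a minimal ordinary-least-squares sketch. `Fit` and `fit_line` are hypothetical names for this example and do not describe the internals of `LinearRegressionResult`. The slope is `Sxy / Sxx`, the intercept is `mean_y - slope * mean_x`, and for a single predictor R² equals `Sxy² / (Sxx * Syy)`.

```rust
/// Hypothetical result type for this sketch.
struct Fit {
    slope: f64,
    intercept: f64,
    r_squared: f64,
}

impl Fit {
    fn predict(&self, x: f64) -> f64 {
        self.intercept + self.slope * x
    }
}

/// Ordinary least squares for one predictor.
/// Degenerate inputs (constant x or constant y) are rejected in this sketch.
fn fit_line(x: &[f64], y: &[f64]) -> Option<Fit> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let sxy: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let sxx: f64 = x.iter().map(|a| (a - mx).powi(2)).sum();
    let syy: f64 = y.iter().map(|b| (b - my).powi(2)).sum();
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    Some(Fit {
        slope,
        intercept: my - slope * mx,
        r_squared: sxy * sxy / (sxx * syy),
    })
}
```

Applied by hand to the data above, these formulas give a slope of about 1.99 and an intercept of about 0.05, so the prediction at `x = 6` is close to 11.99.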
### Normal Distribution
```rust
use statify::{normal_pdf, normal_cdf, standard_normal_pdf, standard_normal_cdf};
// Custom normal distribution (mean=100, std_dev=15)
let pdf = normal_pdf(100.0, 100.0, 15.0).unwrap();
let cdf = normal_cdf(115.0, 100.0, 15.0).unwrap();
println!("PDF at 100: {}, CDF at 115: {}", pdf, cdf);
// Standard normal distribution (mean=0, std_dev=1)
let std_pdf = standard_normal_pdf(0.0);
let std_cdf = standard_normal_cdf(1.96);
println!("Standard normal PDF at 0: {}", std_pdf);
println!("Standard normal CDF at 1.96: {}", std_cdf); // ~0.975
```
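The Rust standard library has no error function, so a normal CDF has to be built on an approximation or an external crate. The sketch below uses the Abramowitz and Stegun 7.1.26 polynomial for `erf` (absolute error around 1.5e-7). This is one common choice, shown under hypothetical function names; this README does not say which method Statify itself uses.

```rust
use std::f64::consts::{PI, SQRT_2};

/// Abramowitz-Stegun 7.1.26 approximation of erf(x).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Density of N(mean, std_dev^2) at x; None unless std_dev is positive.
fn normal_pdf_sketch(x: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if !(std_dev > 0.0) {
        return None;
    }
    let z = (x - mean) / std_dev;
    Some((-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt()))
}

/// P(X <= x) for N(mean, std_dev^2), via 0.5 * (1 + erf(z / sqrt(2))).
fn normal_cdf_sketch(x: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if !(std_dev > 0.0) {
        return None;
    }
    let z = (x - mean) / std_dev;
    Some(0.5 * (1.0 + erf_approx(z / SQRT_2)))
}
```

In the example above, `x = 115` with mean 100 and standard deviation 15 is exactly one standard deviation above the mean, so the CDF is about 0.8413.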
### Advanced Metrics
```rust
use statify::{skewness, kurtosis, coefficient_of_variation, standard_error};
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
let skew = skewness(&data).unwrap();
let kurt = kurtosis(&data).unwrap();
let cv = coefficient_of_variation(&data).unwrap();
let se = standard_error(&data).unwrap();
println!("Skewness: {}", skew);
println!("Kurtosis: {}", kurt);
println!("Coefficient of Variation: {}%", cv);
println!("Standard Error: {}", se);
```
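Skewness and kurtosis have several competing definitions that differ in their bias corrections. The sketch below uses the plain moment-based forms, `m3 / m2^1.5` and `m4 / m2^2 - 3`, where `m_k` is the k-th central moment with divisor `n`. The function names are hypothetical, and Statify may apply a sample correction that produces slightly different values.

```rust
/// k-th central moment with divisor n. Callers must pass non-empty data.
fn central_moment(data: &[f64], k: i32) -> f64 {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    data.iter().map(|x| (x - mean).powi(k)).sum::<f64>() / n
}

/// Moment-based skewness: m3 / m2^(3/2).
fn skewness_sketch(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    let m2 = central_moment(data, 2);
    if m2 == 0.0 {
        return None;
    }
    Some(central_moment(data, 3) / m2.powf(1.5))
}

/// Moment-based excess kurtosis: m4 / m2^2 - 3, so a normal distribution scores 0.
fn excess_kurtosis_sketch(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    let m2 = central_moment(data, 2);
    if m2 == 0.0 {
        return None;
    }
    Some(central_moment(data, 4) / (m2 * m2) - 3.0)
}
```

Under these definitions the evenly spaced dataset above is perfectly symmetric, so its skewness is 0, and its excess kurtosis is negative because a uniform spread has lighter tails than a normal distribution.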
## API Overview
### Trait Methods (Stats)
All methods return a `StatsResult<T>`, so failures such as an empty dataset are reported through the `Err` variant:
- `mean()` - Arithmetic mean
- `median()` - Middle value when sorted
- `mode()` - Most frequent values
- `variance()` - Sample variance
- `std_dev()` - Sample standard deviation
- `variance_pop()` - Population variance
- `std_dev_pop()` - Population standard deviation
- `min()` - Minimum value
- `max()` - Maximum value
- `range()` - Difference between max and min
- `sum()` - Sum of all values
- `percentile(p)` - Value at the p-th percentile
- `quartile_1()` - 25th percentile
- `quartile_3()` - 75th percentile
- `iqr()` - Interquartile range (Q3 - Q1)
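
Percentiles can be computed under several conventions, and this README does not say which one `percentile(p)` follows. As a point of reference, the sketch below uses linear interpolation between the two closest ranks of the sorted data, with `p` given on a 0 to 100 scale. The function name and both of those choices are assumptions made for this example.

```rust
/// Percentile by linear interpolation between closest ranks.
/// `p` is on a 0..=100 scale; returns None for empty data or out-of-range `p`.
fn percentile_sketch(data: &[f64], p: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Fractional position of the percentile within the sorted data.
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + frac * (sorted[hi] - sorted[lo]))
}
```

Under this convention the values 1.0 through 10.0 have a 25th percentile of 3.25 and a 75th percentile of 7.75, giving an interquartile range of 4.5. Other conventions yield different quartiles for the same data.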
### Standalone Functions
**Correlation & Covariance**
- `correlation(x, y)` - Pearson correlation coefficient
- `covariance(x, y)` - Covariance between two datasets
**Normalization**
- `normalize_min_max(data)` - Min-max normalization (0 to 1)
- `normalize_standard(data)` - Standard normalization (z-scores)
- `normalize_range(data, min, max)` - Normalize to custom range
**Linear Regression**
- `linear_regression(x, y)` - Returns `LinearRegressionResult` with:
  - `slope` - Regression line slope
  - `intercept` - Y-intercept
  - `r_squared` - Coefficient of determination
  - `predict(x)` - Predict y for given x
  - `predict_many(x_values)` - Predict multiple values
**Normal Distribution**
- `normal_pdf(x, mean, std_dev)` - Probability density function
- `normal_cdf(x, mean, std_dev)` - Cumulative distribution function
- `standard_normal_pdf(x)` - Standard normal PDF (μ=0, σ=1)
- `standard_normal_cdf(x)` - Standard normal CDF (μ=0, σ=1)
**Standardization**
- `z_score(value, mean, std_dev)` - Standard score for a single value
- `z_scores(data)` - Standard scores for all values in a dataset
**Advanced Metrics**
- `standard_error(data)` - Standard error of the mean
- `coefficient_of_variation(data)` - CV expressed as percentage
- `skewness(data)` - Measure of distribution asymmetry
- `kurtosis(data)` - Measure of distribution tailedness (excess kurtosis)
## Error Handling
The library uses a custom `StatsError` enum for error handling:
- `EmptyDataset` - Dataset is empty
- `InsufficientData` - Not enough data for the operation
- `DivisionByZero` - Division by zero would occur
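Fallible functions return `StatsResult<T>`, which is a `Result<T, StatsError>`. In the examples above, `standard_normal_pdf`, `standard_normal_cdf`, and `predict` are called without unwrapping, which suggests they return plain values.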
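
Instead of calling `unwrap()`, callers can match on the error to handle each case separately. The sketch below shows the pattern with a local stand-in enum, `SketchError`, that mirrors the three variant names listed above. Whether the real `StatsError` variants carry payloads, and which variant each Statify function returns in which situation, are not documented here, so treat the mapping below as an assumption.

```rust
/// Local stand-in mirroring the variant names listed above; not the real `StatsError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    EmptyDataset,
    InsufficientData,
    DivisionByZero,
}

type SketchResult<T> = Result<T, SketchError>;

/// Coefficient of variation as a percentage, with each failure mapped to a variant.
fn cv_percent(data: &[f64]) -> SketchResult<f64> {
    match data.len() {
        0 => return Err(SketchError::EmptyDataset),
        1 => return Err(SketchError::InsufficientData),
        _ => {}
    }
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    if mean == 0.0 {
        return Err(SketchError::DivisionByZero);
    }
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Ok(var.sqrt() / mean * 100.0)
}

/// Turn a result into a message without panicking.
fn describe(result: &SketchResult<f64>) -> String {
    match result {
        Ok(v) => format!("cv = {:.1}%", v),
        Err(SketchError::EmptyDataset) => "dataset is empty".to_string(),
        Err(SketchError::InsufficientData) => "need at least two values".to_string(),
        Err(SketchError::DivisionByZero) => "mean is zero, so the ratio is undefined".to_string(),
    }
}
```

Because the `match` is exhaustive, the compiler flags any variant left unhandled, which is the main advantage over `unwrap()` in code that processes untrusted input.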
## License
MIT
## Contributing
Contributions are welcome. Please ensure tests pass before submitting pull requests.