statify 0.2.0

# Statify


A lightweight and versatile statistics library for Rust that provides essential statistical functions for data analysis.

## Features


- **Descriptive Statistics**: Mean, median, mode, variance, standard deviation (both sample and population)
- **Distribution Metrics**: Percentiles, quartiles, interquartile range (IQR)
- **Range Statistics**: Min, max, range, sum
- **Correlation Analysis**: Pearson correlation coefficient and covariance
- **Normalization**: Min-max normalization, standard normalization, custom range scaling
- **Linear Regression**: Simple linear regression with slope, intercept, R², and predictions
- **Normal Distribution**: Probability density function (PDF) and cumulative distribution function (CDF)
- **Advanced Metrics**: Skewness, kurtosis, coefficient of variation, standard error
- **Standardization**: Z-scores for individual values or entire datasets
- **Type Support**: Works with both `f64` and `f32` floating-point types
- **Error Handling**: Fallible operations return a `StatsResult<T>` carrying a descriptive `StatsError` variant

## Installation


Add this to your `Cargo.toml`:

```toml
[dependencies]
statify = "0.2.0"
```

## Usage


The library implements the `Stats` trait for `Vec<f64>` and `Vec<f32>`, so statistics are available as methods directly on your data:

```rust
use statify::Stats;

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
    
    // Descriptive statistics
    let mean = data.mean().unwrap();
    let median = data.median().unwrap();
    let std_dev = data.std_dev().unwrap();
    
    println!("Mean: {}", mean);
    println!("Median: {}", median);
    println!("Standard Deviation: {}", std_dev);
    
    // Percentiles and quartiles
    let q1 = data.quartile_1().unwrap();
    let q3 = data.quartile_3().unwrap();
    let iqr = data.iqr().unwrap();
    
    println!("Q1: {}, Q3: {}, IQR: {}", q1, q3, iqr);
}
```
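
This README does not state which interpolation rule `percentile(p)` uses, so the quartiles printed above depend on the crate's choice. Purely as an illustration, the sketch below implements one common convention (linear interpolation between the two closest ranks) in plain Rust. `percentile_linear` is a hypothetical helper written for this example and is not part of statify.

```rust
/// Percentile by linear interpolation between closest ranks.
/// This is one common convention; statify may use a different one (assumption).
fn percentile_linear(data: &[f64], p: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    // Work on a sorted copy so the caller's data is left untouched.
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional position of the requested percentile in the sorted data.
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lower = rank.floor() as usize;
    let upper = rank.ceil() as usize;
    let frac = rank - lower as f64;

    Some(sorted[lower] + frac * (sorted[upper] - sorted[lower]))
}
```

Under this convention the ten-value dataset above gives Q1 = 3.25 and Q3 = 7.75. Other conventions produce slightly different quartiles for the same data.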

### Correlation and Covariance


```rust
use statify::{correlation, covariance};

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];

let corr = correlation(&x, &y).unwrap();
let cov = covariance(&x, &y).unwrap();

println!("Correlation: {}", corr);
println!("Covariance: {}", cov);
```
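
For readers who want to see what these two numbers measure, here is a minimal plain-Rust sketch of the textbook formulas. It assumes the sample covariance (divisor `n - 1`); whether statify uses that divisor or the population one is not stated in this README. `cov_and_corr` is a hypothetical name used only for this example.

```rust
/// Returns (sample covariance, Pearson correlation) for two equal-length slices.
/// The `n - 1` divisor for the covariance is an assumption, not statify's documented behavior.
fn cov_and_corr(x: &[f64], y: &[f64]) -> Option<(f64, f64)> {
    let n = x.len();
    if n < 2 || n != y.len() {
        return None;
    }
    let mean_x = x.iter().sum::<f64>() / n as f64;
    let mean_y = y.iter().sum::<f64>() / n as f64;

    // Accumulate cross-products and squared deviations in one pass.
    let mut sxy = 0.0;
    let mut sxx = 0.0;
    let mut syy = 0.0;
    for (a, b) in x.iter().zip(y) {
        let dx = a - mean_x;
        let dy = b - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None; // a constant series has no defined correlation
    }
    let cov = sxy / (n - 1) as f64;
    let corr = sxy / (sxx * syy).sqrt();
    Some((cov, corr))
}
```

The correlation does not depend on the choice of divisor, because the same factor cancels between numerator and denominator.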

### Z-Scores


```rust
use statify::{z_score, z_scores};

// Single value z-score
let score = z_score(75.0, 50.0, 10.0).unwrap();
println!("Z-score: {}", score);

// Z-scores for entire dataset
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let scores = z_scores(&data).unwrap();
println!("Z-scores: {:?}", scores);
```
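
The arithmetic behind a z-score is `(value - mean) / std_dev`. The sketch below spells that out in plain Rust, assuming the dataset version uses the sample standard deviation; this README does not say which one statify uses. The `_sketch` functions are hypothetical stand-ins, not the crate's code.

```rust
/// Standard score of one value; `None` when the spread is zero.
fn z_score_sketch(value: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if std_dev == 0.0 {
        None
    } else {
        Some((value - mean) / std_dev)
    }
}

/// Standard scores for a whole slice.
/// Uses the sample standard deviation (divisor n - 1), which is an assumption.
fn z_scores_sketch(data: &[f64]) -> Option<Vec<f64>> {
    let n = data.len();
    if n < 2 {
        return None;
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let sd = var.sqrt();
    // Collecting into Option<Vec<_>> yields None if any single score is undefined.
    data.iter().map(|&x| z_score_sketch(x, mean, sd)).collect()
}
```

With these definitions, a value of 75 against a mean of 50 and a standard deviation of 10 scores 2.5.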

### Normalization


```rust
use statify::{normalize_min_max, normalize_standard, normalize_range};

let data = vec![10.0, 20.0, 30.0, 40.0, 50.0];

// Min-max normalization (0 to 1)
let normalized = normalize_min_max(&data).unwrap();

// Standard normalization (z-scores)
let standardized = normalize_standard(&data).unwrap();

// Custom range normalization (-1 to 1)
let custom = normalize_range(&data, -1.0, 1.0).unwrap();
```
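
Min-max and custom-range normalization are the same linear map with different targets. A minimal sketch of that map in plain Rust follows; `rescale` is a hypothetical helper, and how statify treats a constant dataset is not documented here, so returning `None` in that case is an assumption.

```rust
/// Linearly maps `data` so its minimum lands on `lo` and its maximum on `hi`.
/// Returns `None` for empty or constant input (assumed behavior for this sketch).
fn rescale(data: &[f64], lo: f64, hi: f64) -> Option<Vec<f64>> {
    if data.is_empty() {
        return None;
    }
    let min = data.iter().copied().fold(f64::INFINITY, f64::min);
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if max == min {
        return None; // the map would divide by zero
    }
    let scale = (hi - lo) / (max - min);
    Some(data.iter().map(|x| lo + (x - min) * scale).collect())
}
```

Calling `rescale(&data, 0.0, 1.0)` corresponds to min-max normalization, and `rescale(&data, -1.0, 1.0)` to the custom range in the example above.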

### Linear Regression


```rust
use statify::linear_regression;

let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
let y = vec![2.1, 3.9, 6.2, 7.8, 10.1];

let result = linear_regression(&x, &y).unwrap();

println!("Slope: {}", result.slope);
println!("Intercept: {}", result.intercept);
println!("R²: {}", result.r_squared);

// Make predictions
let prediction = result.predict(6.0);
println!("Predicted y for x=6: {}", prediction);
```
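
To make the three reported quantities concrete, here is a small plain-Rust sketch of ordinary least squares for one predictor. `LineFit` and `fit_line` are hypothetical names for this example; they mirror the fields shown above but are not statify's types.

```rust
/// Result of a simple least-squares fit (hypothetical stand-in type).
struct LineFit {
    slope: f64,
    intercept: f64,
    r_squared: f64,
}

impl LineFit {
    /// Evaluates the fitted line at `x`.
    fn predict(&self, x: f64) -> f64 {
        self.intercept + self.slope * x
    }
}

/// Ordinary least squares: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x.
fn fit_line(x: &[f64], y: &[f64]) -> Option<LineFit> {
    let n = x.len();
    if n < 2 || n != y.len() {
        return None;
    }
    let mean_x = x.iter().sum::<f64>() / n as f64;
    let mean_y = y.iter().sum::<f64>() / n as f64;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mean_x, b - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 {
        return None; // all x values identical: the slope is undefined
    }
    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    // For a single predictor, R² equals the squared Pearson correlation.
    // A constant y is fitted exactly, so report 1.0 in that case.
    let r_squared = if syy == 0.0 { 1.0 } else { sxy * sxy / (sxx * syy) };
    Some(LineFit { slope, intercept, r_squared })
}
```

For the data in the example above, this sketch yields a slope of about 1.99 and an intercept of about 0.05, so the prediction at x = 6 is about 11.99.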

### Normal Distribution


```rust
use statify::{normal_pdf, normal_cdf, standard_normal_pdf, standard_normal_cdf};

// Custom normal distribution (mean=100, std_dev=15)
let pdf = normal_pdf(100.0, 100.0, 15.0).unwrap();
let cdf = normal_cdf(115.0, 100.0, 15.0).unwrap();

// Standard normal distribution (mean=0, std_dev=1)
let std_pdf = standard_normal_pdf(0.0);
let std_cdf = standard_normal_cdf(1.96);

println!("Standard normal CDF at 1.96: {}", std_cdf); // ~0.975
```
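
The Rust standard library has no error function, so any normal CDF needs either a dependency or an approximation. The sketch below uses the Abramowitz and Stegun formula 7.1.26 (absolute error around 1.5e-7 for `erf`) as one possible approach. This is a technique chosen for the example; the README does not say how statify computes its CDF, and the `_sketch` names are hypothetical.

```rust
use std::f64::consts::{PI, SQRT_2};

/// Error function via Abramowitz & Stegun 7.1.26 (approximate).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Density of N(mean, std_dev²) at `x`; `None` for a non-positive spread.
fn normal_pdf_sketch(x: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if std_dev <= 0.0 {
        return None;
    }
    let z = (x - mean) / std_dev;
    Some((-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt()))
}

/// P(X <= x) for X ~ N(mean, std_dev²), using the approximate erf above.
fn normal_cdf_sketch(x: f64, mean: f64, std_dev: f64) -> Option<f64> {
    if std_dev <= 0.0 {
        return None;
    }
    Some(0.5 * (1.0 + erf_approx((x - mean) / (std_dev * SQRT_2))))
}
```

With mean 100 and standard deviation 15, the value 115 sits one standard deviation above the mean, so the CDF there is roughly 0.8413.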

### Advanced Metrics


```rust
use statify::{skewness, kurtosis, coefficient_of_variation, standard_error};

let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];

let skew = skewness(&data).unwrap();
let kurt = kurtosis(&data).unwrap();
let cv = coefficient_of_variation(&data).unwrap();
let se = standard_error(&data).unwrap();

println!("Skewness: {}", skew);
println!("Kurtosis: {}", kurt);
println!("Coefficient of Variation: {}%", cv);
println!("Standard Error: {}", se);
```
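
Two of these metrics follow directly from the standard deviation: the standard error is `s / sqrt(n)` and the coefficient of variation is `s / mean * 100`. A minimal plain-Rust sketch follows, assuming the sample standard deviation is used for `s`. `se_and_cv` is a hypothetical helper. Skewness and kurtosis are left out because several competing definitions exist and the README does not say which one the crate follows.

```rust
/// Returns (standard error of the mean, coefficient of variation in percent).
/// Based on the sample standard deviation (divisor n - 1), which is an assumption.
fn se_and_cv(data: &[f64]) -> Option<(f64, f64)> {
    let n = data.len();
    if n < 2 {
        return None;
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    if mean == 0.0 {
        return None; // CV is undefined when the mean is zero
    }
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let sd = var.sqrt();
    Some((sd / (n as f64).sqrt(), sd / mean * 100.0))
}
```

For the values 1 through 10 this gives a standard error of about 0.957 and a coefficient of variation of about 55.05%.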

## API Overview


### Trait Methods (Stats)


Every method returns a `StatsResult<T>`, so failures come back as an `Err` value for the caller to handle:

- `mean()` - Arithmetic mean
- `median()` - Middle value when sorted
- `mode()` - Most frequent values
- `variance()` - Sample variance
- `std_dev()` - Sample standard deviation
- `variance_pop()` - Population variance
- `std_dev_pop()` - Population standard deviation
- `min()` - Minimum value
- `max()` - Maximum value
- `range()` - Difference between max and min
- `sum()` - Sum of all values
- `percentile(p)` - Value at the p-th percentile
- `quartile_1()` - 25th percentile
- `quartile_3()` - 75th percentile
- `iqr()` - Interquartile range (Q3 - Q1)
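
The sample and population variants in the list above differ only in the divisor: `n - 1` for a sample, `n` for a population. The sketch below shows that difference in plain Rust. `variance_with_ddof` is a hypothetical helper for illustration and is not part of the `Stats` trait.

```rust
/// Sum of squared deviations from the mean, divided by `n - ddof`.
/// `ddof = 1` gives the sample variance, `ddof = 0` the population variance.
fn variance_with_ddof(data: &[f64], ddof: usize) -> Option<f64> {
    let n = data.len();
    if n <= ddof {
        return None; // not enough data for this divisor
    }
    let mean = data.iter().sum::<f64>() / n as f64;
    let ss: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
    Some(ss / (n - ddof) as f64)
}
```

For `[2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]` the population variance is 4.0 and the sample variance is 32/7, about 4.571. The corresponding standard deviations are the square roots of those values.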

### Standalone Functions


**Correlation & Covariance**
- `correlation(x, y)` - Pearson correlation coefficient
- `covariance(x, y)` - Covariance between two datasets

**Normalization**
- `normalize_min_max(data)` - Min-max normalization (0 to 1)
- `normalize_standard(data)` - Standard normalization (z-scores)
- `normalize_range(data, min, max)` - Normalize to custom range

**Linear Regression**
- `linear_regression(x, y)` - Returns `LinearRegressionResult` with:
  - `slope` - Regression line slope
  - `intercept` - Y-intercept
  - `r_squared` - Coefficient of determination
  - `predict(x)` - Predict y for given x
  - `predict_many(x_values)` - Predict multiple values

**Normal Distribution**
- `normal_pdf(x, mean, std_dev)` - Probability density function
- `normal_cdf(x, mean, std_dev)` - Cumulative distribution function
- `standard_normal_pdf(x)` - Standard normal PDF (μ=0, σ=1)
- `standard_normal_cdf(x)` - Standard normal CDF (μ=0, σ=1)

**Standardization**
- `z_score(value, mean, std_dev)` - Standard score for a single value
- `z_scores(data)` - Standard scores for all values in a dataset

**Advanced Metrics**
- `standard_error(data)` - Standard error of the mean
- `coefficient_of_variation(data)` - CV expressed as percentage
- `skewness(data)` - Measure of distribution asymmetry
- `kurtosis(data)` - Measure of distribution tailedness (excess kurtosis)

## Error Handling


The library uses a custom `StatsError` enum for error handling:

- `EmptyDataset` - Dataset is empty
- `InsufficientData` - Not enough data for the operation
- `DivisionByZero` - Division by zero would occur

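Fallible statistical functions return `StatsResult<T>`, which is a `Result<T, StatsError>`. In the usage examples, `standard_normal_pdf`, `standard_normal_cdf`, and `predict` are called without `unwrap()`, which indicates that they return plain values.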
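
In practice, callers will usually `match` on the error instead of calling `unwrap()`. The sketch below shows that pattern with a local stand-in for the error type, so it compiles without the crate. The enum here only copies the three variant names listed above; which function returns which variant in statify itself is an assumption made for this example.

```rust
// Local stand-in mirroring the variant names above; not imported from statify.
#[derive(Debug, PartialEq)]
enum StatsError {
    EmptyDataset,
    InsufficientData,
    DivisionByZero,
}

type StatsResult<T> = Result<T, StatsError>;

/// Sample standard deviation with explicit error cases (assumed mapping).
fn sample_std_dev(data: &[f64]) -> StatsResult<f64> {
    match data.len() {
        0 => Err(StatsError::EmptyDataset),
        1 => Err(StatsError::InsufficientData),
        n => {
            let mean = data.iter().sum::<f64>() / n as f64;
            let ss: f64 = data.iter().map(|x| (x - mean).powi(2)).sum();
            Ok((ss / (n - 1) as f64).sqrt())
        }
    }
}

/// Z-score that reports a zero spread instead of dividing by it.
fn z_score_checked(value: f64, mean: f64, std_dev: f64) -> StatsResult<f64> {
    if std_dev == 0.0 {
        return Err(StatsError::DivisionByZero);
    }
    Ok((value - mean) / std_dev)
}

/// Turns a result into a message, handling every variant explicitly.
fn describe(result: StatsResult<f64>) -> String {
    match result {
        Ok(v) => format!("ok: {v:.3}"),
        Err(StatsError::EmptyDataset) => "no data supplied".to_string(),
        Err(StatsError::InsufficientData) => "need at least two values".to_string(),
        Err(StatsError::DivisionByZero) => "spread is zero".to_string(),
    }
}
```

Because the `match` is exhaustive, the compiler will flag any variant that is left unhandled.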

## License


MIT

## Contributing


Contributions are welcome. Please ensure tests pass before submitting pull requests.