
# NumRS Statistical Functions

The NumRS statistics module provides functions for data analysis: descriptive statistics, correlation and covariance, histograms, percentiles and quantiles, binning and counting, and weighted statistics.

## Key Features

- Descriptive statistics (mean, variance, std, min, max)
- Correlation and covariance calculations
- Histogram generation (1D and 2D)
- Percentile and quantile calculations
- Binning and counting operations
- Weighted statistics
- Array-based API similar to NumPy's statistics functions

## Basic Usage

### Descriptive Statistics

```rust
use numrs2::prelude::*;

// Create a sample array
let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);

// Basic statistics using the Statistics trait
let mean = data.mean();          // 3.0
let variance = data.var();       // 2.0 (population variance, i.e. divided by n)
let std_dev = data.std();        // 1.414... (square root of the variance above)
let minimum = data.min();        // 1.0
let maximum = data.max();        // 5.0

// Peak-to-peak range
let range = ptp(&data, None)?;   // 4.0
```
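
To make the numbers in the comments above concrete, here is a minimal plain-Rust sketch of the same quantities. It is not numrs2's implementation; the function names and the `ddof` parameter are hypothetical, and the only assumption taken from the example is that `var()` divides by `n` (which is what yields 2.0 for this data).

```rust
/// Arithmetic mean; returns None for an empty slice.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<f64>() / xs.len() as f64)
}

/// Variance with a configurable "delta degrees of freedom" (hypothetical parameter):
/// ddof = 0 gives the population variance, ddof = 1 the sample variance.
fn variance(xs: &[f64], ddof: usize) -> Option<f64> {
    if xs.len() <= ddof {
        return None;
    }
    let m = mean(xs)?;
    let sum_sq: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
    Some(sum_sq / (xs.len() - ddof) as f64)
}

/// Peak-to-peak range (max - min); returns None for an empty slice.
fn peak_to_peak(xs: &[f64]) -> Option<f64> {
    let first = *xs.first()?;
    let (lo, hi) = xs
        .iter()
        .fold((first, first), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    Some(hi - lo)
}
```

For `[1.0, 2.0, 3.0, 4.0, 5.0]`, `variance(&data, 0)` is 2.0 and `variance(&data, 1)` is 2.5, so the choice of divisor matters when comparing results across libraries.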

### Percentiles and Quantiles

```rust
use numrs2::prelude::*; // brings `Array` into scope
use numrs2::stats::{percentile, quantile};

// Create data array
let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);

// Calculate percentiles (0-100 scale)
let p_points = Array::from_vec(vec![0.0, 25.0, 50.0, 75.0, 100.0]);
let p_values = percentile(&data, &p_points, Some("linear"))?;
// -> [1.0, 2.0, 3.0, 4.0, 5.0]

// Calculate quantiles (0-1 scale)
let q_points = Array::from_vec(vec![0.0, 0.25, 0.5, 0.75, 1.0]);
let q_values = quantile(&data, &q_points, Some("linear"))?;
// -> [1.0, 2.0, 3.0, 4.0, 5.0]

// Different interpolation methods are available:
// - "linear": Linear interpolation between points
// - "lower": Use the lower data point
// - "higher": Use the higher data point
// - "nearest": Use the nearest data point
// - "midpoint": Use the midpoint between adjacent data points
```
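
The five interpolation methods differ only in how they resolve a quantile that falls between two sorted data points. The sketch below illustrates this in plain Rust. It is not numrs2's code: `quantile_sketch` and `percentile_sketch` are hypothetical names, and the tie-breaking rule for `"nearest"` (an exact half goes to the higher point) is an assumption that may differ from the library.

```rust
/// Quantile of `data` at `q` in [0, 1] using the named interpolation method.
/// Returns None for empty input, q outside [0, 1], or an unknown method.
fn quantile_sketch(data: &[f64], q: f64, method: &str) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional position of the quantile within the sorted data.
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;

    match method {
        "linear" => Some(sorted[lo] + frac * (sorted[hi] - sorted[lo])),
        "lower" => Some(sorted[lo]),
        "higher" => Some(sorted[hi]),
        // Assumption: an exact tie (frac == 0.5) picks the higher point.
        "nearest" => Some(if frac < 0.5 { sorted[lo] } else { sorted[hi] }),
        "midpoint" => Some((sorted[lo] + sorted[hi]) / 2.0),
        _ => None,
    }
}

/// Percentiles are quantiles on a 0-100 scale.
fn percentile_sketch(data: &[f64], p: f64, method: &str) -> Option<f64> {
    quantile_sketch(data, p / 100.0, method)
}
```

With the data above and `q = 0.4`, the position is 1.6, between the values 2.0 and 3.0: `"linear"` gives about 2.6, `"lower"` 2.0, `"higher"` 3.0, `"nearest"` 3.0, and `"midpoint"` 2.5.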

### Correlation and Covariance

```rust
use numrs2::prelude::*; // brings `Array` into scope
use numrs2::stats::{cov, corrcoef};

// Create two data arrays
let x = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
let y = Array::from_vec(vec![5.0, 4.0, 3.0, 2.0, 1.0]);

// Calculate covariance
let covariance = cov(&x, &y)?;  // -2.5 (sample covariance, i.e. divided by n - 1)

// Calculate correlation coefficient
let correlation = corrcoef(&x, &y)?;  // -1.0 (perfect negative correlation)
```
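
The values -2.5 and -1.0 can be reproduced by hand. The following plain-Rust sketch shows the underlying formulas under the assumption that the covariance is normalized by `n - 1`, which is what produces -2.5 for this data. The function names are hypothetical and this is not the library's implementation.

```rust
/// Sample covariance of two equal-length slices (normalized by n - 1).
fn covariance(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len();
    if n != y.len() || n < 2 {
        return None;
    }
    let mean_x = x.iter().sum::<f64>() / n as f64;
    let mean_y = y.iter().sum::<f64>() / n as f64;
    let sum: f64 = x
        .iter()
        .zip(y)
        .map(|(a, b)| (a - mean_x) * (b - mean_y))
        .sum();
    Some(sum / (n - 1) as f64)
}

/// Pearson correlation coefficient: cov(x, y) / sqrt(var(x) * var(y)).
/// Returns None when either input has zero variance.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    let cov_xy = covariance(x, y)?;
    let denom = (covariance(x, x)? * covariance(y, y)?).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(cov_xy / denom)
    }
}
```

Because the same normalization appears in the numerator and the denominator, the correlation coefficient does not depend on whether `n` or `n - 1` is used.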

### Histograms

```rust
use numrs2::prelude::*; // brings `Array` into scope
use numrs2::stats::{histogram, histogram2d};

// Create sample data
let data = Array::from_vec(vec![1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]);

// Create a histogram with 5 bins
let (hist, bin_edges) = histogram(&data, 5, None, None)?;

// Create a 2D histogram
let x = Array::from_vec(vec![1.0, 2.0, 2.5, 3.0, 4.0, 5.0]);
let y = Array::from_vec(vec![2.0, 3.0, 3.5, 4.0, 5.0, 6.0]);

// Create a 2D histogram with 3x3 bins
let (hist2d, x_edges, y_edges) = histogram2d(&x, &y, 3, None, None)?;
```
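
As a rough illustration of what a 1D histogram call returns, here is a plain-Rust sketch of equal-width binning. It assumes NumPy-style conventions (bins span the data's minimum to maximum, bins are half-open, and the last bin includes the maximum); whether numrs2 follows exactly these conventions is an assumption. `histogram_sketch` is a hypothetical name, and the input is assumed to contain no NaN values.

```rust
/// Equal-width histogram over [min, max] of the data.
/// Returns (counts, bin_edges) where bin_edges has `bins + 1` entries.
fn histogram_sketch(data: &[f64], bins: usize) -> Option<(Vec<usize>, Vec<f64>)> {
    if data.is_empty() || bins == 0 {
        return None;
    }
    let lo = data.iter().copied().fold(f64::INFINITY, f64::min);
    let hi = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Assumption: widen a degenerate range so the bin width is never zero.
    let (lo, hi) = if lo == hi { (lo - 0.5, hi + 0.5) } else { (lo, hi) };
    let width = (hi - lo) / bins as f64;

    let edges: Vec<f64> = (0..=bins).map(|i| lo + width * i as f64).collect();
    let mut counts = vec![0usize; bins];
    for &x in data {
        // Half-open bins [edge_i, edge_{i+1}); clamping puts `hi` in the last bin.
        let idx = (((x - lo) / width) as usize).min(bins - 1);
        counts[idx] += 1;
    }
    Some((counts, edges))
}
```

For the nine-element `data` array above and 5 bins, this sketch yields edges at 1.0, 1.8, 2.6, 3.4, 4.2, 5.0 (up to rounding) and counts `[2, 2, 1, 2, 2]`.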

### Binning and Counting

```rust
use numrs2::prelude::*; // brings `Array` into scope
use numrs2::stats::{bincount, digitize};

// Count occurrences of integers
let data = Array::from_vec(vec![0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0]);
let counts = bincount(&data, None, Some(5))?;
// -> [1, 2, 3, 4, 0]

// Assign values to bins
let values = Array::from_vec(vec![0.2, 1.4, 2.5, 3.7, 4.8]);
let bins = Array::from_vec(vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0]);
let indices = digitize(&values, &bins, Some(false))?;
// Each entry of `indices` is the bin index for the corresponding entry of `values`
```
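
The two operations can be sketched in plain Rust as follows. The sketch assumes NumPy's conventions: `bincount` pads the result to a minimum length, and `digitize` on increasing bins returns the index `i` with `bins[i-1] <= x < bins[i]` when `right` is false. Whether numrs2's `Some(false)` argument corresponds to this `right` flag is an assumption, and both function names are hypothetical.

```rust
/// Count occurrences of each non-negative integer, padding to `min_length`.
fn bincount_sketch(values: &[usize], min_length: usize) -> Vec<usize> {
    let len = values.iter().max().map_or(0, |m| m + 1).max(min_length);
    let mut counts = vec![0usize; len];
    for &v in values {
        counts[v] += 1;
    }
    counts
}

/// For increasing `bins`, return for each value the index i such that
/// bins[i-1] <= x < bins[i] (right = false) or bins[i-1] < x <= bins[i] (right = true).
/// Values below bins[0] map to 0; values beyond the last edge map to bins.len().
fn digitize_sketch(values: &[f64], bins: &[f64], right: bool) -> Vec<usize> {
    values
        .iter()
        .map(|&x| {
            if right {
                // Number of edges strictly below x.
                bins.partition_point(|&b| b < x)
            } else {
                // Number of edges less than or equal to x.
                bins.partition_point(|&b| b <= x)
            }
        })
        .collect()
}
```

Under these conventions, the `values` and `bins` from the example above map to `[1, 2, 3, 4, 5]`, and a value that sits exactly on an edge moves to the next bin only when `right` is false.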

### Weighted Statistics

```rust
use numrs2::prelude::*; // brings `Array` into scope
use numrs2::stats::{average, histogram};

// Create data and weights
let data = Array::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
let weights = Array::from_vec(vec![5.0, 4.0, 3.0, 2.0, 1.0]);

// Calculate weighted average: sum(data * weights) / sum(weights) = 35 / 15 = 2.333...
let weighted_avg = average(&data, Some(&weights), None, None)?;
// Regular (unweighted) average for comparison: 3.0
let avg = average(&data, None, None, None)?;

// Create weighted histogram
let (hist, bin_edges) = histogram(&data, 5, None, Some(&weights))?;
```
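
The weighted average is the sum of each value times its weight, divided by the sum of the weights. A minimal plain-Rust sketch of that formula, with a hypothetical function name and error handling chosen for illustration only:

```rust
/// Weighted mean: sum(w_i * x_i) / sum(w_i).
/// Returns None for empty input, mismatched lengths, or weights that sum to zero.
fn weighted_average(data: &[f64], weights: &[f64]) -> Option<f64> {
    if data.is_empty() || data.len() != weights.len() {
        return None;
    }
    let total_weight: f64 = weights.iter().sum();
    if total_weight == 0.0 {
        return None;
    }
    let weighted_sum: f64 = data.iter().zip(weights).map(|(x, w)| x * w).sum();
    Some(weighted_sum / total_weight)
}
```

For the data and weights above this gives 35 / 15, about 2.333, which is lower than the unweighted mean of 3.0 because the larger weights sit on the smaller values.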

## Examples

For detailed examples, see:
- `statistics_example.rs`: Comprehensive demonstration of statistical functions

## Implementation Details

- The statistical functions in NumRS are designed to be numerically stable
- Functions support various data types through generic implementations
- Most functions include optional parameters for customization
- Where applicable, functions can operate along specific array axes
- The implementation aims to match NumPy's behavior for compatibility
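
As an illustration of what numerical stability means for a statistic such as variance, the sketch below uses Welford's online algorithm, a well-known technique that avoids subtracting two large, nearly equal sums. This document does not say which algorithm NumRS uses, so treat this as a generic example with hypothetical names rather than a description of the library's internals.

```rust
/// Running mean and variance using Welford's online algorithm.
#[derive(Default)]
struct RunningStats {
    count: u64,
    mean: f64,
    /// Sum of squared deviations from the current mean.
    m2: f64,
}

impl RunningStats {
    /// Incorporate one observation in O(1) time and memory.
    fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        // Uses the deviation from the old mean and from the updated mean.
        self.m2 += delta * (x - self.mean);
    }

    /// Population variance (divided by n); None until a value has been pushed.
    fn variance(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.m2 / self.count as f64)
        }
    }
}
```

Pushing `1e9 + 1.0` through `1e9 + 5.0` still yields a variance of 2.0, whereas the textbook formula `mean(x^2) - mean(x)^2` loses most of its precision at that magnitude.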

## Performance Notes

- Functions are optimized for moderate-sized datasets
- For very large datasets, consider using parallel processing with the `parallel_optimize` module
- Some statistical operations require sorting the data, an O(n log n) step that can dominate the runtime for large arrays
- Weighted statistics may have additional overhead compared to unweighted operations
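
When only a single order statistic is needed, a full sort can be avoided. The sketch below uses the standard library's `select_nth_unstable_by`, which runs in linear time on average, to compute a median. This is a general technique shown with hypothetical function names; it is not a statement about how NumRS implements its functions.

```rust
/// k-th smallest element (0-based) without fully sorting the data.
fn kth_smallest(data: &[f64], k: usize) -> Option<f64> {
    if k >= data.len() {
        return None;
    }
    // Work on a copy so the caller's data keeps its order.
    let mut buf = data.to_vec();
    let (_, kth, _) = buf.select_nth_unstable_by(k, |a, b| a.total_cmp(b));
    Some(*kth)
}

/// Median via selection; averages the two middle values for even lengths.
fn median(data: &[f64]) -> Option<f64> {
    let n = data.len();
    if n == 0 {
        return None;
    }
    if n % 2 == 1 {
        kth_smallest(data, n / 2)
    } else {
        let lower = kth_smallest(data, n / 2 - 1)?;
        let upper = kth_smallest(data, n / 2)?;
        Some((lower + upper) / 2.0)
    }
}
```

The trade-off is that selection answers one query per pass, so when many percentiles of the same array are needed, sorting once and indexing is usually the better choice.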