Function pearson_correlation

pub fn pearson_correlation(a: &[f32], b: &[f32]) -> f32

Computes the Pearson correlation coefficient between two vectors a and b.

Two mathematically equivalent formulas:

  1. Deviations from the mean (used by this implementation for better numerical stability): $r = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sqrt{\sum(x - \bar{x})^2\sum(y - \bar{y})^2}}$

  2. Direct computation: $r = \frac{n\sum xy - \sum x\sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$

Here $x$ and $y$ denote the paired elements of a and b, $\bar{x}$ and $\bar{y}$ are their means, $n$ is the common length of the vectors, and each sum runs over all $n$ pairs.
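The two formulas agree because expanding the centered sums and substituting $\bar{x} = \frac{1}{n}\sum x$ and $\bar{y} = \frac{1}{n}\sum y$ gives:

$$\sum(x - \bar{x})(y - \bar{y}) = \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y} = \sum xy - \frac{\sum x\sum y}{n} = \frac{1}{n}\left(n\sum xy - \sum x\sum y\right)$$

Setting $y = x$ gives $\sum(x - \bar{x})^2 = \frac{1}{n}\left(n\sum x^2 - (\sum x)^2\right)$, and likewise for $y$. The factor $\frac{1}{n}$ then cancels between numerator and denominator, so Formula 1 reduces to Formula 2. The equivalence is exact in real arithmetic only; in floating point the two can give different results.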

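Note: Formula 1 is used in this implementation because it:

  • Avoids catastrophic cancellation: when the data have a large mean relative to their spread, $n\sum x^2$ and $(\sum x)^2$ in Formula 2 are large, nearly equal numbers, and subtracting them in f32 discards most of the significant digits
  • Keeps intermediate sums small by centering the data first, which lowers the risk of overflow for large values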
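To see why centering matters in f32, the sketch below evaluates only the variance term of each formula for five values with a spread of 1 to 5 on top of an offset of 10 000. The helper names `direct_term` and `centered_term` are hypothetical and not part of `nwr`. The squares near $10^8$ are rounded to multiples of 8 and the products near $2.5 \times 10^9$ to multiples of 256, so the two large terms in Formula 2 cancel to 0 instead of the exact value 50. The centered version works with the deviations −2 to 2 and stays exact.

```rust
/// Formula 2 variance term: n*Σx² - (Σx)², evaluated in f32.
fn direct_term(x: &[f32]) -> f32 {
    let n = x.len() as f32;
    let sum: f32 = x.iter().sum();
    let sum_sq: f32 = x.iter().map(|v| v * v).sum();
    n * sum_sq - sum * sum
}

/// Formula 1 variance term scaled by n: n*Σ(x - x̄)², evaluated in f32.
fn centered_term(x: &[f32]) -> f32 {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let ss: f32 = x.iter().map(|v| (v - mean) * (v - mean)).sum();
    n * ss
}

fn main() {
    // Small spread around a large offset; the exact value of both terms is 50.
    let x = [10001.0f32, 10002.0, 10003.0, 10004.0, 10005.0];
    println!("direct   = {}", direct_term(&x)); // prints "direct   = 0"
    println!("centered = {}", centered_term(&x)); // prints "centered = 50"
}
```

With a zero variance term, Formula 2 would report these perfectly ordered values as having no spread at all, and a correlation computed from it would be 0/0 = NaN.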

§Arguments

  • a - The first vector.
  • b - The second vector.

§Returns

The Pearson correlation coefficient between a and b. If either vector is empty or their lengths do not match, returns NaN.
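A minimal two-pass sketch of Formula 1 that follows the return contract above might look like this. The name `pearson_sketch` is hypothetical; this is an illustration under that assumption, not the actual source of `nwr::pearson_correlation`.

```rust
/// Hypothetical two-pass sketch of Formula 1 (deviations from the mean).
/// Returns NaN for empty or length-mismatched inputs, as documented above.
fn pearson_sketch(a: &[f32], b: &[f32]) -> f32 {
    if a.is_empty() || a.len() != b.len() {
        return f32::NAN;
    }
    let n = a.len() as f32;

    // First pass: the means x̄ and ȳ.
    let mean_a = a.iter().sum::<f32>() / n;
    let mean_b = b.iter().sum::<f32>() / n;

    // Second pass: accumulate Σ(x-x̄)(y-ȳ), Σ(x-x̄)² and Σ(y-ȳ)².
    let (mut cov, mut var_a, mut var_b) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b) {
        let dx = x - mean_a;
        let dy = y - mean_b;
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }

    // If every deviation of one input is exactly 0, this is 0/0 = NaN.
    cov / (var_a * var_b).sqrt()
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
    let b = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0];
    println!("{:.4}", pearson_sketch(&a, &b)); // prints "-1.0000"
}
```

Two passes are needed because the means must be known before any deviation can be formed. A one-pass alternative, such as Welford's online algorithm, updates running means and co-moments instead, at the cost of a slightly more involved update step.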

§Examples

let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
let b = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0];
let correlation = nwr::pearson_correlation(&a, &b);
assert_eq!(format!("{:.4}", correlation), "-1.0000".to_string()); // Perfect negative correlation

let empty: [f32; 0] = [];
assert!(nwr::pearson_correlation(&empty, &empty).is_nan()); // Check handling of empty vectors