This crate provides a reasonably efficient implementation of Sourav Chatterjee’s xi-correlation coefficient, based on the original paper.
Chatterjee’s xi provides a measure of one variable’s dependence on another
in a much more general sense than, for example, Pearson’s correlation
coefficient. Suppose we have a sequence of random x values uniformly
distributed from zero to tau (2π). For each one, we compute y = cos(x).
Pearson’s correlation coefficient will be roughly zero for this data, as it
measures only linear dependence. On the other hand, Chatterjee’s xi will be
close to 1, representing that y is strongly a function of x, regardless
of what function that may be.
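The contrast between the two coefficients can be sketched in plain Rust. The block below is an illustration only and is not this crate's implementation: `pearson`, `xi_no_ties` and `cosine_demo` are hypothetical helper names, the xi formula used is the simple no-ties form from Chatterjee's paper, and an evenly spaced grid stands in for random draws because the standard library has no random number generator. It uses y = cos(x) over one full period, a case in which the linear correlation essentially vanishes.

```rust
use std::f64::consts::TAU;

/// Pearson's correlation coefficient of two equally long samples.
fn pearson(xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (x, y) in xs.iter().zip(ys) {
        let (dx, dy) = (x - mean_x, y - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    sxy / (sxx * syy).sqrt()
}

/// Chatterjee's xi for data without ties (illustrative; not this crate's code).
fn xi_no_ties(xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len();
    // rank[i] = 1-based rank of ys[i] among all y values.
    let mut by_y: Vec<usize> = (0..n).collect();
    by_y.sort_by(|&a, &b| ys[a].total_cmp(&ys[b]));
    let mut rank = vec![0usize; n];
    for (pos, &i) in by_y.iter().enumerate() {
        rank[i] = pos + 1;
    }
    // Visit the pairs in increasing order of x and sum the rank jumps.
    let mut by_x: Vec<usize> = (0..n).collect();
    by_x.sort_by(|&a, &b| xs[a].total_cmp(&xs[b]));
    let jumps: usize = by_x
        .windows(2)
        .map(|w| rank[w[0]].abs_diff(rank[w[1]]))
        .sum();
    1.0 - 3.0 * jumps as f64 / ((n * n - 1) as f64)
}

/// Returns (Pearson, xi) for y = cos(x) on an evenly spaced grid over [0, tau).
fn cosine_demo(n: usize) -> (f64, f64) {
    // The quarter-step offset keeps the grid asymmetric so no two y values tie.
    let xs: Vec<f64> = (0..n).map(|i| TAU * (i as f64 + 0.25) / n as f64).collect();
    let ys: Vec<f64> = xs.iter().map(|x| x.cos()).collect();
    (pearson(&xs, &ys), xi_no_ties(&xs, &ys))
}
```

Calling `cosine_demo(1000)` should give a Pearson value very near zero and a xi value just below 1. Both sorts dominate the cost, which is in line with the profiling remark in the highlights.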
§Highlights
- Extremely simple to use (just call xicor(), xicorf(), etc., with two slices containing the data)
- Generic over Ord, as xi does not require calculations on the elements themselves, only the ability to compare them. In principle even strings could be correlated in this manner (lexicographically), for example.
- Quite fast. In release mode on a 12-year-old machine (Dell M4700), xicorf was able to process 1,000,000 pairs in 0.33 seconds. Profiling revealed that 80% of this time was spent in the standard library’s sorting routines.
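To illustrate why comparisons alone are enough, here is a minimal sketch of xi written against the `Ord` trait. It is an assumption-laden illustration, not the crate's source: `xi_ord` is a hypothetical name, ties in x simply keep their input order (the paper breaks them at random), and ties in y are handled with the general formula from Chatterjee's paper, xi = 1 - n Σ|r(i+1) - r(i)| / (2 Σ l(i)(n - l(i))).

```rust
/// Sketch of Chatterjee's xi for any orderable data (not the crate's own code).
/// Returns NaN when every y value is identical, since the denominator is zero.
fn xi_ord<X: Ord, Y: Ord>(xs: &[X], ys: &[Y]) -> f64 {
    assert_eq!(xs.len(), ys.len(), "inputs must have equal length");
    let n = xs.len();

    // Sorted references to the y values, used for counting by binary search.
    let mut sorted: Vec<&Y> = ys.iter().collect();
    sorted.sort();

    // r(y) = number of values <= y, l(y) = number of values >= y.
    let r = |y: &Y| sorted.partition_point(|v| *v <= y);
    let l = |y: &Y| n - sorted.partition_point(|v| *v < y);

    // Indices of the pairs in increasing order of x (stable, so ties in x
    // keep their input order).
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| xs[a].cmp(&xs[b]));

    // Numerator: total jump in r between neighbours in x order.
    let num: usize = order
        .windows(2)
        .map(|w| r(&ys[w[0]]).abs_diff(r(&ys[w[1]])))
        .sum();
    // Denominator term: sum of l * (n - l) over all points.
    let den: usize = ys
        .iter()
        .map(|y| {
            let li = l(y);
            li * (n - li)
        })
        .sum();

    1.0 - (n as f64 * num as f64) / (2.0 * den as f64)
}
```

No arithmetic is ever performed on the elements, so string slices work as well as integers. For instance, passing the same five distinct words as both arguments gives 1 - 3/(n + 1) = 0.5, the largest value the coefficient can reach for five tie-free pairs.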
§Progress
- Calculation of the xi coefficient itself
- P-values for testing independence
§Functions
- xicor: Calculate the xi-correlation of two sequences whose values are orderable (they implement Ord).
- xicor_norm: Calculate the normalised xi-correlation of two sequences whose values are orderable (they implement Ord).
- xicorf: Calculate the xi-correlation of two floating-point sequences.
- xicorf_norm: Calculate the normalised xi-correlation of two floating-point sequences.