Crate hoeffding_integer_d

The Hoeffding Dependence Coefficient is good at finding associations, even in many nonlinear situations. For genetic algorithms it can characterize fitness, especially where Pearson’s correlation R strongly promotes linear solutions to nonlinear problems. This integer version of Hoeffding’s statistic uses an integer representation of half- and quarter-matched rankings. The integer form was inspired by the observation that the original normalized function has a Pochhammer (falling factorial) denominator, so the digits that indicate progress could vanish in floating-point conversion for large numbers of pairs (roughly n > 1585 with f64).
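As a rough illustration of the size of that denominator, the sketch below computes n(n-1)(n-2)(n-3)(n-4) exactly in u128 and compares it with 2^53, the point past which f64 can no longer represent every integer. The names `denominator`, `F64_EXACT_LIMIT`, and `exceeds_f64_exact_range` are hypothetical and not part of this crate, and using 2^53 as the cutoff is an assumption made for the illustration.

```rust
/// Falling factorial n(n-1)(n-2)(n-3)(n-4), the denominator of the normalized D.
/// Hypothetical helper, not part of the crate. Yields 0 for n < 5.
fn denominator(n: u128) -> u128 {
    (0..5).map(|k| n.saturating_sub(k)).product()
}

/// Above 2^53, f64 can no longer represent every integer exactly
/// (assumed here as the cutoff for "digits start to vanish").
const F64_EXACT_LIMIT: u128 = 1 << 53;

/// True when the denominator for n pairs is larger than 2^53.
fn exceeds_f64_exact_range(n: u128) -> bool {
    denominator(n) > F64_EXACT_LIMIT
}
```

Under these assumptions, `exceeds_f64_exact_range(1000)` is false and `exceeds_f64_exact_range(1585)` is true, which is in line with the approximate threshold quoted above.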

Late-1950s version of Hoeffding’s Dependence Coefficient:
i = index of the individual (first, second, third, fourth, fifth, ...) pairs in the two lists (imagine a list of X and Y values plotted as dots on a pixel screen)
n = total number of pairs; n must be 5 or greater, and n must be the same for the first and second lists
Ri = rank of each element in the first list (rankings may have ties/duplicates)
Si = rank of each element in the second list (rankings may have ties/duplicates)
Qi = bivariate rank of Ri and Si (rankings may have partial ties on one axis, or duplicates)

D = 30 * { (n-2)(n-3) * Sum[(Qi-1)(Qi-2), {i,1,n}] + Sum[(Ri-1)(Ri-2)(Si-1)(Si-2), {i,1,n}] - 2(n-2) * Sum[(Ri-2)(Si-2)(Qi-1), {i,1,n}] } / ( n(n-1)(n-2)(n-3)(n-4) )
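The formula above can be sketched directly in floating point. This is a minimal O(n^2) illustration, not the crate's implementation: the names `phi` and `hoeffding_d_f64` are hypothetical, and ties are assumed to count as one half on a single axis (so one quarter when both axes tie), following the half and quarter matched rankings mentioned earlier.

```rust
/// Weight of value `a` relative to value `b`:
/// 1 if a is below b, 1/2 if tied, 0 otherwise. Hypothetical helper.
fn phi(a: f64, b: f64) -> f64 {
    if a < b {
        1.0
    } else if a == b {
        0.5
    } else {
        0.0
    }
}

/// Normalized Hoeffding D for paired lists, or None when n < 5
/// or the lists differ in length. Illustrative sketch only.
fn hoeffding_d_f64(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len();
    if n < 5 || y.len() != n {
        return None;
    }
    let (mut sum_q, mut sum_rs, mut sum_rsq) = (0.0_f64, 0.0_f64, 0.0_f64);
    for i in 0..n {
        // Ranks start at 1; every other point adds its weight.
        let (mut r, mut s, mut q) = (1.0_f64, 1.0_f64, 1.0_f64);
        for j in 0..n {
            if j == i {
                continue;
            }
            let px = phi(x[j], x[i]);
            let py = phi(y[j], y[i]);
            r += px; // Ri: rank in the first list
            s += py; // Si: rank in the second list
            q += px * py; // Qi: bivariate rank
        }
        sum_q += (q - 1.0) * (q - 2.0);
        sum_rs += (r - 1.0) * (r - 2.0) * (s - 1.0) * (s - 2.0);
        sum_rsq += (r - 2.0) * (s - 2.0) * (q - 1.0);
    }
    let nf = n as f64;
    let numerator = (nf - 2.0) * (nf - 3.0) * sum_q + sum_rs - 2.0 * (nf - 2.0) * sum_rsq;
    let denominator = nf * (nf - 1.0) * (nf - 2.0) * (nf - 3.0) * (nf - 4.0);
    Some(30.0 * numerator / denominator)
}
```

For five untied pairs in perfectly increasing order (for example 1, 2, 3, 4, 5 against itself) the three sums are 20, 184, and 50, the numerator is 4, and D comes out as 30 * 4 / 120 = 1.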

The “Hoeffding Integer D” value presented here is the original Hoeffding Dependence Coefficient multiplied by (256/30) * n(n-1)(n-2)(n-3)(n-4):
D_Hoeffding_Integer = 256 * { (n-2)(n-3) * Sum[(Qi-1)(Qi-2), {i,1,n}] + Sum[(Ri-1)(Ri-2)(Si-1)(Si-2), {i,1,n}] - 2(n-2) * Sum[(Ri-2)(Si-2)(Qi-1), {i,1,n}] }
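One way to keep everything in integers is to store each rank in quarter units, so that rank 1 is stored as 4 and a quarter tie adds 1. The sketch below follows that idea; it is an assumption-laden illustration rather than the crate's code, and the names `phi2` and `integer_d_sketch` are hypothetical. It uses i128 accumulators and is generic over any `PartialOrd` types, so the two lists need not share a type.

```rust
use std::cmp::Ordering;

/// Twice the tie weight: 2 if a is below b, 1 if tied,
/// 0 if above or incomparable (e.g. NaN). Hypothetical helper.
fn phi2<T: PartialOrd>(a: &T, b: &T) -> i128 {
    match a.partial_cmp(b) {
        Some(Ordering::Less) => 2,
        Some(Ordering::Equal) => 1,
        _ => 0,
    }
}

/// Integer-scaled Hoeffding D (256/30 * n(n-1)(n-2)(n-3)(n-4) times D),
/// or None when n < 5 or the lists differ in length. Illustrative sketch.
fn integer_d_sketch<A: PartialOrd, B: PartialOrd>(xs: &[A], ys: &[B]) -> Option<i128> {
    let n = xs.len();
    if n < 5 || ys.len() != n {
        return None;
    }
    let m = n as i128;
    let (mut sum_q, mut sum_rs, mut sum_rsq) = (0_i128, 0_i128, 0_i128);
    for i in 0..n {
        // Ranks in quarter units: 4 stands for rank 1.
        let (mut r4, mut s4, mut q4) = (4_i128, 4_i128, 4_i128);
        for j in 0..n {
            if j == i {
                continue;
            }
            let px = phi2(&xs[j], &xs[i]);
            let py = phi2(&ys[j], &ys[i]);
            r4 += 2 * px; // 4 * (1, 1/2, or 0)
            s4 += 2 * py;
            q4 += px * py; // 4 * (1, 1/2, 1/4, or 0)
        }
        sum_q += (q4 - 4) * (q4 - 8); // 16 * (Qi-1)(Qi-2)
        sum_rs += (r4 - 4) * (r4 - 8) * (s4 - 4) * (s4 - 8); // 256 * (Ri-1)(Ri-2)(Si-1)(Si-2)
        sum_rsq += (r4 - 8) * (s4 - 8) * (q4 - 4); // 64 * (Ri-2)(Si-2)(Qi-1)
    }
    // Bring every term to the common factor of 256.
    Some(16 * (m - 2) * (m - 3) * sum_q + sum_rs - 2 * (m - 2) * 4 * sum_rsq)
}
```

For five untied, perfectly increasing pairs this returns 1024, which equals (256/30) * 120 * 1 and so agrees with a normalized D of 1.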

Minimum and maximum possible “Hoeffding Integer D” values are offered for a sense of scale.

Please forgive that I did not implement Blum, Kiefer, and Rosenblatt’s 1961 paper, or the 2017 “Simplified vs. Easier” papers by Zheng or Pelekis, to turn the raw value into a precise probability.
Oye, thar be dragons. The intent for “Hoeffding Integer” in machine learning is that larger values indicate a greater probability of association, even if the scale is skewed.

Cheers and enjoy Rust language! Dustan Doud (September 2021)
PS: the math is documented with comments. The 256 above is a design choice: I have represented integer Qi, Ri, and Si in quarters, so (Ri-1)(Ri-2)(Si-1)(Si-2) becomes 4(Ri-1) * 4(Ri-2) * 4(Si-1) * 4(Si-2), and 4 x 4 x 4 x 4 = 256.

Functions

Hoeffding Dependence Coefficient as Integer: a reasonable way to assign fitness scores to models when their linearity is not known. Use:
// let list1: Vec<&str> = vec!["a", "b", "c", "d", "e", "f", "f", "g", "g", "h"];
// let list2: Vec<_> = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// let association_correlation_connection_relationship_or_dependence_of_two_lists = hoeffding_integer_d(list1, list2); // bigger values are more coupled
Sometimes you need to compare apples and oranges. Generics let that happen.

hoeffding_integer_maximum returns the maximum dependence value for the integer form of the Hoeffding statistic.