
Crate logp


§logp

Information theory primitives: entropies and divergences.

§Scope

Scalar information measures that appear across clustering, ranking, evaluation, and geometry:

  • Shannon entropy and cross-entropy
  • KL / Jensen–Shannon divergences
  • Csiszár (f)-divergences (a.k.a. information monotone divergences)
  • Bhattacharyya coefficient, Rényi/Tsallis families
  • Bregman divergences (convex-analytic, not generally monotone)
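The first two bullets can be made concrete with a minimal standalone sketch of entropy and cross-entropy in nats. It mirrors what `entropy_nats` / `cross_entropy_nats` compute, but does not assume the crate's actual signatures:

```rust
/// Shannon entropy in nats, with the 0 ln 0 = 0 convention.
fn entropy_nats(p: &[f64]) -> f64 {
    p.iter().filter(|x| **x > 0.0).map(|x| -x * x.ln()).sum()
}

/// Cross-entropy H(p, q) in nats: expected code length under model `q`
/// for data drawn from `p`.
fn cross_entropy_nats(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .filter(|(px, _)| **px > 0.0)
        .map(|(px, qx)| -px * qx.ln())
        .sum()
}

fn main() {
    let uniform = [0.25; 4];
    // Uniform entropy over 4 outcomes is ln 4.
    assert!((entropy_nats(&uniform) - 4f64.ln()).abs() < 1e-12);
    // Gibbs' inequality: H(p, q) >= H(p), with equality iff p == q.
    let q = [0.4, 0.3, 0.2, 0.1];
    assert!(cross_entropy_nats(&uniform, &q) >= entropy_nats(&uniform));
}
```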

§Distances vs divergences (terminology that prevents bugs)

A divergence (D(p:q)) is usually required to satisfy:

  • (D(p:q) \ge 0)
  • (D(p:p) = 0)

but it is typically not symmetric and not a metric (no triangle inequality). Many failures in downstream code are caused by treating a divergence as a distance metric.
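A short KL sketch makes the asymmetry concrete (standalone; the crate's `kl_divergence` signature may differ):

```rust
/// KL divergence D(p:q) in nats, with the 0 ln 0 = 0 convention.
/// Assumes q_i > 0 wherever p_i > 0.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .filter(|(px, _)| **px > 0.0)
        .map(|(px, qx)| px * (px / qx).ln())
        .sum()
}

fn main() {
    let p = [0.9, 0.1];
    let q = [0.5, 0.5];
    let forward = kl_divergence(&p, &q); // ~0.368 nats
    let reverse = kl_divergence(&q, &p); // ~0.511 nats
    assert!(forward >= 0.0 && reverse >= 0.0); // nonnegativity
    assert!(kl_divergence(&p, &p).abs() < 1e-12); // D(p:p) = 0
    assert!((forward - reverse).abs() > 1e-3); // but NOT symmetric
}
```

Passing `forward` where downstream code expects a symmetric distance is exactly the bug class this terminology distinction is meant to prevent.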

§Key invariants (what tests should enforce)

  • Jensen–Shannon is bounded on the simplex: (0 \le JS(p,q) \le \ln 2) (nats).
  • Csiszár (f)-divergences are monotone under coarse-graining (Markov kernels): merging bins cannot increase the divergence.
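Both invariants can be checked directly. The sketch below is standalone (it reimplements KL and JS rather than assuming the crate's signatures); the coarse-graining check merges the last two bins of a three-bin distribution:

```rust
/// KL divergence in nats (0 ln 0 = 0 convention; assumes matching supports).
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .filter(|(px, _)| **px > 0.0)
        .map(|(px, qx)| px * (px / qx).ln())
        .sum()
}

/// Jensen–Shannon divergence in nats via the equal-weight mixture m.
fn js(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q.iter()).map(|(a, b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

fn main() {
    // The ln 2 bound is attained by distributions with disjoint support.
    assert!((js(&[1.0, 0.0], &[0.0, 1.0]) - 2f64.ln()).abs() < 1e-12);
    // Coarse-graining: merging bins 2 and 3 cannot increase KL.
    let (p, q) = ([0.5, 0.3, 0.2], [0.2, 0.4, 0.4]);
    let (p2, q2) = ([0.5, 0.5], [0.2, 0.8]);
    assert!(kl(&p2, &q2) <= kl(&p, &q) + 1e-12);
}
```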

§Further reading

§Taxonomy of Divergences (Nielsen)

| Family | Generator | Key property |
| --- | --- | --- |
| f-divergences | Convex (f(t)) with (f(1)=0) | Monotone under Markov morphisms (coarse-graining) |
| Bregman | Convex (F(x)) | Dually flat geometry; generalized Pythagorean theorem |
| Jensen–Shannon | (f)-div + metric | Symmetric, bounded ([0, \ln 2]); (\sqrt{JS}) is a metric |
| Alpha | (\rho_\alpha = \int p^\alpha q^{1-\alpha}) | Encodes Rényi, Tsallis, Bhattacharyya, Hellinger |
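The alpha family is the most compact row: a single integral generates several named quantities at once. A standalone sketch at (\alpha = 1/2), using the (H^2 = 1 - BC) convention for squared Hellinger (conventions vary across texts, some include a factor of 2):

```rust
/// Alpha-integral rho_alpha(p, q) = sum_i p_i^alpha * q_i^(1 - alpha)
/// (discrete analogue of the integral in the table above).
fn rho_alpha(p: &[f64], q: &[f64], alpha: f64) -> f64 {
    p.iter()
        .zip(q.iter())
        .map(|(px, qx)| px.powf(alpha) * qx.powf(1.0 - alpha))
        .sum()
}

fn main() {
    let p = [0.6, 0.4];
    let q = [0.3, 0.7];
    let bc = rho_alpha(&p, &q, 0.5); // Bhattacharyya coefficient
    let hellinger_sq = 1.0 - bc;     // squared Hellinger (H^2 = 1 - BC convention)
    let d_b = -bc.ln();              // Bhattacharyya distance
    assert!(bc > 0.0 && bc <= 1.0);  // Cauchy–Schwarz on the simplex
    assert!(hellinger_sq >= 0.0 && d_b >= 0.0);
    // At alpha = 1/2 the integral is symmetric in its arguments.
    assert!((bc - rho_alpha(&q, &p, 0.5)).abs() < 1e-12);
}
```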

§References

  • Shannon (1948). “A Mathematical Theory of Communication”
  • Cover & Thomas (2006). “Elements of Information Theory”

Structs§

NegEntropy
Negative-entropy Bregman generator: (F(x) = \sum_i x_i \ln x_i), (\nabla F(x)_i = 1 + \ln x_i).
SquaredL2
Squared Euclidean Bregman generator: (F(x)=\tfrac12|x|_2^2), (\nabla F(x)=x).

Enums§

Error
Errors for information-measure computations.
KsgVariant
Algorithm variant for KSG estimator.

Traits§

BregmanGenerator
Bregman generator: a convex function (F) and its gradient.

Functions§

amari_alpha_divergence
Amari alpha-divergence: a one-parameter family from information geometry that continuously interpolates between forward KL, reverse KL, and squared Hellinger.
bhattacharyya_coeff
Bhattacharyya coefficient: the geometric-mean overlap between two distributions.
bhattacharyya_distance
Bhattacharyya distance (D_B(p,q) = -\ln BC(p,q)).
bregman_divergence
Bregman divergence: the gap between a convex function and its tangent approximation.
chi_squared_divergence
Chi-squared divergence: a member of the Csiszár f-divergence family that is particularly sensitive to tail differences.
conditional_entropy
Conditional entropy in nats: the remaining uncertainty about (X) after observing (Y).
cross_entropy_nats
Cross-entropy in nats: the expected code length when using model (q) to encode data drawn from true distribution (p).
csiszar_f_divergence
Csiszár f-divergence: the most general class of divergences that respect sufficient statistics (information monotonicity).
digamma
Digamma function: the logarithmic derivative of the Gamma function.
entropy_bits
Shannon entropy in bits.
entropy_nats
Shannon entropy in nats: the expected surprise under distribution (p).
entropy_unchecked
Fast Shannon entropy calculation without simplex validation.
hellinger
Hellinger distance: the square root of the squared Hellinger distance; unlike most divergences in this crate, it is a true metric on the simplex.
hellinger_squared
Squared Hellinger distance.
jensen_shannon_divergence
Jensen–Shannon divergence in nats: a symmetric, bounded smoothing of KL divergence.
jensen_shannon_weighted
Weighted Jensen–Shannon divergence: a generalization that allows unequal mixture weights.
kl_divergence
Kullback–Leibler divergence in nats: the information lost when (q) is used to approximate (p).
kl_divergence_gaussians
KL divergence between two diagonal multivariate Gaussians.
log_sum_exp
Log-sum-exp: numerically stable computation of ln(exp(a_1) + ... + exp(a_n)).
log_sum_exp2
Log-sum-exp for two values (common special case).
log_sum_exp_iter
Streaming log-sum-exp: single-pass, O(1) memory computation over an iterator.
mutual_information
Mutual information in nats: how much knowing (Y) reduces uncertainty about (X).
mutual_information_ksg
Estimate Mutual Information (I(X;Y)) using the KSG estimator.
normalize_in_place
Normalize a nonnegative vector in-place to sum to 1.
normalized_mutual_information
Normalized mutual information: MI scaled to ([0, 1]) for comparing clusterings of different sizes.
pmi
Pointwise mutual information: the log-ratio measuring how much more (or less) likely two specific outcomes co-occur than if they were independent.
renyi_divergence
Rényi divergence in nats: a one-parameter family that interpolates between different notions of distributional difference.
renyi_entropy
Rényi entropy in nats: a one-parameter generalization of Shannon entropy.
rho_alpha
Alpha-integral: the workhorse behind the entire alpha-family of divergences.
total_bregman_divergence
Total Bregman divergence as shown in Nielsen’s taxonomy diagram.
total_variation
Total variation distance: half the L1 norm between two distributions.
tsallis_divergence
Tsallis divergence: a non-extensive generalization of KL divergence from statistical mechanics.
tsallis_entropy
Tsallis entropy: a non-extensive generalization of Shannon entropy from statistical mechanics.
validate_simplex
Validate that p is a probability distribution on the simplex (within tol).

Type Aliases§

Result
Convenience alias for Result<T, logp::Error>.