Crate logp

§logp

Information theory primitives: entropies and divergences.

§Scope

This crate is L1 (Logic) in the mathematical foundation: it should stay small and reusable. It provides scalar information measures that appear across clustering, ranking, evaluation, and geometry:

  • Shannon entropy and cross-entropy
  • KL / Jensen–Shannon divergences
  • Csiszár (f)-divergences (a.k.a. information monotone divergences)
  • Bhattacharyya coefficient, Rényi/Tsallis families
  • Bregman divergences (convex-analytic, not generally monotone)
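As a concrete illustration of the first two bullets, the entropy and cross-entropy measures can be sketched in a few lines. This is a standalone sketch, not this crate's API: `logp`'s actual signatures, simplex validation, and error handling may differ.

```rust
/// Shannon entropy H(p) = -Σ p_i ln p_i, in nats (0 ln 0 := 0).
fn entropy_nats(p: &[f64]) -> f64 {
    p.iter().filter(|&&x| x > 0.0).map(|&x| -x * x.ln()).sum()
}

/// Cross-entropy H(p, q) = -Σ p_i ln q_i, in nats: expected code length
/// when encoding samples from p with a code optimized for q.
fn cross_entropy_nats(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| -pi * qi.ln())
        .sum()
}

fn main() {
    // Uniform over 4 outcomes maximizes entropy: H = ln 4 nats.
    let uniform = [0.25, 0.25, 0.25, 0.25];
    assert!((entropy_nats(&uniform) - 4f64.ln()).abs() < 1e-12);

    // Gibbs' inequality: H(p, q) >= H(p), with equality iff p = q.
    let p = [1.0, 0.0];
    let q = [0.5, 0.5];
    assert!(cross_entropy_nats(&p, &q) >= entropy_nats(&p));
}
```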

§Distances vs divergences (terminology that prevents bugs)

A divergence (D(p:q)) is usually required to satisfy:

  • (D(p:q) \ge 0)
  • (D(p:p) = 0)

but it is typically not symmetric and not a metric (no triangle inequality). Many failures in downstream code are caused by treating a divergence as a distance metric.
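The asymmetry is easy to demonstrate with a standalone KL sketch (illustrative only; this crate's `kl_divergence` may validate its inputs and differ in signature):

```rust
/// KL divergence D(p || q) in nats, assuming q_i > 0 wherever p_i > 0.
/// Illustrative sketch, not necessarily `logp`'s exact API.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).ln())
        .sum()
}

fn main() {
    let p = [0.9, 0.1];
    let q = [0.5, 0.5];
    let fwd = kl_divergence(&p, &q); // D(p || q)
    let rev = kl_divergence(&q, &p); // D(q || p)

    // Both directions are nonnegative and vanish at p = p …
    assert!(fwd >= 0.0 && rev >= 0.0);
    assert!(kl_divergence(&p, &p).abs() < 1e-12);
    // … but they disagree: KL is not symmetric, hence not a metric.
    assert!((fwd - rev).abs() > 1e-3);
}
```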

§Key invariants (what tests should enforce)

  • Jensen–Shannon is bounded on the simplex: (0 \le JS(p,q) \le \ln 2) (nats).
  • Csiszár (f)-divergences are monotone under coarse-graining (Markov kernels): merging bins cannot increase the divergence.
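A property test for the Jensen–Shannon bound might look like the following standalone sketch (it reimplements JS from scratch rather than calling this crate, whose signatures may differ). The bound is tight: identical distributions give 0, and distributions with disjoint support give exactly (\ln 2).

```rust
/// KL divergence in nats (0 ln 0 := 0); helper for the JS sketch below.
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).ln())
        .sum()
}

/// JS(p, q) = ½ D(p || m) + ½ D(q || m) with m = ½(p + q).
fn js(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q).map(|(&a, &b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

fn main() {
    // Disjoint supports attain the upper bound ln 2 …
    let (p, q) = ([1.0, 0.0], [0.0, 1.0]);
    assert!((js(&p, &q) - std::f64::consts::LN_2).abs() < 1e-12);
    // … and identical distributions attain the lower bound 0.
    assert!(js(&p, &p).abs() < 1e-12);
}
```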

§Further reading

§Taxonomy of Divergences (Nielsen)

| Family | Generator | Key Property |
|---|---|---|
| f-divergences | Convex (f(t)) with (f(1)=0) | Monotone under Markov morphisms (coarse-graining) |
| Bregman | Convex (F(x)) | Dually flat geometry; generalized Pythagorean theorem |
| Jensen–Shannon | (f)-div + metric | Symmetric, bounded ([0, \ln 2]); (\sqrt{JS}) is a metric |
| Alpha | (\rho_\alpha = \int p^\alpha q^{1-\alpha}) | Encodes Rényi, Tsallis, Bhattacharyya, Hellinger |
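The last row is worth unpacking: a single alpha-integral generates several named quantities. The sketch below shows the discrete case ((\rho_\alpha(p,q) = \sum_i p_i^\alpha q_i^{1-\alpha})); it is a standalone illustration, and `logp::rho_alpha`'s real signature may differ.

```rust
/// Discrete alpha-integral ρ_α(p, q) = Σ_i p_i^α q_i^(1-α).
/// Illustrative sketch only; not necessarily this crate's API.
fn rho_alpha(p: &[f64], q: &[f64], alpha: f64) -> f64 {
    p.iter()
        .zip(q)
        .map(|(&pi, &qi)| pi.powf(alpha) * qi.powf(1.0 - alpha))
        .sum()
}

fn main() {
    let p = [0.5, 0.5];
    let q = [0.9, 0.1];

    // Bhattacharyya coefficient is ρ at α = 1/2 …
    let bc = rho_alpha(&p, &q, 0.5);
    // … squared Hellinger is 1 − BC …
    let hellinger_sq = 1.0 - bc;
    // … and Rényi divergence of order α is (1/(α−1)) ln ρ_α.
    let alpha = 2.0;
    let renyi_2 = (1.0 / (alpha - 1.0)) * rho_alpha(&p, &q, alpha).ln();

    assert!(bc > 0.0 && bc <= 1.0);
    assert!(hellinger_sq >= 0.0 && hellinger_sq <= 1.0);
    assert!(renyi_2 >= 0.0);
}
```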

§Connections

  • rkhs: MMD and KL both measure distribution “distance”
  • wass: Wasserstein vs entropy-based divergences
  • stratify: NMI for cluster evaluation uses this crate
  • fynch: Temperature scaling affects entropy calibration

§References

  • Shannon (1948). “A Mathematical Theory of Communication”
  • Cover & Thomas (2006). “Elements of Information Theory”

Structs§

SquaredL2
Squared Euclidean Bregman generator: (F(x)=\tfrac12\|x\|_2^2), (\nabla F(x)=x).

Enums§

Error
Errors for information-measure computations.
KsgVariant
Algorithm variant for KSG estimator.

Constants§

LN_2
Natural log of 2. Useful when converting nats ↔ bits or bounding Jensen–Shannon.

Traits§

BregmanGenerator
Bregman generator: a convex function (F) and its gradient.

Functions§

amari_alpha_divergence
Amari alpha-divergence: a one-parameter family from information geometry that continuously interpolates between forward KL, reverse KL, and squared Hellinger.
bhattacharyya_coeff
Bhattacharyya coefficient: the geometric-mean overlap between two distributions.
bhattacharyya_distance
Bhattacharyya distance (D_B(p,q) = -\ln BC(p,q)).
bregman_divergence
Bregman divergence: the gap between a convex function and its tangent approximation.
cross_entropy_nats
Cross-entropy in nats: the expected code length when using model (q) to encode data drawn from true distribution (p).
csiszar_f_divergence
Csiszar f-divergence: the most general class of divergences that respect sufficient statistics (information monotonicity).
digamma
Digamma function: the logarithmic derivative of the Gamma function.
entropy_bits
Shannon entropy in bits.
entropy_nats
Shannon entropy in nats: the expected surprise under distribution (p).
entropy_unchecked
Fast Shannon entropy calculation without simplex validation.
hellinger
Hellinger distance (H(p,q) = \sqrt{1 - BC(p,q)}): unlike most divergences here, a true metric.
hellinger_squared
Squared Hellinger distance: one minus the Bhattacharyya coefficient.
jensen_shannon_divergence
Jensen–Shannon divergence in nats: a symmetric, bounded smoothing of KL divergence.
kl_divergence
Kullback–Leibler divergence in nats: the information lost when (q) is used to approximate (p).
kl_divergence_gaussians
KL divergence between two diagonal-covariance multivariate Gaussians, in closed form.
log_sum_exp
Log-sum-exp: numerically stable computation of (\ln(e^{a_1} + \cdots + e^{a_n})).
log_sum_exp2
Log-sum-exp for two values (common special case).
mutual_information
Mutual information in nats: how much knowing (Y) reduces uncertainty about (X).
mutual_information_ksg
Estimate mutual information (I(X;Y)) using the Kraskov–Stögbauer–Grassberger (KSG) nearest-neighbor estimator.
normalize_in_place
Normalize a nonnegative vector in-place to sum to 1.
pmi
Pointwise mutual information: the log-ratio measuring how much more (or less) likely two specific outcomes co-occur than if they were independent.
renyi_divergence
Rényi divergence in nats: a one-parameter family that interpolates between different notions of distributional difference.
rho_alpha
Alpha-integral: the workhorse behind the entire alpha-family of divergences.
total_bregman_divergence
Total Bregman divergence, as shown in Nielsen’s taxonomy diagram.
tsallis_divergence
Tsallis divergence: a non-extensive generalization of KL divergence from statistical mechanics.
validate_simplex
Validate that p is a probability distribution on the simplex (within tol).

Type Aliases§

Result