§logp
Information theory primitives: entropies and divergences.
§Scope
This crate is L1 (Logic) in the mathematical foundation: it should stay small and reusable. It provides scalar information measures that appear across clustering, ranking, evaluation, and geometry:
- Shannon entropy and cross-entropy
- KL / Jensen–Shannon divergences
- Csiszár (f)-divergences (a.k.a. information monotone divergences)
- Bhattacharyya coefficient, Rényi/Tsallis families
- Bregman divergences (convex-analytic, not generally monotone)
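A minimal quick-start sketch. The function names appear in the item list below, but the exact signatures shown here (slice arguments, a Result<f64, Error> return) are assumptions, so check the item docs before copying:

```rust
// Sketch only: `entropy_nats` and `kl_divergence` exist in this crate, but the
// slice-based signatures and `Result<f64, Error>` returns are assumptions here.
use logp::{entropy_nats, kl_divergence, Error};

fn main() -> Result<(), Error> {
    let p = [0.5, 0.25, 0.25];
    let q = [0.25, 0.25, 0.5];
    let h = entropy_nats(&p)?;      // Shannon entropy of p, in nats
    let d = kl_divergence(&p, &q)?; // D_KL(p || q), in nats
    println!("H(p) = {h:.4} nats, KL(p||q) = {d:.4} nats");
    Ok(())
}
```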
§Distances vs divergences (terminology that prevents bugs)
A divergence (D(p:q)) is usually required to satisfy:
- (D(p:q) \ge 0)
- (D(p:p) = 0)
but it is typically not symmetric and not a metric (no triangle inequality). Many failures in downstream code are caused by treating a divergence as a distance metric.
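As a concrete illustration of the asymmetry, computing (D_{KL}) directly from its definition (crate-independent):

```rust
// KL is a divergence, not a distance: the forward and reverse values differ,
// so it must not be handed to code that assumes symmetry or a triangle inequality.
fn main() {
    let (p, q) = ([0.9_f64, 0.1], [0.5_f64, 0.5]);
    let kl = |a: &[f64; 2], b: &[f64; 2]| -> f64 {
        a.iter().zip(b).map(|(ai, bi)| ai * (ai / bi).ln()).sum()
    };
    let forward = kl(&p, &q); // ≈ 0.368 nats
    let reverse = kl(&q, &p); // ≈ 0.511 nats
    assert!((forward - reverse).abs() > 0.1); // clearly not symmetric
    println!("KL(p||q) = {forward:.3}, KL(q||p) = {reverse:.3}");
}
```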
§Key invariants (what tests should enforce)
- Jensen–Shannon is bounded on the simplex: (0 \le JS(p,q) \le \ln 2) (nats).
- Csiszár (f)-divergences are monotone under coarse-graining (Markov kernels): merging bins cannot increase the divergence (both invariants are exercised in the sketch below).
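A property-test sketch, crate-independent and computed from the definitions; a real test suite would randomize inputs and call the crate's functions instead:

```rust
// Numeric check of the two invariants for one concrete pair (p, q).
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(pi, _)| **pi > 0.0)
        .map(|(pi, qi)| pi * (pi / qi).ln())
        .sum()
}

fn js(p: &[f64], q: &[f64]) -> f64 {
    // JS(p,q) = ½ KL(p‖m) + ½ KL(q‖m) with m = ½(p + q)
    let m: Vec<f64> = p.iter().zip(q).map(|(pi, qi)| 0.5 * (pi + qi)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

fn main() {
    let p = [0.7, 0.2, 0.1];
    let q = [0.1, 0.3, 0.6];

    // Invariant 1: Jensen–Shannon is bounded by ln 2 (nats).
    assert!(js(&p, &q) <= std::f64::consts::LN_2 + 1e-12);

    // Invariant 2: coarse-graining. KL is an f-divergence (f(t) = t ln t),
    // so merging the last two bins cannot increase it (log-sum inequality).
    let (p2, q2) = ([p[0], p[1] + p[2]], [q[0], q[1] + q[2]]);
    assert!(kl(&p2, &q2) <= kl(&p, &q) + 1e-12);

    println!("both invariants hold for this example");
}
```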
§Further reading
- Frank Nielsen, “Divergences” portal (taxonomy diagrams + references): https://franknielsen.github.io/Divergence/index.html
- nocotan/awesome-information-geometry (curated reading list): https://github.com/nocotan/awesome-information-geometry
- Csiszár (1967): (f)-divergences and information monotonicity.
- Amari & Nagaoka (2000): Methods of Information Geometry.
§Taxonomy of Divergences (Nielsen)
| Family | Generator | Key Property |
|---|---|---|
| f-divergences | Convex (f(t)) with (f(1)=0) | Monotone under Markov morphisms (coarse-graining) |
| Bregman | Convex (F(x)) | Dually flat geometry; generalized Pythagorean theorem |
| Jensen-Shannon | (f)-div + metric | Symmetric, bounded ([0, \ln 2]), (\sqrt{JS}) is a metric |
| Alpha | (\rho_\alpha = \int p^\alpha q^{1-\alpha}) | Encodes Rényi, Tsallis, Bhattacharyya, Hellinger |
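The alpha row ties several exported measures together: (\rho_{1/2}[p:q]) is the Bhattacharyya coefficient, and the Bhattacharyya and Hellinger distances are simple transforms of it. A crate-independent numeric sketch:

```rust
// rho_alpha at alpha = 1/2 is BC(p,q) = Σ_i sqrt(p_i q_i); the Bhattacharyya
// distance and the (squared) Hellinger distance are functions of BC alone.
fn main() {
    let p = [0.6, 0.3, 0.1];
    let q = [0.2, 0.5, 0.3];

    let bc: f64 = p.iter().zip(&q).map(|(pi, qi)| (pi * qi).sqrt()).sum();

    let d_bhattacharyya = -bc.ln();      // D_B(p,q) = -ln BC(p,q)
    let hellinger_sq = 1.0 - bc;         // H²(p,q) = 1 - BC(p,q)
    let hellinger = hellinger_sq.sqrt(); // H(p,q) = sqrt(H²(p,q))

    assert!(bc > 0.0 && bc <= 1.0);
    println!("BC = {bc:.4}, D_B = {d_bhattacharyya:.4}, H = {hellinger:.4}");
}
```

Since (BC(p,q) \le 1) by Cauchy–Schwarz, with equality iff (p = q), it follows that (D_B \ge 0) and (H^2 \in [0, 1]).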
§Connections
- rkhs: MMD and KL both measure distribution “distance”
- wass: Wasserstein vs entropy-based divergences
- stratify: NMI for cluster evaluation uses this crate
- fynch: Temperature scaling affects entropy calibration
§References
- Shannon (1948). “A Mathematical Theory of Communication”
- Cover & Thomas (2006). “Elements of Information Theory”
Structs§
- SquaredL2 - Squared Euclidean Bregman generator: (F(x)=\tfrac12\|x\|_2^2), (\nabla F(x)=x).
Enums§
- Error - Errors for information-measure computations.
Constants§
- LN_2 - Natural log of 2. Useful when converting nats ↔ bits or bounding Jensen–Shannon.
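A crate-independent reminder of the conversion (using the standard library's LN_2 rather than this constant):

```rust
// Entropies in bits and nats differ only by a factor of ln 2.
fn main() {
    let h_nats = 1.0_f64;                         // some entropy, in nats
    let h_bits = h_nats / std::f64::consts::LN_2; // same entropy, in bits (≈ 1.4427)
    assert!((h_bits * std::f64::consts::LN_2 - h_nats).abs() < 1e-12);
    println!("{h_nats} nats = {h_bits:.4} bits");
}
```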
Traits§
- BregmanGenerator - Bregman generator: a convex function (F) and its gradient.
Functions§
- amari_alpha_divergence - Amari (\alpha)-divergence (Amari parameter (\alpha\in\mathbb{R})).
- bhattacharyya_coeff - Bhattacharyya coefficient (BC(p,q) = \sum_i \sqrt{p_i q_i}).
- bhattacharyya_distance - Bhattacharyya distance (D_B(p,q) = -\ln BC(p,q)).
- bregman_divergence - Bregman divergence (B_F(p,q) = F(p) - F(q) - \langle p-q, \nabla F(q)\rangle).
- cross_entropy_nats - Cross-entropy (H(p,q) = -\sum_i p_i \ln q_i) (nats).
- csiszar_f_divergence - A Csiszár (f)-divergence in the standard form, for a convex generator (f) with (f(1)=0).
- digamma - Digamma function (\psi(x)), the logarithmic derivative of the Gamma function.
- entropy_bits - Shannon entropy in bits.
- entropy_nats - Shannon entropy (H(p) = -\sum_i p_i \ln p_i) (nats).
- entropy_unchecked - Fast Shannon entropy calculation without simplex validation.
- hellinger - Hellinger distance (H(p,q) = \sqrt{H^2(p,q)}).
- hellinger_squared - Squared Hellinger distance: (H^2(p,q) = 1 - \sum_i \sqrt{p_i q_i}).
- jensen_shannon_divergence - Jensen–Shannon divergence (nats): (JS(p,q) = \tfrac12 KL(p\|m) + \tfrac12 KL(q\|m)) with (m = \tfrac12(p+q)).
- kl_divergence - Kullback–Leibler divergence (D_{KL}(p\|q) = \sum_i p_i \ln(p_i/q_i)) (nats).
- kl_divergence_gaussians - KL divergence between two diagonal multivariate Gaussians.
- mutual_information - Mutual information (I(X;Y) = \sum_{x,y} p(x,y) \ln \frac{p(x,y)}{p(x)p(y)}).
- mutual_information_ksg - Estimate mutual information (I(X;Y)) using the KSG (Kraskov–Stögbauer–Grassberger) estimator.
- normalize_in_place - Normalize a nonnegative vector in-place to sum to 1.
- pmi - Pointwise mutual information (PMI(x;y) = \ln \frac{p(x,y)}{p(x)p(y)}).
- renyi_divergence - Rényi divergence (nats).
- rho_alpha - (\rho_\alpha[p:q] = \sum_i p_i^\alpha q_i^{1-\alpha}).
- total_bregman_divergence - Total Bregman divergence, as shown in Nielsen’s taxonomy diagram.
- tsallis_divergence - Tsallis divergence.
- validate_simplex - Validate that p is a probability distribution on the simplex (within tol).
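To make the bregman_divergence and SquaredL2 entries above concrete: with the squared-Euclidean generator (F(x)=\tfrac12\|x\|_2^2), the Bregman divergence collapses to half the squared Euclidean distance. A crate-independent sketch, not the crate’s API:

```rust
// B_F(p,q) = F(p) - F(q) - <p - q, ∇F(q)>, with F(x) = ½‖x‖² and ∇F(q) = q.
fn bregman_squared_l2(p: &[f64], q: &[f64]) -> f64 {
    let f = |x: &[f64]| 0.5 * x.iter().map(|xi| xi * xi).sum::<f64>();
    let inner: f64 = p.iter().zip(q).map(|(pi, qi)| (pi - qi) * qi).sum();
    f(p) - f(q) - inner
}

fn main() {
    let p = [1.0, 2.0, 3.0];
    let q = [0.5, 2.5, 2.0];
    // For this generator the divergence equals ½‖p − q‖², hence it is symmetric.
    let direct: f64 = p.iter().zip(&q).map(|(a, b)| 0.5 * (a - b).powi(2)).sum();
    assert!((bregman_squared_l2(&p, &q) - direct).abs() < 1e-12);
    assert!((bregman_squared_l2(&q, &p) - direct).abs() < 1e-12);
    println!("B_F(p, q) = {direct:.4}");
}
```

The squared-Euclidean case is the exception: most Bregman generators (for example negative entropy, which recovers KL on the simplex) produce asymmetric divergences, which is why the family sits apart from the monotone (f)-divergences in the taxonomy above.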