§logp
Information theory primitives: entropies and divergences.
§Scope
Scalar information measures that appear across clustering, ranking, evaluation, and geometry:
- Shannon entropy and cross-entropy
- KL / Jensen–Shannon divergences
- Csiszár (f)-divergences (a.k.a. information monotone divergences)
- Bhattacharyya coefficient, Rényi/Tsallis families
- Bregman divergences (convex-analytic, not generally monotone)
§Distances vs divergences (terminology that prevents bugs)
A divergence (D(p:q)) is usually required to satisfy:
- (D(p:q) \ge 0)
- (D(p:p) = 0)
but it is typically not symmetric and not a metric (no triangle inequality). Many failures in downstream code are caused by treating a divergence as a distance metric.
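This asymmetry is easy to verify numerically. A minimal sketch using plain slices (not necessarily this crate's `kl_divergence` signature):

```rust
// Discrete KL divergence in nats over probability vectors.
// Terms with p_i = 0 contribute 0 by the convention 0 ln 0 = 0.
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

fn main() {
    let p = [0.9, 0.1];
    let q = [0.5, 0.5];
    let forward = kl(&p, &q);
    let reverse = kl(&q, &p);
    // D(p:q) != D(q:p): code that assumes symmetry is silently wrong.
    println!("KL(p||q) = {forward:.4}, KL(q||p) = {reverse:.4}");
    assert!((forward - reverse).abs() > 1e-3);
}
```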
§Key invariants (what tests should enforce)
- Jensen–Shannon is bounded on the simplex: (0 \le JS(p,q) \le \ln 2) (nats).
- Csiszár (f)-divergences are monotone under coarse-graining (Markov kernels): merging bins cannot increase the divergence.
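Both invariants are cheap to check numerically. A minimal sketch, assuming plain-slice helpers rather than this crate's actual signatures:

```rust
// Discrete KL in nats (0 ln 0 = 0 convention).
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

// Jensen–Shannon divergence: average KL to the midpoint mixture.
fn js(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q.iter()).map(|(&a, &b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

fn main() {
    // Boundedness: disjoint supports attain the maximum ln 2.
    let d = js(&[1.0, 0.0, 0.0], &[0.0, 0.5, 0.5]);
    assert!(d >= 0.0 && d <= std::f64::consts::LN_2 + 1e-12);

    // Information monotonicity: merging bins 2 and 3 cannot increase JS
    // (JS is an f-divergence, so coarse-graining only loses information).
    let fine = js(&[0.5, 0.3, 0.2], &[0.2, 0.3, 0.5]);
    let coarse = js(&[0.5, 0.5], &[0.2, 0.8]);
    assert!(coarse <= fine + 1e-12);
}
```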
§Further reading
- Frank Nielsen, “Divergences” portal (taxonomy diagrams + references): https://franknielsen.github.io/Divergence/index.html
- nocotan/awesome-information-geometry (curated reading list): https://github.com/nocotan/awesome-information-geometry
- Csiszár (1967): (f)-divergences and information monotonicity.
- Amari & Nagaoka (2000): Methods of Information Geometry.
§Taxonomy of Divergences (Nielsen)
| Family | Generator | Key Property |
|---|---|---|
| f-divergences | Convex (f(t)) with (f(1)=0) | Monotone under Markov morphisms (coarse-graining) |
| Bregman | Convex (F(x)) | Dually flat geometry; generalized Pythagorean theorem |
| Jensen-Shannon | (f)-div + metric | Symmetric, bounded ([0, \ln 2]), (\sqrt{JS}) is a metric |
| Alpha | (\rho_\alpha = \int p^\alpha q^{1-\alpha}) | Encodes Rényi, Tsallis, Bhattacharyya, Hellinger |
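The Bregman row of the table can be sketched with a hypothetical generator trait (illustrative only; the crate's actual `Bregman`-generator trait may differ). The divergence is the gap between the convex function (F) at (x) and its tangent plane at (y):

```rust
// Hypothetical generator trait: a convex function F and its gradient.
trait Generator {
    fn f(&self, x: &[f64]) -> f64;
    fn grad(&self, x: &[f64]) -> Vec<f64>;
}

// Squared Euclidean generator: F(x) = ½‖x‖², ∇F(x) = x.
struct SquaredL2Gen;
impl Generator for SquaredL2Gen {
    fn f(&self, x: &[f64]) -> f64 {
        0.5 * x.iter().map(|v| v * v).sum::<f64>()
    }
    fn grad(&self, x: &[f64]) -> Vec<f64> {
        x.to_vec()
    }
}

// B_F(x : y) = F(x) − F(y) − ⟨∇F(y), x − y⟩
fn bregman<G: Generator>(g: &G, x: &[f64], y: &[f64]) -> f64 {
    let gy = g.grad(y);
    g.f(x) - g.f(y)
        - gy.iter()
            .zip(x.iter().zip(y.iter()))
            .map(|(&gi, (&xi, &yi))| gi * (xi - yi))
            .sum::<f64>()
}

fn main() {
    // For F(x) = ½‖x‖² the Bregman divergence reduces to ½‖x − y‖².
    // It is symmetric for this generator, but asymmetric in general
    // (e.g. the negative-entropy generator yields KL divergence).
    let d = bregman(&SquaredL2Gen, &[1.0, 2.0], &[0.0, 0.0]);
    assert!((d - 2.5).abs() < 1e-12);
}
```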
§References
- Shannon (1948). “A Mathematical Theory of Communication”
- Cover & Thomas (2006). “Elements of Information Theory”
Structs§
- NegEntropy - Negative-entropy Bregman generator: (F(x) = \sum_i x_i \ln x_i), (\nabla F(x)_i = 1 + \ln x_i).
- SquaredL2 - Squared Euclidean Bregman generator: (F(x)=\tfrac12\|x\|_2^2), (\nabla F(x)=x).
Enums§
- Error
- Errors for information-measure computations.
- KsgVariant
- Algorithm variant for KSG estimator.
Traits§
- BregmanGenerator - Bregman generator: a convex function (F) and its gradient.
Functions§
- amari_alpha_divergence - Amari alpha-divergence: a one-parameter family from information geometry that continuously interpolates between forward KL, reverse KL, and squared Hellinger.
- bhattacharyya_coeff - Bhattacharyya coefficient: the geometric-mean overlap between two distributions.
- bhattacharyya_distance - Bhattacharyya distance (D_B(p,q) = -\ln BC(p,q)).
- bregman_divergence - Bregman divergence: the gap between a convex function and its tangent approximation.
- chi_squared_divergence - Chi-squared divergence: a member of the Csiszár f-divergence family that is particularly sensitive to tail differences.
- conditional_entropy - Conditional entropy in nats: the remaining uncertainty about (X) after observing (Y).
- cross_entropy_nats - Cross-entropy in nats: the expected code length when using model (q) to encode data drawn from the true distribution (p).
- csiszar_f_divergence - Csiszár f-divergence: the most general class of divergences that respect sufficient statistics (information monotonicity).
- digamma - Digamma function: the logarithmic derivative of the Gamma function.
- entropy_bits - Shannon entropy in bits.
- entropy_nats - Shannon entropy in nats: the expected surprise under distribution (p).
- entropy_unchecked - Fast Shannon entropy calculation without simplex validation.
- hellinger - Hellinger distance: the square root of the squared Hellinger distance.
- hellinger_squared - Squared Hellinger distance.
- jensen_shannon_divergence - Jensen–Shannon divergence in nats: a symmetric, bounded smoothing of KL divergence.
- jensen_shannon_weighted - Weighted Jensen–Shannon divergence: a generalization that allows unequal mixture weights.
- kl_divergence - Kullback–Leibler divergence in nats: the information lost when (q) is used to approximate (p).
- kl_divergence_gaussians - KL divergence between two diagonal multivariate Gaussians.
- log_sum_exp - Log-sum-exp: numerically stable computation of (\ln(\exp(a_1) + \dots + \exp(a_n))).
- log_sum_exp2 - Log-sum-exp for two values (common special case).
- log_sum_exp_iter - Streaming log-sum-exp: single-pass, O(1)-memory computation over an iterator.
- mutual_information - Mutual information in nats: how much knowing (Y) reduces uncertainty about (X).
- mutual_information_ksg - Estimate mutual information (I(X;Y)) using the KSG estimator.
- normalize_in_place - Normalize a nonnegative vector in place to sum to 1.
- normalized_mutual_information - Normalized mutual information: MI scaled to ([0, 1]) for comparing clusterings of different sizes.
- pmi - Pointwise mutual information: the log-ratio measuring how much more (or less) likely two specific outcomes are to co-occur than if they were independent.
- renyi_divergence - Rényi divergence in nats: a one-parameter family that interpolates between different notions of distributional difference.
- renyi_entropy - Rényi entropy in nats: a one-parameter generalization of Shannon entropy.
- rho_alpha - Alpha-integral: the workhorse behind the entire alpha-family of divergences.
- total_bregman_divergence - Total Bregman divergence, as shown in Nielsen's taxonomy diagram.
- total_variation - Total variation distance: half the L1 norm between two distributions.
- tsallis_divergence - Tsallis divergence: a non-extensive generalization of KL divergence from statistical mechanics.
- tsallis_entropy - Tsallis entropy: a non-extensive generalization of Shannon entropy from statistical mechanics.
- validate_simplex - Validate that `p` is a probability distribution on the simplex (within `tol`).
Type Aliases§
- Result - Convenience alias for Result<T, logp::Error>.