§logp
Information theory primitives: entropies and divergences.
§Scope
Scalar information measures that appear across clustering, ranking, evaluation, and geometry:
- Shannon entropy and cross-entropy
- KL / Jensen–Shannon divergences
- Csiszár (f)-divergences (a.k.a. information monotone divergences)
- Bhattacharyya coefficient, Rényi/Tsallis families
- Bregman divergences (convex-analytic, not generally monotone)
§Distances vs divergences (terminology that prevents bugs)
A divergence (D(p:q)) is usually required to satisfy:
- (D(p:q) \ge 0)
- (D(p:p) = 0)
but it is typically not symmetric and not a metric (no triangle inequality). Many failures in downstream code are caused by treating a divergence as a distance metric.
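This asymmetry is easy to verify numerically. A minimal sketch using plain slices (not necessarily this crate's `kl_divergence` signature):

```rust
// Discrete KL divergence in nats over probability vectors.
// Terms with p_i = 0 contribute 0 by the convention 0 ln 0 = 0.
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

fn main() {
    let p = [0.9, 0.1];
    let q = [0.5, 0.5];
    let forward = kl(&p, &q);
    let reverse = kl(&q, &p);
    // D(p:q) != D(q:p): code that assumes symmetry is silently wrong.
    println!("KL(p||q) = {forward:.4}, KL(q||p) = {reverse:.4}");
    assert!((forward - reverse).abs() > 1e-3);
}
```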
§Key invariants (what tests should enforce)
- Jensen–Shannon is bounded on the simplex: (0 \le JS(p,q) \le \ln 2) (nats).
- Csiszár (f)-divergences are monotone under coarse-graining (Markov kernels): merging bins cannot increase the divergence.
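Both invariants are cheap to check numerically. A minimal sketch, assuming plain-slice helpers rather than this crate's actual signatures:

```rust
// Discrete KL in nats (0 ln 0 = 0 convention).
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

// Jensen–Shannon divergence: average KL to the midpoint mixture.
fn js(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q.iter()).map(|(&a, &b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

fn main() {
    // Boundedness: disjoint supports attain the maximum ln 2.
    let d = js(&[1.0, 0.0, 0.0], &[0.0, 0.5, 0.5]);
    assert!(d >= 0.0 && d <= std::f64::consts::LN_2 + 1e-12);

    // Information monotonicity: merging bins 2 and 3 cannot increase JS
    // (JS is an f-divergence, so coarse-graining only loses information).
    let fine = js(&[0.5, 0.3, 0.2], &[0.2, 0.3, 0.5]);
    let coarse = js(&[0.5, 0.5], &[0.2, 0.8]);
    assert!(coarse <= fine + 1e-12);
}
```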
§Further reading
- Frank Nielsen, “Divergences” portal (taxonomy diagrams + references): https://franknielsen.github.io/Divergence/index.html
- nocotan/awesome-information-geometry (curated reading list): https://github.com/nocotan/awesome-information-geometry
- Csiszár (1967): (f)-divergences and information monotonicity.
- Amari & Nagaoka (2000): Methods of Information Geometry.
§Taxonomy of Divergences (Nielsen)
| Family | Generator | Key Property |
|---|---|---|
| f-divergences | Convex (f(t)) with (f(1)=0) | Monotone under Markov morphisms (coarse-graining) |
| Bregman | Convex (F(x)) | Dually flat geometry; generalized Pythagorean theorem |
| Jensen-Shannon | (f)-div + metric | Symmetric, bounded ([0, \ln 2]), (\sqrt{JS}) is a metric |
| Alpha | (\rho_\alpha = \int p^\alpha q^{1-\alpha}) | Encodes Rényi, Tsallis, Bhattacharyya, Hellinger |
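The Bregman row of the table can be sketched with a hypothetical generator trait (illustrative only; the crate's actual `Bregman`-generator trait may differ). The divergence is the gap between the convex function (F) at (x) and its tangent plane at (y):

```rust
// Hypothetical generator trait: a convex function F and its gradient.
trait Generator {
    fn f(&self, x: &[f64]) -> f64;
    fn grad(&self, x: &[f64]) -> Vec<f64>;
}

// Squared Euclidean generator: F(x) = ½‖x‖², ∇F(x) = x.
struct SquaredL2Gen;
impl Generator for SquaredL2Gen {
    fn f(&self, x: &[f64]) -> f64 {
        0.5 * x.iter().map(|v| v * v).sum::<f64>()
    }
    fn grad(&self, x: &[f64]) -> Vec<f64> {
        x.to_vec()
    }
}

// B_F(x : y) = F(x) − F(y) − ⟨∇F(y), x − y⟩
fn bregman<G: Generator>(g: &G, x: &[f64], y: &[f64]) -> f64 {
    let gy = g.grad(y);
    g.f(x) - g.f(y)
        - gy.iter()
            .zip(x.iter().zip(y.iter()))
            .map(|(&gi, (&xi, &yi))| gi * (xi - yi))
            .sum::<f64>()
}

fn main() {
    // For F(x) = ½‖x‖² the Bregman divergence reduces to ½‖x − y‖².
    // It is symmetric for this generator, but asymmetric in general
    // (e.g. the negative-entropy generator yields KL divergence).
    let d = bregman(&SquaredL2Gen, &[1.0, 2.0], &[0.0, 0.0]);
    assert!((d - 2.5).abs() < 1e-12);
}
```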
§References
- Shannon (1948). “A Mathematical Theory of Communication”
- Cover & Thomas (2006). “Elements of Information Theory”
Structs§
- NegEntropy - Negative-entropy Bregman generator: (F(x) = \sum_i x_i \ln x_i), (\nabla F(x)_i = 1 + \ln x_i).
- SquaredL2 - Squared Euclidean Bregman generator: (F(x)=\tfrac12\|x\|_2^2), (\nabla F(x)=x).
Enums§
- Error
- Errors for information-measure computations.
- KsgVariant
- Algorithm variant for KSG estimator.
Traits§
- BregmanGenerator - Bregman generator: a convex function (F) and its gradient.
Functions§
- amari_alpha_divergence - Amari alpha-divergence: a one-parameter family from information geometry that continuously interpolates between forward KL, reverse KL, and squared Hellinger.
- bhattacharyya_coeff - Bhattacharyya coefficient: the geometric-mean overlap between two distributions.
- bhattacharyya_distance - Bhattacharyya distance (D_B(p,q) = -\ln BC(p,q)).
- bregman_divergence - Bregman divergence: the gap between a convex function and its tangent approximation.
- chi_squared_divergence - Chi-squared divergence: a member of the Csiszár f-divergence family that is particularly sensitive to tail differences.
- conditional_entropy - Conditional entropy in nats: the remaining uncertainty about (X) after observing (Y).
- cross_entropy_nats - Cross-entropy in nats: the expected code length when using model (q) to encode data drawn from the true distribution (p).
- csiszar_f_divergence - Csiszár f-divergence: the most general class of divergences that respect sufficient statistics (information monotonicity).
- digamma - Digamma function: the logarithmic derivative of the Gamma function.
- entropy_bits - Shannon entropy in bits.
- entropy_nats - Shannon entropy in nats: the expected surprise under distribution (p).
- entropy_unchecked - Fast Shannon entropy calculation without simplex validation.
- hellinger - Hellinger distance: the square root of the squared Hellinger distance.
- hellinger_squared - Squared Hellinger distance.
- jensen_shannon_divergence - Jensen–Shannon divergence in nats: a symmetric, bounded smoothing of KL divergence.
- jensen_shannon_weighted - Weighted Jensen–Shannon divergence: a generalization that allows unequal mixture weights.
- kl_divergence - Kullback–Leibler divergence in nats: the information lost when (q) is used to approximate (p).
- kl_divergence_gaussians - KL divergence between two diagonal multivariate Gaussians.
- log_sum_exp - Log-sum-exp: numerically stable computation of (\ln(\exp(a_1) + \dots + \exp(a_n))).
- log_sum_exp2 - Log-sum-exp for two values (common special case).
- log_sum_exp_iter - Streaming log-sum-exp: single-pass, O(1)-memory computation over an iterator.
- mutual_information - Mutual information in nats: how much knowing (Y) reduces uncertainty about (X).
- mutual_information_ksg - Estimate mutual information (I(X;Y)) using the KSG estimator.
- normalize_in_place - Normalize a nonnegative vector in place to sum to 1.
- normalized_mutual_information - Normalized mutual information: MI scaled to ([0, 1]) for comparing clusterings of different sizes.
- pmi - Pointwise mutual information: the log-ratio measuring how much more (or less) likely two specific outcomes are to co-occur than if they were independent.
- renyi_divergence - Rényi divergence in nats: a one-parameter family that interpolates between different notions of distributional difference.
- renyi_entropy - Rényi entropy in nats: a one-parameter generalization of Shannon entropy.
- rho_alpha - Alpha-integral: the workhorse behind the entire alpha-family of divergences.
- total_bregman_divergence - Total Bregman divergence, as shown in Nielsen's taxonomy diagram.
- total_variation - Total variation distance: half the L1 norm between two distributions.
- tsallis_divergence - Tsallis divergence: a non-extensive generalization of KL divergence from statistical mechanics.
- tsallis_entropy - Tsallis entropy: a non-extensive generalization of Shannon entropy from statistical mechanics.
- validate_simplex - Validate that `p` is a probability distribution on the simplex (within `tol`).
Type Aliases§
- Result - Convenience alias for Result<T, logp::Error>.