Crate logp

§logp

Information theory primitives: entropies and divergences.

§Scope

This crate is L1 (Logic) in the mathematical foundation: it should stay small and reusable. It provides scalar information measures that appear across clustering, ranking, evaluation, and geometry:

  • Shannon entropy and cross-entropy
  • KL / Jensen–Shannon divergences
  • Csiszár (f)-divergences (a.k.a. information monotone divergences)
  • Bhattacharyya coefficient, Rényi/Tsallis families
  • Bregman divergences (convex-analytic, not generally monotone)
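As a concrete illustration of the first two bullets, the entropy and cross-entropy measures can be sketched in a few lines. This is a standalone sketch, not this crate's API: `logp`'s actual signatures, simplex validation, and error handling may differ.

```rust
/// Shannon entropy H(p) = -Σ p_i ln p_i, in nats (0 ln 0 := 0).
fn entropy_nats(p: &[f64]) -> f64 {
    p.iter().filter(|&&x| x > 0.0).map(|&x| -x * x.ln()).sum()
}

/// Cross-entropy H(p, q) = -Σ p_i ln q_i, in nats: expected code length
/// when encoding samples from p with a code optimized for q.
fn cross_entropy_nats(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| -pi * qi.ln())
        .sum()
}

fn main() {
    // Uniform over 4 outcomes maximizes entropy: H = ln 4 nats.
    let uniform = [0.25, 0.25, 0.25, 0.25];
    assert!((entropy_nats(&uniform) - 4f64.ln()).abs() < 1e-12);

    // Gibbs' inequality: H(p, q) >= H(p), with equality iff p = q.
    let p = [1.0, 0.0];
    let q = [0.5, 0.5];
    assert!(cross_entropy_nats(&p, &q) >= entropy_nats(&p));
}
```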

§Distances vs divergences (terminology that prevents bugs)

A divergence (D(p:q)) is usually required to satisfy:

  • (D(p:q) \ge 0)
  • (D(p:p) = 0)

but it is typically not symmetric and not a metric (no triangle inequality). Many failures in downstream code are caused by treating a divergence as a distance metric.
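The asymmetry is easy to demonstrate with a standalone KL sketch (illustrative only; this crate's `kl_divergence` may validate its inputs and differ in signature):

```rust
/// KL divergence D(p || q) in nats, assuming q_i > 0 wherever p_i > 0.
/// Illustrative sketch, not necessarily `logp`'s exact API.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).ln())
        .sum()
}

fn main() {
    let p = [0.9, 0.1];
    let q = [0.5, 0.5];
    let fwd = kl_divergence(&p, &q); // D(p || q)
    let rev = kl_divergence(&q, &p); // D(q || p)

    // Both directions are nonnegative and vanish at p = p …
    assert!(fwd >= 0.0 && rev >= 0.0);
    assert!(kl_divergence(&p, &p).abs() < 1e-12);
    // … but they disagree: KL is not symmetric, hence not a metric.
    assert!((fwd - rev).abs() > 1e-3);
}
```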

§Key invariants (what tests should enforce)

  • Jensen–Shannon is bounded on the simplex: (0 \le JS(p,q) \le \ln 2) (nats).
  • Csiszár (f)-divergences are monotone under coarse-graining (Markov kernels): merging bins cannot increase the divergence.
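A property test for the Jensen–Shannon bound might look like the following standalone sketch (it reimplements JS from scratch rather than calling this crate, whose signatures may differ). The bound is tight: identical distributions give 0, and distributions with disjoint support give exactly (\ln 2).

```rust
/// KL divergence in nats (0 ln 0 := 0); helper for the JS sketch below.
fn kl(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).ln())
        .sum()
}

/// JS(p, q) = ½ D(p || m) + ½ D(q || m) with m = ½(p + q).
fn js(p: &[f64], q: &[f64]) -> f64 {
    let m: Vec<f64> = p.iter().zip(q).map(|(&a, &b)| 0.5 * (a + b)).collect();
    0.5 * kl(p, &m) + 0.5 * kl(q, &m)
}

fn main() {
    // Disjoint supports attain the upper bound ln 2 …
    let (p, q) = ([1.0, 0.0], [0.0, 1.0]);
    assert!((js(&p, &q) - std::f64::consts::LN_2).abs() < 1e-12);
    // … and identical distributions attain the lower bound 0.
    assert!(js(&p, &p).abs() < 1e-12);
}
```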

§Further reading

§Taxonomy of Divergences (Nielsen)

| Family | Generator | Key Property |
|---|---|---|
| f-divergences | Convex (f(t)) with (f(1)=0) | Monotone under Markov morphisms (coarse-graining) |
| Bregman | Convex (F(x)) | Dually flat geometry; generalized Pythagorean theorem |
| Jensen–Shannon | (f)-div + metric | Symmetric, bounded ([0, \ln 2]); (\sqrt{JS}) is a metric |
| Alpha | (\rho_\alpha = \int p^\alpha q^{1-\alpha}) | Encodes Rényi, Tsallis, Bhattacharyya, Hellinger |
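The last row is worth unpacking: a single alpha-integral generates several named quantities. The sketch below shows the discrete case ((\rho_\alpha(p,q) = \sum_i p_i^\alpha q_i^{1-\alpha})); it is a standalone illustration, and `logp::rho_alpha`'s real signature may differ.

```rust
/// Discrete alpha-integral ρ_α(p, q) = Σ_i p_i^α q_i^(1-α).
/// Illustrative sketch only; not necessarily this crate's API.
fn rho_alpha(p: &[f64], q: &[f64], alpha: f64) -> f64 {
    p.iter()
        .zip(q)
        .map(|(&pi, &qi)| pi.powf(alpha) * qi.powf(1.0 - alpha))
        .sum()
}

fn main() {
    let p = [0.5, 0.5];
    let q = [0.9, 0.1];

    // Bhattacharyya coefficient is ρ at α = 1/2 …
    let bc = rho_alpha(&p, &q, 0.5);
    // … squared Hellinger is 1 − BC …
    let hellinger_sq = 1.0 - bc;
    // … and Rényi divergence of order α is (1/(α−1)) ln ρ_α.
    let alpha = 2.0;
    let renyi_2 = (1.0 / (alpha - 1.0)) * rho_alpha(&p, &q, alpha).ln();

    assert!(bc > 0.0 && bc <= 1.0);
    assert!(hellinger_sq >= 0.0 && hellinger_sq <= 1.0);
    assert!(renyi_2 >= 0.0);
}
```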

§Connections

  • rkhs: MMD and KL both measure distribution “distance”
  • wass: Wasserstein vs entropy-based divergences
  • stratify: NMI for cluster evaluation uses this crate
  • fynch: Temperature scaling affects entropy calibration

§References

  • Shannon (1948). “A Mathematical Theory of Communication”
  • Cover & Thomas (2006). “Elements of Information Theory”

Structs§

SquaredL2
Squared Euclidean Bregman generator: (F(x)=\tfrac12\|x\|_2^2), (\nabla F(x)=x).

Enums§

Error
Errors for information-measure computations.
KsgVariant
Algorithm variant for KSG estimator.

Constants§

LN_2
Natural log of 2. Useful when converting nats ↔ bits or bounding Jensen–Shannon.

Traits§

BregmanGenerator
Bregman generator: a convex function (F) and its gradient.

Functions§

amari_alpha_divergence
Amari alpha-divergence: a one-parameter family from information geometry that continuously interpolates between forward KL, reverse KL, and squared Hellinger.
bhattacharyya_coeff
Bhattacharyya coefficient: the geometric-mean overlap between two distributions.
bhattacharyya_distance
Bhattacharyya distance (D_B(p,q) = -\ln BC(p,q)).
bregman_divergence
Bregman divergence: the gap between a convex function and its tangent approximation.
cross_entropy_nats
Cross-entropy in nats: the expected code length when using model (q) to encode data drawn from true distribution (p).
csiszar_f_divergence
Csiszar f-divergence: the most general class of divergences that respect sufficient statistics (information monotonicity).
digamma
Digamma function: the logarithmic derivative of the Gamma function.
entropy_bits
Shannon entropy in bits.
entropy_nats
Shannon entropy in nats: the expected surprise under distribution (p).
entropy_unchecked
Fast Shannon entropy calculation without simplex validation.
hellinger
Hellinger distance (H(p,q) = \sqrt{1 - BC(p,q)}): unlike most divergences here, a true metric.
hellinger_squared
Squared Hellinger distance: one minus the Bhattacharyya coefficient.
jensen_shannon_divergence
Jensen–Shannon divergence in nats: a symmetric, bounded smoothing of KL divergence.
kl_divergence
Kullback–Leibler divergence in nats: the information lost when (q) is used to approximate (p).
kl_divergence_gaussians
KL divergence between two diagonal-covariance multivariate Gaussians, in closed form.
log_sum_exp
Log-sum-exp: numerically stable computation of (\ln(e^{a_1} + \cdots + e^{a_n})).
log_sum_exp2
Log-sum-exp for two values (common special case).
mutual_information
Mutual information in nats: how much knowing (Y) reduces uncertainty about (X).
mutual_information_ksg
Estimate mutual information (I(X;Y)) using the Kraskov–Stögbauer–Grassberger (KSG) nearest-neighbor estimator.
normalize_in_place
Normalize a nonnegative vector in-place to sum to 1.
pmi
Pointwise mutual information: the log-ratio measuring how much more (or less) likely two specific outcomes co-occur than if they were independent.
renyi_divergence
Rényi divergence in nats: a one-parameter family that interpolates between different notions of distributional difference.
rho_alpha
Alpha-integral: the workhorse behind the entire alpha-family of divergences.
total_bregman_divergence
Total Bregman divergence, as shown in Nielsen’s taxonomy diagram.
tsallis_divergence
Tsallis divergence: a non-extensive generalization of KL divergence from statistical mechanics.
validate_simplex
Validate that p is a probability distribution on the simplex (within tol).

Type Aliases§

Result