logp
Information theory primitives in Rust: entropy, KL divergence, mutual information (KSG estimator), and information-monotone divergences.
[]
= "0.1.4"
Discrete distributions: Shannon/Renyi/Tsallis entropy, KL/JS/Hellinger/Bhattacharyya divergence, chi-squared divergence, total variation, f-divergences. Continuous distributions: KSG mutual information estimator (type I and II), differential entropy via k-NN. All validated by property-based tests (KL non-negativity, Pinsker's inequality, JS boundedness, sqrt(JS) triangle inequality).
Examples
Divergence landscape. Sweep all divergences over a family of binary distributions to see how they behave as distributions diverge:
t KL JS Hell Bhatt Renyi0.5 Tsallis2
----------------------------------------------------------------------
0.050 0.49463 0.14234 0.39075 0.16568 0.33136 0.81000
0.250 0.13081 0.03382 0.18459 0.03467 0.06934 0.25000
0.500 0.00000 0.00000 0.00000 -0.00000 -0.00000 0.00000
0.750 0.13081 0.03382 0.18459 0.03467 0.06934 0.25000
0.950 0.49463 0.14234 0.39075 0.16568 0.33136 0.81000
Observations:
- KL diverges as p -> delta; JS stays bounded by ln(2) ~ 0.693
- Hellinger saturates at 1.0; Bhattacharyya diverges
Document similarity via bag-of-words. JS divergence between word frequency distributions as a simple text similarity measure:
Pair JS KL(a|b) KL(b|a)
----------------------------------------
doc_a-doc_b 0.2310 7.0780 7.0780
doc_a-doc_c 0.6931 21.4651 21.0799
Note: JS is symmetric and bounded [0, ln2]; KL is asymmetric and unbounded.
KSG mutual information estimator. Estimate mutual information of correlated Gaussians using the Kraskov-Stogbauer-Grassberger method, then compare against the analytical value:
High-dimensional mutual information. KSG estimation in 10 dimensions (two correlated 5D blocks). Histogram-based MI is infeasible at this dimensionality (10^10 bins), but KSG converges cleanly. Compares estimates against the closed-form Gaussian MI:
Feature selection. Rank 8 candidate features (3 linear, 1 quadratic, 4 noise) by their KSG mutual information with a target variable. The nonlinear feature (quadratic) is correctly ranked alongside the linear ones -- something correlation-based methods miss:
What it provides
Entropies: Shannon (nats/bits), Renyi, Tsallis, cross-entropy, conditional entropy, mutual information (discrete + KSG continuous estimator), normalized MI.
Divergences: KL, Jensen-Shannon (equal and weighted), Hellinger, Bhattacharyya, total variation, chi-squared, Renyi, Tsallis, Amari alpha-family, Csiszar f-divergences, Bregman divergences (SquaredL2 and NegEntropy generators).
Gaussian KL: Closed-form KL between diagonal Gaussians.
Usage
[]
= "0.1.4"
use ;
let p = ;
let q = ;
let h = entropy_nats.unwrap; // Shannon entropy
let kl = kl_divergence.unwrap; // KL(p || q)
let js = jensen_shannon_divergence.unwrap; // JS divergence
The tol parameter controls how strictly inputs are validated as probability distributions. Use 1e-9 for normalized inputs; use 1e-6 if inputs may have minor floating-point drift.
Tests
127 tests (93 unit + 34 doc-tests) covering all public API functions, including property-based tests for KL non-negativity, Pinsker's inequality and tightness, JS boundedness, sqrt(JS) and Hellinger and total variation triangle inequality, Renyi divergence and entropy monotonicity in alpha, Amari alpha-KL correspondence, Csiszar f-divergence with KL/Hellinger/chi-squared generators, Bhattacharyya-Renyi consistency, entropy concavity, cross-entropy decomposition, conditional entropy chain rule, chi-squared/KL upper bound, total Bregman normalization, NegEntropy Bregman/KL equivalence, digamma precision at DLMF reference values, PMI edge cases, near-boundary numerical robustness, data processing inequality for discrete MI, f-divergence monotonicity under coarse-graining, streaming log-sum-exp, weighted JS entropy bounds, and KSG estimator accuracy against Gaussian ground truth.
License
MIT OR Apache-2.0