Expand description
§rkhs
Reproducing Kernel Hilbert Space primitives for distribution comparison.
§Why “RKHS”?
A Reproducing Kernel Hilbert Space is the mathematical structure where kernel methods live. Every positive-definite kernel k(x,y) defines an RKHS (via Mercer’s theorem), and every RKHS has a unique reproducing kernel.
This crate provides the primitives: kernels, Gram matrices, MMD, and Dense Associative Memory energy functions—all operations in or derived from the RKHS.
§Intuition
Kernels measure similarity in a (potentially infinite-dimensional) feature space without ever computing the features explicitly. This “kernel trick” enables nonlinear methods with linear complexity.
MMD (Maximum Mean Discrepancy) uses kernels to test whether two samples come from the same distribution. It embeds distributions into an RKHS and measures the distance between their mean embeddings (kernel mean embeddings).
§Key Functions
| Function | Purpose |
|---|---|
rbf | Radial Basis Function (Gaussian) kernel |
epanechnikov | Optimal kernel for density estimation |
polynomial | Polynomial kernel |
kernel_matrix | Gram matrix K[i,j] = k(x_i, x_j) |
kernel_sum | Sum Σ κ(v, ξ^μ) for AM/kernel machines |
energy_lse | Log-Sum-Exp energy (Dense AM with RBF) |
energy_lsr | Log-Sum-ReLU energy (Dense AM with Epanechnikov) |
retrieve_memory | Memory retrieval via energy descent |
mmd_biased | O(n²) biased MMD estimate |
mmd_unbiased | O(n²) unbiased MMD u-statistic |
mmd_permutation_test | Significance test via permutation |
§Quick Start
use rkhs::{rbf, mmd_unbiased};
let x = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![0.2, 0.0]];
let y = vec![vec![5.0, 5.0], vec![5.1, 5.1], vec![5.2, 5.0]];
// Different distributions → large MMD
let mmd = mmd_unbiased(&x, &y, |a, b| rbf(a, b, 1.0));
assert!(mmd > 0.5);§Why Kernels Matter for ML
- Associative Memory: Energy functions E = -log Σ κ(v, ξ) define memory landscapes
- GAN evaluation: FID uses MMD-like statistics to compare generated vs real
- Domain adaptation: Minimize MMD between source and target distributions
- Two-sample testing: Detect distribution shift in production systems
- Kernel regression: Nonparametric regression via kernel mean embedding
§Associative Memory
Dense Associative Memory (Krotov et al., 2016-2025) uses kernel sums to define energy landscapes for content-addressable memory:
use rkhs::{energy_lse, energy_lsr, energy_lse_grad, retrieve_memory};
// Store two memories
let memories = vec![
vec![0.0, 0.0],
vec![10.0, 10.0],
];
// Query: corrupted version of first memory
let query = vec![1.0, 1.0];
// Retrieve via energy descent (LSE energy)
let (retrieved, _) = retrieve_memory(
query,
&memories,
|v, m| energy_lse_grad(v, m, 2.0),
0.1,
100,
1e-6,
);
// Should recover [0, 0]
assert!(retrieved[0].abs() < 1.0);The LSR energy (using Epanechnikov kernel) has special properties:
- Exact single-step retrieval
- Novel memory generation at basin intersections
- Compact support (infinite energy outside memory neighborhoods)
§Connections
logp: MMD and KL divergence both measure distribution “distance”wass: Wasserstein and MMD are different ways to compare distributionslapl: Gaussian kernel → Laplacian eigenvalue problems
§What Can Go Wrong
- Bandwidth too small: RBF kernel becomes nearly diagonal, loses structure.
- Bandwidth too large: Everything becomes similar, no discrimination.
- Numerical instability: Very large distances → exp(-large) → 0 underflow.
- MMD variance: With small samples, MMD estimates are noisy. Use permutation test.
- Kernel not characteristic: Not all kernels can distinguish all distributions. RBF is characteristic (good); polynomial is not (bad for two-sample test).
§References
- Gretton et al. (2012). “A Kernel Two-Sample Test” (JMLR)
- Muandet et al. (2017). “Kernel Mean Embedding of Distributions” (Found. & Trends)
- Hoover et al. (2025). “Dense Associative Memory with Epanechnikov Energy”
Re-exports§
pub use clam::am_assign;pub use clam::am_contract;pub use clam::am_soft_assign;pub use clam::clam_loss;
Modules§
- clam
- ClAM: Clustering with Associative Memory helpers. ClAM: Clustering with Associative Memory helpers.
Functions§
- cosine
- Cosine kernel: k(x, y) = cos(π/2 * min(||x-y||/σ, 1))
- energy_
lse - Log-Sum-Exp (LSE) energy for Dense Associative Memory.
- energy_
lse_ grad - Gradient of LSE energy: ∇_v E_LSE(v; Ξ)
- energy_
lsr - Log-Sum-ReLU (LSR) energy for Dense Associative Memory with Epanechnikov kernel.
- energy_
lsr_ grad - Gradient of LSR energy: ∇_v E_LSR(v; Ξ)
- epanechnikov
- Epanechnikov kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)
- kernel_
matrix - Compute the Gram matrix K[i,j] = k(X[i], X[j]).
- kernel_
sum - Compute kernel sum: Σ_μ κ(v, ξ^μ)
- laplacian
- Laplacian kernel: k(x, y) = exp(-||x-y||₁ / σ)
- linear
- Linear kernel: k(x, y) = ⟨x, y⟩
- median_
bandwidth - Median heuristic for RBF bandwidth selection.
- mmd_
biased - Biased MMD estimate in O(n^2) time.
- mmd_
permutation_ test - Permutation test for MMD significance.
- mmd_
unbiased - Unbiased MMD² estimate (u-statistic).
- polynomial
- Polynomial kernel: k(x, y) = (γ⟨x,y⟩ + c)^d
- quartic
- Quartic (Biweight) kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)²
- rbf
- Radial Basis Function (Gaussian) kernel: k(x, y) = exp(-||x-y||² / (2σ²))
- rbf_
kernel_ matrix_ ndarray - RBF Gram matrix for an
ndarraymatrix of points. - retrieve_
memory - Retrieve memory using energy descent.
- triangle
- Triangle kernel: k(x, y) = max(0, 1 - ||x-y|| / σ)
- tricube
- Tricube kernel: k(x, y) = max(0, 1 - (||x-y||/σ)³)³
- triweight
- Triweight kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)³