Crate rkhs

§rkhs

Reproducing Kernel Hilbert Space primitives for distribution comparison.

§Why “RKHS”?

A Reproducing Kernel Hilbert Space is the mathematical structure where kernel methods live. Every positive-definite kernel k(x,y) defines a unique RKHS (by the Moore–Aronszajn theorem), and every RKHS has a unique reproducing kernel.
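
The name comes from the defining (“reproducing”) property: for every f in the RKHS H and every point x,

    f(x) = ⟨f, k(x, ·)⟩_H

so k(x, ·) acts as the evaluation functional at x; in particular k(x, y) = ⟨k(x, ·), k(y, ·)⟩_H.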

This crate provides the primitives: kernels, Gram matrices, MMD, and kernel quantile embeddings. Dense Associative Memory (AM) functions are re-exported from the hopfield crate.

§Intuition

Kernels measure similarity in a (potentially infinite-dimensional) feature space without ever computing the features explicitly. This “kernel trick” lets any algorithm phrased purely in terms of inner products act nonlinearly, at the cost of pairwise kernel evaluations rather than explicit feature computation.
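
A minimal, dependency-free sketch of the trick (plain Rust, no rkhs calls): for the degree-2 homogeneous polynomial kernel k(x, y) = ⟨x, y⟩² on 2-D inputs, the implicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²), and the kernel value matches the explicit feature inner product exactly:

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(u, v)| u * v).sum()
}

let x = [1.0, 2.0];
let y = [3.0, 0.5];

// Kernel side: one inner product in the original 2-D space, squared.
let k = dot(&x, &y).powi(2);

// Feature side: the explicit 3-D quadratic feature map φ.
let phi = |v: &[f64]| [v[0] * v[0], 2f64.sqrt() * v[0] * v[1], v[1] * v[1]];
let explicit = dot(&phi(&x), &phi(&y));

assert!((k - explicit).abs() < 1e-12);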

MMD (Maximum Mean Discrepancy) uses kernels to test whether two samples come from the same distribution. It embeds distributions into an RKHS and measures the distance between their mean embeddings (kernel mean embeddings).
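
For a kernel k, the squared population MMD expands into three kernel expectations, which is exactly what the estimators below approximate from samples (Gretton et al., 2012):

    MMD²(P, Q) = E[k(x, x′)] − 2·E[k(x, y)] + E[k(y, y′)],   x, x′ ~ P,   y, y′ ~ Q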

§Key Functions

Function                        Purpose
rbf                             Radial Basis Function (Gaussian) kernel
epanechnikov                    Optimal kernel for density estimation
polynomial                      Polynomial kernel
kernel_matrix                   Gram matrix K[i,j] = k(x_i, x_j)
kernel_sum                      Sum Σ κ(v, ξ^μ) for AM/kernel machines (from hopfield)
energy_lse                      Log-Sum-Exp energy (Dense AM with RBF) (from hopfield)
energy_lsr                      Log-Sum-ReLU energy (Dense AM with Epanechnikov) (from hopfield)
retrieve_memory                 Memory retrieval via energy descent (from hopfield)
mmd_biased                      O(n²) biased MMD estimate
mmd_unbiased                    O(n²) unbiased MMD² U-statistic
mmd_permutation_test            Significance test via permutation
kernel_quantile_embedding       Kernel embedding at a quantile level
qmmd                            Quantile MMD (tail-sensitive distribution comparison)
weighted_qmmd                   QMMD with configurable quantile-level weighting
quantile_function_embedding     Kernel-smoothed quantile function at specified levels
quantile_distribution_kernel    Kernel between distributions via quantile embeddings
quantile_gram_matrix            Gram matrix restricted to a quantile level
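
A quick sketch of the Gram-matrix entry point. The closure-based kernel argument mirrors the mmd_unbiased example below, but the exact signature and return type of kernel_matrix are assumptions here; check the function’s own docs:

use rkhs::{kernel_matrix, rbf};

let xs = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![0.0, 1.0]];
let k = kernel_matrix(&xs, |a, b| rbf(a, b, 1.0));

// An RBF Gram matrix is symmetric with a unit diagonal.
assert!((k[0][0] - 1.0).abs() < 1e-12);
assert!((k[0][1] - k[1][0]).abs() < 1e-12);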

§Modern Hopfield Networks in 10 Lines

Dense Associative Memory (AM) functions are provided by the hopfield crate and re-exported here for convenience:

use rkhs::{energy_lse_grad, retrieve_memory};

// Store three patterns (colours in RGB-ish space)
let memories = vec![
    vec![1.0, 0.0, 0.0],  // red
    vec![0.0, 1.0, 0.0],  // green
    vec![0.0, 0.0, 1.0],  // blue
];

// Noisy query: mostly red but corrupted
let query = vec![0.9, 0.2, 0.1];

// Retrieve via energy descent
let (retrieved, iters) = retrieve_memory(
    query,
    &memories,
    |v, m| energy_lse_grad(v, m, 10.0),  // beta=10 → sharp attractor
    0.1,   // learning rate
    200,   // max iterations
    1e-7,  // convergence tolerance
);

// Nearest pattern is red: [1,0,0]
assert!(retrieved[0] > 0.9, "should converge to red");
assert!(retrieved[1] < 0.1, "green component suppressed");
assert!(retrieved[2] < 0.1, "blue component suppressed");
println!("Converged in {iters} iterations: {retrieved:?}");

§Quick Start (MMD)

use rkhs::{rbf, mmd_unbiased};

let x = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![0.2, 0.0]];
let y = vec![vec![5.0, 5.0], vec![5.1, 5.1], vec![5.2, 5.0]];

// Different distributions → large MMD
let mmd = mmd_unbiased(&x, &y, |a, b| rbf(a, b, 1.0));
assert!(mmd > 0.5);
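
The crate’s mmd_permutation_test packages the standard significance recipe. Below is a hand-rolled sketch of the same idea using only the calls shown above; the inline LCG shuffle and the trial count are illustrative choices, not the crate’s implementation:

use rkhs::{rbf, mmd_unbiased};

let x: Vec<Vec<f64>> = (0..20).map(|i| vec![0.1 * i as f64, 0.0]).collect();
let y: Vec<Vec<f64>> = (0..20).map(|i| vec![0.1 * i as f64 + 5.0, 5.0]).collect();
let observed = mmd_unbiased(&x, &y, |a, b| rbf(a, b, 1.0));

// Pool both samples; under H0 (same distribution) the labels are exchangeable.
let pool: Vec<Vec<f64>> = x.iter().chain(y.iter()).cloned().collect();
let (n, trials) = (x.len(), 200);
let (mut seed, mut exceed) = (0x2545F4914F6CDD1Du64, 0);

for _ in 0..trials {
    // Fisher-Yates shuffle driven by a small inline LCG (illustrative RNG only).
    let mut idx: Vec<usize> = (0..pool.len()).collect();
    for i in (1..idx.len()).rev() {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        idx.swap(i, (seed >> 33) as usize % (i + 1));
    }
    let xs: Vec<Vec<f64>> = idx[..n].iter().map(|&i| pool[i].clone()).collect();
    let ys: Vec<Vec<f64>> = idx[n..].iter().map(|&i| pool[i].clone()).collect();
    if mmd_unbiased(&xs, &ys, |a, b| rbf(a, b, 1.0)) >= observed {
        exceed += 1;
    }
}

// Permutation p-value: fraction of shuffles at least as extreme as observed.
let p = (exceed + 1) as f64 / (trials + 1) as f64;
println!("MMD² = {observed:.4}, permutation p ≈ {p:.3}");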

§Why Kernels Matter for ML

  • Associative Memory: Energy functions E = -log Σ κ(v, ξ) define memory landscapes
  • GAN evaluation: KID (Kernel Inception Distance) uses MMD to compare generated vs real samples
  • Domain adaptation: Minimize MMD between source and target distributions
  • Two-sample testing: Detect distribution shift in production systems
  • Kernel regression: Nonparametric regression via kernel mean embedding

§Connections

  • logp: MMD and KL divergence both measure distribution “distance”
  • wass: Wasserstein and MMD are different ways to compare distributions
  • lapl: Gaussian kernel → Laplacian eigenvalue problems

§What Can Go Wrong

  1. Bandwidth too small: the RBF Gram matrix becomes nearly the identity and loses all structure.
  2. Bandwidth too large: everything looks similar and nothing is discriminated. The median heuristic sketched after this list is a robust default for both failure modes.
  3. Numerical instability: Very large distances → exp(-large) → 0 underflow.
  4. MMD variance: With small samples, MMD estimates are noisy. Use permutation test.
  5. Kernel not characteristic: Not all kernels can distinguish all distributions. RBF is characteristic (good); polynomial is not (bad for two-sample test).
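
For pitfalls 1 and 2 the crate provides median_bandwidth. The heuristic itself is simple enough to sketch by hand; the convention assumed here (σ = median pairwise Euclidean distance) may differ in scaling from median_bandwidth’s, so treat this as illustrative:

// Median heuristic: set sigma to the median pairwise Euclidean distance.
fn median_heuristic(points: &[Vec<f64>]) -> f64 {
    let mut dists = Vec::new();
    for i in 0..points.len() {
        for j in (i + 1)..points.len() {
            let d2: f64 = points[i].iter().zip(&points[j]).map(|(a, b)| (a - b) * (a - b)).sum();
            dists.push(d2.sqrt());
        }
    }
    dists.sort_by(|a, b| a.partial_cmp(b).unwrap());
    dists[dists.len() / 2]
}

let x = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![5.0, 5.0], vec![5.1, 5.1]];
let sigma = median_heuristic(&x);
// Near pairs now score high and far pairs low, instead of the kernel
// collapsing toward all-ones (sigma too large) or the identity (too small).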

§References

  • Gretton et al. (2012). “A Kernel Two-Sample Test” (JMLR)
  • Muandet et al. (2017). “Kernel Mean Embedding of Distributions” (Found. & Trends)
  • Naslidnyk et al. (2025). “Kernel Quantile Embeddings”

Re-exports§

pub use clam::am_assign; (deprecated)
pub use clam::am_contract; (deprecated)
pub use clam::am_soft_assign; (deprecated)
pub use clam::clam_loss; (deprecated)
pub use distribution_kernel::expected_likelihood_kernel;
pub use distribution_kernel::fisher_kernel_categorical;
pub use distribution_kernel::jensen_shannon_kernel;
pub use distribution_kernel::probability_product_kernel;
pub use graph_kernel::random_walk_kernel; (deprecated)
pub use graph_kernel::sliced_wasserstein_graph_kernel; (deprecated)
pub use graph_kernel::structural_node_features; (deprecated)
pub use graph_kernel::wl_subtree_kernel; (deprecated)
pub use quantile_kernel::kernel_quantile_embedding;
pub use quantile_kernel::qmmd;
pub use quantile_kernel::quantile_distribution_kernel;
pub use quantile_kernel::quantile_function_embedding;
pub use quantile_kernel::quantile_gram_matrix;
pub use quantile_kernel::weighted_qmmd;
pub use quantile_kernel::QuantileWeight;

Modules§

clam (deprecated)
CLAM: Clustering with Associative Memory helpers.
distribution_kernel
Kernels on probability distributions: similarity measures between distributions.
graph_kernel
Kernels on labeled graphs.
quantile_kernel
Kernel quantile embeddings for tail-sensitive distribution comparison.

Functions§

cosine
Cosine kernel: k(x, y) = cos(π/2 * min(||x-y||/σ, 1))
energy_lse
Log-Sum-Exp (LSE) energy for Dense Associative Memory.
energy_lse_grad
Gradient of LSE energy: ∇_v E_LSE(v; Ξ)
energy_lsr
Log-Sum-ReLU (LSR) energy for Dense Associative Memory with Epanechnikov kernel.
energy_lsr_grad
Gradient of LSR energy: ∇_v E_LSR(v; Ξ)
epanechnikov
Epanechnikov kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)
kernel_matrix
Compute the Gram matrix K[i,j] = k(X[i], X[j]).
kernel_sum
Compute kernel sum: Σ_μ κ(v, ξ^μ)
laplacian
Laplacian kernel: k(x, y) = exp(-||x-y||₁ / σ)
linear
Linear kernel: k(x, y) = ⟨x, y⟩
median_bandwidth
Median heuristic for RBF bandwidth selection.
mmd_biased
Biased MMD estimate in O(n²) time.
mmd_permutation_test
Permutation test for MMD significance.
mmd_unbiased
Unbiased MMD² estimate (u-statistic).
polynomial
Polynomial kernel: k(x, y) = (γ⟨x,y⟩ + c)^d
quartic
Quartic (Biweight) kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)²
rbf
Radial Basis Function (Gaussian) kernel: k(x, y) = exp(-||x-y||² / (2σ²))
rbf_kernel_matrix_ndarray
RBF Gram matrix for an ndarray matrix of points.
retrieve_memory
Retrieve memory using energy descent.
triangle
Triangle kernel: k(x, y) = max(0, 1 - ||x-y|| / σ)
tricube
Tricube kernel: k(x, y) = max(0, 1 - (||x-y||/σ)³)³
triweight
Triweight kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)³