§rkhs
Reproducing Kernel Hilbert Space primitives for distribution comparison.
§Why “RKHS”?
A Reproducing Kernel Hilbert Space is the mathematical structure where kernel methods live. Every positive-definite kernel k(x, y) defines a unique RKHS (the Moore–Aronszajn theorem), and every RKHS has a unique reproducing kernel.
This crate provides the primitives: kernels, Gram matrices, MMD, and random Fourier features—all operations in or derived from the RKHS.
§Intuition
Kernels measure similarity in a (potentially infinite-dimensional) feature space without ever computing the features explicitly. This “kernel trick” gives linear algorithms nonlinear power: every computation reduces to kernel evaluations on pairs of points, regardless of the feature space’s dimension.
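To make the trick concrete, here is a self-contained sketch in plain Rust (not this crate’s API): for the degree-2 polynomial kernel, ⟨x, y⟩² equals the inner product of explicit quadratic feature maps φ(x) = (xᵢxⱼ)ᵢⱼ, so the kernel evaluates a d²-dimensional inner product at O(d) cost.

```rust
/// Degree-2 polynomial kernel: k(x, y) = <x, y>^2 (γ = 1, c = 0).
fn poly2(x: &[f64], y: &[f64]) -> f64 {
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    dot * dot
}

/// Explicit feature map φ(x) = (x_i * x_j) over all index pairs (i, j).
/// For d-dimensional input this has d² coordinates; the kernel never builds it.
fn phi(x: &[f64]) -> Vec<f64> {
    let mut out = Vec::with_capacity(x.len() * x.len());
    for &a in x {
        for &b in x {
            out.push(a * b);
        }
    }
    out
}

fn main() {
    let (x, y) = (vec![1.0, 2.0], vec![3.0, -1.0]);
    // <φ(x), φ(y)> computed explicitly in O(d²)...
    let explicit: f64 = phi(&x).iter().zip(phi(&y)).map(|(a, b)| a * b).sum();
    // ...matches the kernel evaluated in O(d).
    assert!((explicit - poly2(&x, &y)).abs() < 1e-12);
}
```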
MMD (Maximum Mean Discrepancy) uses kernels to test whether two samples come from the same distribution. It embeds each distribution into the RKHS as its kernel mean embedding and measures the distance between the two embeddings.
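Concretely, Gretton et al. (2012) obtain the unbiased estimator by dropping the diagonal terms of each within-sample Gram matrix: MMD²ᵤ = 1/(m(m−1)) Σ_{i≠j} k(xᵢ, xⱼ) + 1/(n(n−1)) Σ_{i≠j} k(yᵢ, yⱼ) − 2/(mn) Σ_{i,j} k(xᵢ, yⱼ). The following from-scratch sketch (plain Rust, independent of this crate’s `mmd_unbiased`) shows exactly what that computes:

```rust
/// Gaussian RBF kernel: k(x, y) = exp(-||x - y||² / (2σ²)).
fn rbf(x: &[f64], y: &[f64], sigma: f64) -> f64 {
    let d2: f64 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    (-d2 / (2.0 * sigma * sigma)).exp()
}

/// Unbiased MMD² u-statistic (Gretton et al., 2012): O(m² + mn + n²) kernel calls.
fn mmd2_unbiased(x: &[Vec<f64>], y: &[Vec<f64>], k: impl Fn(&[f64], &[f64]) -> f64) -> f64 {
    let (m, n) = (x.len() as f64, y.len() as f64);
    let (mut xx, mut yy, mut xy) = (0.0, 0.0, 0.0);
    for (i, xi) in x.iter().enumerate() {
        for (j, xj) in x.iter().enumerate() {
            if i != j { xx += k(xi.as_slice(), xj.as_slice()); } // skip diagonal: unbiased
        }
        for yj in y { xy += k(xi.as_slice(), yj.as_slice()); }
    }
    for (i, yi) in y.iter().enumerate() {
        for (j, yj) in y.iter().enumerate() {
            if i != j { yy += k(yi.as_slice(), yj.as_slice()); }
        }
    }
    xx / (m * (m - 1.0)) + yy / (n * (n - 1.0)) - 2.0 * xy / (m * n)
}

fn main() {
    let x = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![0.2, 0.0]];
    let y = vec![vec![5.0, 5.0], vec![5.1, 5.1], vec![5.2, 5.0]];
    // Well separated: within-sample means ≈ 1, cross term ≈ 0, so MMD² ≈ 2.
    println!("MMD² = {:.3}", mmd2_unbiased(&x, &y, |a, b| rbf(a, b, 1.0)));
}
```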
§Key Functions
| Function | Purpose |
|---|---|
| `rbf` | Radial Basis Function (Gaussian) kernel |
| `epanechnikov` | Optimal kernel for density estimation |
| `polynomial` | Polynomial kernel |
| `kernel_matrix` | Gram matrix K[i,j] = k(x_i, x_j) |
| `kernel_sum` | Sum Σ_μ κ(v, ξ^μ) for AM/kernel machines |
| `energy_lse` | Log-Sum-Exp energy (Dense AM with RBF) |
| `energy_lsr` | Log-Sum-ReLU energy (Dense AM with Epanechnikov) |
| `retrieve_memory` | Memory retrieval via energy descent |
| `mmd_biased` | O(n²) biased MMD estimate |
| `mmd_unbiased` | O(n²) unbiased MMD² u-statistic |
| `mmd_permutation_test` | Significance test via permutation |
§Quick Start
```rust
use rkhs::{rbf, mmd_unbiased};

let x = vec![vec![0.0, 0.0], vec![0.1, 0.1], vec![0.2, 0.0]];
let y = vec![vec![5.0, 5.0], vec![5.1, 5.1], vec![5.2, 5.0]];

// Different distributions → large MMD
let mmd = mmd_unbiased(&x, &y, |a, b| rbf(a, b, 1.0));
assert!(mmd > 0.5);
```

§Why Kernels Matter for ML
- Associative Memory: Energy functions E = -log Σ κ(v, ξ) define memory landscapes
- GAN evaluation: KID (Kernel Inception Distance) uses MMD to compare generated vs. real samples
- Domain adaptation: Minimize MMD between source and target distributions
- Two-sample testing: Detect distribution shift in production systems
- Kernel regression: Nonparametric regression via kernel mean embedding
§Associative Memory
Dense Associative Memory (Krotov et al., 2016–2025) uses kernel sums to define energy landscapes for content-addressable memory.
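For the LSE case, here is a hedged sketch of the standard Dense AM form (the crate’s exact scaling and sign conventions may differ; β is the inverse temperature, passed as 2.0 in the example below):

```latex
E_{\mathrm{LSE}}(v;\Xi) = -\tfrac{1}{\beta}\,\log \sum_{\mu=1}^{M} \exp\!\Big(-\tfrac{\beta}{2}\,\lVert v-\xi^{\mu}\rVert^{2}\Big),
\qquad
\nabla_v E_{\mathrm{LSE}}(v;\Xi) = \sum_{\mu=1}^{M} \operatorname{softmax}_{\mu}\!\Big(-\tfrac{\beta}{2}\,\lVert v-\xi^{\mu}\rVert^{2}\Big)\,\big(v-\xi^{\mu}\big)
```

Descending this energy pulls the query toward a softmax-weighted blend of the stored memories: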
```rust
use rkhs::{energy_lse_grad, retrieve_memory};

// Store two memories
let memories = vec![
    vec![0.0, 0.0],
    vec![10.0, 10.0],
];

// Query: a corrupted version of the first memory
let query = vec![1.0, 1.0];

// Retrieve via energy descent (LSE energy)
let (retrieved, _) = retrieve_memory(
    query,
    &memories,
    |v, m| energy_lse_grad(v, m, 2.0),
    0.1,
    100,
    1e-6,
);

// Should recover [0, 0]
assert!(retrieved[0].abs() < 1.0);
```

The LSR energy (built on the Epanechnikov kernel; see the sketch after this list) has special properties:
- Exact single-step retrieval
- Novel memory generation at basin intersections
- Compact support (infinite energy outside memory neighborhoods)
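For intuition, here is a hedged sketch of the form such an energy takes, assuming the Epanechnikov kernel with bandwidth σ as defined in this crate (the exact normalization may differ):

```latex
E_{\mathrm{LSR}}(v;\Xi) = -\log \sum_{\mu=1}^{M} \max\!\Big(0,\; 1 - \frac{\lVert v-\xi^{\mu}\rVert^{2}}{\sigma^{2}}\Big)
```

When v lies farther than σ from every memory, the sum vanishes and the energy is infinite, which is exactly the compact-support property listed above.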
§Connections
- `logp`: MMD and KL divergence both measure distribution “distance”
- `wass`: Wasserstein and MMD are different ways to compare distributions
- `lapl`: Gaussian kernel → Laplacian eigenvalue problems
- `strata`: kernel k-means uses these kernels
- `innr`: SIMD acceleration for kernel computations (via the `simd` feature)
§SIMD Acceleration
Enable the `simd` feature for SIMD-accelerated kernel computations:

```toml
[dependencies]
rkhs = { version = "0.1", features = ["simd"] }
```

This uses [innr] for fast L2 distances and dot products.
§What Can Go Wrong
- Bandwidth too small: RBF kernel becomes nearly diagonal, loses structure.
- Bandwidth too large: Everything becomes similar, no discrimination.
- Numerical instability: Very large distances → exp(-large) → 0 underflow.
- MMD variance: With small samples, MMD estimates are noisy. Use the permutation test (see the sketch after this list).
- Kernel not characteristic: Not all kernels can distinguish all distributions. RBF is characteristic (good); polynomial is not (bad for two-sample test).
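To illustrate that advice, here is a self-contained permutation-test sketch in plain Rust. It is not this crate’s `mmd_permutation_test`: the xorshift PRNG and the biased V-statistic are simplifications chosen to keep the example dependency-free.

```rust
/// Gaussian RBF kernel.
fn rbf(x: &[f64], y: &[f64], sigma: f64) -> f64 {
    let d2: f64 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    (-d2 / (2.0 * sigma * sigma)).exp()
}

/// Biased MMD² (V-statistic): all pairs, diagonal included.
fn mmd2_biased(x: &[&[f64]], y: &[&[f64]], k: &dyn Fn(&[f64], &[f64]) -> f64) -> f64 {
    let mean = |a: &[&[f64]], b: &[&[f64]]| {
        let mut s = 0.0;
        for &ai in a { for &bj in b { s += k(ai, bj); } }
        s / (a.len() * b.len()) as f64
    };
    mean(x, x) + mean(y, y) - 2.0 * mean(x, y)
}

/// Shuffle the pooled sample, recompute MMD², and report the fraction of
/// shuffles that score at least as high as the observed statistic.
fn mmd_perm_test(x: &[&[f64]], y: &[&[f64]], k: &dyn Fn(&[f64], &[f64]) -> f64, n_perm: usize) -> f64 {
    let observed = mmd2_biased(x, y, k);
    let mut pool: Vec<&[f64]> = x.iter().chain(y.iter()).copied().collect();
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // fixed-seed xorshift64 PRNG
    let mut hits = 0;
    for _ in 0..n_perm {
        for i in (1..pool.len()).rev() { // Fisher-Yates shuffle
            state ^= state << 13; state ^= state >> 7; state ^= state << 17;
            pool.swap(i, (state % (i as u64 + 1)) as usize);
        }
        let (px, py) = pool.split_at(x.len());
        if mmd2_biased(px, py, k) >= observed { hits += 1; }
    }
    (hits + 1) as f64 / (n_perm + 1) as f64 // add-one smoothed p-value
}

fn main() {
    let x = [vec![0.0, 0.0], vec![0.1, 0.1], vec![0.2, 0.0], vec![0.0, 0.2]];
    let y = [vec![5.0, 5.0], vec![5.1, 5.1], vec![5.2, 5.0], vec![5.0, 5.2]];
    let xr: Vec<&[f64]> = x.iter().map(|v| v.as_slice()).collect();
    let yr: Vec<&[f64]> = y.iter().map(|v| v.as_slice()).collect();
    let p = mmd_perm_test(&xr, &yr, &|a, b| rbf(a, b, 1.0), 200);
    println!("p ≈ {p:.3}"); // well-separated samples: p should be small
}
```

Under the null hypothesis the two samples are exchangeable, so the observed MMD² should look like a typical draw from the permuted values; a small p-value says it does not.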
§References
- Gretton et al. (2012). “A Kernel Two-Sample Test” (JMLR)
- Muandet et al. (2017). “Kernel Mean Embedding of Distributions” (Found. & Trends)
- Rahimi & Recht (2007). “Random Features for Large-Scale Kernel Machines” (NIPS)
Enums§

- `Error` - Errors for kernel operations.

Functions§

- `cosine` - Cosine kernel: k(x, y) = cos(π/2 · min(||x-y||/σ, 1))
- `energy_descent_step` - Single step of energy descent for memory retrieval.
- `energy_lse` - Log-Sum-Exp (LSE) energy for Dense Associative Memory.
- `energy_lse_grad` - Gradient of LSE energy: ∇_v E_LSE(v; Ξ)
- `energy_lsr` - Log-Sum-ReLU (LSR) energy for Dense Associative Memory with Epanechnikov kernel.
- `energy_lsr_grad` - Gradient of LSR energy: ∇_v E_LSR(v; Ξ)
- `epanechnikov` - Epanechnikov kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)
- `generate_positive_rff_frequencies` - Generate random frequencies for positive random features.
- `kernel_matrix` - Compute the Gram matrix K[i,j] = k(X[i], X[j]).
- `kernel_matrix_cross` - Compute the cross-kernel matrix K[i,j] = k(X[i], Y[j]).
- `kernel_matrix_ndarray` - Compute the Gram matrix K[i,j] = k(x_i, x_j) for an `ndarray` matrix of points.
- `kernel_sum` - Compute the kernel sum Σ_μ κ(v, ξ^μ)
- `laplacian` - Laplacian kernel: k(x, y) = exp(-||x-y||₁ / σ)
- `linear` - Linear kernel: k(x, y) = ⟨x, y⟩
- `median_bandwidth` - Median heuristic for RBF bandwidth selection.
- `mmd_biased` - Biased MMD estimate in O(n²) time.
- `mmd_linear_rff` - Linear-time MMD estimate using random Fourier features.
- `mmd_linear_rff_with_rng` - Linear-time MMD estimate using random Fourier features, with a caller-provided RNG.
- `mmd_linear_rff_with_seed` - Linear-time MMD estimate using random Fourier features, with an explicit seed.
- `mmd_permutation_test` - Permutation test for MMD significance.
- `mmd_unbiased` - Unbiased MMD² estimate (u-statistic).
- `nystrom_approximation` - Nyström approximation of the kernel matrix.
- `polynomial` - Polynomial kernel: k(x, y) = (γ⟨x,y⟩ + c)^d
- `positive_random_features` - Positive random features for the RBF kernel.
- `quartic` - Quartic (biweight) kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)²
- `rbf` - Radial Basis Function (Gaussian) kernel: k(x, y) = exp(-||x-y||² / (2σ²))
- `rbf_kernel_matrix_ndarray` - RBF Gram matrix for an `ndarray` matrix of points.
- `rbf_multiscale` - Multi-scale RBF kernel: average over several bandwidths.
- `retrieve_memory` - Retrieve a memory using energy descent.
- `triangle` - Triangle kernel: k(x, y) = max(0, 1 - ||x-y|| / σ)
- `tricube` - Tricube kernel: k(x, y) = max(0, 1 - (||x-y||/σ)³)³
- `triweight` - Triweight kernel: k(x, y) = max(0, 1 - ||x-y||² / σ²)³