Module gradient

Gradient helpers for DeepKernels.

Two paths are provided:

finite_difference_gradient: central differences over the flat parameter buffer. Works for any base kernel and any feature extractor that implements NeuralFeatureMap. It costs two perturbed evaluations per parameter, i.e. O(P) forward passes for P parameters, and the crate's own correctness tests use it as a reference.
rbf_dkl_gradient: the analytical gradient for the RBF-base / MLP-extractor special case. Closed form:

∂K_DKL / ∂θ = K_DKL · (-2γ) · Σ_k (g(x) - g(y))_k · ∂g_k(x)/∂θ
            + K_DKL · ( 2γ) · Σ_k (g(x) - g(y))_k · ∂g_k(y)/∂θ

The two sums come from differentiating || g(x) - g(y) ||² with respect to θ: one through g(x) and one, with the opposite sign, through g(y). The Jacobians ∂g_k(·)/∂θ are obtained by standard MLP backprop, reusing the per-layer pre/post-activation cache produced by MLPFeatureExtractor::forward_with_cache.
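As an illustration of how the two paths relate, the sketch below checks the closed form above against central differences. It does not use the crate's API: the feature map is a hypothetical single linear layer g(x) = W·x + b, the helper names (features, kernel, fd_gradient, analytic_gradient) are invented for this example, and the base kernel is assumed to be K = exp(-γ · ||g(x) - g(y)||²), which is the form the -2γ factor implies.

```rust
/// Hypothetical stand-in for a feature extractor: one linear layer g(x) = W x + b.
/// Parameters are stored flat as W (row-major, out_dim x in_dim) followed by b.
fn features(theta: &[f64], x: &[f64], out_dim: usize) -> Vec<f64> {
    let in_dim = x.len();
    (0..out_dim)
        .map(|k| {
            let row = &theta[k * in_dim..(k + 1) * in_dim];
            let bias = theta[out_dim * in_dim + k];
            row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + bias
        })
        .collect()
}

/// Assumed RBF form: K(x, y) = exp(-gamma * ||g(x) - g(y)||^2).
fn kernel(theta: &[f64], x: &[f64], y: &[f64], out_dim: usize, gamma: f64) -> f64 {
    let gx = features(theta, x, out_dim);
    let gy = features(theta, y, out_dim);
    let sq: f64 = gx.iter().zip(&gy).map(|(a, b)| (a - b).powi(2)).sum();
    (-gamma * sq).exp()
}

/// Central differences over the flat buffer: two kernel evaluations per parameter.
fn fd_gradient(theta: &[f64], x: &[f64], y: &[f64], out_dim: usize, gamma: f64, h: f64) -> Vec<f64> {
    let mut buf = theta.to_vec();
    let mut grad = Vec::with_capacity(theta.len());
    for i in 0..buf.len() {
        let orig = buf[i];
        buf[i] = orig + h;
        let plus = kernel(&buf, x, y, out_dim, gamma);
        buf[i] = orig - h;
        let minus = kernel(&buf, x, y, out_dim, gamma);
        buf[i] = orig; // restore before moving to the next parameter
        grad.push((plus - minus) / (2.0 * h));
    }
    grad
}

/// Closed form specialised to the linear layer, where dg_k(x)/dW_kj = x_j
/// and dg_k(x)/db_k = 1.
fn analytic_gradient(theta: &[f64], x: &[f64], y: &[f64], out_dim: usize, gamma: f64) -> Vec<f64> {
    let in_dim = x.len();
    let gx = features(theta, x, out_dim);
    let gy = features(theta, y, out_dim);
    let d: Vec<f64> = gx.iter().zip(&gy).map(|(a, b)| a - b).collect();
    let k = (-gamma * d.iter().map(|v| v * v).sum::<f64>()).exp();
    let mut grad = vec![0.0; theta.len()];
    for r in 0..out_dim {
        for j in 0..in_dim {
            // x-term plus y-term of the closed form
            grad[r * in_dim + j] =
                k * (-2.0 * gamma) * d[r] * x[j] + k * (2.0 * gamma) * d[r] * y[j];
        }
        // Bias entries stay 0: the x-term and y-term cancel exactly.
    }
    grad
}
```

With a step such as h = 1e-5 the two gradients should agree to well below 1e-6 for moderate inputs. The bias entries vanish because a shared bias shifts g(x) and g(y) equally and so leaves their difference unchanged.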

Scope (v0.2.0 preview)

The analytical chain rule is implemented only for the MLPFeatureExtractor + RbfKernel pair, i.e. the paradigmatic DKL configuration. Other combinations must be gradient-checked via finite differences; autodiff integration is out of scope for this release.
Gradients w.r.t. base-kernel hyperparameters (e.g. the RBF γ) are not implemented here; the mixture side of the workspace (learned_composition) handles that use case.
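For combinations that need a finite-difference check, one common way to compare a candidate gradient against the numerical reference is a maximum elementwise relative error. The helper below is a minimal sketch of that comparison; max_relative_error and its 1e-12 denominator floor are assumptions made for this example, not part of the crate.

```rust
/// Largest elementwise relative error between a candidate gradient and a
/// reference gradient (for example, one produced by finite differences).
/// The denominator floor avoids division by zero when both entries are 0.
fn max_relative_error(candidate: &[f64], reference: &[f64]) -> f64 {
    assert_eq!(candidate.len(), reference.len(), "gradient lengths differ");
    candidate
        .iter()
        .zip(reference)
        .map(|(c, r)| (c - r).abs() / (c.abs() + r.abs()).max(1e-12))
        .fold(0.0, f64::max)
}
```

The acceptable threshold depends on the finite-difference step and on the scale of the kernel values, so it is best chosen per test rather than fixed globally.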

finite_difference_gradient: Numerical gradient ∂K_DKL/∂θ via central finite differences on the flat parameter buffer. Returns a vector of length kernel.feature_extractor().parameter_count().
rbf_dkl_gradient: Analytical gradient of K_DKL(x, y) w.r.t. the MLP parameters for the RBF-base case. Returns a vector of length kernel.feature_extractor().parameter_count() whose entries mirror the flat parameter layout layer0.weights(row-major) ++ layer0.biases ++ layer1.weights ++ ....
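To make the flat layout concrete, the sketch below flattens a stack of dense layers in the order layer0.weights (row-major) ++ layer0.biases ++ layer1.weights ++ ... and maps a weight position back to its index in the buffer. The Layer struct, flatten, and weight_index are hypothetical names for illustration, and the out_dim x in_dim weight shape is an assumption; the crate's own types may differ.

```rust
/// Hypothetical dense layer; weights[row][col] has shape out_dim x in_dim (assumed).
struct Layer {
    weights: Vec<Vec<f64>>,
    biases: Vec<f64>,
}

/// Concatenate every layer as row-major weights followed by biases.
fn flatten(layers: &[Layer]) -> Vec<f64> {
    let mut flat = Vec::new();
    for layer in layers {
        for row in &layer.weights {
            flat.extend_from_slice(row);
        }
        flat.extend_from_slice(&layer.biases);
    }
    flat
}

/// Index of weights[row][col] of layer `l` inside the flat buffer.
fn weight_index(layers: &[Layer], l: usize, row: usize, col: usize) -> usize {
    // Parameters contributed by all layers before `l`.
    let offset: usize = layers[..l]
        .iter()
        .map(|p| p.weights.iter().map(Vec::len).sum::<usize>() + p.biases.len())
        .sum();
    let in_dim = layers[l].weights[row].len();
    offset + row * in_dim + col
}
```

Under this layout, entry i of a returned gradient is the derivative with respect to entry i of the flattened buffer, so an index helper like weight_index can locate the gradient of a specific weight.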