Gradient helpers for DeepKernels.
Two paths are provided:
- finite_difference_gradient: central differences over the flat parameter buffer. Works for any base kernel and any feature extractor that implements NeuralFeatureMap; O(2P) forward passes in the number of parameters P, and used by the crate's own correctness tests as a reference.
- rbf_dkl_gradient: the analytical gradient for the RBF-base / MLP-extractor special case. Closed form:

      ∂K_DKL / ∂θ = K_DKL · (-2γ) · Σ_k (g(x) - g(y))_k · ∂g_k(x)/∂θ
                  + K_DKL · ( 2γ) · Σ_k (g(x) - g(y))_k · ∂g_k(y)/∂θ

  (the two sums come from ∂/∂θ || g(x) - g(y) ||²). The Jacobians ∂g_k(·)/∂θ are obtained by standard MLP backprop, reusing the per-layer pre/post-activation cache produced by MLPFeatureExtractor::forward_with_cache.
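To make the closed form concrete, here is a minimal sketch for the smallest possible case: a single tanh layer g(x) = tanh(W x + b) with an RBF base kernel, assumed to be parameterised as K = exp(-γ ||g(x) - g(y)||²). The functions `features`, `dkl_kernel`, and `dkl_gradient` are hypothetical stand-ins written for this illustration; they are not the crate's MLPFeatureExtractor or rbf_dkl_gradient, and they skip the activation cache entirely.

```rust
/// Hypothetical one-layer feature map g(x) = tanh(W x + b).
/// `theta` holds W (row-major, `out_dim` x `in_dim`) followed by b (`out_dim`).
fn features(theta: &[f64], in_dim: usize, out_dim: usize, x: &[f64]) -> Vec<f64> {
    let (w, b) = theta.split_at(in_dim * out_dim);
    (0..out_dim)
        .map(|k| {
            let row = &w[k * in_dim..(k + 1) * in_dim];
            let pre = row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + b[k];
            pre.tanh()
        })
        .collect()
}

/// K(x, y) = exp(-gamma * ||g(x) - g(y)||^2); this RBF form is an assumption.
fn dkl_kernel(theta: &[f64], in_dim: usize, out_dim: usize, gamma: f64, x: &[f64], y: &[f64]) -> f64 {
    let gx = features(theta, in_dim, out_dim, x);
    let gy = features(theta, in_dim, out_dim, y);
    let sq: f64 = gx.iter().zip(&gy).map(|(a, b)| (a - b).powi(2)).sum();
    (-gamma * sq).exp()
}

/// Closed-form gradient of `dkl_kernel` w.r.t. `theta`, same flat layout as `theta`.
fn dkl_gradient(theta: &[f64], in_dim: usize, out_dim: usize, gamma: f64, x: &[f64], y: &[f64]) -> Vec<f64> {
    let gx = features(theta, in_dim, out_dim, x);
    let gy = features(theta, in_dim, out_dim, y);
    let k = dkl_kernel(theta, in_dim, out_dim, gamma, x, y);
    let bias_offset = in_dim * out_dim;
    let mut grad = vec![0.0; theta.len()];
    for i in 0..out_dim {
        let d = gx[i] - gy[i];
        // tanh'(a) = 1 - tanh(a)^2, so the Jacobian of unit i only touches row i and bias i.
        let sx = k * (-2.0 * gamma) * d * (1.0 - gx[i] * gx[i]); // x-side term
        let sy = k * (2.0 * gamma) * d * (1.0 - gy[i] * gy[i]); // y-side term
        for j in 0..in_dim {
            grad[i * in_dim + j] = sx * x[j] + sy * y[j];
        }
        grad[bias_offset + i] = sx + sy;
    }
    grad
}
```

With more layers the per-unit factors `sx` and `sy` would be propagated backwards through each layer, which is where a cached forward pass saves recomputation.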
Scope (v0.2.0 preview)
- The analytical chain rule is implemented for the MLPFeatureExtractor + RbfKernel pair only, i.e. the paradigmatic DKL configuration. Other combinations must be gradient-checked via finite differences; autodiff integration is out of scope for this release.
- Gradients w.r.t. base-kernel hyperparameters (e.g. the RBF γ) are not implemented here; the mixture side of the workspace (learned_composition) handles that use case.
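For combinations outside the analytical path, a gradient check over the flat parameter buffer might look like the sketch below. It is written against a plain closure rather than the crate's types: `central_difference` and `max_rel_error` are hypothetical helper names, and the step size `h` is left to the caller.

```rust
/// Central-difference gradient of a scalar function `f` over a flat parameter buffer.
/// Costs two evaluations of `f` per parameter.
fn central_difference<F>(f: F, theta: &[f64], h: f64) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64,
{
    let mut buf = theta.to_vec();
    let mut grad = Vec::with_capacity(theta.len());
    for p in 0..theta.len() {
        let orig = buf[p];
        buf[p] = orig + h;
        let plus = f(&buf);
        buf[p] = orig - h;
        let minus = f(&buf);
        buf[p] = orig; // restore the parameter before moving to the next one
        grad.push((plus - minus) / (2.0 * h));
    }
    grad
}

/// Largest element-wise relative discrepancy between two gradients.
/// The 1e-12 floor avoids dividing by zero when both entries vanish.
fn max_rel_error(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(u, v)| (u - v).abs() / u.abs().max(v.abs()).max(1e-12))
        .fold(0.0_f64, f64::max)
}
```

In practice the closure would rebuild or re-parameterise the kernel from the perturbed buffer and evaluate it at a fixed pair (x, y); a candidate gradient is then accepted when `max_rel_error` falls below a tolerance chosen for the step size.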
Functions
- finite_difference_gradient: Numerical gradient ∂K_DKL/∂θ via central finite differences on the flat parameter buffer. Returns a vector of length kernel.feature_extractor().parameter_count().
- rbf_dkl_gradient: Analytical gradient of K_DKL(x, y) w.r.t. the MLP parameters for the RBF-base case. Returns a vector of length kernel.feature_extractor().parameter_count() whose entries mirror the flat parameter layout layer0.weights (row-major) ++ layer0.biases ++ layer1.weights ++ ...
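To read individual entries out of the returned vector, it helps to know where each layer starts in the flat buffer. The sketch below computes those offsets for the layout described above. `LayerShape`, `layer_offsets`, `flat_parameter_count`, and `weight_index` are hypothetical names, and the sketch assumes each weight matrix is stored as out_dim rows of in_dim entries; the crate's actual row/column convention should be checked before relying on it.

```rust
/// Hypothetical dense-layer shape; a stand-in for whatever the crate stores per layer.
struct LayerShape {
    in_dim: usize,
    out_dim: usize,
}

/// (weights_start, biases_start) for each layer, assuming the layout
/// layer0.weights (row-major) ++ layer0.biases ++ layer1.weights ++ ...
fn layer_offsets(shapes: &[LayerShape]) -> Vec<(usize, usize)> {
    let mut offsets = Vec::with_capacity(shapes.len());
    let mut cursor = 0;
    for s in shapes {
        let w_start = cursor;
        let b_start = w_start + s.in_dim * s.out_dim;
        offsets.push((w_start, b_start));
        cursor = b_start + s.out_dim;
    }
    offsets
}

/// Total length of the flat buffer (and therefore of the gradient vector).
fn flat_parameter_count(shapes: &[LayerShape]) -> usize {
    shapes.iter().map(|s| s.in_dim * s.out_dim + s.out_dim).sum()
}

/// Flat index of W[row][col] in `layer`, with row = output unit and col = input
/// (an assumed convention for this sketch).
fn weight_index(shapes: &[LayerShape], layer: usize, row: usize, col: usize) -> usize {
    layer_offsets(shapes)[layer].0 + row * shapes[layer].in_dim + col
}
```

For a 3 → 4 → 2 network under these assumptions, layer 0 occupies indices 0..16 (12 weights, then 4 biases) and layer 1 occupies 16..26, so the gradient has 26 entries.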