pub struct KernelPcaProjection { /* private fields */ }
Corpus-fitted projection via kernel PCA with a Gaussian (RBF) kernel.
§Mathematical background
Standard PCA finds the three directions of maximum linear variance. Kernel PCA instead maps the data implicitly into an infinite-dimensional feature space F and performs PCA there; thanks to the kernel trick, only inner products k(x, y) = ⟨Φ(x), Φ(y)⟩ are ever computed. With the Gaussian kernel k(x, y) = exp(−‖x−y‖²/(2σ²)), every mapped point Φ(x) lies on the unit hypersphere S in F, since ‖Φ(x)‖² = k(x, x) = 1 for all x. This is a natural fit for SphereQL’s spherical geometry.
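The unit-norm property can be checked numerically. The following is a minimal sketch, not the crate's implementation: `gaussian_kernel` and `feature_space_sq_dist` are hypothetical helper names, and plain `f64` slices stand in for the `Embedding` type.

```rust
/// Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
/// Hypothetical helper operating on plain slices.
fn gaussian_kernel(x: &[f64], y: &[f64], sigma: f64) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal dimension");
    let sq_dist: f64 = x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum();
    (-sq_dist / (2.0 * sigma * sigma)).exp()
}

/// Squared distance between the images of x and y in feature space:
/// ||Φ(x) - Φ(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y) = 2 - 2 k(x,y).
fn feature_space_sq_dist(x: &[f64], y: &[f64], sigma: f64) -> f64 {
    2.0 - 2.0 * gaussian_kernel(x, y, sigma)
}
```

Because k(x, x) = 1 regardless of σ, the feature-space distance between any two points is bounded by √2, which is the chord between two orthogonal points on the unit sphere.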
The key advantage over linear PCA: kernel PCA captures non-linear manifold structure (curved clusters, rings, spirals) that linear PCA flattens. For embedding spaces with complex semantic geometry, this can preserve more of the meaningful neighborhood relationships.
§Limit behaviour
- σ → ∞: kernel PCA converges to standard PCA (Hoffmann, Appendix A).
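- σ → 0: the kernel matrix approaches the identity, so all points become mutually orthogonal in F and no direction carries more variance than any other; PCA is meaningless.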
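Both limits can be seen directly from the kernel value at a fixed distance. The sketch below is illustrative only; `kernel_at_distance` and `rescaled_kernel_gap` are hypothetical names. For large σ, the first-order expansion k ≈ 1 − ‖x−y‖²/(2σ²) means that 2σ²·(1 − k) recovers the squared Euclidean distance, which is the intuition behind the convergence to standard PCA.

```rust
/// Gaussian kernel value as a function of the Euclidean distance `dist`.
fn kernel_at_distance(dist: f64, sigma: f64) -> f64 {
    (-(dist * dist) / (2.0 * sigma * sigma)).exp()
}

/// 2σ²·(1 − k), which tends to dist² as σ → ∞.
fn rescaled_kernel_gap(dist: f64, sigma: f64) -> f64 {
    let x = (dist * dist) / (2.0 * sigma * sigma);
    // 1 - exp(-x), computed via exp_m1 to avoid cancellation for tiny x.
    let one_minus_k = -(-x).exp_m1();
    2.0 * sigma * sigma * one_minus_k
}
```

With a very small σ the kernel value between distinct points underflows to zero (orthogonal images in F); with a very large σ every kernel value approaches 1 and only the rescaled gap carries information.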
§Complexity
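- Fitting: O(n²·d) to build the kernel matrix + O(n²·q·iters) for power iteration on the n×n centred kernel matrix, where n is the number of training points, d the embedding dimension, and q the number of components extracted.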
- Projection: O(n·d) per embedding (n kernel evaluations).
- Memory: O(n·d) for training data + O(n) per eigenvector.
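The fitting costs above can be read off a straightforward implementation. The following is a sketch under stated assumptions, not the crate's code: `kernel_matrix`, `center_in_place`, and `power_iteration` are hypothetical names, points are plain `Vec<f64>`, and only the leading eigenpair is computed (further components would need deflation or a similar technique).

```rust
/// Build the n×n Gaussian kernel matrix: n² entries, each costing O(d).
fn kernel_matrix(points: &[Vec<f64>], sigma: f64) -> Vec<Vec<f64>> {
    let n = points.len();
    let mut k = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in i..n {
            let sq: f64 = points[i]
                .iter()
                .zip(&points[j])
                .map(|(a, b)| (a - b) * (a - b))
                .sum();
            let v = (-sq / (2.0 * sigma * sigma)).exp();
            k[i][j] = v;
            k[j][i] = v; // the kernel matrix is symmetric
        }
    }
    k
}

/// Double-centre a symmetric kernel matrix in place:
/// K_c[i][j] = K[i][j] - mean_i - mean_j + total_mean.
fn center_in_place(k: &mut [Vec<f64>]) {
    let n = k.len();
    if n == 0 {
        return;
    }
    let row_means: Vec<f64> = k
        .iter()
        .map(|row| row.iter().sum::<f64>() / n as f64)
        .collect();
    let total_mean = row_means.iter().sum::<f64>() / n as f64;
    for i in 0..n {
        for j in 0..n {
            k[i][j] += total_mean - row_means[i] - row_means[j];
        }
    }
}

/// Leading eigenpair of a symmetric positive semi-definite matrix by
/// power iteration; each step is one O(n²) matrix-vector product.
fn power_iteration(m: &[Vec<f64>], iters: usize) -> (f64, Vec<f64>) {
    let n = m.len();
    // Non-constant start vector: a constant vector lies in the null space
    // of a centred matrix and would never converge.
    let mut v: Vec<f64> = (0..n).map(|i| 1.0 + i as f64).collect();
    let norm0 = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    v.iter_mut().for_each(|x| *x /= norm0);
    let mut lambda = 0.0;
    for _ in 0..iters {
        let w: Vec<f64> = m
            .iter()
            .map(|row| row.iter().zip(&v).map(|(a, b)| a * b).sum::<f64>())
            .collect();
        let norm = w.iter().map(|x| x * x).sum::<f64>().sqrt();
        if norm == 0.0 {
            break;
        }
        lambda = norm; // ||M v|| with unit v converges to the top eigenvalue
        v = w.iter().map(|x| x / norm).collect();
    }
    (lambda, v)
}
```

Projecting a new embedding follows the same pattern on a smaller scale: n kernel evaluations against the stored training points, each O(d), followed by a dot product with each stored eigenvector.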
§References
- Schölkopf, Smola, Müller. “Nonlinear component analysis as a kernel eigenvalue problem.” Neural Computation 10 (1998) 1299–1319.
- Hoffmann. “Kernel PCA for novelty detection.” Pattern Recognition 40 (2007) 863–874.
Implementations§
impl KernelPcaProjection
pub fn fit(embeddings: &[Embedding], radial: RadialStrategy) -> Self
Fit kernel PCA with automatic σ selection.
σ is set to the median pairwise Euclidean distance on the normalised embeddings divided by √2, so that the kernel value at the median distance is exp(−1) ≈ 0.37. This is a standard heuristic in the kernel methods literature.
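The heuristic is easy to verify: with σ = m/√2, where m is the median distance, 2σ² = m² and the kernel at distance m is exp(−m²/m²) = exp(−1). A minimal sketch follows, assuming the inputs are already normalised and using a hypothetical helper name `median_heuristic_sigma`; the crate's own selection code may differ in detail (for example by subsampling pairs).

```rust
/// Median pairwise Euclidean distance divided by √2.
/// Returns None when fewer than two points are supplied.
fn median_heuristic_sigma(points: &[Vec<f64>]) -> Option<f64> {
    let mut dists = Vec::new();
    for i in 0..points.len() {
        for j in (i + 1)..points.len() {
            let sq: f64 = points[i]
                .iter()
                .zip(&points[j])
                .map(|(a, b)| (a - b) * (a - b))
                .sum();
            dists.push(sq.sqrt());
        }
    }
    if dists.is_empty() {
        return None;
    }
    dists.sort_by(|a, b| a.total_cmp(b));
    let mid = dists.len() / 2;
    // Average the two middle values when the number of pairs is even.
    let median = if dists.len() % 2 == 0 {
        0.5 * (dists[mid - 1] + dists[mid])
    } else {
        dists[mid]
    };
    Some(median / std::f64::consts::SQRT_2)
}
```

Enumerating all pairs costs O(n²·d), the same order as building the kernel matrix, so this sketch does not change the overall fitting complexity.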
pub fn fit_with_sigma(embeddings: &[Embedding], sigma: f64, radial: RadialStrategy) -> Self
Fit kernel PCA with an explicit kernel width σ.
Use this when you have domain knowledge about the appropriate scale, or when benchmarking different σ values.
pub fn fit_default(embeddings: &[Embedding]) -> Self
Convenience: fit with default radial strategy and auto σ.
pub fn with_volumetric(self, enabled: bool) -> Self
Toggle volumetric mode: when enabled, the radial coordinate r comes from the kernel PCA projection magnitude instead of the embedding magnitude.
pub fn num_training_points(&self) -> usize
Number of training points stored (needed for kernel evaluations).
pub fn explained_variance_ratio(&self) -> f64
The fraction of total feature-space variance captured by the top-3 kernel principal components.
Analogous to PcaProjection::explained_variance_ratio() but in the (infinite-dimensional) Gaussian feature space.
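One common way to compute such a ratio is to divide the sum of the retained eigenvalues by the trace of the centred kernel matrix, which equals the sum of all its eigenvalues. The sketch below assumes that definition; `explained_ratio_sketch` is a hypothetical name, and the crate may compute the value differently.

```rust
/// Fraction of total feature-space variance captured by the top-3
/// eigenvalues, assuming total variance = trace of the centred kernel matrix.
fn explained_ratio_sketch(top: [f64; 3], centred_trace: f64) -> f64 {
    if centred_trace <= 0.0 {
        return 0.0; // degenerate input: no variance to explain
    }
    // Clamp to guard against small numerical overshoot.
    (top.iter().sum::<f64>() / centred_trace).clamp(0.0, 1.0)
}
```

For example, eigenvalues 3, 2, and 1 against a trace of 8 give a ratio of 0.75.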
pub fn eigenvalues(&self) -> [f64; 3]
The top-3 eigenvalues of the centred kernel matrix.
Trait Implementations§
impl Clone for KernelPcaProjection
fn clone(&self) -> KernelPcaProjection
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.