Gaussian Process Regression with Compactly Supported Radial Kernel
csrk is a Rust crate for large-scale Gaussian Process regression using compactly supported Wendland kernels, spatial hashing, and sparse LDL^T factorization. It enables training on tens of thousands of points and fast sampling and evaluation of GP realizations with near-constant per-query cost.
It is a CPU-based implementation of the Wendland kernels (piecewise polynomial kernels with compact support), using the sprs crate for sparse matrix operations and sprs-ldl for the sparse Cholesky decomposition.
Example
```rust
// Note: argument lists are elided here; see the crate documentation for details.

// Build the Gaussian Process
let mut gp = GP::build(/* training points, values, kernel parameters */);
// Train the Gaussian Process
gp.train();
// Estimate mean (expected value) for some sample locations
let mean = gp.predict_mean(/* sample locations */);
// Get a random number generator
let mut rng = PCG64Stream::from_OsRng(/* ... */);
// Draw a realization
let r = gp.draw_realization(&mut rng);
// Estimate the value of the realization
let v = r.value(/* sample locations */);
```
Features
- compactly supported Wendland kernels
- sparse kernel construction via spatial hashing
- scalable sparse LDL^T training
- fast reproducible GP realizations
- pure Rust, no unsafe
Modules
- csrk: The library module for the compactly supported radial kernel Gaussian Process
- csrk::prng: A PCG64Stream implementation for generating random samples
- csrk::spatial_hash: A SpatialHash struct for sparse kernel operations
- csrk::realization: A struct for a realization from the Gaussian Process
Performance
Performance table from integration test ./csrk/tests/test_performance.rs
| (nx, ny) | ntrain | nnz | nnz/ntrain | build_s | train_s | mean_s | draw_s | eval_s | evals_per_sec |
|---|---|---|---|---|---|---|---|---|---|
| (30, 30) | 900 | 56260 | 62.5111 | 0.0023 | 0.0070 | 0.0014 | 0.0005 | 0.0057 | 7.909e5 |
| (50, 50) | 2500 | 166812 | 66.7248 | 0.0041 | 0.0284 | 0.0024 | 0.0012 | 0.0115 | 1.090e6 |
| (75, 75) | 5625 | 385737 | 68.5755 | 0.0063 | 0.1124 | 0.0052 | 0.0040 | 0.0256 | 1.098e6 |
| (100, 100) | 10000 | 691968 | 69.1968 | 0.0112 | 0.3430 | 0.0092 | 0.0093 | 0.0470 | 1.063e6 |
| (125, 125) | 15625 | 1089013 | 69.6968 | 0.0188 | 0.7938 | 0.0145 | 0.0185 | 0.0733 | 1.066e6 |
| (150, 150) | 22500 | 1577508 | 70.1115 | 0.0286 | 1.6002 | 0.0212 | 0.0318 | 0.1051 | 1.070e6 |
| (200, 200) | 40000 | 2831216 | 70.7804 | 0.0470 | 4.9282 | 0.0378 | 0.0780 | 0.1916 | 1.044e6 |
This test demonstrates that, when the average number of neighbors is held approximately constant, building and training the Gaussian Process scale well on a two-dimensional grid.
Specifically:
- The build phase appears to scale as O(ntrain)
- Training the kernel scales better than O(ntrain^2) for sufficiently sparse training data
- Evaluating the mean appears to scale as O(ntrain)
Algorithm overview
Training
- Training points are indexed using a spatial hash.
- Compactly supported Wendland kernels are used to construct a sparse kernel matrix.
- The kernel is factorized using a sparse LDL^T decomposition (sprs-ldl).
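The sparsity of the kernel matrix comes from the kernel vanishing identically beyond a finite support radius. As a minimal sketch (not csrk's actual API), the Wendland C2 kernel can be written as:

```rust
/// Wendland C2 kernel: phi(r) = (1 - r)^4 * (4r + 1) for r < 1, else 0,
/// where r = distance / support_radius. Positive definite in up to
/// three dimensions.
fn wendland_c2(distance: f64, support_radius: f64) -> f64 {
    let r = distance / support_radius;
    if r >= 1.0 {
        // Compact support: pairs farther apart than the support radius
        // contribute exactly zero, so they never enter the sparse kernel matrix.
        0.0
    } else {
        let t = 1.0 - r;
        t * t * t * t * (4.0 * r + 1.0)
    }
}

fn main() {
    println!("{}", wendland_c2(0.0, 1.0)); // 1 at zero separation
    println!("{}", wendland_c2(0.5, 1.0)); // smooth decay inside the support
    println!("{}", wendland_c2(1.5, 1.0)); // exactly 0 outside the support
}
```

Because entries are exactly zero outside the support radius, only the pairs found by the spatial hash need to be evaluated, which is what keeps nnz/ntrain roughly constant in the performance table above.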
Prediction
- Mean prediction uses only nearby neighbors via the spatial hash.
- No dense kernel vectors or matrices are constructed.
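Neighbor-limited mean prediction can be sketched as follows. This is a hypothetical layout, not csrk's actual SpatialHash: it assumes a grid cell size equal to the support radius h, so a query only inspects its own cell and the eight adjacent cells, and it assumes precomputed weights alpha = K^{-1} y from the sparse LDL^T solve.

```rust
use std::collections::HashMap;

/// Grid cell of a 2D point for cell side h (assumed equal to the support radius).
fn cell(p: [f64; 2], h: f64) -> (i64, i64) {
    ((p[0] / h).floor() as i64, (p[1] / h).floor() as i64)
}

/// Wendland C2 kernel with support radius h.
fn wendland_c2(d: f64, h: f64) -> f64 {
    let r = d / h;
    if r >= 1.0 { 0.0 } else { let t = 1.0 - r; t.powi(4) * (4.0 * r + 1.0) }
}

/// mean(x) = sum over neighbors i of k(x, x_i) * alpha_i, visiting only the
/// 3x3 block of cells around x; distant points are never touched.
fn predict_mean(
    x: [f64; 2],
    hash: &HashMap<(i64, i64), Vec<usize>>,
    points: &[[f64; 2]],
    alpha: &[f64],
    h: f64,
) -> f64 {
    let (cx, cy) = cell(x, h);
    let mut mean = 0.0;
    for dx in -1..=1 {
        for dy in -1..=1 {
            if let Some(idx) = hash.get(&(cx + dx, cy + dy)) {
                for &i in idx {
                    let d = ((x[0] - points[i][0]).powi(2)
                        + (x[1] - points[i][1]).powi(2)).sqrt();
                    mean += wendland_c2(d, h) * alpha[i];
                }
            }
        }
    }
    mean
}

fn main() {
    let h = 1.0;
    let points = vec![[0.0_f64, 0.0], [0.2, 0.0], [5.0, 5.0]];
    let alpha = vec![1.0, 1.0, 100.0];
    let mut hash: HashMap<(i64, i64), Vec<usize>> = HashMap::new();
    for (i, &p) in points.iter().enumerate() {
        hash.entry(cell(p, h)).or_default().push(i);
    }
    // The distant point at (5, 5) lies outside the support and is never visited.
    println!("{}", predict_mean([0.1, 0.0], &hash, &points, &alpha, h));
}
```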
Sampling
- GP realizations are generated using a whitened representation: $f = L \sqrt{D} z$, with $z \sim N(0, I)$.
- Each realization stores $K^{-1} f$, enabling fast evaluation at arbitrary points.
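The whitened representation can be checked on a tiny dense example. This is illustrative only: csrk stores L and D as sparse factors from sprs-ldl, and the 2x2 kernel here is made up for the check.

```rust
/// f = L * sqrt(D) * z for a 2x2 lower-triangular L and diagonal D.
/// With K = L D L^T and z ~ N(0, I), the covariance of f is
/// E[f f^T] = L sqrt(D) sqrt(D) L^T = K, i.e. f ~ N(0, K).
fn whiten(l: [[f64; 2]; 2], d: [f64; 2], z: [f64; 2]) -> [f64; 2] {
    let mut f = [0.0; 2];
    for i in 0..2 {
        for j in 0..2 {
            f[i] += l[i][j] * d[j].sqrt() * z[j];
        }
    }
    f
}

fn main() {
    // LDL^T factors of the 2x2 kernel K = [[1.0, 0.5], [0.5, 1.0]]:
    // L = [[1, 0], [0.5, 1]], D = diag(1.0, 0.75).
    let l = [[1.0, 0.0], [0.5, 1.0]];
    let d = [1.0, 0.75];
    // A fixed z in place of a standard-normal draw:
    let f = whiten(l, d, [1.0, -1.0]);
    println!("{:?}", f);
}
```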
Motivation
The Wendland kernels can be used for training and evaluating GPR interpolators in O(n * m), for n evaluation points and m nearest neighbors, since each evaluation only involves neighbors inside the kernel's support. The compact support also sparsifies the kernel matrix, which carries over to its Cholesky decomposition.
When done properly, this process conserves both compute time and computer memory, as large (n x n) arrays are never allocated.
Previously, I had developed a Python library for doing this: (gaussian-process-api). However, despite the use of sparse matrices in the Cholesky decomposition (using scikit-sparse), and despite the C extension backend for the kernel evaluations, that library still constructs dense intermediate arrays which may challenge memory resources.
By writing a new module in Rust, I would like to remove the dependency on scikit-sparse and avoid storing large dense matrices in memory at any point throughout the computation, allowing for a lightweight and fast Gaussian Process regression implementation.
Contributing
I am open to suggestions and pull requests.
Acknowledgements
My background in Gaussian Processes and the Wendland kernels comes from Rasmussen and Williams (2005). I thank the authors for publishing openly without charge.
My own work implementing sparse Gaussian Processes for estimating the signal-to-noise ratio of binary black hole mergers detectable with a single LIGO detector is briefly summarized in Appendix A of Delfavero et al. (2023).
I would like to acknowledge the work of Esmaeilbeigi et al. (2025) for articulating, in the vocabulary of higher mathematics, the advantages and limitations of the Wendland kernels that I had stumbled through in practical implementations.
I would also like to thank Nicolas Posner, who accompanied my introduction to Rust, and whose blog and contributions to other crates encouraged me to learn the language.
I would also like to thank Nick Fotopoulos for a thorough and constructive code review!