# Friedrich: Gaussian Process Regression
This library implements Gaussian Process Regression, also known as Kriging, in Rust. Our goal is to provide a solid and well-featured building block for other algorithms (such as Bayesian Optimization).
Gaussian processes can both extract a lot of information from their training data and return a prediction together with an uncertainty estimate on that prediction. Furthermore, they can handle non-linear phenomena, take uncertainty on the inputs into account, and encode a prior on the output.

All of those properties make Gaussian process regression an algorithm of choice when data is scarce or when uncertainty estimates on the output are desirable.

However, the `O(n^3)` complexity of the algorithm makes classic implementations unsuitable for large datasets.
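The cubic cost comes from factoring the n-by-n kernel (covariance) matrix, typically via a Cholesky decomposition. As a standalone illustration (not part of friedrich's API), here is a naive Cholesky factorization whose triple nested loop makes the `O(n^3)` cost visible:

```rust
// Illustrative only: factors a symmetric positive-definite matrix A into
// L * L^T with L lower triangular. The three nested loops over n rows,
// columns and inner products are what gives GP training its O(n^3) cost.
fn cholesky(a: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = a.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut sum = a[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                if sum <= 0.0 {
                    return None; // matrix is not positive definite
                }
                l[i][j] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    Some(l)
}

fn main() {
    // [[4, 2], [2, 3]] factors into [[2, 0], [1, sqrt(2)]]
    let a = vec![vec![4.0, 2.0], vec![2.0, 3.0]];
    let l = cholesky(&a).unwrap();
    println!("{:?}", l);
}
```

This is also why badly conditioned kernel matrices are a practical concern: when the `sum <= 0.0` branch is hit due to rounding, the factorization fails, which motivates the `cholesky_epsilon` option described below.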
## Functionalities
This implementation lets you:
- define a Gaussian process with default parameters or using the builder pattern
- train it on multidimensional data
- fit the parameters (kernel, prior and noise) on the training data
- introduce an optional `cholesky_epsilon` to make the Cholesky decomposition infallible in case of badly conditioned problems
- add additional samples efficiently (`O(n^2)`) and refit the process
- predict the mean, variance and covariance matrix for given inputs
- sample the distribution at a given position
- save and load a trained model with serde
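For readers wondering where the predicted mean comes from: it follows the standard GP posterior formula, mean(x*) = k*^T (K + sigma^2 I)^-1 y, where K is the kernel matrix over the training points and k* the kernel between x* and each training point. The sketch below is standalone and not friedrich's API; the RBF kernel, the function names and the toy values are all illustrative:

```rust
// Standalone illustration of the GP posterior mean with an RBF kernel.
// None of these names are part of friedrich's API.
fn rbf(a: f64, b: f64, len: f64) -> f64 {
    (-(a - b).powi(2) / (2.0 * len * len)).exp()
}

// Solves A x = b by Gaussian elimination with partial pivoting
// (a real implementation would reuse a Cholesky factorization instead).
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Vec<f64> {
    let n = b.len();
    for col in 0..n {
        let piv = (col..n)
            .max_by(|&i, &j| a[i][col].abs().partial_cmp(&a[j][col].abs()).unwrap())
            .unwrap();
        a.swap(col, piv);
        b.swap(col, piv);
        for row in col + 1..n {
            let f = a[row][col] / a[col][col];
            for k in col..n {
                a[row][k] -= f * a[col][k];
            }
            b[row] -= f * b[col];
        }
    }
    let mut x = vec![0.0; n];
    for row in (0..n).rev() {
        let mut s = b[row];
        for k in row + 1..n {
            s -= a[row][k] * x[k];
        }
        x[row] = s / a[row][row];
    }
    x
}

// mean(x*) = k*^T (K + sigma^2 I)^-1 y for one-dimensional inputs
fn posterior_mean(xs: &[f64], ys: &[f64], x_star: f64, len: f64, noise: f64) -> f64 {
    let n = xs.len();
    let mut k = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..n {
            k[i][j] = rbf(xs[i], xs[j], len);
        }
        k[i][i] += noise * noise; // observation noise on the diagonal
    }
    let alpha = solve(k, ys.to_vec()); // alpha = (K + sigma^2 I)^-1 y
    (0..n).map(|i| rbf(x_star, xs[i], len) * alpha[i]).sum()
}

fn main() {
    let xs = vec![0.0, 1.0, 2.0];
    let ys = vec![1.0, 2.0, 0.5];
    // with negligible noise, the posterior mean interpolates the training
    // data, so the prediction at x* = 1.0 is close to 2.0
    println!("mean at x* = 1.0: {}", posterior_mean(&xs, &ys, 1.0, 1.0, 1e-6));
}
```

The predicted variance comes from a similar formula on the same factored matrix, which is why mean and variance queries share most of their cost.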
(See the todo.md file to get up-to-date information on current developments.)
## Code sample
```rust
use friedrich::gaussian_process::GaussianProcess;
use rand::thread_rng;

// trains a Gaussian process on a dataset of one-dimensional vectors
let training_inputs = vec![vec![0.8], vec![1.2], vec![3.8], vec![4.2]];
let training_outputs = vec![3.0, 4.0, -2.0, -2.0];
let gp = GaussianProcess::default(training_inputs, training_outputs);

// predicts the mean and variance of a single point
let input = vec![1.0];
let mean = gp.predict(&input);
let var = gp.predict_variance(&input);
println!("mean: {}, variance: {}", mean, var);

// makes several predictions
let inputs = vec![vec![1.0], vec![2.0], vec![3.0]];
let outputs = gp.predict(&inputs);
println!("predictions: {:?}", outputs);

// samples from the distribution at some new positions
let new_inputs = vec![vec![1.0], vec![2.0]];
let sampler = gp.sample_at(&new_inputs);
let mut rng = thread_rng();
println!("samples: {:?}", sampler.sample(&mut rng));
```
## Inputs
Most methods of this library can currently work with the following input -> output pairs:

- `Vec<f64> -> f64`: a single multidimensional sample
- `Vec<Vec<f64>> -> Vec<f64>`: each inner vector is a training sample
- `DMatrix<f64> -> DVector<f64>`: a nalgebra matrix with one row per sample
- `ArrayBase<f64, Ix1> -> f64`: a single sample stored in a ndarray array (requires the `friedrich_ndarray` feature)
- `ArrayBase<f64, Ix2> -> Array1<f64>`: each row is a sample (requires the `friedrich_ndarray` feature)
The `Input` trait is provided so that you can add support for your own input/output pairs.
## Why call it Friedrich?
Gaussian processes are named after the Gaussian distribution, which is itself named after Carl Friedrich Gauss.