Friedrich: Gaussian Process Regression
This library implements Gaussian Process Regression, also known as Kriging, in Rust. Our goal is to provide a building block for other algorithms (such as Bayesian optimization).
Gaussian processes can both extract a lot of information from their training data and return a prediction together with an uncertainty on that prediction. Furthermore, they can handle non-linear phenomena, take uncertainty on the inputs into account, and encode a prior on the output.
All of these properties make them an algorithm of choice for regression when data is scarce or when uncertainty bars on the output are desirable.
However, the O(n³) complexity of the algorithm makes the classical implementation unsuitable for large datasets.
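To see where the O(n³) cost comes from, here is a minimal, self-contained sketch of exact Gaussian process regression in plain Rust (an illustration only, not friedrich's actual code): training requires a Cholesky factorization of the n × n kernel matrix, which dominates the runtime.

```rust
// Squared-exponential (RBF) kernel between two 1D points.
fn rbf(a: f64, b: f64, len: f64) -> f64 {
    (-(a - b).powi(2) / (2.0 * len * len)).exp()
}

// Cholesky factorization: returns lower-triangular L with L * L^T = A.
// This step is the O(n^3) bottleneck of exact GP training.
fn cholesky(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = a.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            l[i][j] = if i == j {
                (a[i][i] - s).sqrt()
            } else {
                (a[i][j] - s) / l[j][j]
            };
        }
    }
    l
}

// Solves L * L^T x = b by forward then backward substitution (O(n^2)).
fn chol_solve(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut y = vec![0.0; n];
    for i in 0..n {
        let s: f64 = (0..i).map(|k| l[i][k] * y[k]).sum();
        y[i] = (b[i] - s) / l[i][i];
    }
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|k| l[k][i] * x[k]).sum();
        x[i] = (y[i] - s) / l[i][i];
    }
    x
}

fn main() {
    let xs = [0.8, 1.2, 3.8, 4.2];
    let ys = [3.0, 4.0, -2.0, -2.0];
    let (len, noise) = (1.0, 1e-6);

    // Build K + noise * I, then factor it.
    let k: Vec<Vec<f64>> = xs
        .iter()
        .enumerate()
        .map(|(i, &a)| {
            xs.iter()
                .enumerate()
                .map(|(j, &b)| rbf(a, b, len) + if i == j { noise } else { 0.0 })
                .collect()
        })
        .collect();
    let l = cholesky(&k); // O(n^3): dominant cost
    let alpha = chol_solve(&l, &ys); // O(n^2)

    // Posterior mean at a query point x*: k(x*, X) . alpha,
    // which lands close to the nearby training outputs.
    let x_star = 1.0;
    let mean: f64 = xs
        .iter()
        .zip(&alpha)
        .map(|(&x, &a)| rbf(x_star, x, len) * a)
        .sum();
    println!("mean at {}: {}", x_star, mean);
}
```

Approximate methods (inducing points, low-rank factorizations) avoid this cubic cost, which is one direction a future version could explore.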
WARNING: This crate is still in an alpha state, its interface might evolve.
Functionalities
This implementation lets you:
- define a Gaussian process with default parameters or using the builder pattern
- train it on multidimensional data
- fit the parameters (kernel and prior) on the training data
- add additional samples and refit the process
- predict the mean, variance, and covariance matrix for given inputs
- sample the distribution at a given position
Code sample
```rust
use friedrich::gaussian_process::GaussianProcess;

// trains a gaussian process on a dataset of one-dimensional vectors
let training_inputs = vec![vec![0.8], vec![1.2], vec![3.8], vec![4.2]];
let training_outputs = vec![3.0, 4.0, -2.0, -2.0];
let gp = GaussianProcess::default(training_inputs, training_outputs);

// predicts the mean and variance of a single point
let input = vec![1.0];
let mean = gp.predict(&input);
let var = gp.predict_variance(&input);
println!("prediction: {} ± {}", mean, var.sqrt());

// makes several predictions
let inputs = vec![vec![1.0], vec![2.0], vec![3.0]];
let outputs = gp.predict(&inputs);
println!("predictions: {:?}", outputs);

// samples from the distribution at the given positions
let new_inputs = vec![vec![1.0], vec![2.0]];
let sampler = gp.sample_at(&new_inputs);
let mut rng = rand::thread_rng();
println!("samples: {:?}", sampler.sample(&mut rng));
```
Inputs
Most methods of this library currently work with the following input -> output pairs:

- `Vec<Vec<f64>> -> Vec<f64>`: each inner vector is a multidimensional training sample
- `Vec<f64> -> f64`: a single multidimensional sample
- `DMatrix<f64> -> DVector<f64>`: a nalgebra matrix with one row per sample
A trait is provided to add your own pairs.
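friedrich's actual conversion trait is not reproduced here; as a self-contained illustration of the pattern such a trait follows, a hypothetical version might normalize every supported input type into one common row-major representation (all names below are illustrative, not the crate's API):

```rust
/// Hypothetical trait: converts user-facing input types into rows of f64 samples.
trait IntoRows {
    fn into_rows(self) -> Vec<Vec<f64>>;
}

/// A batch of multidimensional samples is already in row form.
impl IntoRows for Vec<Vec<f64>> {
    fn into_rows(self) -> Vec<Vec<f64>> {
        self
    }
}

/// A single multidimensional sample becomes a one-row batch.
impl IntoRows for Vec<f64> {
    fn into_rows(self) -> Vec<Vec<f64>> {
        vec![self]
    }
}

/// Any algorithm written against the trait accepts every supported input type.
fn count_samples<T: IntoRows>(input: T) -> usize {
    input.into_rows().len()
}

fn main() {
    println!("{}", count_samples(vec![1.0, 2.0])); // single sample -> 1 row
    println!("{}", count_samples(vec![vec![1.0], vec![2.0]])); // batch -> 2 rows
}
```

Implementing the crate's trait for your own type follows the same shape: describe how your data maps onto rows of samples, and the rest of the API works unchanged.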
Potential future developments
The list of things that could be done to improve on the current implementation includes:
- Add better algorithms to fit kernel parameters (cross-validation or gradient descent on the likelihood).
- Improve efficiency of the linear algebra operations used.
- Add a function to predict both mean and variance at once (factoring out some code for improved performance).
- Add ndarray support behind a feature flag.
- Add simple kernel regression (not as clever but much faster).
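For context on that last point, simple kernel regression (Nadaraya-Watson) replaces the full GP machinery with a kernel-weighted average of the training outputs: O(n) per query, no matrix factorization, but no uncertainty estimate. A self-contained sketch (not part of the crate):

```rust
// Squared-exponential kernel used as the smoothing weight.
fn rbf(a: f64, b: f64, bandwidth: f64) -> f64 {
    (-(a - b).powi(2) / (2.0 * bandwidth * bandwidth)).exp()
}

// Nadaraya-Watson estimator: prediction is the kernel-weighted
// average of the training outputs. O(n) per query point.
fn kernel_regression(xs: &[f64], ys: &[f64], x_star: f64, bandwidth: f64) -> f64 {
    let weights: Vec<f64> = xs.iter().map(|&x| rbf(x, x_star, bandwidth)).collect();
    let total: f64 = weights.iter().sum();
    weights.iter().zip(ys).map(|(w, y)| w * y).sum::<f64>() / total
}

fn main() {
    let xs = [0.8, 1.2, 3.8, 4.2];
    let ys = [3.0, 4.0, -2.0, -2.0];
    // The query at 1.0 averages the two nearby outputs (3.0 and 4.0);
    // the distant points contribute negligible weight.
    println!("{}", kernel_regression(&xs, &ys, 1.0, 0.5));
}
```

This trades away the GP's uncertainty estimates and prior for speed, which is why the README describes it as "not as clever but much faster".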
Do not hesitate to send a pull request or ask for features.