§Introduction
varpro is a crate for least squares fitting of nonlinear models to observations. It works for so-called separable models, which are a large class of models. It’s fast, flexible, well-tested, and easy to use.
§Multiple Right Hand Sides
Since version 0.8.0, this library supports global fitting with multiple right hand sides. This is a powerful technique for suitable problems and is explained at the end of this introduction.
§Overview
Many nonlinear models consist of a mixture of both truly nonlinear and linear model parameters. These are called separable models and can be written as a linear combination of $N_{basis}$ nonlinear model basis functions. Think, e.g., of sums of exponential decays, which make for a notoriously ill-conditioned fitting problem.
The purpose of this crate is to provide least squares fitting of nonlinear separable models to observations. We also aim to provide excellent usability, because all too often it is hard to even formulate a fitting problem in code. Thus, this library provides an API that lets you formulate a separable model in a few lines of code using the SeparableModelBuilder. This is typically already many times faster than using nonlinear least squares solvers directly. That’s because this crate takes advantage of the Variable Projection (VarPro) algorithm for fitting, which makes use of the separable structure of the model.
To shave off the last couple hundred microseconds, you can implement the SeparableNonlinearModel trait directly to describe your model function.
The test and benchmark suite of this crate should give you a good idea of how fast the fitting can be, and how to e.g. take advantage of caching intermediate results.
§The Fitting Problem
Consider a data vector of observations $\vec{y}= (y_1,\dots,y_{N_{data}})^T$. Our goal is to fit a nonlinear, separable model $f$ to the data.
Because the model is separable, we can write it as
$$\vec{f}(\vec{\alpha},\vec{c}) = \sum_{j=1}^{N_{basis}} c_j \vec{f}_j(S_j(\vec{\alpha}))$$
Let’s look at the components of this equation in more detail. The vector valued function $\vec{f}(\vec{\alpha},\vec{c}) \in \mathbb{C}^{N_{data}}$ is the model we want to fit. It depends on two vector valued parameters:
- $\vec{\alpha}=(\alpha_1,\dots,\alpha_{N_{params}})^T$ is the vector of nonlinear model parameters. We will get back to these later.
- $\vec{c}=(c_1,\dots,c_{N_{basis}})^T$ is the vector of coefficients for the basis functions. Those are the linear model parameters.
Note that we call $\vec{\alpha}$ the nonlinear parameters and $\vec{c}$ the coefficients just to make the distinction between the two types of parameters clear: the coefficients are the linear parameters of the model, while $\vec{\alpha}$ are the nonlinear ones. Of course the model itself is parametrized on both $\vec{\alpha}$ and $\vec{c}$.
Let’s look at the individual basis functions $\vec{f}_j$ in more detail, and at why the basis functions depend on this weird $S_j(\vec{\alpha})$.
§Model Parameters and Basis Functions
The model itself is given as a linear combination of nonlinear basis functions $\vec{f}_j$ with linear coefficients $c_j$. The basis functions themselves only depend on a subset $S_j(\vec{\alpha})$ of the nonlinear model parameters $\vec{\alpha}$.
Each basis function can depend on a different subset, but there is no restriction on which
parameters a function can depend on. Arbitrary functions might share none of, some of,
or all of the nonlinear parameters. It’s also fine for functions to depend on
parameters that are exclusive to them.
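For illustration: with $\vec{\alpha}=(\alpha_1,\alpha_2,\alpha_3)^T$, a perfectly valid choice of subsets would be
$$S_1(\vec{\alpha}) = (\alpha_1, \alpha_2), \qquad S_2(\vec{\alpha}) = (\alpha_2, \alpha_3), \qquad S_3(\vec{\alpha}) = (\alpha_3),$$
so that $\vec{f}_1$ and $\vec{f}_2$ share $\alpha_2$, $\vec{f}_2$ and $\vec{f}_3$ share $\alpha_3$, and $\alpha_1$ is exclusive to $\vec{f}_1$.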
§What VarPro Computes
This crate finds the parameters $\vec{\alpha}$ and $\vec{c}$ that fit the model $\vec{f}(\vec{\alpha},\vec{c})$ to the observations $\vec{y}$ using a least squares metric. Formally, varpro finds
$$\arg\min_{\vec{\alpha},\vec{c}} \|\mathbf{W}(\vec{y}-\vec{f}(\vec{\alpha},\vec{c}))\|_2^2,$$
where $\mathbf{W}$ is a weight matrix. Set it to the identity matrix for unweighted least squares.
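For example, if the observations carry known standard deviations $\sigma_k$, a common choice is the diagonal weight matrix with entries $W_{kk} = 1/\sigma_k$, which turns the objective above into the familiar chi-squared sum.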
§What Makes VarPro Special
The Variable Projection algorithm takes advantage of the fact that the model is a mix of linear and nonlinear parts. VarPro uses a clever algorithmic trick to cast the minimization into a problem that only depends on the nonlinear model parameters $\vec{\alpha}$. It then lets a nonlinear optimization backend handle this reduced problem.
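In a nutshell, following the standard VarPro derivation: for fixed $\vec{\alpha}$ the model is linear in $\vec{c}$, so collecting the basis functions as the columns of a matrix $\mathbf{\Phi}(\vec{\alpha})$ gives the optimal coefficients in closed form and leaves a reduced problem in $\vec{\alpha}$ alone:
$$\hat{\vec{c}}(\vec{\alpha}) = \mathbf{\Phi}(\vec{\alpha})^\dagger \vec{y}, \qquad \arg\min_{\vec{\alpha}} \|\vec{y} - \mathbf{\Phi}(\vec{\alpha})\,\mathbf{\Phi}(\vec{\alpha})^\dagger\,\vec{y}\|_2^2,$$
where $\dagger$ denotes the Moore-Penrose pseudoinverse (weights, if present, are absorbed into $\mathbf{\Phi}$ and $\vec{y}$).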
Currently, this crate uses the levenberg_marquardt crate as its optimization backend. Other optimization backends are planned. The VarPro algorithm implemented here follows (O’Leary2013), but uses the Kaufman approximation to calculate the Jacobian.
§Usage and Examples
Using this crate is a step-by-step process:
- Describe the separable model.
- Describe the fitting problem.
- Fit the model to the problem using a solver.
The first step in using this crate is to formulate the separable model. This is done either by creating a type that implements the SeparableNonlinearModel trait or by using the SeparableModelBuilder to create a model in a few lines of code. See the examples for the trait and the builder to learn how to use either one to generate a separable nonlinear model.
The builder is great for getting started quickly and is already much faster than using a nonlinear least squares solver directly. For maximum performance, you can look into implementing the trait manually.
let model = /* generate model here */
The second step is to describe the fitting problem by creating a SeparableProblem using the SeparableProblemBuilder. To build the fitting problem, we need to provide the model and the observations. The initial guess for the nonlinear parameters $\vec{\alpha}$ is a property of the model. We can additionally configure some other properties to influence the fit.
use varpro::problem::*;
let problem = SeparableProblemBuilder::new(model)
    .observations(y)
    .build()
    .unwrap();
The third step is to use a solver to fit the model to the problem. Currently, the only available solver is the Levenberg-Marquardt algorithm from the levenberg_marquardt crate. It is provided via the LevMarSolver type, which allows us to make additional configurations to the solver.
use varpro::solvers::levmar::LevMarSolver;
let fit_result = LevMarSolver::default().fit(problem).unwrap();
If successful, retrieve the nonlinear parameters $\vec{\alpha}$ using FitResult::nonlinear_params and the linear coefficients $\vec{c}$ using FitResult::linear_coefficients.
Fit Statistics: To get additional statistical information after the fit has finished, use the LevMarSolver::fit_with_statistics method.
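A minimal sketch, assuming fit_with_statistics returns the fit result together with a FitStatistics value as a tuple (see the statistics module for the exact types):
use varpro::solvers::levmar::LevMarSolver;
// fit the problem and additionally compute statistical
// information, such as uncertainties of the fit parameters
let (fit_result, fit_statistics) = LevMarSolver::default()
    .fit_with_statistics(problem)
    .unwrap();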
let alpha = fit_result.nonlinear_parameters();
let coeff = fit_result.linear_coefficients().unwrap();
If the minimization was successful, the nonlinear parameters $\vec{\alpha}$ are now stored in the variable alpha and the linear coefficients $\vec{c}$ are stored in coeff.
§Example: Double Exponential Fitting
§Preliminaries
The following example demonstrates how to fit a double exponential decay model with constant offset to observations $\vec{y}$ obtained at grid points $\vec{x}$. The model itself is a vector valued function with the same number of elements as $\vec{x}$ and $\vec{y}$. The component at index $k$ is given by
$$(\vec{f}(\vec{x},\vec{\alpha},\vec{c}))_k= c_1 \exp\left(-x_k/\tau_1\right)+c_2 \exp\left(-x_k/\tau_2\right)+c_3,$$
which is just a fancy way of saying that the exponential functions are applied element-wise to the vector $\vec{x}$.
We can see that the model depends on the nonlinear parameters $\vec{\alpha}=(\tau_1,\tau_2)^T$ and the linear coefficients $\vec{c}=(c_1,c_2,c_3)^T$. Both exponential functions can be modeled as an exponential decay with signature $\vec{f}_j(\vec{x},\tau)$. In varpro, the basis functions must take $\vec{x}$ by reference and a number of parameters as further scalar arguments. So the decay is implemented as:
use nalgebra::DVector;
fn exp_decay(x: &DVector<f64>, tau: f64) -> DVector<f64> {
    x.map(|x| (-x / tau).exp())
}
For our separable model we also need the partial derivatives of the basis functions with respect to all the parameters that each basis function depends on. Thus, in this case we have to provide $\partial/\partial\tau\,\vec{f}_j(\vec{x},\tau)$.
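For the exponential decay this derivative is easy to work out:
$$\frac{\partial}{\partial\tau}\,e^{-x/\tau} = \frac{x}{\tau^2}\,e^{-x/\tau},$$
which is what the function below computes element-wise.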
use nalgebra::DVector;
fn exp_decay_dtau(tvec: &DVector<f64>, tau: f64) -> DVector<f64> {
    tvec.map(|t| (-t / tau).exp() * t / tau.powi(2))
}
We’ll see in the example how the function and partial_deriv methods let us add a basis function and its partial derivatives to the model.
There is a second type of basis function, which corresponds to the coefficient $c_3$. This is a constant function returning a vector of ones. It does not depend on any parameters, which is why there is a separate way of adding these types of invariant functions to the model. For that, use invariant_function of the SeparableModelBuilder.
§Example Code
Using the functions above, our example code for fitting this model to a vector of data y obtained at grid points x looks like this:
use nalgebra::DVector;
use varpro::prelude::*;
use varpro::solvers::levmar::LevMarSolver;
use varpro::problem::*;
// 1. create the model by giving only the nonlinear parameter names it depends on
let model = SeparableModelBuilder::<f64>::new(&["tau1", "tau2"])
    // add the first exponential decay and its partial derivative to the model
    // give all parameter names that the function depends on
    // and subsequently provide the partial derivative for each parameter
    .function(&["tau1"], exp_decay)
    .partial_deriv("tau1", exp_decay_dtau)
    // add the second exponential decay and its partial derivative to the model
    .function(&["tau2"], exp_decay)
    .partial_deriv("tau2", exp_decay_dtau)
    // add the constant as a vector of ones as an invariant function
    .invariant_function(|x| DVector::from_element(x.len(), 1.))
    // the independent variable (x-axis) is the same for all basis functions
    .independent_variable(x)
    // the initial guess for the nonlinear parameters is tau1=1, tau2=5
    .initial_parameters(vec![1., 5.])
    // build the model
    .build()
    .unwrap();
// 2. Cast the fitting problem as a nonlinear least squares minimization problem
let problem = SeparableProblemBuilder::new(model)
    .observations(y)
    .build()
    .unwrap();
// 3. Solve the fitting problem using a solver
let fit_result = LevMarSolver::default()
    .fit(problem)
    .expect("fit must succeed");
// the nonlinear parameters after fitting
// they are in the same order as the parameter names given to the model
let alpha = fit_result.nonlinear_parameters();
// the linear coefficients after fitting
// they are in the same order as the basis functions that were added to the model
let c = fit_result.linear_coefficients().unwrap();
§Example 2: Mixed exponential and trigonometric model
This example is taken from the MATLAB code published as part of the O’Leary 2013 paper. It fits a mixed exponential and trigonometric model to some noisy, weighted data.
In keeping with the element-wise notation above, we can write the model as
$$(\vec{f}(\vec{x},\vec{\alpha},\vec{c}))_k = c_1 \exp(-\alpha_2 x_k)\cdot\cos(\alpha_3 x_k) + c_2 \exp(-\alpha_1 x_k)\cdot\cos(\alpha_2 x_k).$$
The code to fit this model to some data is given below. Note also that weights are given for the data points.
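Before looking at the code, note that the partial derivatives required by the builder follow directly from the product and chain rules (written component-wise, matching the model equation above):
$$\frac{\partial}{\partial\alpha_2}\left(e^{-\alpha_2 x_k}\cos(\alpha_3 x_k)\right) = -x_k\,e^{-\alpha_2 x_k}\cos(\alpha_3 x_k), \qquad \frac{\partial}{\partial\alpha_3}\left(e^{-\alpha_2 x_k}\cos(\alpha_3 x_k)\right) = -x_k\,e^{-\alpha_2 x_k}\sin(\alpha_3 x_k),$$
and analogously for the second basis function with $\alpha_1$ and $\alpha_2$. These are exactly the closures passed to partial_deriv below.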
use nalgebra::DVector;
use varpro::prelude::*;
use varpro::solvers::levmar::LevMarSolver;
use varpro::problem::*;
// the two nonlinear basis functions of the model
fn phi1(t: &DVector<f64>, alpha2: f64, alpha3: f64) -> DVector<f64> {
    t.map(|t| f64::exp(-alpha2 * t) * f64::cos(alpha3 * t))
}
fn phi2(t: &DVector<f64>, alpha1: f64, alpha2: f64) -> DVector<f64> {
    t.map(|t| f64::exp(-alpha1 * t) * f64::cos(alpha2 * t))
}
// the data, weights, and initial guesses for our fitting problem
let t = DVector::from_vec(vec![
    0., 0.1, 0.22, 0.31, 0.46, 0.50, 0.63, 0.78, 0.85, 0.97,
]);
let y = DVector::from_vec(vec![
    6.9842, 5.1851, 2.8907, 1.4199, -0.2473, -0.5243, -1.0156, -1.0260, -0.9165, -0.6805,
]);
let w = DVector::from_vec(vec![1.0, 1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 1.0, 0.5, 0.5]);
let initial_guess = vec![0.5, 2., 3.];
// build the model
let model = SeparableModelBuilder::new(["alpha1", "alpha2", "alpha3"])
    .initial_parameters(initial_guess)
    .independent_variable(t)
    // phi 1 and its partial derivatives
    .function(["alpha2", "alpha3"], phi1)
    .partial_deriv("alpha2", |t: &DVector<f64>, alpha2: f64, alpha3: f64| {
        phi1(t, alpha2, alpha3).component_mul(&(-1. * t))
    })
    .partial_deriv("alpha3", |t: &DVector<f64>, alpha2: f64, alpha3: f64| {
        t.map(|t| -t * (-alpha2 * t).exp() * (alpha3 * t).sin())
    })
    // phi 2 and its partial derivatives
    .function(["alpha1", "alpha2"], phi2)
    .partial_deriv("alpha1", |t: &DVector<f64>, alpha1: f64, alpha2: f64| {
        phi2(t, alpha1, alpha2).component_mul(&(-1. * t))
    })
    .partial_deriv("alpha2", |t: &DVector<f64>, alpha1: f64, alpha2: f64| {
        t.map(|t| -t * (-alpha1 * t).exp() * (alpha2 * t).sin())
    })
    .build()
    .unwrap();
// describe the fitting problem
let problem = SeparableProblemBuilder::new(model)
    .observations(y)
    .weights(w)
    .build()
    .unwrap();
// fit the data
let fit_result = LevMarSolver::default()
    .fit(problem)
    .expect("fit must succeed");
// the nonlinear parameters
let alpha = fit_result.nonlinear_parameters();
// the linear coefficients
let c = fit_result.linear_coefficients().unwrap();
§Global Fitting with Multiple Right Hand Sides
Instead of fitting a single data vector (i.e. a single right hand side), this library can also solve a related, but slightly different problem. This is the problem of global fitting for multiple right hand sides. The problem statement is the following:
- We have not only one observation but a set $\{\vec{y}_s\}$, $s=1,\dots,S$ of observations.
- We want to fit the separable nonlinear function $\vec{f}(\vec{\alpha},\vec{c})$ to all vectors of observations, but in such a way that the linear parameters are allowed to vary with $s$, while the nonlinear parameters are the same for the whole dataset.
This is called global fitting with multiple right hand sides, because the nonlinear parameters are not allowed to change and are optimized for the complete dataset, whereas the linear parameters are allowed to vary with each vector of observations. This is an application where varpro really shines. Note that it is not the same as fitting the data vectors independently. For a more in-depth explanation see my article here.
To take advantage of global fitting we don’t need to change anything about the model, we just have to make a slight modification to the way we build a problem. The crucial differences to the single right hand side case are:
- We have to use the SeparableProblemBuilder::mrhs constructor rather than new.
- We have to sort the right hand sides into a matrix, where each right hand side, which is a column vector on its own, becomes a column of the resulting matrix.
For a set of observations $\vec{y}_1,\dots,\vec{y}_S$ (column vectors) we now have to pass a matrix $\boldsymbol{Y}$ of observations, rather than a single vector, to the builder. As explained above, the resulting matrix looks like this:
$$\boldsymbol{Y}=\left(\begin{matrix} \vert & & \vert \\ \vec{y}_1 & \dots & \vec{y}_S \\ \vert & & \vert \end{matrix}\right)$$
The order of the vectors in the matrix doesn’t matter, but it determines the order of the linear coefficients; see SeparableProblemBuilder::mrhs for a more detailed explanation.
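One way to assemble such a matrix with nalgebra is DMatrix::from_columns. A minimal sketch, using three hypothetical observation vectors of equal length:
use nalgebra::{DMatrix, DVector};
// hypothetical observation vectors, one per right hand side
let y1 = DVector::from_vec(vec![1.0, 2.0, 3.0]);
let y2 = DVector::from_vec(vec![1.1, 2.1, 3.1]);
let y3 = DVector::from_vec(vec![0.9, 1.9, 2.9]);
// stack them as the columns of the observation matrix Y
// (all columns must have the same length)
let Y = DMatrix::from_columns(&[y1, y2, y3]);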
§Example
// use the model as before, but now invoke the `mrhs`
// constructor for the fitting problem
let problem = SeparableProblemBuilder::mrhs(model)
    // the observations are a matrix where each column vector represents
    // a single observation
    .observations(Y)
    .weights(w)
    .build()
    .unwrap();
// fit the data
let fit_result = LevMarSolver::default()
    .fit(problem)
    .expect("fit must succeed");
// the nonlinear parameters
// these parameters are fitted globally, meaning they
// are the same for each observation. Hence alpha is a single
// vector of parameters.
let alpha = fit_result.nonlinear_parameters();
// the linear coefficients
// those are a matrix for global fitting with multiple right hand sides.
// Each column corresponds to the linear coefficients for the same column
// in the matrix of observations Y.
let C = fit_result.linear_coefficients().unwrap();
The main difference from fitting problems with a single right hand side is that the observations now become a matrix, where each column is an observation. Since the linear coefficients are allowed to vary, they also become a matrix instead of a single vector; each column holds the best fit linear coefficients for the observations in the same column of the matrix.
§References and Further Reading
(O’Leary2013) O’Leary, D.P., Rust, B.W. Variable projection for nonlinear least squares problems. Comput Optim Appl 54, 579–593 (2013). DOI: 10.1007/s10589-012-9492-9
Attention: the O’Leary paper contains errors that are fixed in this blog article of mine.
(Golub2003) Golub, G., Pereyra, V. Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems 19, R1 (2003). https://iopscience.iop.org/article/10.1088/0266-5611/19/2/201
§Modules
- basis_function: helper implementation to make working with basis functions more seamless
- fit: result of a fit, allowing us to query the best parameters and coefficients
- model: defining separable models
- prelude: commonly useful imports
- problem: defining the separable fitting problem
- solvers: solvers for the nonlinear minimization problem
- statistics: statistics for the fit result
- util: helper module containing some commonly useful types implemented in the nalgebra crate