Expand description
Generalized, recursive, scalar and vector (hyper) dual numbers for the automatic and exact calculation of (partial) derivatives.
§Example
This example defines a generic scalar and a generic vector function that can be called using any (hyper-) dual number and automatically calculates derivatives.
use num_dual::*;
use nalgebra::SVector;
fn foo<D: DualNum<f64>>(x: D) -> D {
x.powi(3)
}
fn bar<D: DualNum<f64>, const N: usize>(x: SVector<D, N>) -> D {
x.dot(&x).sqrt()
}
fn main() {
// Calculate a simple derivative
let (f, df) = first_derivative(foo, 5.0);
assert_eq!(f, 125.0);
assert_eq!(df, 75.0);
// Manually construct the dual number
let x = Dual64::new(5.0, 1.0);
println!("{}", foo(x)); // 125 + 75ε
// Calculate a gradient
let (f, g) = gradient(bar, &SVector::from([4.0, 3.0]));
assert_eq!(f, 5.0);
assert_eq!(g[0], 0.8);
// Calculate a Hessian
let (f, g, h) = hessian(bar, &SVector::from([4.0, 3.0]));
println!("{h}"); // [[0.072, -0.096], [-0.096, 0.128]]
// for x=cos(t) calculate the third derivative of foo w.r.t. t
let (f0, f1, f2, f3) = third_derivative(|t| foo(t.cos()), 1.0);
println!("{f3}"); // 1.5836632930100278
}
§Usage
There are two ways to use the data structures and functions provided in this crate:
- (recommended) Using the provided functions for explicit (
first_derivative
,gradient
, …) and implicit (implicit_derivative
,implicit_derivative_binary
,implicit_derivative_vec
) functions. - (for experienced users) Using the different dual number types (
Dual
,HyperDual
,DualVec
, …) directly.
The following examples and explanations focus on the first way.
§Derivatives of explicit functions
To be able to calculate the derivative of a function, it needs to be generic over the type of dual number used. Most commonly this would look like this:
fn foo<D: DualNum<f64> + Copy>(x: X) -> O {...}
Of course, the function could also use single precision (f32
) or be generic over the precision (F:
DualNumFloat
).
For now, Copy
is not a supertrait of DualNum
to enable the calculation of derivatives with respect
to a dynamic number of variables. However, in practice, using the Copy
trait bound leads to an
implementation that is more similar to one not using AD and there could be severe performance ramifications
when using dynamically allocated dual numbers.
The type X
above is D
for univariate functions, &OVector
for multivariate
functions, and (D, D)
or (&OVector, &OVector)
for partial derivatives. In the simplest case, the output
O
is a scalar D
. However, it is generalized using the Mappable
trait to also include types like
Option<D>
or Result<D, E>
, collections like Vec<D>
or HashMap<K, D>
, or custom structs that
implement the Mappable
trait. Therefore, it is, e.g., possible to calculate the derivative of a fallible
function:
fn foo<D: DualNum<f64> + Copy>(x: D) -> Result<D, E> { todo!() }
fn main() -> Result<(), E> {
let (val, deriv) = first_derivative(foo, 2.0)?;
// ...
Ok(())
}
All dual number types can contain other dual numbers as inner types. Therefore, it is also possible to use the different derivative functions inside of each other.
§Extra arguments
The partial
and partial2
functions are used to pass additional arguments to the function, e.g.:
fn foo<D: DualNum<f64> + Copy>(x: D, args: &(D, D)) -> D { todo!() }
fn main() {
let (val, deriv) = first_derivative(partial(foo, &(3.0, 4.0)), 5.0);
}
All types that implement the DualStruct
trait can be used as additional function arguments. The
only difference between using the partial
and partial2
functions compared to passing the extra
arguments via a closure, is that the type of the extra arguments is automatically adjusted to the correct
dual number type used for the automatic differentiation. Note that the following code would not compile:
fn main() {
let (val, deriv) = first_derivative(|x| foo(x, &(3.0, 4.0)), 5.0);
}
The code created by partial
essentially translates to:
fn main() {
let (val, deriv) = first_derivative(|x| foo(x, &(Dual::from_inner(&3.0), Dual::from_inner(&4.0))), 5.0);
}
§The Gradients
trait
The functions gradient
, hessian
, partial_hessian
and jacobian
are generic over the dimensionality
of the variable vector. However, to use the functions in a generic context requires not using the Copy
trait
bound on the dual number type, because the dynamically sized dual numbers can by construction not implement
Copy
. Also, due to frequent heap allocations, the performance of the automatic differentiation could
suffer significantly for dynamically sized dual numbers compared to statically sized dual numbers. The
Gradients
trait is introduced to overcome these limitations.
fn foo<D: DualNum<f64> + Copy, N: Gradients>(x: OVector<D, N>, n: &D) -> D where DefaultAllocator: Allocator<N> {
x.dot(&x).sqrt() - n
}
fn main() {
let x = vector![1.0, 5.0, 5.0, 7.0];
let (f, grad) = Gradients::gradient(foo, &x, &10.0);
assert_eq!(f, 0.0);
assert_relative_eq!(grad, vector![0.1, 0.5, 0.5, 0.7]);
let x = dvector![1.0, 5.0, 5.0, 7.0];
let (f, grad) = Gradients::gradient(foo, &x, &10.0);
assert_eq!(f, 0.0);
assert_relative_eq!(grad, dvector![0.1, 0.5, 0.5, 0.7]);
}
For dynamically sized input arrays, the Gradients
trait evaluates gradients or higher-order derivatives
by iteratively evaluating scalar derivatives. For functions that do not rely on the Copy
trait bound,
only benchmarking can reveal Whether the increased performance through the avoidance of heap allocations
can overcome the overhead of repeated function evaluations, i.e., if Gradients
outperforms directly
calling gradient
, hessian
, partial_hessian
or jacobian
.
§Derivatives of implicit functions
Implicit differentiation is used to determine the derivative dy/dx
where the output y
is only related
implicitly to the input x
via the equation f(x,y)=0
. Automatic implicit differentiation generalizes the
idea to determining the output y
with full derivative information. Note that the first step in calculating
an implicit derivative is always determining the “real” part (i.e., neglecting all derivatives) of the equation
f(x,y)=0
. The num-dual
library is focused on automatic differentiation and not nonlinear equation
solving. Therefore, this first step needs to be done with your own custom solutions, or Rust crates for
nonlinear equation solving and optimization like, e.g., argmin.
The following example implements a square root for generic dual numbers using implicit differentiation. Of
course, the derivatives of the square root can also be determined explicitly using the chain rule, so the
example serves mostly as illustration. x.re()
provides the “real” part of the dual number which is a f64
and therefore, we can use all the functionalities from the std library (including the square root).
fn implicit_sqrt<D: DualNum<f64> + Copy>(x: D) -> D {
implicit_derivative(|s, x| s * s - x, x.re().sqrt(), &x)
}
fn main() {
// sanity check, not actually calculating any derivative
assert_eq!(implicit_sqrt(25.0), 5.0);
let (sq, deriv) = first_derivative(implicit_sqrt, 25.0);
assert_eq!(sq, 5.0);
// The derivative of sqrt(x) is 1/(2*sqrt(x)) which should evaluate to 0.1
assert_eq!(deriv, 0.1);
}
The implicit_sqrt
or any likewise defined function is generic over the dual type D
and can, therefore, be used anywhere as a part of an arbitrary complex computation. The functions
implicit_derivative_binary
and implicit_derivative_vec
can be used for implicit functions
with more than one variable.
For implicit functions that contain complex models and a large number of parameters, the ImplicitDerivative
interface might come in handy. The idea is to define the implicit function using the ImplicitFunction
trait
and feeding it into the ImplicitDerivative
struct, which internally stores the parameters as dual numbers
and their real parts. The ImplicitDerivative
then provides methods for the evaluation of the real part
of the residual (which can be passed to a nonlinear solver) and the implicit derivative which can be called
after solving for the real part of the solution to reconstruct all the derivatives.
struct ImplicitSqrt;
impl ImplicitFunction<f64> for ImplicitSqrt {
type Parameters<D> = D;
type Variable<D> = D;
fn residual<D: DualNum<f64> + Copy>(x: D, square: &D) -> D {
*square - x * x
}
}
fn main() {
let x = Dual::from_re(25.0).derivative();
let func = ImplicitDerivative::new(ImplicitSqrt, x);
assert_eq!(func.residual(5.0), 0.0);
assert_eq!(x.sqrt(), func.implicit_derivative(5.0));
}
§Combination with nonlinear solver libraries
As mentioned previously, this crate does not contain any algorithms for nonlinear optimization or root finding.
However, combining the capabilities of automatic differentiation with nonlinear solving can be very fruitful.
Most importantly, the calculation of Jacobians or Hessians can be completely automated, if the model can be
expressed within the functionalities of the DualNum
trait. On top of that implicit derivatives can be of
interest, if derivatives of the result of the optimization itself are relevant (e.g., in a bilevel
optimization). The synergy is exploited in the ipopt-ad
crate that
turns the NLP solver IPOPT into a black-box optimization algorithm (i.e.,
it only requires a function that returns the values of the optimization variable and constraints), without
any repercussions regarding the robustness or speed of convergence of the solver.
If you are developing nonlinear optimization algorithms in Rust, feel free to reach out to us. We are happy to discuss how to enhance your algorithms with the automatic differentiation capabilities of this crate.
Modules§
- linalg
- Basic linear algebra functionalities (linear solve and eigenvalues) for matrices containing dual numbers.
Macros§
- chain_
rule - first
- forward_
binop - impl_
add_ sub_ rem - impl_
assign_ ops - impl_
derivatives - impl_
dual - impl_
dual_ struct - impl_
first_ derivatives - impl_
float_ const - impl_
from_ f - impl_
from_ primitive - impl_
inv - impl_
iterator - impl_
neg - impl_
num - impl_
scalar_ op - impl_
second_ derivatives - impl_
signed - impl_
third_ derivatives - impl_
zero_ one - impl_
zeroth_ derivatives - second
- third
- zeroth
Structs§
- Derivative
- Wrapper struct for a derivative vector or matrix.
- Dual
- A scalar dual number for the calculations of first derivatives.
- Dual2
- A scalar second order dual number for the calculation of second derivatives.
- Dual3
- A scalar third order dual number for the calculation of third derivatives.
- Dual2
Vec - A vector second order dual number for the calculation of Hessians.
- DualVec
- A vector dual number for the calculations of gradients or Jacobians.
- Hyper
Dual - A scalar hyper-dual number for the calculation of second partial derivatives.
- Hyper
Dual Vec - A vector hyper-dual number for the calculation of partial Hessians.
- Hyper
Hyper Dual - A scalar hyper-hyper-dual number for the calculation of third partial derivatives.
- Implicit
Derivative - Helper struct that stores parameters in dual and real form and provides functions for evaluating real residuals (for external solvers) and implicit derivatives for arbitrary dual numbers.
- Real
- A real number for the calculations of zeroth derivatives in generic contexts.
Traits§
- Bessel
Dual - Implementation of bessel functions for double precision (hyper) dual numbers.
- DualNum
- A generalized (hyper) dual number.
- Dual
NumFloat - The underlying data type of individual derivatives. Usually f32 or f64.
- Dual
Struct - A struct that contains dual numbers. Needed for arbitrary arguments in ImplicitFunction.
- Gradients
- Evaluation of gradients, hessians, and partial (Nx1) hessians that is generic over the dimensionality of the input vector.
- Implicit
Function - An implicit function g(x, args) = 0 for which derivatives of x can be calculated with the ImplicitDerivative struct.
- Mappable
- Trait for structs used as an output of functions for which derivatives are calculated.
Functions§
- first_
derivative - Calculate the first derivative of a scalar function.
- gradient
- Calculate the gradient of a scalar function
- hessian
- Calculate the Hessian of a scalar function.
- implicit_
derivative - Calculate the derivative of the unary implicit function g(x, args) = 0
- implicit_
derivative_ binary - Calculate the derivative of the binary implicit function g(x, y, args) = 0
- implicit_
derivative_ vec - Calculate the derivative of the multivariate implicit function g(x, args) = 0
- jacobian
- Calculate the Jacobian of a vector function.
- partial
- Evaluate the function
g
with extra argumentsargs
that are automatically adjusted to the correct dual number type. - partial2
- Evaluate the function
g
with extra argumentsargs1
andargs2
that are automatically adjusted to the correct dual number type. - partial_
hessian - Calculate second partial derivatives with respect to vectors.
- second_
derivative - Calculate the second derivative of a univariate function.
- second_
partial_ derivative - Calculate second partial derivatives with respect to scalars.
- third_
derivative - Calculate the third derivative of a univariate function.
- third_
partial_ derivative - Calculate third partial derivatives with respect to scalars.
- third_
partial_ derivative_ vec - Calculate the third partial derivative of a scalar function with arbitrary many variables.
- zeroth_
derivative - Calculate the zeroth derivative of a scalar function.
Type Aliases§
- Dual2D
Vec32 - Dual2D
Vec64 - Dual2S
Vec32 - Dual2S
Vec64 - Dual2
Vec32 - Dual2
Vec64 - Dual2_
32 - Dual2_
64 - Dual3_
32 - Dual3_
64 - Dual32
- Dual64
- DualD
Vec32 - DualD
Vec64 - DualS
Vec - DualS
Vec32 - DualS
Vec64 - Dual
Vec32 - Dual
Vec64 - Hyper
Dual32 - Hyper
Dual64 - Hyper
DualD Vec32 - Hyper
DualD Vec64 - Hyper
DualS Vec32 - Hyper
DualS Vec64 - Hyper
Dual Vec32 - Hyper
Dual Vec64 - Hyper
Hyper Dual32 - Hyper
Hyper Dual64