Crate num_dual

Expand description

Generalized, recursive, scalar and vector (hyper) dual numbers for the automatic and exact calculation of (partial) derivatives.

§Example

This example defines a generic scalar and a generic vector function that can be called using any (hyper-) dual number and automatically calculates derivatives.

use num_dual::*;
use nalgebra::SVector;

fn foo<D: DualNum<f64>>(x: D) -> D {
    x.powi(3)
}

fn bar<D: DualNum<f64>, const N: usize>(x: SVector<D, N>) -> D {
    x.dot(&x).sqrt()
}

fn main() {
    // Calculate a simple derivative
    let (f, df) = first_derivative(foo, 5.0);
    assert_eq!(f, 125.0);
    assert_eq!(df, 75.0);

    // Manually construct the dual number
    let x = Dual64::new(5.0, 1.0);
    println!("{}", foo(x));                     // 125 + 75ε

    // Calculate a gradient
    let (f, g) = gradient(bar, &SVector::from([4.0, 3.0]));
    assert_eq!(f, 5.0);
    assert_eq!(g[0], 0.8);

    // Calculate a Hessian
    let (f, g, h) = hessian(bar, &SVector::from([4.0, 3.0]));
    println!("{h}");                            // [[0.072, -0.096], [-0.096, 0.128]]

    // for x=cos(t) calculate the third derivative of foo w.r.t. t
    let (f0, f1, f2, f3) = third_derivative(|t| foo(t.cos()), 1.0);
    println!("{f3}");                           // 1.5836632930100278
}

§Usage

There are two ways to use the data structures and functions provided in this crate:

(recommended) Using the provided functions for explicit (first_derivative, gradient, …) and implicit (implicit_derivative, implicit_derivative_binary, implicit_derivative_vec) functions.
(for experienced users) Using the different dual number types (Dual, HyperDual, DualVec, …) directly.

The following examples and explanations focus on the first way.

§Derivatives of explicit functions

To be able to calculate the derivative of a function, it needs to be generic over the type of dual number used. Most commonly this would look like this:

fn foo<D: DualNum<f64> + Copy>(x: X) -> O {...}

Of course, the function could also use single precision (f32) or be generic over the precision (F: DualNumFloat). For now, Copy is not a supertrait of DualNum to enable the calculation of derivatives with respect to a dynamic number of variables. However, in practice, using the Copy trait bound leads to an implementation that is more similar to one not using AD and there could be severe performance ramifications when using dynamically allocated dual numbers.

The type X above is D for univariate functions, &OVector for multivariate functions, and (D, D) or (&OVector, &OVector) for partial derivatives. In the simplest case, the output O is a scalar D. However, it is generalized using the Mappable trait to also include types like Option<D> or Result<D, E>, collections like Vec<D> or HashMap<K, D>, or custom structs that implement the Mappable trait. Therefore, it is, e.g., possible to calculate the derivative of a fallible function:

fn foo<D: DualNum<f64> + Copy>(x: D) -> Result<D, E> { todo!() }

fn main() -> Result<(), E> {
    let (val, deriv) = first_derivative(foo, 2.0)?;
    // ...
    Ok(())
}

All dual number types can contain other dual numbers as inner types. Therefore, it is also possible to use the different derivative functions inside of each other.

§Extra arguments

The partial and partial2 functions are used to pass additional arguments to the function, e.g.:

fn foo<D: DualNum<f64> + Copy>(x: D, args: &(D, D)) -> D { todo!() }

fn main() {
    let (val, deriv) = first_derivative(partial(foo, &(3.0, 4.0)), 5.0);
}

All types that implement the DualStruct trait can be used as additional function arguments. The only difference between using the partial and partial2 functions compared to passing the extra arguments via a closure, is that the type of the extra arguments is automatically adjusted to the correct dual number type used for the automatic differentiation. Note that the following code would not compile:

fn main() {
    let (val, deriv) = first_derivative(|x| foo(x, &(3.0, 4.0)), 5.0);
}

The code created by partial essentially translates to:

fn main() {
    let (val, deriv) = first_derivative(|x| foo(x, &(Dual::from_inner(&3.0), Dual::from_inner(&4.0))), 5.0);
}

§The `Gradients` trait

The functions gradient, hessian, partial_hessian and jacobian are generic over the dimensionality of the variable vector. However, to use the functions in a generic context requires not using the Copy trait bound on the dual number type, because the dynamically sized dual numbers can by construction not implement Copy. Also, due to frequent heap allocations, the performance of the automatic differentiation could suffer significantly for dynamically sized dual numbers compared to statically sized dual numbers. The Gradients trait is introduced to overcome these limitations.

fn foo<D: DualNum<f64> + Copy, N: Gradients>(x: OVector<D, N>, n: &D) -> D where DefaultAllocator: Allocator<N> {
    x.dot(&x).sqrt() - n
}

fn main() {
    let x = vector![1.0, 5.0, 5.0, 7.0];
    let (f, grad) = Gradients::gradient(foo, &x, &10.0);
    assert_eq!(f, 0.0);
    assert_relative_eq!(grad, vector![0.1, 0.5, 0.5, 0.7]);

    let x = dvector![1.0, 5.0, 5.0, 7.0];
    let (f, grad) = Gradients::gradient(foo, &x, &10.0);
    assert_eq!(f, 0.0);
    assert_relative_eq!(grad, dvector![0.1, 0.5, 0.5, 0.7]);
}

For dynamically sized input arrays, the Gradients trait evaluates gradients or higher-order derivatives by iteratively evaluating scalar derivatives. For functions that do not rely on the Copy trait bound, only benchmarking can reveal Whether the increased performance through the avoidance of heap allocations can overcome the overhead of repeated function evaluations, i.e., if Gradients outperforms directly calling gradient, hessian, partial_hessian or jacobian.

§Derivatives of implicit functions

Implicit differentiation is used to determine the derivative dy/dx where the output y is only related implicitly to the input x via the equation f(x,y)=0. Automatic implicit differentiation generalizes the idea to determining the output y with full derivative information. Note that the first step in calculating an implicit derivative is always determining the “real” part (i.e., neglecting all derivatives) of the equation f(x,y)=0. The num-dual library is focused on automatic differentiation and not nonlinear equation solving. Therefore, this first step needs to be done with your own custom solutions, or Rust crates for nonlinear equation solving and optimization like, e.g., argmin.

The following example implements a square root for generic dual numbers using implicit differentiation. Of course, the derivatives of the square root can also be determined explicitly using the chain rule, so the example serves mostly as illustration. x.re() provides the “real” part of the dual number which is a f64 and therefore, we can use all the functionalities from the std library (including the square root).

fn implicit_sqrt<D: DualNum<f64> + Copy>(x: D) -> D {
    implicit_derivative(|s, x| s * s - x, x.re().sqrt(), &x)
}

fn main() {
    // sanity check, not actually calculating any derivative
    assert_eq!(implicit_sqrt(25.0), 5.0);
     
    let (sq, deriv) = first_derivative(implicit_sqrt, 25.0);
    assert_eq!(sq, 5.0);
    // The derivative of sqrt(x) is 1/(2*sqrt(x)) which should evaluate to 0.1
    assert_eq!(deriv, 0.1);
}

The implicit_sqrt or any likewise defined function is generic over the dual type D and can, therefore, be used anywhere as a part of an arbitrary complex computation. The functions implicit_derivative_binary and implicit_derivative_vec can be used for implicit functions with more than one variable.

For implicit functions that contain complex models and a large number of parameters, the ImplicitDerivative interface might come in handy. The idea is to define the implicit function using the ImplicitFunction trait and feeding it into the ImplicitDerivative struct, which internally stores the parameters as dual numbers and their real parts. The ImplicitDerivative then provides methods for the evaluation of the real part of the residual (which can be passed to a nonlinear solver) and the implicit derivative which can be called after solving for the real part of the solution to reconstruct all the derivatives.

struct ImplicitSqrt;
impl ImplicitFunction<f64> for ImplicitSqrt {
    type Parameters<D> = D;
    type Variable<D> = D;
    fn residual<D: DualNum<f64> + Copy>(x: D, square: &D) -> D {
        *square - x * x
    }
}

fn main() {
    let x = Dual::from_re(25.0).derivative();
    let func = ImplicitDerivative::new(ImplicitSqrt, x);
    assert_eq!(func.residual(5.0), 0.0);
    assert_eq!(x.sqrt(), func.implicit_derivative(5.0));
}

§Combination with nonlinear solver libraries

As mentioned previously, this crate does not contain any algorithms for nonlinear optimization or root finding. However, combining the capabilities of automatic differentiation with nonlinear solving can be very fruitful. Most importantly, the calculation of Jacobians or Hessians can be completely automated, if the model can be expressed within the functionalities of the DualNum trait. On top of that implicit derivatives can be of interest, if derivatives of the result of the optimization itself are relevant (e.g., in a bilevel optimization). The synergy is exploited in the ipopt-ad crate that turns the NLP solver IPOPT into a black-box optimization algorithm (i.e., it only requires a function that returns the values of the optimization variable and constraints), without any repercussions regarding the robustness or speed of convergence of the solver.

If you are developing nonlinear optimization algorithms in Rust, feel free to reach out to us. We are happy to discuss how to enhance your algorithms with the automatic differentiation capabilities of this crate.

Modules§

linalg: Basic linear algebra functionalities (linear solve and eigenvalues) for matrices containing dual numbers.

Macros§

Structs§

Derivative: Wrapper struct for a derivative vector or matrix.
Dual: A scalar dual number for the calculations of first derivatives.
Dual2: A scalar second order dual number for the calculation of second derivatives.
Dual3: A scalar third order dual number for the calculation of third derivatives.
Dual2Vec: A vector second order dual number for the calculation of Hessians.
DualVec: A vector dual number for the calculations of gradients or Jacobians.
HyperDual: A scalar hyper-dual number for the calculation of second partial derivatives.
HyperDualVec: A vector hyper-dual number for the calculation of partial Hessians.
HyperHyperDual: A scalar hyper-hyper-dual number for the calculation of third partial derivatives.
ImplicitDerivative: Helper struct that stores parameters in dual and real form and provides functions for evaluating real residuals (for external solvers) and implicit derivatives for arbitrary dual numbers.
Real: A real number for the calculations of zeroth derivatives in generic contexts.

Traits§

BesselDual: Implementation of bessel functions for double precision (hyper) dual numbers.
DualNum: A generalized (hyper) dual number.
DualNumFloat: The underlying data type of individual derivatives. Usually f32 or f64.
DualStruct: A struct that contains dual numbers. Needed for arbitrary arguments in ImplicitFunction.
Gradients: Evaluation of gradients, hessians, and partial (Nx1) hessians that is generic over the dimensionality of the input vector.
ImplicitFunction: An implicit function g(x, args) = 0 for which derivatives of x can be calculated with the ImplicitDerivative struct.
Mappable: Trait for structs used as an output of functions for which derivatives are calculated.

Functions§

first_derivative: Calculate the first derivative of a scalar function.
gradient: Calculate the gradient of a scalar function
hessian: Calculate the Hessian of a scalar function.
implicit_derivative: Calculate the derivative of the unary implicit function g(x, args) = 0
implicit_derivative_binary: Calculate the derivative of the binary implicit function g(x, y, args) = 0
implicit_derivative_vec: Calculate the derivative of the multivariate implicit function g(x, args) = 0
jacobian: Calculate the Jacobian of a vector function.
partial: Evaluate the function g with extra arguments args that are automatically adjusted to the correct dual number type.
partial2: Evaluate the function g with extra arguments args1 and args2 that are automatically adjusted to the correct dual number type.
partial_hessian: Calculate second partial derivatives with respect to vectors.
second_derivative: Calculate the second derivative of a univariate function.
second_partial_derivative: Calculate second partial derivatives with respect to scalars.
third_derivative: Calculate the third derivative of a univariate function.
third_partial_derivative: Calculate third partial derivatives with respect to scalars.
third_partial_derivative_vec: Calculate the third partial derivative of a scalar function with arbitrary many variables.
zeroth_derivative: Calculate the zeroth derivative of a scalar function.

Crate num_dual

Crate num_dual Copy item path

§Example

§Usage

§Derivatives of explicit functions

§Extra arguments

§The Gradients trait

§Derivatives of implicit functions

§Combination with nonlinear solver libraries

Modules§

Macros§

Structs§

Traits§

Functions§

Type Aliases§

Crate num_dual

§The `Gradients` trait