# Module easy_ml::differentiation

source · [−]## Expand description

(Automatic) Differentiation helpers

## Automatic Differentiation

This module provides structs for performing Forward and Reverse Automatic Differentiation

### Automatic Differentiation is not Numerical Differentiation

You were probably introduced to differentiation as numeric differentiation,
ie if you have a function 3x^{2} then you can estimate its gradient
at some value x by computing 3x^{2} and 3(x+h)^{2} where h
is a very small value. The tangent line these two points create gives you an approximation
of the gradient when you calculate (f(x+h) - f(x)) / h. Unfortunately floating
point numbers in computers have limited precision, so this method is only approximate
and can result in floating point errors. 1 + 1 might equal 2 but as you go smaller
10^{-i} + 10^{-i} starts to loook rather like 10^{-i} as i goes
into double digits.

### Automatic Differentiation is not Symbolic Differentiation

If you were taught calculus you have probably done plenty of symbolic differentiation
by hand. A function 3x^{2} can be symbolically differentiated into 6x by applying
simple rules to manipulate the algebra. Unfortunately the rules aren’t so simple for
more complex expressions such as exponents,
logs or
trigonometry.
Symbolic differentiation can give you expressions which are just as or more complicated
than the original, and doing it by hand can be error prone. Symbolic Differentiation is
also tricky to relate to algorithmic computations that use control structures.

### Automatic Differentiation

Automatic Differentiation computes the derivative of a function without rewriting the function as symbolic differentiation does and without the precision issues of numerical differentiation by splitting the derivative into lots of tiny steps of basic operations like addition and multiplication. These are combined using the chain rule. The downside is more memory is used than in symbolic or numerical differentiation, as derivatives have to be tracked through the computational graph.

## Forward Differentiation

Forward Differentiation computes all the gradients in a computational graph with respect
to an input. For example, if you have a function f(x, y) = 5x^{3} - 4x^{2} +
10x - y, then for some actual value of x and y you can compute f(x,y) and δf(x,y)/δx
together in one forward pass using forward differentiation. You can also make another pass
and compute f(x,y) and δf(x,y)/δy for some actual value of x and y. It is possible to avoid
redundantly calculating f(x,y) multiple times, but I am waiting on const generics to implement
this. Regardless, forward differentiation requires making at least N+1 passes of the
function to compute the derivatives of the output with respect to N inputs - and the current
implementation will make 2N. However, you do get the gradients for every output in a
single pass. This is poorly suited to neural nets as they often have a single output(loss)
to differentiate many many inputs with respect to.

## Reverse Mode Differentiation

Reverse Mode Differentiation computes all the gradients in a computational graph for
the same output. For example, if you have a function f(x, y) = 5x^{3} -
4x^{2} + 10x - y, then for some actual value of x and y you can compute f(x,y)
and store all the intermediate results. You can then run a backward pass on the output
of f(x, y) and obtain δf(x,y)/δx and δf(x,y)/δy for the actual values of x and y in a
single pass. The catch is that reverse mode must store as many intermediate values as
there are steps in the function which can use much more memory than forward mode.
Reverse mode also requires making N backward passes to get the gradients for N different
outputs. This is well suited to neural nets because we often have a single output (loss)
to differentiate many inputs with respect to. However, reverse mode will be slower than
forward mode if the number of inputs is small or there are many outputs.

## Usage

Both `Trace`

and `Record`

for forward and reverse automatic differentiation respectively
implement `Numeric`

and can generally be treated as normal numbers just like `f32`

and `f64`

.

`Trace`

is literally implemented as a dual number, and is more or less a one to one
substitution. `Record`

requires dynamically building a computational graph of the values
and dependencies of each operation performed on them. This means performing operations on
records have side effects, they add entries onto a `WengertList`

. However, when using
`Record`

the side effects are abstracted away, just create a `WengertList`

before you
start creating Records.

Given some function from N inputs to M outputs you can pass it `Trace`

s or `Record`

s
and retrieve the first derivative from the outputs for all combinations of N and M.
If N >> M then you should use `Record`

as reverse mode automatic differentiation is
much cheaper. If N << M then you should use `Trace`

as it will be much cheaper. If
you have large N and M, or small N and M, you might have to benchmark to find which
method works best. However, most problems are N > M.

For this example we use a function which takes two inputs, r and a, and returns two outputs, x and y.

### Using Trace

```
use easy_ml::differentiation::Trace;
use crate::easy_ml::numeric::extra::Cos;
use crate::easy_ml::numeric::extra::Sin;
fn cartesian(r: Trace<f32>, angle: Trace<f32>) -> (Trace<f32>, Trace<f32>) {
let x = r * angle.cos();
let y = r * angle.sin();
(x, y)
}
// first find dx/dr and dy/dr
let (x, y) = cartesian(Trace::variable(1.0), Trace::constant(2.0));
let dx_dr = x.derivative;
let dy_dr = y.derivative;
// now find dx/da and dy/da
let (x, y) = cartesian(Trace::constant(1.0), Trace::variable(2.0));
let dx_da = x.derivative;
let dy_da = y.derivative;
```

### Using Record

```
use easy_ml::differentiation::{Record, WengertList};
use crate::easy_ml::numeric::extra::{Cos, Sin};
// the lifetimes tell the rust compiler that our inputs and outputs
// can all live as long as the WengertList
fn cartesian<'a>(r: Record<'a, f32>, angle: Record<'a, f32>)
-> (Record<'a, f32>, Record<'a, f32>) {
let x = r * angle.cos();
let y = r * angle.sin();
(x, y)
}
// first we must construct a WengertList to create records from
let list = WengertList::new();
let r = Record::variable(1.0, &list);
let a = Record::variable(2.0, &list);
let (x, y) = cartesian(r, a);
// first find dx/dr and dx/da
let x_derivatives = x.derivatives();
let dx_dr = x_derivatives[&r];
let dx_da = x_derivatives[&a];
// now find dy/dr and dy/da
let y_derivatives = y.derivatives();
let dy_dr = y_derivatives[&r];
let dy_da = y_derivatives[&a];
```

### Differences

Notice how in the above examples all the same 4 derivatives are found, but in
forward mode we rerun the function with a different input as the sole variable,
the rest as constants, whereas in reverse mode we rerun the `derivatives()`

function
on a different output variable. With Reverse mode we would only pass constants into
the `cartesian`

function if we didn’t want to get their derivatives (and avoid wasting
memory on something we didn’t need).

### Substitution

There is no need to rewrite the input functions, as you can use the `Numeric`

and `Real`

traits to write a function that will take floating point numbers, `Trace`

s and `Record`

s.

```
use easy_ml::differentiation::{Trace, Record, WengertList};
use crate::easy_ml::numeric::Numeric;
use crate::easy_ml::numeric::extra::{Real};
fn cartesian<T: Numeric + Real + Copy>(r: T, angle: T) -> (T, T) {
let x = r * angle.cos();
let y = r * angle.sin();
(x, y)
}
let list = WengertList::new();
let r_record = Record::variable(1.0, &list);
let a_record = Record::variable(2.0, &list);
let (x_record, y_record) = cartesian(r_record, a_record);
// find dx/dr using reverse mode automatic differentiation
let x_derivatives = x_record.derivatives();
let dx_dr_reverse = x_derivatives[&r_record];
let (x_trace, y_trace) = cartesian(Trace::variable(1.0), Trace::constant(2.0));
// now find dx/dr with forward automatic differentiation
let dx_dr_forward = x_trace.derivative;
assert_eq!(dx_dr_reverse, dx_dr_forward);
let (x, y) = cartesian(1.0, 2.0);
assert_eq!(x, x_record.number); assert_eq!(x, x_trace.number);
assert_eq!(y, y_record.number); assert_eq!(y, y_trace.number);
```

### Equivalance

Although in this example the derivatives found are identical, in practise, because forward and reverse mode compute things differently and floating point numbers have limited precision, you should not expect the derivatives to be exactly equal.

## Further information

## Modules

Other operator related implementations for the differentiation module

Operator implementations for Records.

Operator implementations for Traces

## Structs

Computed derivatives of a computational graph for some output Record variable.

A wrapper around a real number which records it going through the computational graph. This is used to perform Reverse Mode Automatic Differentiation.

A dual number which traces a real number and keeps track of its derivative. This is used to perform Forward Automatic Differentiation

A list of operations performed in a forward pass of a dynamic computational graph, used for Reverse Mode Automatic Differentiation.

## Traits

A trait with no methods which is implemented for all primitive types.