Module easy_ml::differentiation
source · [−]Expand description
(Automatic) Differentiation helpers
Automatic Differentiation
This module provides structs for performing Forward and Reverse Automatic Differentiation
Automatic Differentiation is not Numerical Differentiation
You were probably introduced to differentiation as numeric differentiation, ie if you have a function 3x2 then you can estimate its gradient at some value x by computing 3x2 and 3(x+h)2 where h is a very small value. The tangent line these two points create gives you an approximation of the gradient when you calculate (f(x+h) - f(x)) / h. Unfortunately floating point numbers in computers have limited precision, so this method is only approximate and can result in floating point errors. 1 + 1 might equal 2 but as you go smaller 10-i + 10-i starts to loook rather like 10-i as i goes into double digits.
Automatic Differentiation is not Symbolic Differentiation
If you were taught calculus you have probably done plenty of symbolic differentiation by hand. A function 3x2 can be symbolically differentiated into 6x by applying simple rules to manipulate the algebra. Unfortunately the rules aren’t so simple for more complex expressions such as exponents, logs or trigonometry. Symbolic differentiation can give you expressions which are just as or more complicated than the original, and doing it by hand can be error prone. Symbolic Differentiation is also tricky to relate to algorithmic computations that use control structures.
Automatic Differentiation
Automatic Differentiation computes the derivative of a function without rewriting the function as symbolic differentiation does and without the precision issues of numerical differentiation by splitting the derivative into lots of tiny steps of basic operations like addition and multiplication. These are combined using the chain rule. The downside is more memory is used than in symbolic or numerical differentiation, as derivatives have to be tracked through the computational graph.
Forward Differentiation
Forward Differentiation computes all the gradients in a computational graph with respect to an input. For example, if you have a function f(x, y) = 5x3 - 4x2 + 10x - y, then for some actual value of x and y you can compute f(x,y) and δf(x,y)/δx together in one forward pass using forward differentiation. You can also make another pass and compute f(x,y) and δf(x,y)/δy for some actual value of x and y. It is possible to avoid redundantly calculating f(x,y) multiple times, but I am waiting on const generics to implement this. Regardless, forward differentiation requires making at least N+1 passes of the function to compute the derivatives of the output with respect to N inputs - and the current implementation will make 2N. However, you do get the gradients for every output in a single pass. This is poorly suited to neural nets as they often have a single output(loss) to differentiate many many inputs with respect to.
Reverse Mode Differentiation
Reverse Mode Differentiation computes all the gradients in a computational graph for the same output. For example, if you have a function f(x, y) = 5x3 - 4x2 + 10x - y, then for some actual value of x and y you can compute f(x,y) and store all the intermediate results. You can then run a backward pass on the output of f(x, y) and obtain δf(x,y)/δx and δf(x,y)/δy for the actual values of x and y in a single pass. The catch is that reverse mode must store as many intermediate values as there are steps in the function which can use much more memory than forward mode. Reverse mode also requires making N backward passes to get the gradients for N different outputs. This is well suited to neural nets because we often have a single output (loss) to differentiate many inputs with respect to. However, reverse mode will be slower than forward mode if the number of inputs is small or there are many outputs.
Usage
Both Trace
and Record
for forward and reverse automatic differentiation respectively
implement Numeric
and can generally be treated as normal numbers just like f32
and f64
.
Trace
is literally implemented as a dual number, and is more or less a one to one
substitution. Record
requires dynamically building a computational graph of the values
and dependencies of each operation performed on them. This means performing operations on
records have side effects, they add entries onto a WengertList
. However, when using
Record
the side effects are abstracted away, just create a WengertList
before you
start creating Records.
Given some function from N inputs to M outputs you can pass it Trace
s or Record
s
and retrieve the first derivative from the outputs for all combinations of N and M.
If N >> M then you should use Record
as reverse mode automatic differentiation is
much cheaper. If N << M then you should use Trace
as it will be much cheaper. If
you have large N and M, or small N and M, you might have to benchmark to find which
method works best. However, most problems are N > M.
For this example we use a function which takes two inputs, r and a, and returns two outputs, x and y.
Using Trace
use easy_ml::differentiation::Trace;
use crate::easy_ml::numeric::extra::Cos;
use crate::easy_ml::numeric::extra::Sin;
fn cartesian(r: Trace<f32>, angle: Trace<f32>) -> (Trace<f32>, Trace<f32>) {
let x = r * angle.cos();
let y = r * angle.sin();
(x, y)
}
// first find dx/dr and dy/dr
let (x, y) = cartesian(Trace::variable(1.0), Trace::constant(2.0));
let dx_dr = x.derivative;
let dy_dr = y.derivative;
// now find dx/da and dy/da
let (x, y) = cartesian(Trace::constant(1.0), Trace::variable(2.0));
let dx_da = x.derivative;
let dy_da = y.derivative;
Using Record
use easy_ml::differentiation::{Record, WengertList};
use crate::easy_ml::numeric::extra::{Cos, Sin};
// the lifetimes tell the rust compiler that our inputs and outputs
// can all live as long as the WengertList
fn cartesian<'a>(r: Record<'a, f32>, angle: Record<'a, f32>)
-> (Record<'a, f32>, Record<'a, f32>) {
let x = r * angle.cos();
let y = r * angle.sin();
(x, y)
}
// first we must construct a WengertList to create records from
let list = WengertList::new();
let r = Record::variable(1.0, &list);
let a = Record::variable(2.0, &list);
let (x, y) = cartesian(r, a);
// first find dx/dr and dx/da
let x_derivatives = x.derivatives();
let dx_dr = x_derivatives[&r];
let dx_da = x_derivatives[&a];
// now find dy/dr and dy/da
let y_derivatives = y.derivatives();
let dy_dr = y_derivatives[&r];
let dy_da = y_derivatives[&a];
Differences
Notice how in the above examples all the same 4 derivatives are found, but in
forward mode we rerun the function with a different input as the sole variable,
the rest as constants, whereas in reverse mode we rerun the derivatives()
function
on a different output variable. With Reverse mode we would only pass constants into
the cartesian
function if we didn’t want to get their derivatives (and avoid wasting
memory on something we didn’t need).
Substitution
There is no need to rewrite the input functions, as you can use the Numeric
and Real
traits to write a function that will take floating point numbers, Trace
s and Record
s.
use easy_ml::differentiation::{Trace, Record, WengertList};
use crate::easy_ml::numeric::Numeric;
use crate::easy_ml::numeric::extra::{Real};
fn cartesian<T: Numeric + Real + Copy>(r: T, angle: T) -> (T, T) {
let x = r * angle.cos();
let y = r * angle.sin();
(x, y)
}
let list = WengertList::new();
let r_record = Record::variable(1.0, &list);
let a_record = Record::variable(2.0, &list);
let (x_record, y_record) = cartesian(r_record, a_record);
// find dx/dr using reverse mode automatic differentiation
let x_derivatives = x_record.derivatives();
let dx_dr_reverse = x_derivatives[&r_record];
let (x_trace, y_trace) = cartesian(Trace::variable(1.0), Trace::constant(2.0));
// now find dx/dr with forward automatic differentiation
let dx_dr_forward = x_trace.derivative;
assert_eq!(dx_dr_reverse, dx_dr_forward);
let (x, y) = cartesian(1.0, 2.0);
assert_eq!(x, x_record.number); assert_eq!(x, x_trace.number);
assert_eq!(y, y_record.number); assert_eq!(y, y_trace.number);
Equivalance
Although in this example the derivatives found are identical, in practise, because forward and reverse mode compute things differently and floating point numbers have limited precision, you should not expect the derivatives to be exactly equal.
Further information
Modules
Other operator related implementations for the differentiation module
Operator implementations for Records.
Operator implementations for Traces
Structs
Computed derivatives of a computational graph for some output Record variable.
A wrapper around a real number which records it going through the computational graph. This is used to perform Reverse Mode Automatic Differentiation.
A dual number which traces a real number and keeps track of its derivative. This is used to perform Forward Automatic Differentiation
A list of operations performed in a forward pass of a dynamic computational graph, used for Reverse Mode Automatic Differentiation.
Traits
A trait with no methods which is implemented for all primitive types.