(Automatic) Differentiation helpers

§Automatic Differentiation

This module provides structs for performing Forward and Reverse Automatic Differentiation.

§Automatic Differentiation is not Numerical Differentiation

You were probably introduced to differentiation as numeric differentiation, i.e. if you have a function 3x^2 then you can estimate its gradient at some value x by computing 3x^2 and 3(x+h)^2 where h is a very small value. The tangent line these two points create gives you an approximation of the gradient when you calculate (f(x+h) - f(x)) / h. Unfortunately floating point numbers in computers have limited precision, so this method is only approximate and can result in floating point errors. 1 + 1 might equal 2, but as you go smaller, 10^-i + 10^-i starts to look rather like 10^-i as i goes into double digits.
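
For illustration, here is a standalone sketch (ordinary Rust, not part of this module) of numeric differentiation of f(x) = 3x^2, showing how the estimate degrades and eventually collapses as h shrinks:

```rust
// Standalone sketch of numeric differentiation and its precision problem.
fn f(x: f64) -> f64 {
    3.0 * x * x
}

fn main() {
    let x = 2.0;
    // The true derivative of 3x^2 is 6x, so 12 at x = 2.
    for i in [4, 8, 12, 16] {
        let h = 10f64.powi(-i);
        let estimate = (f(x + h) - f(x)) / h;
        println!("h = 1e-{}: estimate = {}", i, estimate);
    }
}
```

By h = 1e-16, x + h rounds back to x, f(x+h) - f(x) is exactly zero, and the estimated gradient collapses to 0 instead of 12.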

§Automatic Differentiation is not Symbolic Differentiation

If you were taught calculus you have probably done plenty of symbolic differentiation by hand. A function 3x^2 can be symbolically differentiated into 6x by applying simple rules to manipulate the algebra. Unfortunately the rules aren’t so simple for more complex expressions such as exponents, logs or trigonometry. Symbolic differentiation can give you expressions which are just as complicated as, or more complicated than, the original, and doing it by hand can be error prone. Symbolic Differentiation is also tricky to relate to algorithmic computations that use control structures.

§Automatic Differentiation

Automatic Differentiation computes the derivative of a function without rewriting the function as symbolic differentiation does and without the precision issues of numerical differentiation by splitting the derivative into lots of tiny steps of basic operations like addition and multiplication. These are combined using the chain rule. The downside is more memory is used than in symbolic or numerical differentiation, as derivatives have to be tracked through the computational graph.
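
As a sketch of what splitting the derivative into tiny steps combined with the chain rule means (ordinary Rust for illustration, not this module's API), the value and derivative of 3x^2 can be carried together through each primitive operation:

```rust
// Carry (value, derivative) pairs through each primitive operation,
// combining them with the product and constant multiple rules at every step.
fn main() {
    let x = (2.0_f64, 1.0_f64); // x = 2, dx/dx = 1

    // w1 = x * x, product rule: dw1 = dx * x + x * dx
    let w1 = (x.0 * x.0, x.1 * x.0 + x.0 * x.1);

    // w2 = 3 * w1, constant multiple rule: dw2 = 3 * dw1
    let w2 = (3.0 * w1.0, 3.0 * w1.1);

    // 3x^2 = 12 and d(3x^2)/dx = 6x = 12 at x = 2
    println!("value = {}, derivative = {}", w2.0, w2.1);
}
```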

§Forward Differentiation

Forward Differentiation computes all the gradients in a computational graph with respect to an input. For example, if you have a function f(x, y) = 5x^3 - 4x^2 + 10x - y, then for some actual value of x and y you can compute f(x,y) and δf(x,y)/δx together in one forward pass using forward differentiation. You can also make another pass and compute f(x,y) and δf(x,y)/δy for some actual value of x and y. It is possible to avoid redundantly calculating f(x,y) multiple times, but I am waiting on const generics to implement this. Regardless, forward differentiation requires making at least N+1 passes of the function to compute the derivatives of the output with respect to N inputs - and the current implementation will make 2N. However, you do get the gradients for every output in a single pass. This is poorly suited to neural nets as they often have a single output (loss) to differentiate many, many inputs with respect to.
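
To make this concrete, here is a hand-rolled dual number sketch of forward differentiation of f(x, y) = 5x^3 - 4x^2 + 10x - y. This is ordinary Rust for illustration only; the Trace struct in this module plays this role with operator overloading, so see the usage sub module for the real API. Two passes are made, one seeding x and one seeding y:

```rust
// A hand-rolled dual number: each value carries its derivative with it.
#[derive(Clone, Copy)]
struct Dual {
    value: f64,
    derivative: f64,
}

impl Dual {
    fn constant(value: f64) -> Dual {
        Dual { value, derivative: 0.0 }
    }

    fn variable(value: f64) -> Dual {
        // The input we are differentiating with respect to is seeded with 1.
        Dual { value, derivative: 1.0 }
    }

    fn add(self, rhs: Dual) -> Dual {
        Dual { value: self.value + rhs.value, derivative: self.derivative + rhs.derivative }
    }

    fn sub(self, rhs: Dual) -> Dual {
        Dual { value: self.value - rhs.value, derivative: self.derivative - rhs.derivative }
    }

    fn mul(self, rhs: Dual) -> Dual {
        // Product rule
        Dual {
            value: self.value * rhs.value,
            derivative: self.derivative * rhs.value + self.value * rhs.derivative,
        }
    }
}

// f(x, y) = 5x^3 - 4x^2 + 10x - y
fn f(x: Dual, y: Dual) -> Dual {
    let five = Dual::constant(5.0);
    let four = Dual::constant(4.0);
    let ten = Dual::constant(10.0);
    five.mul(x).mul(x).mul(x)
        .sub(four.mul(x).mul(x))
        .add(ten.mul(x))
        .sub(y)
}

fn main() {
    let (x, y) = (2.0, 3.0);
    // First pass: seed x, so the derivative tracked is δf/δx.
    let df_dx = f(Dual::variable(x), Dual::constant(y));
    // Second pass: seed y, so the derivative tracked is δf/δy.
    let df_dy = f(Dual::constant(x), Dual::variable(y));
    // f(2, 3) = 41, δf/δx = 15x^2 - 8x + 10 = 54, δf/δy = -1
    println!("f = {}, df/dx = {}, df/dy = {}", df_dx.value, df_dx.derivative, df_dy.derivative);
}
```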

§Reverse Mode Differentiation

Reverse Mode Differentiation computes all the gradients in a computational graph for the same output. For example, if you have a function f(x, y) = 5x^3 - 4x^2 + 10x - y, then for some actual value of x and y you can compute f(x,y) and store all the intermediate results. You can then run a backward pass on the output of f(x, y) and obtain δf(x,y)/δx and δf(x,y)/δy for the actual values of x and y in a single pass. The catch is that reverse mode must store as many intermediate values as there are steps in the function, which can use much more memory than forward mode. Reverse mode also requires making N backward passes to get the gradients for N different outputs. This is well suited to neural nets because we often have a single output (loss) to differentiate many inputs with respect to. However, reverse mode will be slower than forward mode if the number of inputs is small or there are many outputs.
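
Here is a minimal tape sketch of reverse mode for the same f(x, y) = 5x^3 - 4x^2 + 10x - y. Again this is ordinary Rust for illustration only, not this module's Record or WengertList API: the forward pass records each intermediate operation and its local derivatives, then a single backward pass accumulates δf(x,y)/δx and δf(x,y)/δy.

```rust
// A minimal tape: the forward pass records, for each intermediate value,
// which earlier values it depends on and the local derivative with respect
// to each of them. The backward pass walks the tape in reverse, accumulating
// adjoints (derivatives of the output with respect to every recorded value).
struct Tape {
    // For each recorded value: (parent index, local derivative) pairs.
    nodes: Vec<Vec<(usize, f64)>>,
    values: Vec<f64>,
}

impl Tape {
    fn new() -> Tape {
        Tape { nodes: Vec::new(), values: Vec::new() }
    }

    fn variable(&mut self, value: f64) -> usize {
        self.nodes.push(Vec::new());
        self.values.push(value);
        self.values.len() - 1
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        // d(ab)/da = b, d(ab)/db = a
        self.nodes.push(vec![(a, self.values[b]), (b, self.values[a])]);
        self.values.push(self.values[a] * self.values[b]);
        self.values.len() - 1
    }

    fn mul_constant(&mut self, a: usize, c: f64) -> usize {
        self.nodes.push(vec![(a, c)]);
        self.values.push(self.values[a] * c);
        self.values.len() - 1
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        self.nodes.push(vec![(a, 1.0), (b, 1.0)]);
        self.values.push(self.values[a] + self.values[b]);
        self.values.len() - 1
    }

    fn sub(&mut self, a: usize, b: usize) -> usize {
        self.nodes.push(vec![(a, 1.0), (b, -1.0)]);
        self.values.push(self.values[a] - self.values[b]);
        self.values.len() - 1
    }

    // Single backward pass from the output, returning the derivative of the
    // output with respect to every value recorded on the tape.
    fn derivatives(&self, output: usize) -> Vec<f64> {
        let mut adjoints = vec![0.0; self.values.len()];
        adjoints[output] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            for &(parent, local) in &self.nodes[i] {
                adjoints[parent] += adjoints[i] * local;
            }
        }
        adjoints
    }
}

fn main() {
    // f(x, y) = 5x^3 - 4x^2 + 10x - y at x = 2, y = 3
    let mut tape = Tape::new();
    let x = tape.variable(2.0);
    let y = tape.variable(3.0);
    let x2 = tape.mul(x, x);
    let x3 = tape.mul(x2, x);
    let a = tape.mul_constant(x3, 5.0);
    let b = tape.mul_constant(x2, 4.0);
    let c = tape.mul_constant(x, 10.0);
    let ab = tape.sub(a, b);
    let abc = tape.add(ab, c);
    let f = tape.sub(abc, y);
    let gradients = tape.derivatives(f);
    // f(2, 3) = 41, δf/δx = 15x^2 - 8x + 10 = 54, δf/δy = -1
    println!("f = {}, df/dx = {}, df/dy = {}", tape.values[f], gradients[x], gradients[y]);
}
```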

§Usage

See the usage sub module for usage examples.

Modules§

  • Record container iterators, for manipulating iterators of Records and converting back to Record containers.
  • Other operator related implementations for the differentiation module.
  • Operator implementations for Records.
  • Operator implementations for Traces.
  • Usage of Record and Trace.

Structs§

  • Computed derivatives of a computational graph for some output Record variable.
  • A wrapper around a real number which records it going through the computational graph. This is used to perform Reverse Mode Automatic Differentiation.
  • A pluralisation of Record that groups together a source of numbers of type T and stores the WengertList only once.
  • A dual number which traces a real number and keeps track of its derivative. This is used to perform Forward Automatic Differentiation.
  • A list of operations performed in a forward pass of a dynamic computational graph, used for Reverse Mode Automatic Differentiation.

Traits§

  • A trait with no methods which is implemented for all primitive types.

Type Aliases§

  • WengertLists are indexed with usize.
  • Alias for succinctly referring to RecordContainers backed by a matrix.
  • Alias for succinctly referring to RecordContainers backed by a tensor.