Crate wyrm
A reverse mode, define-by-run, low-overhead autodifferentiation library.
Features
Performs backpropagation through arbitrary, define-by-run computation graphs, emphasizing low-overhead estimation of sparse, small models on the CPU.
Highlights:
- Low overhead.
- Built-in support for sparse gradients.
- Define-by-run.
- Trivial Hogwild-style parallelisation, scaling linearly with the number of CPU cores available.
Requires the nightly compiler due to use of SIMD intrinsics.
Quickstart
The following defines a univariate linear regression model, then backpropagates through it.
```rust
let slope = ParameterNode::new(random_matrix(1, 1));
let intercept = ParameterNode::new(random_matrix(1, 1));

let x = InputNode::new(random_matrix(1, 1));
let y = InputNode::new(random_matrix(1, 1));

let y_hat = slope.clone() * x.clone() + intercept.clone();
let mut loss = (y.clone() - y_hat).square();
```
To optimize the parameters, create an optimizer object and go through several epochs of learning:
```rust
let num_epochs = 10;
let mut optimizer = SGD::new(0.1, vec![slope.clone(), intercept.clone()]);

for _ in 0..num_epochs {
    let x_value: f32 = rand::random();
    let y_value = 3.0 * x_value + 5.0;

    // You can re-use the computation graph
    // by giving the input nodes new values.
    x.set_value(x_value);
    y.set_value(y_value);

    loss.forward();
    loss.backward(1.0);

    optimizer.step();
    loss.zero_gradient();
}
```
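After fitting, you can inspect the learned coefficients through the parameter handles. A minimal sketch, assuming Variable exposes a value() accessor that returns a borrowed view of the underlying array (the Bor enum listed below suggests as much):

```rust
// Hedged sketch: assumes `Variable::value()` yields a borrowed view of the
// parameter's current value; consult the `Variable` docs for the exact API.
println!("slope: {:?}", slope.value());
println!("intercept: {:?}", intercept.value());
```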
You can use rayon to fit your model in parallel by first creating a set of shared parameters, then building a per-thread copy of the model:
```rust
use std::sync::Arc;
use rayon::prelude::*;

let slope_param = Arc::new(HogwildParameter::new(random_matrix(1, 1)));
let intercept_param = Arc::new(HogwildParameter::new(random_matrix(1, 1)));
let num_epochs = 10;

(0..rayon::current_num_threads())
    .into_par_iter()
    .for_each(|_| {
        let slope = ParameterNode::shared(slope_param.clone());
        let intercept = ParameterNode::shared(intercept_param.clone());

        let x = InputNode::new(random_matrix(1, 1));
        let y = InputNode::new(random_matrix(1, 1));

        let y_hat = slope.clone() * x.clone() + intercept.clone();
        let mut loss = (y.clone() - y_hat).square();

        let mut optimizer = SGD::new(0.1, vec![slope.clone(), intercept.clone()]);

        for _ in 0..num_epochs {
            let x_value: f32 = rand::random();
            let y_value = 3.0 * x_value + 5.0;

            x.set_value(x_value);
            y.set_value(y_value);

            loss.forward();
            loss.backward(1.0);

            optimizer.step();
            loss.zero_gradient();
        }
    });
```
BLAS support
You should enable BLAS support to get (much) better performance out of matrix-multiplication-heavy workloads. To do so, add the following to your Cargo.toml:
```toml
ndarray = { version = "0.11.0", features = ["blas", "serde-1"] }
blas-src = { version = "0.1.2", default-features = false, features = ["openblas"] }
openblas-src = { version = "0.5.6", default-features = false, features = ["cblas"] }
```
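With ndarray's blas feature enabled, the chosen BLAS backend still has to be linked into the final binary. A minimal sketch, assuming the usual blas-src convention of referencing the crate from your crate root:

```rust
// In lib.rs or main.rs: pull in blas-src so the OpenBLAS backend is linked.
// (This follows standard blas-src usage; it is not wyrm-specific.)
extern crate blas_src;
```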
Fast numerics
Enable the fast-math option to use fast approximations to transcendental functions. This should give substantial speed gains in networks that are exp-, ln-, or tanh-heavy.
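If you depend on wyrm through Cargo, the feature can be switched on in your manifest. An illustrative snippet (the version number is a placeholder; use whatever version you depend on):

```toml
# Hypothetical manifest entry: replace the version with the one you use.
wyrm = { version = "0.9", features = ["fast-math"] }
```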
Modules

- nn: Neural network components.
Structs

- HogwildParameter: Struct used to hold parameters that need to be shared among multiple ParameterNodes.
- IndexInputNode: An input node for integer indices into ParameterNodes.
- InputNode: Input node for the graph.
- ParameterNode: Parameter node; holds the optimizable parameters of the model.
- SGD: Standard stochastic gradient descent optimizer with a fixed learning rate.
- Variable: Handle to a node in the computation graph. The underlying nodes are reference counted, so the handles can be freely cloned to use the nodes multiple times in the same graph.
Enums

- Bor: Generalisation over borrowed RefCell and plain references.
Traits

- DataInput: Trait describing nodes that can accept new values once the graph has been defined.
- Node: Trait representing a computation node. Structs implementing this trait can be used as elements of the computation graph.
Functions

- finite_difference: Compute finite difference gradient estimates of the output variable with respect to the input. Use to verify the correctness of gradient computations.
- simd_dot: SIMD-enabled vector-vector dot product.
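As a quick illustration of simd_dot, here is a hedged sketch assuming it takes two f32 slices and returns their dot product as an f32:

```rust
use wyrm::simd_dot;

// Assumed signature: fn simd_dot(xs: &[f32], ys: &[f32]) -> f32.
let xs = vec![1.0_f32, 2.0, 3.0];
let ys = vec![4.0_f32, 5.0, 6.0];

// 1*4 + 2*5 + 3*6 = 32
assert!((simd_dot(&xs, &ys) - 32.0).abs() < 1e-6);
```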
Type Definitions

- Arr: Alias for an f32 ndarray matrix.