Crate dfdx


Ergonomics & safety focused deep learning in Rust. Main features include:

  1. Const generic tensor library with tensors up to 4d!
  2. A large library of tensor operations (matrix multiplication, arithmetic, activation functions, etc.).
  3. Safe & easy to use neural network building blocks.
  4. Standard deep learning optimizers such as Sgd and Adam.
  5. Reverse-mode automatic differentiation implementation.
  6. Serialization to/from .npy and .npz for transferring models to/from Python.
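Tape-based reverse-mode autodiff (feature 5) can be illustrated with a minimal, self-contained scalar sketch. This is a hypothetical toy for intuition only, not dfdx's actual tensor-based implementation: each operation records its parents and local derivatives on a tape, and `backward()` sweeps the tape in reverse, accumulating gradients via the chain rule.

```rust
// Hypothetical minimal tape-based reverse-mode autodiff (scalars only).
#[derive(Clone, Copy)]
struct Var {
    idx: usize, // position of this value's node on the tape
    val: f64,
}

struct Tape {
    // Each node stores up to two (parent index, local gradient) pairs.
    nodes: Vec<[(usize, f64); 2]>,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new() }
    }
    // A leaf variable: its "parents" point at itself and are skipped later.
    fn var(&mut self, val: f64) -> Var {
        let idx = self.nodes.len();
        self.nodes.push([(idx, 0.0), (idx, 0.0)]);
        Var { idx, val }
    }
    // d(a+b)/da = 1, d(a+b)/db = 1
    fn add(&mut self, a: Var, b: Var) -> Var {
        let idx = self.nodes.len();
        self.nodes.push([(a.idx, 1.0), (b.idx, 1.0)]);
        Var { idx, val: a.val + b.val }
    }
    // d(a*b)/da = b, d(a*b)/db = a
    fn mul(&mut self, a: Var, b: Var) -> Var {
        let idx = self.nodes.len();
        self.nodes.push([(a.idx, b.val), (b.idx, a.val)]);
        Var { idx, val: a.val * b.val }
    }
    // Reverse sweep: seed d(out)/d(out) = 1, then push gradients to parents.
    fn backward(&self, out: Var) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[out.idx] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            let g = grads[i];
            for &(parent, local) in &self.nodes[i] {
                if parent != i {
                    grads[parent] += g * local;
                }
            }
        }
        grads
    }
}

fn main() {
    let mut tape = Tape::new();
    let x = tape.var(3.0);
    let y = tape.var(4.0);
    let xy = tape.mul(x, y); // z = x*y + x
    let z = tape.add(xy, x);
    let grads = tape.backward(z);
    // dz/dx = y + 1 = 5, dz/dy = x = 3
    println!("dz/dx = {}, dz/dy = {}", grads[x.idx], grads[y.idx]);
}
```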

A quick tutorial

  1. Tensors (crate::tensor) can be created with normal Rust arrays. See crate::tensor.

```rust
let x = tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);
let y: Tensor2D<2, 3> = TensorCreator::ones();
```
  2. Neural networks are built with types. Tuples are sequential models. See crate::nn.

```rust
type Mlp = (
    Linear<5, 3>,
    Linear<3, 2>,
);
```
  3. Instantiate models with Default, and randomize with crate::nn::ResetParams.

```rust
let mut mlp: Linear<5, 2> = Default::default();
mlp.reset_params(&mut rng);
```
  4. Pass data through networks with crate::nn::Module.

```rust
let mut mlp: Linear<5, 2> = Default::default();
let x = Tensor1D::zeros(); // compiler knows that x is a `Tensor1D<5>`
let y = mlp.forward(x); // compiler knows that `y` must be `Tensor1D<2>`
```
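The compile-time shape inference shown above can be sketched in plain Rust with const generics. Here `linear_forward` is a hypothetical stand-in for a `Linear<I, O>`-style forward pass, not dfdx's actual code; the point is that the compiler infers the output dimension from the weight type:

```rust
// Hypothetical sketch: const generics let the compiler check and infer
// shapes at compile time, as dfdx does for Linear<I, O>.
fn linear_forward<const I: usize, const O: usize>(
    weight: &[[f64; I]; O],
    x: &[f64; I],
) -> [f64; O] {
    let mut y = [0.0; O];
    for o in 0..O {
        for i in 0..I {
            y[o] += weight[o][i] * x[i];
        }
    }
    y
}

fn main() {
    let w = [[1.0, 2.0, 3.0, 4.0, 5.0]; 2]; // Linear<5, 2>-like weights
    let x = [1.0; 5];
    let y = linear_forward(&w, &x); // compiler infers y: [f64; 2]
    println!("{:?}", y); // prints: [15.0, 15.0]
}
```

Passing an input of the wrong length (e.g. `[f64; 4]`) is a compile error, which is the same guarantee the tutorial's `Tensor1D<5>` example relies on.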
  5. Trace gradients using crate::tensor::trace().

```rust
// tensors default to not having a tape
let x: Tensor1D<10, NoneTape> = TensorCreator::zeros();

// `.trace()` clones `x` and inserts a gradient tape.
let x_t: Tensor1D<10, OwnedTape> = x.trace();

// The tape from the input is moved through the network during `.forward()`.
let y: Tensor1D<5, NoneTape> = model.forward(x);
let y_t: Tensor1D<5, OwnedTape> = model.forward(x_t);
```
  6. Compute gradients with crate::tensor_ops::backward(). See crate::tensor_ops.

```rust
// compute cross entropy loss
let loss: Tensor0D<OwnedTape> = cross_entropy_with_logits_loss(y, y_true);

// call `backward()` to compute gradients. The tensor *must* have `OwnedTape`!
let gradients: Gradients = loss.backward();
```
  7. Use an optimizer from crate::optim to optimize your network!

```rust
// Use stochastic gradient descent (Sgd), with a learning rate of 1e-2, and 0.9 momentum.
let mut opt = Sgd::new(SgdConfig {
    lr: 1e-2,
    momentum: Some(Momentum::Classic(0.9)),
    weight_decay: None,
});

// pass the gradients & the model into the optimizer's update method
opt.update(&mut model, gradients);
```
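The update rule behind the SgdConfig above can be sketched element-wise in plain Rust. This is a hypothetical minimal version for illustration (dfdx applies the rule across whole tensors): classic momentum keeps a velocity `v = momentum * v + g` and then steps `p -= lr * v`.

```rust
// Hypothetical element-wise SGD step with classic momentum,
// mirroring SgdConfig { lr: 1e-2, momentum: Some(Momentum::Classic(0.9)), .. }.
fn sgd_step(params: &mut [f64], velocity: &mut [f64], grads: &[f64], lr: f64, momentum: f64) {
    for i in 0..params.len() {
        // v = momentum * v + g
        velocity[i] = momentum * velocity[i] + grads[i];
        // p = p - lr * v
        params[i] -= lr * velocity[i];
    }
}

fn main() {
    let mut params = vec![1.0, -2.0];
    let mut velocity = vec![0.0, 0.0];
    let grads = vec![0.5, -0.25];
    // On the first step velocity == grads, so p -= lr * g.
    sgd_step(&mut params, &mut velocity, &grads, 1e-2, 0.9);
    println!("{:.4} {:.4}", params[0], params[1]); // prints: 0.9950 -1.9975
}
```

On later steps the accumulated velocity lets the update keep moving in a consistent direction even when individual gradients fluctuate.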


Modules

- arrays: Collection of traits to describe Nd arrays.
- data: A collection of data utility classes such as one_hot_encode() and SubsetIterator.
- devices: Provides implementations for modifying Nd arrays on the Cpu.
- feature_flags: Information about the available feature flags.
- gradients: Implementations of GradientTape and generic Nd array containers via Gradients.
- losses: Standard loss functions such as mse_loss(), cross_entropy_with_logits_loss(), and more.
- nn: High level neural network building blocks such as Linear, activations, and tuples as Modules. Also includes .save() & .load() for all Modules.
- numpy: Provides some generic functions to load & save Nd arrays in the .npy format. See load() and save().
- optim: Optimizers such as Sgd, Adam, and RMSprop that can optimize neural networks.
- prelude: Contains all public exports.
- tensor: The struct definitions for all TensorXD, the Tensor trait, and more.
- tensor_ops: Operations on tensors like relu(), matmul(), softmax(), and more.
- unique_id: A simple implementation of a UID used as a unique key for tensors.


Structs

- Assert: Used to assert things about const generics.
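The const-generic assertion pattern can be sketched on stable Rust. `only_when_true` and `IsTrue` below are hypothetical names for illustration, not dfdx's exact definitions; the idea is that a trait is implemented only for `Assert<true>`, so code gated on `Assert<COND>: IsTrue` fails to compile unless the condition holds.

```rust
// Hypothetical sketch of compile-time assertions via const generics.
struct Assert<const COND: bool>;
trait IsTrue {}
impl IsTrue for Assert<true> {} // only the `true` instantiation qualifies

// This function can only be instantiated with COND = true;
// `only_when_true::<false>()` would be a compile error.
fn only_when_true<const COND: bool>() -> &'static str
where
    Assert<COND>: IsTrue,
{
    "condition held at compile time"
}

fn main() {
    println!("{}", only_when_true::<true>());
}
```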



Functions

- flush_denormals_to_zero: Sets a CPU SSE flag to flush denormal floating point numbers to zero. The opposite of this is keep_denormals().
- keep_denormals: Sets a CPU flag to keep denormal floating point numbers. The opposite of this is flush_denormals_to_zero().
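For context on why these flags exist: denormal (subnormal) floats are the tiny values between zero and the smallest normal float, and arithmetic on them is much slower on many CPUs, so numeric code often flushes them to zero. A small stable-Rust probe, independent of dfdx:

```rust
fn main() {
    // The smallest positive *normal* f32 is 2^-126; halving it
    // produces a subnormal value.
    let subnormal = f32::MIN_POSITIVE / 2.0;
    println!("{}", subnormal.is_subnormal()); // prints: true
    // Without flush-to-zero, the subnormal is still nonzero.
    println!("{}", subnormal > 0.0); // prints: true
}
```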