Module dfdx::nn


High-level neural network building blocks such as modules::Linear, activations, and tuples as Modules. Also includes .save() & .load() for all Modules.

Mutable vs Immutable forwards

Forward passes are provided by two separate traits:

  1. ModuleMut::forward_mut() which receives &mut self.
  2. Module::forward() which receives &self.

This has nothing to do with whether gradients are being tracked or not. It only controls whether the module itself can be modified. Both OwnedTape and NoneTape can still be passed to both, and all modules should conform to this expected behavior.

In general, ModuleMut::forward_mut() should be used during training, and Module::forward() during evaluation/testing/inference/validation.

Some existing modules, such as Dropout, DropoutOneIn, and BatchNorm2D, behave differently in these two functions.
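
As a minimal sketch (the Model alias, shapes, and layer sizes here are illustrative, not part of the API), training code would typically call ModuleMut::forward_mut(), while evaluation code calls Module::forward():

use dfdx::prelude::*;

type Model = (Linear<5, 3>, ReLU, DropoutOneIn<2>);

let dev: Cpu = Default::default();
let mut model = dev.build_module::<Model, f32>();
let x: Tensor<Rank1<5>, f32, _> = dev.sample_normal();

// Training-style call: takes &mut self, so dropout is applied.
let y_train = model.forward_mut(x.clone());

// Evaluation-style call: takes &self, dropout is skipped.
let y_eval = model.forward(x);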

Fallible forwards

You can also get a Result from the forward functions by using ModuleMut::try_forward_mut() and Module::try_forward().

Similar to the fallible tensor_ops, the main purpose of this is to handle out-of-memory errors at the device level.
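
A minimal sketch of handling such an error, reusing a Model alias like the ones below (names here are illustrative):

use dfdx::prelude::*;

type Model = Linear<5, 2>;
let dev: Cpu = Default::default();
let model = dev.build_module::<Model, f32>();
let x: Tensor<Rank1<5>, f32, _> = dev.sample_normal();

// The fallible variant returns a Result instead of panicking on device errors.
match model.try_forward(x) {
    Ok(y) => println!("output: {:?}", y.array()),
    Err(err) => eprintln!("forward failed (e.g. out of memory): {err:?}"),
}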

Initializing

Use DeviceBuildExt for device agnostic module creation/randomization:

use dfdx::nn::builders::{Linear, DeviceBuildExt};
use dfdx::tensor::Cpu;

let dev: Cpu = Default::default();
type Model = Linear<5, 2>;
let model = dev.build_module::<Model, f32>();

Here, the return type depends on the device and dtype you are using.

For example, when using device crate::tensor::Cpu and f32, the type is Linear<5, 2, f32, Cpu>. When using a Cuda device and f64, the type is Linear<5, 2, f64, Cuda>.

Alternatively, you can use BuildModule, which requires device specific model definitions:

use dfdx::nn::modules::{Linear, BuildModule};
use dfdx::tensor::Cpu;

type Dev = Cpu;
let dev: Dev = Default::default();
let model: Linear<5, 2, f32, Dev> = BuildModule::build(&dev);

Allocating & zeroing gradients

Use ZeroGrads::alloc_grads() and ZeroGrads::zero_grads() to reduce allocations and enable gradient accumulation. This is the equivalent of pytorch's Optimizer.zero_grad().

use dfdx::prelude::*;
use dfdx::nn::ZeroGrads;

type Model = Linear<5, 2>;
let dev: Cpu = Default::default();
let model = dev.build_module::<Model, f32>();
let mut grads: Gradients<f32, _> = model.alloc_grads();
model.zero_grads(&mut grads);

Exponential Moving Average (EMA)

All models implement ModelEMA::ema() to keep track of an exponential moving average of an entire model.

use dfdx::prelude::*;
use dfdx::nn::ModelEMA;

type Model = Linear<5, 2>;
let dev: Cpu = Default::default();
let model = dev.build_module::<Model, f32>();
let mut ema_model = dev.build_module::<Model, f32>();
ema_model.ema(&model, 0.001);

Resetting parameters

All modules implement ResetParams, which allows you to reset a module back to a randomized state:

use dfdx::prelude::*;

type Model = Linear<5, 2>;
let dev: Cpu = Default::default();
let mut model = dev.build_module::<Model, f32>();
model.reset_params();

Sequential models

Tuples implement Module, so you can string multiple modules together.

Here’s an MLP with a single hidden layer:

type Mlp = (Linear<5, 3>, ReLU, Linear<3, 2>);

Here’s a more complex feedforward network that takes vectors of 5 elements and maps them to 2 elements.

type ComplexNetwork = (
    DropoutOneIn<2>, // 1. dropout 50% of the input
    Linear<5, 3>,    // 2. pass into a linear layer
    LayerNorm1D<3>,  // 3. normalize elements
    ReLU,            // 4. activate with ReLU
    Residual<(       // 5. residual connection that adds its input to the result of its sub-layers
        Linear<3, 3>,// 5.a. apply a linear layer
        ReLU,        // 5.b. apply ReLU
    )>,              // 5.c. the input to the residual is added back in after the sub-layers
    Linear<3, 2>,    // 6. apply another linear layer
);
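
As a sketch (reusing the Mlp alias from above), a tuple like this is built and called exactly like a single module:

use dfdx::prelude::*;

type Mlp = (Linear<5, 3>, ReLU, Linear<3, 2>);

let dev: Cpu = Default::default();
let model = dev.build_module::<Mlp, f32>();
let x: Tensor<Rank1<5>, f32, _> = dev.sample_normal();
let y: Tensor<Rank1<2>, f32, _> = model.forward(x);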

Saving and Loading

numpy

Enable with the "numpy" feature.

Call the SaveToNpz::save() and LoadFromNpz::load() methods. All modules provided here implement them, including tuples. These all save to/from .npz files, which are essentially zip files containing multiple .npy files.
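
For example, here is a sketch of saving and re-loading a model on the Rust side (requires the "numpy" feature; the Model alias and file name are illustrative):

use dfdx::prelude::*;
use dfdx::nn::{LoadFromNpz, SaveToNpz};

type Model = Linear<5, 2>;
let dev: Cpu = Default::default();
let mut model = dev.build_module::<Model, f32>();
// Write all parameters into a zip of .npy files, then read them back.
model.save("dfdx-model.npz").expect("failed to save");
model.load("dfdx-model.npz").expect("failed to load");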

This is implemented to be fairly portable. For example, you can load a simple MLP into pytorch like so:

import torch
import numpy as np

# `mlp` is assumed to be an equivalent torch.nn module defined elsewhere
state_dict = {k: torch.from_numpy(v) for k, v in np.load("dfdx-model.npz").items()}
mlp.load_state_dict(state_dict)

safetensors

Enable with the "safetensors" feature.

This lets you do the same thing with https://github.com/huggingface/safetensors.

Call the SaveToSafetensors::save_safetensors() and LoadFromSafetensors::load_safetensors() functions. All modules provided here implement them, including tuples.

These all save to/from .safetensors files, which use a flat layout with a JSON header, allowing for very fast loads (with memory mapping).
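
For example, here is a sketch of the Rust side (requires the "safetensors" feature; the Model alias and file name are illustrative):

use dfdx::prelude::*;
use dfdx::nn::{LoadFromSafetensors, SaveToSafetensors};

type Model = Linear<5, 2>;
let dev: Cpu = Default::default();
let mut model = dev.build_module::<Model, f32>();
// Save all parameters to a single .safetensors file, then load them back.
model.save_safetensors("model.safetensors").expect("failed to save");
model.load_safetensors("model.safetensors").expect("failed to load");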

This is implemented to be fairly portable. For example, you can produce a compatible file with https://github.com/huggingface/transformers:

from transformers import pipeline

pipe = pipeline(model="gpt2")
pipe.save_pretrained("my_local", safe_serialization=True)
# This creates the `my_local/model.safetensors` file, which can now be used.



Traits

  • BuildModule: Something that can be built. Related to BuildOnDevice.
  • BuildOnDevice: Something that can be built on a different device than it is on.
  • DeviceBuildExt: An extension trait that allows you to build a module with a device method. Also allows easy specification of Dtype.
  • LoadFromNpz: Something that can be loaded from a .npz file (which is a zip file).
  • LoadFromSafetensors: Something that can be loaded from a .safetensors file.
  • ModelEMA: Performs a model exponential moving average on two modules.
  • Module: Immutable forward of Input that produces Module::Output. See ModuleMut for mutable forward.
  • ModuleMut: Mutable forward of Input that produces ModuleMut::Output. See Module for immutable forward.
  • NonMutableModule: Marker trait for modules that don't have different behavior between mutable and non-mutable forwards.
  • NumParams: Get the number of trainable parameters in a model.
  • ResetParams: Reset a module's parameters with their default reset function.
  • SaveToNpz: Something that can be saved to a .npz file (which is a zip).
  • SaveToSafetensors: Something that can be saved to a .safetensors file.
  • ToDevice: Something that can be copied to another Device.
  • ToDtype: Something that can be copied to have a different dtype.
  • ZeroGrads: Zeros any gradients associated with self.
  • ZeroSizedModule: Marker trait for modules with no updatable parameters. These have blanket impls for Module and ModuleMut.