Module dfdx::nn


High-level neural network building blocks such as modules::Linear, activations, and tuples as Modules. Also includes .save() & .load() for all Modules.

Mutable vs Immutable forwards

This is provided as two separate traits:

ModuleMut::forward_mut() which receives &mut self.
Module::forward() which receives &self.

This has nothing to do with whether gradients are being tracked. It only controls whether the module itself can be modified. Inputs carrying either an OwnedTape or a NoneTape can be passed to either method, and all modules should conform to this expected behavior.

In general, ModuleMut::forward_mut() should be used during training, and Module::forward() during evaluation/testing/inference/validation.
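The split between the two traits can be sketched with a toy dropout layer: the immutable forward is the identity (inference), while the mutable forward changes internal state as it drops elements (training). ToyModule, ToyModuleMut and ToyDropout are hypothetical stand-ins, not dfdx's definitions, and a counter replaces the random number generator a real dropout would use so the output is deterministic.

```rust
/// Hypothetical stand-in for Module: cannot modify the module.
pub trait ToyModule<T> {
    type Output;
    fn forward(&self, input: T) -> Self::Output;
}

/// Hypothetical stand-in for ModuleMut: may modify the module.
pub trait ToyModuleMut<T> {
    type Output;
    fn forward_mut(&mut self, input: T) -> Self::Output;
}

/// Toy dropout that zeroes every `n`-th element during training.
/// The counter plays the role of RNG state that only forward_mut may advance.
pub struct ToyDropout {
    pub n: usize,
    counter: usize,
}

impl ToyDropout {
    pub fn new(n: usize) -> Self {
        assert!(n >= 2, "n must be at least 2 so survivors can be rescaled");
        Self { n, counter: 0 }
    }
}

// Inference: identity, and `&self` guarantees no state changes.
impl ToyModule<Vec<f32>> for ToyDropout {
    type Output = Vec<f32>;
    fn forward(&self, input: Vec<f32>) -> Vec<f32> {
        input
    }
}

// Training: drop elements and rescale survivors by n / (n - 1)
// (inverted dropout), advancing the internal counter.
impl ToyModuleMut<Vec<f32>> for ToyDropout {
    type Output = Vec<f32>;
    fn forward_mut(&mut self, input: Vec<f32>) -> Vec<f32> {
        let scale = self.n as f32 / (self.n - 1) as f32;
        input
            .into_iter()
            .map(|x| {
                self.counter += 1;
                if self.counter % self.n == 0 { 0.0 } else { x * scale }
            })
            .collect()
    }
}
```

A module with no training-specific behavior would simply implement forward_mut by calling forward.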

Here is a list of existing modules that have different behavior in these two functions:

Fallible forwards

You can also get a Result from a module by using ModuleMut::try_forward_mut() and Module::try_forward().

Similar to the fallible tensor_ops, the main purpose of this is to handle out-of-memory errors at the device level.
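One common way to structure such a fallible/infallible pair is to make the infallible method a thin wrapper that unwraps the fallible one. A minimal sketch of that pattern, using a hypothetical OutOfMemory error and a toy Repeat layer that checks a fixed allocation budget; none of these names come from dfdx.

```rust
/// Hypothetical error standing in for a device allocation failure.
#[derive(Debug, PartialEq)]
pub struct OutOfMemory {
    pub requested: usize,
}

/// Hypothetical trait: the fallible method is primary, the infallible
/// one panics on error by unwrapping it.
pub trait TryModule<T> {
    type Output;
    fn try_forward(&self, input: T) -> Result<Self::Output, OutOfMemory>;

    fn forward(&self, input: T) -> Self::Output {
        self.try_forward(input).unwrap()
    }
}

/// Toy layer that repeats its input `times` times, "allocating" from a
/// budget of `budget` elements instead of real device memory.
pub struct Repeat {
    pub times: usize,
    pub budget: usize,
}

impl TryModule<Vec<f32>> for Repeat {
    type Output = Vec<f32>;
    fn try_forward(&self, input: Vec<f32>) -> Result<Vec<f32>, OutOfMemory> {
        let requested = input.len() * self.times;
        // Report the failure instead of panicking, so the caller can react
        // (for example by retrying with a smaller batch).
        if requested > self.budget {
            return Err(OutOfMemory { requested });
        }
        let mut out = Vec::with_capacity(requested);
        for _ in 0..self.times {
            out.extend_from_slice(&input);
        }
        Ok(out)
    }
}
```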

Initializing

Use DeviceBuildExt for device agnostic module creation/randomization:

use dfdx::prelude::*;
use dfdx::nn::builders::{Linear, DeviceBuildExt};
let dev: Cpu = Default::default();
type Model = Linear<5, 2>; // shape only; no device or dtype yet
let model = dev.build_module::<Model, f32>();

Here, the return type depends on the device and dtype you are using.

For example, when using device crate::tensor::Cpu and f32, the type is Linear<5, 2, f32, Cpu>. When using a Cuda device and f64, the type is Linear<5, 2, f64, Cuda>.
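The idea of a device-agnostic description that resolves to a concrete type can be sketched with an associated type. ToyCpu, LinearSpec, LinearBuilt, BuildOn and ToyBuildExt are hypothetical illustrations of the pattern, not dfdx's actual traits or field layout.

```rust
use std::marker::PhantomData;

/// Hypothetical device type.
#[derive(Default)]
pub struct ToyCpu;

/// Device-agnostic description: only the shape is fixed.
pub struct LinearSpec<const I: usize, const O: usize>;

/// Concrete module: shape + dtype `E` + device `D`.
pub struct LinearBuilt<const I: usize, const O: usize, E, D> {
    pub weight: Vec<E>, // O * I elements
    pub bias: Vec<E>,   // O elements
    _device: PhantomData<D>,
}

/// Maps a description to the concrete type built for dtype `E` on device `D`.
pub trait BuildOn<D, E> {
    type Built;
    fn build_on(device: &D) -> Self::Built;
}

impl<const I: usize, const O: usize, D, E: Default + Clone> BuildOn<D, E> for LinearSpec<I, O> {
    type Built = LinearBuilt<I, O, E, D>;
    fn build_on(_device: &D) -> Self::Built {
        LinearBuilt {
            weight: vec![E::default(); O * I],
            bias: vec![E::default(); O],
            _device: PhantomData,
        }
    }
}

/// Extension trait so the call reads `dev.build::<Spec, E>()`.
pub trait ToyBuildExt: Sized {
    fn build<M: BuildOn<Self, E>, E>(&self) -> M::Built {
        M::build_on(self)
    }
}

impl<D> ToyBuildExt for D {}
```

Because the concrete type is an associated type, the compiler computes it from the device and dtype, which is why the same description yields different types on different devices.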

Alternatively, you can use BuildModule, which requires device specific model definitions:

use dfdx::nn::modules::{Linear, BuildModule};
use dfdx::tensor::Cpu;
type Dev = Cpu;
let dev: Dev = Default::default();
let model: Linear<5, 2, f32, Dev> = BuildModule::build(&dev);

Allocating & zeroing gradients

Use ZeroGrads::alloc_grads() and ZeroGrads::zero_grads() to reduce allocations and enable gradient accumulation. This is the equivalent of PyTorch’s Optimizer.zero_grad().

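use dfdx::prelude::*;
use dfdx::nn::ZeroGrads;
let dev: Cpu = Default::default();
type Model = Linear<5, 2>;
let model = dev.build_module::<Model, f32>();
// allocate once, then reset in place between optimizer steps
let mut grads: Gradients<f32, _> = model.alloc_grads();
model.zero_grads(&mut grads);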
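Accumulation relies on each backward pass adding into the gradient buffer rather than overwriting it, with the buffer allocated once and zeroed between optimizer steps. A minimal sketch of that pattern on a one-parameter toy model; Grads, alloc_grads, zero_grads and accumulate_grad here are hypothetical stand-ins, not dfdx's Gradients API.

```rust
/// Hypothetical gradient buffer: one f32 slot per scalar parameter.
pub struct Grads {
    pub data: Vec<f32>,
}

/// Allocate the buffer once (the role alloc_grads plays), filled with zeros.
pub fn alloc_grads(num_params: usize) -> Grads {
    Grads { data: vec![0.0; num_params] }
}

/// Reset every slot in place, reusing the allocation (the role zero_grads plays).
pub fn zero_grads(grads: &mut Grads) {
    grads.data.iter_mut().for_each(|g| *g = 0.0);
}

/// Backward pass of loss = 0.5 * (w * x - y)^2 for a single weight `w`.
/// dLoss/dw = (w * x - y) * x is ADDED to the buffer, so calling this
/// for several micro-batches sums their gradients.
pub fn accumulate_grad(grads: &mut Grads, w: f32, x: f32, y: f32) {
    grads.data[0] += (w * x - y) * x;
}
```

In a training loop, an optimizer step would read the summed gradients after several micro-batches and then call zero_grads before the next group.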

Exponential Moving Average (EMA)

All models implement ModelEMA::ema() to keep track of an exponential moving average of an entire model.

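use dfdx::prelude::*;
use dfdx::nn::ModelEMA;
let dev: Cpu = Default::default();
type Model = Linear<5, 2>;
let model = dev.build_module::<Model, f32>();
let mut ema_model = dev.build_module::<Model, f32>();
ema_model.ema(&model, 0.001); // updates ema_model in place from model's parameters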
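The per-parameter update can be sketched on flattened parameter slices. This assumes the common convention ema = decay * ema + (1 - decay) * model; libraries differ on which side the decay factor weights, so check the documentation of ModelEMA::ema before choosing a value. ema_update is a hypothetical helper, not part of dfdx.

```rust
/// One EMA step over flattened parameters, assuming the convention
/// ema = decay * ema + (1 - decay) * model.
pub fn ema_update(ema: &mut [f32], model: &[f32], decay: f32) {
    assert_eq!(ema.len(), model.len(), "models must have identical shapes");
    for (e, m) in ema.iter_mut().zip(model) {
        *e = decay * *e + (1.0 - decay) * m;
    }
}
```

Under this convention a decay of 0.0 copies the model outright, and values close to 1.0 make the average change slowly.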

Resetting parameters

All modules implement ResetParams, which allows you to reset a module back to a randomized state:

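use dfdx::prelude::*;
use dfdx::nn::ResetParams;
let dev: Cpu = Default::default();
type Model = Linear<5, 2>;
let mut model = dev.build_module::<Model, f32>();
model.reset_params(); // re-randomize all parameters in place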

Sequential models

Tuples implement Module, so you can chain multiple modules together.

Here’s an MLP with a single hidden layer:

use dfdx::prelude::*;
type Mlp = (Linear<5, 3>, ReLU, Linear<3, 2>);
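Chaining through tuples can be sketched by implementing a forward trait for tuples whose element types line up, so each element consumes the previous element's output. Forward, Scale and Relu below are hypothetical stand-ins for Module and real layers; only 2- and 3-tuples are covered here, whereas a library would typically generate impls for many tuple lengths with a macro.

```rust
/// Hypothetical minimal forward trait standing in for Module.
pub trait Forward<T> {
    type Output;
    fn forward(&self, x: T) -> Self::Output;
}

/// Toy layer: multiply every element by a constant.
pub struct Scale(pub f32);

impl Forward<Vec<f32>> for Scale {
    type Output = Vec<f32>;
    fn forward(&self, x: Vec<f32>) -> Vec<f32> {
        x.into_iter().map(|v| v * self.0).collect()
    }
}

/// Toy ReLU activation: max(v, 0) element-wise.
pub struct Relu;

impl Forward<Vec<f32>> for Relu {
    type Output = Vec<f32>;
    fn forward(&self, x: Vec<f32>) -> Vec<f32> {
        x.into_iter().map(|v| v.max(0.0)).collect()
    }
}

/// A 2-tuple runs A, then feeds A's output into B.
impl<T, A: Forward<T>, B: Forward<A::Output>> Forward<T> for (A, B) {
    type Output = B::Output;
    fn forward(&self, x: T) -> Self::Output {
        self.1.forward(self.0.forward(x))
    }
}

/// Same idea for 3-tuples: A, then B, then C.
impl<T, A, B, C> Forward<T> for (A, B, C)
where
    A: Forward<T>,
    B: Forward<A::Output>,
    C: Forward<B::Output>,
{
    type Output = C::Output;
    fn forward(&self, x: T) -> Self::Output {
        self.2.forward(self.1.forward(self.0.forward(x)))
    }
}
```

The trait bounds make mismatched neighbors a compile-time error, which is how a tuple such as (Linear<5, 3>, ReLU, Linear<3, 2>) can be checked for consistent shapes.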

Here’s a more complex feedforward network that takes vectors of 5 elements and maps them to 2 elements.

type ComplexNetwork = (
    DropoutOneIn<2>, // 1. dropout 50% of input
    Linear<5, 3>,    // 2. pass into a linear layer
    LayerNorm1D<3>,  // 3. normalize elements
    ReLU,            // 4. activate with relu
    Residual<(       // 5. residual connection that adds input to the result of its sub layers
        Linear<3, 3>,// 5.a. Apply linear layer
        ReLU,        // 5.b. Apply ReLU
    )>,              // 5.c. the input to the residual is added back in after the sub layers
    Linear<3, 2>,    // 6. Apply another linear layer
);
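A residual block computes y = x + F(x), which only works when the sub-layers preserve the input's shape; that is why the residual in ComplexNetwork wraps Linear<3, 3>. A minimal sketch on plain slices, using a hypothetical Layer trait and Relu layer rather than dfdx's types:

```rust
/// Hypothetical minimal layer trait standing in for Module.
pub trait Layer {
    fn forward(&self, x: &[f32]) -> Vec<f32>;
}

/// Toy ReLU: shape-preserving, so it can sit inside a residual block.
pub struct Relu;

impl Layer for Relu {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        x.iter().map(|v| v.max(0.0)).collect()
    }
}

/// Residual wrapper: y = x + F(x).
pub struct Residual<F>(pub F);

impl<F: Layer> Layer for Residual<F> {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        let fx = self.0.forward(x);
        // The skip connection needs F(x) to have the same length as x.
        assert_eq!(fx.len(), x.len(), "residual sub-layers must preserve shape");
        x.iter().zip(fx).map(|(a, b)| a + b).collect()
    }
}
```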

Saving and Loading

numpy

Enable with the "numpy" feature.

Call the SaveToNpz::save() and LoadFromNpz::load() methods. All modules provided here implement them, including tuples. These save to and load from .npz files, which are zip archives containing multiple .npy files.

This is implemented to be fairly portable. For example, you can load a simple MLP into PyTorch like so:

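import torch
import numpy as np

# `mlp` is an existing torch.nn.Module whose state_dict keys match the array names in the .npz file
state_dict = {k: torch.from_numpy(v) for k, v in np.load("dfdx-model.npz").items()}
mlp.load_state_dict(state_dict)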

safetensors

Enable with the "safetensors" feature.

The safetensors feature lets you do the same using the safetensors format (https://github.com/huggingface/safetensors).

Call the SaveToSafetensors::save_safetensors() and LoadFromSafetensors::load_safetensors() methods. All modules provided here implement them, including tuples.

These save to and load from .safetensors files, which use a flat layout with a JSON header, allowing very fast loads via memory mapping.
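The flat layout is simple enough to parse by hand: per the safetensors format description, a file starts with an 8-byte little-endian unsigned integer N, followed by N bytes of UTF-8 JSON header (dtype, shape and byte offsets of each tensor), followed by the raw tensor bytes. The sketch below only splits a buffer into header and data using the standard library; split_safetensors is a hypothetical helper, and a real loader would also parse the JSON and typically memory-map the file instead of reading it into memory.

```rust
/// Splits a .safetensors byte buffer into (JSON header, tensor data).
/// Hypothetical helper: it validates the length prefix but not the JSON.
pub fn split_safetensors(bytes: &[u8]) -> Result<(&str, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("file shorter than the 8-byte length prefix".into());
    }
    // First 8 bytes: header length N as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[..8]);
    let n = u64::from_le_bytes(len_bytes) as usize;
    // The header must fit inside the buffer.
    let end = 8usize
        .checked_add(n)
        .filter(|&e| e <= bytes.len())
        .ok_or("header length exceeds file size")?;
    let header = std::str::from_utf8(&bytes[8..end]).map_err(|e| e.to_string())?;
    // Everything after the header is raw tensor data; the header's
    // data_offsets index into this region.
    Ok((header, &bytes[end..]))
}
```

Since tensors are addressed by byte offsets into the data region, a loader can hand out views into a memory-mapped file without copying, which is where the fast loads come from.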

This is implemented to be fairly portable. For example, you can use transformers (https://github.com/huggingface/transformers) to save a pretrained model in this format:

from transformers import pipeline

pipe = pipeline(model="gpt2")
pipe.save_pretrained("my_local", safe_serialization=True)
# This created `my_local/model.safetensors` file which can now be used.

Re-exports

pub use tensor_collection::*;

Modules

builders
Simple specification of network structure, without worrying about device or dtype.
modules
Structs containing initialized Tensors & impls for super::Module. See super::builders for helpful utilities in creating these in a device/dtype agnostic way.
prelu
tensor_collection
Traits to define a TensorCollection and how to iterate them using ModuleVisitor. Use RecursiveWalker to do the iteration and TensorVisitor to define functions to iterate through and/or construct Modules.

Traits

BuildModule
Something that can be built. Related to super::BuildOnDevice.
BuildOnDevice
Something that can be built on a different device than it is on.
DeviceBuildExt
An extension trait that allows you to build a module with a device method. Also allows easy specification of Dtype.
LoadFromNpz
Something that can be loaded from a .npz file (which is a zip file).
LoadFromSafetensors
Something that can be loaded from a .safetensors file.
ModelEMA
Performs model exponential moving average on two modules.
Module
Immutable forward of Input that produces Module::Output. See ModuleMut for mutable forward.
ModuleMut
Mutable forward of Input that produces ModuleMut::Output. See Module for immutable forward.
NonMutableModule
Marker trait for modules that don’t have different behavior between mutable forwards and non-mutable forwards.
NumParams
Get the number of trainable parameters in a model.
ResetParams
Reset a module’s parameters with their default reset function.
SaveToNpz
Something that can be saved to a .npz (which is a .zip).
SaveToSafetensors
Something that can be saved to a .safetensors file.
ToDevice
Something that can be copied to another Device.
ToDtype
Something that can be copied to have a different dtype.
ZeroGrads
Zeros any gradients associated with self.
ZeroSizedModule
Marker trait for modules with no updatable parameters. These have blanket impls for several traits, including ModuleMut.