High level neural network building blocks such as modules::Linear, activations, and tuples as Modules.
Also includes .save() & .load() for all Modules.
Mutable vs Immutable forwards
This is provided as two separate traits:
- ModuleMut::forward_mut(), which receives &mut self.
- Module::forward(), which receives &self.
This has nothing to do with whether gradients are being tracked or not. It only controls whether the module itself can be modified. Both OwnedTape and NoneTape can still be passed to both, and all modules should conform to this expected behavior.
In general, ModuleMut::forward_mut() should be used during training, and Module::forward() during evaluation/testing/inference/validation.
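To make the distinction concrete, here is a minimal std-only sketch. The two traits below are simplified stand-ins that only mirror the names Module and ModuleMut; the real dfdx traits work on tensors and tapes. RunningCenter and demo_forwards are hypothetical names invented for this illustration.

```rust
// Simplified stand-ins for the two forward traits (assumption: the real
// dfdx traits are generic over tensors and tapes, which is left out here).
trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}

trait ModuleMut<Input> {
    type Output;
    fn forward_mut(&mut self, input: Input) -> Self::Output;
}

/// Hypothetical layer that centers its input. It updates a running mean
/// in `forward_mut` and only reads that mean in `forward`.
struct RunningCenter {
    running_mean: f32,
    momentum: f32,
}

impl Module<Vec<f32>> for RunningCenter {
    type Output = Vec<f32>;
    // Inference path: the module cannot change, so use the stored statistic.
    fn forward(&self, input: Vec<f32>) -> Vec<f32> {
        input.iter().map(|x| x - self.running_mean).collect()
    }
}

impl ModuleMut<Vec<f32>> for RunningCenter {
    type Output = Vec<f32>;
    // Training path: center with the batch mean and update the running mean.
    fn forward_mut(&mut self, input: Vec<f32>) -> Vec<f32> {
        let batch_mean = input.iter().sum::<f32>() / input.len() as f32;
        self.running_mean =
            (1.0 - self.momentum) * self.running_mean + self.momentum * batch_mean;
        input.iter().map(|x| x - batch_mean).collect()
    }
}

/// Runs one training-style call followed by one inference-style call.
fn demo_forwards() -> (Vec<f32>, Vec<f32>, f32) {
    let mut layer = RunningCenter { running_mean: 0.0, momentum: 0.5 };
    let train_out = layer.forward_mut(vec![1.0, 3.0]);
    let eval_out = layer.forward(vec![1.0, 3.0]);
    (train_out, eval_out, layer.running_mean)
}
```

The same input produces different outputs on the two paths because only forward_mut is allowed to touch the module's internal state.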
Here is a list of existing modules that have different behavior in these two functions:
Fallible forwards
You can also get a Result from a module by using ModuleMut::try_forward_mut() and Module::try_forward().
Similar to fallible tensor_ops, the main purpose of this is to handle out of memory errors at the device level.
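The relationship between the fallible and infallible forwards can be sketched with std-only Rust. This is an illustration under assumptions, not the dfdx definition: the trait is a simplified stand-in, and DeviceError, Repeat, and the element budget are hypothetical names used to simulate a device running out of memory.

```rust
use std::fmt::Debug;

/// Hypothetical device-level error used only for this sketch.
#[derive(Debug, PartialEq)]
enum DeviceError {
    OutOfMemory,
}

/// Simplified stand-in for a module trait with a fallible forward.
trait Module<Input> {
    type Output;
    type Error: Debug;

    /// Fallible forward: reports failures to the caller.
    fn try_forward(&self, input: Input) -> Result<Self::Output, Self::Error>;

    /// Infallible forward: panics if the fallible version fails.
    fn forward(&self, input: Input) -> Self::Output {
        self.try_forward(input).unwrap()
    }
}

/// Hypothetical module that repeats its input `times` times, but refuses
/// to produce more than `budget` elements (a stand-in for device memory).
struct Repeat {
    times: usize,
    budget: usize,
}

impl Module<Vec<f32>> for Repeat {
    type Output = Vec<f32>;
    type Error = DeviceError;

    fn try_forward(&self, input: Vec<f32>) -> Result<Vec<f32>, DeviceError> {
        let needed = input.len() * self.times;
        if needed > self.budget {
            return Err(DeviceError::OutOfMemory);
        }
        let mut out = Vec::with_capacity(needed);
        for _ in 0..self.times {
            out.extend_from_slice(&input);
        }
        Ok(out)
    }
}
```

Calling try_forward lets a training loop recover from an allocation failure (for example by retrying with a smaller batch), whereas forward would panic.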
Initializing
Use DeviceBuildExt for device agnostic module creation/randomization:
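use dfdx::nn::builders::{Linear, DeviceBuildExt};
use dfdx::tensor::Cpu;
// Any supported device works here; Cpu is used only as an example.
let dev: Cpu = Default::default();
type Model = Linear<5, 2>;
let model = dev.build_module::<Model, f32>();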
Here, the return type depends on the device and dtype you are using.
For example, when using device crate::tensor::Cpu and f32, the type is Linear<5, 2, f32, Cpu>. When using a Cuda device and f64, the type is Linear<5, 2, f64, Cuda>.
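One way to picture how a device-agnostic builder type turns into a concrete module type is a trait with an associated type. The sketch below is std-only and only borrows the names from this page; the trait bodies, the OtherDevice type, and the zero initialization are assumptions made for illustration (the page describes real modules as randomized on creation).

```rust
use std::marker::PhantomData;

// Hypothetical device types standing in for real devices.
struct Cpu;
struct OtherDevice;

mod builders {
    /// Builder-side description: only the shape, no dtype or device.
    pub struct Linear<const I: usize, const O: usize>;
}

/// Concrete module: shape plus dtype `E` and device `D`.
struct Linear<const I: usize, const O: usize, E, D> {
    weight: Vec<E>,
    bias: Vec<E>,
    device: PhantomData<D>,
}

/// Maps a builder type to the concrete type for a given device and dtype.
trait BuildOnDevice<D, E> {
    type Built;
    fn build_on_device(device: &D) -> Self::Built;
}

impl<D, E: Default + Clone, const I: usize, const O: usize> BuildOnDevice<D, E>
    for builders::Linear<I, O>
{
    type Built = Linear<I, O, E, D>;
    fn build_on_device(_device: &D) -> Self::Built {
        // Assumption: zero-initialized here to keep the sketch std-only.
        Linear {
            weight: vec![E::default(); I * O],
            bias: vec![E::default(); O],
            device: PhantomData,
        }
    }
}

/// Extension trait that adds `build_module` to every device type.
trait DeviceBuildExt: Sized {
    fn build_module<M: BuildOnDevice<Self, E>, E>(&self) -> M::Built {
        M::build_on_device(self)
    }
}
impl DeviceBuildExt for Cpu {}
impl DeviceBuildExt for OtherDevice {}

type Model = builders::Linear<5, 2>;

/// The same `Model` alias yields two different concrete types.
fn build_both() -> (Linear<5, 2, f32, Cpu>, Linear<5, 2, f64, OtherDevice>) {
    (
        Cpu.build_module::<Model, f32>(),
        OtherDevice.build_module::<Model, f64>(),
    )
}
```

The return type of build_both only type-checks because the associated type Built depends on both the device and the dtype, which is the behavior described above.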
Alternatively, you can use BuildModule, which requires device specific model definitions:
use dfdx::nn::modules::{Linear, BuildModule};
use dfdx::tensor::Cpu;
type Dev = Cpu;
let dev: Dev = Default::default();
let model: Linear<5, 2, f32, Dev> = BuildModule::build(&dev);
Allocating & zeroing gradients
Use ZeroGrads::alloc_grads() and ZeroGrads::zero_grads() to reduce allocations and enable gradient accumulation!
This is the equivalent of PyTorch's Optimizer.zero_grad:
use dfdx::nn::ZeroGrads;
let model = dev.build_module::<Model, f32>();
let mut grads: Gradients<f32, _> = model.alloc_grads();
model.zero_grads(&mut grads);
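Why allocating once and zeroing in place helps can be shown with a toy std-only sketch. Everything here is hypothetical: ToyModel, this Gradients struct, and accumulate_grads are illustration-only names, and the "backward pass" is a hand-written derivative rather than anything dfdx computes.

```rust
/// Toy gradient buffer: one value per parameter.
struct Gradients {
    values: Vec<f32>,
}

/// Toy model with a flat parameter vector.
struct ToyModel {
    weights: Vec<f32>,
}

impl ToyModel {
    /// Allocate a gradient buffer matching the parameters (done once).
    fn alloc_grads(&self) -> Gradients {
        Gradients { values: vec![0.0; self.weights.len()] }
    }

    /// Reset the buffer in place, without reallocating.
    fn zero_grads(&self, grads: &mut Gradients) {
        for g in grads.values.iter_mut() {
            *g = 0.0;
        }
    }

    /// Hypothetical backward pass for loss = sum(w_i * x): dloss/dw_i = x.
    /// The result is added to the buffer, so repeated calls accumulate.
    fn accumulate_grads(&self, grads: &mut Gradients, x: f32) {
        for g in grads.values.iter_mut() {
            *g += x;
        }
    }
}

/// Accumulates two micro-batches, then zeroes the same buffer for reuse.
fn accumulate_then_zero() -> (Vec<f32>, Vec<f32>, bool) {
    let model = ToyModel { weights: vec![0.5; 3] };
    let mut grads = model.alloc_grads();
    let ptr_before = grads.values.as_ptr();

    model.accumulate_grads(&mut grads, 1.0);
    model.accumulate_grads(&mut grads, 2.0);
    let accumulated = grads.values.clone();

    model.zero_grads(&mut grads);
    let same_buffer = ptr_before == grads.values.as_ptr();
    (accumulated, grads.values, same_buffer)
}
```

The buffer pointer is unchanged after zeroing, which is the allocation saving; skipping the zeroing step between micro-batches is what gives gradient accumulation.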
Exponential Moving Average (EMA)
All models implement ModelEMA::ema() to keep track of an exponential moving average of an entire model.
use dfdx::nn::ModelEMA;
let model = dev.build_module::<Model, f32>();
let mut ema_model = dev.build_module::<Model, f32>();
ema_model.ema(&model, 0.001);
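As a rough sketch of what an exponential moving average over parameters does, the std-only function below applies the common update avg = avg * decay + current * (1 - decay) element by element. That exact formula and the flat f32 slices are assumptions for illustration; this page does not spell out the update rule.

```rust
/// Updates `avg` in place toward `current`.
/// Assumed rule: avg = avg * decay + current * (1 - decay).
fn ema(avg: &mut [f32], current: &[f32], decay: f32) {
    assert_eq!(avg.len(), current.len(), "parameter counts must match");
    for (a, c) in avg.iter_mut().zip(current) {
        *a = *a * decay + *c * (1.0 - decay);
    }
}

/// One update step on two hypothetical parameters.
fn ema_demo() -> Vec<f32> {
    let mut avg = vec![1.0, 0.0];
    ema(&mut avg, &[3.0, 4.0], 0.5);
    avg
}
```

Under this rule a decay close to 1 makes the average change slowly, while a decay close to 0 makes it track the current parameters almost exactly.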
Resetting parameters
All modules implement ResetParams, which allows you to reset a module back to a randomized state:
type Model = Linear<5, 2>;
let mut model = dev.build_module::<Model, f32>();
model.reset_params();
Sequential models
Tuples implement Module, so you can string multiple modules together.
Here's an MLP with a single hidden layer:
type Mlp = (Linear<5, 3>, ReLU, Linear<3, 2>);
Here’s a more complex feedforward network that takes vectors of 5 elements and maps them to 2 elements.
type ComplexNetwork = (
    DropoutOneIn<2>, // 1. dropout 50% of input
    Linear<5, 3>,    // 2. pass into a linear layer
    LayerNorm1D<3>,  // 3. normalize elements
    ReLU,            // 4. activate with relu
    Residual<(       // 5. residual connection that adds input to the result of its sub layers
        Linear<3, 3>, // 5.a. Apply linear layer
        ReLU,         // 5.b. Apply Relu
    )>,              // 5.c. the input to the residual is added back in after the sub layers
    Linear<3, 2>,    // 6. Apply another linear layer
);
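The idea behind tuples acting as modules, and behind the residual wrapper, can be sketched in std-only Rust. The trait is a simplified stand-in, the sketch covers pairs only, and AddOne, Double, and run_pipeline are hypothetical names; this is not how dfdx itself is written.

```rust
/// Simplified stand-in for a module trait.
trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}

// Two hypothetical scalar "layers".
struct AddOne;
struct Double;

impl Module<f32> for AddOne {
    type Output = f32;
    fn forward(&self, x: f32) -> f32 {
        x + 1.0
    }
}

impl Module<f32> for Double {
    type Output = f32;
    fn forward(&self, x: f32) -> f32 {
        x * 2.0
    }
}

/// A pair of modules is itself a module: run the first, feed its output
/// to the second. Longer tuples could be covered the same way.
impl<Input, A, B> Module<Input> for (A, B)
where
    A: Module<Input>,
    B: Module<A::Output>,
{
    type Output = B::Output;
    fn forward(&self, input: Input) -> Self::Output {
        self.1.forward(self.0.forward(input))
    }
}

/// Residual wrapper: adds the input back to the output of the inner module.
struct Residual<F>(F);

impl<F: Module<f32, Output = f32>> Module<f32> for Residual<F> {
    type Output = f32;
    fn forward(&self, x: f32) -> f32 {
        self.0.forward(x) + x
    }
}

/// AddOne, then a residual block around (Double, AddOne).
fn run_pipeline(x: f32) -> f32 {
    let model = (AddOne, Residual((Double, AddOne)));
    model.forward(x)
}
```

Because the pair implementation only requires that the second module accept the first module's output, mismatched layer shapes become compile errors rather than runtime failures.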
Saving and Loading
numpy
Enable with the "numpy" feature.
Call the SaveToNpz::save() and LoadFromNpz::load() methods. All modules provided here implement them, including tuples. These all save to/from .npz files, which are basically zip files with multiple .npy files.
This is implemented to be fairly portable. For example you can load a simple MLP into pytorch like so:
import torch
import numpy as np
state_dict = {k: torch.from_numpy(v) for k, v in np.load("dfdx-model.npz").items()}
mlp.load_state_dict(state_dict)
safetensors
Enable with the "safetensors" feature.
This feature lets you do the same with https://github.com/huggingface/safetensors.
Call the SaveToSafetensors::save_safetensors() and LoadFromSafetensors::load_safetensors() methods. All modules provided here implement them, including tuples.
These all save to/from .safetensors files, which use a flat layout with a JSON header, allowing for very fast loads (with memory mapping).
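The flat layout can be illustrated with a std-only sketch: a safetensors file starts with an 8-byte little-endian header length, followed by that many bytes of JSON, followed by the raw tensor bytes. The helper names split_safetensors and example_file are hypothetical, and the sketch does not parse the JSON or validate offsets.

```rust
/// Splits a safetensors buffer into (JSON header text, raw tensor bytes).
/// Returns None if the buffer is too short or the header is not UTF-8.
fn split_safetensors(bytes: &[u8]) -> Option<(&str, &[u8])> {
    // First 8 bytes: header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(bytes.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes) as usize;

    // Next `header_len` bytes: the JSON header.
    let header_end = 8usize.checked_add(header_len)?;
    let header = std::str::from_utf8(bytes.get(8..header_end)?).ok()?;

    // Everything after the header is the flat tensor data.
    Some((header, &bytes[header_end..]))
}

/// Builds a tiny in-memory file holding one f32 tensor of shape [2].
fn example_file() -> Vec<u8> {
    let header = r#"{"weight":{"dtype":"F32","shape":[2],"data_offsets":[0,8]}}"#;
    let mut file = (header.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(header.as_bytes());
    for value in &[1.0f32, 2.0] {
        file.extend_from_slice(&value.to_le_bytes());
    }
    file
}
```

Since the header records byte offsets into one contiguous data region, a reader can memory-map the file and slice tensors out of it without copying.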
This is implemented to be fairly portable. For example, you can use https://github.com/huggingface/transformers:
from transformers import pipeline
pipe = pipeline(model="gpt2")
pipe.save_pretrained("my_local", safe_serialization=True)
# This created `my_local/model.safetensors` file which can now be used.
Re-exports
pub use tensor_collection::*;
Modules
- Simple specification of network structure, without worrying about device or dtype.
- Structs containing initialized Tensors & impls for super::Module. See super::builders for helpful utilities in creating these in a device/dtype agnostic way.
- Traits to define a TensorCollection and how to iterate them using ModuleVisitor. Use RecursiveWalker to do the iteration and TensorVisitor to define functions to iterate through and/or construct Modules.
Traits
- Something that can be built. Related to super::BuildOnDevice.
- Something that can be built on a different device than it is on.
- An extension trait that allows you to build a module with a device method. Also allows easy specification of Dtype.
- Something that can be loaded from a .npz file (which is a zip file).
- Something that can be loaded from a .safetensors file.
- Performs model exponential moving average on two modules.
- Marker trait for modules that don’t have different behavior between mutable forwards and non-mutable forwards.
- Get the number of trainable parameters in a model.
- Reset a module’s parameters with their default reset function.
- Something that can be saved to a .npz (which is a .zip).
- Something that can be saved to a .safetensors.
- Something that can be copied to another Device.
- Something that can be copied to have a different dtype.
- Zeroes any gradients associated with self.
- Marker trait for modules with no updatable parameters. These have blanket impls for, and ModuleMut