## dfdx

dfdx is a CUDA accelerated tensor and neural network library, written entirely in Rust!

Additionally, it can track compile time shapes across tensor operations,
ensuring that all your neural networks are checked **at compile time**.

The following sections provide some high level core concepts & examples, and there is more detailed documentation in each of dfdx's submodules.

See feature_flags for details on feature flags.

## Shapes & Tensors

*See dtypes, shapes, and tensor for more information.*

At its core a `tensor::Tensor` is just an nd-array. Just like Rust arrays, a tensor has two parts: a shape and a dtype.

dfdx represents shapes as **tuples** of dimensions (`shapes::Dim`), where a dimension can either be known at:

- Compile time: `shapes::Const<M>`
- Run time: `usize`

You can freely mix and match these dimensions together. Here are some example shapes:

- `()` - unit shape
- `(usize,)` - 1d shape with a runtime known dimension
- `(usize, Const<5>)` - 2d shape with both types of dimensions
- `(Const<3>, usize, Const<5>)` - 3d shape!
- `Rank3<3, 5, 7>` - Equivalent to `(Const<3>, Const<5>, Const<7>)`

Here are some comparisons between representing nd arrays in rust vs dfdx:

| rust array | dfdx `Tensor` |
|---|---|
| `f32` | `Tensor<(), f32, …>` |
| `[u32; 5]` | `Tensor<Rank1<5>, u32, …>` |
| `[[u8; 3]; 2]` | `Tensor<Rank2<2, 3>, u8, …>` |
| `Vec<[bool; 5]>` | `Tensor<(usize, Const<5>), bool, …>` |

The `Rank1`, `Rank2` shapes used above are type aliases for shapes where **all dimensions are compile time**:

- `shapes::Rank0` is just `()`.
- `shapes::Rank1<M>` is `(Const<M>,)`.
- `shapes::Rank2<M, N>` is `(Const<M>, Const<N>)`.
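To make the mixed compile-time/run-time dimensions concrete, here is a minimal sketch in plain Rust of how such a scheme can be modeled. The `Const` and `Dim` names mirror dfdx's, but the definitions below are illustrative simplifications, not dfdx's actual implementation:

```rust
// Illustrative sketch only: simplified stand-ins for dfdx's Const/Dim.

/// A dimension whose size is known at compile time (zero-sized marker).
struct Const<const M: usize>;

/// Anything that can report its size; `usize` covers run-time dims.
trait Dim {
    fn size(&self) -> usize;
}

impl<const M: usize> Dim for Const<M> {
    fn size(&self) -> usize {
        M
    }
}

impl Dim for usize {
    fn size(&self) -> usize {
        *self
    }
}

fn main() {
    // A 2d shape mixing a run-time and a compile-time dimension,
    // like `(usize, Const<5>)` above.
    let shape = (3usize, Const::<5>);
    let num_elements = shape.0.size() * shape.1.size();
    println!("{} x {} = {}", shape.0.size(), shape.1.size(), num_elements); // 3 x 5 = 15
}
```

dfdx's real `shapes::Dim` trait plays a similar role, letting tensor ops treat compile-time and run-time dimensions uniformly.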

## Allocating tensors with Devices

*See tensor for more information.*

Devices are used to allocate tensors (and neural networks!). They are akin to `std::alloc::GlobalAlloc` in Rust: they just allocate memory. They are also used to execute tensor ops, which we will get to later on.

There are two options for this currently, with more planned to be added in the future:

- tensor::Cpu - for tensors stored on the heap
- tensor::Cuda - for tensors stored in GPU memory

Both devices implement `Default`; you can also create them with a specific seed and ordinal.

Here’s how you might use a device:

```rust
use dfdx::prelude::*;

let dev: Cpu = Default::default();
let t: Tensor<Rank2<2, 3>, f32, _> = dev.zeros();
```

## Tensor Operations (tip of the iceberg)

*See tensor_ops for more information*

Once you’ve instantiated tensors with a device, you can start doing operations on them!
There are **many many** operations. Here are a few core ones and how they relate
to things like numpy/pytorch:

| Operation | dfdx | numpy | pytorch |
|---|---|---|---|
| Unary Operations | `a.sqrt()` | `a.sqrt()` | `a.sqrt()` |
| Binary Operations | `a + b` | `a + b` | `a + b` |
| gemm/gemv | `tensor_ops::matmul` | `a @ b` | `a @ b` |
| 2d Convolution | `tensor_ops::TryConv2D` | - | `torch.conv2d` |
| 2d Transposed Convolution | `tensor_ops::TryConvTrans2D` | - | `torch.conv_transpose2d` |
| Slicing | `tensor_ops::slice` | `a[...]` | `a[...]` |
| Select | `tensor_ops::SelectTo` | `a[...]` | `torch.select` |
| Gather | `tensor_ops::GatherTo` | `np.take` | `torch.gather` |
| Broadcasting | `tensor_ops::BroadcastTo` | implicit/`np.broadcast` | implicit/`torch.broadcast_to` |
| Permute | `tensor_ops::PermuteTo` | `np.transpose(...)` | `torch.permute` |
| Where | `tensor_ops::ChooseFrom` | `np.where` | `torch.where` |
| Reshape | `tensor_ops::ReshapeTo` | `np.reshape(shape)` | `a.reshape(shape)` |
| View | `tensor_ops::ReshapeTo` | `np.view(...)` | `a.view(...)` |
| Roll | `tensor_ops::Roll` | `np.rollaxis(...)` | `a.roll(...)` |
| Stack | `tensor_ops::TryStack` | `np.stack` | `torch.stack` |
| Concat | `tensor_ops::TryConcat` | `np.concatenate` | `torch.concat` |

and **much much more!**
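To give a flavor of what compile-time shape checking buys you for ops like matmul, here is a toy sketch in plain Rust. `Tensor2` is a hypothetical stand-in for a rank-2 tensor, not dfdx's real `Tensor` type; the point is only that the shared inner dimension is enforced by the type system:

```rust
// Toy stand-in for a rank-2 tensor; NOT dfdx's real Tensor type.
struct Tensor2<const M: usize, const N: usize>([[f32; N]; M]);

impl<const M: usize, const K: usize> Tensor2<M, K> {
    /// Matrix multiply: (M, K) x (K, N) -> (M, N).
    /// The shared dimension K is enforced by the type system, so a
    /// mismatched multiply is a compile error, not a runtime panic.
    fn matmul<const N: usize>(&self, rhs: &Tensor2<K, N>) -> Tensor2<M, N> {
        let mut out = [[0.0f32; N]; M];
        for i in 0..M {
            for j in 0..N {
                for k in 0..K {
                    out[i][j] += self.0[i][k] * rhs.0[k][j];
                }
            }
        }
        Tensor2(out)
    }
}

fn main() {
    let a = Tensor2::<2, 3>([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]);
    let b = Tensor2::<3, 2>([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]);
    let c = a.matmul(&b); // result has shape (2, 2)
    assert_eq!(c.0, [[1.0, 2.0], [3.0, 4.0]]);
    // let bad = a.matmul(&a); // would not compile: inner dims 3 vs 2 mismatch
}
```

dfdx's `tensor_ops::matmul` gives you the same guarantee for any mix of compile-time and run-time dimensions.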

## Neural networks

*See nn for more information.*

Neural networks are composed of building blocks that you can chain together. In
dfdx, sequential neural networks are represented by **tuples**! For example,
the following two networks are identical:

| dfdx | pytorch |
|---|---|
| `(Linear<3, 5>, ReLU, Linear<5, 10>)` | `nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 10))` |
| `((Conv2D<3, 2, 1>, Tanh), Conv2D<3, 2, 1>)` | `nn.Sequential(nn.Sequential(nn.Conv2d(3, 2, 1), nn.Tanh()), nn.Conv2d(3, 2, 1))` |

To build a neural network, you of course need a device:

```rust
use dfdx::prelude::*;

let dev: Cpu = Default::default();
type Model = (Linear<3, 5>, ReLU, Linear<5, 10>);
let model = dev.build_module::<Model, f32>();
```

Note two things:

- We are using `nn::DeviceBuildExt` to instantiate the model.
- We **need** to pass a dtype (in this case `f32`) to create the model.

You can then pass tensors into the model with nn::Module::forward():

```rust
// tensor with runtime batch dimension of 10
let x: Tensor<(usize, Const<3>), f32, _> = dev.sample_normal_like(&(10, Const));
let y = model.forward(x);
```

## Optimizers and Gradients

*See optim for more information*

dfdx supports a number of the standard optimizers:

| Optimizer | dfdx | pytorch |
|---|---|---|
| SGD | `optim::Sgd` | `torch.optim.SGD` |
| Adam | `optim::Adam` | `torch.optim.Adam` |
| AdamW | `optim::Adam` with `optim::WeightDecay::Decoupled` | `torch.optim.AdamW` |
| RMSprop | `optim::RMSprop` | `torch.optim.RMSprop` |

You can use optimizers to optimize neural networks (or even tensors!). Here’s a simple example of how to do this with nn::ZeroGrads:

```rust
use dfdx::prelude::*;

let dev: Cpu = Default::default();
type Model = (Linear<3, 5>, ReLU, Linear<5, 10>);
let mut model = dev.build_module::<Model, f32>();
// 1. allocate gradients for the model
let mut grads = model.alloc_grads();
// 2. create our optimizer
let mut opt = Sgd::new(&model, Default::default());
// 3. trace gradients through forward pass
let x: Tensor<Rank2<10, 3>, f32, _> = dev.sample_normal();
let y = model.forward_mut(x.traced(grads));
// 4. compute loss & run backpropagation
let loss = y.square().mean();
grads = loss.backward();
// 5. apply gradients
opt.update(&mut model, &grads);
```

## Modules

- `data` - A collection of useful data utilities such as ExactSizeDataset, OneHotEncode, Arange, and iterator extension traits!
- `feature_flags` - Information about the available feature flags.
- `losses` - Standard loss functions such as mse_loss(), cross_entropy_with_logits_loss(), and more.
- `nn` - High level neural network building blocks such as modules::Linear, activations, and tuples as Modules. Also includes `.save()` & `.load()` for all Modules.
- `prelude` - Contains a subset of all public exports.

## Functions

- `flush_denormals_to_zero()` - Sets a CPU `sse` flag to flush denormal floating point numbers to zero. The opposite of this is `keep_denormals()`.
- `keep_denormals()` - Sets a CPU flag to keep denormal floating point numbers. The opposite of this is `flush_denormals_to_zero()`.