MLI intends to provide modern, data-driven abstractions for machine learning.
MLI provides traits that work much like Combine's `Parser` trait: one should only need to create tensor-processing primitives and then string them together to craft a large system that continues to be useful as a primitive in an even larger system.
To understand MLI, we just need to go over some basic Rust concepts. In Rust, the following relationships relate the owned versions of data to their borrowed versions:

- `T` -> `&mut T` -> `&T`
- `FnOnce` -> `FnMut` -> `Fn`
Most people don't think about the last one, because it is a trait hierarchy, but it is critical to understand. A call through `FnOnce` hands the closure ownership of its enclosed environment, a call through `FnMut` hands it a `&mut` to that environment, and a call through `Fn` hands it only a `&`. Ownership can always be downgraded: owning the environment lets you create a `&mut` to it, and a mutable borrow can be downgraded to an immutable one. This is why every `Fn` closure can also be used as an `FnMut` or an `FnOnce`.
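For concreteness, here is a small, purely illustrative example of that downgrade chain in plain Rust; the function names are made up for this sketch:

```rust
fn read_only(v: &Vec<u32>) {
    println!("{} entries", v.len());
}

fn mutate(v: &mut Vec<u32>) {
    v.push(1);
    read_only(v); // a `&mut` reborrows as a plain `&`
}

fn call_fn(f: &impl Fn()) {
    f();
}

fn call_fn_once(f: impl FnOnce()) {
    f();
}

fn main() {
    let mut owned = vec![0];
    mutate(&mut owned); // owning `owned` lets us hand out a `&mut` to it

    // The closure hierarchy mirrors this: a closure that only reads its
    // environment implements `Fn`, so it can also be used as `FnMut` or `FnOnce`.
    let name = String::from("mli");
    let hello = || println!("hello, {name}");
    call_fn(&hello);     // shared access to the environment suffices
    call_fn_once(hello); // the call can also take ownership of the closure
}
```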
In MLI, the equivalent is:

- `Graph` -> `&mut Graph` -> `&Graph`

where the owned `Graph` can be trained, and a `&Graph` can still perform forward and backward propagation.
Let us decompose the notion of what a compute graph is. A compute graph takes several inputs, produces several intermediary outputs, and then produces the actual outputs. Each of these intermediary outputs matters because it affects the gradients of the functions that came after it. If we discard them, they will have to be recomputed from the inputs. If our whole graph is a single convolution operation, there are no intermediary computations, just the input and the output. However, we are more likely to have several layers of convolution, activation functions, splitting, and merging. Even with just two layers of convolution and no activation function, the first layer produces an output that is needed to compute the gradient of the second layer.
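As a hypothetical scalar example (not MLI code), suppose layer one is `tanh` and layer two squares its input; the backward pass needs the intermediary output of layer one:

```rust
// A tiny scalar "graph": h = tanh(x) (layer one), y = h * h (layer two).
fn main() {
    let x: f64 = 0.5;

    // Forward pass: keep the intermediary output `h`.
    let h = x.tanh();
    let y = h * h;

    // Backward pass: the second layer's gradient dy/dh = 2h needs `h`.
    // Had we discarded it, we would have to recompute tanh(x) here.
    let dy_dh = 2.0 * h;
    let dh_dx = 1.0 - h * h; // tanh'(x) = 1 - tanh(x)^2, which also reuses `h`
    let dy_dx = dy_dh * dh_dx;

    println!("y = {y}, dy/dx = {dy_dx}");
}
```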
We would like to have an abstraction that treats a whole graph, with its inputs, intermediary computations, and outputs, the same way we would treat a single simple static function like `tanh`. We would also like
that abstraction to be zero-cost and ideally parallel.
This is where the magic happens. The traits `Forward` and `Backward` only require a `&self` as an argument. This is because they use the trainable variables immutably to do forward and backward propagation; the trainable variables are stored as part of the graph itself.
This means that if we have a graph composed of `Backward` items, we can compute all of the deltas for a batch without ever mutating the graph, and thus the deltas within a batch can be computed in parallel. Since the learning rate is intended to be incorporated into each delta beforehand, you can simply sum all of the deltas in the batch and train the network.
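For instance, if the learning rate has already been folded into each per-sample delta, the batch update is just a sum. A toy, non-MLI sketch with a single trainable variable:

```rust
fn main() {
    let learning_rate = 0.01;
    // Hypothetical per-sample gradients for a single trainable variable.
    let gradients = [0.8, -0.2, 0.5, 0.1];

    // Fold the learning rate into each delta, then sum across the batch.
    let batch_delta: f64 = gradients.iter().map(|g| -learning_rate * g).sum();

    // A single mutation then trains on the whole batch.
    let mut weight = 1.0;
    weight += batch_delta;
    println!("batch delta = {batch_delta}, new weight = {weight}");
}
```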
Due to Rust's ownership and borrowing rules, however, the network being trained cannot be the same value that all of the threads are simultaneously borrowing. This is where the references come in. The owned `Graph` is trainable since we can mutate it. A `&Graph` immutably borrows the `Graph`, and that borrow can be handed out to multiple threads. When they are done, we can sum all of the deltas from the threads and then train the `Graph`, which is no longer borrowed.
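Here is a sketch of that workflow with a made-up one-parameter `Graph`. None of these types or methods are MLI's actual API; the point is the borrowing pattern:

```rust
use std::thread;

// A made-up one-parameter model standing in for a full graph.
struct Graph {
    weight: f64,
}

impl Graph {
    // Forward/backward only need `&self`; the returned delta already has the
    // learning rate folded in.
    fn delta_for(&self, input: f64, target: f64, learning_rate: f64) -> f64 {
        let output = self.weight * input;
        let error = output - target;
        -learning_rate * error * input
    }

    // Training needs `&mut self`.
    fn train(&mut self, delta: f64) {
        self.weight += delta;
    }
}

fn main() {
    let mut graph = Graph { weight: 0.0 };
    let batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)];

    // `&graph` is shared across the threads; no mutation happens here.
    let total_delta: f64 = thread::scope(|scope| {
        let handles: Vec<_> = batch
            .iter()
            .map(|&(input, target)| {
                let graph = &graph;
                scope.spawn(move || graph.delta_for(input, target, 0.1))
            })
            .collect();
        handles.into_iter().map(|handle| handle.join().unwrap()).sum::<f64>()
    });

    // The shared borrow ended when the scope returned, so the owned graph
    // can now be trained with the summed delta.
    graph.train(total_delta);
    println!("trained weight: {}", graph.weight);
}
```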
Currently, there are three main traits:

- `Forward`
  - Implemented on anything that can go into a forward graph.
  - Outputs intermediary computations (to compute gradients later) and the final output.
- `Backward`
  - Implemented on anything that can go into a backward graph.
  - Propagates gradients backwards through the neural network.
  - It is provided the original input and any intermediary computations.
  - Propagates the change from the output to the input and the trainable variables.
  - Takes only `&self` and does not do the actual training, so it can be run in parallel.
- `Train`
  - Implemented on anything that can go into a training graph.
  - Uses the change to update the trainable variables.
  - The change can be normalized across a mini-batch before being passed.
  - Implemented on the mutable version of the graph.
`Backward`: This trait indicates support of backward propagation.

`Forward`: This trait is for algorithms that take an input and produce an output.

`Train`: This trait is implemented on all operations that can be included in a trainable model.
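To tie the three traits together, here is a sketch of how such a split could look, exercised on a trivial scaling operation. The trait signatures below are illustrative guesses and are not guaranteed to match MLI's actual definitions:

```rust
// Illustrative only: a possible shape for the Forward/Backward/Train split.
trait Forward<Input> {
    type Internal; // intermediary computations kept for the backward pass
    type Output;
    fn forward(&self, input: &Input) -> (Self::Internal, Self::Output);
}

trait Backward<Input>: Forward<Input> {
    type OutputDelta;
    type InputDelta;
    type TrainDelta; // change to the trainable variables, learning rate included
    fn backward(
        &self,
        input: &Input,
        internal: &Self::Internal,
        output_delta: &Self::OutputDelta,
    ) -> (Self::InputDelta, Self::TrainDelta);
}

trait Train<Input>: Backward<Input> {
    // Only this step needs `&mut self`.
    fn train(&mut self, delta: &Self::TrainDelta);
}

// One trainable variable: output = weight * input.
struct Scale {
    weight: f64,
}

impl Forward<f64> for Scale {
    type Internal = (); // nothing worth caching for such a simple op
    type Output = f64;
    fn forward(&self, input: &f64) -> ((), f64) {
        ((), self.weight * input)
    }
}

impl Backward<f64> for Scale {
    type OutputDelta = f64;
    type InputDelta = f64;
    type TrainDelta = f64;
    fn backward(&self, input: &f64, _internal: &(), output_delta: &f64) -> (f64, f64) {
        // d(output)/d(input) = weight, d(output)/d(weight) = input.
        (output_delta * self.weight, output_delta * input)
    }
}

impl Train<f64> for Scale {
    fn train(&mut self, delta: &f64) {
        self.weight += *delta;
    }
}

fn main() {
    let mut op = Scale { weight: 0.5 };
    let input = 2.0;
    let target = 3.0;
    let learning_rate = 0.1;

    let (internal, output) = op.forward(&input);
    // Fold the learning rate into the delta before propagating it.
    let output_delta = -learning_rate * (output - target);
    let (_input_delta, train_delta) = op.backward(&input, &internal, &output_delta);
    op.train(&train_delta);

    println!("output = {output}, new weight = {}", op.weight);
}
```

Note how only the final `train` call needs `&mut self`; the forward and backward passes could have been run against a shared `&Scale` in parallel, exactly as described above.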