Crate mli
MLI intends to provide modern, data-driven abstractions for machine learning.
MLI provides traits that work much like Combine's Parser trait, in that one should only
need to create tensor-processing primitives and then string them together to craft
a large system that continues to be useful as a primitive in an even larger system.
To understand MLI, we actually just need to go over some basic Rust concepts.
In Rust, the following relationships relate the owned versions of data with the borrowed versions:
T -> &mut T -> &T
Vec<T> -> &mut [T] -> &[T]
String -> &mut str -> &str
FnOnce -> FnMut -> Fn
Most people don't think about the last one, because it is a trait, but it is critical to understand.
A FnOnce owns its enclosed environment. It can implement FnMut because, since it owns the environment,
it can create a &mut to it. Similarly, a FnMut has a mutable borrow of its environment, so it can
downgrade that mutable borrow to an immutable borrow and implement Fn.
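This downgrade hierarchy can be seen directly in Rust: a closure that only needs a shared borrow of its environment satisfies all three trait bounds. The helper functions below are illustrative, not part of MLI.

```rust
// Each helper demands a progressively weaker closure bound.
fn call_once<F: FnOnce() -> i32>(f: F) -> i32 { f() }
fn call_mut<F: FnMut() -> i32>(mut f: F) -> i32 { f() }
fn call<F: Fn() -> i32>(f: F) -> i32 { f() }

fn main() {
    let x = 41;
    // This closure only captures x by shared reference, so it implements
    // Fn, and therefore also FnMut and FnOnce. It is also Copy, which
    // lets us pass it by value three times.
    let add_one = || x + 1;
    assert_eq!(call(add_one), 42);
    assert_eq!(call_mut(add_one), 42);
    assert_eq!(call_once(add_one), 42);
}
```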
In MLI, the equivalent is:
Graph -> &mut Graph -> &Graph
Let us decompose the notion of what a compute graph is. A compute graph simply takes several inputs,
produces several intermediary outputs, and then produces the actual outputs. Each one of these
intermediary outputs is important because they affect the gradients of the functions that came after them.
If we discard them, then they will have to be recomputed from the input. If we only have a simple
convolution operation as our whole graph, there are no intermediary computations, just the
input and the output. However, we are more likely to have several layers of convolution, activation
functions, splitting, and merging. If we have just two layers of convolution with no activation function,
the first layer produces an output which is necessary to calculate the gradient 𝛿output/𝛿filter.
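A scalar analogue makes this concrete. Below, each "layer" is a single multiplication (a stand-in for convolution, not MLI code): the intermediary output of the first layer is exactly the gradient of the final output with respect to the second layer's weight, so discarding it means recomputing it from the input.

```rust
// Two chained linear layers: h = w1 * x (intermediary), y = w2 * h.
fn main() {
    let (x, w1, w2) = (3.0_f64, 2.0, 0.5);
    let h = w1 * x;      // intermediary output of the first layer
    let y = w2 * h;      // final output
    let dy_dw2 = h;      // 𝛿y/𝛿w2 = h: requires the intermediary output
    let dy_dw1 = w2 * x; // 𝛿y/𝛿w1 by the chain rule
    assert_eq!(y, 3.0);
    assert_eq!(dy_dw2, 6.0);
    assert_eq!(dy_dw1, 1.5);
}
```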
We would like to have an abstraction that treats a whole graph, with its inputs, intermediary computations, and
outputs, the same way we would treat a single simple static function like tanh. We would also like
that abstraction to be zero-cost and ideally parallel.
This is where the magic happens. The traits Forward and Backward only require a &self as an
argument. This is because they use the trainable variables immutably to perform forward and backward
propagation. The changes to the trainable variables within the graph are stored as the Backward::TrainDelta type.
This means that if we have a graph composed of Forward and Backward items, we can compute
all of the deltas for a batch without ever mutating the graph, and thus we can compute the deltas
for each batch in parallel. Since the learning rate is intended to be incorporated into the
delta beforehand, you can simply sum all of the deltas in the batch and train the network.
Due to Rust's ownership and borrowing, however, the network being trained can't be the same one
we distributed to all the threads. This is where the references come in.
The Graph is trainable since we can mutate it. The &Graph immutably borrows
the Graph, which lets us propagate Forward and Backward across
multiple threads. We can then sum all of the deltas from the threads when they are done and
update the Graph, which is no longer borrowed.
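The borrowing pattern can be sketched with scoped threads. The Graph struct and its delta method below are hypothetical simplifications (a single weight, squared-error gradient) standing in for a real MLI graph; the point is that &Graph is shared across threads immutably, and the mutation happens only after the borrow ends.

```rust
use std::thread;

// Hypothetical one-weight "graph"; forward and backward never mutate it.
struct Graph { weight: f64 }

impl Graph {
    // &self only: computes the training delta for one example, with the
    // learning rate already incorporated into the delta.
    fn delta(&self, input: f64, target: f64, learning_rate: f64) -> f64 {
        let output = self.weight * input;
        // Gradient of squared error with respect to the weight.
        -learning_rate * 2.0 * (output - target) * input
    }
}

fn main() {
    let mut graph = Graph { weight: 0.0 };
    let batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)];
    // Immutably borrow the graph across threads to compute deltas in parallel.
    let total: f64 = thread::scope(|s| {
        let graph = &graph;
        let handles: Vec<_> = batch
            .iter()
            .map(|&(x, t)| s.spawn(move || graph.delta(x, t, 0.01)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });
    // The immutable borrows have ended, so we can now mutate the graph.
    graph.weight += total;
    assert!(graph.weight > 0.0);
}
```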
Currently there are three main traits:

Forward
- Implemented on anything that can go into a forward graph.
- Outputs intermediary computations (to compute gradients later) and the final output.

Backward
- Implemented on anything that can go into a backward graph.
- Propagates gradients backwards through the neural network.
- Is provided the original input and any intermediary computations.
- Propagates the change from the output to the input and trainable variables.
- Takes &self and does not do the actual training, so it can be run in parallel.

Train
- Implemented on anything that can go into a training graph.
- Uses the change to update the trainable variables.
- The change can be normalized across a mini-batch before being passed.
- Implemented on the mutable version of the graph.
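The three traits might be sketched as follows. These signatures are a simplified guess at the shape of the API, not MLI's actual definitions (the real associated types and parameters may differ); the Linear struct is a toy implementation for illustration.

```rust
// Simplified sketches of the three traits.
trait Forward<Input> {
    type Internal;
    type Output;
    /// Produce intermediary computations and the final output.
    fn forward(&self, input: &Input) -> (Self::Internal, Self::Output);
}

trait Backward<Input>: Forward<Input> {
    type InputDelta;
    type TrainDelta;
    /// Propagate the output delta back to the input and trainable variables.
    /// Takes &self, so it can run in parallel across threads.
    fn backward(
        &self,
        input: &Input,
        internal: &Self::Internal,
        output_delta: &Self::Output,
    ) -> (Self::InputDelta, Self::TrainDelta);
}

trait Train<Input>: Backward<Input> {
    /// Apply an accumulated (possibly mini-batch-normalized) delta.
    fn train(&mut self, delta: &Self::TrainDelta);
}

// Toy layer: y = w * x, with no intermediary computation.
struct Linear { w: f64 }

impl Forward<f64> for Linear {
    type Internal = ();
    type Output = f64;
    fn forward(&self, input: &f64) -> ((), f64) { ((), self.w * input) }
}

impl Backward<f64> for Linear {
    type InputDelta = f64;
    type TrainDelta = f64;
    fn backward(&self, input: &f64, _internal: &(), od: &f64) -> (f64, f64) {
        (od * self.w, od * input) // 𝛿y/𝛿x = w, 𝛿y/𝛿w = x
    }
}

impl Train<f64> for Linear {
    fn train(&mut self, delta: &f64) { self.w += delta; }
}

fn main() {
    let mut layer = Linear { w: 2.0 };
    let (internal, y) = layer.forward(&3.0);
    assert_eq!(y, 6.0);
    let (input_delta, train_delta) = layer.backward(&3.0, &internal, &1.0);
    assert_eq!(input_delta, 2.0);
    assert_eq!(train_delta, 3.0);
    layer.train(&-0.1); // only Train takes &mut self
    assert_eq!(layer.w, 1.9);
}
```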
Structs

Chain

Traits

Backward - This trait indicates support of backwards propagation.
Forward - This trait is for algorithms that have an input and produce an output.
Graph
Train - This trait is implemented on all operations that can be included in a trainable model.