Crate mli

MLI intends to provide modern, data-driven abstractions for machine learning.

MLI provides traits that work much like Combine's Parser: one should only need to create tensor processing primitives and then string them together to craft a large system that continues to be useful as a primitive in an even larger system.

To understand MLI, we actually just need to go over some basic Rust concepts.

In Rust, the following relationships relate the owned versions of data with the borrowed versions:

  • T -> &mut T -> &T
  • Vec<T> -> &mut [T] -> &[T]
  • String -> &mut str -> &str
  • FnOnce -> FnMut -> Fn

Most people don't think about the last one, because it is a trait, but it is critical to understand. An FnOnce owns its enclosed environment. It can implement FnMut because, since it owns the environment, it can create a &mut to it. Similarly, an FnMut has a mutable borrow of its environment, thus it can downgrade that mutable borrow to an immutable borrow.
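This downgrade chain can be seen directly: a closure that only reads its environment implements all three traits, so it satisfies any of the three bounds. A minimal sketch (the function names are illustrative):

```rust
// Every closure that implements Fn also implements FnMut, and every
// FnMut also implements FnOnce, mirroring T -> &mut T -> &T.
fn takes_fn_once(f: impl FnOnce() -> i32) -> i32 { f() }
fn takes_fn_mut(mut f: impl FnMut() -> i32) -> i32 { f() }
fn takes_fn(f: impl Fn() -> i32) -> i32 { f() }

fn main() {
    let x = 41;
    // This closure only reads x, so it implements all three traits
    // (and is Copy, letting us pass it by value three times).
    let closure = move || x + 1;
    assert_eq!(takes_fn(closure), 42);
    assert_eq!(takes_fn_mut(closure), 42);
    assert_eq!(takes_fn_once(closure), 42);
}
```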

In MLI, the equivalent is:

  • Graph -> &mut Graph -> &Graph

Let us decompose the notion of a compute graph. A compute graph simply takes several inputs, produces several intermediary outputs, and then produces the actual outputs. Each of these intermediary outputs is important because it affects the gradients of the functions that came after it. If we discard them, they will have to be recomputed from the input. If our whole graph is a single convolution operation, there are no intermediary computations, just the input and the output. However, we are more likely to have several layers of convolution, activation functions, splitting, and merging. If we have just two layers of convolution with no activation function, the first layer produces an output which is necessary to calculate the gradient 𝛿output/𝛿filter.
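A worked scalar example makes the role of the intermediary concrete. Consider a toy two-stage graph out = w2 · (w1 · x): the first stage's output h = w1 · x is exactly the gradient of out with respect to w2, so discarding h forces a recomputation from the input. (The variable names here are illustrative, not part of mli's API.)

```rust
// Toy two-stage "graph": out = w2 * (w1 * x).
// The intermediary h = w1 * x must be kept (or recomputed), because the
// gradient of the output with respect to w2 is exactly that intermediary.
fn main() {
    let (x, w1, w2) = (3.0_f64, 0.5, 2.0);
    let h = w1 * x;   // intermediary output of the first stage
    let out = w2 * h; // final output
    // Gradients of `out` with respect to the trainable variables:
    let d_out_d_w2 = h;      // requires the intermediary
    let d_out_d_w1 = w2 * x; // requires the original input and w2
    assert_eq!(out, 3.0);
    assert_eq!(d_out_d_w2, 1.5);
    assert_eq!(d_out_d_w1, 6.0);
}
```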

We would like to have an abstraction that treats a whole graph, its inputs, intermediary computations, and its outputs, the same way we would treat a single simple static function like a tanh. We would also like that abstraction to be zero-cost and ideally parallel.

This is where the magic happens. The traits Forward and Backward only require a &self argument. This is because they use the trainable variables immutably to perform forward and backward propagation. The changes to the trainable variables within the graph are represented by the Backward::TrainDelta type. This means that if we have a graph composed of Forward and Backward items, we can compute all of the deltas for a batch without ever mutating the graph, and thus we can compute the deltas for each batch in parallel. Since the learning rate is intended to be incorporated into the delta beforehand, you can simply sum all of the deltas in the batch and train the network.
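To make the &self-only design concrete, here is a hedged sketch of what such a pair of traits might look like, with a trivial scalar operation implementing both. The names, associated types, and signatures are illustrative and may differ from mli's actual traits:

```rust
// Sketch of Forward/Backward-style traits that take &self, so
// backpropagation never mutates the model.
trait Forward<Input> {
    type Internal;
    type Output;
    // Returns the intermediary computation alongside the output.
    fn forward(&self, input: &Input) -> (Self::Internal, Self::Output);
}

trait Backward<Input>: Forward<Input> {
    type OutputDelta;
    type InputDelta;
    type TrainDelta;
    // &self only: deltas are returned, not applied, so this can
    // run in parallel over a whole batch.
    fn backward(
        &self,
        input: &Input,
        internal: &Self::Internal,
        output_delta: &Self::OutputDelta,
    ) -> (Self::InputDelta, Self::TrainDelta);
}

// A scalar "scale by w" operation as the simplest possible example.
struct Scale { w: f64 }

impl Forward<f64> for Scale {
    type Internal = ();
    type Output = f64;
    fn forward(&self, input: &f64) -> ((), f64) {
        ((), self.w * input)
    }
}

impl Backward<f64> for Scale {
    type OutputDelta = f64;
    type InputDelta = f64;
    type TrainDelta = f64;
    fn backward(&self, input: &f64, _internal: &(), output_delta: &f64) -> (f64, f64) {
        // d out/d input = w, d out/d w = input
        (self.w * output_delta, input * output_delta)
    }
}

fn main() {
    let op = Scale { w: 2.0 };
    let (internal, out) = op.forward(&3.0);
    assert_eq!(out, 6.0);
    let (input_delta, train_delta) = op.backward(&3.0, &internal, &1.0);
    assert_eq!(input_delta, 2.0);
    assert_eq!(train_delta, 3.0);
}
```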

Due to Rust's ownership and borrowing, however, the network being trained can't be the same one we distributed to all the threads. This is where the references come in. The Graph is trainable since we can mutate it. A &Graph immutably borrows the Graph, and with it we can propagate Forward and Backward across multiple threads. We can then sum all of the deltas from the threads when they are done and update the Graph, which is no longer borrowed.
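A sketch of that borrow pattern with scoped threads: the immutable borrow is shared across the threads, the deltas are summed, and only once every borrow has ended does the mutable update happen. The Model type and its toy gradient are illustrative stand-ins, not mli's API:

```rust
use std::thread;

// Share an immutable model across threads, compute a delta on each,
// sum them, then mutate the model once all borrows have ended.
struct Model { w: f64 }

impl Model {
    // &self only: can run concurrently from many threads.
    fn delta(&self, x: f64, target: f64, lr: f64) -> f64 {
        let err = self.w * x - target;
        -lr * err * x // gradient step, learning rate already incorporated
    }
    fn train(&mut self, delta: f64) { self.w += delta; }
}

fn main() {
    let mut model = Model { w: 0.0 };
    let batches = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)];
    let total: f64 = thread::scope(|s| {
        let model = &model; // immutable borrow shared by every thread
        let handles: Vec<_> = batches
            .iter()
            .map(|&(x, t)| s.spawn(move || model.delta(x, t, 0.01)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    });
    // The scope has ended, so the immutable borrow is gone and we
    // can mutate the model again.
    model.train(total);
    assert!((model.w - 0.28).abs() < 1e-12);
}
```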

Currently there are three main traits:

  • Forward
    • Implemented on anything that can go into a forward graph.
    • Outputs intermediary computations (to compute gradients later) and final output.
  • Backward
    • Implemented on anything that can go into a backward graph.
    • Propagates gradients backwards through the neural network.
    • This is provided the original input and any intermediary computations.
    • Propagates the change from the output to the input and trainable variables.
    • Takes &self and does not do the actual training, so it can be run in parallel.
  • Train
    • Implemented on anything that can go into a training graph.
    • Uses the change to update the trainable variables.
      • Change can be normalized across a mini-batch before being passed.
    • Implemented on the mutable version of the graph.
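The Train step described above can be sketched as follows: the deltas from a mini-batch are normalized (here, averaged) before a single mutable update. The Model type is an illustrative stand-in, not mli's API:

```rust
// Sketch of the Train side: normalize the per-sample deltas from a
// mini-batch, then apply them once via &mut.
struct Model { w: f64 }

impl Model {
    // Requires &mut: this is the only step that mutates the graph.
    fn train(&mut self, delta: f64) { self.w += delta; }
}

fn main() {
    let mut model = Model { w: 1.0 };
    let deltas = [0.3, -0.1, 0.1]; // one delta per sample in the batch
    let normalized = deltas.iter().sum::<f64>() / deltas.len() as f64;
    model.train(normalized);
    assert!((model.w - 1.1).abs() < 1e-12);
}
```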


Traits

  • Backward — This trait indicates support of backwards propagation.
  • Forward — This trait is for algorithms that have an input and produce an output.
  • Train — This trait is implemented on all operations that can be included in a trainable model.