| The TT-layer serves as a low-rank decomposition
| of a fully connected layer.
|
| The inputs are the same as those of a fully
| connected layer, but the number of parameters
| is greatly reduced, and the forward computation
| time can be drastically lower, especially
| for layers with large weight matrices.
|
| The multiplication is computed as a
| product of the input vector with each
| of the cores that make up the TT layer.
|
| Given the input sizes (inp_sizes), the
| output sizes (out_sizes), and the ranks
| of each of the cores (tt_ranks), the
| i-th core will have size:
|
| inp_sizes[i] * tt_ranks[i] *
| tt_ranks[i + 1] * out_sizes[i].
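| As a concrete illustration, the core-by-core multiplication
| described above can be sketched in NumPy as follows. The
| function name tt_matvec and the core layout
| (tt_ranks[i], inp_sizes[i], out_sizes[i], tt_ranks[i + 1])
| are assumptions for this sketch, not the API of any
| particular implementation.

```python
import numpy as np

def tt_matvec(cores, x, inp_sizes, out_sizes):
    # Sketch of a TT-layer forward pass. Assumed core layout:
    # cores[k] has shape (tt_ranks[k], inp_sizes[k],
    # out_sizes[k], tt_ranks[k + 1]), with boundary ranks 1.
    # A always has shape (output modes done, rank, input modes left).
    A = x.reshape(1, 1, -1)
    for k, core in enumerate(cores):
        m_done, r, n_rest = A.shape
        # Expose the k-th input mode as its own axis.
        A = A.reshape(m_done, r, inp_sizes[k], n_rest // inp_sizes[k])
        # Contract the rank index and the k-th input mode with core k,
        # producing the k-th output mode and the next rank index.
        A = np.einsum('arnb,rnms->ambs', A, core)
        a, m, b, s = A.shape
        A = A.transpose(0, 1, 3, 2).reshape(a * m, s, b)
    return A.reshape(-1)
```

| Each core is touched exactly once, so the cost scales with
| the core sizes rather than with the full weight matrix.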
|
| The complexity of the computation is
| dictated by inp_sizes, out_sizes, and
| tt_ranks, which control the trade-off
| between the accuracy of the low-rank
| decomposition and the speed of the
| computation.
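| To make the parameter saving concrete, here is a small
| count for hypothetical sizes (a 1024 x 1024 dense layer
| factored into four cores; the specific sizes and ranks
| below are made up for illustration):

```python
# Hypothetical factorization of a 1024 x 1024 dense layer.
inp_sizes = [4, 8, 8, 4]    # product = 1024 inputs
out_sizes = [4, 8, 8, 4]    # product = 1024 outputs
tt_ranks = [1, 4, 4, 4, 1]  # boundary ranks are always 1

dense_params = 1024 * 1024  # 1,048,576 weights in the dense layer
# Core i stores inp_sizes[i] * tt_ranks[i] * tt_ranks[i + 1] * out_sizes[i]
# parameters, per the size formula above.
tt_params = sum(n * r0 * r1 * m
                for n, m, r0, r1 in zip(inp_sizes, out_sizes,
                                        tt_ranks[:-1], tt_ranks[1:]))
print(dense_params, tt_params)  # the TT form stores only 2,176 parameters
```

| Raising the ranks in tt_ranks increases both the parameter
| count and the accuracy of the approximation.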
|