pub type TransformerEncoderBlock<const M: usize, const I: usize, const K: usize, const H: usize> = (Residual<MultiHeadAttention<M, M, K, M, H>>, LayerNorm1D<M>, Residual<(Linear<M, I>, ReLU, Linear<I, M>)>, LayerNorm1D<M>);

Requires Nightly. A single transformer encoder block.

Generics

  • M The embedding size of token vectors.
  • I The inner size of the feedforward layers.
  • K The size of the keys and queries in the self attention layer.
  • H The number of heads for self attention.

TODO: Doctests
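The layer order in the type alias (residual attention, layer norm, residual feedforward, layer norm) can be illustrated with a minimal plain-Rust sketch. This is not the library's implementation or API: `layer_norm`, `linear`, `add`, and `encoder_block` are hypothetical helpers, the linear layers omit biases, the layer norm has no learned gain or bias, and the attention layer is a caller-supplied placeholder, so `K` and `H` do not appear.

```rust
/// Normalizes one token vector to zero mean and unit variance.
/// Assumption: no learned gain/bias, epsilon fixed at 1e-5.
fn layer_norm<const M: usize>(x: [f32; M]) -> [f32; M] {
    let mean = x.iter().sum::<f32>() / M as f32;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / M as f32;
    let inv_std = 1.0 / (var + 1e-5).sqrt();
    x.map(|v| (v - mean) * inv_std)
}

/// Bias-free dense layer: y = W x, with W stored as OUT rows of IN weights.
fn linear<const IN: usize, const OUT: usize>(
    w: &[[f32; IN]; OUT],
    x: &[f32; IN],
) -> [f32; OUT] {
    let mut y = [0.0; OUT];
    for (o, row) in w.iter().enumerate() {
        y[o] = row.iter().zip(x.iter()).map(|(a, b)| a * b).sum();
    }
    y
}

/// Element-wise sum, used for the residual connections.
fn add<const M: usize>(a: [f32; M], b: [f32; M]) -> [f32; M] {
    let mut out = a;
    for i in 0..M {
        out[i] += b[i];
    }
    out
}

/// Applies the four stages of the block to a sequence of token vectors.
/// `attention` stands in for the self-attention layer (hypothetical placeholder).
fn encoder_block<const M: usize, const I: usize>(
    tokens: &[[f32; M]],
    attention: impl Fn(&[[f32; M]]) -> Vec<[f32; M]>,
    w1: &[[f32; M]; I],
    w2: &[[f32; I]; M],
) -> Vec<[f32; M]> {
    // Stages 1 and 2: residual around attention, then layer norm.
    let attended = attention(tokens);
    let normed: Vec<[f32; M]> = tokens
        .iter()
        .zip(attended.iter())
        .map(|(x, a)| layer_norm(add(*x, *a)))
        .collect();

    // Stages 3 and 4: residual around Linear -> ReLU -> Linear, then layer norm.
    normed
        .iter()
        .map(|x| {
            let hidden = linear(w1, x).map(|v| v.max(0.0));
            layer_norm(add(*x, linear(w2, &hidden)))
        })
        .collect()
}
```

Calling `encoder_block::<4, 8>(&tokens, attention, &w1, &w2)` corresponds to `M = 4` and `I = 8`. Each output vector has the same length `M` as its input, which is what allows blocks of this shape to be stacked.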