pub struct MultiHeadAttention<const EMBED_DIM: usize, const NUM_HEADS: usize, const K_DIM: usize = EMBED_DIM, const V_DIM: usize = EMBED_DIM> {
    pub w_q: Linear<EMBED_DIM, K_DIM>,
    pub w_k: Linear<EMBED_DIM, K_DIM>,
    pub w_v: Linear<EMBED_DIM, V_DIM>,
    pub w_o: Linear<V_DIM, EMBED_DIM>,
}

Requires Nightly. A multi-head attention layer.

Generics:

  • EMBED_DIM: The size of query vectors.
  • NUM_HEADS: The number of heads to split query/key/value into.
  • Optional K_DIM: The size of key vectors. Defaults to EMBED_DIM.
  • Optional V_DIM: The size of value vectors. Defaults to EMBED_DIM.
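
To illustrate what splitting into heads means, here is a minimal sketch in plain Rust. It assumes each head receives a contiguous, equal-sized slice of the projected vector, which is the usual convention but is not confirmed by this page. The `split_heads` function is hypothetical and is not part of this crate.

```rust
/// Hypothetical helper (not part of this crate): splits a projected vector of
/// length `DIM` into `NUM_HEADS` contiguous slices of equal length.
///
/// Assumption: heads are contiguous chunks of size `DIM / NUM_HEADS`.
/// Panics if `NUM_HEADS` is zero or does not divide `DIM`.
pub fn split_heads<const DIM: usize, const NUM_HEADS: usize>(x: &[f32; DIM]) -> Vec<&[f32]> {
    assert!(
        NUM_HEADS > 0 && DIM % NUM_HEADS == 0,
        "DIM must be divisible by NUM_HEADS"
    );
    // Each chunk is the part of the vector that one head would see.
    x.chunks(DIM / NUM_HEADS).collect()
}
```

With `DIM = 8` and `NUM_HEADS = 2`, each head would see a slice of 4 elements.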

PyTorch equivalent: torch.nn.MultiheadAttention(EMBED_DIM, NUM_HEADS, batch_first=True)

Examples

  • MultiHeadAttention<8, 2> is an attention layer with 2 heads and token, key, and value dimensions of 8.
  • MultiHeadAttention<8, 2, 6, 4> is an attention layer whose key and value dimensions differ from the embed dimension.

TODO: Doctests fail for some reason
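
The way the optional generics determine the four projection shapes can be sketched with stand-in types. This is an illustration only: the `Linear` below is a simplified placeholder rather than this crate's layer, and the `zeros` and `shape` helpers are hypothetical names introduced for the example.

```rust
/// Stand-in for a dense layer mapping `I` inputs to `O` outputs.
/// Not this crate's `Linear`; only the shape matters for this sketch.
pub struct Linear<const I: usize, const O: usize> {
    pub weight: [[f32; I]; O],
    pub bias: [f32; O],
}

impl<const I: usize, const O: usize> Linear<I, O> {
    /// Hypothetical constructor: all parameters set to zero.
    pub fn zeros() -> Self {
        Self { weight: [[0.0; I]; O], bias: [0.0; O] }
    }

    /// Returns (input size, output size).
    pub fn shape(&self) -> (usize, usize) {
        (I, O)
    }
}

/// Mirrors the field layout shown above; `K_DIM` and `V_DIM` fall back to
/// `EMBED_DIM` when they are omitted.
pub struct MultiHeadAttention<
    const EMBED_DIM: usize,
    const NUM_HEADS: usize,
    const K_DIM: usize = EMBED_DIM,
    const V_DIM: usize = EMBED_DIM,
> {
    pub w_q: Linear<EMBED_DIM, K_DIM>,
    pub w_k: Linear<EMBED_DIM, K_DIM>,
    pub w_v: Linear<EMBED_DIM, V_DIM>,
    pub w_o: Linear<V_DIM, EMBED_DIM>,
}

impl<const E: usize, const H: usize, const K: usize, const V: usize>
    MultiHeadAttention<E, H, K, V>
{
    /// Hypothetical constructor: every projection zero-initialized.
    pub fn zeros() -> Self {
        Self {
            w_q: Linear::zeros(),
            w_k: Linear::zeros(),
            w_v: Linear::zeros(),
            w_o: Linear::zeros(),
        }
    }
}
```

Under these assumptions, a value annotated as `MultiHeadAttention<8, 2>` has four 8-to-8 projections, while `MultiHeadAttention<8, 2, 6, 4>` projects queries and keys to 6, values to 4, and the output from 4 back to 8.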

Fields

w_q: Linear<EMBED_DIM, K_DIM>
w_k: Linear<EMBED_DIM, K_DIM>
w_v: Linear<EMBED_DIM, V_DIM>
w_o: Linear<V_DIM, EMBED_DIM>

Trait Implementations

Updates self given the GradientProvider. When any parameters are NOT present in G, this function should add the tensor’s UniqueId to UnusedTensors.
Returns a copy of the value.
Performs copy-assignment from source.
Formats the value using the given formatter.
Returns the “default value” for a type.
Reads this object from a ZipArchive r with a base filename of filename_prefix.
Loads data from a .npz zip archive at the specified path.

Encoder-Decoder style self-attention where one set of tensors is used for values and keys, and another is used for queries.

The type that this unit produces given Input.
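
The query versus key/value split described above can be sketched for a single head in plain Rust. This is a simplified illustration using standard scaled dot-product attention, not this crate's implementation: it omits the learned projections and the multi-head split, and the `cross_attention` function is a hypothetical name.

```rust
/// Hypothetical single-head sketch (not this crate's implementation).
/// `queries` come from one sequence; `keys` and `values` come from another.
/// Returns one output vector per query, each with the length of a value vector.
pub fn cross_attention(
    queries: &[Vec<f32>],
    keys: &[Vec<f32>],
    values: &[Vec<f32>],
) -> Vec<Vec<f32>> {
    assert_eq!(keys.len(), values.len(), "one value vector per key");
    assert!(!keys.is_empty(), "need at least one key/value pair");
    let v_dim = values[0].len();

    queries
        .iter()
        .map(|q| {
            // Scale by sqrt of the query length, as in standard attention.
            let scale = (q.len() as f32).sqrt();
            // Scaled dot product of this query with every key.
            let scores: Vec<f32> = keys
                .iter()
                .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / scale)
                .collect();
            // Numerically stable softmax over the key axis.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let total: f32 = exps.iter().sum();
            // Weighted sum of the value vectors.
            let mut out = vec![0.0f32; v_dim];
            for (w, v) in exps.iter().zip(values) {
                for (o, x) in out.iter_mut().zip(v) {
                    *o += (w / total) * x;
                }
            }
            out
        })
        .collect()
}
```

Note that the number of outputs follows the query sequence, while the length of each output follows the value vectors. The key/value sequence may have a different length from the query sequence.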

Batched Encoder-Decoder style self-attention where one set of tensors is used for values and keys, and another is used for queries.

The type that this unit produces given Input.
Forward Input through the module and produce ModuleMut::Output.
Mutate the unit’s parameters using rand::Rng. Each implementor of this trait decides how the parameters are initialized. In fact, some impls may not even use the rng.
Write this object into ZipWriter w with a base filename of filename_prefix.
Save this object into the .npz file located at path.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.
Immutably borrows from an owned value.
Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning.
Uses borrowed data to replace owned data, usually by cloning.
The type returned in the event of a conversion error.
Performs the conversion.