Struct dfdx::nn::TransformerDecoderBlock
pub struct TransformerDecoderBlock<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> {
pub self_attn: MultiHeadAttention<MODEL_DIM, NUM_HEADS>,
pub norm1: LayerNorm1D<MODEL_DIM>,
pub mh_attn: MultiHeadAttention<MODEL_DIM, NUM_HEADS>,
pub norm2: LayerNorm1D<MODEL_DIM>,
pub ff: Residual<(Linear<MODEL_DIM, FF_DIM>, ReLU, Linear<FF_DIM, MODEL_DIM>)>,
pub norm3: LayerNorm1D<MODEL_DIM>,
}
Requires Nightly.

A transformer decoder block. Differs from the standard transformer block in that it contains a second multi-head attention (mh_attn) that attends over an additional sequence produced by the encoder.
Generics
- MODEL_DIM: The size of query/key/value tensors. Given to MultiHeadAttention.
- NUM_HEADS: The number of heads in MultiHeadAttention.
- FF_DIM: The size of the hidden layer in the feedforward network.
Pytorch equivalent:

    decoder = torch.nn.TransformerDecoderLayer(
        EMBED_DIM, NUM_HEADS, dim_feedforward=FF_DIM, batch_first=True, dropout=0.0
    )
TODO: Doctests
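Until doctests land, here is a self-contained sketch of the data flow the fields above imply (post-norm, matching torch.nn.TransformerDecoderLayer with its default norm_first=False). The attention and feed-forward layers are stubbed with placeholder functions; only the residual/LayerNorm wiring reflects the block's structure, and none of these names are dfdx API.

```rust
// Hypothetical sketch of TransformerDecoderBlock's forward pass over a single
// feature vector. attn_stub/ff_stub are placeholders, NOT dfdx's layers.

fn layer_norm(x: &[f32]) -> Vec<f32> {
    // Normalize to zero mean / unit variance, as LayerNorm1D does over the
    // feature dimension.
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    x.iter().map(|v| (v - mean) / (var + 1e-5).sqrt()).collect()
}

fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

// Placeholder standing in for MultiHeadAttention::forward((q, k, v)).
fn attn_stub(q: &[f32], _k: &[f32], _v: &[f32]) -> Vec<f32> {
    q.to_vec()
}

// Placeholder standing in for the (Linear, ReLU, Linear) feed-forward network.
fn ff_stub(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.max(0.0)).collect()
}

fn decoder_block_forward(tgt: &[f32], mem: &[f32]) -> Vec<f32> {
    // 1. self_attn over tgt, residual connection, then norm1.
    let x = layer_norm(&add(tgt, &attn_stub(tgt, tgt, tgt)));
    // 2. mh_attn (cross-attention): queries from x, keys/values from the
    //    encoder output `mem`; residual, then norm2.
    let x = layer_norm(&add(&x, &attn_stub(&x, mem, mem)));
    // 3. `ff` is declared as a Residual in the struct, so add, then norm3.
    layer_norm(&add(&x, &ff_stub(&x)))
}
```

The output has the same length as the target input, and each LayerNorm leaves its output with (near-)zero mean.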
Fields
self_attn: MultiHeadAttention<MODEL_DIM, NUM_HEADS>
norm1: LayerNorm1D<MODEL_DIM>
mh_attn: MultiHeadAttention<MODEL_DIM, NUM_HEADS>
norm2: LayerNorm1D<MODEL_DIM>
ff: Residual<(Linear<MODEL_DIM, FF_DIM>, ReLU, Linear<FF_DIM, MODEL_DIM>)>
norm3: LayerNorm1D<MODEL_DIM>
Trait Implementations
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> CanUpdateWithGradients for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>

fn update<G: GradientProvider>(
    &mut self,
    grads: &mut G,
    unused: &mut UnusedTensors
)
Updates self given the GradientProvider. When any parameters are NOT present in G, this function should add the tensor's UniqueId to UnusedTensors. Read more
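To make the update contract concrete, here is a minimal standalone sketch. The types below are illustrative stand-ins, not dfdx's actual Gradients/UnusedTensors: a parameter whose gradient is missing from the provider records its id instead of being silently skipped.

```rust
use std::collections::HashMap;

type UniqueId = u64;

// Stand-in for a GradientProvider: maps parameter ids to gradients.
struct Gradients(HashMap<UniqueId, Vec<f32>>);

// Stand-in for UnusedTensors: collects ids with no gradient present.
#[derive(Default)]
struct UnusedTensors {
    ids: Vec<UniqueId>,
}

struct Param {
    id: UniqueId,
    data: Vec<f32>,
}

impl Param {
    fn update(&mut self, grads: &Gradients, unused: &mut UnusedTensors) {
        match grads.0.get(&self.id) {
            Some(g) => {
                // Gradient present: apply a simple SGD step.
                for (p, g) in self.data.iter_mut().zip(g) {
                    *p -= 0.1 * g;
                }
            }
            // Gradient missing: record the id rather than ignore it.
            None => unused.ids.push(self.id),
        }
    }
}
```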
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> Clone for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
fn clone(&self) -> TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
Returns a copy of the value. Read more
fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> Debug for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> Default for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>

fn default() -> TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
Returns the “default value” for a type. Read more
impl<const M: usize, const H: usize, const F: usize> LoadFromNpz for TransformerDecoderBlock<M, H, F>
impl<const M: usize, const H: usize, const F: usize, Tgt, Mem> Module<(Tgt, Mem)> for TransformerDecoderBlock<M, H, F>
where
    Tgt: Tensor<Dtype = f32>,
    Mem: Tensor<Dtype = f32, NoTape = Mem> + Clone,
    MultiHeadAttention<M, H>: Module<(Tgt, Tgt::NoTape, Tgt::NoTape), Output = Tgt> + Module<(Tgt, Mem, Mem), Output = Tgt>,
    LayerNorm1D<M>: Module<Tgt, Output = Tgt>,
    Residual<(Linear<M, F>, ReLU, Linear<F, M>)>: Module<Tgt, Output = Tgt>,
type Output = Tgt

The type that this unit produces given Input.

fn forward(&self, (tgt, mem): (Tgt, Mem)) -> Self::Output
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> ResetParams for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
Auto Trait Implementations
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> RefUnwindSafe for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> Send for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> Sync for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> Unpin for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
impl<const MODEL_DIM: usize, const NUM_HEADS: usize, const FF_DIM: usize> UnwindSafe for TransformerDecoderBlock<MODEL_DIM, NUM_HEADS, FF_DIM>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value. Read more