pub struct TransformerDecoderConfig {
pub d_model: usize,
pub d_ff: usize,
pub n_heads: usize,
pub n_layers: usize,
pub dropout: f64,
pub norm_first: bool,
pub quiet_softmax: bool,
pub initializer: Initializer,
pub activation: ActivationConfig,
pub layer_norm_eps: f64,
}
Configuration to create a Transformer Decoder layer using the init function.
Fields
d_model: usize
The size of the model.

d_ff: usize
The size of the position-wise feed-forward network.

n_heads: usize
The number of attention heads.

n_layers: usize
The number of layers.

dropout: f64
The dropout rate. Default: 0.1

norm_first: bool
Layer norm will be applied first instead of after the other modules.

quiet_softmax: bool
Use "quiet softmax" instead of regular softmax (a small numerical sketch follows this list).
- Usage may improve performance by allowing attention heads to deposit no information (if the sequence contains no information relevant to that head).
- Usage may reduce the entropy of weights in the model, enhancing quantization and compression.
Reference: https://www.evanmiller.org/attention-is-off-by-one.html

initializer: Initializer
The type of function used to initialize neural network parameters.

activation: ActivationConfig
The activation function used in the position-wise feed-forward network. Default: Gelu

layer_norm_eps: f64
The epsilon value for layer normalization. Default: 1e-5
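For intuition, "quiet softmax" adds 1 to the softmax denominator, so when every logit is strongly negative all attention weights can shrink toward zero rather than being forced to sum to 1. The standalone functions below are a hypothetical scalar sketch to illustrate the difference; they are not part of this crate and are not how the option is implemented internally:

// Hypothetical illustration of "quiet softmax" vs. regular softmax.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let denom: f64 = logits.iter().map(|x| x.exp()).sum();
    logits.iter().map(|x| x.exp() / denom).collect()
}

fn quiet_softmax(logits: &[f64]) -> Vec<f64> {
    // The extra 1.0 in the denominator lets every weight approach 0
    // when all logits are strongly negative.
    let denom: f64 = 1.0 + logits.iter().map(|x| x.exp()).sum::<f64>();
    logits.iter().map(|x| x.exp() / denom).collect()
}

fn main() {
    let logits = [-10.0_f64, -12.0, -11.0];
    println!("{:?}", softmax(&logits));       // still sums to 1, even if nothing is relevant
    println!("{:?}", quiet_softmax(&logits)); // every weight is near 0
}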
Implementations
impl TransformerDecoderConfig

pub fn new(
    d_model: usize,
    d_ff: usize,
    n_heads: usize,
    n_layers: usize,
) -> TransformerDecoderConfig
Create a new instance of the config.
Arguments

Required Arguments

d_model
The size of the model.

d_ff
The size of the position-wise feed-forward network.

n_heads
The number of attention heads.

n_layers
The number of layers.

Default Arguments

dropout
The dropout rate.
- Defaults to 0.1

norm_first
Layer norm will be applied first instead of after the other modules.
- Defaults to false

quiet_softmax
Use "quiet softmax" instead of regular softmax (see the field documentation above).
- Defaults to false

initializer
The type of function used to initialize neural network parameters.
- Defaults to Initializer::KaimingUniform { gain: 1.0 / num_traits::Float::sqrt(3.0), fan_out_only: false }

activation
The activation function used in the position-wise feed-forward network.
- Defaults to ActivationConfig::Gelu

layer_norm_eps
The epsilon value for layer normalization.
- Defaults to 1e-5
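For example, a config with illustrative sizes (d_model of 512, d_ff of 2048, 8 heads, 6 layers); the defaults listed above apply to everything else:

let config = TransformerDecoderConfig::new(512, 2048, 8, 6);
// config.dropout == 0.1, config.norm_first == false,
// config.quiet_softmax == false, config.layer_norm_eps == 1e-5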
impl TransformerDecoderConfig

pub fn with_dropout(self, dropout: f64) -> TransformerDecoderConfig
Sets the value for the field dropout.
- Defaults to 0.1

pub fn with_norm_first(self, norm_first: bool) -> TransformerDecoderConfig
Sets the value for the field norm_first.
Layer norm will be applied first instead of after the other modules.
- Defaults to false

pub fn with_quiet_softmax(self, quiet_softmax: bool) -> TransformerDecoderConfig
Sets the value for the field quiet_softmax.
Use "quiet softmax" instead of regular softmax (see the field documentation above).
- Defaults to false

pub fn with_initializer(self, initializer: Initializer) -> TransformerDecoderConfig
Sets the value for the field initializer.
The type of function used to initialize neural network parameters.
- Defaults to Initializer::KaimingUniform { gain: 1.0 / num_traits::Float::sqrt(3.0), fan_out_only: false }

pub fn with_activation(self, activation: ActivationConfig) -> TransformerDecoderConfig
Sets the value for the field activation.
The activation function used in the position-wise feed-forward network.
- Defaults to ActivationConfig::Gelu

pub fn with_layer_norm_eps(self, layer_norm_eps: f64) -> TransformerDecoderConfig
Sets the value for the field layer_norm_eps.
The epsilon value for layer normalization.
- Defaults to 1e-5
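Because each with_* setter takes self by value and returns the updated config, the calls chain; the sizes and overrides below are illustrative:

let config = TransformerDecoderConfig::new(512, 2048, 8, 6)
    .with_dropout(0.2)
    .with_norm_first(true) // pre-norm instead of post-norm
    .with_quiet_softmax(true)
    .with_layer_norm_eps(1e-6);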
impl TransformerDecoderConfig

pub fn init<B>(
    &self,
    device: &<B as BackendTypes>::Device,
) -> TransformerDecoder<B>
where
    B: Backend,

Initialize a new Transformer Decoder module.
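A minimal sketch of initialization, assuming this type lives at burn::nn::transformer and the NdArray backend is available; any type implementing Backend works the same way:

use burn::backend::NdArray;
use burn::nn::transformer::{TransformerDecoder, TransformerDecoderConfig};

fn build() -> TransformerDecoder<NdArray> {
    // The device type is inferred from the chosen backend;
    // NdArray's device implements Default.
    let device = Default::default();
    TransformerDecoderConfig::new(512, 2048, 8, 6).init::<NdArray>(&device)
}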
Trait Implementations
impl Clone for TransformerDecoderConfig

fn clone(&self) -> TransformerDecoderConfig
Returns a copy of the value.

fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.