pub struct XLNetLMHeadModel { /* private fields */ }

XLNetLMHeadModel

XLNet model with a language model head for language generation tasks. It is made of the following blocks:

  • base_model: XLNetModel
  • lm_head: Linear language modeling head, projecting the hidden states to logits over the vocabulary

Implementations

Build a new XLNetLMHeadModel

Arguments
  • p - Variable store path for the root of the XLNet model
  • config - XLNetConfig object defining the model architecture
Example
use rust_bert::xlnet::{XLNetConfig, XLNetLMHeadModel};
use rust_bert::Config;
use std::path::Path;
use tch::{nn, Device};

let config_path = Path::new("path/to/config.json");
let device = Device::Cpu;
let p = nn::VarStore::new(device);
// Build the architecture from the configuration file; the variable store
// holds the (randomly initialized) model weights.
let config = XLNetConfig::from_file(config_path);
let xlnet_model = XLNetLMHeadModel::new(&p.root(), &config);
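
The variable store created above holds randomly initialized weights. Before inference they would typically be populated from a converted checkpoint; a minimal sketch, assuming `p` is declared mutable and a tch-compatible weight file exists at an illustrative path:

// Continuing the example above, with `let mut p = nn::VarStore::new(device);`.
// "path/to/model.ot" is a placeholder for a checkpoint converted for tch,
// not a file shipped with the crate.
p.load(Path::new("path/to/model.ot")).expect("failed to load weights");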

Forward pass through the model

Arguments
  • input_ids - Optional input tensor of shape (batch size, sequence_length). This or input_embeds must be provided.
  • attention_mask - Optional attention mask of shape (batch size, sequence_length) for the encoder positions. Positions with a value of 0 will be masked.
  • perm_mask - Optional tensor of shape (batch size, sequence_length, sequence_length). Mask to indicate the attention pattern for each input token (only used for pre-training over permutations, rather than simple token masking).
  • target_mapping - Optional tensor of shape (batch size, num_tokens, sequence_length) indicating the position of the masked words to predict.
  • token_type_ids - Optional tensor (batch size, sequence_length) indicating the sentence ID of the token (0: first sentence, 1: second sentence).
  • input_embeds - Optional input tensor of shape (batch size, sequence_length, embeddings dimension). This or input_ids must be provided.
  • old_layer_states - Optional vector of length num_layers of optional LayerStates holding the last computed content for the attention layers. This avoids recomputing attention values for past positions and speeds up decoding.
  • train - boolean flag to turn on/off the dropout layers in the model. Should be set to false for inference.
Returns
  • LMModelOutput containing:
    • lm_logits - Tensor of shape (batch size, sequence_length, vocab_size) representing the logits for each vocab item and position
    • cache - XLNetCache made of Option<Vec<Option<LayerState>>> of length n_layers and shape (past_sequence_length, batch size, hidden_size) containing the previous content
    • encoder_hidden_states - always None for XLNet (no encoder-decoder cross-attention)
    • all_hidden_states - Option<Vec<Tensor>> of length n_layers with shape (batch size, sequence_length, hidden_size)
    • all_attentions - Option<Vec<Tensor>> of length n_layers containing the attention weights for each layer, of shape (batch size, num_attention_heads, sequence_length, sequence_length)
Example
use rust_bert::xlnet::{XLNetConfig, XLNetLMHeadModel};
use rust_bert::Config;
use std::path::Path;
use tch::{nn, no_grad, Device, Kind, Tensor};

let device = Device::Cpu;
let vs = nn::VarStore::new(device);
let config = XLNetConfig::from_file(Path::new("path/to/config.json"));
let xlnet_model = XLNetLMHeadModel::new(&vs.root(), &config);

let (batch_size, sequence_length) = (64, 128);
// Random token indices standing in for tokenized input. Tensor::rand only
// supports floating-point kinds, so integer indices are drawn with randint.
let input_tensor = Tensor::randint(config.vocab_size, &[batch_size, sequence_length], (Kind::Int64, device));
let attention_mask = Tensor::ones(&[batch_size, sequence_length], (Kind::Int64, device));
// Predict a single target token: mark position 3 of each sequence.
let target_mapping = Tensor::zeros(&[batch_size, 1, sequence_length], (Kind::Float, device));
let _ = target_mapping.narrow(2, 3, 1).fill_(1.0);

let model_output = no_grad(|| {
    xlnet_model.forward_t(
        Some(&input_tensor),
        Some(&attention_mask),
        None, // perm_mask
        Some(&target_mapping),
        None, // token_type_ids
        None, // input_embeds
        None, // old_layer_states
        false, // train
    )
});
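
With the single-token target_mapping above, lm_logits carries one set of vocabulary scores per target position, so a greedy prediction is the argmax over the last dimension. A minimal sketch of reading it out, assuming model_output is the LMModelOutput described above (unwrap it first if your rust-bert version returns a Result):

// lm_logits: (batch size, num_target_tokens, vocab_size)
// Greedy choice: highest-scoring vocabulary index for each target position.
let predicted_ids = model_output.lm_logits.argmax(-1, false);
// predicted_ids: (batch size, num_target_tokens) token indices that a
// tokenizer can decode back into text.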

Trait Implementations

Implementation of the forward pass for the LMHeadModel trait. The arguments, shapes and behavior are identical to forward_t documented above; the returned LMModelOutput exposes the lm_logits and cache fields described there.

Generate text based on a vector of prompt texts.

Generate token indices without decoding (useful for token-level operations before returning final text, or as a validation step during training).

Generate token indices given a list of indices (useful when the input has been pre-tokenized). Returns a list of output tokens that need to be decoded using a tokenizer.

Returns a reference to the text generator's tokenizer.
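
In practice these trait methods are driven through rust-bert's higher-level text-generation pipeline rather than called directly on the struct. A minimal sketch, assuming the TextGenerationModel pipeline and the anyhow crate; note the default configuration targets GPT-2 in most releases, and pointing it at XLNet requires a TextGenerationConfig with ModelType::XLNet plus XLNet model, config and vocab resources, whose exact field types vary across versions:

use rust_bert::pipelines::text_generation::TextGenerationModel;

fn main() -> anyhow::Result<()> {
    // Build the pipeline from its default configuration (see the caveat
    // above about swapping in XLNet resources via TextGenerationConfig).
    let model = TextGenerationModel::new(Default::default())?;
    // Generate a continuation for each prompt in the slice.
    let output = model.generate(&["The dog"], None);
    // Debug-print the generated continuations.
    println!("{:?}", output);
    Ok(())
}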
