pub struct MaskedLanguageConfig {
    pub model_type: ModelType,
    pub model_resource: Box<dyn ResourceProvider + Send>,
    pub config_resource: Box<dyn ResourceProvider + Send>,
    pub vocab_resource: Box<dyn ResourceProvider + Send>,
    pub merges_resource: Option<Box<dyn ResourceProvider + Send>>,
    pub lower_case: bool,
    pub strip_accents: Option<bool>,
    pub add_prefix_space: Option<bool>,
    pub mask_token: Option<String>,
    pub device: Device,
}

Configuration for MaskedLanguageModel

Contains information regarding the model to load and device to place the model on.

Fields§

§model_type: ModelType

Model type

§model_resource: Box<dyn ResourceProvider + Send>

Model weights resource (default: pretrained BERT model on CoNLL)

§config_resource: Box<dyn ResourceProvider + Send>

Config resource (default: pretrained BERT model on CoNLL)

§vocab_resource: Box<dyn ResourceProvider + Send>

Vocab resource (default: pretrained BERT model on CoNLL)

§merges_resource: Option<Box<dyn ResourceProvider + Send>>

Merges resource (default: None)

§lower_case: bool

Automatically lower case all input upon tokenization (assumes a lower-cased model)

§strip_accents: Option<bool>

Flag indicating if the tokenizer should strip accents (normalization). Only used for BERT / ALBERT models

§add_prefix_space: Option<bool>

Flag indicating whether the tokenizer should add a whitespace character before each tokenized input (needed for some Roberta models)

§mask_token: Option<String>

Token used to mask words in the input. The model tries to predict the word hidden behind each occurrence of this token.

§device: Device

Device to place the model on (default: CUDA/GPU when available)
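To make the role of the mask_token field concrete, the following minimal sketch shows how an input sentence might be prepared before it is passed to the model. The helper `mask_first_word` is hypothetical and not part of this crate, and the `<mask>` string used in the usage note is only an assumed value; the token must match whatever the loaded model's vocabulary expects.

```rust
/// Hypothetical helper (not part of the crate): replaces the first
/// whitespace-delimited word equal to `target` with `mask_token`.
/// Returns `None` when `target` does not occur in `text`.
/// Note: words are re-joined with single spaces, so runs of whitespace
/// in the input are normalized.
fn mask_first_word(text: &str, target: &str, mask_token: &str) -> Option<String> {
    let mut replaced = false;
    let words: Vec<&str> = text
        .split_whitespace()
        .map(|word| {
            if !replaced && word == target {
                // Only the first match is masked; later matches are kept.
                replaced = true;
                mask_token
            } else {
                word
            }
        })
        .collect();

    if replaced {
        Some(words.join(" "))
    } else {
        None
    }
}
```

Under these assumptions, `mask_first_word("Paris is the capital of France", "capital", "<mask>")` yields `Some("Paris is the <mask> of France")`, which is the kind of input for which the model is asked to predict the masked word.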

Implementations§

Instantiate a new masked language configuration of the supplied type.

Arguments
  • model_type - ModelType indicating the model type to load (must match with the actual data to be loaded!)
  • model_resource - The ResourceProvider pointing to the model to load (e.g. model.ot)
  • config - The ResourceProvider pointing to the model configuration to load (e.g. config.json)
  • vocab - The ResourceProvider pointing to the tokenizer’s vocabulary to load (e.g. vocab.txt/vocab.json)
  • merges - An optional ResourceProvider pointing to the tokenizer’s merges file to load (e.g. merges.txt), needed only for Roberta.
  • lower_case - A bool indicating whether the tokenizer should lower case all input (in case of a lower-cased model)
  • mask_token - The token used to mask words in the input; the model predicts the word hidden behind it.
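
The argument list above can be sketched with simplified stand-in types. Everything in this block is an assumption made for illustration: `SketchModelType` and `SketchConfig` are hypothetical names, resources are plain paths rather than ResourceProvider values, and the check that Roberta needs a merges file is added here only to reflect the note in the argument list. The crate's own constructor may not validate this and may take further arguments.

```rust
use std::path::PathBuf;

/// Stand-in for the crate's model type enum; only two variants are sketched.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchModelType {
    Bert,
    Roberta,
}

/// Simplified mirror of the configuration, using file paths as resources.
#[derive(Debug, Clone)]
pub struct SketchConfig {
    pub model_type: SketchModelType,
    pub model: PathBuf,
    pub config: PathBuf,
    pub vocab: PathBuf,
    pub merges: Option<PathBuf>,
    pub lower_case: bool,
    pub mask_token: Option<String>,
}

impl SketchConfig {
    /// Arguments follow the documented order: model type, model weights,
    /// configuration, vocabulary, optional merges, lower-casing flag, mask token.
    pub fn new(
        model_type: SketchModelType,
        model: &str,
        config: &str,
        vocab: &str,
        merges: Option<&str>,
        lower_case: bool,
        mask_token: Option<&str>,
    ) -> Result<Self, String> {
        // Illustrative check only: the merges file is documented as needed for Roberta.
        if model_type == SketchModelType::Roberta && merges.is_none() {
            return Err(String::from("a merges file is required for Roberta"));
        }
        Ok(Self {
            model_type,
            model: PathBuf::from(model),
            config: PathBuf::from(config),
            vocab: PathBuf::from(vocab),
            merges: merges.map(PathBuf::from),
            lower_case,
            mask_token: mask_token.map(String::from),
        })
    }
}
```

In this sketch a BERT-style configuration passes `None` for the merges file, while a Roberta-style one must supply it (for example `Some("merges.txt")`); the file names are the examples given in the argument list, not required values.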

Trait Implementations§

Provides a BERT language model
