Struct rust_bert::pipelines::masked_language::MaskedLanguageConfig

pub struct MaskedLanguageConfig {
    pub model_type: ModelType,
    pub model_resource: Box<dyn ResourceProvider + Send>,
    pub config_resource: Box<dyn ResourceProvider + Send>,
    pub vocab_resource: Box<dyn ResourceProvider + Send>,
    pub merges_resource: Option<Box<dyn ResourceProvider + Send>>,
    pub lower_case: bool,
    pub strip_accents: Option<bool>,
    pub add_prefix_space: Option<bool>,
    pub mask_token: Option<String>,
    pub device: Device,
}

Configuration for MaskedLanguageModel

Contains information regarding the model to load and device to place the model on.

Fields§

§model_type: ModelType

Model type

§model_resource: Box<dyn ResourceProvider + Send>

Model weights resource (default: pretrained BERT model on CoNLL)

§config_resource: Box<dyn ResourceProvider + Send>

Config resource (default: pretrained BERT model on CoNLL)

§vocab_resource: Box<dyn ResourceProvider + Send>

Vocab resource (default: pretrained BERT model on CoNLL)

§merges_resource: Option<Box<dyn ResourceProvider + Send>>

Merges resource (default: None)

§lower_case: bool

Automatically lower case all input upon tokenization (assumes a lower-cased model)

§strip_accents: Option<bool>

Flag indicating if the tokenizer should strip accents (normalization). Only used for BERT / ALBERT models

§add_prefix_space: Option<bool>

Flag indicating if the tokenizer should add a space before each tokenized input (needed for some RoBERTa models)

§mask_token: Option<String>

Token used for masking words. This is the token which the model will try to predict.

§device: Device

Device to place the model on (default: CUDA/GPU when available)
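The resource fields are stored as boxed trait objects with a `Send` bound, which suggests a configuration can be built on one thread and handed to another. The following is a minimal std-only sketch of that pattern, not the crate's implementation: the `ResourceProvider` trait here is a simplified hypothetical stand-in (the real trait has a different, fallible interface), and `LocalFile`, `SketchConfig` and `paths_on_worker` are names invented for illustration.

```rust
use std::path::PathBuf;
use std::thread;

/// Hypothetical, simplified stand-in for rust_bert's `ResourceProvider` trait.
trait ResourceProvider {
    fn get_local_path(&self) -> PathBuf;
}

/// Hypothetical resource type that points at a file on disk.
struct LocalFile {
    path: PathBuf,
}

impl ResourceProvider for LocalFile {
    fn get_local_path(&self) -> PathBuf {
        self.path.clone()
    }
}

/// Illustrative subset of the fields, using the same boxed types as the struct above.
struct SketchConfig {
    model_resource: Box<dyn ResourceProvider + Send>,
    vocab_resource: Box<dyn ResourceProvider + Send>,
    merges_resource: Option<Box<dyn ResourceProvider + Send>>,
}

impl SketchConfig {
    /// Accepts concrete resource types and erases them into boxed trait objects.
    /// As in the documented signature, the merges resource shares the type
    /// parameter `RV` with the vocab resource, so a bare `None` can be inferred.
    fn new<RM, RV>(
        model_resource: RM,
        vocab_resource: RV,
        merges_resource: Option<RV>,
    ) -> SketchConfig
    where
        RM: ResourceProvider + Send + 'static,
        RV: ResourceProvider + Send + 'static,
    {
        SketchConfig {
            model_resource: Box::new(model_resource),
            vocab_resource: Box::new(vocab_resource),
            merges_resource: merges_resource
                .map(|r| Box::new(r) as Box<dyn ResourceProvider + Send>),
        }
    }
}

/// Moves the whole configuration into a worker thread; this compiles only
/// because every boxed field is `Send`.
fn paths_on_worker(config: SketchConfig) -> (PathBuf, PathBuf, bool) {
    thread::spawn(move || {
        (
            config.model_resource.get_local_path(),
            config.vocab_resource.get_local_path(),
            config.merges_resource.is_some(),
        )
    })
    .join()
    .expect("worker thread panicked")
}

fn main() {
    let config = SketchConfig::new(
        LocalFile { path: PathBuf::from("model.ot") },
        LocalFile { path: PathBuf::from("vocab.txt") },
        None,
    );
    let (model, vocab, has_merges) = paths_on_worker(config);
    println!("{} {} {}", model.display(), vocab.display(), has_merges);
}
```

Dropping the `+ Send` bound from the boxed fields in this sketch makes the `thread::spawn` call fail to compile, which is the property the bound is there to guarantee.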

Implementations§

impl MaskedLanguageConfig

pub fn new<RM, RC, RV>(
 model_type: ModelType,
 model_resource: RM,
 config_resource: RC,
 vocab_resource: RV,
 merges_resource: Option<RV>,
 lower_case: bool,
 strip_accents: impl Into<Option<bool>>,
 add_prefix_space: impl Into<Option<bool>>,
 mask_token: impl Into<Option<String>>
) -> MaskedLanguageConfig
where
 RM: ResourceProvider + Send + 'static,
 RC: ResourceProvider + Send + 'static,
 RV: ResourceProvider + Send + 'static,

Instantiate a new masked language configuration of the supplied type.

Arguments

model_type - ModelType indicating the model type to load (must match the actual data to be loaded!)
model_resource - The ResourceProvider pointing to the model weights to load (e.g. model.ot)
config_resource - The ResourceProvider pointing to the model configuration to load (e.g. config.json)
vocab_resource - The ResourceProvider pointing to the tokenizer’s vocabulary to load (e.g. vocab.txt/vocab.json)
merges_resource - An optional ResourceProvider pointing to the tokenizer’s merges file to load (e.g. merges.txt), needed only for RoBERTa
lower_case - A bool indicating whether the tokenizer should lower case all input (in case of a lower-cased model)
strip_accents - An optional bool indicating whether the tokenizer should strip accents (only used for BERT / ALBERT models)
add_prefix_space - An optional bool indicating whether the tokenizer should add a space before each tokenized input (needed for some RoBERTa models)
mask_token - An optional token used for masking words; this is the token the model will try to predict
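The last three parameters are declared as `impl Into<Option<T>>`, so a caller can pass a bare value, a `Some(..)`, or `None`. Below is a small std-only sketch of how such parameters behave; `TokenizerOptions` is a hypothetical stand-in that mirrors only the tokenizer-related fields, and the `"[MASK]"` string is an illustrative value rather than a documented default.

```rust
/// Hypothetical stand-in mirroring the tokenizer-related fields of the config.
#[derive(Debug, PartialEq)]
struct TokenizerOptions {
    lower_case: bool,
    strip_accents: Option<bool>,
    add_prefix_space: Option<bool>,
    mask_token: Option<String>,
}

impl TokenizerOptions {
    /// Same parameter shapes as the documented constructor for these arguments.
    fn new(
        lower_case: bool,
        strip_accents: impl Into<Option<bool>>,
        add_prefix_space: impl Into<Option<bool>>,
        mask_token: impl Into<Option<String>>,
    ) -> TokenizerOptions {
        TokenizerOptions {
            lower_case,
            // `.into()` wraps a bare value in `Some` and passes an `Option` through unchanged.
            strip_accents: strip_accents.into(),
            add_prefix_space: add_prefix_space.into(),
            mask_token: mask_token.into(),
        }
    }
}

fn main() {
    // `None` leaves the choice unset, a bare `false` becomes `Some(false)`,
    // and a bare `String` becomes `Some(String)`.
    let opts = TokenizerOptions::new(true, None, false, String::from("[MASK]"));
    println!("{:?}", opts);

    // Passing explicit `Some(..)` values is equivalent.
    let same = TokenizerOptions::new(true, None, Some(false), Some(String::from("[MASK]")));
    assert_eq!(opts, same);
}
```

The conversion relies on the standard library's `From<T> for Option<T>` implementation, so no extra traits are needed at the call site.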