pub struct TokenClassificationConfig {
    pub model_type: ModelType,
    pub model_resource: Box<dyn ResourceProvider + Send>,
    pub config_resource: Box<dyn ResourceProvider + Send>,
    pub vocab_resource: Box<dyn ResourceProvider + Send>,
    pub merges_resource: Option<Box<dyn ResourceProvider + Send>>,
    pub lower_case: bool,
    pub strip_accents: Option<bool>,
    pub add_prefix_space: Option<bool>,
    pub device: Device,
    pub label_aggregation_function: LabelAggregationOption,
    pub batch_size: usize,
}
Configuration for TokenClassificationModel
Contains information regarding the model to load and the device to place the model on.
Fields
model_type: ModelType
    Model type
model_resource: Box<dyn ResourceProvider + Send>
    Model weights resource (default: pretrained BERT model on CoNLL)
config_resource: Box<dyn ResourceProvider + Send>
    Config resource (default: pretrained BERT model on CoNLL)
vocab_resource: Box<dyn ResourceProvider + Send>
    Vocab resource (default: pretrained BERT model on CoNLL)
merges_resource: Option<Box<dyn ResourceProvider + Send>>
    Merges resource (default: pretrained BERT model on CoNLL)
lower_case: bool
    Automatically lower case all input upon tokenization (assumes a lower-cased model)
strip_accents: Option<bool>
    Flag indicating if the tokenizer should strip accents (normalization). Only used for BERT / ALBERT models
add_prefix_space: Option<bool>
    Flag indicating if the tokenizer should add a white space before each tokenized input (needed for some Roberta models)
device: Device
    Device to place the model on (default: CUDA/GPU when available)
label_aggregation_function: LabelAggregationOption
    Sub-tokens aggregation method (default: LabelAggregationOption::First)
batch_size: usize
    Batch size for predictions
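Because all fields are public, a configuration can also be adjusted after construction. The following is a minimal sketch, assuming tch::Device is in scope for device selection and that LabelAggregationOption exposes a Mode variant in this version of the crate:

use rust_bert::pipelines::token_classification::{LabelAggregationOption, TokenClassificationConfig};
use tch::Device;

fn main() {
    // Start from the default configuration (pretrained BERT fine-tuned on CoNLL-2003)
    // and override individual public fields.
    let mut config = TokenClassificationConfig::default();
    config.device = Device::Cpu;                                       // force CPU execution
    config.batch_size = 32;                                            // larger prediction batches
    config.label_aggregation_function = LabelAggregationOption::Mode;  // assumed variant; First is the default
}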
Implementations

impl TokenClassificationConfig
pub fn new<RM, RC, RV>(
    model_type: ModelType,
    model_resource: RM,
    config_resource: RC,
    vocab_resource: RV,
    merges_resource: Option<RV>,
    lower_case: bool,
    strip_accents: impl Into<Option<bool>>,
    add_prefix_space: impl Into<Option<bool>>,
    label_aggregation_function: LabelAggregationOption,
) -> TokenClassificationConfig
where
    RM: ResourceProvider + Send + 'static,
    RC: ResourceProvider + Send + 'static,
    RV: ResourceProvider + Send + 'static,
Instantiate a new token classification configuration of the supplied type.

Arguments

- model_type - ModelType indicating the model type to load (must match the actual data to be loaded!)
- model_resource - The ResourceProvider pointing to the model weights to load (e.g. model.ot)
- config_resource - The ResourceProvider pointing to the model configuration to load (e.g. config.json)
- vocab_resource - The ResourceProvider pointing to the tokenizer's vocabulary to load (e.g. vocab.txt/vocab.json)
- merges_resource - An optional ResourceProvider pointing to the tokenizer's merges file to load (e.g. merges.txt), needed only for Roberta-based models
- lower_case - A bool indicating whether the tokenizer should lower case all input (in case of a lower-cased model)
- strip_accents - An optional flag indicating if the tokenizer should strip accents (normalization). Only used for BERT / ALBERT models
- add_prefix_space - An optional flag indicating if the tokenizer should add a white space before each tokenized input (needed for some Roberta models)
- label_aggregation_function - LabelAggregationOption specifying how predictions for sub-tokens are aggregated
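As an illustration, a configuration similar to the default can be built explicitly. The sketch below is not an authoritative recipe: it assumes the RemoteResource provider and the BERT_NER pretrained resource constants (BertModelResources, BertConfigResources, BertVocabResources) exported by the crate's bert module.

use rust_bert::bert::{BertConfigResources, BertModelResources, BertVocabResources};
use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::token_classification::{LabelAggregationOption, TokenClassificationConfig};
use rust_bert::resources::RemoteResource;

fn main() {
    // BERT NER model fine-tuned on CoNLL-2003 (assumed pretrained resource constants).
    let _config = TokenClassificationConfig::new(
        ModelType::Bert,
        RemoteResource::from_pretrained(BertModelResources::BERT_NER),
        RemoteResource::from_pretrained(BertConfigResources::BERT_NER),
        RemoteResource::from_pretrained(BertVocabResources::BERT_NER),
        None,  // merges resource, only needed for BPE-based tokenizers such as Roberta
        false, // lower_case
        None,  // strip_accents
        None,  // add_prefix_space
        LabelAggregationOption::First,
    );
}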
Trait Implementations
impl Default for TokenClassificationConfig

fn default() -> TokenClassificationConfig
Provides a default CoNLL-2003 NER model (English)
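A short end-to-end sketch using this default, assuming the anyhow crate for error handling; the two boolean flags passed to predict are assumed to control sub-token consolidation and the inclusion of special tokens in this version of the crate.

use rust_bert::pipelines::token_classification::{TokenClassificationConfig, TokenClassificationModel};

fn main() -> anyhow::Result<()> {
    // Default configuration: pretrained BERT NER model for CoNLL-2003 (English),
    // placed on CUDA when available.
    let config = TokenClassificationConfig::default();
    let model = TokenClassificationModel::new(config)?;

    let input = ["My name is Amy. I live in Paris.", "Paris is a city in France."];
    // Consolidate sub-tokens according to label_aggregation_function, skip special tokens.
    let output = model.predict(&input, true, false);
    println!("{:?}", output);
    Ok(())
}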