Struct rust_tokenizers::TokenizedInput

pub struct TokenizedInput {
    pub token_ids: Vec<i64>,
    pub segment_ids: Vec<i8>,
    pub special_tokens_mask: Vec<i8>,
    pub overflowing_tokens: Vec<i64>,
    pub num_truncated_tokens: usize,
    pub token_offsets: Vec<Option<Offset>>,
    pub reference_offsets: Vec<Vec<OffsetSize>>,
    pub mask: Vec<Mask>,
}
Tokenized input, ready for processing by language models.

This represents the final output of the encoding process (a tokenized sentence with encoded values).

Fields

token_ids: Vec<i64>

Vector of token IDs

segment_ids: Vec<i8>

Vector of segment IDs (for example, for BERT, segments are separated by a [SEP] marker, each one incrementing the segment ID). This vector has the same length as token_ids.

special_tokens_mask: Vec<i8>

Flags tokens as special tokens (1) or not (0). This vector has the same length as token_ids.
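As a rough illustration of how token_ids, segment_ids and special_tokens_mask line up for a BERT-style sentence pair, the sketch below builds the three parallel vectors by hand. The helper `build_pair` and the constants `CLS_ID` and `SEP_ID` are hypothetical names introduced for this example, not part of the crate, and the ID values are placeholders for whatever the model vocabulary defines.

```rust
/// Illustrative IDs only (assumption); real values come from the model vocabulary.
const CLS_ID: i64 = 101;
const SEP_ID: i64 = 102;

/// Hypothetical helper: lays out `[CLS] a [SEP] b [SEP]` and returns
/// (token_ids, segment_ids, special_tokens_mask), all of the same length.
fn build_pair(a: &[i64], b: &[i64]) -> (Vec<i64>, Vec<i8>, Vec<i8>) {
    let mut token_ids = Vec::new();
    let mut segment_ids = Vec::new();
    let mut special_tokens_mask = Vec::new();

    // [CLS] opens segment 0 and is flagged as special (1).
    token_ids.push(CLS_ID);
    segment_ids.push(0);
    special_tokens_mask.push(1);

    for (segment, tokens) in [a, b].iter().enumerate() {
        for &id in tokens.iter() {
            token_ids.push(id);
            segment_ids.push(segment as i8);
            special_tokens_mask.push(0); // regular token
        }
        // The closing [SEP] is counted in the segment it terminates.
        token_ids.push(SEP_ID);
        segment_ids.push(segment as i8);
        special_tokens_mask.push(1);
    }

    (token_ids, segment_ids, special_tokens_mask)
}
```

Calling `build_pair(&[7, 8], &[9])` yields six entries in each vector, with the segment ID switching from 0 to 1 after the first [SEP].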

overflowing_tokens: Vec<i64>

Vector containing overflowing tokens, populated following a truncation step

num_truncated_tokens: usize

Number of overflowing tokens following a truncation step. This equals the length of overflowing_tokens.
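The relationship between overflowing_tokens and num_truncated_tokens can be sketched as follows. This is a minimal example that assumes simple tail truncation; the crate's own truncation strategies may behave differently, and `truncate_ids` is a hypothetical helper rather than a library function.

```rust
/// Hypothetical helper: keeps at most `max_len` IDs and returns
/// (kept token_ids, overflowing_tokens, num_truncated_tokens).
fn truncate_ids(mut token_ids: Vec<i64>, max_len: usize) -> (Vec<i64>, Vec<i64>, usize) {
    // `split_off` panics if the index exceeds the length, so guard it.
    let overflowing_tokens = if token_ids.len() > max_len {
        token_ids.split_off(max_len)
    } else {
        Vec::new()
    };
    // By construction the count equals the length of the overflow vector.
    let num_truncated_tokens = overflowing_tokens.len();
    (token_ids, overflowing_tokens, num_truncated_tokens)
}
```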

token_offsets: Vec<Option<Offset>>

Offset information (as start and end positions) in relation to the original text. Tokens that cannot be related to the original source are registered as None.
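One way to use such an offset is to recover the text a token came from. The sketch below uses local stand-ins for `Offset` and `OffsetSize` (their field names and the `u32` width are assumptions here) and assumes that positions count characters rather than bytes, with `end` exclusive. `token_text` is a hypothetical helper.

```rust
// Local stand-ins for the crate's types (assumed shape).
type OffsetSize = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Offset {
    begin: OffsetSize,
    end: OffsetSize,
}

/// Hypothetical helper: returns the source text covered by a token,
/// or `None` when the token has no offset (e.g. a special token).
fn token_text(text: &str, offset: Option<Offset>) -> Option<String> {
    let o = offset?;
    let len = o.end.saturating_sub(o.begin) as usize;
    // Walk characters rather than bytes so multi-byte text is handled.
    Some(text.chars().skip(o.begin as usize).take(len).collect())
}
```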

reference_offsets: Vec<Vec<OffsetSize>>

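Offset information (as a sequence of positions) in relation to the original text. Each entry is a Vec<OffsetSize> rather than an Option, so a token that cannot be related to the original source cannot be registered as None here; such tokens are presumably represented by an empty vector.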
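A position list differs from a start/end pair in that it can describe non-contiguous source characters. The sketch below gathers the characters a token refers to, assuming that positions are character indices and that a token with no source has an empty list. `OffsetSize` is a local stand-in and `chars_at` is a hypothetical helper.

```rust
// Local stand-in for the crate's position type (assumed to be u32).
type OffsetSize = u32;

/// Hypothetical helper: collects the source characters at the given
/// positions; out-of-range positions are skipped.
fn chars_at(text: &str, positions: &[OffsetSize]) -> String {
    let chars: Vec<char> = text.chars().collect();
    positions
        .iter()
        .filter_map(|&p| chars.get(p as usize).copied())
        .collect()
}
```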

mask: Vec<Mask>

Mask for each token, providing information on the type of the token. This vector has the same length as token_ids.
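The length relationships stated in the field descriptions can be collected into a single sanity check. The function below is a hypothetical helper written against plain slices so that it stays self-contained; the mask element type is left generic because only its length matters here.

```rust
/// Hypothetical check of the documented invariants:
/// segment_ids, special_tokens_mask and mask match token_ids in length,
/// and num_truncated_tokens equals the length of overflowing_tokens.
fn lengths_consistent<M>(
    token_ids: &[i64],
    segment_ids: &[i8],
    special_tokens_mask: &[i8],
    mask: &[M],
    overflowing_tokens: &[i64],
    num_truncated_tokens: usize,
) -> bool {
    let n = token_ids.len();
    segment_ids.len() == n
        && special_tokens_mask.len() == n
        && mask.len() == n
        && overflowing_tokens.len() == num_truncated_tokens
}
```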

Trait Implementations

Clone

clone: Returns a copy of the value.

clone_from: Performs copy-assignment from source.

Debug

fmt: Formats the value using the given formatter.

PartialEq

eq: This method tests for self and other values to be equal, and is used by ==.

ne: This method tests for !=.

PartialOrd

partial_cmp: This method returns an ordering between self and other values if one exists.

lt: This method tests less than (for self and other) and is used by the < operator.

le: This method tests less than or equal to (for self and other) and is used by the <= operator.

gt: This method tests greater than (for self and other) and is used by the > operator.

ge: This method tests greater than or equal to (for self and other) and is used by the >= operator.
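To show what these comparison and cloning methods do in practice, here is a small sketch on a reduced stand-in struct. `MiniInput` and `compare` are hypothetical, and the derive list is an assumption inferred from the methods listed above. With derived implementations, ordering compares fields lexicographically in declaration order.

```rust
use std::cmp::Ordering;

// Reduced, hypothetical mirror of the struct; the derives are assumed.
#[derive(Debug, PartialEq, PartialOrd, Clone)]
struct MiniInput {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
}

/// Returns the ordering between two inputs, if one exists.
/// Derived `PartialOrd` compares `token_ids` first, then `segment_ids`.
fn compare(a: &MiniInput, b: &MiniInput) -> Option<Ordering> {
    a.partial_cmp(b)
}
```

A clone compares equal to its source, and two inputs that first differ in token_ids are ordered by that difference.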

Auto Trait Implementations

Blanket Implementations

Any

type_id: Gets the TypeId of self.

Borrow<T>

borrow: Immutably borrows from an owned value.

BorrowMut<T>

borrow_mut: Mutably borrows from an owned value.

From<T>

from: Performs the conversion.

Into<U>

into: Performs the conversion.

Pointable

ALIGN: The alignment of pointer.

Init: The type for initializers.

init: Initializes a value with the given initializer.

deref: Dereferences the given pointer.

deref_mut: Mutably dereferences the given pointer.

drop: Drops the object pointed to by the given pointer.

ToOwned

Owned: The resulting type after obtaining ownership.

to_owned: Creates owned data from borrowed data, usually by cloning.

clone_into: Uses borrowed data to replace owned data, usually by cloning. This is a nightly-only experimental API (toowned_clone_into), recently added.

TryFrom<U>

Error: The type returned in the event of a conversion error.

try_from: Performs the conversion.

TryInto<U>

Error: The type returned in the event of a conversion error.

try_into: Performs the conversion.