Struct rust_tokenizers::TokenizedInput
pub struct TokenizedInput {
    pub token_ids: Vec<i64>,
    pub segment_ids: Vec<i8>,
    pub special_tokens_mask: Vec<i8>,
    pub overflowing_tokens: Vec<i64>,
    pub num_truncated_tokens: usize,
    pub token_offsets: Vec<Option<Offset>>,
    pub reference_offsets: Vec<Vec<OffsetSize>>,
    pub mask: Vec<Mask>,
}
Tokenized input, ready for processing in language models.
This represents the final output of the encoding process (the tokenized sentence with its encoded values).
Fields
token_ids: Vec<i64>
Vector of token IDs.
segment_ids: Vec<i8>
Vector of segment IDs (for example, in BERT, segments are separated by a [SEP] marker, each one incrementing the segment ID). This vector has the same length as token_ids.
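To illustrate the segment layout, the sketch below assembles token_ids and segment_ids for a BERT-style sentence pair, [CLS] A [SEP] B [SEP]. It is a minimal std-only sketch rather than the library's own code: the special token IDs (101 for [CLS], 102 for [SEP]) are hypothetical values, and build_pair is an illustrative helper name.

```rust
/// Hypothetical special token IDs, chosen for illustration only.
const CLS_ID: i64 = 101;
const SEP_ID: i64 = 102;

/// Builds `[CLS] A [SEP] B [SEP]` and the matching segment IDs:
/// segment 0 covers `[CLS]`, the first sequence and its `[SEP]`;
/// segment 1 covers the second sequence and the final `[SEP]`.
fn build_pair(first: &[i64], second: &[i64]) -> (Vec<i64>, Vec<i8>) {
    let len = first.len() + second.len() + 3;
    let mut token_ids = Vec::with_capacity(len);
    let mut segment_ids = Vec::with_capacity(len);

    token_ids.push(CLS_ID);
    token_ids.extend_from_slice(first);
    token_ids.push(SEP_ID);
    // Everything up to and including the first [SEP] is segment 0.
    segment_ids.resize(token_ids.len(), 0);

    token_ids.extend_from_slice(second);
    token_ids.push(SEP_ID);
    // The second sequence and its closing [SEP] are segment 1.
    segment_ids.resize(token_ids.len(), 1);

    (token_ids, segment_ids)
}
```

Filling segment_ids with resize after each sequence keeps it the same length as token_ids by construction.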
special_tokens_mask: Vec<i8>
Flags tokens as special tokens (1) or not (0). This vector has the same length as token_ids.
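The tokenizer populates this mask during encoding; the sketch below only shows what the flags mean. It assumes the caller knows the set of special token IDs (the constants are hypothetical) and flags every position holding one of them.

```rust
/// Hypothetical special token IDs, chosen for illustration only.
const SPECIAL_IDS: [i64; 2] = [101, 102];

/// Returns 1 for each special token and 0 otherwise, so the result
/// always has the same length as `token_ids`.
fn special_tokens_mask(token_ids: &[i64]) -> Vec<i8> {
    token_ids
        .iter()
        .map(|id| if SPECIAL_IDS.contains(id) { 1 } else { 0 })
        .collect()
}
```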
overflowing_tokens: Vec<i64>
Vector containing the overflowing tokens, populated following a truncation step.
num_truncated_tokens: usize
Number of overflowing tokens following a truncation step. This equals the length of overflowing_tokens.
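The relation between the two truncation fields can be sketched with a simple strategy that drops tokens from the end of a single sequence. This is an assumption for illustration: rust_tokenizers has its own truncation strategies, which are not reproduced here, and truncate_tail and Truncated are hypothetical names.

```rust
/// Result of truncating a single sequence to `max_len` tokens.
struct Truncated {
    token_ids: Vec<i64>,
    overflowing_tokens: Vec<i64>,
    num_truncated_tokens: usize,
}

/// Keeps the first `max_len` tokens and moves the rest into
/// `overflowing_tokens`; `num_truncated_tokens` is their count.
fn truncate_tail(mut token_ids: Vec<i64>, max_len: usize) -> Truncated {
    // Clamp so `split_off` never receives an index past the end.
    let cut = max_len.min(token_ids.len());
    // `split_off` leaves [0, cut) in `token_ids` and returns [cut, len).
    let overflowing_tokens = token_ids.split_off(cut);
    let num_truncated_tokens = overflowing_tokens.len();
    Truncated { token_ids, overflowing_tokens, num_truncated_tokens }
}
```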
token_offsets: Vec<Option<Offset>>
Offset information (as start and end positions) in relation to the original text. Tokens that cannot be related to the original source are registered as None.
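A typical use of token_offsets is mapping each token back to the substring it came from. The sketch uses a local stand-in for the library's Offset type; its field names, the u32 position type, and the reading of begin/end as character (not byte) positions with an exclusive end are all assumptions to check against the Offset documentation. Tokens registered as None, such as special tokens, stay None.

```rust
/// Stand-in for rust_tokenizers' `Offset`; the field names and the
/// `u32` position type are assumptions made for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Offset {
    begin: u32,
    end: u32,
}

/// Recovers the source text of each token. Positions are treated as
/// character indices with an exclusive end (an assumption); tokens
/// without an offset map to `None`.
fn token_spans(text: &str, offsets: &[Option<Offset>]) -> Vec<Option<String>> {
    offsets
        .iter()
        .map(|&offset| {
            offset.map(|o| {
                text.chars()
                    .skip(o.begin as usize)
                    .take((o.end - o.begin) as usize)
                    .collect()
            })
        })
        .collect()
}
```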
reference_offsets: Vec<Vec<OffsetSize>>
Offset information (as a sequence of positions) in relation to the original text. Tokens that cannot be related to the original source are registered with an empty sequence of positions.
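Unlike token_offsets, reference_offsets lists each individual position a token covers rather than a begin/end pair. The sketch below assumes these positions are character indices into the original text and that tokens without a source carry an empty inner vector; u32 stands in for OffsetSize as an assumption.

```rust
/// `u32` stands in for rust_tokenizers' `OffsetSize` (an assumption).
type OffsetSize = u32;

/// Rebuilds each token's source characters from its list of positions.
/// Positions are treated as character indices into `text`; positions
/// outside the text are skipped, and a token with no positions (e.g. a
/// special token) yields an empty string.
fn reference_chars(text: &str, reference_offsets: &[Vec<OffsetSize>]) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    reference_offsets
        .iter()
        .map(|positions| {
            positions
                .iter()
                .filter_map(|&p| chars.get(p as usize))
                .collect()
        })
        .collect()
}
```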
mask: Vec<Mask>
Mask providing information on the type of each token. This vector has the same length as token_ids.
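Several fields above document invariants: segment_ids, special_tokens_mask and mask have the same length as token_ids, and num_truncated_tokens equals the length of overflowing_tokens. A sketch of checking them on a hypothetical, trimmed-down mirror of the struct; Mask is replaced by a placeholder u8 so that no external crate is needed.

```rust
/// Hypothetical mirror of `TokenizedInput` holding only the fields with
/// documented invariants. `u8` is a placeholder for `Mask`.
struct EncodedLite {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
    special_tokens_mask: Vec<i8>,
    overflowing_tokens: Vec<i64>,
    num_truncated_tokens: usize,
    mask: Vec<u8>,
}

/// Checks the invariants stated in the field documentation and reports
/// the first one that does not hold.
fn check_invariants(input: &EncodedLite) -> Result<(), String> {
    let n = input.token_ids.len();
    // Per-token vectors must line up with `token_ids`.
    let per_token = [
        ("segment_ids", input.segment_ids.len()),
        ("special_tokens_mask", input.special_tokens_mask.len()),
        ("mask", input.mask.len()),
    ];
    for (name, len) in per_token {
        if len != n {
            return Err(format!("{} has length {}, expected {}", name, len, n));
        }
    }
    // The truncation count must match the overflow vector.
    if input.num_truncated_tokens != input.overflowing_tokens.len() {
        return Err(format!(
            "num_truncated_tokens is {}, but overflowing_tokens has {} entries",
            input.num_truncated_tokens,
            input.overflowing_tokens.len()
        ));
    }
    Ok(())
}
```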
Trait Implementations
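impl PartialEq<TokenizedInput> for TokenizedInput
fn eq(&self, other: &TokenizedInput) -> bool
This method tests for self and other values to be equal, and is used by ==.
fn ne(&self, other: &TokenizedInput) -> bool
This method tests for !=.
impl PartialOrd<TokenizedInput> for TokenizedInput
fn partial_cmp(&self, other: &TokenizedInput) -> Option<Ordering>
This method returns an ordering between self and other values if one exists.
fn lt(&self, other: &TokenizedInput) -> bool
This method tests less than (for self and other) and is used by the < operator.
fn le(&self, other: &TokenizedInput) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.
fn gt(&self, other: &TokenizedInput) -> bool
This method tests greater than (for self and other) and is used by the > operator.
fn ge(&self, other: &TokenizedInput) -> bool
This method tests greater than or equal to (for self and other) and is used by the >= operator.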
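This page does not show whether these comparison impls are derived. Assuming they are, equality compares every field, and ordering is lexicographic over the fields in declaration order, so token_ids decides first and later fields only break ties. A sketch on a hypothetical two-field mirror struct, with a hypothetical sort_inputs helper:

```rust
/// Hypothetical two-field mirror of `TokenizedInput`. Derived
/// `PartialOrd` compares fields in declaration order and only looks at
/// `segment_ids` when the `token_ids` are equal.
#[derive(Debug, PartialEq, PartialOrd)]
struct InputLite {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
}

/// Sorts inputs using the derived ordering. `partial_cmp` only returns
/// `None` for incomparable values, which integer vectors never are.
fn sort_inputs(inputs: &mut [InputLite]) {
    inputs.sort_by(|x, y| x.partial_cmp(y).expect("integer fields are always comparable"));
}
```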
Auto Trait Implementations
impl RefUnwindSafe for TokenizedInput
impl Send for TokenizedInput
impl Sync for TokenizedInput
impl Unpin for TokenizedInput
impl UnwindSafe for TokenizedInput
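Because the type is Send and Sync, encoded inputs can be moved to or shared between threads without extra synchronization. A std-only sketch using a hypothetical mirror struct InputIds; like TokenizedInput, it only owns plain vectors, so the compiler gives it the same auto traits. Spawning one thread per input is a simplification for the example.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical mirror holding a single field; it is `Send + Sync`
/// automatically because it only owns a plain vector.
struct InputIds {
    token_ids: Vec<i64>,
}

/// Shares the inputs read-only across threads (sending an `Arc<T>`
/// requires `T: Send + Sync`) and returns each input's token count,
/// computed on its own thread.
fn token_counts(inputs: Vec<InputIds>) -> Vec<usize> {
    let shared = Arc::new(inputs);
    let handles: Vec<_> = (0..shared.len())
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared[i].token_ids.len())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```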
Blanket Implementations
impl<T: ?Sized> BorrowMut<T> for T
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.