Struct rust_tokenizers::TokenizedInput
pub struct TokenizedInput {
    pub token_ids: Vec<i64>,
    pub segment_ids: Vec<i8>,
    pub special_tokens_mask: Vec<i8>,
    pub overflowing_tokens: Vec<i64>,
    pub num_truncated_tokens: usize,
    pub token_offsets: Vec<Option<Offset>>,
    pub reference_offsets: Vec<Vec<OffsetSize>>,
    pub mask: Vec<Mask>,
}
Tokenized input, ready for processing in language models.
This represents the final output of the encoding process (a tokenized sentence with encoded values).
Fields
token_ids: Vec<i64>
Vector of token IDs
segment_ids: Vec<i8>
Vector of segment IDs (for example, BERT segments are separated by a [SEP] marker, and each new segment increments the segment ID). This vector has the same length as token_ids.
special_tokens_mask: Vec<i8>
Flags tokens as special tokens (1) or not (0). This vector has the same length as token_ids.
overflowing_tokens: Vec<i64>
Vector containing overflowing tokens, populated following a truncation step
num_truncated_tokens: usize
Number of overflowing tokens following a truncation step. This equals the length of overflowing_tokens.
token_offsets: Vec<Option<Offset>>
Offset information (as start and end positions) in relation to the original text. Tokens that cannot be related to the original source are registered as None.
reference_offsets: Vec<Vec<OffsetSize>>
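Offset information (as a sequence of positions) in relation to the original text. Because the element type is Vec<OffsetSize> rather than an Option, tokens that cannot be related to the original source cannot be None here; they are expected to carry an empty vector instead.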
mask: Vec<Mask>
Mask values describing the type of each token. This vector has the same length as token_ids.
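The length relationships and offset semantics described in the fields above can be sketched as follows. This is a minimal illustration, not the crate's own code: Offset, OffsetSize and Mask are simplified local stand-ins (the begin/end field names, the u32 alias and the two Mask variants are assumptions), the helper functions lengths_are_consistent, token_texts and sample_input are hypothetical, the token IDs are made up, and offsets are assumed to count characters rather than bytes.

```rust
// Local stand-ins for the crate's types, simplified for illustration.
// Assumption: OffsetSize is an unsigned integer and Offset holds begin/end.
type OffsetSize = u32;

#[derive(Debug, Clone, PartialEq, PartialOrd)]
struct Offset {
    begin: OffsetSize,
    end: OffsetSize,
}

// Hypothetical, reduced set of mask variants.
#[derive(Debug, Clone, PartialEq, PartialOrd)]
enum Mask {
    None,
    Special,
}

// Mirror of the documented struct, using the stand-in types above.
#[derive(Debug, Clone, PartialEq, PartialOrd)]
struct TokenizedInput {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
    special_tokens_mask: Vec<i8>,
    overflowing_tokens: Vec<i64>,
    num_truncated_tokens: usize,
    token_offsets: Vec<Option<Offset>>,
    reference_offsets: Vec<Vec<OffsetSize>>,
    mask: Vec<Mask>,
}

/// Checks the length relationships stated in the field descriptions.
fn lengths_are_consistent(input: &TokenizedInput) -> bool {
    let n = input.token_ids.len();
    input.segment_ids.len() == n
        && input.special_tokens_mask.len() == n
        && input.mask.len() == n
        && input.num_truncated_tokens == input.overflowing_tokens.len()
}

/// Returns the text covered by each token, or None for tokens that
/// have no offset. Assumption: offsets count characters, not bytes.
fn token_texts(input: &TokenizedInput, text: &str) -> Vec<Option<String>> {
    input
        .token_offsets
        .iter()
        .map(|offset| {
            offset.as_ref().map(|o| {
                text.chars()
                    .skip(o.begin as usize)
                    .take(o.end.saturating_sub(o.begin) as usize)
                    .collect::<String>()
            })
        })
        .collect()
}

/// Hand-built example for the text "hello world" wrapped in two special
/// tokens. The token IDs are invented for the example.
fn sample_input() -> TokenizedInput {
    TokenizedInput {
        token_ids: vec![101, 7592, 2088, 102],
        segment_ids: vec![0, 0, 0, 0],
        special_tokens_mask: vec![1, 0, 0, 1],
        overflowing_tokens: vec![],
        num_truncated_tokens: 0,
        token_offsets: vec![
            None,
            Some(Offset { begin: 0, end: 5 }),
            Some(Offset { begin: 6, end: 11 }),
            None,
        ],
        // Assumption: special tokens carry an empty position list.
        reference_offsets: vec![vec![], vec![0, 1, 2, 3, 4], vec![6, 7, 8, 9, 10], vec![]],
        mask: vec![Mask::Special, Mask::None, Mask::None, Mask::Special],
    }
}
```

Calling token_texts(&sample_input(), "hello world") yields None for the two special tokens and the substrings "hello" and "world" for the others.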
Trait Implementations
impl Clone for TokenizedInput
fn clone(&self) -> TokenizedInput
1.0.0 · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for TokenizedInput
impl PartialEq for TokenizedInput
fn eq(&self, other: &TokenizedInput) -> bool
This method tests for self and other values to be equal, and is used by ==.
impl PartialOrd for TokenizedInput
fn partial_cmp(&self, other: &TokenizedInput) -> Option<Ordering>
1.0.0 · fn le(&self, other: &Rhs) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.
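The behaviour of these trait implementations can be illustrated with a small sketch. It assumes the implementations are derived, which this page does not state; if so, equality compares every field and ordering compares fields lexicographically in declaration order. MiniInput is a hypothetical, reduced stand-in with only the first two fields, and compare and sample_pair are hypothetical helpers, not part of the crate.

```rust
use std::cmp::Ordering;

// Hypothetical reduced stand-in for TokenizedInput, keeping only the
// first two fields. Assumption: the real trait impls are derived.
#[derive(Debug, Clone, PartialEq, PartialOrd)]
struct MiniInput {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
}

/// Thin wrapper around partial_cmp, the method that backs <, <=, > and >=.
fn compare(a: &MiniInput, b: &MiniInput) -> Option<Ordering> {
    a.partial_cmp(b)
}

/// Builds two inputs that differ only in the second token ID
/// (the IDs are invented for the example).
fn sample_pair() -> (MiniInput, MiniInput) {
    let a = MiniInput {
        token_ids: vec![101, 7592, 102],
        segment_ids: vec![0, 0, 0],
    };
    let mut b = a.clone();
    b.token_ids[1] = 7593;
    (a, b)
}
```

With a derived PartialOrd, the first field that differs decides the result, so the pair above compares as Less on token_ids before segment_ids is ever consulted.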