Struct rust_tokenizers::TokenizedInput
pub struct TokenizedInput {
    pub token_ids: Vec<i64>,
    pub segment_ids: Vec<i8>,
    pub special_tokens_mask: Vec<i8>,
    pub overflowing_tokens: Vec<i64>,
    pub num_truncated_tokens: usize,
    pub token_offsets: Vec<Option<Offset>>,
    pub reference_offsets: Vec<Vec<OffsetSize>>,
    pub mask: Vec<Mask>,
}
Tokenized input, ready for processing in language models.
This represents the final output of the encoding process (a tokenized sentence with encoded values).
Fields
token_ids: Vec<i64>
Vector of token IDs.
segment_ids: Vec<i8>
Vector of segment IDs (for example, BERT segments are separated by a [SEP] marker, each incrementing the segment ID). This vector has the same length as token_ids.
special_tokens_mask: Vec<i8>
Flags tokens as special tokens (1) or not (0). This vector has the same length as token_ids.
overflowing_tokens: Vec<i64>
Vector containing overflowing tokens, populated following a truncation step.
num_truncated_tokens: usize
Number of overflowing tokens following a truncation step. This equals the length of overflowing_tokens.
token_offsets: Vec<Option<Offset>>
Offset information (as start and end positions) in relation to the original text. Tokens that cannot be related to the original source are registered as None.
reference_offsets: Vec<Vec<OffsetSize>>
Offset information (as a sequence of positions) in relation to the original text. Tokens that cannot be related to the original source are registered as None.
mask: Vec<Mask>
Masks tokens, providing information on the type of each token. This vector has the same length as token_ids.
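The length relationships and the meaning of token_offsets described above can be illustrated with a minimal sketch. It does not use the crate itself: TokenizedInputSketch, Offset, OffsetSize, lengths_are_consistent, token_surfaces and example_input are hypothetical stand-ins written for this example, the token IDs are made up, and the offsets are assumed to count characters from the start of the original text (begin inclusive, end exclusive). The reference_offsets and mask fields are left out for brevity.

```rust
// Stand-in types for illustration only; the real definitions live in rust_tokenizers.
type OffsetSize = u32; // assumed width of an offset position

#[derive(Debug, Clone, Copy, PartialEq)]
struct Offset {
    begin: OffsetSize, // assumed inclusive
    end: OffsetSize,   // assumed exclusive
}

/// Simplified mirror of some of the documented fields.
#[derive(Debug, Clone, PartialEq)]
struct TokenizedInputSketch {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
    special_tokens_mask: Vec<i8>,
    overflowing_tokens: Vec<i64>,
    num_truncated_tokens: usize,
    token_offsets: Vec<Option<Offset>>,
}

/// Checks the length relationships stated in the field descriptions.
fn lengths_are_consistent(input: &TokenizedInputSketch) -> bool {
    let n = input.token_ids.len();
    input.segment_ids.len() == n
        && input.special_tokens_mask.len() == n
        && input.num_truncated_tokens == input.overflowing_tokens.len()
}

/// Returns the slice of the original text covered by each token,
/// or None for tokens that carry no offset (for example special tokens).
fn token_surfaces(text: &str, input: &TokenizedInputSketch) -> Vec<Option<String>> {
    let chars: Vec<char> = text.chars().collect();
    input
        .token_offsets
        .iter()
        .map(|offset| {
            offset.as_ref().map(|o| {
                chars[o.begin as usize..o.end as usize]
                    .iter()
                    .collect::<String>()
            })
        })
        .collect()
}

/// Hand-built value resembling "[CLS] hello world [SEP]" with made-up IDs.
fn example_input() -> TokenizedInputSketch {
    TokenizedInputSketch {
        token_ids: vec![101, 7592, 2088, 102],
        segment_ids: vec![0, 0, 0, 0],
        special_tokens_mask: vec![1, 0, 0, 1],
        overflowing_tokens: vec![],
        num_truncated_tokens: 0,
        token_offsets: vec![
            None,
            Some(Offset { begin: 0, end: 5 }),
            Some(Offset { begin: 6, end: 11 }),
            None,
        ],
    }
}

fn main() {
    let input = example_input();
    assert!(lengths_are_consistent(&input));
    let surfaces = token_surfaces("hello world", &input);
    for (id, surface) in input.token_ids.iter().zip(surfaces) {
        println!("{id}: {surface:?}");
    }
}
```

In this sketch the two special tokens have no offset, so they map to None, while the remaining tokens map back to the substrings they were produced from.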
Trait Implementations
impl Clone for TokenizedInput
fn clone(&self) -> TokenizedInput
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for TokenizedInput
impl PartialEq<TokenizedInput> for TokenizedInput
fn eq(&self, other: &TokenizedInput) -> bool
This method tests for self and other values to be equal, and is used by ==.
impl PartialOrd<TokenizedInput> for TokenizedInput
fn partial_cmp(&self, other: &TokenizedInput) -> Option<Ordering>
fn le(&self, other: &Rhs) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.
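As a rough illustration of what the Clone, Debug, PartialEq and PartialOrd implementations listed above allow, the sketch below uses a hypothetical stand-in struct, TraitsSketch, with only two of the documented fields and a hypothetical sample helper. It assumes the implementations are derived; this page does not confirm that. Under that assumption, comparison proceeds field by field in declaration order, so token_ids decides the ordering first.

```rust
use std::cmp::Ordering;

// Stand-in struct for illustration; assumes the trait impls are derived.
#[derive(Clone, Debug, PartialEq, PartialOrd)]
struct TraitsSketch {
    token_ids: Vec<i64>,
    segment_ids: Vec<i8>,
}

/// Builds a value with one segment ID (0) per token ID.
fn sample(token_ids: Vec<i64>) -> TraitsSketch {
    let segment_ids = vec![0; token_ids.len()];
    TraitsSketch { token_ids, segment_ids }
}

fn main() {
    let a = sample(vec![101, 7592, 102]);
    let b = a.clone(); // Clone::clone

    assert!(a == b); // PartialEq::eq
    assert!(a <= b); // PartialOrd::le

    // Derived PartialOrd compares fields lexicographically, starting with token_ids.
    let c = sample(vec![101, 7593, 102]);
    assert_eq!(a.partial_cmp(&c), Some(Ordering::Less));

    println!("{:?}", a); // Debug
}
```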