Struct rust_tokenizers::TokenIdsWithSpecialTokens
pub struct TokenIdsWithSpecialTokens {
    pub token_ids: Vec<i64>,
    pub segment_ids: Vec<i8>,
    pub special_tokens_mask: Vec<i8>,
    pub token_offsets: Vec<Option<Offset>>,
    pub reference_offsets: Vec<Vec<OffsetSize>>,
    pub mask: Vec<Mask>,
}
Encoded input with special tokens
Intermediate tokenization result, produced after encoding and the addition of special tokens but before truncation to a maximum length.
Fields
token_ids: Vec<i64>
Vector of token IDs
segment_ids: Vec<i8>
Vector of segment IDs (for example, in BERT, segments are separated by a [SEP] marker, and each new segment increments the segment ID). This vector has the same length as token_ids.
special_tokens_mask: Vec<i8>
Flags tokens as special tokens (1) or not (0). This vector has the same length as token_ids.
token_offsets: Vec<Option<Offset>>
Offset information (as start and end positions) in relation to the original text. Tokens that cannot be related to the original source are registered as None.
reference_offsets: Vec<Vec<OffsetSize>>
Offset information (as a sequence of positions) in relation to the original text. Because each entry is a Vec<OffsetSize> rather than an Option, tokens that cannot be related to the original source are expected to carry an empty vector instead of None.
mask: Vec<Mask>
Mask values giving information on the type of each token. This vector has the same length as token_ids.
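Examples
The sketch below illustrates how the fields line up for a BERT-style sentence pair. It is not taken from the crate: the Offset, OffsetSize and Mask definitions are simplified local stand-ins (assumptions) so the snippet compiles on its own, the token IDs are made up for illustration, and is_aligned, non_special_ids and example_pair are hypothetical helper names. The empty reference_offsets entries for special tokens are likewise an assumption.

```rust
// Local stand-ins (assumptions) so the sketch compiles without the crate.
// In real code, import these types from rust_tokenizers instead.
pub type OffsetSize = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Offset {
    pub begin: OffsetSize,
    pub end: OffsetSize,
}

// Simplified: only the two variants this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mask {
    None,
    Special,
}

pub struct TokenIdsWithSpecialTokens {
    pub token_ids: Vec<i64>,
    pub segment_ids: Vec<i8>,
    pub special_tokens_mask: Vec<i8>,
    pub token_offsets: Vec<Option<Offset>>,
    pub reference_offsets: Vec<Vec<OffsetSize>>,
    pub mask: Vec<Mask>,
}

/// Hypothetical helper: checks the length invariants stated in the field docs
/// (segment_ids, special_tokens_mask and mask match token_ids in length).
pub fn is_aligned(t: &TokenIdsWithSpecialTokens) -> bool {
    let n = t.token_ids.len();
    t.segment_ids.len() == n && t.special_tokens_mask.len() == n && t.mask.len() == n
}

/// Hypothetical helper: keeps only the IDs whose special_tokens_mask flag is 0.
pub fn non_special_ids(t: &TokenIdsWithSpecialTokens) -> Vec<i64> {
    t.token_ids
        .iter()
        .zip(t.special_tokens_mask.iter())
        .filter_map(|(&id, &flag)| if flag == 0 { Some(id) } else { None })
        .collect()
}

/// Hand-written value for the pair ("hello", "world"), laid out as
/// [CLS] hello [SEP] world [SEP]. The IDs are illustrative only.
pub fn example_pair() -> TokenIdsWithSpecialTokens {
    TokenIdsWithSpecialTokens {
        token_ids: vec![101, 7592, 102, 2088, 102],
        // The segment ID increments after the first [SEP].
        segment_ids: vec![0, 0, 0, 1, 1],
        // 1 marks [CLS] and [SEP], 0 marks ordinary tokens.
        special_tokens_mask: vec![1, 0, 1, 0, 1],
        // Special tokens have no position in the source text, hence None.
        token_offsets: vec![
            None,
            Some(Offset { begin: 0, end: 5 }),
            None,
            Some(Offset { begin: 0, end: 5 }),
            None,
        ],
        // Assumption: special tokens carry an empty position list.
        reference_offsets: vec![
            vec![],
            vec![0, 1, 2, 3, 4],
            vec![],
            vec![0, 1, 2, 3, 4],
            vec![],
        ],
        mask: vec![Mask::Special, Mask::None, Mask::Special, Mask::None, Mask::Special],
    }
}
```

Calling is_aligned(&example_pair()) checks the stated length invariants, and non_special_ids(&example_pair()) returns only the IDs of the two ordinary tokens. Reading special_tokens_mask this way is one plausible use of the flag vector; the crate may offer its own utilities for the same purpose.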
Trait Implementations
Auto Trait Implementations
impl RefUnwindSafe for TokenIdsWithSpecialTokens
impl Send for TokenIdsWithSpecialTokens
impl Sync for TokenIdsWithSpecialTokens
impl Unpin for TokenIdsWithSpecialTokens
impl UnwindSafe for TokenIdsWithSpecialTokens