pub fn truncate_sequences(
token_ids_with_offsets_1: TokenIdsWithOffsets,
token_ids_with_offsets_2: Option<TokenIdsWithOffsets>,
num_tokens_to_remove: usize,
truncation_strategy: &TruncationStrategy,
stride: usize
) -> Result<(TokenIdsWithOffsets, Option<TokenIdsWithOffsets>, Vec<i64>, Vec<Option<Offset>>), TokenizerError>
- token_ids_with_offsets_1: first tokenized input (token ids with their offsets). The ids can be obtained from a string by chaining the tokenize and convert_tokens_to_ids methods. The offsets must have the same length as the ids, or be empty if offsets are not used at all.
- token_ids_with_offsets_2: optional second tokenized input (token ids with their offsets), obtained the same way as the first. Its offsets follow the same rule: same length as its ids, or empty if not used at all.
- num_tokens_to_remove: number of tokens to remove using the truncation strategy
- truncation_strategy: truncation strategy to apply
  - TruncationStrategy::LongestFirst (default): iteratively removes tokens one at a time, always from the longest sequence when there is a pair of input sequences, until the input is under max_length. Overflowing tokens only contain overflow from the first sequence.
  - TruncationStrategy::OnlyFirst: only truncates the first sequence. Returns an error if the first sequence is shorter than or equal to num_tokens_to_remove.
  - TruncationStrategy::OnlySecond: only truncates the second sequence.
  - TruncationStrategy::DoNotTruncate: does not truncate (returns an error if the input sequence is longer than max_length).
- stride: if set to a non-zero value along with max_length, the overflowing tokens returned also contain some tokens from the end of the main sequence returned. The value of this argument defines the number of additional tokens.
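The truncation strategies listed above can be illustrated with a simplified sketch. It works on plain Vec<i64> token ids and deliberately ignores offsets and stride. The Strategy enum and the truncate_ids function are hypothetical stand-ins, not the crate's TruncationStrategy or truncate_sequences. Two behaviours are assumptions rather than documented facts: under LongestFirst a tie removes from the second sequence, and OnlySecond returns an error on a too-short second sequence, mirroring OnlyFirst.

```rust
// Hypothetical, simplified model of the truncation strategies described above.
// Works on plain token ids only: offsets and stride are deliberately ignored.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Strategy {
    LongestFirst,
    OnlyFirst,
    OnlySecond,
    DoNotTruncate,
}

/// Returns (truncated first sequence, truncated second sequence, overflowing tokens).
pub fn truncate_ids(
    mut ids_1: Vec<i64>,
    mut ids_2: Option<Vec<i64>>,
    num_tokens_to_remove: usize,
    strategy: Strategy,
) -> Result<(Vec<i64>, Option<Vec<i64>>, Vec<i64>), String> {
    // Nothing to remove: every strategy leaves the inputs untouched.
    if num_tokens_to_remove == 0 {
        return Ok((ids_1, ids_2, Vec::new()));
    }
    match strategy {
        Strategy::LongestFirst => {
            let mut overflow = Vec::new();
            for _ in 0..num_tokens_to_remove {
                // Assumption: on a tie, the token is taken from the second sequence.
                let first_is_longer = match &ids_2 {
                    None => true,
                    Some(second) => ids_1.len() > second.len(),
                };
                if first_is_longer {
                    // Only tokens cut from the first sequence are reported as overflow.
                    match ids_1.pop() {
                        Some(token) => overflow.insert(0, token),
                        None => return Err("first sequence is empty".to_string()),
                    }
                } else if let Some(second) = ids_2.as_mut() {
                    if second.pop().is_none() {
                        return Err("both sequences are empty".to_string());
                    }
                }
            }
            Ok((ids_1, ids_2, overflow))
        }
        Strategy::OnlyFirst => {
            if ids_1.len() <= num_tokens_to_remove {
                return Err("first sequence is too short to truncate".to_string());
            }
            let cut = ids_1.len() - num_tokens_to_remove;
            let overflow = ids_1.split_off(cut);
            Ok((ids_1, ids_2, overflow))
        }
        Strategy::OnlySecond => {
            // Assumption: mirrors OnlyFirst, including the error on a short sequence.
            let second = ids_2.as_mut().ok_or("no second sequence to truncate")?;
            if second.len() <= num_tokens_to_remove {
                return Err("second sequence is too short to truncate".to_string());
            }
            let cut = second.len() - num_tokens_to_remove;
            let overflow = second.split_off(cut);
            Ok((ids_1, ids_2, overflow))
        }
        Strategy::DoNotTruncate => Err(format!(
            "{} tokens would need to be removed, but truncation is disabled",
            num_tokens_to_remove
        )),
    }
}
```

For example, truncating [1, 2, 3, 4, 5] and [6, 7] by 3 tokens with LongestFirst removes only from the longer first sequence, leaving [1, 2] and [6, 7] with [3, 4, 5] as overflow.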
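The effect of stride on a single truncated sequence can be sketched as follows, assuming the overflow is taken from the end of the original sequence: it holds the removed tokens plus up to stride of the last kept tokens, so the overflow window overlaps the returned sequence. truncate_with_stride is a hypothetical helper written for illustration, not part of the crate.

```rust
/// Hypothetical helper: removes `num_tokens_to_remove` tokens from the end of `ids`
/// and returns (kept, overflow). The overflow also repeats the last `stride` kept
/// tokens so that a following window overlaps the current one.
/// Returns None if more tokens are requested than the sequence contains.
pub fn truncate_with_stride(
    ids: &[i64],
    num_tokens_to_remove: usize,
    stride: usize,
) -> Option<(Vec<i64>, Vec<i64>)> {
    if num_tokens_to_remove > ids.len() {
        return None;
    }
    // Index where the kept part ends and the removed part begins.
    let cut = ids.len() - num_tokens_to_remove;
    // Start the overflow `stride` tokens before the cut, clamped to the sequence start.
    let overflow_start = cut.saturating_sub(stride);
    Some((ids[..cut].to_vec(), ids[overflow_start..].to_vec()))
}
```

For ids 1 through 8, removing 3 tokens with a stride of 2 keeps [1, 2, 3, 4, 5] and returns [4, 5, 6, 7, 8] as overflow, so a window built from the overflow shares two tokens with the kept sequence. With a stride of 0 the overflow is just the removed tokens [6, 7, 8].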