Function batch_encode 

pub fn batch_encode<T: TransformerTokenizer>(
    texts: &[&str],
    tokenizer: &T,
    config: &BatchConfig,
) -> BatchEncoding

Batch-encode multiple texts using a tokenizer.

Each text is independently encoded, then optionally truncated and padded according to the provided BatchConfig.

Padding is added on the right side (standard for most models). For left-padding, use batch_encode_ext.
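The difference between right- and left-padding can be illustrated with a small standalone sketch. It uses only the standard library; `PadSide` and `pad_ids` are hypothetical names made up for this illustration and are not part of this crate's API.

```rust
/// Hypothetical helper type: which side receives the pad tokens.
#[derive(Clone, Copy)]
enum PadSide {
    Left,
    Right,
}

/// Pad `ids` with `pad_id` until it is `target_len` long.
/// Sequences that are already long enough are returned unchanged.
fn pad_ids(ids: &[u32], target_len: usize, pad_id: u32, side: PadSide) -> Vec<u32> {
    let missing = target_len.saturating_sub(ids.len());
    let mut out = Vec::with_capacity(ids.len() + missing);
    match side {
        PadSide::Right => {
            // Real tokens first, pad tokens at the end.
            out.extend_from_slice(ids);
            out.extend(std::iter::repeat(pad_id).take(missing));
        }
        PadSide::Left => {
            // Pad tokens first, real tokens at the end.
            out.extend(std::iter::repeat(pad_id).take(missing));
            out.extend_from_slice(ids);
        }
    }
    out
}

fn main() {
    let ids = [7, 8, 9];
    println!("{:?}", pad_ids(&ids, 5, 0, PadSide::Right)); // [7, 8, 9, 0, 0]
    println!("{:?}", pad_ids(&ids, 5, 0, PadSide::Left)); // [0, 0, 7, 8, 9]
}
```

Right-padding keeps real tokens at the start of each row, which is the layout this function produces; the left-padded layout is what batch_encode_ext is described as covering.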

Arguments

  • texts: slice of input strings
  • tokenizer: any tokenizer implementing TransformerTokenizer
  • config: batch configuration (max_length, padding, truncation, pad token)
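The encode, truncate, pad flow described above can be sketched in a self-contained way. Everything below is a stand-in written for illustration: `ToyTokenizer`, `WordLen`, `ToyConfig`, its field names, and `toy_batch_encode` are assumptions, not the crate's `TransformerTokenizer`, `BatchConfig`, or `batch_encode`. The sketch also assumes padding goes up to `max_length`; the real function may instead pad to the longest sequence in the batch.

```rust
/// Hypothetical stand-in for a tokenizer trait (not the crate's trait).
trait ToyTokenizer {
    fn encode(&self, text: &str) -> Vec<u32>;
}

/// Toy tokenizer: one id per whitespace-separated word, equal to its byte length.
struct WordLen;

impl ToyTokenizer for WordLen {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.split_whitespace().map(|w| w.len() as u32).collect()
    }
}

/// Hypothetical stand-in for a batch configuration; field names are assumptions.
struct ToyConfig {
    max_length: usize,
    padding: bool,
    truncation: bool,
    pad_token_id: u32,
}

/// Encode each text independently, then optionally truncate and right-pad it.
fn toy_batch_encode<T: ToyTokenizer>(
    texts: &[&str],
    tokenizer: &T,
    config: &ToyConfig,
) -> Vec<Vec<u32>> {
    texts
        .iter()
        .map(|text| {
            let mut ids = tokenizer.encode(text);
            if config.truncation {
                // Drop tokens beyond max_length; shorter inputs are untouched.
                ids.truncate(config.max_length);
            }
            if config.padding && ids.len() < config.max_length {
                // Append pad tokens on the right up to max_length.
                ids.resize(config.max_length, config.pad_token_id);
            }
            ids
        })
        .collect()
}

fn main() {
    let config = ToyConfig {
        max_length: 3,
        padding: true,
        truncation: true,
        pad_token_id: 0,
    };
    let batch = toy_batch_encode(&["hi there", "a bb ccc dddd"], &WordLen, &config);
    for row in &batch {
        println!("{:?}", row);
    }
}
```

With these toy inputs the short text is padded to three ids and the long one is truncated to three, so every row in the batch has the same length.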