Module rten_text::tokenizers


Tokenizers for converting text into sequences of token IDs.

There are two ways to construct a tokenizer:

  1. Load a preconfigured tokenizer from JSON using Tokenizer::from_json. This crate supports a subset of the tokenizer.json format generated by the Hugging Face Tokenizers library.

  2. Manually configure a Tokenizer by creating an Encoder implementation, such as WordPiece, and then wrapping it in a tokenizer using Tokenizer::new.
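The second approach, an encoder wrapped by a tokenizer, can be sketched with self-contained stand-ins. Everything in this sketch is a hypothetical assumption for illustration: the `Encoder` trait with an `encode_word` method, the toy `WholeWordEncoder`, and this `Tokenizer` type are not rten_text's actual definitions, and lowercasing plus whitespace splitting stand in for whatever text-level processing the real Tokenizer performs.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the crate's token ID alias.
type TokenId = u32;

/// Hypothetical encoder interface: turns one pre-split word into token IDs.
trait Encoder {
    fn encode_word(&self, word: &str) -> Vec<TokenId>;
}

/// Toy encoder that looks up whole words, using `unk_id` for unknown ones.
struct WholeWordEncoder {
    vocab: HashMap<String, TokenId>,
    unk_id: TokenId,
}

impl Encoder for WholeWordEncoder {
    fn encode_word(&self, word: &str) -> Vec<TokenId> {
        vec![*self.vocab.get(word).unwrap_or(&self.unk_id)]
    }
}

/// Hypothetical wrapper: handles text-level steps, then delegates each word to the encoder.
struct Tokenizer {
    encoder: Box<dyn Encoder>,
}

impl Tokenizer {
    fn new(encoder: Box<dyn Encoder>) -> Self {
        Tokenizer { encoder }
    }

    fn encode(&self, text: &str) -> Vec<TokenId> {
        // Illustrative preprocessing: lowercase, then split on whitespace.
        let lowered = text.to_lowercase();
        lowered
            .split_whitespace()
            .flat_map(|word| self.encoder.encode_word(word))
            .collect()
    }
}

fn main() {
    let mut vocab = HashMap::new();
    vocab.insert("hello".to_string(), 1);
    vocab.insert("world".to_string(), 2);
    let encoder = WholeWordEncoder { vocab, unk_id: 0 };
    let tokenizer = Tokenizer::new(Box::new(encoder));
    // "big" is not in the vocabulary, so it maps to unk_id 0.
    println!("{:?}", tokenizer.encode("Hello big World")); // [1, 0, 2]
}
```

Keeping text-level steps in the wrapper and vocabulary lookup in the encoder lets one encoder type be reused under different normalization and splitting rules.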

Modules

  • Regex patterns used by popular tokenizer models.
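Regex patterns like these are typically used for pre-tokenization: splitting text into word and punctuation pieces before an encoder sees them. The sketch below approximates a simple pattern such as `\w+|[^\w\s]+` with a hand-written character classifier so that it needs no external crates. It is not one of the patterns this module provides, and `pre_tokenize` is a hypothetical name.

```rust
/// Character classes used by this hand-written splitter.
#[derive(Clone, Copy, PartialEq)]
enum Class {
    Word,
    Space,
    Other,
}

fn classify(c: char) -> Class {
    if c.is_alphanumeric() || c == '_' {
        Class::Word
    } else if c.is_whitespace() {
        Class::Space
    } else {
        Class::Other
    }
}

/// Splits text into maximal runs of word characters or of punctuation,
/// discarding whitespace. This approximates a regex such as `\w+|[^\w\s]+`.
fn pre_tokenize(text: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    // Start offset and class of the run currently being scanned.
    let mut run: Option<(usize, Class)> = None;
    for (i, c) in text.char_indices() {
        let class = classify(c);
        if let Some((start, prev)) = run {
            if prev == class {
                continue; // Same class: extend the current run.
            }
            if prev != Class::Space {
                pieces.push(&text[start..i]);
            }
        }
        run = Some((i, class));
    }
    // Flush the final run unless it is whitespace.
    if let Some((start, prev)) = run {
        if prev != Class::Space {
            pieces.push(&text[start..]);
        }
    }
    pieces
}

fn main() {
    let pieces = pre_tokenize("Hello, world!! it's 2024");
    println!("{:?}", pieces); // ["Hello", ",", "world", "!!", "it", "'", "s", "2024"]
}
```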

Structs

Enums

Traits

  • An Encoder implements a specific method of converting strings into token IDs using a pre-computed model.
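The role of an Encoder can be sketched with a hypothetical trait and a minimal WordPiece-style implementation. The trait shape, the `encode_word` method, and `WordPieceSketch` are illustrative assumptions rather than rten_text's API. The algorithm shown is the usual greedy longest-match-first WordPiece lookup: continuation pieces carry a `##` prefix, and a word with any unmatched remainder maps to a single unknown token.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the crate's token ID alias.
type TokenId = u32;

/// Hypothetical encoder interface: turns one pre-split word into token IDs.
trait Encoder {
    fn encode_word(&self, word: &str) -> Vec<TokenId>;
}

/// Minimal WordPiece-style encoder over a fixed vocabulary.
struct WordPieceSketch {
    vocab: HashMap<String, TokenId>,
    unk_id: TokenId,
    /// Prefix marking pieces that continue a word, conventionally "##".
    subword_prefix: String,
}

impl Encoder for WordPieceSketch {
    fn encode_word(&self, word: &str) -> Vec<TokenId> {
        let mut ids = Vec::new();
        let mut start = 0;
        while start < word.len() {
            // Try the longest remaining substring first, shrinking from the right.
            let mut end = word.len();
            let mut found = None;
            while end > start {
                let piece = &word[start..end];
                let key = if start == 0 {
                    piece.to_string()
                } else {
                    format!("{}{}", self.subword_prefix, piece)
                };
                if let Some(&id) = self.vocab.get(&key) {
                    found = Some(id);
                    break;
                }
                // Drop the last character, respecting UTF-8 boundaries.
                end -= word[..end].chars().next_back().unwrap().len_utf8();
            }
            match found {
                Some(id) => {
                    ids.push(id);
                    start = end;
                }
                // No piece matched: the whole word becomes the unknown token.
                None => return vec![self.unk_id],
            }
        }
        ids
    }
}

fn main() {
    let mut vocab = HashMap::new();
    for (i, tok) in ["[UNK]", "un", "##aff", "##able"].iter().enumerate() {
        vocab.insert(tok.to_string(), i as TokenId);
    }
    let enc = WordPieceSketch { vocab, unk_id: 0, subword_prefix: "##".to_string() };
    println!("{:?}", enc.encode_word("unaffable")); // [1, 2, 3]
    println!("{:?}", enc.encode_word("xyz")); // [0]
}
```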

Type Aliases

  • Integer type used to represent token IDs.
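A type alias for token IDs gives vocabularies, encoders, and model inputs one shared name for the integer type. The sketch below assumes the alias is `u32` (check the crate's actual definition) and uses a hypothetical `Vocab` struct to show a two-way mapping between token strings and IDs expressed in terms of that alias.

```rust
use std::collections::HashMap;

// Assumption: a 32-bit unsigned integer; the crate's alias may differ.
type TokenId = u32;

/// Hypothetical bidirectional vocabulary keyed by the alias.
struct Vocab {
    token_to_id: HashMap<String, TokenId>,
    id_to_token: Vec<String>,
}

impl Vocab {
    /// Assigns IDs in order: the first token gets 0, the next 1, and so on.
    fn from_tokens(tokens: &[&str]) -> Self {
        let id_to_token: Vec<String> = tokens.iter().map(|t| t.to_string()).collect();
        let token_to_id: HashMap<String, TokenId> = id_to_token
            .iter()
            .enumerate()
            .map(|(i, t)| (t.clone(), i as TokenId))
            .collect();
        Vocab { token_to_id, id_to_token }
    }

    fn id(&self, token: &str) -> Option<TokenId> {
        self.token_to_id.get(token).copied()
    }

    fn token(&self, id: TokenId) -> Option<&str> {
        self.id_to_token.get(id as usize).map(|s| s.as_str())
    }
}

fn main() {
    let vocab = Vocab::from_tokens(&["[PAD]", "hello", "world"]);
    let ids: Vec<TokenId> = ["hello", "world"].iter().filter_map(|t| vocab.id(t)).collect();
    println!("{:?}", ids); // [1, 2]
    println!("{:?}", vocab.token(2)); // Some("world")
}
```

Writing signatures against the alias rather than a concrete integer type keeps them unchanged if the underlying type is ever widened or narrowed.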