Module tokenizers
Re-exports§

pub use hf::HuggingFaceTokenizer;

Modules§

hf
traits

Structs§

DecodeStream
DecodeStream keeps the state necessary to produce individual chunks of text from an input stream of token_ids.
Error
The Error type, a wrapper around a dynamic error type.
Sequence
Maintains state for an ongoing sequence of tokens and their decoded text
StopSequenceDecoder
A Sequence for decoding a stream of token ids into text and detecting stop sequences. A stop sequence is either a matching token_id or a matching sequence of text. Matching happens first at the token level, then at the sequence level. Hidden stops take precedence over visible ones: if the same token_id appears in both stop_token_ids_visible and stop_token_ids_hidden, it is treated as hidden.
StopSequenceDecoderBuilder
Tokenizer
Main tokenizer wrapper that provides a unified interface for different tokenizer implementations
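The hidden-versus-visible precedence documented for StopSequenceDecoder can be illustrated with a minimal, self-contained sketch. The names below (MiniStopDecoder, Output) are hypothetical stand-ins, not this module's API:

```rust
/// Outcome of feeding one token to the sketch decoder.
#[derive(Debug, PartialEq)]
enum Output {
    /// Decoded text to surface to the caller.
    Text(String),
    /// A stop token matched and its text should still be shown.
    StoppedVisible(String),
    /// A stop token matched and its text is suppressed.
    StoppedHidden,
}

/// Minimal stand-in for a token-level stop-sequence decoder.
struct MiniStopDecoder {
    visible_stops: Vec<u32>,
    hidden_stops: Vec<u32>,
}

impl MiniStopDecoder {
    fn add_token_id(&self, id: u32, text: &str) -> Output {
        // Hidden takes precedence over visible, as documented above.
        if self.hidden_stops.contains(&id) {
            Output::StoppedHidden
        } else if self.visible_stops.contains(&id) {
            Output::StoppedVisible(text.to_string())
        } else {
            Output::Text(text.to_string())
        }
    }
}
```

A token listed in both sets is reported as hidden, matching the precedence rule above.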

Enums§

Encoding
Contains the results of tokenizing text: token IDs, string tokens, and their spans
SequenceDecoderOutput
The result of a SequenceDecoder::add_token_id operation, indicating whether text was produced or a stop condition was met.
TokenizerType
Represents the type of tokenizer being used
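The three pieces an Encoding is documented to hold (token IDs, string tokens, and their spans) can be modeled with a small sketch. MiniEncoding and its field names are illustrative assumptions, not the module's actual layout; the (start, end) span shape follows the Offsets alias listed below:

```rust
/// Character span (start, end) into the original text.
type Offsets = (usize, usize);

/// Sketch of the data an encoding carries: parallel vectors of
/// token ids, string tokens, and source spans.
struct MiniEncoding {
    token_ids: Vec<u32>,
    tokens: Vec<String>,
    spans: Vec<Offsets>,
}

impl MiniEncoding {
    /// Recover the source slice for token `i` using its span.
    fn text_for<'a>(&self, source: &'a str, i: usize) -> &'a str {
        let (start, end) = self.spans[i];
        &source[start..end]
    }
}
```

Keeping the three vectors the same length lets callers map any token back to the exact substring it came from.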

Functions§

create_tokenizer_from_file
Create a tokenizer from a path to a tokenizer file; the file extension determines the tokenizer type. Supported file types are:
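Extension-based dispatch of the kind create_tokenizer_from_file describes can be sketched as follows. TokenizerKind and the mapping of `.json` to a HuggingFace tokenizer are illustrative assumptions, not the module's actual list of supported types:

```rust
use std::path::Path;

/// Illustrative tokenizer kinds; the module's actual set may differ.
#[derive(Debug, PartialEq)]
enum TokenizerKind {
    HuggingFace,
    Unsupported,
}

/// Pick a tokenizer kind from a file extension.
/// `.json` is assumed here to mean a HuggingFace `tokenizer.json`.
fn kind_from_path(path: &str) -> TokenizerKind {
    match Path::new(path).extension().and_then(|e| e.to_str()) {
        Some("json") => TokenizerKind::HuggingFace,
        _ => TokenizerKind::Unsupported,
    }
}
```

Dispatching on the extension keeps the public constructor a single entry point while each format lives behind its own submodule (here, `hf`).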

Type Aliases§

Offsets
Character offsets in the original text
Result
Result<T, Error>