Module tokenizer

Wraps the embedding model's real tokenizer for accurate token counting and chunking, and provides token-count utilities for sizing embedding input.

Provides fast approximate token counting used to decide whether a body fits in a single chunk or requires the multi-chunk splitter.
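
The single-chunk versus multi-chunk decision described above can be sketched roughly as follows. This is a minimal illustration only: `approx_token_count`, `plan_chunks`, and `ChunkPlan` are hypothetical names that do not appear in this module, and the four-bytes-per-token heuristic is an assumption chosen for the example, not the module's actual counting method.

```rust
/// Hypothetical heuristic (an assumption for this sketch): roughly one token
/// per four bytes of UTF-8 text, rounded up.
fn approx_token_count(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Which path a body would take through a chunking pipeline.
#[derive(Debug, PartialEq)]
enum ChunkPlan {
    /// The estimated token count fits within the model limit.
    Single,
    /// The body exceeds the limit and needs the multi-chunk splitter.
    Multi,
}

/// Compare the estimated token count against `max_tokens`, a caller-supplied
/// limit standing in for the embedding model's maximum input length.
fn plan_chunks(body: &str, max_tokens: usize) -> ChunkPlan {
    if approx_token_count(body) <= max_tokens {
        ChunkPlan::Single
    } else {
        ChunkPlan::Multi
    }
}
```

With a limit of 512, a short body such as `"short body"` would be planned as `ChunkPlan::Single`, while a 4096-byte body would be planned as `ChunkPlan::Multi`. In practice the limit would presumably come from the model configuration rather than a hard-coded value.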

Functions

count_passage_tokens
get_model_max_length
get_tokenizer
passage_token_offsets