Skip to main content

Module tokenizer

sqlite_graphrag

Module tokenizer

Expand description

Real tokenizer of the embedding model for accurate token counting and chunking. Token-count utilities for embedding input sizing.

Provides fast approximate token counting used to decide whether a body fits in a single chunk or requires the multi-chunk splitter.

Functions§

count_passage_tokens: Counts the tokens produced by encoding text with the passage prefix.
get_model_max_length: Returns the model’s model_max_length from tokenizer_config.json.
get_tokenizer: Returns the process-wide Tokenizer singleton, initializing it on first call.
passage_token_offsets: Returns the byte-offset pairs (start, end) for each token in text.