Struct LocalTokenizer

pub struct LocalTokenizer { /* private fields */ }

The local Gemini tokenizer.

Matches the Python SDK’s LocalTokenizer interface. Wraps a SentencePiece processor loaded with the Gemma 3 tokenizer model, which is shared by all Gemini models; the model file is embedded in the binary at compile time.

§Example

use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let result = tok.count_tokens("Hello, world!", None);
println!("{}", result); // total_tokens=4

Implementations§

impl LocalTokenizer

pub fn new(model_name: &str) -> Result<Self, TokenizerError>

Creates a new tokenizer for the given Gemini model.

Validates the model name against the supported list (matching the Python SDK’s _local_tokenizer_loader.py) and loads the embedded SentencePiece model.

§Errors

Returns a TokenizerError if model_name is not in the supported model list.
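
A short sketch of the failure mode; the unsupported name below is an illustrative placeholder, not a real model identifier:

use gemini_tokenizer::LocalTokenizer;

// A supported model name loads the embedded SentencePiece model.
assert!(LocalTokenizer::new("gemini-2.5-pro").is_ok());

// A name outside the supported list fails validation
// ("not-a-model" is an illustrative placeholder).
assert!(LocalTokenizer::new("not-a-model").is_err());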

pub fn model_name(&self) -> &str

Returns the model name this tokenizer was created for.

pub fn vocab_size(&self) -> usize

Returns the vocabulary size of the loaded model.
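
A short sketch combining the two accessors; the exact vocabulary size depends on the embedded Gemma 3 model, so only a loose bound is asserted:

use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
assert_eq!(tok.model_name(), "gemini-2.5-pro");
// The precise size is model-dependent; assert only that a vocabulary loaded.
assert!(tok.vocab_size() > 0);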

pub fn count_tokens<'a>(
    &self,
    contents: impl Into<Contents<'a>>,
    config: Option<&CountTokensConfig>,
) -> CountTokensResult

Counts the number of tokens in the given contents.

Accepts either a plain text string or structured Content objects via the Contents enum. An optional CountTokensConfig can supply tools, a system instruction, and a response schema, each of which contributes additional tokens to the count.

This matches the Python SDK’s LocalTokenizer.count_tokens() method.

§Example
use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.0-flash").unwrap();

// Plain text
let result = tok.count_tokens("What is your name?", None);
assert_eq!(result.total_tokens, 5);
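
Passing a config follows the same shape. This is a sketch only: the crate-root import path for CountTokensConfig and the availability of a Default implementation are assumptions; an empty config adds no tokens:

use gemini_tokenizer::{CountTokensConfig, LocalTokenizer};

let tok = LocalTokenizer::new("gemini-2.0-flash").unwrap();

// Assumption: CountTokensConfig implements Default. Tools, a system
// instruction, or a response schema set on it would add further tokens.
let config = CountTokensConfig::default();
let result = tok.count_tokens("What is your name?", Some(&config));
assert!(result.total_tokens >= 5);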
pub fn compute_tokens<'a>(
    &self,
    contents: impl Into<Contents<'a>>,
) -> ComputeTokensResult

Computes token IDs and byte pieces for the given contents.

Returns a ComputeTokensResult with one TokensInfo entry per content part, preserving the role from the parent Content object.

This matches the Python SDK’s LocalTokenizer.compute_tokens() method.

§Example
use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let result = tok.compute_tokens("Hello");
assert_eq!(result.tokens_info.len(), 1);
assert!(!result.tokens_info[0].token_ids.is_empty());
assert_eq!(result.tokens_info[0].role, Some("user".to_string()));
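
Continuing the example, each TokensInfo entry can be walked directly. The token_ids and role fields appear above; the tokens field for the byte pieces mirrors the Python SDK’s TokensInfo and is an assumption of this sketch:

for info in &result.tokens_info {
    // `tokens` (byte pieces) is assumed to mirror the Python SDK field name.
    println!("role={:?} ids={:?} pieces={:?}", info.role, info.token_ids, info.tokens);
}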

pub fn processor(&self) -> &SentencePieceProcessor

Returns a reference to the underlying SentencePiece processor.
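
A sketch of dropping down to the raw processor. It assumes SentencePieceProcessor is the sentencepiece crate’s type, whose encode() returns pieces paired with their vocabulary IDs; if a different backend is used, these calls would differ:

use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let sp = tok.processor();

// Assumption: the `sentencepiece` crate's API, where encode() yields
// `PieceWithId` values carrying the piece string and its vocabulary ID.
let pieces = sp.encode("Hello").unwrap();
for p in &pieces {
    println!("{} -> {}", p.piece, p.id);
}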

Trait Implementations§

impl Debug for LocalTokenizer

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
