pub struct LocalTokenizer { /* private fields */ }
The local Gemini tokenizer.
Matches the Python SDK’s LocalTokenizer interface. Wraps a SentencePiece
processor loaded with the Gemma 3 tokenizer model, which is shared by all
Gemini models. The model is embedded in the binary at compile time, so nothing
is downloaded or read from disk at runtime.
§Example
use gemini_tokenizer::LocalTokenizer;
let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let result = tok.count_tokens("Hello, world!", None);
println!("{}", result); // total_tokens=4
Implementations§
impl LocalTokenizer
pub fn new(model_name: &str) -> Result<Self, TokenizerError>
Creates a new tokenizer for the given Gemini model.
Validates the model name against the supported list (matching the Python
SDK’s _local_tokenizer_loader.py) and loads the embedded SentencePiece
model.
§Errors
TokenizerError::UnsupportedModel if the model name is not recognized.
TokenizerError::ModelLoadError if the SentencePiece model fails to deserialize.
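A minimal sketch of handling the unsupported-model case; the exact payloads of the error variants are not documented here, so the match uses a rest pattern:
use gemini_tokenizer::{LocalTokenizer, TokenizerError};
// An unrecognized model name fails fast instead of silently mis-counting.
match LocalTokenizer::new("not-a-gemini-model") {
    Err(TokenizerError::UnsupportedModel { .. }) => { /* expected */ }
    _ => unreachable!("expected UnsupportedModel"),
}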
pub fn model_name(&self) -> &str
Returns the model name this tokenizer was created for.
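§Example
A short sketch, assuming the name is returned exactly as passed to new, without normalization:
use gemini_tokenizer::LocalTokenizer;
let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
assert_eq!(tok.model_name(), "gemini-2.5-pro");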
pub fn vocab_size(&self) -> usize
Returns the vocabulary size of the loaded model.
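§Example
A short sketch; the exact size depends on the embedded Gemma 3 model, so only a loose bound is asserted:
use gemini_tokenizer::LocalTokenizer;
let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
assert!(tok.vocab_size() > 0);
println!("vocab size: {}", tok.vocab_size());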
pub fn count_tokens<'a>(
    &self,
    contents: impl Into<Contents<'a>>,
    config: Option<&CountTokensConfig>,
) -> CountTokensResult
Counts the number of tokens in the given contents.
Accepts either a plain text string or structured Content objects via the
Contents enum. An optional CountTokensConfig can supply tools, a system
instruction, and a response schema, each of which contributes additional
tokens (see the hedged config sketch after the example below).
This matches the Python SDK’s LocalTokenizer.count_tokens() method.
§Example
use gemini_tokenizer::LocalTokenizer;
let tok = LocalTokenizer::new("gemini-2.0-flash").unwrap();
// Plain text
let result = tok.count_tokens("What is your name?", None);
assert_eq!(result.total_tokens, 5);
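A hedged sketch of passing a config. The field name system_instruction and the Default impl below are assumptions for illustration, not the crate’s confirmed API; consult the CountTokensConfig docs for the actual shape:
use gemini_tokenizer::{CountTokensConfig, LocalTokenizer};
let tok = LocalTokenizer::new("gemini-2.0-flash").unwrap();
// Hypothetical field name; a system instruction adds tokens on top of the contents.
let config = CountTokensConfig {
    system_instruction: Some("You are a helpful assistant.".into()),
    ..Default::default()
};
let with_config = tok.count_tokens("What is your name?", Some(&config));
let plain = tok.count_tokens("What is your name?", None);
assert!(with_config.total_tokens > plain.total_tokens);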
pub fn compute_tokens<'a>(
    &self,
    contents: impl Into<Contents<'a>>,
) -> ComputeTokensResult
Computes token IDs and byte pieces for the given contents.
Returns a ComputeTokensResult with one TokensInfo entry per
content part, preserving the role from the parent Content object.
This matches the Python SDK’s LocalTokenizer.compute_tokens() method.
§Example
use gemini_tokenizer::LocalTokenizer;
let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let result = tok.compute_tokens("Hello");
assert_eq!(result.tokens_info.len(), 1);
assert!(!result.tokens_info[0].token_ids.is_empty());
assert_eq!(result.tokens_info[0].role, Some("user".to_string()));
pub fn processor(&self) -> &SentencePieceProcessor
Returns a reference to the underlying SentencePiece processor.
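§Example
This can be used to tokenize raw text directly, bypassing the Gemini Content handling (roles, turn structure) that count_tokens applies. A sketch assuming the sentencepiece crate’s encode API:
use gemini_tokenizer::LocalTokenizer;
let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let spp = tok.processor();
// Encode raw text; no role or turn structure is added here.
let pieces = spp.encode("Hello, world!").unwrap();
for p in &pieces {
    println!("{:>6}  {}", p.id, p.piece);
}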