§gemini-tokenizer
Authoritative Gemini tokenizer for Rust, ported from the official Google Python SDK (v1.6.20).
All Gemini models (gemini-2.0-flash, gemini-2.5-pro, gemini-3-pro-preview, etc.) use the same tokenizer: the Gemma 3 SentencePiece model with a vocabulary of 262,144 tokens. This crate embeds that model and provides a fast, local tokenizer that matches the official Google Python SDK’s behavior.
§Quick Start
```rust
use gemini_tokenizer::LocalTokenizer;

let tokenizer = LocalTokenizer::new("gemini-2.5-pro").expect("failed to load tokenizer");

// Count tokens in plain text
let result = tokenizer.count_tokens("What is your name?", None);
assert_eq!(result.total_tokens, 5);

// Get individual token details
let result = tokenizer.compute_tokens("Hello, world!");
for info in &result.tokens_info {
    for (id, token) in info.token_ids.iter().zip(&info.tokens) {
        println!("id={}, token={:?}", id, token);
    }
}
```

§Structured Content
The tokenizer also counts tokens in structured Gemini API content objects,
matching the Google Python SDK’s _TextsAccumulator logic:
```rust
use gemini_tokenizer::{LocalTokenizer, Content, Part};

let tokenizer = LocalTokenizer::new("gemini-2.5-pro").expect("failed to load tokenizer");

let contents = vec![Content {
    role: Some("user".to_string()),
    parts: Some(vec![Part {
        text: Some("What is the weather in NYC?".to_string()),
        ..Default::default()
    }]),
}];

let result = tokenizer.count_tokens(contents.as_slice(), None);
assert!(result.total_tokens > 0);
```

§Re-exports
pub use accumulator::TextAccumulator;
pub use types::*;
§Modules
- accumulator
- Text accumulation logic ported from the official Google Python SDK.
- types
- Lightweight types mirroring the Google Gemini API structures needed for token counting.
§Structs
- LocalTokenizer
- The local Gemini tokenizer.
§Enums
- TokenizerError
- Errors that can occur when creating or using the tokenizer.
§Constants
- MODEL_SHA256
- The expected SHA-256 hash of the embedded SentencePiece model.
- VOCAB_SIZE
- The expected vocabulary size of the Gemma 3 tokenizer.
§Functions
- supported_models
- Returns the list of supported Gemini model names.
- verify_model_hash
- Verifies that the embedded model’s SHA-256 hash matches the expected value.