Struct LocalTokenizer

pub struct LocalTokenizer { /* private fields */ }

The local Gemini tokenizer.

Matches the Python SDK’s LocalTokenizer interface. Wraps a SentencePiece processor loaded with the Gemma 3 tokenizer model, which is shared by all Gemini models; the model file is embedded in the binary at compile time.

§Example

use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let result = tok.count_tokens("Hello, world!", None);
println!("{}", result); // total_tokens=4

Implementations§

impl LocalTokenizer

pub fn new(model_name: &str) -> Result<Self, TokenizerError>

Creates a new tokenizer for the given Gemini model.

Validates the model name against the supported list (matching the Python SDK’s _local_tokenizer_loader.py) and loads the embedded SentencePiece model.

§Errors

Returns a TokenizerError if model_name is not in the supported model list.
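
A short sketch of the failure mode; the unsupported name below is an illustrative placeholder, not a real model identifier:

use gemini_tokenizer::LocalTokenizer;

// A supported model name loads the embedded SentencePiece model.
assert!(LocalTokenizer::new("gemini-2.5-pro").is_ok());

// A name outside the supported list fails validation
// ("not-a-model" is an illustrative placeholder).
assert!(LocalTokenizer::new("not-a-model").is_err());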

pub fn model_name(&self) -> &str

Returns the model name this tokenizer was created for.

pub fn vocab_size(&self) -> usize

Returns the vocabulary size of the loaded model.
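
A short sketch combining the two accessors; the exact vocabulary size depends on the embedded Gemma 3 model, so only a loose bound is asserted:

use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
assert_eq!(tok.model_name(), "gemini-2.5-pro");
// The precise size is model-dependent; assert only that a vocabulary loaded.
assert!(tok.vocab_size() > 0);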

pub fn count_tokens<'a>(
    &self,
    contents: impl Into<Contents<'a>>,
    config: Option<&CountTokensConfig>,
) -> CountTokensResult

Counts the number of tokens in the given contents.

Accepts either a plain text string or structured Content objects via the Contents enum. An optional CountTokensConfig can supply tools, a system instruction, and a response schema, each of which contributes additional tokens to the count.

This matches the Python SDK’s LocalTokenizer.count_tokens() method.

§Example
use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.0-flash").unwrap();

// Plain text
let result = tok.count_tokens("What is your name?", None);
assert_eq!(result.total_tokens, 5);
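
Passing a config follows the same shape. This is a sketch only: the crate-root import path for CountTokensConfig and the availability of a Default implementation are assumptions; an empty config adds no tokens:

use gemini_tokenizer::{CountTokensConfig, LocalTokenizer};

let tok = LocalTokenizer::new("gemini-2.0-flash").unwrap();

// Assumption: CountTokensConfig implements Default. Tools, a system
// instruction, or a response schema set on it would add further tokens.
let config = CountTokensConfig::default();
let result = tok.count_tokens("What is your name?", Some(&config));
assert!(result.total_tokens >= 5);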
pub fn compute_tokens<'a>(
    &self,
    contents: impl Into<Contents<'a>>,
) -> ComputeTokensResult

Computes token IDs and byte pieces for the given contents.

Returns a ComputeTokensResult with one TokensInfo entry per content part, preserving the role from the parent Content object.

This matches the Python SDK’s LocalTokenizer.compute_tokens() method.

§Example
use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let result = tok.compute_tokens("Hello");
assert_eq!(result.tokens_info.len(), 1);
assert!(!result.tokens_info[0].token_ids.is_empty());
assert_eq!(result.tokens_info[0].role, Some("user".to_string()));
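
Continuing the example, each TokensInfo entry can be walked directly. The token_ids and role fields appear above; the tokens field for the byte pieces mirrors the Python SDK’s TokensInfo and is an assumption of this sketch:

for info in &result.tokens_info {
    // `tokens` (byte pieces) is assumed to mirror the Python SDK field name.
    println!("role={:?} ids={:?} pieces={:?}", info.role, info.token_ids, info.tokens);
}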

pub fn processor(&self) -> &SentencePieceProcessor

Returns a reference to the underlying SentencePiece processor.
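
A sketch of dropping down to the raw processor. It assumes SentencePieceProcessor is the sentencepiece crate’s type, whose encode() returns pieces paired with their vocabulary IDs; if a different backend is used, these calls would differ:

use gemini_tokenizer::LocalTokenizer;

let tok = LocalTokenizer::new("gemini-2.5-pro").unwrap();
let sp = tok.processor();

// Assumption: the `sentencepiece` crate's API, where encode() yields
// `PieceWithId` values carrying the piece string and its vocabulary ID.
let pieces = sp.encode("Hello").unwrap();
for p in &pieces {
    println!("{} -> {}", p.piece, p.id);
}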

Trait Implementations§

impl Debug for LocalTokenizer

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
