Module tokenizer



Token counting utilities.

This module provides accurate token counts using tiktoken encodings for OpenAI-compatible models and falls back to a character-based heuristic estimate for all other models.

Supported Encodings

cl100k_base: GPT-3.5 and GPT-4; also used as an approximation for Claude
o200k_base: GPT-4o, o1, and o3 models
heuristic: fallback estimate of ~4 characters per token
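The encoding list above implies a model-to-encoding lookup with a heuristic fallback. The following is a minimal sketch of that idea, assuming selection by model-ID prefix; `EncodingKind`, `encoding_for_model`, and `heuristic_token_count` are hypothetical names, not the crate's API, and the real `count_tokens_for_model` may match model IDs differently.

```rust
/// Hypothetical mirror of the encodings listed above; the real
/// `m2m::models::Encoding` may have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EncodingKind {
    Cl100kBase,
    O200kBase,
    Heuristic,
}

/// Pick an encoding from a model ID by prefix (an assumed rule, for illustration).
fn encoding_for_model(model_id: &str) -> EncodingKind {
    let id = model_id.to_ascii_lowercase();
    // Check the o200k families first: "gpt-4o" would otherwise match "gpt-4".
    if id.starts_with("gpt-4o") || id.starts_with("o1") || id.starts_with("o3") {
        EncodingKind::O200kBase
    } else if id.starts_with("gpt-3.5") || id.starts_with("gpt-4") || id.starts_with("claude") {
        EncodingKind::Cl100kBase
    } else {
        EncodingKind::Heuristic
    }
}

/// Heuristic fallback: ~4 characters per token, rounded up.
fn heuristic_token_count(text: &str) -> usize {
    let chars = text.chars().count();
    (chars + 3) / 4
}

fn main() {
    for model in ["gpt-4o-mini", "gpt-4-turbo", "o3-mini", "claude-3-opus", "llama-3"] {
        println!("{model}: {:?}", encoding_for_model(model));
    }
    println!("{}", heuristic_token_count("Hello, world!")); // 13 chars -> 4
}
```

Counting characters with `chars()` rather than `len()` keeps the heuristic from over-counting multi-byte UTF-8 text.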

Example

```rust
use m2m::models::Encoding;
use m2m::tokenizer::{count_tokens, count_tokens_with_encoding};

fn main() {
    // Count with the default encoding (cl100k_base)
    let tokens = count_tokens("Hello, world!");
    println!("Token count: {}", tokens);

    // Count with a specific encoding
    let tokens = count_tokens_with_encoding("Hello, world!", Encoding::O200kBase);
    println!("Token count (o200k_base): {}", tokens);
}
```

Structs

TokenCounter: Token counter with caching and batch support
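To illustrate what "caching and batch support" can mean, here is a minimal sketch of a memoizing counter. It is not `TokenCounter` itself: `CachingCounter`, `count`, `count_batch`, and the `hits` field are hypothetical, and a closure using the ~4 characters/token heuristic stands in for a real tiktoken encoder.

```rust
use std::collections::HashMap;

/// Hypothetical caching, batch-capable counter.
/// `count_fn` stands in for a real encoder.
struct CachingCounter<F: Fn(&str) -> usize> {
    count_fn: F,
    cache: HashMap<String, usize>,
    hits: usize,
}

impl<F: Fn(&str) -> usize> CachingCounter<F> {
    fn new(count_fn: F) -> Self {
        Self { count_fn, cache: HashMap::new(), hits: 0 }
    }

    /// Return the cached count if present; otherwise compute and store it.
    fn count(&mut self, text: &str) -> usize {
        if let Some(&n) = self.cache.get(text) {
            self.hits += 1;
            return n;
        }
        let n = (self.count_fn)(text);
        self.cache.insert(text.to_string(), n);
        n
    }

    /// Count a batch of texts, reusing cached results for repeated inputs.
    fn count_batch(&mut self, texts: &[&str]) -> Vec<usize> {
        texts.iter().map(|t| self.count(t)).collect()
    }
}

fn main() {
    // Heuristic stand-in encoder: ~4 characters per token, rounded up.
    let mut counter = CachingCounter::new(|s: &str| (s.chars().count() + 3) / 4);
    let counts = counter.count_batch(&["Hello, world!", "hi", "Hello, world!"]);
    println!("{:?} (cache hits: {})", counts, counter.hits);
}
```

Caching pays off when the same prompt fragments (system prompts, tool schemas) are counted repeatedly; the trade-off is memory that grows with the number of distinct inputs, which a real implementation might bound.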

Functions

count_tokens: Count tokens using the default encoding (cl100k_base)
count_tokens_for_model: Count tokens for a specific model ID
count_tokens_with_encoding: Count tokens with a specific encoding
estimate_savings: Estimate token savings from compression
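The savings estimate reduces to simple arithmetic on two token counts. The sketch below assumes `estimate_savings` compares original and compressed text; `SavingsEstimate`, `approx_tokens`, and `estimate_savings_sketch` are hypothetical, and the heuristic counter stands in for the module's real encoders, so the crate's actual signature and return type may differ.

```rust
/// Hypothetical result type; the real `estimate_savings` may return something else.
#[derive(Debug)]
struct SavingsEstimate {
    original_tokens: usize,
    compressed_tokens: usize,
    saved_tokens: usize,
    saved_percent: f64,
}

/// Stand-in counter: ~4 characters per token, rounded up.
fn approx_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn estimate_savings_sketch(original: &str, compressed: &str) -> SavingsEstimate {
    let original_tokens = approx_tokens(original);
    let compressed_tokens = approx_tokens(compressed);
    // saturating_sub: output that grows reports 0 saved instead of underflowing.
    let saved_tokens = original_tokens.saturating_sub(compressed_tokens);
    let saved_percent = if original_tokens == 0 {
        0.0
    } else {
        saved_tokens as f64 / original_tokens as f64 * 100.0
    };
    SavingsEstimate { original_tokens, compressed_tokens, saved_tokens, saved_percent }
}

fn main() {
    let original = "Please summarize the following document in three bullet points.";
    let compressed = "Summarize in 3 bullets.";
    let est = estimate_savings_sketch(original, compressed);
    println!("{est:?}");
}
```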