Tokenizer implementations for various LLM models
This module provides the core tokenization functionality for supported LLM models.
§Architecture
The tokenization system uses a trait-based design for extensibility:
- Tokenizer - Trait for all tokenizer implementations
- openai::OpenAITokenizer - OpenAI model tokenizer using tiktoken
- registry::ModelRegistry - Registry of supported models with lazy initialization
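The trait-based design above can be sketched as follows. This is a minimal, hypothetical illustration: the `Tokenizer` trait and `ModelInfo` struct mirror the names in this module, but the signatures and the toy `WhitespaceTokenizer` backend are assumptions, not the crate's actual API.

```rust
/// Information about a tokenization model (illustrative fields).
pub struct ModelInfo {
    pub name: String,
    pub encoding: String,
}

/// Trait implemented by every tokenizer backend (sketched signatures).
pub trait Tokenizer {
    /// Count the tokens in `text`.
    fn count_tokens(&self, text: &str) -> Result<usize, String>;
    /// Describe the model backing this tokenizer.
    fn get_model_info(&self) -> ModelInfo;
}

/// Toy whitespace-splitting backend, used only to show how a new
/// implementation plugs into the trait.
struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn count_tokens(&self, text: &str) -> Result<usize, String> {
        Ok(text.split_whitespace().count())
    }
    fn get_model_info(&self) -> ModelInfo {
        ModelInfo { name: "whitespace".into(), encoding: "none".into() }
    }
}

fn main() {
    let t = WhitespaceTokenizer;
    println!("{}", t.count_tokens("Hello world").unwrap());
}
```

Because callers depend only on the trait, a new model backend (such as a Claude estimator) can be added without changing any counting code.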
§Example
use token_count::tokenizers::registry::ModelRegistry;
// Get the global model registry
let registry = ModelRegistry::global();
// Get a tokenizer for a specific model
let tokenizer = registry.get_tokenizer("gpt-4", false).unwrap();
// Count tokens
let count = tokenizer.count_tokens("Hello world").unwrap();
assert_eq!(count, 2);
// Get model information
let info = tokenizer.get_model_info();
assert_eq!(info.name, "gpt-4");
assert_eq!(info.encoding, "cl100k_base");
§Supported Models
Currently supports:
- OpenAI models: GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, GPT-4o
- Claude models: Claude 4.0-4.6 (Opus, Sonnet, Haiku variants)
See registry::ModelRegistry for model configuration and aliases.
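The lazy initialization mentioned for `registry::ModelRegistry` can be sketched with `std::sync::OnceLock`: the registry is built on first access and shared globally afterwards. This is an assumed internal design, not the crate's actual implementation; the `encoding_for` helper and the model-to-encoding table (beyond the documented `gpt-4` → `cl100k_base` pairing) are illustrative.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical registry mapping model names to encoding names.
struct ModelRegistry {
    encodings: HashMap<&'static str, &'static str>,
}

impl ModelRegistry {
    /// Return the global registry, building it on first call.
    fn global() -> &'static ModelRegistry {
        static REGISTRY: OnceLock<ModelRegistry> = OnceLock::new();
        REGISTRY.get_or_init(|| {
            let mut encodings = HashMap::new();
            // gpt-4 uses cl100k_base, as shown in the example above.
            encodings.insert("gpt-4", "cl100k_base");
            encodings.insert("gpt-3.5-turbo", "cl100k_base");
            ModelRegistry { encodings }
        })
    }

    /// Look up the encoding for a model name, if registered.
    fn encoding_for(&self, model: &str) -> Option<&'static str> {
        self.encodings.get(model).copied()
    }
}

fn main() {
    let registry = ModelRegistry::global();
    println!("{:?}", registry.encoding_for("gpt-4"));
}
```

`OnceLock::get_or_init` guarantees the closure runs at most once even under concurrent first access, so the table is built exactly once per process.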
Modules§
- claude
- Tokenizer implementation for Anthropic Claude models
- openai
- OpenAI tokenization using tiktoken-rs
- registry
- Model registry for managing supported models
Structs§
- ModelInfo
- Information about a tokenization model
- TokenizationResult
- Result of a tokenization operation
Enums§
- TokenCount
- Result of token counting, indicating whether the count is estimated or exact
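An exact-vs-estimated count, as `TokenCount` is described, is naturally modeled as a two-variant enum. The variant and method names below are illustrative assumptions, not the crate's actual definition; the sketch only shows the shape such a type could take.

```rust
/// Hypothetical sketch: a token count that records whether it came
/// from an exact tokenizer or a heuristic estimate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenCount {
    /// Counted with the model's real tokenizer.
    Exact(usize),
    /// Approximated (e.g. for models without a local tokenizer).
    Estimated(usize),
}

impl TokenCount {
    /// The numeric count, regardless of provenance.
    fn value(&self) -> usize {
        match self {
            TokenCount::Exact(n) | TokenCount::Estimated(n) => *n,
        }
    }

    /// Whether the count is exact rather than estimated.
    fn is_exact(&self) -> bool {
        matches!(self, TokenCount::Exact(_))
    }
}

fn main() {
    let c = TokenCount::Estimated(128);
    println!("{} exact={}", c.value(), c.is_exact());
}
```

Carrying provenance in the type lets callers decide, for instance, whether a count is reliable enough for billing or only for rough budgeting.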
Traits§
- Tokenizer
- Trait for tokenizing text with a specific model