Token counting library for LLM models
This library provides exact tokenization for various LLM models using their official tokenizers.
§Features
- Exact tokenization for OpenAI models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o)
- Model aliases with case-insensitive matching
- Fuzzy suggestions for typos and unknown models
- Zero runtime dependencies - all tokenizers embedded
- Fast and efficient - ~2.7µs for small inputs
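The fuzzy-suggestion feature can be illustrated with a minimal, self-contained sketch. This is not the crate's actual implementation; `levenshtein` and `suggest` are hypothetical helpers that pick the known model name with the smallest edit distance from the input:

```rust
// Hypothetical sketch of fuzzy model-name suggestions via Levenshtein
// distance. Not the crate's real implementation.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    // Single-row dynamic programming: prev holds the previous row of the DP table.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the closest known model name, if any is within a small edit distance.
fn suggest<'a>(input: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .map(|&m| (levenshtein(&input.to_lowercase(), m), m))
        .filter(|&(d, _)| d <= 3)
        .min_by_key(|&(d, _)| d)
        .map(|(_, m)| m)
}

fn main() {
    let models = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"];
    // "gpt-5" is one substitution away from "gpt-4".
    assert_eq!(suggest("gpt-5", &models), Some("gpt-4"));
    println!("Did you mean {:?}?", suggest("gpt-5", &models));
}
```

This is the general technique behind "Did you mean …?" errors; the crate may use a different distance metric or threshold.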
§Quick Start
use token_count::count_tokens;
// Count tokens for a specific model
let result = count_tokens("Hello world", "gpt-4", false).unwrap();
assert_eq!(result.token_count, 2);
println!("Tokens: {}", result.token_count);
println!("Model: {}", result.model_info.name);
§Supported Models
- gpt-3.5-turbo - GPT-3.5 Turbo (16K context)
- gpt-4 - GPT-4 (8K context)
- gpt-4-turbo - GPT-4 Turbo (128K context)
- gpt-4o - GPT-4o (128K context)
All models support aliases (e.g., gpt4, GPT-4, openai/gpt-4).
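Case-insensitive alias resolution can be sketched as lowercasing the input, stripping an optional provider prefix, and looking the result up in a table. This is a hypothetical illustration; `resolve` and the alias table are not part of the crate's API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of case-insensitive alias resolution;
// the crate's real registry may differ.
fn resolve(model: &str, aliases: &HashMap<&str, &str>) -> Option<String> {
    let key = model.to_lowercase();
    // Strip an optional provider prefix such as "openai/".
    let key = key.rsplit('/').next().unwrap_or(&key);
    aliases.get(key).map(|s| s.to_string())
}

fn main() {
    let aliases: HashMap<&str, &str> = [
        ("gpt4", "gpt-4"),
        ("gpt-4", "gpt-4"),
        ("gpt-4o", "gpt-4o"),
    ]
    .into_iter()
    .collect();

    // All three spellings resolve to the same canonical model name.
    assert_eq!(resolve("GPT-4", &aliases).as_deref(), Some("gpt-4"));
    assert_eq!(resolve("openai/gpt-4", &aliases).as_deref(), Some("gpt-4"));
    assert_eq!(resolve("gpt4", &aliases).as_deref(), Some("gpt-4"));
}
```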
§Error Handling
use token_count::{count_tokens, TokenError};
// Unknown model returns an error with suggestions
match count_tokens("test", "gpt-5", false) {
    Ok(_) => panic!("Should have failed"),
    Err(TokenError::UnknownModel { model, suggestion }) => {
        assert_eq!(model, "gpt-5");
        assert!(suggestion.contains("Did you mean"));
    }
    Err(_) => panic!("Wrong error type"),
}
§Architecture
The library is organized into several modules:
- tokenizers - Core tokenization engine and model registry
- output - Output formatting (simple, verbose, debug)
- cli - Command-line interface components
- error - Error types and handling
- api - API integration utilities (consent prompts, etc.)
The main entry point is the count_tokens function, which takes text and a model name
and returns a TokenizationResult with the token count and model information.
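The overall data flow can be sketched with simplified stand-in types. Everything here beyond the documented names (`count_tokens`, `ModelInfo`, `TokenizationResult`, `token_count`, `model_info`) is an assumption: the real function takes a third boolean argument, uses real BPE tokenizers rather than whitespace splitting, and its structs have different fields:

```rust
// Hypothetical stand-ins for the crate's types; field layouts are assumptions.
struct ModelInfo {
    name: String,
}

struct TokenizationResult {
    token_count: usize,
    model_info: ModelInfo,
}

// Simplified sketch of the count_tokens data flow:
// resolve the model, then tokenize with that model's tokenizer.
fn count_tokens(text: &str, model: &str) -> Result<TokenizationResult, String> {
    // 1. Resolve the model name (a registry lookup in the real library).
    let info = match model {
        "gpt-4" => ModelInfo { name: "GPT-4".into() },
        other => return Err(format!("unknown model: {other}")),
    };
    // 2. Count tokens (whitespace split as a stand-in for a real BPE tokenizer).
    let token_count = text.split_whitespace().count();
    Ok(TokenizationResult { token_count, model_info: info })
}

fn main() {
    let r = count_tokens("Hello world", "gpt-4").unwrap();
    assert_eq!(r.token_count, 2);
    println!("{} tokens for {}", r.token_count, r.model_info.name);
}
```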
§Re-exports
pub use error::TokenError;
pub use output::select_formatter;
pub use output::OutputFormatter;
pub use tokenizers::ModelInfo;
pub use tokenizers::TokenizationResult;
pub use tokenizers::Tokenizer;
§Modules
- api - API integration utilities for external services
- cli - Command-line interface components
- error - Error types for token counting operations
- output - Output formatting for different verbosity levels
- tokenizers - Tokenizer implementations for various LLM models
§Functions
- count_tokens - Count tokens in the given text using the specified model