Token counting library for LLM models
This library provides exact tokenization for various LLM models using their official tokenizers.
§Features
- Exact tokenization for OpenAI models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o)
- Model aliases with case-insensitive matching
- Fuzzy suggestions for typos and unknown models
- Zero runtime dependencies - all tokenizers embedded
- Fast and efficient - ~2.7µs for small inputs
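The fuzzy-suggestion feature can be illustrated with a minimal, self-contained sketch. This is not the crate's actual implementation; `levenshtein` and `suggest` are hypothetical helpers that pick the known model name with the smallest edit distance from the input:

```rust
// Hypothetical sketch of fuzzy model-name suggestions via Levenshtein
// distance. Not the crate's real implementation.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    // Single-row dynamic programming: prev holds the previous row of the DP table.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the closest known model name, if any is within a small edit distance.
fn suggest<'a>(input: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .map(|&m| (levenshtein(&input.to_lowercase(), m), m))
        .filter(|&(d, _)| d <= 3)
        .min_by_key(|&(d, _)| d)
        .map(|(_, m)| m)
}

fn main() {
    let models = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"];
    // "gpt-5" is one substitution away from "gpt-4".
    assert_eq!(suggest("gpt-5", &models), Some("gpt-4"));
    println!("Did you mean {:?}?", suggest("gpt-5", &models));
}
```

This is the general technique behind "Did you mean …?" errors; the crate may use a different distance metric or threshold.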
§Quick Start
use token_count::count_tokens;
// Count tokens for a specific model
let result = count_tokens("Hello world", "gpt-4", false).unwrap();
assert_eq!(result.token_count, 2);
println!("Tokens: {}", result.token_count);
println!("Model: {}", result.model_info.name);
§Supported Models
- gpt-3.5-turbo - GPT-3.5 Turbo (16K context)
- gpt-4 - GPT-4 (8K context)
- gpt-4-turbo - GPT-4 Turbo (128K context)
- gpt-4o - GPT-4o (128K context)
All models support aliases (e.g., gpt4, GPT-4, openai/gpt-4).
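Case-insensitive alias resolution can be sketched as lowercasing the input, stripping an optional provider prefix, and looking the result up in a table. This is a hypothetical illustration; `resolve` and the alias table are not part of the crate's API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of case-insensitive alias resolution;
// the crate's real registry may differ.
fn resolve(model: &str, aliases: &HashMap<&str, &str>) -> Option<String> {
    let key = model.to_lowercase();
    // Strip an optional provider prefix such as "openai/".
    let key = key.rsplit('/').next().unwrap_or(&key);
    aliases.get(key).map(|s| s.to_string())
}

fn main() {
    let aliases: HashMap<&str, &str> = [
        ("gpt4", "gpt-4"),
        ("gpt-4", "gpt-4"),
        ("gpt-4o", "gpt-4o"),
    ]
    .into_iter()
    .collect();

    // All three spellings resolve to the same canonical model name.
    assert_eq!(resolve("GPT-4", &aliases).as_deref(), Some("gpt-4"));
    assert_eq!(resolve("openai/gpt-4", &aliases).as_deref(), Some("gpt-4"));
    assert_eq!(resolve("gpt4", &aliases).as_deref(), Some("gpt-4"));
}
```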
§Error Handling
use token_count::{count_tokens, TokenError};
// Unknown model returns an error with suggestions
match count_tokens("test", "gpt-5", false) {
    Ok(_) => panic!("Should have failed"),
    Err(TokenError::UnknownModel { model, suggestion }) => {
        assert_eq!(model, "gpt-5");
        assert!(suggestion.contains("Did you mean"));
    }
    Err(_) => panic!("Wrong error type"),
}
§Architecture
The library is organized into several modules:
- tokenizers - Core tokenization engine and model registry
- output - Output formatting (simple, verbose, debug)
- cli - Command-line interface components
- error - Error types and handling
- api - API integration utilities (consent prompts, etc.)
The main entry point is the count_tokens function, which takes text and a model name
and returns a TokenizationResult with the token count and model information.
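The overall data flow can be sketched with simplified stand-in types. Everything here beyond the documented names (`count_tokens`, `ModelInfo`, `TokenizationResult`, `token_count`, `model_info`) is an assumption: the real function takes a third boolean argument, uses real BPE tokenizers rather than whitespace splitting, and its structs have different fields:

```rust
// Hypothetical stand-ins for the crate's types; field layouts are assumptions.
struct ModelInfo {
    name: String,
}

struct TokenizationResult {
    token_count: usize,
    model_info: ModelInfo,
}

// Simplified sketch of the count_tokens data flow:
// resolve the model, then tokenize with that model's tokenizer.
fn count_tokens(text: &str, model: &str) -> Result<TokenizationResult, String> {
    // 1. Resolve the model name (a registry lookup in the real library).
    let info = match model {
        "gpt-4" => ModelInfo { name: "GPT-4".into() },
        other => return Err(format!("unknown model: {other}")),
    };
    // 2. Count tokens (whitespace split as a stand-in for a real BPE tokenizer).
    let token_count = text.split_whitespace().count();
    Ok(TokenizationResult { token_count, model_info: info })
}

fn main() {
    let r = count_tokens("Hello world", "gpt-4").unwrap();
    assert_eq!(r.token_count, 2);
    println!("{} tokens for {}", r.token_count, r.model_info.name);
}
```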
§Re-exports
pub use error::TokenError;
pub use output::select_formatter;
pub use output::OutputFormatter;
pub use tokenizers::ModelInfo;
pub use tokenizers::TokenizationResult;
pub use tokenizers::Tokenizer;
§Modules
- api - API integration utilities for external services
- cli - Command-line interface components
- error - Error types for token counting operations
- output - Output formatting for different verbosity levels
- tokenizers - Tokenizer implementations for various LLM models
§Functions
- count_tokens - Count tokens in the given text using the specified model