Crate token_count


Token counting library for LLM models

This library provides exact tokenization for various LLM models using their official tokenizers.

§Features

  • Exact tokenization for OpenAI models (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o)
  • Model aliases with case-insensitive matching
  • Fuzzy suggestions for typos and unknown models
  • Zero runtime dependencies - all tokenizers embedded
  • Fast and efficient - ~2.7µs for small inputs
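The fuzzy-suggestion feature listed above is typically built on an edit-distance search over the known model names. The sketch below is illustrative only, not the crate's actual implementation; `levenshtein` and `suggest` are hypothetical helper names:

```rust
// Hypothetical sketch: suggest the closest known model name by Levenshtein distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    // `prev` holds the previous row of the standard DP table.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // substitution, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the known model closest to `input`, if it is within two edits.
fn suggest<'a>(input: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .map(|m| (levenshtein(&input.to_lowercase(), m), *m))
        .filter(|(d, _)| *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, m)| m)
}

fn main() {
    let models = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"];
    println!("{:?}", suggest("gpt-5", &models));
}
```

A distance cutoff (here, two edits) keeps wildly unrelated inputs from producing misleading suggestions.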

§Quick Start

use token_count::count_tokens;

// Count tokens for a specific model
let result = count_tokens("Hello world", "gpt-4", false).unwrap();
assert_eq!(result.token_count, 2);
println!("Tokens: {}", result.token_count);
println!("Model: {}", result.model_info.name);

§Supported Models

  • gpt-3.5-turbo - GPT-3.5 Turbo (16K context)
  • gpt-4 - GPT-4 (8K context)
  • gpt-4-turbo - GPT-4 Turbo (128K context)
  • gpt-4o - GPT-4o (128K context)

All models support aliases (e.g., gpt4, GPT-4, openai/gpt-4).
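Alias resolution of this kind usually normalizes case and strips a provider prefix before looking the name up. A minimal self-contained sketch, assuming a hypothetical `resolve` helper and a small alias map (the crate's real registry may differ):

```rust
use std::collections::HashMap;

// Hypothetical sketch of alias resolution; illustrative only.
fn resolve(model: &str, aliases: &HashMap<&str, &str>) -> Option<String> {
    // Case-insensitive matching: normalize to lowercase first.
    let lower = model.to_lowercase();
    // Provider-prefixed forms like "openai/gpt-4" reduce to the bare name.
    let name = lower.strip_prefix("openai/").unwrap_or(&lower);
    // Short aliases like "gpt4" map to canonical names like "gpt-4".
    if let Some(canonical) = aliases.get(name) {
        return Some(canonical.to_string());
    }
    let known = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"];
    known.contains(&name).then(|| name.to_string())
}

fn main() {
    let aliases = HashMap::from([("gpt4", "gpt-4"), ("gpt35", "gpt-3.5-turbo")]);
    assert_eq!(resolve("GPT-4", &aliases), Some("gpt-4".to_string()));
    assert_eq!(resolve("openai/gpt-4", &aliases), Some("gpt-4".to_string()));
    assert_eq!(resolve("gpt4", &aliases), Some("gpt-4".to_string()));
    println!("all aliases resolved");
}
```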

§Error Handling

use token_count::{count_tokens, TokenError};

// Unknown model returns an error with suggestions
match count_tokens("test", "gpt-5", false) {
    Ok(_) => panic!("Should have failed"),
    Err(TokenError::UnknownModel { model, suggestion }) => {
        assert_eq!(model, "gpt-5");
        assert!(suggestion.contains("Did you mean"));
    }
    Err(_) => panic!("Wrong error type"),
}
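The `UnknownModel` variant matched above carries the offending name plus a human-readable suggestion. A minimal sketch of how such an error type could be defined, assuming a `Display` implementation; the field names follow the pattern in the example, everything else is illustrative:

```rust
use std::fmt;

// Hypothetical sketch of the error type; not the crate's actual definition.
#[derive(Debug)]
enum TokenError {
    UnknownModel { model: String, suggestion: String },
}

impl fmt::Display for TokenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TokenError::UnknownModel { model, suggestion } => {
                write!(f, "unknown model '{model}'. {suggestion}")
            }
        }
    }
}

fn main() {
    let err = TokenError::UnknownModel {
        model: "gpt-5".into(),
        suggestion: "Did you mean 'gpt-4'?".into(),
    };
    assert_eq!(err.to_string(), "unknown model 'gpt-5'. Did you mean 'gpt-4'?");
    println!("{err}");
}
```

Keeping the suggestion inside the error variant lets callers surface it directly, as the match arm above does with `suggestion.contains("Did you mean")`.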

§Architecture

The library is organized into several modules:

  • tokenizers - Core tokenization engine and model registry
  • output - Output formatting (simple, verbose, debug)
  • cli - Command-line interface components
  • error - Error types and handling
  • api - API integration utilities (consent prompts, etc.)

The main entry point is the count_tokens function, which takes text and a model name and returns a TokenizationResult with the token count and model information.
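From the Quick Start example, the result type exposes at least a `token_count` field and a `model_info.name` field. A minimal sketch of the implied shape, with the caveat that only those two fields are documented here and the crate's real structs may carry more:

```rust
// Hypothetical sketch of the result types implied by the Quick Start example.
struct ModelInfo {
    name: String,
}

struct TokenizationResult {
    token_count: usize,
    model_info: ModelInfo,
}

fn main() {
    let result = TokenizationResult {
        token_count: 2,
        model_info: ModelInfo { name: "gpt-4".to_string() },
    };
    assert_eq!(result.token_count, 2);
    println!("Tokens: {} (model: {})", result.token_count, result.model_info.name);
}
```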

Re-exports§

pub use error::TokenError;
pub use output::select_formatter;
pub use output::OutputFormatter;
pub use tokenizers::ModelInfo;
pub use tokenizers::TokenizationResult;
pub use tokenizers::Tokenizer;

Modules§

api
API integration utilities for external services
cli
CLI module for command-line interface
error
Error types for token counting operations
output
Output formatting for different verbosity levels
tokenizers
Tokenizer implementations for various LLM models

Functions§

count_tokens
Count tokens in the given text using the specified model