# token-count

A fast, accurate CLI tool for counting tokens in LLM model inputs.

## Overview

`token-count` is a POSIX-style command-line tool that counts tokens for various LLM models. It supports exact tokenization for OpenAI models (offline) and adaptive estimation for Claude models (with an optional API mode for exact counts). Pipe any text in, get token counts out: fast, offline, and accurate.
```sh
# OpenAI models (exact, offline)
echo "Hello, world!" | token-count

# Claude models (estimation, offline)
echo "Hello, world!" | token-count --model claude

# From file
token-count < input.txt

# With context info
echo "Hello, world!" | token-count -v
```
## Features

- ✅ **Accurate** - Exact tokenization for OpenAI, adaptive estimation for Claude
- ✅ **Fast** - ~2.7µs for small inputs (3,700x faster than the 10ms target)
- ✅ **Efficient** - 57MB of memory for 12MB files (8.8x under the 500MB limit)
- ✅ **Compact** - 9.2MB binary with all tokenizers embedded
- ✅ **Offline** - Zero runtime dependencies for OpenAI; optional API for Claude
- ✅ **Simple** - POSIX-style interface, works like `wc` or `grep`
## Installation

### Quick Install (Recommended)

**Linux / macOS:** use the provided `install.sh` script.

**Homebrew (macOS / Linux):** install via the project's Homebrew formula.

**Cargo (all platforms):**

```sh
cargo install token-count
```

**Manual Download:** download pre-built binaries from GitHub Releases.

For detailed installation instructions, troubleshooting, and platform-specific guidance, see INSTALL.md.
### System Requirements

- **Platform:** Linux x86_64, macOS (Intel/Apple Silicon), Windows x86_64
- **Runtime:** No dependencies (static binary)
- **Build from source:** Rust 1.85.0 or later
## Usage

### Basic Usage

```sh
# Default model (gpt-3.5-turbo)
echo "Hello, world!" | token-count

# Specific model
echo "Hello, world!" | token-count --model gpt-4

# From file
token-count < input.txt

# Piped from another command
cat notes.md | token-count
```
### Model Selection

```sh
# Use canonical name
token-count --model gpt-4 < input.txt

# Use alias (case-insensitive)
token-count --model GPT4 < input.txt

# With provider prefix
token-count --model openai/gpt-4 < input.txt
```
### Verbosity Levels

```sh
# Simple output (default) - just the number
echo "Hello, world!" | token-count

# Verbose (-v) - model info and context usage
echo "Hello, world!" | token-count -v

# Debug (-vvv) - for troubleshooting
echo "Hello, world!" | token-count -vvv
```
### Model Information

```sh
# List all supported models
token-count --list-models
# Output:
# Supported models:
#
# gpt-3.5-turbo
#   Encoding: cl100k_base
#   Context window: 16385 tokens
#   Aliases: gpt-3.5, gpt35, gpt-35-turbo, openai/gpt-3.5-turbo
#
# gpt-4
#   Encoding: cl100k_base
#   Context window: 128000 tokens
#   Aliases: gpt4, openai/gpt-4
# ...
```
### Help and Version

```sh
# Show help
token-count --help

# Show version
token-count --version
```
## Supported Models

### OpenAI Models (Exact Tokenization - Offline)
| Model | Encoding | Context Window | Aliases |
|---|---|---|---|
| gpt-3.5-turbo | cl100k_base | 16,385 | gpt-3.5, gpt35, gpt-35-turbo |
| gpt-4 | cl100k_base | 128,000 | gpt4 |
| gpt-4-turbo | cl100k_base | 128,000 | gpt4-turbo, gpt-4turbo |
| gpt-4o | o200k_base | 128,000 | gpt4o |
### Anthropic Claude Models (Adaptive Estimation - Offline by Default)
| Model | Context Window | Aliases | Estimation Mode |
|---|---|---|---|
| claude-opus-4-6 | 1,000,000 | opus, opus-4-6, opus-4.6 | ±10% accuracy |
| claude-sonnet-4-6 | 1,000,000 | claude, sonnet, sonnet-4-6, sonnet-4.6 | ±10% accuracy |
| claude-haiku-4-5 | 200,000 | haiku, haiku-4-5, haiku-4.5 | ±10% accuracy |
**Claude Tokenization Modes:**

**Offline Estimation (Default)** - no API key needed:

```sh
# Fast offline estimation using adaptive content-type detection
echo "Hello, world!" | token-count --model claude
```

**Exact API Mode (Optional)** - requires `ANTHROPIC_API_KEY`:

```sh
# Exact count via Anthropic API (requires consent)
echo "Hello, world!" | token-count --model claude --api
# Prompts: "This will send your input to Anthropic's API... Proceed? (y/N)"
# Output: 8

# Skip prompt for automation
echo "Hello, world!" | token-count --model claude --api -y
```
**How Claude Estimation Works:**

- Detects content type (code vs. prose) using punctuation and keyword analysis
- Code: 3.0 chars/token (lots of `{}[]();` and keywords)
- Prose: 4.5 chars/token (natural language)
- Mixed: 3.75 chars/token (markdown + code blocks)
- Target: ±10% accuracy for typical inputs
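The heuristic above can be sketched in a few lines. This is an illustrative Rust version: the detection thresholds are made up for the example, and the shipped `estimation.rs` also weighs language keywords, which this sketch ignores.

```rust
/// Illustrative token estimation: pick a chars-per-token ratio based on
/// how much code-like punctuation the input contains.
fn estimate_tokens(input: &str) -> usize {
    let total = input.chars().count();
    if total == 0 {
        return 0;
    }
    let code_chars = input
        .chars()
        .filter(|&c| matches!(c, '{' | '}' | '[' | ']' | '(' | ')' | ';'))
        .count();
    let code_ratio = code_chars as f64 / total as f64;
    // Thresholds here are illustrative, not the shipped ones.
    let chars_per_token = if code_ratio > 0.05 {
        3.0 // code-heavy input
    } else if code_ratio > 0.01 {
        3.75 // mixed (markdown + code blocks)
    } else {
        4.5 // prose
    };
    (total as f64 / chars_per_token).ceil() as usize
}
```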
All models support:

- Case-insensitive names (e.g., `GPT-4`, `gpt-4`, `Gpt-4`)
- Provider prefix (e.g., `openai/gpt-4`, `anthropic/claude-sonnet-4-6`)
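Resolving these name forms amounts to lowercasing, stripping an optional `provider/` prefix, and an alias-table lookup. A sketch, where the alias table is a stand-in for the real registry:

```rust
use std::collections::HashMap;

/// Resolve a user-supplied model name: case-insensitive, with an
/// optional "provider/" prefix and alias lookup.
fn resolve_model(input: &str, aliases: &HashMap<&str, &str>) -> Option<String> {
    let lower = input.to_lowercase();
    // Accept an optional provider prefix like "openai/" or "anthropic/".
    let name = lower.split('/').last().unwrap_or(lower.as_str());
    aliases.get(name).map(|canonical| canonical.to_string())
}
```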
## Error Handling

token-count provides helpful error messages with suggestions:

```sh
# Unknown model with fuzzy suggestions
echo "test" | token-count --model gpt-5-turbo

# Typo correction
echo "test" | token-count --model gtp-4

# Invalid UTF-8
printf '\xff\xfe' | token-count
```
### Exit Codes

- `0` - Success
- `1` - I/O error or invalid UTF-8
- `2` - Unknown model name
## Performance

### Benchmarks

Measured on Ubuntu 22.04 with Rust 1.85.0:
| Input Size | Time | Target | Result |
|---|---|---|---|
| 100 bytes | 2.7µs | <10ms | 3,700x faster ⚡ |
| 1 KB | 54µs | <100ms | 1,850x faster ⚡ |
| 10 KB | 534µs | N/A | Excellent |
### Memory Usage
- 12MB file: 57 MB resident memory (8.8x under 500MB limit)
- Processing time: 0.76 seconds for 12MB
- No memory leaks: Validated with valgrind
### Binary Size
- Release binary: 9.2 MB (5.4x under 50MB target)
- Includes: All 4 OpenAI tokenizers embedded
- Optimizations: Stripped, LTO enabled
## Development

### Building from Source

```sh
# Clone repository
git clone <repository-url>
cd token-count

# Run tests
cargo test

# Run benchmarks
cargo bench

# Build release binary
cargo build --release

# Check code quality
cargo clippy --all-targets

# Security audit
cargo audit
```

### Running Tests

```sh
# All tests
cargo test

# Specific test suite
cargo test --test model_aliases

# With output
cargo test -- --nocapture
```
### Project Structure

```
token-count/
├── src/
│   ├── lib.rs              # Public library API
│   ├── main.rs             # Binary entry point
│   ├── cli/                # CLI argument parsing
│   │   ├── args.rs         # Clap definitions
│   │   ├── input.rs        # Stdin reading
│   │   └── mod.rs
│   ├── tokenizers/         # Tokenization engine
│   │   ├── openai.rs       # OpenAI tokenizer (tiktoken)
│   │   ├── claude/         # Claude tokenizer
│   │   │   ├── mod.rs      # Main tokenizer
│   │   │   ├── estimation.rs  # Adaptive estimation
│   │   │   ├── api_client.rs  # Anthropic API
│   │   │   └── models.rs   # Model definitions
│   │   ├── registry.rs     # Model registry
│   │   └── mod.rs
│   ├── api/                # API utilities
│   │   ├── consent.rs      # Interactive consent prompt
│   │   └── mod.rs
│   ├── output/             # Output formatters
│   │   ├── simple.rs       # Simple formatter
│   │   ├── verbose.rs      # Verbose formatter
│   │   ├── debug.rs        # Debug formatter
│   │   └── mod.rs
│   └── error.rs            # Error types
├── tests/                  # Integration tests
│   ├── fixtures/           # Test data
│   ├── model_aliases.rs
│   ├── verbosity.rs
│   ├── performance.rs
│   ├── error_handling.rs
│   ├── end_to_end.rs
│   ├── claude_estimation.rs  # Claude estimation tests
│   ├── claude_api.rs       # Claude API tests
│   └── ...
├── benches/                # Performance benchmarks
│   └── tokenization.rs
└── .github/
    └── workflows/
        └── ci.yml          # CI configuration
```
## Security

### Resource Limits

- **Maximum input size:** 100MB per invocation
- **Memory usage:** Typically <100MB, peaks at ~2x input size
- **CPU usage:** Single-threaded; 100% of one core during processing
### Known Limitations

**Stack overflow with highly repetitive inputs:** The underlying tiktoken-rs library can experience stack overflow when processing highly repetitive single-character inputs (e.g., 1MB+ of the same character), due to regex backtracking in the tokenization engine. Real-world text with varied content works fine at large sizes.

- **Workaround:** Break extremely large repetitive inputs into smaller chunks
- **Impact:** Minimal; real documents rarely exhibit this pathological pattern
- **Status:** Tracked upstream in tiktoken-rs
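If you need the chunking workaround programmatically, splitting on character boundaries keeps each piece valid UTF-8. A sketch (the chunk size is illustrative):

```rust
/// Split text into pieces of at most `max_chars` characters; splitting on
/// char boundaries keeps every piece valid UTF-8.
fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    text.chars()
        .collect::<Vec<_>>()
        .chunks(max_chars)
        .map(|chunk| chunk.iter().collect())
        .collect()
}
```

Count each piece separately and sum the results; note the total may differ slightly from a single-pass count, since tokens can straddle chunk boundaries.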
### Best Practices

**For CI/CD pipelines:**

```sh
# Limit concurrent processes to avoid resource exhaustion
find docs -name '*.md' -print0 | xargs -0 -P 4 -I{} sh -c 'token-count < "{}"'
```

**For untrusted input:**

```sh
# Use timeout to prevent hangs
timeout 10s token-count < untrusted.txt
```

**For large files:**

```sh
# Monitor memory usage
/usr/bin/time -v token-count < large-file.txt
```
### Security Audit

- **Last audit:** 2026-03-13
- **Findings:** 0 critical, 0 high, 0 medium vulnerabilities
- **Dependencies:** 5 direct, all audited with `cargo audit`
- **Binary:** Stripped, no debug symbols, 9.2MB

Run security checks:

```sh
cargo audit
```
### Reporting Security Issues
If you discover a security vulnerability, please email hello@burdick.dev (or open a private security advisory on GitHub). Do not open public issues for security concerns.
## Architecture

### Design Principles

From our Constitution:

- **POSIX Simplicity** - Behaves like standard Unix utilities
- **Accuracy Over Speed** - Exact tokenization for supported models
- **Zero Runtime Dependencies** - Single offline binary
- **Fail Fast with Clear Errors** - No silent failures
- **Semantic Versioning** - Predictable upgrade paths
### Technical Stack

- **Language:** Rust 1.85.0+ (stable)
- **CLI parsing:** clap 4.6.0+ (derive API)
- **Tokenization:**
  - tiktoken-rs 0.9.1+ (OpenAI models, offline)
  - Adaptive estimation algorithm (Claude models, offline)
  - Anthropic API via reqwest 0.12+ (Claude accurate mode, optional)
- **Async runtime:** tokio 1.0+ (for API calls)
- **Error handling:** anyhow 1.0.102+, thiserror 1.0+
- **Fuzzy matching:** strsim 0.11+ (Levenshtein distance)
- **Testing:** 152 tests with criterion benchmarks
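The fuzzy model suggestions reduce to ranking known names by edit distance. A self-contained sketch with a hand-rolled Levenshtein (the shipped code uses strsim, and the distance cutoff here is a guess):

```rust
/// Classic dynamic-programming Levenshtein edit distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest known model name, if any is within the cutoff.
fn suggest(input: &str, known: &[&str]) -> Option<String> {
    known
        .iter()
        .map(|m| (levenshtein(&input.to_lowercase(), m), *m))
        .filter(|&(d, _)| d <= 3)
        .min_by_key(|&(d, _)| d)
        .map(|(_, m)| m.to_string())
}
```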
### Key Features

- **Library-first design:** Core logic in `lib.rs`, with a thin binary wrapper
- **Trait-based abstractions:** Extensible for future tokenizers
- **Strategy pattern:** Multiple output formatters
- **Registry pattern:** Model configuration with lazy initialization
- **Streaming support:** 64KB chunks for large inputs
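Streaming in fixed 64KB chunks looks roughly like this; a sketch, not the actual `input.rs`:

```rust
use std::io::{self, Read};

/// Read a stream in 64KB chunks, handing each chunk to a callback
/// instead of buffering the whole input in memory.
fn process_chunks<R: Read>(mut reader: R, mut on_chunk: impl FnMut(&[u8])) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        total += n as u64;
        on_chunk(&buf[..n]);
    }
    Ok(total)
}
```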
## Roadmap

### v0.1.0 (Released) ✅

- OpenAI model support (4 models)
- CLI with model selection and verbosity
- Fuzzy model suggestions
- UTF-8 validation with error reporting
- Comprehensive test suite (100 tests)
- Performance benchmarks
- Cross-platform support (Linux, macOS, Windows)
- Multiple installation methods (install.sh, Homebrew, cargo, manual)
- GitHub release binaries with checksums
- Automated release pipeline

### v0.2.0 (Current Release) ✅

- Anthropic Claude support (3 models)
- Adaptive token estimation algorithm (code/prose detection)
- Optional accurate mode via Anthropic API
- Interactive consent prompt for API calls
- Non-interactive mode support (`-y` flag)
### v0.3.0 (Future - More Models)

- Google Gemini support
- Meta Llama support
- Mistral support

### v0.4.0 (Future - Stable API)

- Stable library API for embedding
- Token ID output (debug mode)
- Batch processing mode
- Configuration file support
## Contributing

Contributions are welcome! This project follows specification-driven development.

### Development Setup

See CONTRIBUTING.md for detailed instructions.

Quick start:

```sh
git clone <repository-url>
cd token-count
cargo build && cargo test
```
### Code Quality Standards

- **No disabled lint rules** - Fix code to comply, don't silence warnings
- **100% type safety** - No `any` types or suppressions
- **All public APIs documented** - With examples
- **Test coverage** - All user stories covered
- **Zero clippy warnings** - Strict linting enforced
## License

MIT License - see LICENSE for details.

## Acknowledgments

Built with:

- tiktoken-rs - Rust tiktoken implementation
- clap - Command-line argument parser
- spec-kit - Specification-driven development

Special thanks to:

- OpenAI for open-sourcing tiktoken
- The Rust community for excellent tooling

---

**Status:** ✅ v0.2.0 Complete (Claude Support) | **Version:** 0.2.0

**Author:** Shaun Burdick