# token-count

A fast, accurate CLI tool for counting tokens in LLM model inputs.

## Overview

`token-count` is a POSIX-style command-line tool that counts tokens for various LLM models using exact tokenization. Pipe any text in, get accurate token counts out: no browser, no API calls, just a fast offline binary.
```bash
# Quick token count
echo "Hello, world!" | token-count

# From file
token-count < document.txt

# With context info
echo "Hello, world!" | token-count -v
```
## Features

- ✅ **Accurate** - Exact tokenization using OpenAI's tiktoken library
- ✅ **Fast** - ~2.7µs for small inputs (3,700x faster than 10ms target)
- ✅ **Efficient** - 57MB memory for 12MB files (8.8x under 500MB limit)
- ✅ **Compact** - 9.2MB binary with all tokenizers embedded
- ✅ **Offline** - Zero runtime dependencies, all tokenizers built-in
- ✅ **Simple** - POSIX-style interface, works like `wc` or `grep`
## Installation

### Quick Install (Recommended)

**Linux / macOS:** use the provided `install.sh` script.

**Homebrew (macOS / Linux):** install via the project's Homebrew formula.

**Cargo (All Platforms):**

```bash
# crate name assumed to match the binary name
cargo install token-count
```

**Manual Download:**

Download pre-built binaries from GitHub Releases.

For detailed installation instructions, troubleshooting, and platform-specific guidance, see [INSTALL.md](INSTALL.md).
### System Requirements

- **Platform:** Linux x86_64, macOS (Intel/Apple Silicon), Windows x86_64
- **Runtime:** No dependencies (static binary)
- **Build from source:** Rust 1.85.0 or later
## Usage

### Basic Usage

```bash
# Default model (gpt-3.5-turbo)
echo "Hello, world!" | token-count

# Specific model (--model flag name reconstructed; see token-count --help)
echo "Hello, world!" | token-count --model gpt-4

# From file
token-count < document.txt

# Piped from another command
git diff | token-count
```
### Model Selection

```bash
# Use canonical name
echo "Hello" | token-count --model gpt-4

# Use alias (case-insensitive)
echo "Hello" | token-count --model GPT4

# With provider prefix
echo "Hello" | token-count --model openai/gpt-4
```
### Verbosity Levels

```bash
# Simple output (default) - just the number
echo "Hello, world!" | token-count

# Verbose (-v) - model info and context usage
echo "Hello, world!" | token-count -v

# Debug (-vvv) - for troubleshooting
echo "Hello, world!" | token-count -vvv
```
### Model Information

```bash
# List all supported models (flag name reconstructed)
token-count --list-models
# Output:
# Supported models:
#
#   gpt-3.5-turbo
#     Encoding: cl100k_base
#     Context window: 16385 tokens
#     Aliases: gpt-3.5, gpt35, gpt-35-turbo, openai/gpt-3.5-turbo
#
#   gpt-4
#     Encoding: cl100k_base
#     Context window: 128000 tokens
#     Aliases: gpt4, openai/gpt-4
#   ...
```
### Help and Version

```bash
# Show help
token-count --help

# Show version
token-count --version
```
## Supported Models

### OpenAI Models (Exact Tokenization)
| Model | Encoding | Context Window | Aliases |
|---|---|---|---|
| gpt-3.5-turbo | cl100k_base | 16,385 | gpt-3.5, gpt35, gpt-35-turbo |
| gpt-4 | cl100k_base | 128,000 | gpt4 |
| gpt-4-turbo | cl100k_base | 128,000 | gpt4-turbo, gpt-4turbo |
| gpt-4o | o200k_base | 128,000 | gpt4o |
All models support:

- Case-insensitive names (e.g., `GPT-4`, `gpt-4`, `Gpt-4`)
- Provider prefix (e.g., `openai/gpt-4`)
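Both aliasing rules reduce to lowercasing the name and dropping an optional provider prefix before the registry lookup. A minimal sketch in Rust (the `normalize` helper here is hypothetical; the actual logic lives in the project's registry code):

```rust
/// Hypothetical sketch: lowercase a user-supplied model name and strip
/// an optional "provider/" prefix such as "openai/".
fn normalize(name: &str) -> String {
    let lower = name.to_lowercase();
    if let Some((_provider, model)) = lower.split_once('/') {
        return model.to_string();
    }
    lower
}

fn main() {
    // All three spellings resolve to the same canonical key.
    println!("{}", normalize("GPT-4"));        // gpt-4
    println!("{}", normalize("openai/Gpt-4")); // gpt-4
    println!("{}", normalize("gpt-3.5-turbo")); // gpt-3.5-turbo
}
```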
## Error Handling

`token-count` provides helpful error messages with suggestions:

```bash
# Unknown model with fuzzy suggestions
token-count --model gpt5 < input.txt

# Typo correction
token-count --model gpt4-trubo < input.txt

# Invalid UTF-8
printf '\xff\xfe' | token-count
```
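The "did you mean" suggestions are based on Levenshtein edit distance (the project lists strsim for this). A self-contained sketch of the idea, with a hand-rolled distance function standing in for the strsim call and an assumed cutoff of 3 edits:

```rust
/// Classic dynamic-programming Levenshtein distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // min of substitution, deletion, insertion
            let val = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
            curr.push(val);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Suggest the closest known model within 3 edits (cutoff assumed).
fn suggest<'a>(input: &str, models: &[&'a str]) -> Option<&'a str> {
    models
        .iter()
        .map(|m| (levenshtein(&input.to_lowercase(), m), *m))
        .filter(|(d, _)| *d <= 3)
        .min_by_key(|(d, _)| *d)
        .map(|(_, m)| m)
}

fn main() {
    let models = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"];
    println!("{:?}", suggest("gpt4", &models)); // Some("gpt-4")
}
```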
## Exit Codes

- `0` - Success
- `1` - I/O error or invalid UTF-8
- `2` - Unknown model name
## Performance

### Benchmarks

Measured on Ubuntu 22.04 with Rust 1.85.0:
| Input Size | Time | Target | Result |
|---|---|---|---|
| 100 bytes | 2.7µs | <10ms | 3,700x faster ⚡ |
| 1 KB | 54µs | <100ms | 1,850x faster ⚡ |
| 10 KB | 534µs | N/A | Excellent |
### Memory Usage

- **12MB file:** 57 MB resident memory (8.8x under 500MB limit)
- **Processing time:** 0.76 seconds for 12MB
- **No memory leaks:** Validated with valgrind
### Binary Size

- **Release binary:** 9.2 MB (5.4x under 50MB target)
- **Includes:** All 4 OpenAI tokenizers embedded
- **Optimizations:** Stripped, LTO enabled
## Development

### Building from Source

```bash
# Clone repository
git clone <repository-url>
cd token-count

# Run tests
cargo test

# Run benchmarks
cargo bench

# Build release binary
cargo build --release

# Check code quality
cargo clippy
cargo fmt --check

# Security audit
cargo audit
```
### Running Tests

```bash
# All tests (100 tests)
cargo test

# Specific test suite
cargo test --test model_aliases

# With output
cargo test -- --nocapture
```
### Project Structure

```
token-count/
├── src/
│   ├── lib.rs            # Public library API
│   ├── main.rs           # Binary entry point
│   ├── cli/              # CLI argument parsing
│   │   ├── args.rs       # Clap definitions
│   │   ├── input.rs      # Stdin reading
│   │   └── mod.rs
│   ├── tokenizers/       # Tokenization engine
│   │   ├── openai.rs     # OpenAI tokenizer
│   │   ├── registry.rs   # Model registry
│   │   └── mod.rs
│   ├── output/           # Output formatters
│   │   ├── simple.rs     # Simple formatter
│   │   ├── verbose.rs    # Verbose formatter
│   │   ├── debug.rs      # Debug formatter
│   │   └── mod.rs
│   └── error.rs          # Error types
├── tests/                # Integration tests
│   ├── fixtures/         # Test data
│   ├── model_aliases.rs
│   ├── verbosity.rs
│   ├── performance.rs
│   ├── error_handling.rs
│   ├── end_to_end.rs
│   └── ...
├── benches/              # Performance benchmarks
│   └── tokenization.rs
└── .github/
    └── workflows/
        └── ci.yml        # CI configuration
```
## Security

### Resource Limits

- **Maximum input size:** 100MB per invocation
- **Memory usage:** Typically <100MB, peaks at ~2x input size
- **CPU usage:** Single-threaded, 100% of one core during processing
### Known Limitations

**Stack Overflow with Highly Repetitive Inputs:** The underlying tiktoken-rs library can experience stack overflow when processing highly repetitive single-character inputs (e.g., 1MB+ of the same character). This is due to regex backtracking in the tokenization engine. Real-world text with varied content works fine at large sizes.

- **Workaround:** Break extremely large repetitive inputs into smaller chunks
- **Impact:** Minimal - real documents rarely exhibit this pathological pattern
- **Status:** Tracked upstream in tiktoken-rs
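The chunking workaround can be sketched as follows. Splitting only on UTF-8 character boundaries keeps every piece valid input on its own; note that per-chunk counts can differ slightly from the whole-input count when a token would have spanned a boundary.

```rust
/// Split `text` into pieces of at most `max` bytes, cutting only on
/// UTF-8 character boundaries so each piece is valid input by itself.
fn chunks(text: &str, max: usize) -> Vec<&str> {
    assert!(max >= 4, "max must cover the largest UTF-8 char");
    let mut out = Vec::new();
    let mut rest = text;
    while rest.len() > max {
        // Back the cut point off to the nearest char boundary.
        let mut cut = max;
        while !rest.is_char_boundary(cut) {
            cut -= 1;
        }
        out.push(&rest[..cut]);
        rest = &rest[cut..];
    }
    if !rest.is_empty() {
        out.push(rest);
    }
    out
}

fn main() {
    // A pathological 1MB single-character input, split into 64KB pieces
    // that can each be piped to token-count separately.
    let repetitive = "a".repeat(1024 * 1024);
    let pieces = chunks(&repetitive, 64 * 1024);
    println!("{} chunks", pieces.len()); // 16 chunks
}
```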
### Best Practices

**For CI/CD Pipelines:**

```bash
# Limit concurrent processes to avoid resource exhaustion
find docs -name '*.txt' -print0 | xargs -0 -P 4 -I {} sh -c 'token-count < "{}"'
```

**For Untrusted Input:**

```bash
# Use timeout to prevent hangs
timeout 30 token-count < untrusted.txt
```

**For Large Files:**

```bash
# Monitor memory usage
/usr/bin/time -v token-count < large-file.txt
```
### Security Audit

- **Last audit:** 2026-03-13
- **Findings:** 0 critical, 0 high, 0 medium vulnerabilities
- **Dependencies:** 5 direct, all audited with `cargo audit`
- **Binary:** Stripped, no debug symbols, 9.2MB

Run security checks:

```bash
cargo audit
```
### Reporting Security Issues
If you discover a security vulnerability, please email hello@burdick.dev (or open a private security advisory on GitHub). Do not open public issues for security concerns.
## Architecture

### Design Principles

From our Constitution:

- **POSIX Simplicity** - Behaves like standard Unix utilities
- **Accuracy Over Speed** - Exact tokenization for supported models
- **Zero Runtime Dependencies** - Single offline binary
- **Fail Fast with Clear Errors** - No silent failures
- **Semantic Versioning** - Predictable upgrade paths
### Technical Stack

- **Language:** Rust 1.85.0+ (stable)
- **CLI Parsing:** clap 4.6.0+ (derive API)
- **Tokenization:** tiktoken-rs 0.9.1+ (OpenAI models)
- **Error Handling:** anyhow 1.0.102+, thiserror 1.0+
- **Fuzzy Matching:** strsim 0.11+ (Levenshtein distance)
- **Testing:** 100 tests with criterion benchmarks
### Key Features

- **Library-first design** - Core logic in `lib.rs`, thin binary wrapper
- **Trait-based abstractions** - Extensible for future tokenizers
- **Strategy pattern** - Multiple output formatters
- **Registry pattern** - Model configuration with lazy initialization
- **Streaming support** - 64KB chunks for large inputs
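The registry-with-lazy-initialization pattern can be illustrated with std's `OnceLock`: the alias table is built once, on first lookup, and shared for the process lifetime. The entries below mirror the README's model list; the project's actual registry in `src/tokenizers/registry.rs` may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Sketch of a lazily initialized model registry mapping model names
/// and aliases to an encoding name. Built on first access only.
fn registry() -> &'static HashMap<&'static str, &'static str> {
    static REGISTRY: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        let mut m = HashMap::new();
        for name in ["gpt-3.5-turbo", "gpt-3.5", "gpt35", "gpt-35-turbo",
                     "gpt-4", "gpt4", "gpt-4-turbo", "gpt4-turbo", "gpt-4turbo"] {
            m.insert(name, "cl100k_base"); // alias -> encoding
        }
        for name in ["gpt-4o", "gpt4o"] {
            m.insert(name, "o200k_base");
        }
        m
    })
}

fn main() {
    println!("{:?}", registry().get("gpt4o")); // Some("o200k_base")
}
```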
## Roadmap

### v0.1.0 (Current Release) ✅
- OpenAI model support (4 models)
- CLI with model selection and verbosity
- Fuzzy model suggestions
- UTF-8 validation with error reporting
- Comprehensive test suite (100 tests)
- Performance benchmarks
- Cross-platform support (Linux, macOS, Windows)
- Multiple installation methods (install.sh, Homebrew, cargo, manual)
- GitHub release binaries with checksums
- Automated release pipeline
### v0.2.0 (Future - More Models)
- Anthropic Claude support
- Google Gemini support
- Meta Llama support
- Mistral support
### v0.3.0 (Future - Stable API)
- Stable library API for embedding
- Token ID output (debug mode)
- Batch processing mode
- Configuration file support
## Contributing

Contributions are welcome! This project follows specification-driven development.

### Development Setup

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed instructions.

Quick start:

```bash
# commands reconstructed; see CONTRIBUTING.md for the authoritative steps
git clone <repository-url>
cd token-count
cargo build
cargo test
```
### Code Quality Standards

- **No disabled lint rules** - Fix code to comply, don't silence warnings
- **100% type safety** - No `any` types or suppressions
- **All public APIs documented** - With examples
- **Test coverage** - All user stories covered
- **Zero clippy warnings** - Strict linting enforced
## License

MIT License - see [LICENSE](LICENSE) for details.

## Acknowledgments
Built with:
- tiktoken-rs - Rust tiktoken implementation
- clap - Command line argument parser
- spec-kit - Specification-driven development
Special thanks to:
- OpenAI for open-sourcing tiktoken
- The Rust community for excellent tooling
**Status:** ✅ MVP Complete (Linux) | **Version:** 0.1.0

**Author:** Shaun Burdick