TokenMonster: greedy tiktoken-like tokenizer (cl100k_base approximator)
- Greedy longest-match over an embedded vocabulary (base64-encoded tokens → ids).
- Falls back to raw byte ids (0..=255) when no token matches.
- Fast counting suitable for chunking and cost estimates (not exact tiktoken fidelity).
Design
- Lazy vocabulary load with once_cell.
- Hash maps (ahash) for encoder/decoder.
- Small inline vocab under the `tiny_vocab` feature for tests/examples.
Re-exports
pub use greedy::GreedyTokenizer;