Crate tokenmonster

TokenMonster: greedy tiktoken-like tokenizer (cl100k_base approximator)

  • Greedy longest-match over an embedded vocabulary (base64-encoded tokens → ids).
  • Falls back to raw bytes (0..=255) when no vocabulary entry matches.
  • Fast token counting suitable for chunking and cost estimates (approximate; not exact tiktoken fidelity).
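
The matching strategy described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `toy_vocab`, `greedy_encode`, the four toy tokens, and the choice to use the byte value itself as the fallback id are all assumptions made for the example, and a standard-library HashMap stands in for the embedded vocabulary.

```rust
use std::collections::HashMap;

/// Hypothetical toy vocabulary (token bytes -> id), for illustration only.
/// Ids start at 256 so they do not collide with the raw-byte fallback ids.
fn toy_vocab() -> HashMap<Vec<u8>, u32> {
    let mut vocab = HashMap::new();
    for (i, tok) in ["he", "hello", " wor", "ld"].iter().enumerate() {
        vocab.insert(tok.as_bytes().to_vec(), 256 + i as u32);
    }
    vocab
}

/// Greedy longest-match encoding: at each position, try the longest
/// candidate slice first and shrink until a vocabulary entry matches.
/// If nothing matches, emit the current byte's value (0..=255) as the id.
fn greedy_encode(input: &[u8], vocab: &HashMap<Vec<u8>, u32>) -> Vec<u32> {
    // Longest token in the vocabulary bounds how far ahead we need to look.
    let max_len = vocab.keys().map(|k| k.len()).max().unwrap_or(0);
    let mut ids = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let longest = max_len.min(input.len() - pos);
        let mut matched = None;
        for len in (1..=longest).rev() {
            if let Some(&id) = vocab.get(&input[pos..pos + len]) {
                matched = Some((id, len));
                break;
            }
        }
        match matched {
            Some((id, len)) => {
                ids.push(id);
                pos += len;
            }
            None => {
                // Raw-byte fallback (assumed id scheme: id == byte value).
                ids.push(input[pos] as u32);
                pos += 1;
            }
        }
    }
    ids
}
```

With this toy vocabulary, "hello world" is consumed as "hello", " wor", "ld", while the "y" in "hey" has no entry and falls back to its byte value. A token count for chunking or cost estimation is then just the length of the returned vector.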

Design

  • The vocabulary is loaded lazily, on first use, via once_cell.
  • The encoder and decoder are hash maps backed by ahash.
  • A small inline vocabulary is available under the tiny_vocab feature for tests and examples.
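
The lazy-load pattern from the design notes can be sketched as below. To keep the example dependency-free it substitutes std::sync::OnceLock for once_cell and std's HashMap for ahash; the `Vocab` struct, the `vocab()` accessor, and the three inline entries are hypothetical names and data, not the crate's actual items.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical container for both lookup directions.
struct Vocab {
    encoder: HashMap<Vec<u8>, u32>, // token bytes -> id
    decoder: HashMap<u32, Vec<u8>>, // id -> token bytes
}

// Stand-in for a once_cell cell; empty until first access.
static VOCAB: OnceLock<Vocab> = OnceLock::new();

/// Builds the maps on the first call and returns the same
/// shared reference on every later call.
fn vocab() -> &'static Vocab {
    VOCAB.get_or_init(|| {
        // Tiny inline entries for illustration; a real build would
        // decode the embedded base64 vocabulary here instead.
        let entries = [("he", 256u32), ("llo", 257), (" world", 258)];
        let mut encoder = HashMap::new();
        let mut decoder = HashMap::new();
        for &(tok, id) in entries.iter() {
            encoder.insert(tok.as_bytes().to_vec(), id);
            decoder.insert(id, tok.as_bytes().to_vec());
        }
        Vocab { encoder, decoder }
    })
}
```

Callers that never tokenize pay nothing for the vocabulary, and callers that do pay the construction cost once; keeping the two maps in one struct means they are initialized together and cannot drift apart.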

Re-exports

pub use greedy::GreedyTokenizer;

Modules

greedy

Structs

TokenMonster