VIL Tokenizer Engine
Native Rust BPE (Byte-Pair Encoding) tokenizer for:
- Token counting (how many tokens in this text?)
- Text truncation (cut to N tokens without breaking words)
- Encoding/decoding (text <-> token IDs)
Compatible with OpenAI tiktoken and Llama SentencePiece vocabularies.
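The three operations listed above can be illustrated with a toy byte-level BPE. This is a minimal sketch under stated assumptions, not this crate's implementation: `ToyBpe`, its fields, and its methods are hypothetical names invented for the example, and the merge rules are hand-written rather than loaded from a tiktoken or SentencePiece vocabulary. Token IDs 0 to 255 stand for raw bytes, and each merge rule adds one new ID starting at 256.

```rust
use std::collections::HashMap;

/// Hypothetical toy tokenizer for illustration only (not the crate's `BpeTokenizer`).
struct ToyBpe {
    /// (left id, right id) -> merged id. A lower merged id means an earlier, higher-priority merge.
    merges: HashMap<(u32, u32), u32>,
    /// Token id -> the bytes that token stands for.
    pieces: Vec<Vec<u8>>,
}

impl ToyBpe {
    /// Build the vocabulary from ordered merge rules such as ("l", "o").
    fn new(merge_rules: &[(&str, &str)]) -> Self {
        // Ids 0..=255 are the single bytes.
        let mut pieces: Vec<Vec<u8>> = (0..=255u8).map(|b| vec![b]).collect();
        let mut merges = HashMap::new();
        for (left, right) in merge_rules {
            let find = |pieces: &Vec<Vec<u8>>, s: &str| -> u32 {
                pieces
                    .iter()
                    .position(|p| p.as_slice() == s.as_bytes())
                    .expect("each side of a merge must already be a known piece") as u32
            };
            let l = find(&pieces, left);
            let r = find(&pieces, right);
            let mut merged = pieces[l as usize].clone();
            merged.extend_from_slice(&pieces[r as usize]);
            let id = pieces.len() as u32;
            merges.insert((l, r), id);
            pieces.push(merged);
        }
        Self { merges, pieces }
    }

    /// Text -> token ids: start from bytes, then repeatedly apply the
    /// highest-priority merge (leftmost on ties) until none applies.
    fn encode(&self, text: &str) -> Vec<u32> {
        let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
        loop {
            let best = ids
                .windows(2)
                .enumerate()
                .filter_map(|(i, w)| self.merges.get(&(w[0], w[1])).map(|&id| (id, i)))
                .min();
            match best {
                Some((id, i)) => {
                    ids[i] = id;
                    ids.remove(i + 1);
                }
                None => return ids,
            }
        }
    }

    /// Token ids -> text. Invalid UTF-8 (possible after truncation) is replaced lossily.
    fn decode(&self, ids: &[u32]) -> String {
        let bytes: Vec<u8> = ids
            .iter()
            .flat_map(|&id| self.pieces[id as usize].iter().copied())
            .collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }

    /// Token counting is just the length of the encoding.
    fn count(&self, text: &str) -> usize {
        self.encode(text).len()
    }

    /// Naive truncation: keep the first `max_tokens` tokens and decode them.
    /// Unlike the word-preserving truncation described above, this may cut mid-word.
    fn truncate(&self, text: &str, max_tokens: usize) -> String {
        let ids = self.encode(text);
        let keep = max_tokens.min(ids.len());
        self.decode(&ids[..keep])
    }
}
```

With the two rules `("l", "o")` and `("lo", "w")`, the string `"low lower"` encodes to five tokens, decodes back to the original text, and truncates to `"low "` at a budget of two tokens. A word-aware strategy would additionally back off to the last whitespace boundary before decoding.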
Re-exports
pub use bpe::BpeTokenizer;
pub use vocab::Vocabulary;
pub use vocab::VocabSource;
pub use counter::TokenCounter;
pub use truncate::truncate_to_tokens;
pub use truncate::TruncateStrategy;
pub use plugin::TokenizerPlugin;
pub use semantic::TokenizeEvent;
pub use semantic::TokenizeFault;
pub use semantic::TokenizerState;