Crate vil_tokenizer

VIL Tokenizer Engine

Native Rust BPE (Byte-Pair Encoding) tokenizer for:

  • Token counting (how many tokens in this text?)
  • Text truncation (cut to N tokens without breaking words)
  • Encoding/decoding (text <-> token IDs)

Compatible with OpenAI tiktoken and Llama SentencePiece vocabularies.
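
To make the three operations above concrete, here is a minimal, self-contained sketch of byte-level BPE. It does not use or mirror the `vil_tokenizer` API: `ToyBpe`, `truncate_words`, and the two-entry merge table are hypothetical names and data invented for illustration, and the real crate presumably loads merge ranks from a tiktoken or SentencePiece vocabulary instead.

```rust
use std::collections::HashMap;

/// Toy byte-level BPE tokenizer (hypothetical; NOT the vil_tokenizer API).
pub struct ToyBpe {
    /// (left id, right id) -> (rank, merged id); a lower rank merges first.
    merges: HashMap<(u32, u32), (usize, u32)>,
    /// Byte sequence of every token id; ids 0..=255 are the raw bytes.
    pieces: Vec<Vec<u8>>,
}

impl ToyBpe {
    /// Build from an ordered merge list. Each pair must reference ids that
    /// already exist (raw bytes or earlier merges), otherwise this panics.
    pub fn new(merge_list: &[(u32, u32)]) -> Self {
        let mut pieces: Vec<Vec<u8>> = (0..=255u8).map(|b| vec![b]).collect();
        let mut merges = HashMap::new();
        for (rank, &(a, b)) in merge_list.iter().enumerate() {
            let id = pieces.len() as u32;
            let mut bytes = pieces[a as usize].clone();
            bytes.extend_from_slice(&pieces[b as usize]);
            pieces.push(bytes);
            merges.insert((a, b), (rank, id));
        }
        ToyBpe { merges, pieces }
    }

    /// Encode text: start from raw bytes, then repeatedly merge the adjacent
    /// pair with the lowest rank (leftmost on ties) until no rule applies.
    pub fn encode(&self, text: &str) -> Vec<u32> {
        let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
        loop {
            let best = ids
                .windows(2)
                .enumerate()
                .filter_map(|(i, w)| {
                    self.merges.get(&(w[0], w[1])).map(|&(rank, id)| (rank, i, id))
                })
                .min();
            match best {
                Some((_, i, id)) => {
                    ids[i] = id;
                    ids.remove(i + 1);
                }
                None => return ids,
            }
        }
    }

    /// Decode ids back to text by concatenating each token's bytes.
    /// Panics on an id that is not in the vocabulary.
    pub fn decode(&self, ids: &[u32]) -> String {
        let mut bytes = Vec::new();
        for &id in ids {
            bytes.extend_from_slice(&self.pieces[id as usize]);
        }
        String::from_utf8_lossy(&bytes).into_owned()
    }

    /// Token counting is the length of the encoding.
    pub fn count(&self, text: &str) -> usize {
        self.encode(text).len()
    }

    /// Return the longest prefix that ends on a word boundary and fits in
    /// `max_tokens`. Re-encodes each candidate prefix, so it is quadratic;
    /// acceptable for a sketch, not for production.
    pub fn truncate_words<'a>(&self, text: &'a str, max_tokens: usize) -> &'a str {
        let mut end = 0;
        let mut in_word = false;
        // A trailing sentinel space closes the final word.
        let chars = text.char_indices().chain(std::iter::once((text.len(), ' ')));
        for (i, c) in chars {
            if c.is_whitespace() {
                if in_word {
                    if self.count(&text[..i]) > max_tokens {
                        break;
                    }
                    end = i;
                }
                in_word = false;
            } else {
                in_word = true;
            }
        }
        &text[..end]
    }
}
```

With the hypothetical merge list `[(108, 111), (256, 119)]` (that is, `l` + `o`, then `lo` + `w`), `encode("low low")` yields `[257, 32, 257]`, so the text counts as 3 tokens, and `truncate_words("low low", 2)` returns `"low"` instead of cutting the second word in half.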

Re-exports

pub use bpe::BpeTokenizer;
pub use vocab::Vocabulary;
pub use vocab::VocabSource;
pub use counter::TokenCounter;
pub use truncate::truncate_to_tokens;
pub use truncate::TruncateStrategy;
pub use plugin::TokenizerPlugin;
pub use semantic::TokenizeEvent;
pub use semantic::TokenizeFault;
pub use semantic::TokenizerState;

Modules

bpe
counter
handlers
VIL pattern HTTP handlers for the tokenizer plugin.
pipeline_sse
SSE pipeline builders for tokenizer operations.
plugin
VilPlugin implementation for the tokenizer.
semantic
Semantic types for tokenizer operations (Tier B AI).
truncate
vocab