Module longest_match

Vocab-only longest-prefix-match tokenizer.

Walks the input left to right and, at each position, emits the ID of the longest vocab fragment that matches there before advancing past it. Suitable for canonical-IR and synthetic test maps. NOT BPE-correct for real model vocabs; use crate::BPETokenizer for those.
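The greedy walk described above can be sketched roughly as follows. This is a minimal illustration of longest-prefix matching, not the actual LongestMatchTokenizer implementation: the function name longest_match, the HashMap<&str, u32> vocab representation, and the choice to report an unmatched position as Err(byte_offset) are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of greedy longest-prefix matching (not the crate's API).
/// At each byte offset, try the longest possible vocab fragment first and
/// shrink until one matches. Returning Err(offset) when nothing matches is an
/// assumption; the real tokenizer may handle unmatched input differently.
fn longest_match(input: &str, vocab: &HashMap<&str, u32>) -> Result<Vec<u32>, usize> {
    // No fragment can be longer than the longest vocab key (in bytes).
    let max_len = vocab.keys().map(|k| k.len()).max().unwrap_or(0);
    let mut ids = Vec::new();
    let mut pos = 0;

    while pos < input.len() {
        let rest = &input[pos..];
        let mut matched = None;
        let mut len = max_len.min(rest.len());
        while len > 0 {
            // Only slice on UTF-8 character boundaries.
            if rest.is_char_boundary(len) {
                if let Some(&id) = vocab.get(&rest[..len]) {
                    matched = Some((id, len));
                    break;
                }
            }
            len -= 1;
        }
        match matched {
            Some((id, len)) => {
                ids.push(id);
                pos += len; // advance past the matched fragment
            }
            None => return Err(pos),
        }
    }
    Ok(ids)
}

fn main() {
    // Toy vocab, assumed purely for illustration.
    let vocab: HashMap<&str, u32> =
        HashMap::from([("a", 0), ("b", 1), ("c", 2), ("ab", 3), ("bc", 4)]);
    // Greedy: "ab" wins at offset 0, then "c" at offset 2.
    println!("{:?}", longest_match("abc", &vocab)); // prints Ok([3, 2])
}
```

The same toy vocab shows why greedy matching is not BPE-correct: a BPE tokenizer whose merge table ranks (b, c) above (a, b) would split "abc" as "a", "bc", whereas the longest-prefix walk always takes "ab" first.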

Structs

LongestMatchTokenizer
Vocab-only fallback tokenizer.
Tokenize
Top-level tokenizer factory.