SharedDictionary: pre-parsed trie + raw data that can be shared across
multiple tokenizer instances via Arc, avoiding ~150 MB per-tokenizer
trie duplication.
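The sharing pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual definition: `DictionaryInner`, `SharedDictionarySketch`, `TokenizerSketch`, `build_tokenizers`, and all field names are hypothetical stand-ins, since the source does not show the real fields or constructors.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the expensive, immutable dictionary state.
/// The real field layout is not shown in the source.
struct DictionaryInner {
    trie: Vec<u32>, // placeholder for the pre-parsed trie
    raw: Vec<u8>,   // placeholder for the raw dictionary data
}

/// Cheap-to-clone handle: cloning copies one pointer and bumps a
/// reference count, so the trie itself is never duplicated.
#[derive(Clone)]
struct SharedDictionarySketch {
    inner: Arc<DictionaryInner>,
}

impl SharedDictionarySketch {
    fn new(trie: Vec<u32>, raw: Vec<u8>) -> Self {
        Self { inner: Arc::new(DictionaryInner { trie, raw }) }
    }

    /// Number of live handles pointing at the same dictionary.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }

    /// Total size of the shared payload, in elements (illustrative only).
    fn payload_len(&self) -> usize {
        self.inner.trie.len() + self.inner.raw.len()
    }
}

/// Hypothetical tokenizer that holds a handle instead of its own trie.
struct TokenizerSketch {
    dict: SharedDictionarySketch,
}

impl TokenizerSketch {
    fn new(dict: &SharedDictionarySketch) -> Self {
        Self { dict: dict.clone() }
    }
}

/// Builds `n` tokenizers that all share one dictionary allocation.
fn build_tokenizers(dict: &SharedDictionarySketch, n: usize) -> Vec<TokenizerSketch> {
    (0..n).map(|_| TokenizerSketch::new(dict)).collect()
}
```

With this shape, creating four tokenizers from one dictionary leaves five handles to a single allocation (the original plus four clones), so the memory cost of the trie is paid once regardless of the tokenizer count.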
Structs§
- SharedDictionary - Shared dictionary state that can be cloned cheaply across tokenizers.
Enums§
- DictData - Dictionary data that can be either an owned Vec<u8> (when mutation was needed for connection inhibitions) or a memory-mapped file (zero-copy, OS-managed pages).
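A two-variant enum of this kind might look roughly like the sketch below. All names here (`DictDataSketch`, `MappedStandIn`, `to_mut`, `is_owned`) are hypothetical, and the mapped variant uses a `&'static [u8]` as a std-only stand-in for a real OS mapping, which an actual implementation would presumably obtain from a memory-mapping crate. The copy-on-first-mutation step is an assumption about how the owned variant comes into being.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a read-only memory-mapped region.
/// A static slice keeps this sketch free of external crates.
struct MappedStandIn(&'static [u8]);

/// Sketch of dictionary bytes that are either owned or mapped.
enum DictDataSketch {
    /// Heap copy, needed once the bytes have to be modified.
    Owned(Vec<u8>),
    /// Zero-copy view of file contents managed by the OS.
    Mapped(MappedStandIn),
}

impl Deref for DictDataSketch {
    type Target = [u8];

    /// Readers see a plain byte slice regardless of the variant.
    fn deref(&self) -> &[u8] {
        match self {
            DictDataSketch::Owned(v) => v,
            DictDataSketch::Mapped(m) => m.0,
        }
    }
}

impl DictDataSketch {
    fn is_owned(&self) -> bool {
        matches!(self, DictDataSketch::Owned(_))
    }

    /// Returns mutable bytes, copying mapped data into a Vec first
    /// (assumed behavior: the mapping itself is never written to).
    fn to_mut(&mut self) -> &mut Vec<u8> {
        if let DictDataSketch::Mapped(m) = &*self {
            let owned = m.0.to_vec();
            *self = DictDataSketch::Owned(owned);
        }
        match self {
            DictDataSketch::Owned(v) => v,
            DictDataSketch::Mapped(_) => unreachable!("converted to Owned above"),
        }
    }
}
```

Implementing `Deref<Target = [u8]>` lets read paths stay agnostic about where the bytes live, while the copy is paid only by callers that actually need to mutate.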