Utility library for working with tokens and token tries.
This crate provides efficient implementations of token tries, which are useful when constraining large language models with a grammar. A token trie is a prefix tree of all tokens in a tokenizer. For example, the node ‘t’ might have children ‘the’, ‘token’ and ‘try’, and the node ‘the’ would in turn have its own children ‘there’ and ‘they’. If we know that the token ‘t’ does not currently match our constraints, we know that none of its descendants will match either.
The implementation is highly optimized, with a focus on efficient memory access.
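The pruning argument can be sketched with a toy trie. This is a minimal illustration under stated assumptions, not the crate's data layout or API: `ToyNode`, `ToyTrie`, `insert`, `allowed_tokens` and `demo` are hypothetical names, the constraint is a plain closure over byte prefixes, and the trie is keyed by single bytes (so ‘the’ sits two levels below ‘t’).

```rust
use std::collections::BTreeMap;

/// Hypothetical toy trie node; not the crate's `TrieNode`.
#[derive(Default)]
struct ToyNode {
    /// Token id if the bytes from the root to this node spell a whole token.
    token: Option<u32>,
    children: BTreeMap<u8, ToyNode>,
}

/// Hypothetical toy trie keyed by bytes; illustration only.
#[derive(Default)]
struct ToyTrie {
    root: ToyNode,
}

impl ToyTrie {
    fn insert(&mut self, bytes: &[u8], id: u32) {
        let mut node = &mut self.root;
        for &b in bytes {
            node = node.children.entry(b).or_default();
        }
        node.token = Some(id);
    }

    /// Collect ids of tokens whose every non-empty prefix passes `prefix_ok`.
    /// A rejected prefix prunes its whole subtree without visiting it.
    fn allowed_tokens(&self, prefix_ok: &dyn Fn(&[u8]) -> bool) -> Vec<u32> {
        let mut out = Vec::new();
        let mut prefix = Vec::new();
        Self::walk(&self.root, &mut prefix, prefix_ok, &mut out);
        out
    }

    fn walk(
        node: &ToyNode,
        prefix: &mut Vec<u8>,
        prefix_ok: &dyn Fn(&[u8]) -> bool,
        out: &mut Vec<u32>,
    ) {
        for (&b, child) in &node.children {
            prefix.push(b);
            if prefix_ok(prefix.as_slice()) {
                if let Some(id) = child.token {
                    out.push(id);
                }
                Self::walk(child, prefix, prefix_ok, out);
            }
            prefix.pop();
        }
    }
}

/// Build the example vocabulary and apply a constraint that rejects
/// any byte sequence starting with "th".
fn demo() -> Vec<u32> {
    let mut trie = ToyTrie::default();
    let vocab = ["t", "the", "token", "try", "there", "they"];
    for (id, tok) in vocab.iter().enumerate() {
        trie.insert(tok.as_bytes(), id as u32);
    }
    trie.allowed_tokens(&|p: &[u8]| !p.starts_with(b"th"))
}
```

In this sketch, rejecting the prefix "th" removes ‘the’, ‘there’ and ‘they’ with a single check, while ‘t’, ‘token’ and ‘try’ remain allowed.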
§Key types
- TokTrie – the token trie itself.
- SimpleVob – a bit vector representing a set of allowed TokenIds.
- TokenizerEnv – trait abstracting over tokenizer implementations.
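To make the idea of a bit vector holding a set of allowed token ids concrete, here is a minimal sketch. `ToyVob` and its methods are hypothetical names chosen for illustration; they are not SimpleVob's actual API, and the 32-bit word size is an assumption.

```rust
/// Stand-in alias for this sketch only.
type TokenId = u32;

/// Hypothetical token-set bit vector; illustration only.
struct ToyVob {
    words: Vec<u32>,
    len: usize,
}

impl ToyVob {
    /// Create a set able to hold token ids `0..len`, initially empty.
    fn new(len: usize) -> Self {
        ToyVob { words: vec![0; (len + 31) / 32], len }
    }

    /// Mark a token id as allowed.
    fn allow(&mut self, tok: TokenId) {
        let i = tok as usize;
        assert!(i < self.len, "token id out of range");
        self.words[i / 32] |= 1 << (i % 32);
    }

    /// Check whether a token id is in the set.
    fn is_allowed(&self, tok: TokenId) -> bool {
        let i = tok as usize;
        i < self.len && (self.words[i / 32] >> (i % 32)) & 1 == 1
    }

    /// Count the allowed tokens using per-word popcount.
    fn num_allowed(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }

    /// Iterate over allowed token ids in increasing order.
    fn iter(&self) -> impl Iterator<Item = TokenId> + '_ {
        (0..self.len as TokenId).filter(move |&t| self.is_allowed(t))
    }
}

/// Allow three ids in a 100-token vocabulary and query the set.
fn demo() -> (Vec<TokenId>, usize, bool) {
    let mut set = ToyVob::new(100);
    for t in [3, 64, 99] {
        set.allow(t);
    }
    (set.iter().collect(), set.num_allowed(), set.is_allowed(4))
}
```

One bit per token keeps the set compact even for large vocabularies, and membership tests are a shift and a mask.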
§Constraint interface
The Recognizer trait expresses byte-level constraints through a byte-stack
interface, which is used to filter tokens while walking the trie.
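A byte-stack constraint of this kind can be sketched as follows. The trait `ByteStack`, the `Digits` constraint, and `filter_tokens` are hypothetical and only loosely modelled on the description above; they are not the crate's Recognizer trait. Instead of an explicit trie, the sketch walks a lexicographically sorted vocabulary, which visits tokens in the same order as a depth-first trie traversal.

```rust
/// Hypothetical byte-stack constraint; not the crate's `Recognizer` trait.
trait ByteStack {
    /// Try to extend the current byte sequence; on success the byte is pushed.
    fn try_push(&mut self, byte: u8) -> bool;
    /// Undo the `n` most recent successful pushes.
    fn pop(&mut self, n: usize);
}

/// Example constraint: accepts only ASCII digits, at most `max_len` of them.
struct Digits {
    depth: usize,
    max_len: usize,
}

impl ByteStack for Digits {
    fn try_push(&mut self, byte: u8) -> bool {
        if byte.is_ascii_digit() && self.depth < self.max_len {
            self.depth += 1;
            true
        } else {
            false
        }
    }

    fn pop(&mut self, n: usize) {
        self.depth -= n;
    }
}

/// Return indices of tokens in `sorted_vocab` that the constraint accepts.
/// `sorted_vocab` must be sorted, so tokens sharing a prefix are adjacent.
fn filter_tokens(sorted_vocab: &[&[u8]], rec: &mut dyn ByteStack) -> Vec<usize> {
    let mut allowed = Vec::new();
    let mut pushed: Vec<u8> = Vec::new(); // bytes currently on the stack
    let mut i = 0;
    while i < sorted_vocab.len() {
        let tok = sorted_vocab[i];
        // Keep the prefix shared with what is already pushed; pop the rest.
        let common = pushed.iter().zip(tok).take_while(|(a, b)| a == b).count();
        rec.pop(pushed.len() - common);
        pushed.truncate(common);
        // Push the remaining bytes until one is rejected.
        let mut ok = true;
        for &b in &tok[common..] {
            if rec.try_push(b) {
                pushed.push(b);
            } else {
                ok = false;
                break;
            }
        }
        if ok {
            allowed.push(i);
            i += 1;
        } else {
            // Skip every token starting with the rejected prefix (the pruned subtree).
            let bad = &tok[..pushed.len() + 1];
            while i < sorted_vocab.len() && sorted_vocab[i].starts_with(bad) {
                i += 1;
            }
        }
    }
    rec.pop(pushed.len()); // leave the constraint in its initial state
    allowed
}

/// Filter a small sorted vocabulary with a "at most two digits" constraint.
fn demo() -> Vec<usize> {
    let vocab = ["1", "12", "123", "1a", "2", "a", "a1"];
    let bytes: Vec<&[u8]> = vocab.iter().map(|s| s.as_bytes()).collect();
    let mut rec = Digits { depth: 0, max_len: 2 };
    filter_tokens(&bytes, &mut rec)
}
```

Because bytes shared between neighbouring tokens stay on the stack, each shared prefix is checked once rather than once per token.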
Modules§
- bytes
- recognizer
- Functional recognizer trait and stack-based adapter.
Structs§
- AnythingGoes
- ApproximateTokEnv
- Branch
- InferenceCapabilities
- Defines what is allowed in Branch
- SimpleVob
- A compact bit vector representing a set of allowed crate::TokenIds.
- SimpleVobIter
- Splice
- Describes what to do after sampling.
- StepArg
- TokEnvWithTrie
- TokRxInfo
- TokTrie
- A prefix tree (trie) of every token in a tokenizer’s vocabulary.
- TrieNode
Constants§
Traits§
- Recognizer
- Byte-level constraint interface used for trie-based token filtering.
- TokenizerEnv
- Abstraction over tokenizer implementations.
Functions§
- parse_numeric_token
- Parse a special token of the form \xFF [ 1 2 3 4 ]. The initial \xFF is not included in the input. Returns the number of bytes consumed and the token id.
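The parsing behaviour described for this function can be illustrated with a standalone sketch. `parse_numeric_token_sketch` is a hypothetical re-implementation, not the crate's function; its signature, the `Option` return type, and the handling of malformed input are assumptions. It also assumes the spaces in the documented form are for readability only, so the actual input looks like `[1234]`.

```rust
/// Hypothetical sketch; the crate's own `parse_numeric_token` may differ
/// in signature and edge-case handling. The leading 0xFF byte is assumed
/// to have been stripped already, so the input starts at '['.
fn parse_numeric_token_sketch(input: &[u8]) -> Option<(usize, u32)> {
    // The token must start with '['.
    if input.first() != Some(&b'[') {
        return None;
    }
    // Find the closing bracket.
    let close = input.iter().position(|&b| b == b']')?;
    // The bytes in between must be a non-empty run of ASCII digits.
    let digits = std::str::from_utf8(&input[1..close]).ok()?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Reject ids that do not fit in a u32.
    let id = digits.parse::<u32>().ok()?;
    // Bytes consumed include both brackets.
    Some((close + 1, id))
}
```

Returning the number of bytes consumed lets a caller continue scanning the remaining input right after the closing bracket.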
Type Aliases§
- StepResult
- TokEnv
- TokenId
- Numeric identifier for a single token in a tokenizer’s vocabulary.