
Crate toktrie



Utility library for working with tokens and token tries.

This crate provides efficient implementations of token tries, which are useful when constraining large language models with a grammar. A token trie is a prefix tree of all tokens in a tokenizer's vocabulary. For example, the node ‘t’ might have children ‘the’, ‘token’ and ‘try’ (and so on; the node ‘the’ would have its own children ‘there’ and ‘they’). If we know that the prefix ‘t’ cannot satisfy our current constraints, we know that none of its children (or their children) can either, so the entire subtree below ‘t’ can be skipped without examining it.
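The pruning idea can be sketched with a toy byte-level trie. This is a minimal illustration under stated assumptions, not toktrie's own TrieNode or TokTrie: the Node type, its allowed method and the prefix_ok predicate are hypothetical, and plain u32 ids are used for simplicity.

```rust
use std::collections::BTreeMap;

/// Hypothetical byte-level trie node (not toktrie's `TrieNode`).
#[derive(Default)]
struct Node {
    token: Option<u32>,           // id of the token that ends exactly here, if any
    children: BTreeMap<u8, Node>, // next byte -> child node
}

impl Node {
    fn insert(&mut self, bytes: &[u8], id: u32) {
        let mut node = self;
        for &b in bytes {
            node = node.children.entry(b).or_default();
        }
        node.token = Some(id);
    }

    /// Depth-first walk that skips a whole subtree as soon as `prefix_ok`
    /// rejects its prefix: no token below a rejected node can match.
    fn allowed(&self, prefix: &mut Vec<u8>, prefix_ok: &dyn Fn(&[u8]) -> bool, out: &mut Vec<u32>) {
        for (&b, child) in &self.children {
            prefix.push(b);
            if prefix_ok(&prefix[..]) {
                if let Some(id) = child.token {
                    out.push(id);
                }
                child.allowed(prefix, prefix_ok, out);
            }
            prefix.pop();
        }
    }
}

fn main() {
    let vocab = ["t", "the", "there", "they", "token", "try", "a"];
    let mut root = Node::default();
    for (id, tok) in vocab.iter().enumerate() {
        root.insert(tok.as_bytes(), id as u32);
    }
    // Hypothetical constraint: the output must be a prefix of "they".
    let target = b"they";
    let ok = |p: &[u8]| target.starts_with(p);
    let mut out = Vec::new();
    root.allowed(&mut Vec::new(), &ok, &mut out);
    println!("{:?}", out); // prints [0, 1, 3]
}
```

The walk returns the ids of ‘t’, ‘the’ and ‘they’; the subtrees below ‘a’, ‘ther’, ‘to’ and ‘tr’ are rejected at their first byte that breaks the constraint and are never descended into.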

This crate is a highly optimized implementation of token tries, with a focus on efficient memory access.

§Key types

§Constraint interface

The Recognizer trait expresses byte-level constraints through a byte-stack interface, which lets them be applied to the trie for trie-based token filtering.
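A byte-stack interface can be sketched as follows. The ByteStackRecognizer trait, its try_push_byte and pop_bytes methods and the Ident constraint are hypothetical and do not reproduce the signature of toktrie's Recognizer. In place of a pointer-based trie, the sketch walks the vocabulary in sorted byte order, which visits tokens in the same order as a depth-first traversal of the trie: shared prefixes stay on the stack, and once a byte is rejected every later token with that prefix is skipped.

```rust
/// Hypothetical byte-stack constraint interface (illustrative; not the
/// signature of toktrie's `Recognizer`).
trait ByteStackRecognizer {
    /// Extend the current prefix by one byte. Returns false and leaves the
    /// stack unchanged if no string with this prefix can satisfy the constraint.
    fn try_push_byte(&mut self, byte: u8) -> bool;
    /// Undo the last `n` successful pushes.
    fn pop_bytes(&mut self, n: usize);
}

/// Example constraint: identifiers matching [a-z]+[0-9]*.
/// The stack holds DFA states: 0 = start, 1 = in letters, 2 = in digits.
struct Ident {
    stack: Vec<u8>,
}

impl ByteStackRecognizer for Ident {
    fn try_push_byte(&mut self, byte: u8) -> bool {
        let state = *self.stack.last().expect("start state is never popped");
        let next = match (state, byte) {
            (0 | 1, b'a'..=b'z') => 1,
            (1 | 2, b'0'..=b'9') => 2,
            _ => return false,
        };
        self.stack.push(next);
        true
    }

    fn pop_bytes(&mut self, n: usize) {
        let keep = self.stack.len() - n;
        self.stack.truncate(keep);
    }
}

/// Returns the indices of tokens accepted by `rec`, walking the vocabulary
/// in sorted byte order (the same order as a depth-first trie traversal).
fn allowed_tokens(vocab: &[&str], rec: &mut dyn ByteStackRecognizer) -> Vec<usize> {
    let mut order: Vec<usize> = (0..vocab.len()).collect();
    order.sort_by_key(|&i| vocab[i]); // &str ordering is bytewise
    let mut path: Vec<u8> = Vec::new(); // bytes currently on the stack
    // Sorted order keeps each subtree contiguous, so one rejected prefix suffices.
    let mut dead: Option<Vec<u8>> = None;
    let mut allowed = Vec::new();
    for i in order {
        let tok = vocab[i].as_bytes();
        // Skip the whole subtree below a rejected prefix.
        if let Some(d) = &dead {
            if tok.starts_with(d) {
                continue;
            }
        }
        // Backtrack to the longest prefix shared with the bytes on the stack.
        let common = path.iter().zip(tok).take_while(|(a, b)| a == b).count();
        rec.pop_bytes(path.len() - common);
        path.truncate(common);
        // Descend byte by byte until the token ends or a byte is rejected.
        let mut rejected_at = None;
        for (k, &b) in tok.iter().enumerate().skip(common) {
            if rec.try_push_byte(b) {
                path.push(b);
            } else {
                rejected_at = Some(k);
                break;
            }
        }
        match rejected_at {
            None => allowed.push(i),
            Some(k) => dead = Some(tok[..=k].to_vec()),
        }
    }
    allowed
}

fn main() {
    let vocab = ["ab", "a1", "1a", "abc", "a", "b2", "_x", "ab3c", "1b"];
    let mut rec = Ident { stack: vec![0] };
    println!("{:?}", allowed_tokens(&vocab, &mut rec)); // prints [4, 1, 0, 3, 5]
}
```

Here ‘1b’ is never pushed at all, because its prefix ‘1’ was already rejected while processing ‘1a’.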

Modules§

bytes
recognizer
Functional recognizer trait and stack-based adapter.

Structs§

AnythingGoes
ApproximateTokEnv
Branch
InferenceCapabilities
Defines what is allowed in a Branch.
SimpleVob
A compact bit vector representing a set of allowed TokenId values.
SimpleVobIter
Splice
Describes what to do after sampling.
StepArg
TokEnvWithTrie
TokRxInfo
TokTrie
A prefix tree (trie) of every token in a tokenizer’s vocabulary.
TrieNode
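SimpleVob is described above as a compact bit vector of allowed token ids. The sketch below shows the general one-bit-per-token idea with a hypothetical TokenSet type; its method names are assumptions and are not SimpleVob's API.

```rust
/// Hypothetical set of token ids stored as one bit per id
/// (illustrates the idea behind `SimpleVob`; not its API).
struct TokenSet {
    words: Vec<u32>,
}

impl TokenSet {
    /// Empty set able to hold ids 0..vocab_size.
    fn new(vocab_size: usize) -> Self {
        TokenSet { words: vec![0; (vocab_size + 31) / 32] }
    }

    fn allow(&mut self, tok: u32) {
        self.words[(tok / 32) as usize] |= 1u32 << (tok % 32);
    }

    fn is_allowed(&self, tok: u32) -> bool {
        self.words[(tok / 32) as usize] & (1u32 << (tok % 32)) != 0
    }

    /// Number of allowed ids.
    fn count(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }

    /// In-place intersection, e.g. to combine two constraints.
    fn and_with(&mut self, other: &TokenSet) {
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            *a &= *b;
        }
    }
}

fn main() {
    let mut a = TokenSet::new(100);
    for t in [3, 40, 99] {
        a.allow(t);
    }
    let mut b = TokenSet::new(100);
    for t in [40, 99, 7] {
        b.allow(t);
    }
    a.and_with(&b);
    println!("{} {} {}", a.count(), a.is_allowed(40), a.is_allowed(3)); // prints 2 true false
}
```

One bit per token keeps masks small: a 128,000-token vocabulary needs 4,000 u32 words (16,000 bytes), and intersecting two masks is a single pass over those words.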

Constants§

INVALID_TOKEN

Traits§

Recognizer
Byte-level constraint interface used for trie-based token filtering.
TokenizerEnv
Abstraction over tokenizer implementations.

Functions§

parse_numeric_token
Parse a special token of the form \xFF [ 1 2 3 4 ]. The initial \xFF is not included in the input. Returns the number of bytes consumed and the token id.
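A minimal sketch of this parsing step, under assumptions: the bytes after \xFF are taken to be ‘[’, one or more ASCII decimal digits, then ‘]’ with no spaces; malformed input yields None; and ids are u32. The name parse_numeric_token_sketch is hypothetical, and the real parse_numeric_token's signature may differ.

```rust
/// Hypothetical re-implementation for illustration only. Input is assumed to
/// look like b"[1234]..." with the leading 0xFF byte already stripped.
/// Returns (bytes consumed, token id), or None if the input is malformed.
fn parse_numeric_token_sketch(s: &[u8]) -> Option<(usize, u32)> {
    if s.first() != Some(&b'[') {
        return None;
    }
    let close = s.iter().position(|&b| b == b']')?;
    let digits = &s[1..close];
    if digits.is_empty() || !digits.iter().all(u8::is_ascii_digit) {
        return None;
    }
    // Digits are ASCII, so from_utf8 succeeds; parse fails only on u32 overflow.
    let id: u32 = std::str::from_utf8(digits).ok()?.parse().ok()?;
    Some((close + 1, id))
}

fn main() {
    assert_eq!(parse_numeric_token_sketch(b"[1234]rest"), Some((6, 1234)));
    assert_eq!(parse_numeric_token_sketch(b"[12"), None); // no closing bracket
    assert_eq!(parse_numeric_token_sketch(b"abc"), None); // no opening bracket
    println!("ok");
}
```

Reporting the consumed length lets a caller continue scanning the byte string right after the bracketed id.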

Type Aliases§

StepResult
TokEnv
TokenId
Numeric identifier for a single token in a tokenizer’s vocabulary.