Crate xgrammar

Modules§

testing

Structs§

BatchGrammarMatcher
A batch version of GrammarMatcher that fills the next-token bitmask for multiple matchers in parallel, using multiple threads to speed up the computation. It is most useful when the batch size is large.
CompiledGrammar
The primary object that stores a compiled grammar.
CxxUniquePtr
Binding to C++ std::unique_ptr<T, std::default_delete<T>>.
DLDataType
DLPack data type descriptor (DLDataType).
DLDevice
DLPack device descriptor (DLDevice) (ABI-compatible with dlpack/dlpack.h).
DLManagedTensor
DLPack managed tensor (DLManagedTensor) (owns tensor + deleter).
DLTensor
DLPack tensor view (DLTensor) (does not own memory).
Grammar
Represents a grammar in XGrammar; it can be used later in grammar-guided generation.
GrammarCompiler
The compiler for grammars.
GrammarMatcher
Matches the output of the LLM against the specified grammar, then generates the mask for the next token. This is the core class of grammar-guided generation.
HfMetadata
StructuralTagItem
Deprecated. Definition of a structural tag item.
TokenizerInfo
Contains the vocabulary, the vocabulary type, and other information needed for grammar-guided generation.
cxx_int
Newtype wrapper for an int
cxx_longlong
Newtype wrapper for a long long
cxx_ulong
Newtype wrapper for an unsigned long
cxx_ulonglong
Newtype wrapper for an unsigned long long

Enums§

DLDataTypeCode
DLPack data type code enum (DLDataTypeCode).
DLDeviceType
DLPack device type enum (DLDeviceType).
VocabType

Functions§

allocate_token_bitmask
Allocate the bitmask for next-token prediction. The bitmask is an int32 tensor on CPU with shape (batch_size, ceil(vocab_size / 32)). Users who need to manage CUDA memory themselves can construct the tensor from get_bitmask_shape and bitmask_dtype.
apply_token_bitmask_inplace_cpu
detect_metadata_from_hf
get_bitmask_shape
Return the shape of the bitmask: (batch_size, ceil(vocab_size / 32)).
get_max_recursion_depth
Get the maximum allowed recursion depth. The depth is shared per process.
get_serialization_version
Get the serialization version number. The current version is “v11”.
reset_token_bitmask
Reset the bitmask to the full mask.
set_max_recursion_depth
Set the maximum allowed recursion depth. The depth is shared per process. This method is thread-safe.
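
The bitmask shape (batch_size, ceil(vocab_size / 32)) used by allocate_token_bitmask and get_bitmask_shape can be illustrated with a small standalone sketch. It does not call this crate: the helper names (bitmask_shape, full_bitmask, is_token_allowed, disallow_token) are hypothetical, and the bit layout (token t stored in bit t % 32 of word t / 32, with a set bit meaning "allowed") is an assumption made for illustration, not something this page confirms.

```rust
/// Number of vocabulary entries packed into one int32 word of the bitmask.
const BITS_PER_WORD: usize = 32;

/// Shape of the bitmask: (batch_size, ceil(vocab_size / 32)).
/// Hypothetical helper; mirrors the formula documented for get_bitmask_shape.
fn bitmask_shape(batch_size: usize, vocab_size: usize) -> (usize, usize) {
    // Integer ceiling division: round up to a whole number of 32-bit words.
    (batch_size, (vocab_size + BITS_PER_WORD - 1) / BITS_PER_WORD)
}

/// Build a row-major "full mask" with every bit set (all tokens allowed).
/// Assumption: the full mask is all ones, i.e. each i32 word equals -1.
fn full_bitmask(batch_size: usize, vocab_size: usize) -> Vec<i32> {
    let (rows, words_per_row) = bitmask_shape(batch_size, vocab_size);
    vec![-1i32; rows * words_per_row]
}

/// Check whether a token is allowed in one row of the mask.
/// Assumed layout: token t lives in bit (t % 32) of word (t / 32).
fn is_token_allowed(row: &[i32], token_id: usize) -> bool {
    (row[token_id / BITS_PER_WORD] >> (token_id % BITS_PER_WORD)) & 1 == 1
}

/// Clear the bit for one token, marking it as disallowed in this row.
fn disallow_token(row: &mut [i32], token_id: usize) {
    row[token_id / BITS_PER_WORD] &= !(1i32 << (token_id % BITS_PER_WORD));
}
```

Under these assumptions, a batch of 2 with a vocabulary of 128000 tokens needs 2 × 4000 words, and a vocabulary of 33 tokens needs 2 words per row because the last token spills into a second word. Packing 32 tokens per word keeps the mask 32 times smaller than a one-byte-per-token layout would be when applied to logits.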