A batch version of GrammarMatcher that can fill the next-token bitmask for multiple
matchers in parallel. It uses multiple threads to speed up the computation, which is
especially useful when the batch size is large.
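A minimal sketch of the parallel fill pattern, with a stub standing in for the real matcher (the class and its behavior here are illustrative, not the library's API). Each matcher writes only its own row of the shared bitmask, so the tasks are independent and need no locking:

```python
from concurrent.futures import ThreadPoolExecutor

class StubMatcher:
    """Stands in for a per-request grammar matcher; fills its row with a fixed pattern."""
    def __init__(self, allowed_word: int):
        self.allowed_word = allowed_word

    def fill_next_token_bitmask(self, bitmask: list[list[int]], index: int) -> None:
        # Write this matcher's allow-bits into its row of the shared bitmask.
        bitmask[index] = [self.allowed_word] * len(bitmask[index])

def batch_fill(matchers, bitmask):
    # One task per matcher; rows are disjoint, so threads never contend.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(m.fill_next_token_bitmask, bitmask, i)
                   for i, m in enumerate(matchers)]
        for f in futures:
            f.result()  # re-raise any exception from the worker

matchers = [StubMatcher(w) for w in (0x1, 0x3, 0x7)]
bitmask = [[0] * 4 for _ in matchers]
batch_fill(matchers, bitmask)
print(bitmask)  # -> [[1, 1, 1, 1], [3, 3, 3, 3], [7, 7, 7, 7]]
```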
Match the output of the LLM against the specified grammar, then generate the bitmask
for the next token. This is the core class for grammar-guided generation.
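To make the mask concrete, here is a pure-Python sketch of how a packed int32 bitmask can mark tokens as allowed. The packing shown (token i at bit i % 32 of word i // 32) is an assumption for illustration; the library applies the equivalent operation on tensors:

```python
def is_token_allowed(bitmask_row: list[int], token_id: int) -> bool:
    # Assumed packing: token_id's bit is bit (token_id % 32)
    # of word (token_id // 32) in the row.
    word = bitmask_row[token_id // 32]
    return (word >> (token_id % 32)) & 1 == 1

# Build a row for a 64-token vocabulary where only tokens 0, 5, and 40 are allowed.
row = [0, 0]
for tok in (0, 5, 40):
    row[tok // 32] |= 1 << (tok % 32)

allowed = [t for t in range(64) if is_token_allowed(row, t)]
print(allowed)  # -> [0, 5, 40]
```

During decoding, the disallowed tokens' logits are set to -inf before sampling, so only grammar-legal continuations can be chosen.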
Allocate the bitmask for next-token prediction. The bitmask is an int32 tensor on
CPU with shape (batch_size, ceil(vocab_size / 32)). Users who need to manage CUDA
memory themselves can construct the tensor with get_bitmask_shape and bitmask_dtype.
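A sketch of the shape arithmetic in pure Python (the helper name is illustrative; each int32 word holds the allow/deny bits for 32 tokens, so the last word may be only partially used):

```python
def bitmask_shape(batch_size: int, vocab_size: int) -> tuple[int, int]:
    # Integer ceiling division: ceil(vocab_size / 32) words per row.
    return (batch_size, (vocab_size + 31) // 32)

# A vocabulary of 128001 tokens needs ceil(128001 / 32) = 4001 words per row.
print(bitmask_shape(4, 128001))  # -> (4, 4001)
```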