ferrotorch-grammar
Constrained-decoding grammar processors for ferrotorch — turn a JSON Schema into a per-step token-allow mask so an LLM can only emit text that satisfies the schema.
Originally lived inside ferrotorch-llama; extracted into its own
crate so non-Llama models (e.g. ferrotorch-bert decoders, future
encoder-decoder seq2seq models) can reuse the machinery without
pulling in the entire Llama 3 stack (#1120).
What it provides
- `Schema`: internal AST for a (subset-of) JSON Schema document. Parsed once via `Schema::from_json_schema(&serde_json::Value)`.
- `JsonGrammar`: state machine tracking the partially-emitted JSON value. Knows which characters are legal at the current position (e.g. after `{` only `"` or `}` are allowed; mid-string only non-control chars + escape sequences).
- `JsonSchemaProcessor`: public type that wraps a tokenizer vocabulary `Vec<String>` and produces a `TokenMask` (one `u32` flag per token) on every `compute_mask()` call. Advance with `step_token(token_id)` after each sample.
- `TokenMask`: `Vec<u32>` of 0/1 flags, one per vocab token. Apply to logits via `ferrotorch_cubecl::apply_token_mask_to_gpu` (GPU) or a simple `if mask[i] == 0 { logits[i] = -inf }` (CPU).
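The CPU masking rule above can be sketched with the standard library alone. Both functions below are hypothetical helpers written for illustration, not part of this crate's API. In particular, `toy_mask` is a deliberate simplification that only inspects a token's first character, whereas a real processor has to validate every character of the token against the evolving grammar state.

```rust
/// Toy stand-in for the grammar check (hypothetical, not crate API):
/// a token is allowed when its first character is one of `legal_next`.
fn toy_mask(vocab: &[String], legal_next: &[char]) -> Vec<u32> {
    vocab
        .iter()
        .map(|tok| match tok.chars().next() {
            Some(c) if legal_next.contains(&c) => 1,
            _ => 0, // empty tokens and illegal first characters are rejected
        })
        .collect()
}

/// CPU masking as described above: every token whose flag is 0 gets a
/// logit of -inf, so a later softmax assigns it zero probability.
fn apply_mask_cpu(logits: &mut [f32], mask: &[u32]) {
    assert_eq!(logits.len(), mask.len(), "one mask flag per vocab token");
    for (logit, &flag) in logits.iter_mut().zip(mask) {
        if flag == 0 {
            *logit = f32::NEG_INFINITY;
        }
    }
}
```

For example, right after an opening `{` the legal set from the description above is `['"', '}']`, so a token such as `"name` keeps its logit while `{` and `42` are pushed to negative infinity.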
Quick start
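
    use ferrotorch_grammar::JsonSchemaProcessor;
    use serde_json::json;

    let schema = json!({ /* your JSON Schema */ });
    let vocab: Vec<String> = vec![/* one entry per tokenizer token */];
    // Argument order is an assumption; check the crate docs for the signature.
    let mut proc = JsonSchemaProcessor::new(&schema, vocab)?;
    loop {
        let mask = proc.compute_mask();
        // Apply `mask` to the logits, sample a `token_id`, then advance:
        proc.step_token(token_id);
    }
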
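The sampling step inside the loop is left to the caller. As one possible approach, here is a minimal sketch of greedy selection that respects the mask. `argmax_masked` is a hypothetical helper, not part of this crate, and it assumes the logits contain no NaN values.

```rust
/// Greedy pick restricted to allowed tokens (hypothetical helper).
/// Returns the index of the largest logit whose mask flag is non-zero,
/// or `None` when the mask forbids every token. Ties keep the first index.
fn argmax_masked(logits: &[f32], mask: &[u32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, (&logit, &flag)) in logits.iter().zip(mask).enumerate() {
        if flag == 0 {
            continue; // token is not allowed at this step
        }
        let better = match best {
            None => true,
            Some((_, current)) => logit > current,
        };
        if better {
            best = Some((i, logit));
        }
    }
    best.map(|(i, _)| i)
}
```

The returned index would then be the `token_id` handed to `step_token`. A `None` result means no token in the vocabulary is legal at this position, which the caller should treat as an error rather than sampling anyway.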
Supported JSON Schema subset
| Keyword | Status |
|---|---|
| `type: object` | yes (with `properties`, `required`) |
| `type: array` | yes (with `items`) |
| `type: string` | yes (with `enum`) |
| `type: integer` | yes |
| `type: number` | yes |
| `type: boolean` | yes |
| `type: null` | yes |
| `nullable: true` | yes |
| `oneOf` / `anyOf` | partial (no `allOf`) |
| `$ref` / `$defs` | yes (intra-document) |
| `pattern` | not yet |
| `minLength` / etc. | partial (length / min / max / `multipleOf`) |
Parsing a schema that uses an unsupported keyword returns `GrammarError::Schema`.
Feature flags
| Feature | Default | Description |
|---|---|---|
| `cuda` | no | Enables `gpu_dispatch::{PackedVocab, compute_mask_gpu}` for masking that runs on the GPU via `ferrotorch-cubecl` |
Part of ferrotorch
This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.
License
MIT OR Apache-2.0