ferrotorch-grammar 0.6.2

Constrained-decoding grammar processors (JSON-schema → token-allow masks) for ferrotorch
Documentation

ferrotorch-grammar

Constrained-decoding grammar processors for ferrotorch — turn a JSON Schema into a per-step token-allow mask so an LLM can only emit text that satisfies the schema.

Originally lived inside ferrotorch-llama; extracted into its own crate so non-Llama models (e.g. ferrotorch-bert decoders, future encoder-decoder seq2seq models) can reuse the machinery without pulling in the entire Llama 3 stack (#1120).

What it provides

  • Schema — internal AST for a (subset-of) JSON Schema document. Parsed once via Schema::from_json_schema(&serde_json::Value).
  • JsonGrammar — state machine tracking the partially-emitted JSON value. Knows which characters are legal at the current position (e.g. after { only " or } are allowed; mid-string only non-control chars + escape sequences).
  • JsonSchemaProcessor — public type that wraps a tokenizer vocabulary Vec<String> and produces a TokenMask (one u32 flag per token) on every compute_mask() call. Advance with step_token(token_id) after each sample.
  • TokenMaskVec<u32> of 0/1 flags, one per vocab token. Apply to logits via ferrotorch_cubecl::apply_token_mask_to_gpu (GPU) or simple if mask[i] == 0 { logits[i] = -inf } (CPU).

Quick start

use ferrotorch_grammar::JsonSchemaProcessor;
use serde_json::json;

let schema = json!({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age":  {"type": "integer"}
    },
    "required": ["name", "age"]
});
let vocab: Vec<String> = vec![/* tokenizer vocab */];

let mut proc = JsonSchemaProcessor::new(&schema, vocab)?;

loop {
    let mask = proc.compute_mask();
    let logits = model.forward(/* ... */);
    let token_id = sample_with_mask(logits, &mask);
    if proc.is_complete() { break; }
    proc.step_token(token_id)?;
}

Supported JSON Schema subset

Keyword Status
type: object yes (with properties, required)
type: array yes (with items)
type: string yes (with enum)
type: integer yes
type: number yes
type: boolean yes
type: null yes
nullable: true yes
oneOf / anyOf partial (no allOf)
$ref / $defs yes (intra-document)
pattern not yet
minLength / etc. partial (length / min / max / multipleOf)

Schemas using unsupported keywords return GrammarError::Schema.

Feature flags

Feature Default Description
cuda no Enables gpu_dispatch::{PackedVocab, compute_mask_gpu} for masking that runs on the GPU via ferrotorch-cubecl

Part of ferrotorch

This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.

License

MIT OR Apache-2.0