lextrail-test 0.1.0

A library for constraining language model outputs to follow a context-free grammar (CFG), a regular expression, or a JSON Schema (experimental).

lextrail-red

A Rust implementation of the Python library lextrail.

Features

  • Zero dependencies
  • Parses all context-free grammars, including ambiguous grammars
  • Returns tokens constrained to a specified vocabulary if needed
  • Native Rust performance

Quick Start

Installation

cargo add lextrail

Usage Modes

The library supports two ways to generate constrained text, depending on your use case:

Trail

Use a Trail object when you want to generate the complete next element without vocabulary constraints.

CFG

use lextrail::guide::trail_cfg;

let example = r#"
      start: expression

      expression: term (("+" | "-") term)

      term: factor (("*" | "/") factor)

      factor: NUMBER

      NUMBER: /-?[0-9]+/
"#;

let (schema, mut state) = trail_cfg(example).expect("Expected `Trail`, but got a `TrailError`.");

Regex

use lextrail::guide::trail_rex;

let example = r#"[a-z]+@[a-z]+\.(com|org|net)"#;

let (schema, mut state) = trail_rex(example).expect("Expected `Trail`, but got a `TrailError`.");
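
To make concrete which strings the pattern above admits, here is a minimal standard-library sketch of a hand-written checker for that one pattern. The names matches_email_pattern and is_lowercase_word are hypothetical and not part of lextrail; the sketch is only meant as a way to sanity-check simulated output.

```rust
/// True if `part` is one or more ASCII lowercase letters, i.e. `[a-z]+`.
fn is_lowercase_word(part: &str) -> bool {
    !part.is_empty() && part.bytes().all(|b| b.is_ascii_lowercase())
}

/// Hand-written whole-string check for `[a-z]+@[a-z]+\.(com|org|net)`.
/// Hypothetical helper for illustration; not a lextrail function.
fn matches_email_pattern(s: &str) -> bool {
    // Split at the first '@': everything before it must be `[a-z]+`.
    let Some((local, rest)) = s.split_once('@') else {
        return false;
    };
    // Split the remainder at the first '.': domain before, suffix after.
    let Some((domain, suffix)) = rest.split_once('.') else {
        return false;
    };
    is_lowercase_word(local)
        && is_lowercase_word(domain)
        && matches!(suffix, "com" | "org" | "net")
}
```

Under these assumptions, "alice@example.com" passes, while "alice@example.io" and "@example.com" are rejected.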

You can also combine both TERMINAL and REGEX expressions using trail_exp.

use lextrail::guide::trail_exp;

let example = r#"/[0-9]\.[0-9]/ "+" /[0-9]\.[0-9]/"#;

let (schema, mut state) = trail_exp(example).expect("Expected `Trail`, but got a `TrailError`.");

JSON

This is an experimental version. Not intended for production use.

  • Currently supported keywords: type, enum, const, properties, required, items, prefixItems, oneOf
  • Constraint intersection (e.g., combining prefixItems with items, or const with enum) is not yet implemented

use lextrail::json::trail_json;

let example = r#"
    {
        "type": "object",
        "properties": {
            "user": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"}
                },
                "required": ["email"]
            }
        }
    }
"#;

let (schema, mut state) = trail_json(example).expect("Expected `Trail`, but got a `TrailError`.");

Then run a random simulation: repeatedly ask for the next proposals, pick one at random, and stop when no proposals remain.

use rand::prelude::IteratorRandom;
use rand::rng;

use lextrail::guide::get_next_proposals;

let (mut response, mut value) = (Vec::new(), String::new());

loop {
    let values = get_next_proposals(&schema, &mut state, &value).expect("Expected `Vec<String>`, but got a `TrailError`.");

    if values.is_empty() {
        break;
    }

    value = values.into_iter().choose(&mut rng()).unwrap();
    response.push(value.clone());
}

println!("{}", response.join(""));
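
The shape of the loop above can be tried without the library or the rand crate. The following is a minimal sketch that assumes a toy grammar `("x" | "y") "=" ("0" | "1")`; ToyState, STEPS, toy_next_proposals and run_toy_simulation are hypothetical stand-ins that only mimic the call shape of get_next_proposals, not lextrail's actual types or behavior.

```rust
/// Hypothetical stand-in for a grammar state: a cursor over a fixed
/// sequence of alternatives. This is NOT lextrail's state type.
struct ToyState {
    position: usize,
}

/// Alternatives of the toy grammar `("x" | "y") "=" ("0" | "1")`.
const STEPS: [&[&str]; 3] = [&["x", "y"], &["="], &["0", "1"]];

/// Returns the proposals for the next element, or an empty list at the end.
fn toy_next_proposals(state: &mut ToyState, last: &str) -> Vec<String> {
    // An empty `last` marks the first call; any other value advances the cursor.
    if !last.is_empty() {
        state.position += 1;
    }
    STEPS
        .get(state.position)
        .map(|alternatives| alternatives.iter().map(|s| s.to_string()).collect())
        .unwrap_or_default()
}

/// Same propose / choose / append loop as above; `pick` replaces the random choice.
fn run_toy_simulation(pick: impl Fn(&[String]) -> usize) -> String {
    let mut state = ToyState { position: 0 };
    let (mut response, mut value) = (Vec::new(), String::new());

    loop {
        let values = toy_next_proposals(&mut state, &value);

        if values.is_empty() {
            break;
        }

        value = values[pick(&values) % values.len()].clone();
        response.push(value.clone());
    }

    response.join("")
}
```

Calling run_toy_simulation(|_| 0) always takes the first proposal and yields "x=0"; passing a closure that returns the last index yields "y=1".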

You can pretty-print JSON output using lextrail::json::format_json_instance.

ASM

Use an ASM object when you need to constrain the next token to a predefined vocabulary.

Example

use lextrail::assemble::asm_cfg;

let example = r#"
    start: L0

    L0: ("A" | "B")+ L1

    L1: ("C" | "D") L2

    L2: "E" L3*

    L3: /FGH/
"#;

let asm = asm_cfg(example, vec![String::from("AD"), String::from("EF"), String::from("GH")]).expect("Expected `ASM`, but got a `TrailError`.");

When you run a simulation, every proposal is an element of the provided vocabulary.

use rand::prelude::IteratorRandom;
use rand::rng;

use lextrail::assemble::get_next_tokens;

let (mut response, mut value) = (Vec::new(), String::new());

loop {
    let values = get_next_tokens(&schema, &mut state, &value).expect("Expected `Vec<String>`, but got a `TrailError`.");

    if values.is_empty() {
        break;
    }

    value = values.into_iter().choose(&mut rng()).unwrap();
    response.push(value.clone());
}

assert_eq!(response, vec![String::from("AD"), String::from("EF"), String::from("GH"), String::new()]);

println!("{}", response.join(""));
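
To see why this vocabulary leaves only one path through the grammar, the example can be modelled by hand. The sketch below hard-codes the grammar as a small state machine and keeps a token only if every one of its characters can be consumed from the current state. The functions step, is_accepting, advance and simulate are hypothetical, and emitting an empty string at the end is an assumption taken from the assertion above; none of this reflects how lextrail is implemented internally.

```rust
/// Hand-written DFA for `("A" | "B")+ ("C" | "D") "E" ("FGH")*`.
/// Returns the next state, or `None` if `c` is not allowed in `state`.
fn step(state: u8, c: char) -> Option<u8> {
    match (state, c) {
        (0, 'A' | 'B') => Some(1), // first "A" or "B"
        (1, 'A' | 'B') => Some(1), // further repetitions
        (1, 'C' | 'D') => Some(2),
        (2, 'E') => Some(3),
        (3, 'F') => Some(4), // start of an optional "FGH"
        (4, 'G') => Some(5),
        (5, 'H') => Some(3),
        _ => None,
    }
}

/// State 3 (after "E" or after a complete "FGH") is the only accepting state.
fn is_accepting(state: u8) -> bool {
    state == 3
}

/// Walks a whole token through the DFA; `None` means the token is rejected.
fn advance(state: u8, token: &str) -> Option<u8> {
    token.chars().try_fold(state, step)
}

/// Greedy simulation: always takes the first allowed vocabulary token and
/// stops after `max_tokens` entries or when no token fits.
fn simulate(vocabulary: &[&str], max_tokens: usize) -> Vec<String> {
    let mut state = 0;
    let mut response = Vec::new();

    while response.len() < max_tokens {
        let next = vocabulary
            .iter()
            .find_map(|token| advance(state, token).map(|s| (token.to_string(), s)));

        match next {
            Some((token, next_state)) => {
                response.push(token);
                state = next_state;
            }
            None => {
                if is_accepting(state) {
                    // Assumed end-of-output marker, mirroring the assertion above.
                    response.push(String::new());
                }
                break;
            }
        }
    }

    response
}
```

With the vocabulary ["AD", "EF", "GH"], simulate returns "AD", "EF", "GH" and a final empty string: "AD" is the only token accepted at the start, "EF" finishes "E" and begins "FGH", and "GH" completes it, after which no token fits.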

The same approach works with any of the supported formats.

// CFG
use lextrail::assemble::asm_cfg;

asm_cfg(.., .., vec![..]);

// REGEX
use lextrail::assemble::asm_rex;

asm_rex(.., .., vec![..]);

// MIXED
use lextrail::assemble::asm_exp;

asm_exp(.., .., vec![..]);

// JSON
use lextrail::json::asm_json;

asm_json(.., .., vec![..]);

Playground

I've built a playground to showcase the different simulations; see the Python implementation.