lextrail-test 0.1.0

A library for constraining language model outputs to follow CFG, REGEX and JSON (experimental).
<div align="center">

![lextrail-red](https://github.com/user-attachments/assets/2930e0f2-4b12-4e75-bb55-c199aaa23fab)

_A Rust implementation of the Python library [lextrail](https://github.com/miftahmoha/lextrail)._

</div>

## Features

- Zero dependencies
- Parses all context-free grammars, including ambiguous grammars
- Optionally restricts proposals to tokens from a user-specified vocabulary
- Native Rust performance 

## Quick Start

### Installation

```bash
cargo add lextrail
```

## Usage Modes

The library supports two ways to generate constrained text, depending on your use case:

### Trail

Use a **Trail** object when you want to generate the complete next element without vocabulary constraints.

**CFG**

```rust
use lextrail::build::trail_cfg;

let example = r#"
      start: expression

      expression: term (("+" | "-") term)

      term: factor (("*" | "/") factor)

      factor: NUMBER

      NUMBER: /-?[0-9]+/
"#;

let (schema, mut state) = trail_cfg(example).expect("Expected `Trail`, but got a `TrailError`.");
```

**Regex**

```rust
use lextrail::guide::trail_rex;

let example = r#"[a-z]+@[a-z]+\.(com|org|net)"#;

let (schema, mut state) = trail_rex(example).expect("Expected `Trail`, but got a `TrailError`.");
```

You can also combine terminal (quoted literal) and regex expressions in a single pattern using `trail_exp`.

```rust
use lextrail::guide::trail_exp;

let example = r#"/[0-9]\.[0-9]/ "+" /[0-9]\.[0-9]/"#;

let (schema, mut state) = trail_exp(example).expect("Expected `Trail`, but got a `TrailError`.");
```

**JSON**

_This is an experimental version. Not intended for production use._

- Currently supported keywords: `type`, `enum`, `const`, `properties`, `required`, `items`, `prefixItems`, `oneOf`
- Constraint intersection (e.g., combining `prefixItems` with `items`, or `const` with `enum`) is not yet implemented

```rust
use lextrail::json::trail_json;

let example = r#"
    {
        "type": "object",
        "properties": {
            "user": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"}
                },
                "required": ["email"]
            }
        }
    }
"#;

let (schema, mut state) = trail_json(example).expect("Expected `Trail`, but got a `TrailError`.");
```

Then, run a random simulation: repeatedly request the valid next elements, pick one at random, and stop when no proposals remain.

```rust
use rand::prelude::IteratorRandom;
use rand::rng;

use lextrail::guide::get_next_proposals;

let (mut response, mut value) = (Vec::new(), String::new());

loop {
    let values = get_next_proposals(&schema, &mut state, &value).expect("Expected `Vec<String>`, but got a `TrailError`.");

    if values.is_empty() {
        break;
    }

    value = values.into_iter().choose(&mut rng()).unwrap();
    response.push(value.clone());
}

println!("{}", response.join(""));
```

_You can pretty-print JSON output using `lextrail::json::format_json_instance`._
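The loop above relies on a simple contract: each call returns the valid next elements, and an empty list ends generation. Below is a minimal, standalone sketch of that contract, not lextrail's implementation. It hand-flattens the `trail_exp` pattern `/[0-9]\.[0-9]/ "+" /[0-9]\.[0-9]/` into single-character steps. The `Element` enum, `next_proposals`, and `Lcg` are hypothetical. A step counter stands in for the mutable `state`, and a small deterministic generator replaces `rand` so runs are reproducible.

```rust
/// One step of the pattern `/[0-9]\.[0-9]/ "+" /[0-9]\.[0-9]/`, flattened by
/// hand into single characters (an assumption of this sketch).
enum Element {
    Digit,
    Literal(char),
}

const PATTERN: [Element; 7] = [
    Element::Digit,
    Element::Literal('.'),
    Element::Digit,
    Element::Literal('+'),
    Element::Digit,
    Element::Literal('.'),
    Element::Digit,
];

/// Hypothetical proposal function: returns the valid next elements and advances
/// `step` (which plays the role of the mutable state). Empty means "done".
fn next_proposals(step: &mut usize) -> Vec<String> {
    let proposals = match PATTERN.get(*step) {
        None => return Vec::new(),
        Some(Element::Digit) => ('0'..='9').map(String::from).collect(),
        Some(Element::Literal(c)) => vec![c.to_string()],
    };
    *step += 1;
    proposals
}

/// Tiny linear congruential generator, used instead of `rand` for reproducibility.
struct Lcg(u64);

impl Lcg {
    fn next_index(&mut self, len: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % len
    }
}

/// Same shape as the README loop: ask for proposals, stop on empty, pick one.
fn simulate(seed: u64) -> String {
    let mut rng = Lcg(seed);
    let mut step = 0;
    let mut response: Vec<String> = Vec::new();
    loop {
        let values = next_proposals(&mut step);
        if values.is_empty() {
            break;
        }
        let i = rng.next_index(values.len());
        response.push(values[i].clone());
    }
    response.join("")
}

fn main() {
    println!("{}", simulate(42));
}
```

Because every proposal is valid by construction, any seed yields a string of the form `d.d+d.d`; only the chosen digits vary.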

### ASM

Use an **ASM** object when you need to constrain the next token to a predefined vocabulary.

#### Example

```rust
use lextrail::assemble::asm_cfg;

let example = r#"
    start: L0

    L0: ("A" | "B")+ L1

    L1: ("C" | "D") L2

    L2: "E" L3*

    L3: /FGH/
"#;

let asm = asm_cfg(example, vec![String::from("AD"), String::from("EF"), String::from("GH")]).expect("Expected `ASM`, but got a `TrailError`.");
```

When you run a simulation, every proposal is an element of the provided vocabulary.

```rust
use rand::prelude::IteratorRandom;
use rand::rng;

use lextrail::assemble::get_next_tokens;

let (mut response, mut value) = (Vec::new(), String::new());

loop {
    let values = get_next_tokens(&schema, &mut state, &value).expect("Expected `Vec<String>`, but got a `TrailError`.");

    if values.is_empty() {
        break;
    }

    value = values.into_iter().choose(&mut rng()).unwrap();
    response.push(value.clone());
}

assert_eq!(response, vec![String::from("AD"), String::from("EF"), String::from("GH"), String::new()]);

println!("{}", response.join(""));
```
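One way to picture the vocabulary constraint is to run each vocabulary token through an automaton for the language and keep only the tokens it can fully consume from the current state. The sketch below does this with a DFA written by hand for the example grammar, where `L3*` with `/FGH/` becomes a loop over `F`, `G`, `H`. The names `step`, `advance`, `allowed_tokens`, and `simulate`, and the use of `""` as an end marker, are assumptions chosen to mirror the assertion above. They are not lextrail's internals. In this sketch exactly one token is allowed at every step, which is consistent with the README asserting a fixed response even though the choice is random.

```rust
/// Accepting state of the hand-written DFA (reached after "E" or after each "FGH").
const ACCEPTING: u8 = 3;

/// Hand-built DFA for `("A" | "B")+ ("C" | "D") "E" ("FGH")*` (an assumption of
/// this sketch, not derived by lextrail). `None` means the character is rejected.
fn step(state: u8, c: char) -> Option<u8> {
    match (state, c) {
        (0 | 1, 'A' | 'B') => Some(1),
        (1, 'C' | 'D') => Some(2),
        (2, 'E') => Some(3),
        (3, 'F') => Some(4),
        (4, 'G') => Some(5),
        (5, 'H') => Some(3),
        _ => None,
    }
}

/// Feeds a whole token through the DFA; `None` if any character is rejected.
fn advance(state: u8, token: &str) -> Option<u8> {
    token.chars().try_fold(state, step)
}

/// Keeps the vocabulary tokens the DFA can consume from `state`, plus "" as an
/// end marker when `state` is accepting.
fn allowed_tokens(state: u8, vocab: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = vocab
        .iter()
        .copied()
        .filter(|t| advance(state, t).is_some())
        .map(String::from)
        .collect();
    if state == ACCEPTING {
        out.push(String::new());
    }
    out
}

/// Deterministic simulation: always takes the first allowed token.
fn simulate(vocab: &[&str]) -> Vec<String> {
    let mut state = 0u8;
    let mut response = Vec::new();
    loop {
        let values = allowed_tokens(state, vocab);
        let Some(value) = values.into_iter().next() else { break };
        if value.is_empty() {
            // End marker chosen: record it and stop.
            response.push(value);
            break;
        }
        state = advance(state, &value).expect("allowed tokens always advance");
        response.push(value);
    }
    response
}

fn main() {
    let response = simulate(&["AD", "EF", "GH"]);
    println!("{}", response.join(""));
}
```

This filter only checks that a token can be consumed. It does not check that the remaining text can still be completed using vocabulary tokens, which a production implementation would also have to consider.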

The vocabulary-constrained mode is available for every supported format:

```rust
// CFG
use lextrail::assemble::asm_cfg;

asm_cfg(.., .., vec![..]);

// REGEX
use lextrail::assemble::asm_rex;

asm_rex(.., .., vec![..]);

// MIXED
use lextrail::assemble::asm_exp;

asm_exp(.., .., vec![..]);

// JSON
use lextrail::json::asm_json;

asm_json(.., .., vec![..]);
```

## Playground

_I've built a playground that showcases the different simulations; see the Python [implementation](https://github.com/miftahmoha/lextrail)._