fuzzy-parser
Automatic JSON repair for LLM-generated output
Overview
LLM-generated JSON often contains typos and syntax errors. fuzzy-parser automatically repairs these issues, enabling robust LLM integration.
use ;
// LLM output (typos + syntax errors)
let llm_output = r#"{"type": "AddDeriv", "taget": "User", "derives": ["Debg",],}"#;
// Step 1: Fix syntax errors
let sanitized = sanitize_json;
// Step 2: Fix typos
let schema = new
.with_enum_array;
let result = repair_tagged_enum_json?;
assert_eq!; // AddDeriv → AddDerive
assert_eq!; // taget → target
assert_eq!; // Debg → Debug
Features
JSON Sanitization (Syntax Repair)
| Error | Before | After |
|---|---|---|
| Trailing comma (object) | `{"a": 1,}` | `{"a": 1}` |
| Trailing comma (array) | `[1, 2,]` | `[1, 2]` |
| Missing closing brace | `{"a": 1` | `{"a": 1}` |
| Missing closing bracket | `["a"` | `["a"]` |
| Unclosed string | `{"a": "test` | `{"a": "test"}` |
Fuzzy Repair (Typo Correction)
| Target | Before | After |
|---|---|---|
| Tag value (enum discriminator) | `"AddDeriv"` | `"AddDerive"` |
| Field name | `"taget"` | `"target"` |
| Enum array value | `["Debg"]` | `["Debug"]` |
| Nested object field | `{"timout": 30}` | `{"timeout": 30}` |
Installation
[dependencies]
fuzzy-parser = "0.1"
Usage
Basic Usage
use ;
// Define schema
let schema = new;
// Repair
let json = r#"{"type": "AddDeriv", "taget": "User"}"#;
let result = repair_tagged_enum_json?;
println!;
println!;
Enum Array Repair
let schema = new
.with_enum_array;
let json = r#"{"type": "AddDerive", "target": "User", "derives": ["Debg", "Clne"]}"#;
let result = repair_tagged_enum_json?;
// derives: ["Debug", "Clone"]
Nested Object Repair
let schema = new
.with_nested_object;
let json = r#"{"type": "Configure", "name": "api", "config": {"timout": 30, "retres": 3}}"#;
let result = repair_tagged_enum_json?;
// config: {"timeout": 30, "retries": 3}
Combined Sanitization + Repair
use ;
let schema = new
.with_nested_object;
// LLM output (syntax errors + typos)
let malformed = r#"{
"type": "Acton",
"nam": "test",
"data": {"valeu": 42,},
}"#;
// Step 1: Sanitize (fix trailing commas, missing braces, etc.)
let sanitized = sanitize_json;
// Step 2: Repair (fix typos)
let result = repair_tagged_enum_json?;
Custom Options
use ;
// Customize similarity threshold and algorithm
let options = default
.with_min_similarity // default: 0.7
.with_algorithm; // default: JaroWinkler
Inspecting Corrections
let result = repair_tagged_enum_json?;
if result.has_corrections
Algorithms
| Algorithm | Characteristics | Best For |
|---|---|---|
| Jaro-Winkler (default) | Prefix-weighted, handles transpositions | General typo correction |
| Levenshtein | Equal cost for insert/delete/substitute | Typos that are mostly single-character insertions, deletions, or substitutions |
| Damerau-Levenshtein | Levenshtein + transposition support | Transposition-heavy typos |
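To show what "prefix-weighted" means for the default algorithm, here is a sketch of the textbook Jaro-Winkler similarity. It follows the conventional definition (scaling factor 0.1, common prefix capped at 4 characters); the crate's own implementation and parameters may differ, and the function names `jaro` and `jaro_winkler` are chosen here for illustration.

```rust
/// Jaro similarity in [0, 1].
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters match only if equal and at most `window` positions apart.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let start = i.saturating_sub(window);
        let end = (i + window + 1).min(b.len());
        for j in start..end {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk both matched sequences in order and count positions that differ.
    let mut out_of_order = 0usize;
    let mut j = 0;
    for i in 0..a.len() {
        if !a_matched[i] {
            continue;
        }
        while !b_matched[j] {
            j += 1;
        }
        if a[i] != b[j] {
            out_of_order += 1;
        }
        j += 1;
    }
    let m = matches as f64;
    let t = (out_of_order / 2) as f64; // each transposition is counted twice
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boosts the Jaro score for strings sharing a prefix.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let base = jaro(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take(4) // conventional cap on the rewarded prefix length
        .take_while(|(x, y)| x == y)
        .count();
    base + prefix as f64 * 0.1 * (1.0 - base)
}
```

Because typos in identifiers tend to leave the first few characters intact, the prefix bonus lifts near-misses such as `Debg` vs `Debug` above their plain Jaro score, which is one reason this measure suits general typo correction.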
Design Principles
- Two-stage processing: Syntax repair (sanitize) and typo repair (repair) are separated
- Schema-driven: Caller defines the schema (library remains generic)
- Transparency: All corrections are recorded as `Correction` structs
- Safety: No corrections made below similarity threshold
License
MIT OR Apache-2.0