# fuzzy-parser
Automatic JSON repair for LLM-generated output
[](https://crates.io/crates/fuzzy-parser)
[](https://docs.rs/fuzzy-parser)
[](LICENSE)
## Overview
LLM-generated JSON often contains typos and syntax errors. `fuzzy-parser` automatically repairs these issues, enabling robust LLM integration.
```rust
use fuzzy_parser::{sanitize_json, repair_tagged_enum_json, TaggedEnumSchema, FuzzyOptions};
// LLM output (typos + syntax errors)
let llm_output = r#"{"type": "AddDeriv", "taget": "User", "derives": ["Debg",],}"#;
// Step 1: Fix syntax errors
let sanitized = sanitize_json(llm_output);
// Step 2: Fix typos
assert_eq!(result.repaired["type"], "AddDerive"); // AddDeriv → AddDerive
assert_eq!(result.repaired["target"], "User"); // taget → target
assert_eq!(result.repaired["derives"][0], "Debug"); // Debg → Debug
```
## Features
### JSON Sanitization (Syntax Repair)
| Trailing comma (object) | `{"a": 1,}` | `{"a": 1}` |
| Trailing comma (array) | `[1, 2,]` | `[1, 2]` |
| Missing closing brace | `{"a": 1` | `{"a": 1}` |
| Missing closing bracket | `["a"` | `["a"]` |
| Unclosed string | `{"a": "test` | `{"a": "test"}` |
### Fuzzy Repair (Typo Correction)
| Tag value (enum discriminator) | `"AddDeriv"` | `"AddDerive"` |
| Field name | `"taget"` | `"target"` |
| Enum array value | `["Debg"]` | `["Debug"]` |
| Nested object field | `{"timout": 30}` | `{"timeout": 30}` |
## Installation
```toml
[dependencies]
fuzzy-parser = "0.1"
```
## Usage
### Basic Usage
```rust
use fuzzy_parser::{sanitize_json, repair_tagged_enum_json, TaggedEnumSchema, FuzzyOptions};
// Define schema
let schema = TaggedEnumSchema::new(
"type", // tag field name
&["AddDerive", "RemoveDerive", "Rename"], // valid tag values
|tag| match tag {
"AddDerive" | "RemoveDerive" => Some(&["target", "derives"][..]),
"Rename" => Some(&["from", "to"][..]),
_ => None,
},
);
// Repair
let json = r#"{"type": "AddDeriv", "taget": "User"}"#;
let result = repair_tagged_enum_json(json, &schema, &FuzzyOptions::default())?;
println!("Repaired: {}", result.repaired);
println!("Corrections: {:?}", result.corrections);
```
### Enum Array Repair
```rust
let result = repair_tagged_enum_json(json, &schema, &FuzzyOptions::default())?;
// derives: ["Debug", "Clone"]
```
### Nested Object Repair
```rust
let result = repair_tagged_enum_json(json, &schema, &FuzzyOptions::default())?;
// config: {"timeout": 30, "retries": 3}
```
### Combined Sanitization + Repair
```rust
use fuzzy_parser::{sanitize_json, repair_tagged_enum_json, TaggedEnumSchema, FuzzyOptions};
let malformed = r#"{
"type": "Acton",
"nam": "test",
"data": {"valeu": 42,},
}"#;
// Step 1: Sanitize (fix trailing commas, missing braces, etc.)
let sanitized = sanitize_json(malformed);
// Step 2: Repair (fix typos)
let result = repair_tagged_enum_json(&sanitized, &schema, &FuzzyOptions::default())?;
```
### Custom Options
```rust
use fuzzy_parser::{FuzzyOptions, Algorithm};
// Customize similarity threshold and algorithm
let options = FuzzyOptions::default()
.with_min_similarity(0.8) // default: 0.7
.with_algorithm(Algorithm::Levenshtein); // default: JaroWinkler
```
### Inspecting Corrections
```rust
let result = repair_tagged_enum_json(json, &schema, &options)?;
if result.has_corrections() {
println!("{} corrections made:", result.correction_count());
for correction in &result.corrections {
println!(
" {} → {} (similarity: {:.2}, path: {})",
correction.original,
correction.corrected,
correction.similarity,
correction.field_path
);
}
}
```
## Algorithms
| **Jaro-Winkler** (default) | Prefix-weighted, handles transpositions | General typo correction |
| Levenshtein | Equal cost for insert/delete/substitute | Edit distance based |
| Damerau-Levenshtein | Levenshtein + transposition support | Transposition-heavy typos |
## Design Principles
1. **Two-stage processing**: Syntax repair (sanitize) and typo repair (repair) are separated
2. **Schema-driven**: Caller defines the schema (library remains generic)
3. **Transparency**: All corrections are recorded as `Correction` structs
4. **Safety**: No corrections made below similarity threshold
## License
MIT OR Apache-2.0