AnyRepair
A comprehensive Rust crate for repairing LLM responses across multiple formats including JSON, YAML, Markdown, XML, TOML, CSV, and INI files.
Why AnyRepair?
Critical for Agentic AI and Tool Use
In the era of agentic AI systems, reliable data parsing is essential for:
- Tool Function Calls: AI agents must parse structured data from external APIs and tools
- MCP (Model Context Protocol): Ensures reliable communication between AI models and external systems
- Agent Workflows: Multi-step AI processes depend on consistent data format handling
- Error Recovery: When AI outputs malformed data, repair enables graceful recovery
- Production Reliability: Real-world AI applications need robust data handling
The Problem with LLM Output
LLMs often generate:
- Malformed JSON with missing quotes, trailing commas, or syntax errors
- Inconsistent YAML with indentation issues or missing colons
- Broken Markdown with malformed headers, links, or code blocks
- Invalid XML/TOML/CSV/INI with structural problems
Without repair, these errors cause:
- Tool failures in agentic workflows
- MCP communication breakdowns
- Cascading errors in multi-step AI processes
- Poor user experience with unreliable AI applications
AnyRepair's Solution
While there are several JSON repair tools available in Rust, AnyRepair addresses the unique challenges of LLM-generated content across multiple formats:
Multi-Format Support
Unlike single-format tools like json-repair-rs or json5, AnyRepair handles 7 different formats (JSON, YAML, Markdown, XML, TOML, CSV, INI) with auto-detection, making it perfect for LLM responses that can be in any format.
LLM-Specific Optimizations
- Intelligent Format Detection: Automatically detects format from damaged content
- Context-Aware Repair: Understands LLM output patterns and common errors
- Rule-Based Confidence Scoring: Provides repair quality metrics using pattern-based rules (no LLM required)
- Parallel Processing: Optimized for batch processing of LLM responses
Advanced Repair Strategies
- Adaptive Repair: Strategies that adapt based on content complexity
- Multi-Pass Processing: Applies multiple repair strategies in optimal order
- Custom Rules: User-defined repair patterns for specific use cases
- Plugin System: Extensible architecture for custom repair logic
Production-Ready Features
- Comprehensive Testing: 116+ test cases with fuzz testing for robustness
- High Performance: Regex caching with 99.6% performance improvement
- CLI & Library: Both command-line tool and Rust library for integration
- Configuration Management: TOML-based configuration with custom rules
Comparison with Other Tools
| Feature | AnyRepair | json-repair-rs | json5 | Other Tools |
|---|---|---|---|---|
| Multi-format | ✅ 7 formats | ❌ JSON only | ❌ JSON only | ❌ Single format |
| Auto-detection | ✅ | ❌ | ❌ | ❌ |
| LLM-optimized | ✅ | ❌ | ❌ | ❌ |
| Agentic AI support | ✅ | ❌ | ❌ | ❌ |
| MCP integration | ✅ | ❌ | ❌ | ❌ |
| Custom rules | ✅ | ❌ | ❌ | ❌ |
| Plugin system | ✅ | ❌ | ❌ | ❌ |
| Rule-based confidence scoring | ✅ | ❌ | ❌ | ❌ |
| Parallel processing | ✅ | ❌ | ❌ | ❌ |
| Fuzz testing | ✅ | ❌ | ❌ | ❌ |
Features
- Multi-format repair: JSON, YAML, Markdown, XML, TOML, CSV, INI
- Auto-detection: Automatically detects format and applies appropriate repairs
- High performance: Regex caching with up to 99.6% performance improvement
- CLI tool: Command-line interface for easy usage
- Comprehensive testing: 116+ test cases with snapshot and fuzz testing
- Parallel processing: Multi-threaded strategy application for better performance
- Advanced strategies: Intelligent format detection, adaptive repair, and context-aware processing
- Plugin system: Extensible architecture for custom repair strategies
- Custom rules: User-defined repair rules with full CLI management
- Fuzz testing: Comprehensive property-based testing for robustness
- Configuration: TOML-based configuration with custom rules and plugin settings
Rule-Based Confidence Scoring
AnyRepair uses sophisticated pattern-based rules to calculate confidence scores without requiring any LLM calls:
JSON Confidence Rules
- Structure Detection: Checks for balanced braces
{}and brackets[] - Key-Value Patterns: Detects colon
:separators and quote patterns - Syntax Validation: Validates JSON structure without parsing
- Balance Scoring: Rewards properly balanced opening/closing delimiters
YAML Confidence Rules
- Indentation Analysis: Evaluates consistent indentation patterns
- Key-Value Detection: Identifies colon
:separators and list indicators- - Document Structure: Recognizes YAML document separators
--- - Content Patterns: Detects YAML-specific syntax elements
Format-Specific Rules
Each format has tailored confidence rules:
- Markdown: Header patterns
#, code blocks ````, link syntax[]() - XML: Tag structure
<>, attribute patterns, entity encoding - TOML: Table headers
[], key-value pairs=, array syntax - CSV: Delimiter consistency, quote patterns, row structure
- INI: Section headers
[], key-value pairs=, comment patterns
Benefits of Rule-Based Approach
- Fast: No external API calls or network requests
- Reliable: Consistent scoring based on deterministic rules
- Transparent: Clear understanding of how scores are calculated
- Customizable: Rules can be extended through the plugin system
Agentic AI & MCP Integration
Model Context Protocol (MCP) Support
AnyRepair is designed to work seamlessly with MCP implementations:
use repair;
// MCP tool call response repair
let mcp_response = r#"{"tool": "search", "params": {"query": "AI news"}, "result": "..."}"#;
let repaired = repair?;
// MCP context data repair
let context_data = r#"name: AI Assistant
version: 1.0
capabilities: [search, analyze, generate]"#;
let repaired_context = repair?;
Agentic AI Workflow Integration
Perfect for AI agent systems that need reliable data handling:
// Agent tool execution with repair
async
Use Cases
- AI Agent Tool Calls: Repair malformed responses from external APIs
- MCP Communication: Ensure reliable data exchange between AI models and tools
- Multi-Agent Systems: Handle data format inconsistencies across different agents
- Production AI Apps: Robust error handling for real-world AI applications
- LLM Output Processing: Clean and validate AI-generated structured data
Installation
[]
= "0.1.2"
Usage
Library
use repair;
// Auto-detect format and repair
let content = r#"{"name": "John", "age": 30,}"#;
let repaired = repair?;
println!; // {"name": "John", "age": 30}
CLI
# Install
# Repair a file
# Batch process
# Get statistics
Supported Formats
- JSON: Missing quotes, trailing commas, malformed numbers, boolean/null values, nested structures
- YAML: Indentation, missing colons, list formatting, complex structures, document separators
- Markdown: Headers, code blocks, lists, tables, links, images, bold/italic formatting
- XML: Unclosed tags, malformed attributes, missing quotes, invalid characters, entity encoding
- TOML: Missing quotes, malformed arrays, table headers, numbers, dates, inline tables
- CSV: Unquoted strings, malformed quotes, extra/missing commas, headers, field escaping
- INI: Malformed sections, missing equals signs, unquoted values, comments, key-value pairs
Plugin System
AnyRepair features a powerful plugin system that allows you to extend functionality with custom repair strategies, validators, and repairers.
Plugin Management
# List available plugins
# Show plugin information
# Enable/disable plugins
# Show plugin statistics
# Discover plugins in directories
Custom Rules
Create and manage custom repair rules:
# Initialize configuration with templates
# Add a custom rule
# Test a rule
# List all rules
Plugin Development
See Plugin Development Guide for detailed information on creating custom plugins.
License
Apache-2.0