anyrepair 0.2.3

A comprehensive Rust crate for repairing malformed structured data including JSON, YAML, XML, TOML, CSV, INI, Markdown, and Diff with format auto-detection
# AnyRepair

[![GitHub stars](https://img.shields.io/github/stars/yingkitw/anyrepair?style=social)](https://github.com/yingkitw/anyrepair)

A Rust crate for repairing malformed structured data across multiple formats (JSON, YAML, Markdown, XML, TOML, CSV, INI, Diff).

## Quick Start

### Installation

```toml
[dependencies]
anyrepair = "0.2.2"
```

### Basic Usage

```rust
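use anyrepair::repair;

// Boxing the error assumes anyrepair's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Auto-detect format and repair
    let malformed = r#"{"name": "John", age: 30,}"#;
    let repaired = repair(malformed)?;
    println!("{}", repaired); // {"name": "John", "age": 30}
    Ok(())
}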
```

### CLI

```bash
# Install
cargo install anyrepair

# Auto-detect and repair
anyrepair repair input.json

# Format-specific repair
anyrepair repair input.json --format json
anyrepair repair input.yaml --format yaml
anyrepair repair input.md --format markdown

# Show confidence score
anyrepair repair input.json --format json --confidence

# Batch process multiple files
anyrepair batch --input ./data --output ./repaired --recursive

# Stream large files
anyrepair stream --input large_file.json --output repaired.json --format json

# Validation without repair
anyrepair validate --input input.json --format json

# Custom rules management
anyrepair rules list

# Show supported formats
anyrepair stats
```

## What's New

### v0.2.2 - Latest Release

**📝 Documentation & Maintenance**
- Updated version to 0.2.2 with comprehensive documentation
- Removed unused dependencies (pulldown-cmark, anyhow)
- Enhanced README with GitHub star badge and feedback invitation
- Added CHANGELOG entries for versions 0.1.6 through 0.2.2
- Updated documentation index to reflect current project state

**🏗️ v0.2.0 - KISS/DRY/SoC Refactoring**
- Centralized format registry: single source of truth for format→repairer/validator mapping
- Unified CLI: `repair --format <fmt>` replaces 8 per-format subcommands
- Extracted `format_detection` module for clean separation of concerns
- Removed dead code (`BaseRepairer` trait, standalone `apply_strategies`)
- ~400 lines of duplicated code eliminated

**🔧 8 Format Support**
- JSON, YAML, Markdown, XML, TOML, CSV, INI, Diff/Unified Diff
- Auto-detection from malformed content
- Format-specific validation and repair strategies

**🐍 Python-Compatible API**
- Drop-in compatible with Python's `jsonrepair` library
- `jsonrepair()` function and `JsonRepair` class API

**🔌 MCP Server Integration**
- Native Claude Desktop integration via MCP
- 10 MCP tools for all 8 formats plus auto-detect and validate

**⚡ Performance & Quality**
- **318 test cases** with 100% pass rate
- **99.6% improvement** from regex caching
- **Streaming support** for files larger than RAM
- **Zero compilation warnings**

See [CHANGELOG.md](docs/CHANGELOG.md) for complete version history.

## Why AnyRepair?

Structured data from LLMs, APIs, or manual editing is often malformed. AnyRepair fixes common issues:

- **JSON**: Missing quotes, trailing commas, syntax errors
- **YAML**: Indentation issues, missing colons
- **Markdown**: Malformed headers, broken links
- **XML/TOML/CSV/INI/Diff**: Format-specific repairs
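
As an illustration of what one such repair involves, here is a minimal sketch of trailing-comma removal for JSON. It is not AnyRepair's implementation; the function name `strip_trailing_commas` is hypothetical, and the sketch handles only commas that directly precede a closing `}` or `]`, skipping anything inside string literals.

```rust
/// Removes commas that directly precede a closing `}` or `]`,
/// leaving the contents of string literals untouched.
/// Hypothetical helper for illustration; not part of the anyrepair API.
fn strip_trailing_commas(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_string = false;
    let mut escaped = false;

    for ch in input.chars() {
        if in_string {
            // Inside a string: copy verbatim and track escapes and the closing quote.
            out.push(ch);
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => {
                in_string = true;
                out.push(ch);
            }
            '}' | ']' => {
                // Drop a dangling comma (and any whitespace after it) before the closer.
                let trimmed_len = out.trim_end().len();
                if out[..trimmed_len].ends_with(',') {
                    out.truncate(trimmed_len - 1);
                }
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    out
}
```

Calling `strip_trailing_commas(r#"{"name": "John", "age": 30,}"#)` yields `{"name": "John", "age": 30}`. Quoting bare keys or values, as in the Basic Usage example, would need a separate pass.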

**Key Features:**
- ✅ Auto-detects format from damaged content
- ✅ Multi-format support (8 formats)
- ✅ High performance (regex caching, optimized binaries)
- ✅ MCP server for Claude integration
- ✅ Streaming support for large files
- ✅ 318 tests, 100% pass rate

## Usage Examples

### Multi-Format Auto-Detection

```rust
use anyrepair::repair;

// JSON - auto-detected and repaired
let json = repair(r#"{"key": value,}"#)?;
// Output: {"key": "value"}

// YAML - auto-detected and repaired
let yaml = repair("name: John\nage: 30")?;

// Markdown - auto-detected and repaired
let markdown = repair("# Header\n[link](url")?;

// Diff - auto-detected and repaired
let diff = repair("@@ -1,3 +1,4 @@\n-line 1\n+line 1 modified")?;
```
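
Auto-detection on damaged input cannot rely on parsing, so it has to work from textual cues. The sketch below shows one way such a heuristic could look. It is an assumption-laden illustration, not the crate's `format_detection` module: the `Guess` enum, the `guess_format` function, and the order of the checks are all hypothetical, and only a subset of the eight formats is covered.

```rust
/// Formats this sketch can tell apart (a subset, chosen for illustration).
#[derive(Debug, PartialEq)]
enum Guess {
    Json,
    Xml,
    Diff,
    Markdown,
    Yaml,
    Unknown,
}

/// Guesses a format from cheap textual cues. It never requires the content
/// to parse, so it still works on malformed input.
fn guess_format(content: &str) -> Guess {
    let trimmed = content.trim_start();
    let first_line = trimmed.lines().next().unwrap_or("");

    if trimmed.is_empty() {
        return Guess::Unknown;
    }
    // Unified diffs start with a hunk header or a file header.
    if first_line.starts_with("@@ ")
        || first_line.starts_with("diff ")
        || first_line.starts_with("--- ")
    {
        return Guess::Diff;
    }
    if trimmed.starts_with('{') || trimmed.starts_with('[') {
        return Guess::Json;
    }
    if trimmed.starts_with('<') {
        return Guess::Xml;
    }
    // ATX heading: one to six '#' followed by a space.
    // Caveat: a YAML file that opens with a "# comment" is misclassified here.
    let hashes = first_line.chars().take_while(|&c| c == '#').count();
    if (1..=6).contains(&hashes) && first_line[hashes..].starts_with(' ') {
        return Guess::Markdown;
    }
    // "key: value" pairs or "key:" block openers suggest YAML.
    if trimmed
        .lines()
        .any(|l| l.contains(": ") || l.trim_end().ends_with(':'))
    {
        return Guess::Yaml;
    }
    Guess::Unknown
}
```

On the four inputs from the example above, this sketch returns `Guess::Json`, `Guess::Yaml`, `Guess::Markdown`, and `Guess::Diff` respectively. A production detector would likely score several candidates rather than return on the first match.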

### Python-Compatible JSON API

```rust
use anyrepair::{jsonrepair, JsonRepair};

// Function-based API (like Python's jsonrepair)
let repaired = jsonrepair(r#"{"name": "John", age: 30,}"#)?;

// Class-based API (like Python's JsonRepair class)
let mut jr = JsonRepair::new();
let repaired1 = jr.jsonrepair(r#"{"key": "value",}"#)?;
let repaired2 = jr.jsonrepair(r#"{name: "John"}"#)?;
```

### Format-Specific Repairers

```rust
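use anyrepair::{create_repairer, repair_with_format, traits::Repair};

// Sample inputs for the calls below
let malformed_json = r#"{"name": "John", age: 30,}"#;
let malformed_yaml = "name: John\nage: 30";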

// Via registry (recommended)
let mut repairer = create_repairer("json")?;
let repaired = repairer.repair(malformed_json)?;
let confidence = repairer.confidence(&repaired);

// Shorthand
let repaired = repair_with_format(malformed_yaml, "yaml")?;

// Direct struct usage still works
use anyrepair::json::JsonRepairer;
let mut json_repairer = JsonRepairer::new();
let repaired = json_repairer.repair(malformed_json)?;
```
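
The registry approach maps a format name to a boxed repairer behind a common trait. The following is a minimal sketch of that pattern only. Everything in it is hypothetical (`SketchRepair`, `BraceCloser`, `TrimRepairer`, `create_sketch_repairer`); the crate's actual `Repair` trait, error type, and registry internals may look quite different.

```rust
/// Hypothetical trait for this sketch; the crate's own `Repair` trait may differ.
trait SketchRepair {
    fn repair(&mut self, content: &str) -> Result<String, String>;
}

/// Stand-in for a JSON repairer: appends any missing closing braces.
struct BraceCloser;

/// Stand-in for a YAML repairer: strips trailing whitespace from each line.
struct TrimRepairer;

impl SketchRepair for BraceCloser {
    fn repair(&mut self, content: &str) -> Result<String, String> {
        let opens = content.matches('{').count();
        let closes = content.matches('}').count();
        Ok(format!("{}{}", content, "}".repeat(opens.saturating_sub(closes))))
    }
}

impl SketchRepair for TrimRepairer {
    fn repair(&mut self, content: &str) -> Result<String, String> {
        Ok(content.lines().map(str::trim_end).collect::<Vec<_>>().join("\n"))
    }
}

/// Single lookup point: every supported format name is listed exactly once.
fn create_sketch_repairer(format: &str) -> Result<Box<dyn SketchRepair>, String> {
    match format.to_ascii_lowercase().as_str() {
        "json" => Ok(Box::new(BraceCloser)),
        "yaml" | "yml" => Ok(Box::new(TrimRepairer)),
        other => Err(format!("unsupported format: {other}")),
    }
}
```

Keeping the name-to-repairer mapping in one `match` means that adding a format touches a single place, which is the "single source of truth" idea behind a centralized registry.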

### Streaming Large Files

```rust
use anyrepair::StreamingRepair;
use std::fs::File;
use std::io::BufReader;

let input = BufReader::new(File::open("large_file.json")?);
let mut output = File::create("repaired.json")?;

// Configure buffer size (default 8192 bytes)
let processor = StreamingRepair::with_buffer_size(65536);

// Process with automatic format detection
processor.process(input, &mut output, None)?;

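// Or specify the format explicitly instead. Use one call or the other:
// if `process` takes the reader by value, `input` cannot be passed twice.
// processor.process(input, &mut output, Some("json"))?;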
```
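
To make the idea of bounded-memory processing concrete, here is a std-only sketch of a streaming loop. It is not how `StreamingRepair` is implemented: the function `repair_stream`, its line-by-line granularity, and the `repair_line` callback are assumptions made for illustration.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Reads `input` through a buffer of `buffer_size` bytes, applies
/// `repair_line` to each line, and writes the result to `output`.
/// Returns the number of lines processed.
/// Hypothetical helper; memory use is bounded by the buffer plus the longest line.
fn repair_stream<R: Read, W: Write>(
    input: R,
    output: &mut W,
    buffer_size: usize,
    repair_line: impl Fn(&str) -> String,
) -> io::Result<usize> {
    let reader = BufReader::with_capacity(buffer_size, input);
    let mut lines = 0;
    for line in reader.lines() {
        let line = line?; // propagate I/O and UTF-8 errors
        output.write_all(repair_line(&line).as_bytes())?;
        output.write_all(b"\n")?;
        lines += 1;
    }
    Ok(lines)
}
```

A line-oriented loop like this suits formats such as CSV, INI, or diffs. A JSON document stored on a single very long line would need a token-level approach instead, since the whole line is held in memory at once.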

### Batch Processing

```rust
use anyrepair::BatchProcessor;

let processor = BatchProcessor::new();

// Process directory with options
let results = processor.process_directory(
    "./data",
    true,  // recursive
    "*.json",  // file filter
)?;

// Get per-file results
for result in results {
    println!("{}: {:?} ({}ms)",
        result.file_path,
        result.status,
        result.repair_time_ms
    );
}

// Get analytics
let analytics = processor.get_analytics();
println!("Success rate: {}%", analytics.success_rate());
```

### MCP Server Integration

The MCP server provides seamless integration with Claude Desktop:

```bash
# Install and run MCP server
cargo install anyrepair
anyrepair-mcp
```

**Configure in `claude_desktop_config.json`:**
```json
{
  "mcpServers": {
    "anyrepair": {
      "command": "anyrepair-mcp"
    }
  }
}
```

**Available MCP Tools:**
- `repair` - Auto-detect and repair any format
- `repair_json`, `repair_yaml`, `repair_markdown`, `repair_xml`
- `repair_toml`, `repair_csv`, `repair_ini`, `repair_diff`
- `validate` - Validate content without repair

**Usage in Claude:**
```
Please repair this JSON: {"name": "John", age: 30,}
(Claude will use the anyrepair MCP tool to fix it)
```

See [MCP_SERVER.md](docs/MCP_SERVER.md) for complete documentation.

## Supported Formats

| Format | Common Issues Fixed |
|--------|---------------------|
| **JSON** | Missing quotes, trailing commas, malformed numbers, boolean/null values |
| **YAML** | Indentation, missing colons, list formatting, document separators |
| **Markdown** | Headers, code blocks, lists, tables, links, images |
| **XML** | Unclosed tags, malformed attributes, missing quotes, entity encoding |
| **TOML** | Missing quotes, malformed arrays, table headers, dates |
| **CSV** | Unquoted strings, malformed quotes, extra/missing commas |
| **INI** | Malformed sections, missing equals signs, unquoted values |
| **Diff** | Missing hunk headers, incorrect line prefixes, malformed ranges |
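
For the Diff row, "malformed ranges" refers to the counts in a hunk header such as `@@ -1,3 +1,4 @@`. In the unified format, the old-side count is the number of context plus removed lines, and the new-side count is the number of context plus added lines. The sketch below recomputes a header from a hunk body; `hunk_header` is a hypothetical helper, not the crate's API, and it assumes the starting line numbers are already known.

```rust
/// Rebuilds a unified-diff hunk header from the hunk body.
/// `old_start` and `new_start` are the 1-based starting line numbers.
/// Hypothetical helper for illustration.
fn hunk_header(old_start: usize, new_start: usize, body: &[&str]) -> String {
    let mut old_len: usize = 0;
    let mut new_len: usize = 0;
    for line in body {
        match line.chars().next() {
            Some('-') => old_len += 1, // removed line: old side only
            Some('+') => new_len += 1, // added line: new side only
            Some('\\') => {}           // "\ No newline at end of file" marker
            _ => {
                // Context line (including an empty one): counts on both sides.
                old_len += 1;
                new_len += 1;
            }
        }
    }
    format!("@@ -{},{} +{},{} @@", old_start, old_len, new_start, new_len)
}
```

For the body `[" line 0", "-line 1", "+line 1 modified", "+extra"]` starting at line 1 on both sides, this produces `@@ -1,2 +1,3 @@`.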

## Advanced Features

### Custom Rules

```bash
# Add custom repair rule via CLI
anyrepair rules add --id "fix_undefined" --format "json" \
  --pattern "undefined" --replacement "null" --priority 90

# List all rules
anyrepair rules list

# Enable/disable rules
anyrepair rules enable "fix_undefined"
anyrepair rules disable "fix_undefined"

# Remove a rule
anyrepair rules remove "fix_undefined"
```

**Configuration file (anyrepair.toml):**
```toml
# Custom rules configuration
[[rules]]
id = "fix_trailing_comma"
format = "json"
pattern = ",\\s*}"
replacement = "}"
priority = 95

[[rules]]
id = "fix_js_comments"
format = "json"
pattern = "//.*\\n"
replacement = ""
priority = 80
```
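
The rule fields above (id, pattern, replacement, priority, enabled state) suggest a simple ordered pipeline. The sketch below shows one way that could work. It rests on two labeled assumptions: that a higher priority runs first (the ordering is not stated here), and that patterns are literal strings (the real rules look like regular expressions, which would need a regex library). `Rule` and `apply_rules` are hypothetical names.

```rust
/// Hypothetical rule shape mirroring the configuration fields above.
struct Rule {
    id: &'static str,
    pattern: &'static str, // literal text in this sketch, not a regex
    replacement: &'static str,
    priority: u32,
    enabled: bool,
}

/// Applies enabled rules in descending priority order (an assumption) and
/// returns the repaired text plus the ids of the rules that changed it.
fn apply_rules(input: &str, rules: &[Rule]) -> (String, Vec<&'static str>) {
    let mut ordered: Vec<&Rule> = rules.iter().filter(|r| r.enabled).collect();
    ordered.sort_by(|a, b| b.priority.cmp(&a.priority));

    let mut text = input.to_string();
    let mut applied = Vec::new();
    for rule in ordered {
        let next = text.replace(rule.pattern, rule.replacement);
        if next != text {
            applied.push(rule.id);
            text = next;
        }
    }
    (text, applied)
}
```

With an enabled `fix_undefined` rule (`undefined` to `null`), the input `{"a": undefined}` becomes `{"a": null}` and the returned id list is `["fix_undefined"]`. Disabled rules are skipped entirely.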

## Performance

- **Regex Caching**: 99.6% performance improvement over uncached operations
- **Optimized Binaries**: 1.5 MB release builds (94% size reduction)
- **Streaming**: Process files larger than available RAM using configurable buffers
- **Lazy Evaluation**: Skip unnecessary strategies for faster repairs

**Build Profiles:**
```bash
# Standard release (size-optimized)
cargo build --release

# Distribution profile (maximum optimization)
cargo build --profile dist
```

## Testing

- **318 test cases** with 100% pass rate
  - 137 library tests (incl. format repairers, validators)
  - 35 diff tests
  - 34 fuzz tests
  - 26 streaming tests
  - 18 complex damage tests
  - 18 complex streaming tests
  - 18 damage scenarios
  - 17 integration tests
  - 15 CLI tests
- **Fuzz testing** using proptest for robustness
- **Integration tests** for end-to-end workflows

See [TEST_SUMMARY.md](docs/TEST_SUMMARY.md) for details.

## Comparison

| Feature | AnyRepair | json-repair-rs | json5 | Python jsonrepair |
|---------|-----------|----------------|-------|-------------------|
| **Multi-format** | ✅ 8 formats | ❌ JSON only | ❌ JSON only | ❌ JSON only |
| **Auto-detection** | ✅ Smart detection | ❌ | ❌ | ❌ |
| **MCP integration** | ✅ Native | ❌ | ❌ | ❌ |
| **Streaming** | ✅ Large file support | ❌ | ❌ | ❌ |
| **Custom rules** | ✅ CLI + API | ❌ | ❌ | ❌ |
| **Python API** | ✅ Compatible | ❌ | ❌ | ✅ Native |
| **Language** | Rust | Rust | Rust | Python |
| **Binary size** | 1.5 MB | ~500 KB | ~200 KB | N/A |

**Why AnyRepair?**
- Most comprehensive format support (8 formats vs JSON-only alternatives)
- Only Rust crate with Python-compatible API and MCP integration
- Battle-tested with 318 tests covering real-world failures
- Zero compilation warnings

## Documentation

- **[ARCHITECTURE.md](docs/ARCHITECTURE.md)** - System design and architecture
- **[MCP_SERVER.md](docs/MCP_SERVER.md)** - MCP server integration guide
- **[TEST_SUMMARY.md](docs/TEST_SUMMARY.md)** - Test coverage details
- **[CHANGELOG.md](docs/CHANGELOG.md)** - Version history and changes
- **[INDEX.md](docs/INDEX.md)** - Complete documentation index
- **[STREAMING_FEATURE.md](docs/STREAMING_FEATURE.md)** - Streaming support details
- **[BUILD_OPTIMIZATION.md](docs/BUILD_OPTIMIZATION.md)** - Build optimization guide

### Quick Links

- **Report Issues**: [GitHub Issues](https://github.com/yingkitw/anyrepair/issues)
- **Contributing**: See [CONTRIBUTING.md](CONTRIBUTING.md) (if available)
- **Changelog**: [CHANGELOG.md](docs/CHANGELOG.md)
- **API Docs**: [docs.rs](https://docs.rs/anyrepair)

## Examples

See the [examples/](examples/) directory for:

- **[mcp_repair_json.rs](examples/mcp_repair_json.rs)** - MCP JSON repair usage
- **[mcp_multi_format.rs](examples/mcp_multi_format.rs)** - Multi-format MCP repair
- **[mcp_server_usage.rs](examples/mcp_server_usage.rs)** - MCP server setup and usage

Run examples:
```bash
cargo run --example mcp_repair_json
```

## Roadmap

See [TODO.md](TODO.md) for planned features and improvement areas. Highlights include:

- Additional format support (Properties, .env, Protobuf)
- CLI enhancements (diff preview, dry-run, colored output)
- Web interface and REST API
- Language bindings (Python, Node.js, Go)
- Format-preserving repairs, repair explanations

## License

Apache-2.0

## Repository

**⭐ If you find AnyRepair useful, please consider starring the repo on GitHub!** It helps others discover the project and motivates continued development.

**💬 Feedback welcome!** Share your experience, suggestions, or report issues via [GitHub Issues](https://github.com/yingkitw/anyrepair/issues), or leave a review on [crates.io](https://crates.io/crates/anyrepair).

https://github.com/yingkitw/anyrepair