# tryparse
Multi-strategy parser for messy, real-world data. Built to handle LLM responses with broken JSON, markdown wrappers, type mismatches, and inconsistent formatting.
## Quick Start

```toml
[dependencies]
tryparse = "0.4"
serde = { version = "1.0", features = ["derive"] }
```

```rust
use tryparse::parse;
use serde::Deserialize;
```
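A minimal end-to-end sketch (the `User` struct and the messy input below are illustrative, not from the crate's docs):

```rust
use serde::Deserialize;
use tryparse::parse;

#[derive(Deserialize, Debug)]
struct User {
    name: String,
    age: i64,
}

// Trailing comma and a quoted number: both are handled
let raw = r#"{"name": "Alice", "age": "30",}"#;

let user: User = parse(raw).unwrap();
assert_eq!(user.age, 30);
```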
## Core Features

### With serde::Deserialize

Basic type coercion works out of the box:

```rust
use tryparse::parse;
use serde::Deserialize;

#[derive(Deserialize)]
struct Data { value: i64 } // illustrative target type

let data: Data = parse(r#"{"value": "42"}"#).unwrap(); // "42" coerced to 42
```
### With LlmDeserialize (derive feature)

Advanced features require the derive feature:

```toml
[dependencies]
tryparse = { version = "0.4", features = ["derive"] }
```
Fuzzy field matching - Handles different naming conventions:

```rust
use tryparse::{parse_llm, LlmDeserialize};

#[derive(LlmDeserialize)]
struct Config { user_name: String, max_count: i64 } // illustrative fields

// camelCase keys in the JSON match the snake_case struct fields
let data: Config = parse_llm(r#"{"userName": "alice", "maxCount": 5}"#).unwrap();
```
Enum fuzzy matching - Case-insensitive, partial matches:
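For example, with an illustrative `Priority` enum (not taken from the crate's docs):

```rust
use tryparse::{parse_llm, LlmDeserialize};

#[derive(LlmDeserialize, Debug, PartialEq)]
enum Priority {
    Low,
    Medium,
    High,
}

// Different casings of the same variant name all resolve to Priority::High
let p1: Priority = parse_llm(r#""high""#).unwrap();
let p2: Priority = parse_llm(r#""HIGH""#).unwrap();
let p3: Priority = parse_llm(r#""High""#).unwrap();
assert_eq!(p1, p2);
assert_eq!(p2, p3);
```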
Internally-tagged enum fuzzy matching - Tag values match fuzzily:

```rust
use tryparse::{parse_llm, LlmDeserialize};

// Illustrative internally-tagged enum; the tag values below are examples
#[derive(LlmDeserialize)]
#[serde(tag = "type")]
enum Decision {
    Approve { reason: String },
    Reject { reason: String },
}

// All of these work with fuzzy tag matching:
let d1: Decision = parse_llm(r#"{"type": "approve", "reason": "ok"}"#).unwrap();
let d2: Decision = parse_llm(r#"{"type": "Approve", "reason": "ok"}"#).unwrap();
let d3: Decision = parse_llm(r#"{"type": "APPROVE", "reason": "ok"}"#).unwrap();
let d4: Decision = parse_llm(r#"{"type": "approved", "reason": "ok"}"#).unwrap();
let d5: Decision = parse_llm(r#"{"type": "Approval", "reason": "ok"}"#).unwrap();
```
Union types - Automatically picks the best variant:

```rust
#[derive(LlmDeserialize, Debug)]
enum Value { // illustrative union enum
    Number(i64),
    Text(String),
    List(Vec<i64>),
}

// Parses as Number(42)
let v1: Value = parse_llm("42").unwrap();
// Parses as Text("hello")
let v2: Value = parse_llm(r#""hello""#).unwrap();
// Parses as List(...)
let v3: Value = parse_llm("[1, 2, 3]").unwrap();
```
Implied key - Single-field structs unwrap values:

```rust
#[derive(LlmDeserialize)]
struct Wrapper { message: String } // illustrative field name

// Direct string wraps into the single field
let w: Wrapper = parse_llm(r#""hello""#).unwrap();
assert_eq!(w.message, "hello");
```
## API Reference

### Basic Parsing

```rust
use tryparse::parse;

// Parse with serde::Deserialize
let value: MyType = parse(input)?;
```

### Advanced Parsing (requires derive feature)

```rust
use tryparse::parse_llm;

// Parse with LlmDeserialize trait (fuzzy matching, unions, etc.)
let value: MyType = parse_llm(input)?;
```

### Utilities

```rust
use tryparse::scoring::score_candidate;

// Score a candidate (lower is better)
let score = score_candidate(&candidate);
```
## How It Works

### 1. Multi-Stage Parsing Pipeline

```
           Input String
                 ↓
┌──────────────────────────────────┐
│ Pre-Processing                   │
│ • Remove BOM, zero-width chars   │
│ • Fix excessive nesting (>50)    │
│ • Normalize backslashes          │
└────────────────┬─────────────────┘
                 ↓
┌──────────────────────────────────┐
│ Strategy Execution (parallel)    │
│ • DirectJson (priority 1)        │
│ • Markdown   (priority 2)        │
│ • YAML       (priority 15)       │
│ • JsonFixer  (priority 20)       │
│ • Heuristic  (priority 30)       │
│                                  │
│ → Produces Vec<FlexValue>        │
└────────────────┬─────────────────┘
                 ↓
┌──────────────────────────────────┐
│ Scoring & Ranking                │
│ • Base score by source           │
│ • Transformation penalties       │
│ • Confidence adjustment          │
│ • Sort ascending (best first)    │
└────────────────┬─────────────────┘
                 ↓
┌──────────────────────────────────┐
│ Deserialization                  │
│ • Try candidates in order        │
│ • Apply type coercion            │
│ • Track transformations          │
│ • Return first success           │
└──────────────────────────────────┘
```
### 2. Parsing Strategies
| Strategy | Priority | Description |
|---|---|---|
| DirectJson | 1 | Direct serde_json::from_str(). Fastest path for valid JSON. |
| Markdown | 2 | Extracts from markdown code blocks. Scores by keywords, position, size. |
| YAML | 15 | Parses YAML, converts to JSON. Requires yaml feature. |
| JsonFixer | 20 | Fixes common JSON errors (see below). |
| Heuristic | 30 | Pattern-based extraction from prose. Last resort. |
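For instance, with the default `yaml` feature enabled, a YAML-shaped response should be picked up by the YAML strategy rather than rejected (the `Task` struct and input are illustrative):

```rust
use serde::Deserialize;
use tryparse::parse;

#[derive(Deserialize, Debug)]
struct Task {
    title: String,
    done: bool,
}

// Not JSON, but valid YAML
let yaml_ish = "title: write docs\ndone: false";

let task: Task = parse(yaml_ish).unwrap();
assert!(!task.done);
```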
### 3. JSON Fixes Applied

The JsonFixer strategy handles:

- Trailing commas: `{"a": 1,}` → `{"a": 1}`
- Unquoted keys: `{name: "x"}` → `{"name": "x"}`
- Single quotes: `{'a': 1}` → `{"a": 1}`
- Missing commas: `{"a":1 "b":2}` → `{"a":1,"b":2}`
- Unclosed braces/brackets: `{"a": 1` → `{"a": 1}`
- Comments: `{"a": 1 /* comment */}` → `{"a": 1}`
- Smart quotes: `{“a”: “value”}` → `{"a": "value"}`
- Double-escaped JSON: `"{\"a\":1}"` → `{"a":1}`
- Template literals: `` {`key`: "value"} `` → `{"key": "value"}`
- Hex numbers: `{"a": 0xFF}` → `{"a": 255}`
- Unescaped newlines in strings
- JavaScript functions: removed entirely
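As a sketch, an input combining several of these problems should still deserialize (the `Item` struct is illustrative):

```rust
use serde::Deserialize;
use tryparse::parse;

#[derive(Deserialize, Debug)]
struct Item {
    name: String,
    count: i64,
}

// Unquoted key, single quotes, and a trailing comma in one payload
let broken = "{name: 'widget', count: 3,}";

let item: Item = parse(broken).unwrap();
assert_eq!(item.name, "widget");
```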
### 4. Type Coercion
Applied during deserialization (works with both Deserialize and LlmDeserialize):
| From | To | Example |
|---|---|---|
| String | Number | "42" → 42 |
| String | Bool | "true" → true |
| Number | String | 42 → "42" |
| Float | Int | 42.0 → 42 |
| Single | Array | "item" → ["item"] |
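A sketch combining several of these coercions in one struct (the `Stats` fields are illustrative):

```rust
use serde::Deserialize;
use tryparse::parse;

#[derive(Deserialize, Debug)]
struct Stats {
    count: i64,        // from "42" (String → Number)
    active: bool,      // from "true" (String → Bool)
    tags: Vec<String>, // from "alpha" (Single → Array)
}

let stats: Stats = parse(r#"{"count": "42", "active": "true", "tags": "alpha"}"#).unwrap();
assert_eq!(stats.count, 42);
assert_eq!(stats.tags, vec!["alpha".to_string()]);
```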
### 5. Field Matching (LlmDeserialize only)

Normalizes field names to snake_case and matches case-insensitively:

| Struct Field | Matches JSON Keys |
|---|---|
| `user_name` | `userName`, `UserName`, `user-name`, `user.name`, `USER_NAME`, `username` |
| `max_count` | `maxCount`, `MaxCount`, `max-count`, `max.count`, `MAX_COUNT` |

Note: Does not handle acronyms perfectly. `XMLParser` becomes `x_m_l_parser`, not `xml_parser`.
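In practice this means an acronym-heavy JSON key may not line up with the obvious snake_case field (assumed behavior, sketched below):

```rust
use tryparse::LlmDeserialize;

#[derive(LlmDeserialize, Debug)]
struct Parsers {
    user_name: String,  // matches "userName", "USER_NAME", "user-name", ...
    xml_parser: String, // a JSON key "XMLParser" normalizes to "x_m_l_parser" and will not match
}
```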
### 6. Scoring System
Base Scores (by source):
- Direct JSON: 0
- Markdown: 10
- YAML: 15
- Fixed JSON: 20 + (5 × number of fixes)
- Heuristic: 50
Transformation Penalties:
- String→Number: +2
- Float→Int: +3
- Field rename: +4
- Single→Array: +5
- Default inserted: +50
Confidence Modifier:
- Each transformation reduces confidence by 5%
- Final score += `(1.0 - confidence) × 100`
Lower scores win. Direct JSON with no coercion scores 0 (best possible).
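For example, a JsonFixer candidate that needed two fixes and one String→Number coercion would score roughly 20 + (5 × 2) + 2 + (1.0 − 0.95) × 100 = 37, assuming only the coercion counts as a transformation; a clean DirectJson candidate for the same input scores 0 and wins.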
## Examples

### Handling Markdown Responses
let llm_output = r#"
Sure! Here's the user data:
```json
{
"name": "Alice",
"age": 30,
"email": "alice@example.com"
}
Let me know if you need anything else! "#;
#[derive(Deserialize, Debug)] struct User { name: String, age: i64, email: String, }
let user: User = parse(llm_output).unwrap();
### Inspecting Parse Candidates
```rust
use tryparse::{parse_with_candidates, scoring::score_candidate};

let (result, candidates) = parse_with_candidates::<User>(messy_input).unwrap();

println!("Best result: {:?}", result);
println!("\nAll candidates:");

for (i, candidate) in candidates.iter().enumerate() {
    println!("  {}: {:?} (score: {})",
        i,
        candidate.source(),
        score_candidate(candidate)
    );

    // Inspect transformations
    for t in candidate.transformations() {
        println!("    - {:?}", t);
    }
}
```
### Custom Parser Configuration

```rust
use tryparse::{parse_with_parser, FlexibleParser};

// Use builder pattern for custom configuration
let parser = FlexibleParser::builder()
    .without_heuristic() // Disable heuristic extraction
    .without_markdown()  // Disable markdown extraction
    .build();

let data: User = parse_with_parser(messy_input, &parser).unwrap();
```
### Complex Nested Structures

```rust
use std::collections::HashMap;
use tryparse::{parse_llm, LlmDeserialize};

let project: Project = parse_llm(messy_input).unwrap();
```
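Here, `Project` could be a nested target along these lines (all type and field names are illustrative, not part of the crate):

```rust
#[derive(LlmDeserialize, Debug)]
struct Task {
    title: String,
    done: bool,
}

#[derive(LlmDeserialize, Debug)]
struct Project {
    name: String,
    tasks: Vec<Task>,
    metadata: HashMap<String, String>,
}
```

Fuzzy field matching and type coercion should apply to the nested fields as well, so keys like `"Title"` or a `"true"` string for `done` still deserialize.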
### Union Types with Scoring

```rust
// Automatically picks the variant that best matches the structure
// (`Response` here is an illustrative LlmDeserialize union enum)
let response: Response = parse_llm(input).unwrap();
match response {
    Response::Structured { data } => println!("structured: {:?}", data),
    Response::Plain(text) => println!("plain: {}", text),
}
```
## Feature Flags

```toml
[dependencies]
# Default: includes markdown and yaml
tryparse = "0.4"

# Minimal build (core JSON parsing only)
tryparse = { version = "0.4", default-features = false }

# With derive macros for LlmDeserialize
tryparse = { version = "0.4", features = ["derive"] }

# All features
tryparse = { version = "0.4", features = ["derive", "markdown", "yaml"] }
```

Available features:

- `markdown` (default) - Markdown code block extraction
- `yaml` (default) - YAML parsing support
- `derive` - Derive macro for `LlmDeserialize` (fuzzy field/enum matching, union types)
## Testing

```bash
# Unit tests (226 tests in lib)
cargo test --lib

# All tests (lib + integration + doc tests)
cargo test --all-features

# Minimal build tests
cargo test --no-default-features

# Run specific test
cargo test <test_name>
```
## Performance Considerations
- Parsing is synchronous: No async/await support
- Memory overhead: Tracks all parsing candidates and transformations
- Strategy execution: Some strategies run in parallel
- Regex compilation: Expensive regexes are compiled lazily and cached
- Best for: <1MB inputs, occasional parsing (not high-frequency loops)
Optimizations:
- Direct JSON (valid JSON) takes the fastest path
- Failed strategies short-circuit early
- Scoring is lazy (only computed when needed)
- Copy semantics for small types (enums, etc.)
## Debugging

Enable detailed logging:

```rust
// Assumes a standard `log`/`env_logger` setup; adjust the filter to taste
std::env::set_var("RUST_LOG", "debug");
env_logger::init();

let result = tryparse::parse::<User>(input);
```
Inspect what went wrong:

```rust
match tryparse::parse::<User>(input) {
    Ok(user) => println!("parsed: {:?}", user),
    Err(e) => eprintln!("parse failed: {e}"),
}
```
## Known Limitations

- Synchronous only - No async parsing
- No streaming - Requires the complete input string
- Memory overhead - Tracks all candidates and transformations
- Acronym handling - `XMLParser` → `x_m_l_parser` (not `xml_parser`)
- Best-effort parsing - May produce unexpected results on ambiguous input
- No custom deserializers - Can't implement custom `Deserialize` logic for fields
- Internally-tagged enum fields - Tag values support fuzzy matching, but variant fields use standard serde deserialization (exact match or serde's `rename_all`)
## Contributing

Requirements:

- Rust 1.70.0+
- Run `cargo fmt` before committing
- Pass `cargo clippy --all-targets --all-features`
- All tests must pass: `cargo test --all-features`
Pull request checklist:
- Clear description of what and why
- Tests for new functionality
- Update README if API changes
- No clippy warnings
- All existing tests pass
## License
Apache-2.0
## Credits
Parsing algorithms inspired by BAML's Schema-Aligned Parsing.