lling-llang 0.1.0

WFST framework for text normalization and grammar correction
# Syntax Recovery Layer

The syntax recovery layer handles structural syntax errors by inserting or deleting tokens to produce a parseable token sequence.

## Recovery Strategies

The layer supports three recovery strategies:

```rust
pub enum RecoveryStrategy {
    /// Insert missing tokens (e.g., missing closing bracket)
    Insertion,
    /// Delete unexpected tokens (e.g., extra semicolon)
    Deletion,
    /// Replace tokens (e.g., wrong bracket type)
    Replacement,
    /// All strategies combined (default)
    All,
}
```

### How Each Strategy Works

```
┌─────────────────────────────────────────────────────────────────────────┐
│                       Recovery Strategies                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  INSERTION - Add missing tokens                                          │
│  ──────────────────────────────────────────────────────────────────────  │
│  Input:  "def foo(x"                                                     │
│  Output: "def foo(x )"  (insert ")")                                    │
│                                                                          │
│  Implementation: Self-loop edges for insertable tokens at each node      │
│      ┌────┐                                                              │
│      ▼    │ ")"                                                          │
│  ○───────○───────○                                                       │
│     "x"                                                                  │
│                                                                          │
│  DELETION - Remove unexpected tokens                                     │
│  ──────────────────────────────────────────────────────────────────────  │
│  Input:  "def;; foo"                                                     │
│  Output: "def foo"  (delete extra ";")                                  │
│                                                                          │
│  Implementation: Skip edges that bypass deletable tokens                 │
│                                                                          │
│     ○──────";"──────○──────"foo"──────○                                 │
│      \              /                                                    │
│       `──── ε ────´                                                     │
│            (skip)                                                        │
│                                                                          │
│  REPLACEMENT - Swap wrong tokens                                         │
│  ──────────────────────────────────────────────────────────────────────  │
│  Input:  "arr[0)"  (wrong bracket type)                                  │
│  Output: "arr[0]"  (replace ")" with "]")                               │
│                                                                          │
│  Implementation: Alternative edges with replacement tokens               │
│                                                                          │
│     ○──────")"──────○    (cost: 0)                                      │
│      \              │                                                    │
│       \─────"]"────-┘    (cost: replacement_cost)                       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

## Configuration

### SyntaxRecoveryConfig

```rust
pub struct SyntaxRecoveryConfig {
    /// Recovery strategies to use
    pub strategies: Vec<RecoveryStrategy>,

    /// Cost for inserting a token
    pub insertion_cost: f64,

    /// Cost for deleting a token
    pub deletion_cost: f64,

    /// Cost for replacing a token
    pub replacement_cost: f64,

    /// Maximum consecutive insertions
    pub max_insertions: usize,

    /// Maximum consecutive deletions
    pub max_deletions: usize,

    /// Tokens that can be inserted
    pub insertable_tokens: HashSet<Arc<str>>,

    /// Tokens that can be deleted
    pub deletable_tokens: HashSet<Arc<str>>,

    /// Bracket pairs (open -> close)
    pub bracket_pairs: HashMap<Arc<str>, Arc<str>>,

    /// Whether to balance brackets
    pub balance_brackets: bool,

    /// Whether to add missing semicolons
    pub add_semicolons: bool,

    /// Language hint
    pub language_hint: Option<String>,
}
```

### Default Configuration

```rust
impl Default for SyntaxRecoveryConfig {
    fn default() -> Self {
        // Default insertable tokens
        let insertable = hashset!["(", ")", "[", "]", "{", "}", ";", ",", ":", "."];

        // Default deletable tokens
        let deletable = hashset![";", ",", ".", "(", ")", "[", "]", "{", "}"];

        // Standard bracket pairs
        let brackets = hashmap![
            "(" => ")",
            "[" => "]",
            "{" => "}",
            "<" => ">",
        ];

        Self {
            strategies: vec![RecoveryStrategy::All],
            insertion_cost: 2.0,
            deletion_cost: 1.5,
            replacement_cost: 1.0,
            max_insertions: 3,
            max_deletions: 2,
            insertable_tokens: insertable,
            deletable_tokens: deletable,
            bracket_pairs: brackets,
            balance_brackets: true,
            add_semicolons: false,
            language_hint: None,
        }
    }
}
```

## Creating a Syntax Recovery Layer

### Basic Usage

```rust
use lling_llang::layers::code_correction::{SyntaxRecoveryLayer, SyntaxRecoveryConfig};

// With default configuration
let layer = SyntaxRecoveryLayer::new(SyntaxRecoveryConfig::default());

// Apply to a lattice
let recovered = layer.apply(&lattice)?;
```

### Custom Configuration

```rust
let config = SyntaxRecoveryConfig::new(vec![RecoveryStrategy::Insertion])
    .with_insertion_cost(1.5)
    .with_max_insertions(5)
    .with_bracket_balancing(true)
    .with_insertable_tokens(vec!["async", "await"])
    .with_bracket_pair("<<", ">>");  // Custom brackets

let layer = SyntaxRecoveryLayer::new(config);
```

### Language-Specific Configuration

```rust
// Python (no semicolons, uses indentation)
let python_config = SyntaxRecoveryConfig::default()
    .with_semicolon_insertion(false)
    .with_language("python");

// Rust (uses semicolons and braces)
let rust_config = SyntaxRecoveryConfig::default()
    .with_semicolon_insertion(true)
    .with_insertable_tokens(vec!["->", "=>", "::"])
    .with_language("rust");

// Rholang (parallel composition)
let rholang_config = SyntaxRecoveryConfig::default()
    .with_insertable_tokens(vec!["|", "<-", "<<", ">>"])
    .with_bracket_pair("{*", "*}")
    .with_language("rholang");
```

## Builder Pattern

```rust
let config = SyntaxRecoveryConfig::new(vec![RecoveryStrategy::All])
    .with_insertion_cost(3.0)      // Higher cost for insertions
    .with_deletion_cost(1.5)       // Lower cost for deletions
    .with_replacement_cost(1.0)    // Lowest cost for replacements
    .with_max_insertions(5)        // Allow up to 5 consecutive inserts
    .with_max_deletions(2)         // Allow up to 2 consecutive deletes
    .with_bracket_balancing(true)  // Balance unclosed brackets
    .with_semicolon_insertion(true) // Add missing semicolons
    .with_language("java");
```

## Bracket Balancing

The layer automatically balances unclosed brackets:

```rust
// Input: "def foo(x:"
// The "(" is never closed

// With balance_brackets = true:
// The layer tracks: bracket_stack = ["("]
// At end of input, it adds closing ")" with insertion_cost

// Result lattice includes path: "def foo(x:)"
```

### Custom Bracket Pairs

```rust
let config = SyntaxRecoveryConfig::default()
    .with_bracket_pair("(*", "*)")     // ML-style comments
    .with_bracket_pair("{|", "|}")     // Set builder notation
    .with_bracket_pair("<%", "%>");    // Template tags
```

## Cost Model

Costs determine which corrections are preferred:

```
Total Cost = original_cost + sum(recovery_costs)

Recovery costs:
- Insertion: insertion_cost per token (default 2.0)
- Deletion: deletion_cost per token (default 1.5)
- Replacement: replacement_cost per token (default 1.0)
```

Lower total cost = more preferred path.

### Cost Tuning

```rust
// Prefer deletions over insertions (for noisy input)
let config = SyntaxRecoveryConfig::default()
    .with_insertion_cost(3.0)
    .with_deletion_cost(1.0);

// Prefer replacements (for bracket mismatches)
let config = SyntaxRecoveryConfig::default()
    .with_replacement_cost(0.5)
    .with_insertion_cost(2.0)
    .with_deletion_cost(2.0);
```

## Examples

### Missing Closing Bracket

```rust
// Input lattice represents: "foo(x, y"

let config = SyntaxRecoveryConfig::default();
let layer = SyntaxRecoveryLayer::new(config);

let recovered = layer.apply(&input)?;
// Recovered lattice includes path: "foo(x, y)"
// with cost = original_cost + insertion_cost(2.0)
```

### Extra Punctuation

```rust
// Input: "return;; x"

let config = SyntaxRecoveryConfig::default();
let layer = SyntaxRecoveryLayer::new(config);

let recovered = layer.apply(&input)?;
// Recovered lattice includes path: "return; x"
// with cost = original_cost + deletion_cost(1.5)
```

### Wrong Bracket Type

```rust
// Input: "arr[0)"

let config = SyntaxRecoveryConfig::default();
let layer = SyntaxRecoveryLayer::new(config);

let recovered = layer.apply(&input)?;
// Recovered lattice includes:
//   "arr[0)" with cost = original_cost
//   "arr[0]" with cost = original_cost + replacement_cost(1.0)
```

### Multiple Errors

```rust
// Input: "fn foo( {"  (missing ) and body)

let config = SyntaxRecoveryConfig::default()
    .with_max_insertions(5);  // Allow multiple insertions

let layer = SyntaxRecoveryLayer::new(config);
let recovered = layer.apply(&input)?;

// Recovered lattice includes path: "fn foo() {"
// with cost = original_cost + insertion_cost * 1
```

## Strategy Selection

### Insertion-Only

For completing partial code:

```rust
let config = SyntaxRecoveryConfig::new(vec![RecoveryStrategy::Insertion]);
```

Use when:
- Code is being typed (IDE completion)
- Input is mostly correct but incomplete
- You want to preserve all original tokens

### Deletion-Only

For cleaning noisy input:

```rust
let config = SyntaxRecoveryConfig::new(vec![RecoveryStrategy::Deletion]);
```

Use when:
- Input may have accidental duplicates
- Copy-paste errors
- Removing debug statements

### All Strategies

For general error recovery:

```rust
let config = SyntaxRecoveryConfig::new(vec![RecoveryStrategy::All]);
// Or just use default
let config = SyntaxRecoveryConfig::default();
```

Use when:
- Unknown error types
- General code repair
- Maximum flexibility needed

## Layer Statistics

The layer reports statistics about its operation:

```rust
let (recovered, stats) = layer.apply_with_stats(&lattice)?;

println!("Input edges: {}", stats.input_edges);
println!("Output edges: {}", stats.output_edges);
println!("Time: {}μs", stats.time_us);

// Expansion factor
let expansion = stats.output_edges as f64 / stats.input_edges as f64;
println!("Expansion: {:.2}x", expansion);
```

## Estimated Expansion

The layer provides an estimate of how many paths it adds:

```rust
let layer = SyntaxRecoveryLayer::new(config);
let expansion = layer.estimated_expansion();
// Typical value: 1.05 - 1.15 (5-15% more edges)
```

This helps with planning pipeline resources.

## See Also

- [Overview]overview.md - Code correction introduction
- [Pattern-Aware Correction]pattern-aware.md - Idiom-based boosting
- [Language Configuration]configuration.md - Per-language settings