lling-llang 0.1.0

WFST framework for text normalization and grammar correction
# Code Correction Layer

lling-llang provides pattern-aware code correction layers that combine syntax error recovery with learned code idioms to produce idiomatic code corrections.

## Why Code Correction?

Programming languages have strict syntax requirements. Common errors include:
- Missing brackets, parentheses, or braces
- Typos in keywords (`funciton` instead of `function`)
- Missing semicolons or colons
- Incorrect punctuation

The code correction layer uses WFST (Weighted Finite-State Transducer) lattices to:
1. Generate correction candidates
2. Score them using grammar knowledge
3. Boost paths matching common code patterns

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                    Code Correction Layer Stack                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Input: Token sequence (possibly with errors)                            │
│         "def foo( x )"  (missing colon)                                 │
│              │                                                           │
│              ▼                                                           │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                    Token Correction Layer                          │  │
│  │  Uses edit distance + vocabulary to generate token corrections     │  │
│  │  "def" → ["def"]       (exact match)                              │  │
│  │  "foo" → ["foo"]       (identifier)                               │  │
│  │  "x"   → ["x", "y"]    (possible corrections)                     │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│              │                                                           │
│              ▼                                                           │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                   Syntax Recovery Layer                            │  │
│  │  Insert/delete tokens to fix parse errors                          │  │
│  │  • Insert missing ":" after ")"                                    │  │
│  │  • Delete extra punctuation                                        │  │
│  │  • Balance brackets                                                │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│              │                                                           │
│              ▼                                                           │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                   Pattern-Aware Layer                              │  │
│  │  Boost paths matching common code idioms                           │  │
│  │  Pattern: ["def", "_", "(", ")", ":"]  → boost=1.0                │  │
│  │  Paths matching this pattern get lower costs                       │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│              │                                                           │
│              ▼                                                           │
│  Output: Corrected token lattice                                        │
│          Best path: "def foo ( x ) :"                                   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

## Quick Start

```rust
use lling_llang::layers::{LayerPipeline, CorrectionLayer};
use lling_llang::layers::code_correction::{
    CodeCorrectionLayer, CodeCorrectionConfig,
    SyntaxRecoveryConfig, PatternAwareConfig,
};
use lling_llang::backend::HashMapBackend;
use lling_llang::semiring::TropicalWeight;

// Create a Python code correction layer
let config = CodeCorrectionConfig::new("python")
    .with_syntax_recovery(SyntaxRecoveryConfig::default())
    .with_pattern_aware(PatternAwareConfig::python_patterns())
    .with_max_corrections(5);

let layer: CodeCorrectionLayer<TropicalWeight, HashMapBackend> =
    CodeCorrectionLayer::new(config);

// Apply to a lattice
let corrected = layer.apply(&input_lattice)?;
```

## Supported Languages

The code correction layer supports multiple programming languages:

| Language | Keyword Support | Pattern Support | Bracket Balancing |
|----------|-----------------|-----------------|-------------------|
| Python | Yes | Yes | Yes (uses indentation) |
| Rust | Yes | Yes | Yes |
| JavaScript | Yes | Yes | Yes |
| TypeScript | Yes | Yes | Yes |
| Go | Yes | Yes | Yes |
| Java | Yes | Yes | Yes |
| C/C++ | Yes | Yes | Yes |
| **Rholang** | Yes | Yes | Yes |
| **MeTTa** | Yes | Yes | Yes (no braces) |
| Generic | Basic | No | Yes |

### Language-Specific Features

**Rholang** (F1R3FLY.io):
- Keywords: `new`, `contract`, `for`, `match`, `select`
- Patterns: Process composition, channel operations
- Syntax: `|` for parallel, `<-` for receive

**MeTTa** (F1R3FLY.io):
- Keywords: `!`, `=`, `:`, `match`, `let`, `type`
- Patterns: S-expressions, type annotations
- Syntax: Parentheses-based, no braces

## Core Components

### CodeCorrectionLayer

The main layer that combines all correction stages:

```rust
pub struct CodeCorrectionLayer<W: Semiring, B: LatticeBackend> {
    config: CodeCorrectionConfig,
    syntax_layer: SyntaxRecoveryLayer,
    pattern_layer: Option<PatternAwareLayer>,
}

impl<W, B> CodeCorrectionLayer<W, B>
where
    W: Semiring + From<TropicalWeight>,
    B: LatticeBackend + Clone,
{
    /// Create a new layer with the given configuration.
    pub fn new(config: CodeCorrectionConfig) -> Self;

    /// Create for a specific language with defaults.
    pub fn for_language(language: &str) -> Self;

    /// Check if pattern-aware correction is enabled.
    pub fn has_patterns(&self) -> bool;
}
```

### Layer Pipeline Integration

Code correction integrates with lling-llang's layer pipeline:

```rust
use lling_llang::layers::LayerPipeline;

let mut pipeline = LayerPipeline::new();

// Add token correction (handled elsewhere)
// Add code-specific layers
pipeline.add_layer(CodeCorrectionLayer::for_language("rust"));

// Apply to lattice
let result = pipeline.apply(&input_lattice)?;
```

## Use Cases

### 1. IDE Code Completion

```rust
// User types: "fn main( )"
// Expected: "fn main() {"

let layer = CodeCorrectionLayer::for_language("rust");
let corrected = layer.apply(&partial_code_lattice)?;
// Best path includes missing "{"
```

### 2. Code Repair in Editors

```rust
// Code with syntax error
let code = "def greet(name)
    print(f'Hello {name}')";  // Missing colon

let layer = CodeCorrectionLayer::for_language("python");
// Layer adds ":" after ")" in the correction lattice
```

### 3. Transpiler Error Recovery

```rust
// When parsing fails, generate repair suggestions
let config = CodeCorrectionConfig::new("rholang")
    .with_syntax_recovery(SyntaxRecoveryConfig::default())
    .with_pattern_aware(PatternAwareConfig::rholang_patterns());

let layer = CodeCorrectionLayer::new(config);
```

### 4. Language Learning Tools

```rust
// Help learners with common mistakes
let config = CodeCorrectionConfig::new("python")
    .with_max_corrections(10)  // Show more alternatives
    .with_keyword_boost(2.0);  // Strongly prefer keywords

let layer = CodeCorrectionLayer::new(config);
```

## Configuration Overview

### CodeCorrectionConfig

```rust
pub struct CodeCorrectionConfig {
    /// Target programming language
    pub language: CodeCorrectionLanguage,

    /// Maximum corrections per token
    pub max_corrections_per_token: usize,

    /// Maximum edit distance for corrections
    pub max_edit_distance: usize,

    /// Cost weights
    pub edit_cost: f64,
    pub insertion_cost: f64,
    pub deletion_cost: f64,

    /// Boost for keyword matches
    pub keyword_boost: f64,

    /// Syntax recovery settings (optional)
    pub syntax_config: Option<SyntaxRecoveryConfig>,

    /// Pattern-aware settings (optional)
    pub pattern_config: Option<PatternAwareConfig>,
}
```

### Default Configuration

```rust
// Default settings
CodeCorrectionConfig {
    language: lang,
    max_corrections_per_token: 5,
    max_edit_distance: 2,
    edit_cost: 1.0,
    insertion_cost: 2.0,
    deletion_cost: 1.5,
    keyword_boost: 0.5,
    syntax_config: Some(SyntaxRecoveryConfig::default()),
    pattern_config: None,
    keep_original: true,
    min_token_length: 2,
}
```

## Performance

| Metric | Typical Value |
|--------|---------------|
| Lattice with 100 edges | ~1-2ms |
| Lattice with 1000 edges | ~10-20ms |
| Pattern matching | O(n × p) |
| Memory overhead | ~10KB per layer |

Where n = number of tokens, p = number of patterns.

## See Also

- [Syntax Recovery]syntax-recovery.md - Error recovery layer details
- [Pattern-Aware Correction]pattern-aware.md - Idiom-based boosting
- [Language Configuration]configuration.md - Per-language settings