# Correctors Overview
The correctors module provides concrete implementations of the `CodeCorrector` trait, each specializing in different aspects of code correction.
## Architecture
The correction system uses a three-layer architecture:
```
               ┌─────────────────────────────────────┐
               │          EnsembleCorrector          │
               │       (Aggregation & Ranking)       │
               └─────────────────────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
 ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
 │ LexicalCorrector │    │ GrammarCorrector │    │ SemanticCorrector│
 │                  │    │                  │    │                  │
 │ • Fuzzy match    │    │ • PCFG rules     │    │ • CPG analysis   │
 │ • Edit distance  │    │ • Earley parse   │    │ • GNN scoring    │
 │ • Dictionaries   │    │ • Completions    │    │ • Data flow      │
 └──────────────────┘    └──────────────────┘    └──────────────────┘
          │                       │                       │
          ▼                       ▼                       ▼
      Spelling              Syntax errors          Semantic issues
     corrections             insertions/           variable misuse
                              deletions              type errors
```
## Corrector Types
| Corrector | Target | Technique | Correction sources |
|---|---|---|---|
| `LexicalCorrector` | Spelling | Levenshtein distance | `Lexical` |
| `GrammarCorrector` | Syntax | PCFG + Earley parsing | `Grammar` |
| `SemanticCorrector` | Semantics | CPG + GNN | `Neural`, `DataFlow` |
| `EnsembleCorrector` | Combined | Weighted aggregation | `Combined` |
## Layer Responsibilities
### Layer 1: Lexical
The lexical layer handles token-level spelling errors:
- **Input**: Individual tokens
- **Method**: Fuzzy matching against dictionaries
- **Output**: Alternative spellings within edit distance
```rust
// Example: "retrun" → "return" (one transposition: Levenshtein distance 2, Damerau-Levenshtein distance 1)
let corrections = lexical_corrector.correct_token(&token, &context);
```
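The lexical step can be sketched with the standard library alone. This is an illustrative sketch, not `LexicalCorrector`'s implementation: `levenshtein`, `LexicalSettings`, and `suggest` are hypothetical names, the defaults copy the `LexicalCorrectorConfig` table on this page, and scoring a candidate as `1.0 - edit_penalty * distance` is an assumption.

```rust
/// Plain Levenshtein distance (insertions, deletions, substitutions; no
/// transpositions), using one rolling row of the dynamic-programming table.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the prefix of `a` seen so far and b[..j]
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b.len() + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != *cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical mirror of the documented `LexicalCorrectorConfig` defaults.
pub struct LexicalSettings {
    pub max_edit_distance: usize,
    pub min_token_length: usize,
    pub max_candidates: usize,
    pub edit_penalty: f64,
}

impl Default for LexicalSettings {
    fn default() -> Self {
        Self { max_edit_distance: 2, min_token_length: 2, max_candidates: 5, edit_penalty: 0.15 }
    }
}

/// Returns `(word, distance, confidence)` for dictionary words within
/// `max_edit_distance` of `token`, closest first, then alphabetical.
pub fn suggest(token: &str, dictionary: &[&str], cfg: &LexicalSettings) -> Vec<(String, usize, f64)> {
    if token.chars().count() < cfg.min_token_length {
        return Vec::new(); // too short to correct reliably
    }
    let mut hits: Vec<(String, usize, f64)> = dictionary
        .iter()
        .filter(|word| **word != token) // an exact match needs no correction
        .map(|word| (word.to_string(), levenshtein(token, word)))
        .filter(|(_, dist)| *dist <= cfg.max_edit_distance)
        // Assumed scoring: each edit lowers confidence by `edit_penalty`.
        .map(|(word, dist)| (word, dist, 1.0 - cfg.edit_penalty * dist as f64))
        .collect();
    hits.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    hits.truncate(cfg.max_candidates);
    hits
}
```

With the defaults, `suggest("pritn", &["print", "return", "pass"], &LexicalSettings::default())` keeps only `print` (distance 2, confidence about 0.7), because `return` and `pass` are each four edits away. The linear dictionary scan is the simplest possible lookup and is not meant to match the complexity figures in the Performance table.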
### Layer 2: Grammar
The grammar layer ensures syntactic validity:
- **Input**: Token sequences
- **Method**: PCFG validation and Earley parsing
- **Output**: Insertions, deletions, replacements
```rust
// Example: Missing ";" → insert ";"
let syntax_errors = grammar_corrector.find_syntax_errors(&tokens);
```
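PCFG validation and Earley parsing are too large for a short example, so the sketch below swaps in a much simpler technique, a stack-based delimiter check, purely to show the shape of the insertion and deletion suggestions this layer emits. It is not how `GrammarCorrector` works; `Fix` and `delimiter_fixes` are hypothetical names.

```rust
/// A suggested edit to a token sequence (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Fix {
    /// Insert `text` before index `at`; `at == tokens.len()` means append.
    Insert { at: usize, text: &'static str },
    /// Remove the token at index `at`.
    Delete { at: usize },
}

/// Closing delimiter for an opening one.
fn closer_for(token: &str) -> Option<&'static str> {
    match token {
        "(" => Some(")"),
        "[" => Some("]"),
        "{" => Some("}"),
        _ => None,
    }
}

/// Opening delimiter for a closing one.
fn opener_for(token: &str) -> Option<&'static str> {
    match token {
        ")" => Some("("),
        "]" => Some("["),
        "}" => Some("{"),
        _ => None,
    }
}

/// Stack-based delimiter check: a closer that does not match the innermost
/// open delimiter is suggested for deletion, and every delimiter still open
/// at the end gets a matching closer appended (innermost first).
pub fn delimiter_fixes(tokens: &[&str]) -> Vec<Fix> {
    let mut open: Vec<&str> = Vec::new();
    let mut fixes = Vec::new();
    for (i, &tok) in tokens.iter().enumerate() {
        if closer_for(tok).is_some() {
            open.push(tok);
        } else if let Some(expected_open) = opener_for(tok) {
            if open.last().copied() == Some(expected_open) {
                open.pop();
            } else {
                fixes.push(Fix::Delete { at: i });
            }
        }
    }
    while let Some(tok) = open.pop() {
        // Only openers were pushed, so `closer_for` always succeeds here.
        let text = closer_for(tok).expect("stack holds only openers");
        fixes.push(Fix::Insert { at: tokens.len(), text });
    }
    fixes
}
```

For `["print", "(", "x"]` it proposes appending `)`, and for `["f", "(", ")", ")"]` it proposes deleting the last token. A grammar-driven corrector can go further, for example ranking several candidate repairs by rule probability, which a delimiter stack cannot do.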
### Layer 3: Semantic
The semantic layer detects contextual issues:
- **Input**: Full AST/CPG
- **Method**: Data flow analysis, GNN scoring
- **Output**: Variable misuse, type errors, unused bindings
```rust
// Example: "count" used where "counter" was intended
let issues = semantic_corrector.analyze_cpg(&cpg);
```
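The data-flow side of this layer can be illustrated without a CPG or a GNN. The sketch below is a deliberately simplified stand-in, not `SemanticCorrector`'s analysis: one forward pass over straight-line statements that flags reads of names that were never bound (suggesting the nearest bound name by edit distance, which covers the `count`/`counter` case above) and bindings that are never read. `Stmt`, `Issue`, and `def_use_issues` are hypothetical, and branches, loops, and nested scopes are ignored.

```rust
/// One straight-line statement: the name it binds (if any) and the names it reads.
pub struct Stmt<'a> {
    pub defines: Option<&'a str>,
    pub uses: Vec<&'a str>,
}

#[derive(Debug, PartialEq)]
pub enum Issue<'a> {
    /// A read of a name with no earlier binding, plus the closest bound name, if any.
    UndefinedUse { name: &'a str, suggestion: Option<&'a str> },
    /// A binding that is never read.
    Unused { name: &'a str },
}

/// Levenshtein distance, used to rank candidate replacements for a misused name.
fn edit_distance(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b.len() + 1];
        for (j, cb) in b.iter().enumerate() {
            curr[j + 1] = (prev[j] + usize::from(ca != *cb)).min(prev[j + 1] + 1).min(curr[j] + 1);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Single forward pass: each statement's reads are checked before its own
/// binding is recorded (so `x = x + 1` reads the previous `x`).
pub fn def_use_issues<'a>(stmts: &[Stmt<'a>], max_distance: usize) -> Vec<Issue<'a>> {
    let mut bound: Vec<&'a str> = Vec::new();
    let mut read: Vec<&'a str> = Vec::new();
    let mut issues = Vec::new();
    for stmt in stmts {
        for &name in &stmt.uses {
            if bound.contains(&name) {
                read.push(name);
            } else {
                // Variable-misuse heuristic: nearest bound name within `max_distance`.
                let suggestion = bound
                    .iter()
                    .map(|&b| (edit_distance(name, b), b))
                    .filter(|&(dist, _)| dist <= max_distance)
                    .min()
                    .map(|(_, b)| b);
                issues.push(Issue::UndefinedUse { name, suggestion });
            }
        }
        if let Some(name) = stmt.defines {
            if !bound.contains(&name) {
                bound.push(name);
            }
        }
    }
    for name in bound {
        if !read.contains(&name) {
            issues.push(Issue::Unused { name });
        }
    }
    issues
}
```

For a program that binds `counter`, then reads `count`, then binds `total` from `counter`, the pass reports `count` as undefined with `counter` as the suggestion (edit distance 2), and `total` as unused.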
## Correction Flow
In the typical correction flow, the extracted error tokens go to all three layers, and the ensemble merges their results:
```
      Source Code with Error
                 │
                 ▼
            ┌─────────┐
            │  Parse  │ ──► Tree-sitter AST
            └─────────┘
                 │
                 ▼
            ┌──────────┐
            │ Tokenize │ ──► Error tokens extracted
            └──────────┘
                 │
       ┌─────────┼─────────┐
       ▼         ▼         ▼
    Lexical   Grammar   Semantic
       │         │         │
       └─────────┼─────────┘
                 │
                 ▼
            ┌──────────┐
            │ Ensemble │ ──► Merge, dedupe, rank
            └──────────┘
                 │
                 ▼
        Ranked Corrections
```
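The fan-out and merge steps of this flow can be sketched as follows, assuming a common per-token interface for the three layers. `Layer`, `FixedLayer`, and `fan_out` are hypothetical, and merging duplicates by keeping the highest confidence is a simplification; the ensemble's real scoring involves the per-layer weights listed under `EnsembleCorrectorConfig`.

```rust
use std::collections::HashMap;

/// Hypothetical common interface for the three branches in the diagram.
pub trait Layer {
    fn name(&self) -> &'static str;
    /// `(replacement, confidence)` pairs for one token.
    fn suggest(&self, token: &str) -> Vec<(String, f64)>;
}

/// Toy layer returning a fixed list, for demonstration only.
pub struct FixedLayer {
    pub label: &'static str,
    pub output: Vec<(&'static str, f64)>,
}

impl Layer for FixedLayer {
    fn name(&self) -> &'static str {
        self.label
    }
    fn suggest(&self, _token: &str) -> Vec<(String, f64)> {
        self.output.iter().map(|&(r, c)| (r.to_string(), c)).collect()
    }
}

/// Fans one token out to every layer, merges identical replacements
/// (keeping the best confidence and recording which layers proposed it),
/// then ranks by confidence, breaking ties alphabetically.
pub fn fan_out(layers: &[&dyn Layer], token: &str) -> Vec<(String, f64, Vec<&'static str>)> {
    let mut merged: HashMap<String, (f64, Vec<&'static str>)> = HashMap::new();
    for layer in layers {
        for (replacement, confidence) in layer.suggest(token) {
            let entry = merged.entry(replacement).or_insert((0.0, Vec::new()));
            entry.0 = entry.0.max(confidence);
            entry.1.push(layer.name());
        }
    }
    let mut ranked: Vec<_> = merged
        .into_iter()
        .map(|(replacement, (confidence, sources))| (replacement, confidence, sources))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

Recording which layers proposed each replacement is what makes an agreement boost possible: a replacement with more than one name in its list came from independent sources.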
## Using Correctors
### Single Corrector
Use individual correctors for focused correction:
```rust
use libgrammstein::code::{LexicalCorrector, Python, CodeToken, TokenContext, TokenType};
use std::sync::Arc;
let python = Arc::new(Python::new());
let corrector = LexicalCorrector::with_defaults(python);
let token = CodeToken::new("pritn", 0, 1, 0, TokenType::Identifier, "identifier");
let context = TokenContext::new(TokenType::Identifier);
let corrections = corrector.correct_token(&token, &context);
for c in &corrections {
    println!("{} → {} ({:.2})", c.original, c.replacement, c.confidence);
}
```
### Ensemble Corrector
Use the ensemble for comprehensive correction:
```rust
use libgrammstein::code::{CodeToken, EnsembleCorrector, Python, TokenContext, TokenType};
use std::sync::Arc;
let python = Arc::new(Python::new());
let mut corrector = EnsembleCorrector::with_defaults(python, None);
// Add project-specific identifiers
corrector.add_identifiers(&["calculateTotal", "processData", "handleError"]);
// Register known variables (name, optional type) for semantic analysis
corrector.register_variables(&[
    ("userCount".to_string(), Some("int".to_string())),
    ("userName".to_string(), Some("string".to_string())),
]);
// Build the token and its context as in the single-corrector example
let token = CodeToken::new("pritn", 0, 1, 0, TokenType::Identifier, "identifier");
let context = TokenContext::new(TokenType::Identifier);
let corrections = corrector.correct_token(&token, &context);
```
### Builder Pattern
Configure ensemble behavior precisely:
```rust
use libgrammstein::code::{EnsembleCorrectorBuilder, Python};
use std::sync::Arc;
let python = Arc::new(Python::new());
let corrector = EnsembleCorrectorBuilder::new(python)
    .lexical_weight(0.5)  // Prioritize spelling
    .grammar_weight(0.3)  // Balance syntax
    .semantic_weight(0.2) // Lower semantic weight
    .build();
// To disable the grammar layer (e.g. when no PCFG is available), add
// `.without_grammar()` before `.build()`; `grammar_weight` then has no effect.
```
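This page does not say how the weights are adjusted when a layer is disabled. The hypothetical builder below illustrates one possible policy, zeroing the disabled layer and rescaling the remaining weights to sum to 1.0; `Weights`, `WeightsBuilder`, and the rescaling rule are assumptions for illustration, not `EnsembleCorrectorBuilder`'s documented behavior.

```rust
/// Per-layer weights produced by the hypothetical builder below.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Weights {
    pub lexical: f64,
    pub grammar: f64,
    pub semantic: f64,
}

/// Hypothetical builder reusing the method names from the example above.
pub struct WeightsBuilder {
    weights: Weights,
    grammar_enabled: bool,
}

impl WeightsBuilder {
    pub fn new() -> Self {
        // Defaults from the EnsembleCorrectorConfig table.
        Self {
            weights: Weights { lexical: 0.4, grammar: 0.35, semantic: 0.25 },
            grammar_enabled: true,
        }
    }

    pub fn lexical_weight(mut self, w: f64) -> Self {
        self.weights.lexical = w;
        self
    }

    pub fn grammar_weight(mut self, w: f64) -> Self {
        self.weights.grammar = w;
        self
    }

    pub fn semantic_weight(mut self, w: f64) -> Self {
        self.weights.semantic = w;
        self
    }

    pub fn without_grammar(mut self) -> Self {
        self.grammar_enabled = false;
        self
    }

    /// Zeroes disabled layers, then rescales the remaining weights to sum to 1.0
    /// (the rescaling policy is an assumption made for this sketch).
    pub fn build(self) -> Weights {
        let mut w = self.weights;
        if !self.grammar_enabled {
            w.grammar = 0.0;
        }
        let total = w.lexical + w.grammar + w.semantic;
        if total > 0.0 {
            w.lexical /= total;
            w.grammar /= total;
            w.semantic /= total;
        }
        w
    }
}
```

Under this policy, weights of 0.5 / 0.3 / 0.2 with the grammar layer disabled become roughly 0.714 / 0 / 0.286, so the lexical-to-semantic ratio the caller chose is preserved.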
## Configuration Options
Each corrector has its own configuration type. The options and their defaults are:
### LexicalCorrectorConfig
| Option | Default | Description |
|---|---|---|
| `max_edit_distance` | 2 | Maximum Levenshtein distance |
| `min_token_length` | 2 | Skip tokens shorter than this |
| `max_candidates` | 5 | Maximum suggestions per token |
| `edit_penalty` | 0.15 | Confidence reduction per edit |
### GrammarCorrectorConfig
| Option | Default | Description |
|---|---|---|
| `max_candidates` | 5 | Maximum suggestions per error |
| `min_rule_probability` | 0.01 | Minimum rule probability |
| `suggest_insertions` | true | Suggest missing tokens |
| `suggest_deletions` | true | Suggest removing extra tokens |
| `max_lookahead` | 3 | Lookahead for completions |
| `base_confidence` | 0.8 | Base confidence score |
### SemanticCorrectorConfig
| Option | Default | Description |
|---|---|---|
| `min_confidence` | 0.5 | Threshold for reporting |
| `max_candidates` | 5 | Maximum suggestions per issue |
| `check_variable_misuse` | true | Detect wrong variables |
| `check_unused_bindings` | true | Detect unused variables |
| `check_type_errors` | true | Detect type mismatches |
### EnsembleCorrectorConfig
| Option | Default | Description |
|---|---|---|
| `lexical_weight` | 0.4 | Weight for lexical corrections |
| `grammar_weight` | 0.35 | Weight for grammar corrections |
| `semantic_weight` | 0.25 | Weight for semantic corrections |
| `min_confidence` | 0.3 | Minimum confidence to include |
| `max_candidates` | 10 | Maximum total results |
| `deduplicate` | true | Merge identical suggestions |
| `agreement_boost` | true | Boost when sources agree |
| `agreement_boost_factor` | 1.3 | Boost multiplier |
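
The options above do not pin down the formula that combines them. The sketch below shows one plausible reading, and every scoring choice in it is an assumption: candidates are deduplicated by replacement text, scored as the weight-averaged confidence of the layers that proposed them, multiplied by `agreement_boost_factor` (capped at 1.0) when more than one layer agrees, filtered by `min_confidence`, and truncated to `max_candidates`. `LayerKind`, `Candidate`, `EnsembleSettings`, and `aggregate` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Which layer produced a candidate (hypothetical; compare `CorrectionSource`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LayerKind {
    Lexical,
    Grammar,
    Semantic,
}

pub struct Candidate {
    pub replacement: String,
    pub layer: LayerKind,
    pub confidence: f64,
}

/// Subset of the documented `EnsembleCorrectorConfig` options, with their defaults.
pub struct EnsembleSettings {
    pub lexical_weight: f64,
    pub grammar_weight: f64,
    pub semantic_weight: f64,
    pub min_confidence: f64,
    pub max_candidates: usize,
    pub agreement_boost: bool,
    pub agreement_boost_factor: f64,
}

impl Default for EnsembleSettings {
    fn default() -> Self {
        Self {
            lexical_weight: 0.4,
            grammar_weight: 0.35,
            semantic_weight: 0.25,
            min_confidence: 0.3,
            max_candidates: 10,
            agreement_boost: true,
            agreement_boost_factor: 1.3,
        }
    }
}

impl EnsembleSettings {
    fn weight(&self, layer: LayerKind) -> f64 {
        match layer {
            LayerKind::Lexical => self.lexical_weight,
            LayerKind::Grammar => self.grammar_weight,
            LayerKind::Semantic => self.semantic_weight,
        }
    }
}

/// Returns `(replacement, score)` pairs, best first.
pub fn aggregate(candidates: &[Candidate], cfg: &EnsembleSettings) -> Vec<(String, f64)> {
    // replacement -> (sum of weight * confidence, sum of weights, distinct layers)
    let mut groups: BTreeMap<&str, (f64, f64, Vec<LayerKind>)> = BTreeMap::new();
    for c in candidates {
        let w = cfg.weight(c.layer);
        let g = groups.entry(c.replacement.as_str()).or_insert((0.0, 0.0, Vec::new()));
        g.0 += w * c.confidence;
        g.1 += w;
        if !g.2.contains(&c.layer) {
            g.2.push(c.layer);
        }
    }
    let mut ranked: Vec<(String, f64)> = groups
        .into_iter()
        .filter(|(_, (_, weight_sum, _))| *weight_sum > 0.0)
        .map(|(replacement, (weighted, weight_sum, layers))| {
            let mut score = weighted / weight_sum; // weighted average of agreeing layers
            if cfg.agreement_boost && layers.len() > 1 {
                score = (score * cfg.agreement_boost_factor).min(1.0);
            }
            (replacement.to_string(), score)
        })
        .filter(|(_, score)| *score >= cfg.min_confidence)
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(cfg.max_candidates);
    ranked
}
```

With the defaults, a lexical `print` at 0.7 and a semantic `print` at 0.6 average to about 0.66 (weights 0.4 and 0.25), and the agreement boost lifts that to 0.86; a lone grammar candidate at 0.2 falls below `min_confidence` and is dropped.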
## Correction Sources
Each correction is tagged with its source:
```rust
pub enum CorrectionSource {
    Lexical,       // From fuzzy matching
    Grammar,       // From PCFG/Earley
    Neural,        // From GNN/embeddings
    TypeInference, // From type analysis
    ControlFlow,   // From CFG analysis
    DataFlow,      // From DFG analysis
    Combined,      // From ensemble agreement
    Unknown,       // Unspecified
}
```
Use the source to filter or debug corrections:
```rust
for correction in corrections {
    match correction.source {
        CorrectionSource::Lexical => println!("Spelling: {}", correction.replacement),
        CorrectionSource::Grammar => println!("Syntax: {}", correction.replacement),
        // SemanticCorrector tags its results as either Neural or DataFlow
        CorrectionSource::Neural | CorrectionSource::DataFlow => {
            println!("Semantic: {}", correction.replacement)
        }
        CorrectionSource::Combined => println!("Multi-source: {}", correction.replacement),
        _ => {}
    }
}
```
## When to Use Each Corrector
| Scenario | Recommended corrector |
|---|---|
| Typos in keywords | `LexicalCorrector` |
| Missing semicolons/brackets | `GrammarCorrector` |
| Wrong variable names | `SemanticCorrector` |
| General code correction | `EnsembleCorrector` |
| IDE integration | `EnsembleCorrector` |
| Batch processing | `EnsembleCorrector` |
| Performance-critical | `LexicalCorrector` only |
## Thread Safety
All correctors implement `Send + Sync` when their language type does:
```rust
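use libgrammstein::code::{EnsembleCorrector, Python};
use rayon::prelude::*;
use std::sync::Arc;
let python = Arc::new(Python::new());
// Rayon's closures may borrow the corrector directly because it is `Sync`;
// cloning an `Arc` for every token is unnecessary.
let corrector = EnsembleCorrector::with_defaults(python, None);
// `tokens`: the error tokens to correct, each carrying its context
let results: Vec<_> = tokens
    .par_iter()
    .map(|token| corrector.correct_token(token, &token.context))
    .collect();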
```
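Rayon is one option. With the standard library alone, scoped threads (`std::thread::scope`, stable since Rust 1.63) let several threads borrow one corrector without an `Arc`, as long as its correction method takes `&self` and the type is `Sync`. `ToyCorrector` and `correct_all` are hypothetical stand-ins (the one-substitution matcher only exists to make the example runnable), not part of `libgrammstein`.

```rust
use std::thread;

/// Hypothetical stand-in for a corrector: it is read-only after construction,
/// so it is `Sync` and a plain `&` reference can be shared across threads.
pub struct ToyCorrector {
    dictionary: Vec<&'static str>,
}

impl ToyCorrector {
    pub fn new(dictionary: Vec<&'static str>) -> Self {
        Self { dictionary }
    }

    /// Dictionary words of the same length that differ from `token` in at most one position.
    pub fn correct_token(&self, token: &str) -> Vec<&'static str> {
        self.dictionary
            .iter()
            .copied()
            .filter(|word| {
                word.len() == token.len()
                    && word.chars().zip(token.chars()).filter(|(a, b)| a != b).count() <= 1
            })
            .collect()
    }
}

/// Corrects `tokens` on up to `threads` scoped threads, one contiguous chunk each,
/// and returns results in input order.
pub fn correct_all(corrector: &ToyCorrector, tokens: &[&str], threads: usize) -> Vec<Vec<&'static str>> {
    let threads = threads.max(1);
    let chunk_len = ((tokens.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = tokens
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|t| corrector.correct_token(t)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps the output aligned with `tokens`.
        handles.into_iter().flat_map(|h| h.join().expect("worker panicked")).collect()
    })
}
```

Splitting the input into contiguous chunks and joining the handles in spawn order keeps results aligned with the input, which matches what rayon's `collect` into a `Vec` does for an indexed parallel iterator.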
## Performance
| Corrector | Time | Space |
|---|---|---|
| `LexicalCorrector` | O(d × n) | O(v) |
| `GrammarCorrector` | O(n³) worst case | O(n²) |
| `SemanticCorrector` | O(n + e) | O(n) |
| `EnsembleCorrector` | Sum of above | Sum of above |

Where:
- d = max edit distance
- n = number of tokens/nodes
- e = number of edges in CPG
- v = vocabulary size
## See Also
- [Lexical Corrector](lexical.md) - Fuzzy matching details
- [Grammar Corrector](grammar.md) - PCFG-based correction
- [Semantic Corrector](semantic.md) - GNN/CPG analysis
- [Ensemble Corrector](ensemble.md) - Multi-source aggregation
- [Correction Framework](../correction.md) - Base types
- [Pipeline](../pipeline.md) - End-to-end workflow