# Semantic Corrector
The semantic corrector analyzes code semantics using Code Property Graphs (CPG) and Graph Neural Networks (GNN) to detect contextual issues like variable misuse, unused bindings, and type errors.
## Overview
The `SemanticCorrector` provides:
- **Variable misuse detection**: Wrong variable names in context
- **Unused binding detection**: Variables defined but never used
- **Type error detection**: Type mismatches (when type info available)
- **GNN-based scoring**: Neural network analysis of code patterns
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                        SemanticCorrector                         │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                     GnnSemanticScorer                      │  │
│  │                                                            │  │
│  │  • Feature extraction from CPG nodes                       │  │
│  │  • Message passing for pattern detection                   │  │
│  │  • Issue scoring and ranking                               │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                   │
│                              ▼                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                  CodePropertyGraph (CPG)                   │  │
│  │                                                            │  │
│  │      ┌─────────┐       ┌─────────┐       ┌─────────┐       │  │
│  │      │   AST   │       │   CFG   │       │   DFG   │       │  │
│  │      │         │       │         │       │         │       │  │
│  │      │ Parent  │       │ Next    │       │ Read    │       │  │
│  │      │ Child   │       │ Branch  │       │ Write   │       │  │
│  │      │ Sibling │       │ Back    │       │ Flow    │       │  │
│  │      └─────────┘       └─────────┘       └─────────┘       │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                   │
│  ┌───────────────────────────┴────────────────────────────────┐  │
│  │                Known Variables / Functions                 │  │
│  │                                                            │  │
│  │  Variables: userCount(int), userName(string), ...          │  │
│  │  Functions: calculateTotal(2), processData(1), ...         │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
```
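To make the three edge families in the diagram concrete, the sketch below models a tiny graph for `result = x + y` followed by a `return`. `EdgeKind`, `Layer`, and `MiniCpg` are hypothetical names chosen for illustration; they are not libgrammstein's actual `CodePropertyGraph` types.

```rust
/// Hypothetical edge labels mirroring the diagram (not the library's real types).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EdgeKind {
    Parent, Child, Sibling, // AST
    Next, Branch, Back,     // CFG
    Read, Write, Flow,      // DFG
}

/// The sub-graph an edge belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layer {
    Ast,
    Cfg,
    Dfg,
}

impl EdgeKind {
    fn layer(self) -> Layer {
        match self {
            EdgeKind::Parent | EdgeKind::Child | EdgeKind::Sibling => Layer::Ast,
            EdgeKind::Next | EdgeKind::Branch | EdgeKind::Back => Layer::Cfg,
            EdgeKind::Read | EdgeKind::Write | EdgeKind::Flow => Layer::Dfg,
        }
    }
}

/// One node set shared by all three layers.
#[allow(dead_code)]
struct MiniCpg {
    labels: Vec<String>,
    edges: Vec<(usize, usize, EdgeKind)>,
}

impl MiniCpg {
    fn new() -> Self {
        MiniCpg { labels: Vec::new(), edges: Vec::new() }
    }

    fn add_node(&mut self, label: &str) -> usize {
        self.labels.push(label.to_string());
        self.labels.len() - 1
    }

    fn add_edge(&mut self, from: usize, to: usize, kind: EdgeKind) {
        self.edges.push((from, to, kind));
    }

    /// All edges that belong to one layer (AST, CFG, or DFG).
    fn edges_in(&self, layer: Layer) -> Vec<(usize, usize, EdgeKind)> {
        self.edges.iter().copied().filter(|e| e.2.layer() == layer).collect()
    }
}

/// Graph for `result = x + y` followed by a `return` statement.
fn sample_graph() -> MiniCpg {
    let mut g = MiniCpg::new();
    let assign = g.add_node("assignment");
    let result = g.add_node("result");
    let x = g.add_node("x");
    let y = g.add_node("y");
    let ret = g.add_node("return");
    for child in [result, x, y] {
        g.add_edge(assign, child, EdgeKind::Child); // syntax tree
    }
    g.add_edge(assign, ret, EdgeKind::Next); // control flow
    g.add_edge(assign, x, EdgeKind::Read); // data flow
    g.add_edge(assign, y, EdgeKind::Read);
    g.add_edge(assign, result, EdgeKind::Write);
    g
}
```

The same five nodes carry all three layers, which is what lets an analysis combine syntactic position, execution order, and reads and writes in one traversal.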
## SemanticCorrectorConfig
Configuration options for the semantic corrector:
```rust
pub struct SemanticCorrectorConfig {
    /// Minimum confidence threshold for reporting (default: 0.5)
    pub min_confidence: f64,
    /// Maximum candidates per issue (default: 5)
    pub max_candidates: usize,
    /// Whether to check for variable misuse (default: true)
    pub check_variable_misuse: bool,
    /// Whether to check for unused bindings (default: true)
    pub check_unused_bindings: bool,
    /// Whether to check for type errors (default: true)
    pub check_type_errors: bool,
    /// GNN configuration
    pub gnn_config: GnnConfig,
}
```
### Configuration Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `min_confidence` | 0.5 | Threshold for reporting issues |
| `max_candidates` | 5 | Maximum suggestions per issue |
| `check_variable_misuse` | true | Enable variable misuse detection |
| `check_unused_bindings` | true | Enable unused variable detection |
| `check_type_errors` | true | Enable type mismatch detection |
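
How `min_confidence` and `max_candidates` interact is not spelled out on this page. A plausible reading, sketched below, is that candidates under the threshold are dropped, the rest are sorted by descending score, and the list is cut to `max_candidates`. `select_candidates` is a hypothetical helper, and the inclusive `>=` comparison is an assumption.

```rust
/// Hypothetical helper: filter by threshold, rank, then truncate.
fn select_candidates(
    mut candidates: Vec<(String, f64)>,
    min_confidence: f64,
    max_candidates: usize,
) -> Vec<(String, f64)> {
    // Drop anything below the reporting threshold (assumed inclusive).
    candidates.retain(|(_, score)| *score >= min_confidence);
    // Highest score first.
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));
    // Keep at most `max_candidates` suggestions.
    candidates.truncate(max_candidates);
    candidates
}
```

Raising `min_confidence` trims noisy suggestions, while `max_candidates` only caps how many of the survivors are shown.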
## Creating a Semantic Corrector
### Basic Creation
```rust
use libgrammstein::code::{SemanticCorrector, SemanticCorrectorConfig, Python};
use std::sync::Arc;
// `GnnConfig` must also be in scope; its module path is not shown on this page.

let python = Arc::new(Python::new());

// With default configuration
let corrector = SemanticCorrector::with_defaults(python.clone());

// With custom configuration
let config = SemanticCorrectorConfig {
    min_confidence: 0.6,
    max_candidates: 10,
    check_variable_misuse: true,
    check_unused_bindings: true,
    check_type_errors: false, // Disable type checking
    gnn_config: GnnConfig::default(),
};
let corrector = SemanticCorrector::new(python, config);
```
### Registering Project Context
Provide the corrector with knowledge of your project's symbols:
```rust
let mut corrector = SemanticCorrector::with_defaults(python);

// Register known variables with optional type information
corrector.register_variable(
    "userCount".to_string(),
    Some("int".to_string()),
    0, // scope level
);
corrector.register_variable(
    "userName".to_string(),
    Some("string".to_string()),
    0,
);

// Register known functions with arity and return type
corrector.register_function(
    "calculateTotal".to_string(),
    2,                         // arity (number of parameters)
    Some("float".to_string()), // return type
);
```
## VariableInfo and FunctionInfo
### VariableInfo
Information about a known variable:
```rust
pub struct VariableInfo {
    /// Variable name
    pub name: String,
    /// Inferred or declared type (if known)
    pub type_name: Option<String>,
    /// Scope level where defined
    pub scope_level: usize,
    /// Number of times used
    pub use_count: usize,
}
```
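The `scope_level` and `use_count` fields suggest a lookup that resolves a name to its innermost visible definition and counts each use. The sketch below illustrates that idea with a hypothetical `VarTable`; the resolution rule (highest `scope_level` not exceeding the current scope) is an assumption, not the library's documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `VariableInfo` (the name is the map key).
#[derive(Debug, Clone)]
struct VarEntry {
    type_name: Option<String>,
    scope_level: usize,
    use_count: usize,
}

#[derive(Default)]
struct VarTable {
    vars: HashMap<String, Vec<VarEntry>>,
}

impl VarTable {
    fn register(&mut self, name: &str, type_name: Option<&str>, scope_level: usize) {
        self.vars.entry(name.to_string()).or_default().push(VarEntry {
            type_name: type_name.map(str::to_string),
            scope_level,
            use_count: 0,
        });
    }

    /// Resolve `name` as seen from `current_scope` and record the use.
    fn resolve(&mut self, name: &str, current_scope: usize) -> Option<&VarEntry> {
        let entry = self
            .vars
            .get_mut(name)?
            .iter_mut()
            .filter(|e| e.scope_level <= current_scope) // visible definitions only
            .max_by_key(|e| e.scope_level)?; // innermost one wins
        entry.use_count += 1;
        Some(&*entry)
    }
}
```

An entry whose `use_count` is still zero after a full pass is a natural candidate for an unused-binding report.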
### FunctionInfo
Information about a known function:
```rust
pub struct FunctionInfo {
    /// Function name
    pub name: String,
    /// Parameter types (if known)
    pub param_types: Vec<Option<String>>,
    /// Return type (if known)
    pub return_type: Option<String>,
    /// Number of parameters
    pub arity: usize,
}
```
## Analyzing Code Property Graphs
### Basic CPG Analysis
```rust
use libgrammstein::code::{CodeParser, CodePropertyGraph, Python, SemanticCorrector};
use std::sync::Arc;

let python = Arc::new(Python::new());
let mut parser = CodeParser::new(python.clone()).unwrap();
let corrector = SemanticCorrector::with_defaults(python);

let source = r#"
def calculate(x, y):
    result = x + y
    return resutl  # Typo: should be 'result'
"#;

let parsed = parser.parse(source).unwrap();
let cpg = CodePropertyGraph::from_parsed_code(&parsed);

// Analyze CPG for semantic issues
let issues = corrector.analyze_cpg(&cpg);
for issue in &issues {
    println!("Issue at node {}: {:?} (confidence: {:.2})",
        issue.node_idx, issue.issue_type, issue.confidence);
}
```
### Full Analysis with Corrections
```rust
// Get corrections from parsed code and CPG
let corrections = corrector.analyze_parsed(&parsed, &cpg);
for correction in &corrections {
    println!("{} → {} ({:?}, confidence: {:.2})",
        correction.original,
        correction.replacement,
        correction.kind,
        correction.confidence
    );
}
```
## Issue Types
The semantic corrector detects several issue types:
```rust
pub enum IssueType {
    VariableMisuse,       // Wrong variable in context
    UnusedBinding,        // Variable defined but never used
    TypeError,            // Type mismatch
    MissingErrorHandling, // Unhandled error case
    // ... other types
}
```
### Variable Misuse
Detects when a variable name is likely wrong:
```rust
// Source: "return resutl" (should be "result")
// Issue: VariableMisuse at "resutl" node
// Suggestion: "result" based on data flow and name similarity
```
### Unused Binding
Detects variables that are defined but never read:
```rust
// Source:
//   def foo():
//       unused = 42  # Never read
//       return 0
// Issue: UnusedBinding at "unused" definition
// Suggestion: Remove or use the variable
```
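One way to picture this check is as a pass over data-flow events: collect every name that is read, then report names that are written but never appear in that set. The sketch below uses a hypothetical `Access` event type and is flow-insensitive, so it is coarser than an analysis over real DFG `Read`/`Write` edges.

```rust
use std::collections::HashSet;

/// Hypothetical data-flow event: a name is either written or read.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Access {
    Write(String),
    Read(String),
}

/// Names that are written at least once but never read, in first-write order.
fn unused_bindings(events: &[Access]) -> Vec<String> {
    // Every name that is read anywhere in the event stream.
    let reads: HashSet<&str> = events
        .iter()
        .filter_map(|e| match e {
            Access::Read(name) => Some(name.as_str()),
            Access::Write(_) => None,
        })
        .collect();

    let mut seen = HashSet::new();
    let mut unused = Vec::new();
    for event in events {
        if let Access::Write(name) = event {
            // Report each never-read name once, at its first write.
            if !reads.contains(name.as_str()) && seen.insert(name.as_str()) {
                unused.push(name.clone());
            }
        }
    }
    unused
}
```

For the `foo` example, the only event is a write to `unused`, so that name is the single result.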
### Type Error
Detects type mismatches when type information is available:
```rust
// Source (with type annotations):
//   def add(x: int, y: int) -> int:
//       return x + "hello"  # Type error
// Issue: TypeError at string literal
// Suggestion: Expected int
```
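The "when type information is available" condition can be illustrated with a deliberately small check for `lhs + rhs`. The hypothetical `check_add` below reports a mismatch only when both operand types are known, and it treats any two different type names as incompatible, which is stricter than Python's real rules.

```rust
/// Hypothetical check for `lhs + rhs`: stay silent unless both types are known.
fn check_add(lhs: Option<&str>, rhs: Option<&str>) -> Option<String> {
    match (lhs, rhs) {
        // Both types known and different: report a mismatch.
        (Some(l), Some(r)) if l != r => Some(format!("expected {}, found {}", l, r)),
        // Same type, or at least one type unknown: nothing to report.
        _ => None,
    }
}
```

With the example above, `check_add(Some("int"), Some("str"))` yields a message, while an unannotated operand (`None`) produces no report at all.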
## Finding Variable Misuse
Find candidates for variable replacement:
```rust
// Find alternatives for a potentially misused variable
let candidates = corrector.find_variable_misuse(&cpg, "resutl", node_idx);
for (name, score) in &candidates {
    println!(" {} (score: {:.2})", name, score);
}
// Output:
// result (score: 0.85)
// results (score: 0.65)
```
## Name Similarity
The corrector uses Levenshtein distance for name similarity:
```rust
// Similarity calculation
fn name_similarity(a: &str, b: &str) -> f64 {
    if a == b { return 1.0; }
    let distance = levenshtein_distance(a, b);
    let max_len = a.len().max(b.len());
    1.0 - (distance as f64 / max_len as f64)
}

// Examples:
// "result" vs "resutl"  → similarity ~0.67 (a swapped pair costs two edits)
// "count" vs "counter"  → similarity ~0.71
// "foo" vs "bar"        → similarity 0.0
```
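The snippet above calls `levenshtein_distance` without defining it. A minimal, self-contained version might look like the following; it is a standard two-row dynamic-programming implementation written for illustration, not necessarily the one libgrammstein ships, and it counts characters rather than bytes.

```rust
/// Classic Levenshtein distance over Unicode scalar values (two-row DP).
fn levenshtein_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j]
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Normalized similarity in [0, 1]; 1.0 means identical names.
fn name_similarity(a: &str, b: &str) -> f64 {
    if a == b {
        return 1.0;
    }
    let max_len = a.chars().count().max(b.chars().count());
    1.0 - levenshtein_distance(a, b) as f64 / max_len as f64
}
```

Under plain Levenshtein a swapped pair such as `lt`/`tl` costs two edits, so `resutl` scores about 0.67 against `result`. A transposition-aware metric such as Damerau-Levenshtein would count the swap as one edit and score about 0.83.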
## Token-Level Correction
The semantic corrector can also correct individual tokens:
```rust
use libgrammstein::code::{CodeToken, TokenContext, TokenType, CodeCorrector};
let mut corrector = SemanticCorrector::with_defaults(python);
// Register known identifiers
corrector.register_variable("calculateTotal".to_string(), None, 0);
corrector.register_variable("calculateAverage".to_string(), None, 0);
// Correct an unknown identifier
let token = CodeToken::new(
    "calulateTotal", // Misspelled
    0, 1, 0,
    TokenType::Identifier,
    "identifier",
);
let context = TokenContext::new(TokenType::Identifier);
let corrections = corrector.correct_token(&token, &context);
// Suggests: "calculateTotal" (high similarity)
```
## Correction Sources
Semantic corrections are tagged with their analysis source:
```rust
pub enum CorrectionSource {
    Neural,        // From GNN analysis
    TypeInference, // From type checking
    ControlFlow,   // From CFG analysis
    DataFlow,      // From DFG analysis
    // ...
}
```
Usage:
```rust
for correction in corrections {
    match correction.source {
        CorrectionSource::Neural => {
            println!("GNN detected: {}", correction.context.as_deref().unwrap_or(""));
        }
        CorrectionSource::DataFlow => {
            println!("Data flow issue: {}", correction.context.as_deref().unwrap_or(""));
        }
        CorrectionSource::TypeInference => {
            println!("Type error: {}", correction.context.as_deref().unwrap_or(""));
        }
        _ => {}
    }
}
```
## Integration Example
Complete example using the semantic corrector:
```rust
use libgrammstein::code::{
    CodeParser, CodeTokenizer, CodePropertyGraph,
    SemanticCorrector, Python, CorrectionKind, TokenType,
};
use std::sync::Arc;

fn analyze_semantics(source: &str) -> Vec<String> {
    let python = Arc::new(Python::new());
    let mut parser = CodeParser::new(python.clone()).unwrap();
    let mut corrector = SemanticCorrector::with_defaults(python.clone());

    // Parse and build CPG
    let parsed = parser.parse(source).unwrap();
    let cpg = CodePropertyGraph::from_parsed_code(&parsed);

    // Extract and register known variables from the code
    let tokenizer = CodeTokenizer::new(python.as_ref());
    let tokens = tokenizer.tokenize(&parsed.tree, source);
    for token in &tokens {
        if token.token_type == TokenType::Identifier {
            // Check if this is a definition (simplified check)
            if let Some(parent) = &token.context.parent_node_type {
                if parent.contains("assignment") || parent.contains("parameter") {
                    corrector.register_variable(token.text.clone(), None, 0);
                }
            }
        }
    }

    // Analyze for semantic issues
    let corrections = corrector.analyze_parsed(&parsed, &cpg);
    let mut messages = Vec::new();
    for c in &corrections {
        let msg = match c.kind {
            CorrectionKind::VariableMisuse => {
                format!("Variable '{}' might be '{}' (line {})",
                    c.original, c.replacement,
                    // Would need line mapping
                    0
                )
            }
            CorrectionKind::Deletion => {
                format!("'{}' appears to be unused", c.original)
            }
            CorrectionKind::TypeError => {
                format!("Type error at '{}': {}", c.original,
                    c.context.as_deref().unwrap_or(""))
            }
            _ => format!("Issue at '{}': {:?}", c.original, c.kind),
        };
        messages.push(msg);
    }
    messages
}

fn main() {
    let source = r#"
def process_data(items):
    total = 0
    for item in items:
        totla += item.value  # Typo: should be 'total'
    return total
"#;

    let issues = analyze_semantics(source);
    for issue in issues {
        println!(" {}", issue);
    }
}
```
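The `// Would need line mapping` placeholder in the example could be filled by a small offset-to-line helper. The sketch assumes a correction exposes a byte offset into the source, which this page does not confirm; `line_of_offset` is a hypothetical name.

```rust
/// Return the 1-based line number containing byte `offset`.
/// Offsets past the end are clamped to the end of the source.
fn line_of_offset(source: &str, offset: usize) -> usize {
    let end = offset.min(source.len());
    // Each newline before the offset starts a new line.
    source.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count() + 1
}
```

For many lookups on the same file, precomputing the newline positions once and binary-searching them would avoid rescanning the source each time.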
## GNN Integration
The semantic corrector uses `GnnSemanticScorer` for pattern detection:
```rust
// Access the GNN scorer
let scorer = corrector.gnn_scorer();
// The scorer extracts features from CPG nodes
// and uses message passing to detect semantic patterns
```
### GnnConfig
Configure the GNN behavior:
```rust
pub struct GnnConfig {
    /// Number of message passing layers
    pub num_layers: usize,
    /// Hidden dimension size
    pub hidden_dim: usize,
    /// Dropout rate
    pub dropout: f64,
    // ...
}
```
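To give a feel for what `num_layers` controls, here is a weight-free toy version of message passing: in each layer a node blends its own features with the mean of its neighbours' features. The function name, the 50/50 blend, and the absence of learned weights and dropout are all simplifications for illustration; this is not how `GnnSemanticScorer` is implemented.

```rust
/// Hypothetical, weight-free message passing over an adjacency list.
fn message_passing(
    features: &[Vec<f64>],
    neighbors: &[Vec<usize>],
    num_layers: usize,
) -> Vec<Vec<f64>> {
    let mut h = features.to_vec();
    for _ in 0..num_layers {
        let mut next = h.clone();
        for (node, adj) in neighbors.iter().enumerate() {
            if adj.is_empty() {
                continue; // an isolated node keeps its features
            }
            for d in 0..h[node].len() {
                // Mean of the neighbours' values in dimension `d`.
                let mean = adj.iter().map(|&n| h[n][d]).sum::<f64>() / adj.len() as f64;
                next[node][d] = 0.5 * h[node][d] + 0.5 * mean;
            }
        }
        h = next;
    }
    h
}
```

On a three-node path `0 - 1 - 2` with a signal only on node 0, node 2 is still zero after one layer and becomes non-zero after two: each extra layer lets information travel one more hop through the graph.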
## Performance
| Operation | Complexity | Notes |
|-----------|------------|-------|
| CPG analysis | O(n + e) | n = nodes, e = edges |
| Variable misuse | O(v) | v = known variables |
| Name similarity | O(len²) | Levenshtein distance |
| GNN inference | O(L × n) | L = layers, n = nodes |
### Optimization Tips
1. **Register variables early**: Populate known variables at project load
2. **Use confidence threshold**: Set `min_confidence` appropriately
3. **Disable unused checks**: Turn off `check_unused_bindings` if not needed
4. **Cache CPG**: Build CPG once and reuse for multiple analyses
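
Tip 4 can be sketched as a small cache that builds a graph once per distinct source text. `GraphCache` is a hypothetical helper, and the type parameter `G` stands in for whatever graph type is being cached; keying by the full source string is the simplest choice, not necessarily the most memory-efficient one.

```rust
use std::collections::HashMap;

/// Hypothetical cache keyed by source text; `G` stands in for a built graph.
struct GraphCache<G> {
    entries: HashMap<String, G>,
    builds: usize, // how many times the builder actually ran
}

impl<G> GraphCache<G> {
    fn new() -> Self {
        GraphCache { entries: HashMap::new(), builds: 0 }
    }

    /// Return the cached graph for `source`, building it only on first sight.
    fn get_or_build(&mut self, source: &str, build: impl FnOnce(&str) -> G) -> &G {
        if !self.entries.contains_key(source) {
            self.builds += 1;
            let graph = build(source);
            self.entries.insert(source.to_string(), graph);
        }
        &self.entries[source]
    }
}
```

Repeated analyses of an unchanged file then pay the construction cost once, and only the analysis passes run again.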
## Thread Safety
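`SemanticCorrector` is `Send + Sync` when its language type is `Send + Sync`, so a shared instance can serve read-only analysis from several threads:
```rust
use rayon::prelude::*; // provides `par_iter`
use std::sync::Arc;

let corrector = Arc::new(SemanticCorrector::with_defaults(python));

// Share across threads for read-only analysis (`cpgs` is a collection of prebuilt CPGs)
let results: Vec<_> = cpgs.par_iter()
    .map(|cpg| corrector.analyze_cpg(cpg))
    .collect();
```
Note: Modifying the corrector (registering variables or functions) requires mutable access, so complete registration before wrapping it in an `Arc`.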
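The register-then-share pattern can be shown with the standard library alone. In the sketch below, `Analyzer` is a stand-in type invented for illustration (it is not `SemanticCorrector`): all mutation happens before the value is frozen in an `Arc`, after which each thread only calls `&self` methods.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a read-only analyzer; not the real `SemanticCorrector`.
struct Analyzer {
    known: Vec<String>,
}

impl Analyzer {
    /// Count whitespace-separated words that are not registered names.
    fn unknown_words(&self, source: &str) -> usize {
        source
            .split_whitespace()
            .filter(|w| !self.known.iter().any(|k| k.as_str() == *w))
            .count()
    }
}

/// Register everything first, then freeze the analyzer in an `Arc` and fan out.
fn analyze_all(known: Vec<String>, sources: Vec<String>) -> Vec<usize> {
    let analyzer = Arc::new(Analyzer { known });
    let handles: Vec<_> = sources
        .into_iter()
        .map(|src| {
            let analyzer = Arc::clone(&analyzer);
            // Each thread gets its own handle to the same immutable analyzer.
            thread::spawn(move || analyzer.unknown_words(&src))
        })
        .collect();
    // Join in spawn order so results line up with the inputs.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because nothing is mutated after the `Arc` is created, no lock is needed; if registration had to continue during analysis, a `RwLock` around the analyzer would be one option.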
## Limitations
1. **Requires full AST**: Token-level correction is limited
2. **Type inference**: Depends on available type annotations
3. **Scope tracking**: Simplified scope model
4. **Cross-file analysis**: Limited to single-file context
## See Also
- [Correctors Overview](overview.md) - Architecture and comparison
- [Lexical Corrector](lexical.md) - Fuzzy matching
- [Grammar Corrector](grammar.md) - Syntax-based correction
- [Ensemble Corrector](ensemble.md) - Multi-source aggregation
- [CPG](../cpg.md) - Code Property Graphs
- [GNN](../gnn.md) - Graph Neural Networks